
OWASP WebGoat Benchmark

The OWASP WebGoat Benchmark Edition (WBE) is a test suite designed to evaluate the speed, coverage, and accuracy of vulnerability detection tools. Without the ability to measure these tools, it is difficult to understand their value or interpret vendor claims. The WBE contains over 20,000 test cases that are fully runnable and exploitable. The project's goal is to measure the capabilities of any kind of vulnerability detection tool.

You can use this initial version with Static Analysis Security Testing (SAST) tools and Interactive Analysis Security Testing (IAST) tools. A future release (hopefully this year) will support Dynamic Analysis Security Testing (DAST) tools, like OWASP ZAP. The current version is implemented in Java. Future versions may expand to include other languages.

Project Philosophy

Security tools (SAST, DAST, and IAST) are amazing when they find a complex vulnerability in your code. But they can drive everyone crazy with complexity, false alarms, and missed vulnerabilities. We are on a quest to measure just how good these tools are at discovering and properly diagnosing security problems in applications.

One important lesson that WBE takes from NSA's Juliet test suite is the idea of measuring BOTH true vulnerabilities and false positives. Unlike Juliet, in the WBE, true vulnerabilities and false positives are not combined in a single test case. This allows each test case to verify a single aspect of vulnerability detection. For example, one test might check to see if a tool properly handles data flow propagation when a string is split into pieces using a regular expression. Another might be the same test, with a seemingly plausible but fake propagation.
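
To make that idea concrete, here is a hypothetical sketch in the style of the benchmark servlets (the class name, URL mapping, and parameter name are invented, and this is not an actual WBE test case): the tainted value is split with a regular expression and reassembled before it reaches an LDAP query, so a tool has to propagate data flow through the split in order to report the vulnerability.

 // Hypothetical sketch, not an actual WBE test case: class name, URL mapping, and
 // parameter name are invented. The tainted value is split with a regular expression
 // and reassembled before it reaches the sink, so a tool must propagate data flow
 // through String.split() to report this correctly.
 import java.io.IOException;
 
 import javax.servlet.ServletException;
 import javax.servlet.annotation.WebServlet;
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;
 
 @WebServlet("/RegexSplitExample")
 public class RegexSplitExample extends HttpServlet {
 
     @Override
     public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
         String tainted = request.getParameter("name");     // untrusted source
         if (tainted == null) tainted = "";
 
         String[] pieces = tainted.split("-");               // regex-based split of the tainted value
         String rebuilt = String.join("-", pieces);          // the taint flows through the pieces
 
         try {
             javax.naming.directory.DirContext dc = org.owasp.webgoat.benchmark.helpers.Utils.getDirContext();
             Object[] filterArgs = {"a", "b"};
             // the reassembled tainted value is used as the LDAP search filter (the sink)
             dc.search("name", rebuilt, filterArgs, new javax.naming.directory.SearchControls());
         } catch (javax.naming.NamingException e) {
             throw new ServletException(e);
         }
     }
 }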

There are four kinds of test results in the WBE:

 1. Tool correctly identifies a real vulnerability (TRUE positive)
 2. Tool fails to identify a real vulnerability (FALSE negative)
 3. Tool correctly ignores a false alarm (TRUE negative)
 4. Tool fails to ignore a false alarm (FALSE positive)
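
To make the four outcomes concrete, here is a minimal classification sketch (hypothetical class and method names; this helper is not part of the WBE itself). Given the expected result from a test case's metadata and whether the tool flagged that test case, it returns one of the four results above.

 // Hypothetical helper, not shipped with the WBE: classify one test case result.
 public final class ResultClassifier {
 
     public enum Outcome { TRUE_POSITIVE, FALSE_NEGATIVE, TRUE_NEGATIVE, FALSE_POSITIVE }
 
     // isRealVulnerability comes from the <vulnerability> element of the test case metadata;
     // toolReportedFinding is whether the tool under test reported a finding for this test case.
     public static Outcome classify(boolean isRealVulnerability, boolean toolReportedFinding) {
         if (isRealVulnerability) {
             // real vulnerability: reported = TRUE positive (1), missed = FALSE negative (2)
             return toolReportedFinding ? Outcome.TRUE_POSITIVE : Outcome.FALSE_NEGATIVE;
         }
         // false alarm test case: reported = FALSE positive (4), correctly ignored = TRUE negative (3)
         return toolReportedFinding ? Outcome.FALSE_POSITIVE : Outcome.TRUE_NEGATIVE;
     }
 }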

Scoring and Reporting Results

We encourage vendors, open source tool developers, and end users alike to verify their application security tools using the WBE, and we encourage everyone to contribute their results to the project. To ensure that the results are fair and useful, we ask that you follow a few simple rules when publishing them.

1. Provide an easily reproducible procedure (script preferred) to run the tool on the WBE, including:

a) A description of the default “out-of-the-box” installation, version numbers, etc…
b) All configuration, tailoring, onboarding, etc… performed to make the tool run
c) All changes to default security rules, tests, or checks to achieve the results
d) Easily reproducible steps for achieving the result 

2. Summary results should be reported in the following table format:

a) The Accuracy column is calculated as ( (FALSE pass + TRUE pass) / Grand Total )
b) The FALSE column is calculated as (FALSE pass / FALSE total), and the TRUE column as (TRUE pass / TRUE total)

3. The overall results for a tool should be calculated by:

a) Overall Accuracy: AVERAGE( Accuracy ) column
b) Overall False Alarm (FA) Rate: 1 - AVERAGE( FALSE ) column
c) Overall Missed Negative (MN) Rate: 1 - AVERAGE( TRUE ) column
d) Total clock time for the tests to complete
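
As a worked illustration of the arithmetic above (hypothetical class and field names; the WBE does not ship this code), the per-category columns and the overall numbers could be computed as follows. As the formulas imply, TRUE pass counts real vulnerabilities the tool reported, FALSE pass counts false alarms the tool correctly ignored, and Grand Total is assumed to be FALSE total + TRUE total.

 // Hypothetical scoring sketch, not part of the WBE.
 public final class CategoryScore {
     final int truePass;    // real vulnerabilities the tool reported (TRUE positives)
     final int trueTotal;   // total true-vulnerability test cases in the category
     final int falsePass;   // false alarms the tool correctly ignored (TRUE negatives)
     final int falseTotal;  // total false-alarm test cases in the category
 
     CategoryScore(int truePass, int trueTotal, int falsePass, int falseTotal) {
         this.truePass = truePass;
         this.trueTotal = trueTotal;
         this.falsePass = falsePass;
         this.falseTotal = falseTotal;
     }
 
     double accuracy()  { return (double) (falsePass + truePass) / (falseTotal + trueTotal); } // (FALSE pass + TRUE pass) / Grand Total
     double falseRate() { return (double) falsePass / falseTotal; }                            // FALSE pass / FALSE total
     double trueRate()  { return (double) truePass / trueTotal; }                              // TRUE pass / TRUE total
 
     // Overall numbers are averages of the per-category columns, per 3a-3c above.
     static double overallAccuracy(java.util.List<CategoryScore> rows) {
         return rows.stream().mapToDouble(CategoryScore::accuracy).average().orElse(0);
     }
     static double overallFalseAlarmRate(java.util.List<CategoryScore> rows) {
         return 1 - rows.stream().mapToDouble(CategoryScore::falseRate).average().orElse(0);
     }
     static double overallMissedRate(java.util.List<CategoryScore> rows) {
         return 1 - rows.stream().mapToDouble(CategoryScore::trueRate).average().orElse(0);
     }
 }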

Code Repo

The code for this project is hosted at the OWASP Git repository. Along with the code comes a Maven pom.xml file so you can download all the dependencies and build the entire project with ease using Maven.

Using the pom, it should be easy to verify that all the code compiles correctly. If you already have Git and Maven installed, all you have to do to download and build everything is:

 $ git clone https://github.com/OWASP/webgoat-benchmark
 $ cd webgoat-benchmark
 $ mvn compile

A future version will support packaging everything into a WAR file that you can then deploy to whatever JEE app server you want.

Licensing

The OWASP WebGoat Benchmark is free to use under the GNU General Public License v2.0.

Mailing List

OWASP WebGoat Benchmark Mailing List

Project Leaders

Dave Wichers

Related Projects

Quick Download

  • TBD

News and Events

  • [Apr 2015] Initial Release


Classifications

  • Incubator Project
  • Builders
  • Defenders
  • GNU General Public License v2.0
  • Project Type: Code

Test Cases

This initial release of the WBE has 20,983 test cases. The test case areas and quantities for the April 15, 2015 release are:

 Vulnerability Area            Number of Tests   CWE Number
 Command Injection             2902              78
 Weak Cryptography             668               327
 Weak Hashing                  689               328
 Header Injection              1069              113
 LDAP Injection                890               90
 Path Traversal                2974              22
 Secure Cookie Flag            450               614
 SQL Injection                 4105              89
 Trust Boundary Violation      838               501
 Weak Randomness               1923              330
 XPATH Injection               436               643
 XSS (Cross-Site Scripting)    4039              79
 Total Test Cases              20,983

Every test case is:

  • a servlet or JSP (currently they are all servlets, but we plan to add JSPs soon)
  • either a true vulnerability or a false positive for a single issue

Metadata for each test case, including the expected result, is contained in a matching XML file.

Test Coverage

For the test suite, we plan to determine whether the tool can:

  • Find HTTP request and response problems?
  • Handle scenarios like:
 - Simple and complex data flow?
 - Simple and complex control flow?
 - Popular frameworks?
 - Inversion of control?
 - Reflection? Class loading? Annotations?
 - Popular UI technologies (particularly JavaScript frameworks)

Future enhancements could cover:

  • All vulnerability types in the OWASP Top 10
  • Does the tool find flaws in libraries?
  • Does the tool find flaws spanning custom code and libraries?
  • Does the tool handle web services? REST, XML, GWT, etc…
  • Does the tool work with different app servers? Java platforms?

Example Test Case

Each test case is a simple Java EE servlet. BenchmarkTest00001 is an LDAP Injection test with the following metadata in the accompanying BenchmarkTest00001.xml file:

 <test-metadata>
   <category>ldapi</category>
   <test-number>00001</test-number>
   <vulnerability>true</vulnerability>
   <cwe>90</cwe>
 </test-metadata>
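
A scoring harness needs to read this metadata to know the expected result for each test. As a minimal sketch (hypothetical class, not part of the WBE; it uses only the standard JAXP DOM API), the category, expected result, and CWE number could be pulled out of such a file like this:

 // Hypothetical metadata reader, not part of the WBE itself.
 import java.io.File;
 
 import javax.xml.parsers.DocumentBuilderFactory;
 
 import org.w3c.dom.Document;
 
 public class TestMetadataReader {
     public static void main(String[] args) throws Exception {
         Document doc = DocumentBuilderFactory.newInstance()
                 .newDocumentBuilder()
                 .parse(new File("BenchmarkTest00001.xml"));
 
         String category = doc.getElementsByTagName("category").item(0).getTextContent();
         boolean expectedVulnerability = Boolean.parseBoolean(
                 doc.getElementsByTagName("vulnerability").item(0).getTextContent());
         String cwe = doc.getElementsByTagName("cwe").item(0).getTextContent();
 
         System.out.println(category + " / CWE-" + cwe + " / expected vulnerability: " + expectedVulnerability);
     }
 }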

BenchmarkTest00001.java simply reads in all the cookie values, looks for a cookie named "foo", and uses the value of this cookie when performing an LDAP query. Here's the code for BenchmarkTest00001.java:

 package org.owasp.webgoat.benchmark.testcode;
 
 import java.io.IOException;
 
 import javax.servlet.ServletException;
 import javax.servlet.annotation.WebServlet;
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;
 
 @WebServlet("/BenchmarkTest00001")
 public class BenchmarkTest00001 extends HttpServlet {
 	
 	private static final long serialVersionUID = 1L;
 	
 	@Override
 	public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
 		doPost(request, response);
 	}
 
 	@Override
 	public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
 		// some code
 
 		javax.servlet.http.Cookie[] cookies = request.getCookies();
 		
 		String param = null;
 		boolean foundit = false;
 		if (cookies != null) {
 			for (javax.servlet.http.Cookie cookie : cookies) {
 				if (cookie.getName().equals("foo")) {
 					param = cookie.getValue();
 					foundit = true;
 				}
 			}
 			if (!foundit) {
 				// no cookie found in collection
 				param = "";
 			}
 		} else {
 			// no cookies
 			param = "";
 		}
 		
 		try {
 			javax.naming.directory.DirContext dc = org.owasp.webgoat.benchmark.helpers.Utils.getDirContext();
 			Object[] filterArgs = {"a","b"};
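 			// the tainted cookie value "param" is used directly as the LDAP search filter: the intended LDAP Injection (CWE-90)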
 			dc.search("name", param, filterArgs, new javax.naming.directory.SearchControls());
 		} catch (javax.naming.NamingException e) {
 			throw new ServletException(e);
 		}
 	}
 }

2015 Roadmap

  • [June 2015] TBD
  • Analysis tool integration: so you can automatically run tools against the benchmark. We want to build test harnesses for tools like:
 - OWASP ZAP
 - FindBugs and plugins for it, like FindSecurityBugs
 - Commercial SAST, DAST, and IAST tools
  • FUTURE: Expand to include attack test cases to verify whether defenses (WAF, IDS/IPS, RASP) can identify and protect against them