= Main =
<div style="width:100%;height:100px;border:0,margin:0;overflow: hidden;">[[File:Incubator_big.jpg|link=OWASP_Project_Stages#tab=Incubator_Projects]]</div>
 
{| style="padding: 0;margin:0;margin-top:10px;text-align:left;" |-
 
| valign="top"  style="border-right: 1px dotted gray;padding-right:25px;" |
 
 
 
== OWASP Benchmark Project ==
 
 
 
The OWASP Benchmark for Security Automation (OWASP Benchmark) is a test suite designed to evaluate the speed, coverage, and accuracy of automated vulnerability detection tools and services (henceforth simply referred to as 'tools'). Without the ability to measure these tools, it is difficult to understand their value or interpret vendor claims. The OWASP Benchmark contains over 20,000 test cases that are fully runnable and exploitable.
 
 
 
You can use the OWASP Benchmark with Static Application Security Testing (SAST) tools. A future goal is to support the evaluation of Dynamic Application Security Testing (DAST) tools like OWASP [[ZAP]] and Interactive Application Security Testing (IAST) tools. The current version of the Benchmark is implemented in Java.  Future versions may expand to include other languages.
 
 
 
==Benchmark Project Philosophy==
 
 
 
Security tools (SAST, DAST, and IAST) are amazing when they find a complex vulnerability in your code.  But they can drive everyone crazy with complexity, false alarms, and missed vulnerabilities.  Using these tools without understanding their strengths and weaknesses can lead to a dangerous false sense of security.
 
 
 
We are on a quest to measure just how good these tools are at discovering and properly diagnosing security problems in applications. We rely on the [http://en.wikipedia.org/wiki/Receiver_operating_characteristic long history] of military and medical evaluation of detection technology as a foundation for our research. Therefore, the test suite tests both real and fake vulnerabilities.
 
 
 
There are four possible test outcomes in the Benchmark:
 
 
 
# Tool correctly identifies a real vulnerability (True Positive - TP)
 
# Tool fails to identify a real vulnerability (False Negative - FN)
 
# Tool correctly ignores a false alarm (True Negative - TN)
 
# Tool fails to ignore a false alarm (False Positive - FP)
 
 
 
We can learn a lot about a tool from these four metrics. A tool that simply flags every line of code as vulnerable will perfectly identify all vulnerabilities in an application, but will also have 100% false positives.  Similarly, a tool that reports nothing will have zero false positives, but will also identify zero real vulnerabilities.  Imagine a tool that flips a coin to decide whether to report each vulnerability for every test case. The result would be 50% true positives and 50% false positives.  We need a way to distinguish valuable security tools from these trivial ones.
 
 
 
The line that connects all these points, from (0,0) to (100,100), roughly translates to "random guessing." The ultimate measure of a security tool is how much better it can do than this line. The diagram below shows how we will evaluate security tools against the Benchmark.
 
 
 
[[File:Wbe guide.png]]
 
 
 
==Benchmark Validity==
 
 
 
The Benchmark tests are not exactly like real applications. The tests are derived from coding patterns observed in real applications, but many of them are considerably simpler than real applications. Other tests may have coding patterns that don't occur frequently in real code.  It's best to imagine the Benchmark as a continuum of tests from very simple all the way up to pretty difficult.
 
 
 
Remember, we are trying to test the capabilities of the tools and make them explicit, so that ''users'' can make informed decisions about what tools to use, how to use them, and what results to expect. This is exactly aligned with the OWASP mission to make application security visible.
 
 
 
==Benchmark Scoring and Reporting Results==
 
 
 
We encourage tool vendors, open source projects, and end users to evaluate their application security tools against the Benchmark. We encourage everyone to contribute their results to the project. To ensure that the results are fair and useful, we ask that you follow a few simple rules when publishing results. We won't recognize any results that aren't easily reproducible.
 
 
 
# Provide an easily reproducible procedure (script preferred) to run the tool on the Benchmark, including:
 
## A description of the default “out-of-the-box” installation, version numbers, etc…
 
## All configuration, tailoring, onboarding, etc… performed to make the tool run
 
## All changes to default security rules, tests, or checks used to achieve the results
 
## Easily reproducible steps to run the tool
 
## Scripts to generate results in the format below
 
# Results should be in the following table format and provide details for each category of tests, as well as overall statistics.
 
 
{| class="wikitable nowraplinks"
 
|-
 
! style="background:#DDDDDD" | Security Category
 
! TP
 
! FN
 
! TN
 
! FP
 
! style="background:#DDDDDD" | Total
 
! TPR
 
! FPR
 
! style="background:#DDDDDD" | Score
 
|-
 
! style="background:#DDDDDD" valign="top" width="12%"| General security category for test cases
 
| valign="top" width="12%"| '''True Positives''': Tests with real vulnerabilities that were correctly reported as vulnerable by the tool
 
| valign="top" width="12%"| '''False Negative''': Tests with real vulnerabilities that were not correctly reported as vulnerable by the tool
 
| valign="top" width="12%"| '''True Negative''': Tests with fake vulnerabilities that were correctly not reported as vulnerable by the tool
 
| valign="top" width="12%"| '''False Positive''': Tests with fake vulnerabilities that were incorrectly reported as vulnerable by the tool
 
| style="background:#DDDDDD" valign="top" width="12%"| Total test cases for this category
 
| valign="top" width="12%"| '''True Positive Rate''': TP / ( TP + FN )
 
| valign="top" width="12%"| '''False Positive Rate''': FP / ( FP + TN )
 
| style="background:#DDDDDD" valign="top" width="12%"| Normalized distance from the "guess line": TPR - FPR
 
|-
 
! style="background:#DDDDDD" | Command Injection
 
| ...
 
| ...
 
| ...
 
| ...
 
| style="background:#DDDDDD" | ...
 
| ...
 
| ...
 
| style="background:#DDDDDD" | ...
 
|-
 
! style="background:#DDDDDD" | Etc...
 
| ...
 
| ...
 
| ...
 
| ...
 
| style="background:#DDDDDD" | ...
 
| ...
 
| ...
 
| style="background:#DDDDDD" | ...
 
|-
 
! style="background:#DDDDDD" |
 
! Total TP
 
! Total FN
 
! Total TN
 
! Total FP
 
! style="background:#DDDDDD" | Total TC
 
! Average TPR
 
! Average FPR
 
! style="background:#DDDDDD" | Average Score
 
|}
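The rates and score defined in the table above are straightforward to compute. Here is a minimal sketch in Java; the class and method names are our own illustration, not part of the Benchmark codebase:

```java
// Hypothetical scoring helper illustrating the formulas in the table above.
class BenchmarkScore {

    // True Positive Rate: TP / (TP + FN)
    public static double truePositiveRate(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // False Positive Rate: FP / (FP + TN)
    public static double falsePositiveRate(int fp, int tn) {
        return fp + tn == 0 ? 0.0 : (double) fp / (fp + tn);
    }

    // Score: distance above the "guess line", i.e. TPR - FPR
    public static double score(int tp, int fn, int tn, int fp) {
        return truePositiveRate(tp, fn) - falsePositiveRate(fp, tn);
    }

    public static void main(String[] args) {
        // A tool that flags everything: TPR = 1.0 but FPR = 1.0, so score = 0.0
        System.out.println(BenchmarkScore.score(100, 0, 0, 100));
        // A tool that finds 75% of real issues with 25% false alarms: score = 0.5
        System.out.println(BenchmarkScore.score(75, 25, 75, 25));
    }
}
```

Note how the trivial "flag everything" tool from the philosophy section scores exactly zero, the same as random guessing.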
 
 
 
==Code Repo==
 
 
 
The code for this project is hosted at the [https://github.com/OWASP/webgoat-benchmark OWASP GitHub repository]. Along with the code comes a Maven pom.xml file, so you can download all the dependencies and build the entire project with ease.
 
 
 
Using the pom, it is easy to verify that all the code compiles correctly. If you already have Git and Maven installed, all you have to do to download and build everything is:
 
 
 
  $ git clone https://github.com/OWASP/webgoat-benchmark
 
  $ cd webgoat-benchmark
 
  $ mvn compile
 
 
 
==Licensing==
 
 
 
The OWASP Benchmark is free to use under the [http://choosealicense.com/licenses/gpl-2.0/ GNU General Public License v2.0].
 
 
 
== Mailing List ==
 
 
 
[https://lists.owasp.org/mailman/listinfo/owasp-benchmark-project OWASP Benchmark Mailing List]
 
 
 
== Project Leaders ==
 
 
 
[https://www.owasp.org/index.php/User:Wichers Dave Wichers] [mailto:[email protected] @]
 
 
 
== Project References ==
 
* [https://www.mir-swamp.org/#packages/public Software Assurance Marketplace (SWAMP) - set of curated packages to test tools against]
 
* [http://samate.nist.gov/Other_Test_Collections.html SAMATE List of Test Collections]
 
 
 
== Related Projects ==
 
 
 
* [http://samate.nist.gov/SARD/testsuite.php NSA's Juliet for Java]
 
* [https://code.google.com/p/wavsep/ WAVSEP]
 
 
 
| valign="top"  style="padding-left:25px;width:200px;" |
 
 
 
== Quick Download ==
 
 
 
All test code and project files can be downloaded from [https://github.com/OWASP/webgoat-benchmark OWASP GitHub].
 
 
 
== News and Events ==
 
 
 
* April 15, 2015 - Benchmark Version 1.0 Released
 
* May 23, 2015 - Benchmark Version 1.1 Released
 
 
 
==Classifications==
 
 
 
  {| width="200" cellpadding="2"
 
  |-
 
  | align="center" valign="top" width="50%" rowspan="2"| [[File:Owasp-incubator-trans-85.png|link=https://www.owasp.org/index.php/OWASP_Project_Stages#tab=Incubator_Projects]]
 
  | align="center" valign="top" width="50%"| [[File:Owasp-builders-small.png|link=]] 
 
  |-
 
  | align="center" valign="top" width="50%"| [[File:Owasp-defenders-small.png|link=]]
 
  |-
 
  | colspan="2" align="center"  | [http://choosealicense.com/licenses/gpl-2.0/ GNU General Public License v2.0]
 
  |-
 
  | colspan="2" align="center"  | [[File:Project_Type_Files_CODE.jpg|link=]]
 
  |}
 
 
 
|}
 
 
 
= Test Cases =
 
 
 
Version 1.0 of the Benchmark was published on April 15, 2015 and had 20,983 test cases. On May 23, 2015, version 1.1 of the Benchmark was released. The 1.1 release improves on the previous version by making sure that there are both true positives and false positives in every vulnerability area. The test case areas and quantities for the 1.1 release are:
 
 
 
{| class="wikitable nowraplinks"
 
|-
 
! Vulnerability Area
 
! Number of Tests
 
! CWE Number
 
|-
 
| [[Command Injection]]
 
| 2708
 
| [https://cwe.mitre.org/data/definitions/78.html 78]
 
|-
 
| Weak Cryptography
 
| 1440
 
| [https://cwe.mitre.org/data/definitions/327.html 327]
 
|-
 
| Weak Hashing
 
| 1421
 
| [https://cwe.mitre.org/data/definitions/328.html 328]
 
|-
 
| [[LDAP injection | LDAP Injection]]
 
| 736
 
| [https://cwe.mitre.org/data/definitions/90.html 90]
 
|-
 
| [[Path Traversal]]
 
| 2630
 
| [https://cwe.mitre.org/data/definitions/22.html 22]
 
|-
 
| Secure Cookie Flag
 
| 416
 
| [https://cwe.mitre.org/data/definitions/614.html 614]
 
|-
 
| [[SQL Injection]]
 
| 3529
 
| [https://cwe.mitre.org/data/definitions/89.html 89]
 
|-
 
| [[Trust Boundary Violation]]
 
| 725
 
| [https://cwe.mitre.org/data/definitions/501.html 501]
 
|-
 
| Weak Randomness
 
| 3640
 
| [https://cwe.mitre.org/data/definitions/330.html 330]
 
|-
 
| [[XPATH Injection]]
 
| 347
 
| [https://cwe.mitre.org/data/definitions/643.html 643]
 
|-
 
| [[XSS]] (Cross-Site Scripting)
 
| 3449
 
| [https://cwe.mitre.org/data/definitions/79.html 79]
 
|-
 
| Total Test Cases
 
| 21,041
 
|}
 
 
 
To download a spreadsheet that lists every test case, the vulnerability category, the CWE number, and the expected result (true finding/false positive), click [https://github.com/OWASP/webgoat-benchmark/blob/master/expectedresults-1.1.csv?raw=true here].
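A scorer can combine that spreadsheet with a tool's findings to produce the TP/FN/TN/FP counts described on the main tab. The sketch below assumes a comma-separated layout of test name, category, true/false, and CWE; verify this against the header row of the actual file before relying on it, since the exact column order is an assumption here:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Set;

// Sketch of scoring tool output against the expected-results spreadsheet.
// Assumed columns: test name, category, real vulnerability (true/false), CWE.
class ExpectedResults {
    // Returns { TP, FN, TN, FP } given the CSV path and the set of test names
    // the tool flagged as vulnerable.
    public static int[] score(String csvPath, Set<String> toolFindings) throws Exception {
        int tp = 0, fn = 0, tn = 0, fp = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(csvPath))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("#") || line.trim().isEmpty()) continue; // header/comments
                String[] cols = line.split(",");
                boolean realVuln = Boolean.parseBoolean(cols[2].trim());
                boolean flagged = toolFindings.contains(cols[0].trim());
                if (realVuln) { if (flagged) tp++; else fn++; }
                else          { if (flagged) fp++; else tn++; }
            }
        }
        return new int[] { tp, fn, tn, fp };
    }
}
```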
 
 
 
Every test case is:
 
* a servlet or JSP (currently they are all servlets, but we plan to add JSPs soon)
 
* either a true vulnerability or a false positive for a single issue
 
 
 
The Benchmark is intended to help determine how well analysis tools correctly analyze a broad array of application and framework behaviors, including:
 
 
 
* HTTP request and response problems
 
* Simple and complex data flow
 
* Simple and complex control flow
 
* Popular frameworks
 
* Inversion of control
 
* Reflection
 
* Class loading
 
* Annotations
 
* Popular UI technologies (particularly JavaScript frameworks)
 
 
 
Not all of these are covered by the Benchmark yet, but future enhancements are intended to provide more coverage of these areas.
 
 
 
Additional future enhancements could cover:
 
* All vulnerability types in the [[Top10 | OWASP Top 10]]
 
* Does the tool find flaws in libraries?
 
* Does the tool find flaws spanning custom code and libraries?
 
* Does the tool handle web services? REST, XML, GWT, etc…
 
* Does the tool work with different app servers? Java platforms?
 
 
 
== Example Test Case ==
 
 
 
Each test case is a simple Java EE servlet. BenchmarkTest00001 in version 1.0 of the Benchmark was an LDAP Injection test with the following metadata in the accompanying BenchmarkTest00001.xml file:
 
 
 
  <test-metadata>
 
    <category>ldapi</category>
 
    <test-number>00001</test-number>
 
    <vulnerability>true</vulnerability>
 
    <cwe>90</cwe>
 
  </test-metadata>
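A scorer needs this metadata to know the expected result for each test case. Here is a minimal sketch that reads it using the JDK's built-in DOM parser; the TestMetadata class is our own illustration, not part of the Benchmark codebase:

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical reader for a test case's metadata file.
class TestMetadata {
    public final String category;
    public final String testNumber;
    public final boolean vulnerability; // true = real vulnerability, false = false positive
    public final int cwe;

    private TestMetadata(String category, String testNumber, boolean vulnerability, int cwe) {
        this.category = category;
        this.testNumber = testNumber;
        this.vulnerability = vulnerability;
        this.cwe = cwe;
    }

    public static TestMetadata parse(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        return new TestMetadata(
            text(doc, "category"),
            text(doc, "test-number"),
            Boolean.parseBoolean(text(doc, "vulnerability")),
            Integer.parseInt(text(doc, "cwe")));
    }

    // Returns the text content of the first element with the given tag name.
    private static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```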
 
 
 
BenchmarkTest00001.java in the OWASP Benchmark 1.0 simply reads in all the cookie values, looks for a cookie named "foo", and uses the value of this cookie when performing an LDAP query. Here's the code for BenchmarkTest00001.java:
 
 
 
  package org.owasp.benchmark.testcode;
  
  import java.io.IOException;
  
  import javax.servlet.ServletException;
  import javax.servlet.annotation.WebServlet;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;
  
  @WebServlet("/BenchmarkTest00001")
  public class BenchmarkTest00001 extends HttpServlet {
  
      private static final long serialVersionUID = 1L;
  
      @Override
      public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
          doPost(request, response);
      }
  
      @Override
      public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
          // some code
  
          javax.servlet.http.Cookie[] cookies = request.getCookies();
  
          String param = null;
          boolean foundit = false;
          if (cookies != null) {
              for (javax.servlet.http.Cookie cookie : cookies) {
                  if (cookie.getName().equals("foo")) {
                      param = cookie.getValue();
                      foundit = true;
                  }
              }
              if (!foundit) {
                  // no cookie found in collection
                  param = "";
              }
          } else {
              // no cookies
              param = "";
          }
  
          try {
              javax.naming.directory.DirContext dc = org.owasp.benchmark.helpers.Utils.getDirContext();
              Object[] filterArgs = {"a", "b"};
              dc.search("name", param, filterArgs, new javax.naming.directory.SearchControls());
          } catch (javax.naming.NamingException e) {
              throw new ServletException(e);
          }
      }
  }
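Because every test case is a deployed servlet, it can be exercised over HTTP. The sketch below prepares (but does not send) a request carrying the "foo" cookie that this servlet feeds into its LDAP search filter; the base URL is an assumption, so adjust it to wherever you deploy the Benchmark:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical client for exercising BenchmarkTest00001.
class InvokeTestCase {
    // Builds a GET request to the test case with the "foo" cookie set.
    // The caller invokes conn.getResponseCode() to actually send it.
    public static HttpURLConnection prepare(String baseUrl, String cookieValue) throws Exception {
        URL url = new URL(baseUrl + "/BenchmarkTest00001");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Cookie", "foo=" + cookieValue);
        return conn;
    }
}
```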
 
 
 
= Tool Results =
 
 
 
As of this time, we don't have any vulnerability detection tool results to publish. We can generate results for PMD (which really has no security rules), FindBugs, and FindBugs with the FindSecurityBugs plugin, and we are working on more, particularly commercial static analysis tools. If you would like to contribute to this project by running a tool against the Benchmark and producing a set of results in the format described in the ''Benchmark Scoring and Reporting Results'' section on the main project tab, please contact the project lead.
 
 
 
Our vision for this project is to develop automated test harnesses for many vulnerability detection tools, so that we can repeatably run each tool against every version of the Benchmark and automatically produce results in our desired format.
 
 
 
We want to test as many tools as possible against the Benchmark. If you are:
 
 
 
* A tool vendor and want to participate in the project
 
* Someone who wants to help score a free tool against the project
 
* Someone who has a license to a commercial tool and the terms of the license allow you to publish tool results, and you want to participate
 
 
 
please let [mailto:[email protected] me] know!
 
 
 
= Acknowledgements =
 
 
 
The following people have contributed to this project and their contributions are much appreciated!
 
 
 
* Juan Gama - Development of initial release and continued support
 
* Ken Prole - Assistance with automated scorecard development using CodeDx
 
* Nick Sanidas - Development of initial release
 
 
 
We are looking for volunteers. Please contact [mailto:[email protected] Dave Wichers] if you are interested in contributing new test cases, tool results run against the benchmark, or anything else.
 
 
 
__NOTOC__ <headertabs />
 
 
 
[[Category:OWASP_Project]]
 
