
OWASP Benchmark Project


The OWASP Benchmark for Security Automation (OWASP Benchmark) is a test suite designed to evaluate the speed, coverage, and accuracy of automated vulnerability detection tools and services (henceforth simply referred to as 'tools'). Without the ability to measure these tools, it is difficult to understand their value or interpret vendor claims. The OWASP Benchmark contains over 20,000 test cases that are fully runnable and exploitable.

You can use the OWASP Benchmark with Static Application Security Testing (SAST) tools. A future goal is to support the evaluation of Dynamic Application Security Testing (DAST) tools like OWASP ZAP and Interactive Application Security Testing (IAST) tools. The current version of the Benchmark is implemented in Java. Future versions may expand to include other languages.

Benchmark Project Philosophy

Security tools (SAST, DAST, and IAST) are amazing when they find a complex vulnerability in your code. But they can drive everyone crazy with complexity, false alarms, and missed vulnerabilities. Using these tools without understanding their strengths and weaknesses can lead to a dangerous false sense of security.

We are on a quest to measure just how good these tools are at discovering and properly diagnosing security problems in applications. We rely on the long history of military and medical evaluation of detection technology as a foundation for our research. Therefore, the test suite tests both real and fake vulnerabilities.

There are four possible test outcomes in the Benchmark:

  1. Tool correctly identifies a real vulnerability (True Positive - TP)
  2. Tool fails to identify a real vulnerability (False Negative - FN)
  3. Tool correctly ignores a false alarm (True Negative - TN)
  4. Tool fails to ignore a false alarm (False Positive - FP)

We can learn a lot about a tool from these four metrics. A tool that simply flags every line of code as vulnerable will perfectly identify all vulnerabilities in an application, but will also have 100% false positives. Similarly, a tool that reports nothing will have zero false positives, but will also identify zero real vulnerabilities. Imagine a tool that flips a coin to decide whether to report each vulnerability for every test case. The result would be 50% true positives and 50% false positives. We need a way to distinguish valuable security tools from these trivial ones.
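
As an illustration of how the four counts above turn into rates, the short Java sketch below computes the true positive and false positive rates. The class and method names are made up for this page, and the formulas are the standard detection rates, not the Benchmark's own scoring code.

 // Illustrative sketch only: basic detection rates from the four outcome counts.
 public class RateSketch {

     // True positive rate: percentage of real vulnerabilities the tool reported.
     static double truePositiveRate(int tp, int fn) {
         return 100.0 * tp / (tp + fn);
     }

     // False positive rate: percentage of false alarms the tool reported anyway.
     static double falsePositiveRate(int fp, int tn) {
         return 100.0 * fp / (fp + tn);
     }

     public static void main(String[] args) {
         // A coin-flip tool finds half the real vulnerabilities and reports half
         // the false alarms, landing on the "random guessing" line described below.
         System.out.printf("TPR=%.1f%% FPR=%.1f%%%n",
                 truePositiveRate(50, 50), falsePositiveRate(50, 50));
     }
 }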

If you plot the trivial tools just described as points, with true positive rate against false positive rate, the line connecting them from (0, 0) to (100, 100) roughly translates to "random guessing." The ultimate measure of a security tool is how much better it can do than this line. The diagram below shows how we will evaluate security tools against the Benchmark.

(Figure: guide to interpreting Benchmark results, plotting true positive rate against false positive rate relative to the random-guessing line.)

Benchmark Validity

The Benchmark tests are not exactly like real applications. The tests are derived from coding patterns observed in real applications, but many of them are considerably simpler than real applications. Other tests may have coding patterns that don't occur frequently in real code. It's best to imagine the Benchmark as a continuum of tests from very simple all the way up to pretty difficult.
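
To make "very simple" versus "pretty difficult" concrete, the hypothetical snippets below sketch two coding patterns of the kind the Benchmark draws on. They are illustrations written for this page, not actual Benchmark test cases, and they assume the servlet and JDBC APIs that a Java web application would normally have on its classpath.

 import java.sql.SQLException;
 import java.sql.Statement;
 import java.util.HashMap;
 import java.util.Map;
 import javax.servlet.http.HttpServletRequest;

 // Hypothetical examples of a simple and a harder SQL injection pattern.
 public class PatternSketch {

     // Simple pattern: a request parameter flows directly into a SQL query.
     static void simpleCase(HttpServletRequest request, Statement stmt) throws SQLException {
         String id = request.getParameter("id");
         stmt.executeQuery("SELECT * FROM users WHERE id = '" + id + "'");
     }

     // Harder pattern: the same tainted value passes through a map and a helper
     // method before reaching the query, so a tool must follow the data flow
     // rather than match a single line of code.
     static void harderCase(HttpServletRequest request, Statement stmt) throws SQLException {
         Map<String, String> params = new HashMap<>();
         params.put("key", request.getParameter("id"));
         String value = lookup(params, "key");
         stmt.executeQuery("SELECT * FROM users WHERE id = '" + value + "'");
     }

     static String lookup(Map<String, String> params, String key) {
         return params.get(key);
     }
 }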

Remember, we are trying to test the capabilities of the tools and make them explicit, so that *users* can make informed decisions about what tools to use, how to use them, and what results to expect. This is exactly aligned with the OWASP mission to make application security visible.

Benchmark Scoring and Reporting Results

We encourage vendors, open source tool developers, and end users to verify their application security tools against the Benchmark, and we encourage everyone to contribute their results to the project. To ensure that the results are fair and useful, we ask that you follow a few simple rules when publishing them. We won't recognize any results that aren't easily reproducible.

  1. Provide an easily reproducible procedure (script preferred) to run the tool on the Benchmark, including:
    1. A description of the default “out-of-the-box” installation, version numbers, etc…
    2. All configuration, tailoring, onboarding, etc… performed to make the tool run
    3. All changes to default security rules, tests, or checks used to achieve the results
    4. Easily reproducible steps to run the tool
    5. Scripts to generate results in the format below
  2. Results should be in the following table format and provide details for each category of tests, as well as overall statistics.
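
As a rough idea of what such a script might print, the sketch below tallies the four outcomes per test category and overall. The category names, counts, and column layout here are placeholders invented for this example; they are not the project's official reporting format.

 import java.util.LinkedHashMap;
 import java.util.Map;

 // Placeholder sketch of a per-category results tally; not the official format.
 public class ResultsTableSketch {

     static class Counts {
         final int tp, fn, tn, fp;
         Counts(int tp, int fn, int tn, int fp) {
             this.tp = tp; this.fn = fn; this.tn = tn; this.fp = fp;
         }
     }

     public static void main(String[] args) {
         // Placeholder numbers purely for illustration.
         Map<String, Counts> byCategory = new LinkedHashMap<>();
         byCategory.put("SQL Injection", new Counts(40, 10, 35, 15));
         byCategory.put("XSS", new Counts(30, 20, 45, 5));

         System.out.printf("%-15s %5s %5s %5s %5s%n", "Category", "TP", "FN", "TN", "FP");
         int tp = 0, fn = 0, tn = 0, fp = 0;
         for (Map.Entry<String, Counts> e : byCategory.entrySet()) {
             Counts c = e.getValue();
             System.out.printf("%-15s %5d %5d %5d %5d%n", e.getKey(), c.tp, c.fn, c.tn, c.fp);
             tp += c.tp; fn += c.fn; tn += c.tn; fp += c.fp;
         }
         System.out.printf("%-15s %5d %5d %5d %5d%n", "Overall", tp, fn, tn, fp);
     }
 }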

Code Repo

The code for this project is hosted in the OWASP repository on GitHub. Along with the code comes a Maven pom.xml file, so you can download all the dependencies and build the entire project with Maven.

Using the pom, it should be easy to verify that all the code compiles correctly. If you already have git and maven installed, downloading and building everything is as simple as:

 $ git clone https://github.com/OWASP/benchmark
 $ cd benchmark
 $ mvn compile

Licensing

The OWASP Benchmark is free to use under the GNU General Public License v2.0.

Mailing List

OWASP Benchmark Mailing List

Project Leaders

Dave Wichers


Quick Download

All test code and project files can be downloaded from OWASP GitHub.

News and Events

  • April 15, 2015 - Benchmark Version 1.0 Released
  • May 23, 2015 - Benchmark Version 1.1 Released

Classifications

  • Incubator Project
  • Builders / Defenders
  • GNU General Public License v2.0
  • Project Type: Code