= Main =  
 
  <div style="width:100%;height:100px;border:0,margin:0;overflow: hidden;">[[File:Lab_big.jpg|link=OWASP_Project_Stages#tab.3DLab_Projects]]</div>
 
{| style="padding: 0;margin:0;margin-top:10px;text-align:left;" |-
 
 
| valign="top"  style="border-right: 1px dotted gray;padding-right:25px;" |
 
 
The OWASP Benchmark for Security Automation (OWASP Benchmark) is a free and open test suite designed to evaluate the speed, coverage, and accuracy of automated software vulnerability detection tools and services (henceforth simply referred to as 'tools'). Without the ability to measure these tools, it is difficult to understand their strengths and weaknesses, and compare them to each other. Each version of the OWASP Benchmark contains thousands of test cases that are fully runnable and exploitable, each of which maps to the appropriate CWE number for that vulnerability.
 
You can use the OWASP Benchmark with [[Source_Code_Analysis_Tools | Static Application Security Testing (SAST)]] tools, [[:Category:Vulnerability_Scanning_Tools | Dynamic Application Security Testing (DAST)]] tools like OWASP [[ZAP]], and Interactive Application Security Testing (IAST) tools. Benchmark is implemented in Java. Future versions may expand to include other languages.
  
 
  
 
Anyone can use this Benchmark to evaluate vulnerability detection tools. The basic steps are:
 
# Download the Benchmark from GitHub
# Run your tools against the Benchmark
# Run the BenchmarkScore tool on the reports from your tools
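In practice this boils down to a handful of commands (a sketch; the script names are the ones used elsewhere on this page):

  git clone https://github.com/OWASP/Benchmark.git
  cd Benchmark
  (run your tool(s) and copy each report into the results/ folder)
  ./createScorecards.sh   (or createScorecards.bat on Windows)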
 
'''Free Static Application Security Testing (SAST) Tools:'''
 
* [https://pmd.github.io/ PMD] (which really has no security rules) - .xml results file
* [http://findbugs.sourceforge.net/ FindBugs] - .xml results file (Note: FindBugs hasn't been updated since 2015. Use SpotBugs instead (see below))
* [https://www.sonarqube.org/downloads/ SonarQube] - .xml results file
* [https://spotbugs.github.io/ SpotBugs] - .xml results file. This is the successor to FindBugs.
* SpotBugs with the [http://find-sec-bugs.github.io/ FindSecurityBugs plugin] - .xml results file
  
 
Note: We looked into supporting [http://checkstyle.sourceforge.net/ Checkstyle] but it has no security rules, just like PMD. The [http://fb-contrib.sourceforge.net/ fb-contrib] FindBugs plugin doesn't have any security rules either. We did test [http://errorprone.info/ Error Prone], and found that it does report some use of [http://errorprone.info/bugpattern/InsecureCipherMode insecure ciphers (CWE-327)], but that's it.
 
'''Commercial SAST Tools:'''
 
* [https://www.castsoftware.com/products/application-intelligence-platform CAST Application Intelligence Platform (AIP)] - .xml results file
* [https://www.checkmarx.com/products/static-application-security-testing/ Checkmarx CxSAST] - .xml results file
* [https://www.ibm.com/us-en/marketplace/ibm-appscan-source IBM AppScan Source (Standalone and Cloud)] - .ozasmt or .xml results file
* [https://juliasoft.com/solutions/julia-for-security/ Julia Analyzer] - .xml results file
* [https://www.kiuwan.com/code-security-sast/ Kiuwan Code Security] - .threadfix results file
* [https://software.microfocus.com/en-us/products/static-code-analysis-sast/overview Micro Focus (Formerly HPE) Fortify (On-Demand and stand-alone versions)] - .fpr results file
* [https://www.parasoft.com/products/jtest/ Parasoft Jtest] - .xml results file
* [https://semmle.com/lgtm Semmle LGTM] - .sarif results file
* [https://www.shiftleft.io/product/ ShiftLeft SAST] - .sl results file (Benchmark-specific format. Ask the vendor how to generate this)
* [https://snappycodeaudit.com/category/static-code-analysis Snappycode Audit's SnappyTick Source Edition (SAST)] - .xml results file
* [https://www.sourcemeter.com/features/ SourceMeter] - .txt results file of ALL results from VulnerabilityHunter
* [https://www.synopsys.com/content/dam/synopsys/sig-assets/datasheets/SAST-Coverity-datasheet.pdf Synopsys Static Analysis (Formerly Coverity Code Advisor) (On-Demand and stand-alone versions)] - .json results file (You can scan Benchmark w/Coverity for free. See: https://scan.coverity.com/)
* [https://www.defensecode.com/thunderscan.php Thunderscan SAST] - .xml results file
* [https://www.veracode.com/products/binary-static-analysis-sast Veracode SAST] - .xml results file
* [https://www.rigs-it.com/xanitizer/ XANITIZER] - .xml results file ([https://www.rigs-it.com/wp-content/uploads/2018/03/howtosetupxanitizerforowaspbenchmarkproject.pdf Their white paper on how to set up Xanitizer to scan Benchmark.]) (Free trial available)
  
We are looking for results for other commercial static analysis tools like [https://www.grammatech.com/products/codesonar Grammatech CodeSonar], [https://www.roguewave.com/products-services/klocwork RogueWave's Klocwork], etc. If you have a license for any static analysis tool not already listed above and can run it on the Benchmark, please send us the results file; that would be very helpful.
  
 
The free SAST tools come bundled with the Benchmark so you can run them yourselves. If you have a license for any commercial SAST tool, you can also run them against the Benchmark. Just put your results files in the /results folder of the project, and then run the BenchmarkScore script for your platform (.sh / .bat) and it will generate a scorecard in the /scorecard directory for all the tools you have results for that are currently supported.
 
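For example, if a tool produced a results file named Benchmark1.2-SomeTool.xml (a hypothetical name), scoring it would look like this (a sketch; createScorecards is the scorecard script name used in the Contrast walkthrough below):

  cp Benchmark1.2-SomeTool.xml results/
  ./createScorecards.sh   (or createScorecards.bat on Windows)
  (scorecards for every supported tool with a file in results/ appear in the scorecard/ directory)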
 
'''Free Dynamic Application Security Testing (DAST) Tools:'''
* [http://www.arachni-scanner.com/ Arachni] - .xml results file
 
** To generate .xml, run: ./bin/arachni_reporter "Your_AFR_Results_Filename.afr" --reporter=xml:outfile=Benchmark1.2-Arachni.xml
* [https://www.owasp.org/index.php/ZAP OWASP ZAP] - .xml results file. To generate a complete ZAP XML results file so you can generate a valid scorecard, make sure you:
** Tools > Options > Alerts - Set the Max alert instances to a large value (e.g., 500).
** Then: Report > Generate XML Report...
  
 
'''Commercial DAST Tools:'''
 
* [https://www.acunetix.com/vulnerability-scanner/ Acunetix Web Vulnerability Scanner (WVS)] - .xml results file (Generated using the [https://www.acunetix.com/resources/wvs7manual.pdf command line interface (see Chapter 10)] /ExportXML switch)
* [https://portswigger.net/burp Burp Pro] - .xml results file
* [https://www.ibm.com/us-en/marketplace/appscan-standard IBM AppScan] - .xml results file
* [https://software.microfocus.com/en-us/products/webinspect-dynamic-analysis-dast/overview Micro Focus (Formerly HPE) WebInspect] - .xml results file
* [https://www.netsparker.com/web-vulnerability-scanner/ Netsparker] - .xml results file
* [https://www.qualys.com/apps/web-app-scanning/ Qualys Web App Scanner] - .xml results file
* [https://www.rapid7.com/products/appspider/ Rapid7 AppSpider] - .xml results file
 
  
 
If you have access to other DAST Tools, PLEASE RUN THEM FOR US against the Benchmark, and send us the results file so we can build a scorecard generator for that tool.
 
 
'''Commercial Interactive Application Security Testing (IAST) Tools:'''
 
* [https://www.contrastsecurity.com/interactive-application-security-testing-iast Contrast Assess] - .zip results file (You can scan Benchmark w/Contrast for free. See: https://www.contrastsecurity.com/contrast-community-edition)
* [https://hdivsecurity.com/interactive-application-security-testing-iast Hdiv Detection (IAST)] - .hlg results file
* [https://www.synopsys.com/software-integrity/security-testing/interactive-application-security-testing.html Seeker IAST] - .csv results file
  
 
'''Commercial Hybrid Analysis Application Security Testing Tools:'''
 
  
 
  GIT: http://git-scm.com/ or https://github.com/
  Maven: https://maven.apache.org/  (Version: 3.2.3 or newer works.)
  Java: http://www.oracle.com/technetwork/java/javase/downloads/index.html (Java 7 or 8) (64-bit)
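A quick way to confirm the prerequisites are in place (a usage sketch):

  git --version
  mvn -version    (should report Maven 3.2.3 or newer)
  java -version   (should report a 64-bit Java 7 or 8 runtime)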
  
 
==Getting, Building, and Running the Benchmark==
 
 
We have several preconstructed VMs or instructions on how to build one that you can use instead:
 
* Docker: A Dockerfile is checked into the project [https://github.com/OWASP/Benchmark/blob/master/VMs/Dockerfile here]. This Dockerfile should automatically produce a Docker VM with the latest Benchmark project files. After you have Docker installed, cd into the project's VMs directory, then run:
  ./buildDockerImage.sh --> This builds the Docker Benchmark VM (This will take a WHILE)
  docker images --> You should see the new benchmark:latest image in the list provided
  # The Benchmark Docker Image only has to be created once.

To run the Benchmark in your Docker VM, just run:
  ./runDockerImage.sh  --> This pulls in any updates to Benchmark since the image was built, builds everything, and starts a remotely accessible Benchmark web app.
 
  If successful, you should see this at the end:
   [INFO] [talledLocalContainer] Tomcat 8.x started on port [8443]
   [INFO] Press Ctrl-C to stop the container...
  Then simply navigate to: https://localhost:8443/benchmark from the machine you are running Docker on.

Or if you want to access it from a different machine:
  docker-machine ls (in a different terminal) --> To get the IP the Docker VM is exporting (e.g., tcp://192.168.99.100:2376)
  Navigate to: https://192.168.99.100:8443/benchmark in your browser (using the above IP as an example)
 
* Amazon Web Services (AWS) - Here's how you set up the Benchmark on an AWS VM:
 
 
  sudo yum install git
  sudo yum install maven
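From there, the remaining steps mirror the general clone-and-compile flow described elsewhere on this page (a sketch):

  git clone https://github.com/OWASP/Benchmark.git && cd Benchmark && mvn compile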
 
[http://h3xstream.github.io/find-sec-bugs/ FindSecurityBugs] is a great plugin for FindBugs that significantly increases FindBugs' ability to find security issues. We include this free tool in the Benchmark and it's all dialed in. Simply run the script: ./script/runFindSecBugs.(sh or bat). If you want to run a different version of FindSecBugs, just change the version number of the findsecbugs-plugin artifact in the Benchmark pom.xml file.
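For reference, the artifact entry in pom.xml looks something like this (a sketch; the exact version and surrounding plugin configuration in your checkout may differ):

  <dependency>
    <groupId>com.h3xstream.findsecbugs</groupId>
    <artifactId>findsecbugs-plugin</artifactId>
    <version>1.8.0</version>  <!-- change this to run a different FindSecBugs release -->
  </dependency>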
  
=== Kiuwan Code Security ===

Kiuwan Code Security includes a predefined model for executing the OWASP Benchmark. Refer to the [https://www.kiuwan.com/blog/owasp-benchmark-diy/ step-by-step instructions] on the Kiuwan website.

=== Micro Focus (Formerly HP) Fortify ===
  
 
If you are using the Audit Workbench, you can give it more memory and make sure you invoke it in 64-bit mode by doing this:
 
 
   Translate Phase:
   export JAVA_HOME=$(/usr/libexec/java_home)
   export PATH=$PATH:/Applications/HP_Fortify/HP_Fortify_SCA_and_Apps_17.10/bin
   export SCA_VM_OPTS="-Xmx2G -version 1.7"
   mvn sca:clean
  
 
To run ZAP against Benchmark:
 
# Because Benchmark uses Cookies and Headers as sources of attack for many test cases: Tools --> Options --> Active Scan Input Vectors: Then check the HTTP Headers, All Requests, and Cookie Data checkboxes and hit OK
 
# Click on the Show All Tabs button (if the Spider tab isn't visible)
# Go to the Spider tab (the black spider) and click on the New Scan button
# Enter: https://localhost:8443/benchmark/ into the 'Starting Point' box and hit 'Start Scan'
#* Do this again. For some reason it takes 2 passes with the Spider before it stops finding more Benchmark endpoints.
# When the Spider completes, click on the 'benchmark' folder in the Site Map, right click and select: 'Attack --> Active Scan'
#* It will take several hours (3+) to complete (it's actually likely to simply freeze before completing the scan - see the NOTE below)
  
 
For a faster active scan you can:
* Disable the ZAP DB log (in ZAP 2.5.0+):
** Disable it via Options / Database / Recover Log
** Set it on the command line using "-config database.recoverylog=false" (see the example after this list)
** In ZAP prior to 2.5.0, you need to edit the file zapdb.script (in the "db" directory) and change the line:
*** SET FILES LOG TRUE    to:  SET FILES LOG FALSE
* Disable unnecessary plugins / Technologies when you launch the Active Scan:
** On the Policy tab, disable all plugins except: XSS (Reflected), Path Traversal, SQLi, OS Command Injection
** Go to the Technology tab, disable everything and only enable: MySQL, YOUR_OS, Tomcat
** Note: This 2nd performance improvement step is a bit like cheating, as you wouldn't do this for a normal site scan. You'd want to leave all this on in case these other plugins/technologies are helpful in finding more issues. So a fair performance comparison of ZAP to other tools would leave all this on.
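For example, applying that database setting at launch time (a usage sketch; zap.sh is ZAP's launcher on Linux/macOS, zap.bat on Windows):

  ./zap.sh -config database.recoverylog=false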
  
 
To generate the ZAP XML results file so you can generate its scorecard:
 
 
Interactive Application Security Testing (IAST) tools work differently than scanners.  IAST tools monitor an application as it runs to identify application vulnerabilities using context from inside the running application. Typically these tools run continuously, immediately notifying users of vulnerabilities, but you can also get a full report of an entire application. To do this, we simply run the Benchmark application with an IAST agent and use a crawler to hit all the pages.
 
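In general, attaching a Java IAST agent is a one-flag change to the application's launch command (a generic sketch, not Benchmark's exact launch script):

  java -javaagent:/path/to/agent.jar -jar yourapp.jar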
  
=== Contrast Assess ===
 
 
To use Contrast Assess, we simply add the Java agent to the Benchmark environment and run the BenchmarkCrawler. The entire process should only take a few minutes. We provide a few scripts, which simply add the -javaagent:contrast.jar flag to the Benchmark launch configuration. We have tested on macOS, Ubuntu, and Windows. Be sure your VM has at least 4GB of memory.
  
 
* Ensure your environment has Java, Maven, and git installed, then build the Benchmark project
 
 
   '''$ mvn compile'''
 
  
* Download a licensed copy of the Contrast Assess Java Agent (contrast.jar) from your Contrast TeamServer account and put it in the /Benchmark/tools/Contrast directory.
 
   '''$ cp ~/Downloads/contrast.jar tools/Contrast'''
 
  
 
* In Terminal 1, launch the Benchmark application and wait until it starts
 
   '''$ cd tools/Contrast'''
   '''$ ./runBenchmark_wContrast.sh''' (.bat on Windows)
 
   '''[INFO] Scanning for projects...
 
 
   '''[INFO]                                                                         
 
 
   '''Copying Contrast report to results directory'''
 
  
* In Terminal 2, generate scorecards in /Benchmark/scorecard
 
   '''$ ./createScorecards.sh''' (.bat on Windows)
 
 
   '''Analyzing results from Benchmark_1.2-Contrast.log
 
 
* Open the Benchmark Scorecard in your browser
 
 
   '''/Users/owasp/Projects/Benchmark/scorecard/Benchmark_v1.2_Scorecard_for_Contrast.html'''
 

=== Hdiv Detection ===
 +
Hdiv has written their own instructions on how to run the detection component of their product on the Benchmark here: https://hdivsecurity.com/docs/features/benchmark/#how-to-run-hdiv-in-owasp-benchmark-project. You'll see that these instructions involve using the same crawler used to exercise all the test cases in the Benchmark, just like Contrast above.
  
 
= RoadMap =
 
 
While we don't have hard and fast rules of exactly what we are going to do next, enhancements in the following areas are planned for the next release:
 
* Add new vulnerability categories (e.g., XXE, Hibernate Injection)
 
* Add support for popular server side Java frameworks (e.g., Spring)
 
 
* Add web services test cases
 

= OWASP Benchmark Project =

==Benchmark Project Scoring Philosophy==

Security tools (SAST, DAST, and IAST) are amazing when they find a complex vulnerability in your code. But with widespread misunderstanding of the specific vulnerabilities automated tools cover, end users are often left with a false sense of security.

We are on a quest to measure just how good these tools are at discovering and properly diagnosing security problems in applications. We rely on the long history of military and medical evaluation of detection technology as a foundation for our research. Therefore, the test suite tests both real and fake vulnerabilities.

There are four possible test outcomes in the Benchmark:

  1. Tool correctly identifies a real vulnerability (True Positive - TP)
  2. Tool fails to identify a real vulnerability (False Negative - FN)
  3. Tool correctly ignores a false alarm (True Negative - TN)
  4. Tool fails to ignore a false alarm (False Positive - FP)

We can learn a lot about a tool from these four metrics. Consider a tool that simply flags every line of code as vulnerable. This tool will perfectly identify all vulnerabilities! But it will also have 100% false positives and thus adds no value. Similarly, consider a tool that reports absolutely nothing. This tool will have zero false positives, but will also identify zero real vulnerabilities and is also worthless. You can even imagine a tool that flips a coin to decide whether to report each test case as vulnerable. The result would be 50% true positives and 50% false positives. We need a way to distinguish valuable security tools from these trivial ones.

The line that connects all these points, from (0,0) to (100,100), roughly translates to "random guessing." The ultimate measure of a security tool is how much better it can do than this line. The diagram below shows how we will evaluate security tools against the Benchmark.

Wbe guide.png

A point plotted on this chart provides a visual indication of how well a tool did considering both the True Positives the tool reported, as well as the False Positives it reported. We also want to compute an individual score for that point in the range 0 - 100, which we call the Benchmark Accuracy Score.

The Benchmark Accuracy Score is essentially a Youden Index, which is a standard way of summarizing the accuracy of a set of tests. Youden's index is one of the oldest measures of diagnostic accuracy. It is a global measure of test performance, used to evaluate the overall discriminative power of a diagnostic procedure and to compare it with other tests. Youden's index is calculated by deducting 1 from the sum of a test's sensitivity and specificity, expressed not as percentages but as fractions: (sensitivity + specificity) – 1. For a test with poor diagnostic accuracy, Youden's index equals 0; for a perfect test, Youden's index equals 1.

 So for example, if a tool has a True Positive Rate (TPR) of .98 (i.e., 98%) 
   and False Positive Rate (FPR) of .05 (i.e., 5%)
 Sensitivity = TPR (.98)
 Specificity = 1-FPR (.95)
 So the Youden Index is (.98+.95) - 1 = .93
 
 And this would equate to a Benchmark score of 93 (since we normalize this to the range 0 - 100)
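The same calculation in code (a minimal sketch of the formula above, not Benchmark's actual scoring implementation):

  // Computes the Benchmark Accuracy Score from a tool's true and false positive rates.
  public class ScoreSketch {
      // tpr and fpr are fractions in [0,1]; the result is in [-100,100].
      static double benchmarkScore(double tpr, double fpr) {
          double sensitivity = tpr;                       // sensitivity = TPR
          double specificity = 1 - fpr;                   // specificity = 1 - FPR
          double youden = sensitivity + specificity - 1;  // Youden's index
          return youden * 100;                            // normalized to -100..100
      }

      public static void main(String[] args) {
          // The example above: TPR = .98 and FPR = .05 yield a score of (about) 93.
          System.out.println(benchmarkScore(0.98, 0.05));
      }
  }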

On the graph, the Benchmark Score is the length of the line from the point down to the diagonal "guessing" line. Note that a Benchmark score can actually be negative if the point is below the line. This happens when the False Positive Rate is higher than the True Positive Rate.

==Benchmark Validity==

The Benchmark tests are not exactly like real applications. The tests are derived from coding patterns observed in real applications, but the majority of them are considerably simpler than real applications. That is, most real world applications will be considerably harder to successfully analyze than the OWASP Benchmark Test Suite. Although the tests are based on real code, it is possible that some tests may have coding patterns that don't occur frequently in real code.

Remember, we are trying to test the capabilities of the tools and make them explicit, so that users can make informed decisions about what tools to use, how to use them, and what results to expect. This is exactly aligned with the OWASP mission to make application security visible.

==Generating Benchmark Scores==

Anyone can use this Benchmark to evaluate vulnerability detection tools. The basic steps are:

  1. Download the Benchmark from GitHub
  2. Run your tools against the Benchmark
  3. Run the BenchmarkScore tool on the reports from your tools

That's it!

Full details on how to do this are at the bottom of the page on the Quick_Start tab.

We encourage vendors, open source tool maintainers, and end users to verify their application security tools against the Benchmark. In order to ensure that the results are fair and useful, we ask that you follow a few simple rules when publishing results. We won't recognize any results that aren't easily reproducible, which means including:

  1. A description of the default “out-of-the-box” installation, version numbers, etc…
  2. Any and all configuration, tailoring, onboarding, etc… performed to make the tool run
  3. Any and all changes to default security rules, tests, or checks used to achieve the results
  4. Easily reproducible steps to run the tool

==Reporting Format==

The Benchmark includes tools to interpret raw tool output, compare it to the expected results, and generate summary charts and graphs. We use the following table format in order to capture all the information generated during the evaluation.

==Code Repo and Build/Run Instructions==

See the Getting Started and Getting, Building, and Running the Benchmark sections on the Quick Start tab.

==Licensing==

The OWASP Benchmark is free to use under the GNU General Public License v2.0.

==Mailing List==

OWASP Benchmark Mailing List

==Project Leaders==

Dave Wichers @

==Project References==

==Related Projects==

==Quick Download==

All test code and project files can be downloaded from OWASP GitHub.

==Project Intro Video==

BenchmarkPodcastTitlePage.jpg

==News and Events==

  • LOOKING FOR VOLUNTEERS!! - We are looking for individuals and organizations to join and make this a much more community-driven project, including additional co-leaders to help take this project to the next level. Contributors could work on things like new test cases, additional tool scorecard generators, adding support for languages beyond Java, and a host of other improvements. Please contact me if you are interested in contributing at any level.
  • June 5, 2016 - Benchmark Version 1.2 Released
  • Sep 24, 2015 - Benchmark introduced to broader OWASP community at AppSec USA
  • Aug 27, 2015 - U.S. Dept. of Homeland Security (DHS) is financially supporting the Benchmark project.
  • Aug 15, 2015 - Benchmark Version 1.2beta Released with full DAST Support. Checkmarx and ZAP scorecard generators also released.
  • July 10, 2015 - Benchmark Scorecard generator and open source scorecards released
  • May 23, 2015 - Benchmark Version 1.1 Released
  • April 15, 2015 - Benchmark Version 1.0 Released
