Input Validation Cheat Sheet

 
<div style="width:100%;height:160px;border:0,margin:0;overflow: hidden;">[[File:Cheatsheets-header.jpg|link=]]</div>
 
{| style="padding: 0;margin:0;margin-top:10px;text-align:left;" |-
The Cheat Sheet Series project has been moved to [https://github.com/OWASP/CheatSheetSeries GitHub]!
| valign="top"  style="border-right: 1px dotted gray;padding-right:25px;" |
 
Last revision (mm/dd/yy): '''{{REVISIONMONTH}}/{{REVISIONDAY}}/{{REVISIONYEAR}}'''
 
= Introduction  =
 
__TOC__{{TOC hidden}}
 
  
This article provides clear, simple, actionable guidance for building input validation security functionality into your applications.
Please visit [https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html Input Validation Cheat Sheet] to see the latest version of the cheat sheet.
 
 
== Goal of Input Validation ==
 
Input validation is performed to keep malformed data from entering the system. Input validation is NOT the primary method of preventing XSS or SQL injection. Those defenses are covered in the output encoding sections and in the related cheat sheets.
 
 
 
== Goal of Output Encoding ==
 
Output encoding ensures that data is encoded for its destination context before being displayed to the user, so that it is rendered as data rather than interpreted as markup or code.
 
 
 
== White List Input Validation ==
 
 
 
It is always recommended to prevent attacks as early as possible in the processing of the user’s (attacker's) request. Input validation can be used to detect unauthorized input before it is processed by the application. Developers frequently perform black list validation in order to try to detect attack characters and patterns like the ' character, the string 1=1, or the &lt;script&gt; tag, but this is a massively flawed approach as it is typically trivial for an attacker to avoid getting caught by such filters. Plus, such filters frequently prevent authorized input, like O'Brian, when the ' character is being filtered out.
 
 
 
White list validation is appropriate for all input fields provided by the user. White list validation involves defining exactly what IS authorized; by definition, everything else is not authorized. If the input is well-structured data, like dates, social security numbers, zip codes, or e-mail addresses, then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input. If the input field comes from a fixed set of options, like a drop-down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place. The most difficult fields to validate are so-called 'free text' fields, like blog entries. However, even those types of fields can be validated to some degree: you can at least exclude all non-printable characters and define a maximum size for the input field.
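
The two simplest cases above, a fixed set of options and a free-text field, can be sketched in Java as follows. This is a minimal illustration, not part of any OWASP library: the class name, the option values, and the 2000-character limit are assumptions made for the example, and "non-printable" is approximated here as ISO control characters other than tab, line feed, and carriage return.

```java
import java.util.Set;

// Hypothetical helper; names and limits are assumptions for illustration.
final class WhiteListChecks {
    // The exact values offered to the user in a (hypothetical) drop-down list.
    private static final Set<String> ALLOWED_COLORS = Set.of("red", "green", "blue");

    // Assumed maximum size for a free-text field; pick one that fits your data.
    private static final int MAX_FREE_TEXT_LENGTH = 2000;

    // Accept only a value that was offered in the first place.
    static boolean isAllowedOption(String value) {
        return value != null && ALLOWED_COLORS.contains(value);
    }

    // Free text: enforce a maximum size and reject control characters,
    // while still allowing tab, line feed and carriage return.
    static boolean isAcceptableFreeText(String value) {
        if (value == null || value.length() > MAX_FREE_TEXT_LENGTH) {
            return false;
        }
        return value.codePoints().allMatch(cp ->
                cp == '\t' || cp == '\n' || cp == '\r' || !Character.isISOControl(cp));
    }
}
```

A request handler would call these checks on the server and reject the request when either returns false.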
 
 
 
Developing regular expressions can be complicated, and is well beyond the scope of this cheat sheet. There are lots of resources on the internet about how to write regular expressions, including: [http://www.regular-expressions.info/ http://www.regular-expressions.info/] and the [[OWASP Validation Regex Repository]]. The following provides a few examples of ‘white list’ style regular expressions:
 
 
 
== White List Regular Expression Examples ==
 
 
 
Validating a Zip Code (5 digits plus optional -4)
 
^\d{5}(-\d{4})?$
 
 
 
Validating U.S. State Selection From a Drop-Down Menu
 
^(AA|AE|AP|AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)$
 
 
 
 
 
'''Java Regex Usage Example'''
 
 
 
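  Example validating the parameter "zip" using a regular expression.
 
  // Needs java.util.regex.Pattern, java.io.IOException and the servlet API.
  // Backslashes must be doubled inside a Java string literal.
  private static final Pattern ZIP_PATTERN = Pattern.compile("^\\d{5}(-\\d{4})?$");
  public void doPost(HttpServletRequest request, HttpServletResponse response) throws IOException {
    try {
      String zipCode = request.getParameter("zip");
      // getParameter returns null when the parameter is missing
      if (zipCode == null || !ZIP_PATTERN.matcher(zipCode).matches()) {
        throw new YourValidationException("Improper zipcode format.");
      }
      // .. do what you want here, after it has been validated ..
    } catch (YourValidationException e) {
      // SC_BAD_REQUEST is a static constant, so reference it through the type
      response.sendError(HttpServletResponse.SC_BAD_REQUEST, e.getMessage());
    }
  }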
 
 
 
Some white list validators have also been predefined in various open source packages that you can leverage. For example:
 
* [http://jakarta.apache.org/commons/validator Apache Commons Validator]
 
 
 
 
 
Input validation must:
 
* Be applied to all user-controlled data
 
* Define the set of characters that can be accepted (often U+0020 to U+007E, though most special characters can be excluded and control characters are almost never needed)
 
* Define a minimum and maximum length for the data (e.g. {1,25})
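
As a rough sketch of the character-set and length rules above, the following Java class accepts only printable ASCII (U+0020 to U+007E) with a length of 1 to 25. The class and method names are assumptions made for this example. A real field would usually use a narrower character set.

```java
import java.util.regex.Pattern;

// Hypothetical validator; the name is an assumption for illustration.
final class PrintableAsciiValidator {
    // Printable ASCII (U+0020 to U+007E), between 1 and 25 characters long.
    private static final Pattern PRINTABLE_1_TO_25 =
            Pattern.compile("[\\x20-\\x7E]{1,25}");

    static boolean isValid(String input) {
        // matches() succeeds only if the whole input fits the pattern
        return input != null && PRINTABLE_1_TO_25.matcher(input).matches();
    }
}
```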
 
== Client Side vs Server Side Validation ==
 
Be aware that any JavaScript input validation performed on the client can be bypassed by an attacker that disables JavaScript or uses a Web Proxy. Ensure that any input validation performed on the client is also performed on the server.
 
== Positive Approach ==
 
The number of attack variations is enormous. Use regular expressions to define what is good, and then deny the input if anything else is received. In other words, use the approach "Accept Known Good" instead of "Reject Known Bad".
 
 
 
Example: a field accepts a username. A good approach is to verify that the entire value matches the regex ^[0-9a-zA-Z]{3,10}$ and to reject the data if it doesn't.
 
 
 
A bad approach would be to build a list of malicious strings and then just verify that the username does not contain any of them. This approach raises the question: did you think of all possible bad strings?
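
The difference between the two approaches can be sketched in Java. The black list below is deliberately incomplete and purely illustrative, and the class and method names are assumptions for this example. The point is that an input the black list author never thought of passes the black list, while the white list rejects it.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical comparison of the two approaches; names are assumptions.
final class UsernameCheck {
    // "Accept Known Good": 3 to 10 letters or digits, nothing else.
    private static final Pattern USERNAME = Pattern.compile("[0-9a-zA-Z]{3,10}");

    // "Reject Known Bad": an illustrative, deliberately incomplete list.
    private static final List<String> BAD_STRINGS = List.of("<script>", "1=1", "'");

    static boolean passesWhiteList(String username) {
        return username != null && USERNAME.matcher(username).matches();
    }

    static boolean passesBlackList(String username) {
        if (username == null) {
            return false;
        }
        String lower = username.toLowerCase(Locale.ROOT);
        return BAD_STRINGS.stream().noneMatch(lower::contains);
    }
}
```

For example, the string `<img src=x onerror=alert(1)>` contains none of the listed bad strings, so `passesBlackList` accepts it, whereas `passesWhiteList` rejects it.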
 
 
 
== Robust Use of Input Validation ==
 
All data received from the user should be treated as malicious and verified before it is used within the application. This includes the following:
 
* Form data
 
* URL parameters
 
* Hidden fields
 
* Cookie data
 
* HTTP Headers
 
* Essentially anything in the HTTP request
 
 
 
== Input Validation ==
 
Data received from the user should also be validated for the following factors:
 
1. Boundary conditions (out-of-range values)
 
2. Length of the data entered (for example, if the input control accepts only 8 characters, the server should enforce the same limit and reject input longer than 8 characters).
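
A minimal Java sketch of both checks for a numeric field follows. The field name "quantity", the range 1 to 500, and the class name are assumptions made for this example. The 8-character limit mirrors the example above.

```java
// Hypothetical validator for a "quantity" field; limits are assumptions.
final class QuantityValidator {
    private static final int MAX_INPUT_LENGTH = 8;
    private static final int MIN_QUANTITY = 1;
    private static final int MAX_QUANTITY = 500;

    // Returns the parsed value, or throws IllegalArgumentException when invalid.
    static int parseQuantity(String raw) {
        // Length check first, before any parsing is attempted.
        if (raw == null || raw.isEmpty() || raw.length() > MAX_INPUT_LENGTH) {
            throw new IllegalArgumentException("Quantity has an invalid length.");
        }
        final int value;
        try {
            value = Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Quantity is not a number.", e);
        }
        // Boundary check: reject out-of-range values.
        if (value < MIN_QUANTITY || value > MAX_QUANTITY) {
            throw new IllegalArgumentException("Quantity is out of range.");
        }
        return value;
    }
}
```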
 
 
 
== Validating Rich User Content ==
 
It is very difficult to validate rich content submitted by a user. Consider more formal approaches such as [http://htmlpurifier.org/ HTML Purifier (PHP)], [http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project AntiSamy], or [http://github.com/jsocol/bleach/ bleach (Python)].
 
 
 
== Preventing XSS and Content Security Policy ==
 
* All user-controlled data must be encoded when returned in the HTML page to prevent the execution of malicious data (e.g. XSS). For example, &lt;script&gt; would be returned as &amp;lt;script&amp;gt;.
 
* The type of encoding is specific to the context of the page where the user-controlled data is inserted. For example, HTML entity encoding is appropriate for data placed into the HTML body. However, user data placed into a script would need JavaScript-specific output encoding.
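
As an illustration of HTML entity encoding for the HTML body context only, here is a minimal Java sketch. The class and method names are assumptions made for this example. It is not suitable for attribute, script, or URL contexts, and in production a vetted encoding library is a safer choice than a hand-written routine.

```java
// Hypothetical encoder for the HTML body context only; names are assumptions.
final class HtmlBodyEncoder {
    static String encodeForHtmlBody(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```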
 
 
 
Detailed information on XSS prevention is available here: [http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet OWASP XSS Prevention Cheat Sheet]
 
 
 
= Output Encoding =
 
 
 
== Preventing SQL Injection ==
 
* It's not realistic to always know if a piece of data is user controlled, therefore parameterized queries should be used whenever a method/function accepts data and uses this data as part of the SQL statement.
 
* String concatenation to build any part of a SQL statement with user controlled data creates a SQL injection vulnerability.
 
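* Parameterized queries reliably prevent SQL injection for the values they bind. They cannot bind identifiers such as table or column names, so those still need white list validation.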
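
A minimal JDBC sketch of a parameterized query follows. The table name, column names, and class name are hypothetical, and the caller is assumed to supply an open `Connection`. The user-supplied value is bound with `setString` and is never concatenated into the SQL text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical lookup; table and column names are assumptions.
final class UserLookup {
    // The ? is a bind placeholder; the SQL text itself never changes.
    static final String FIND_EMAIL_SQL = "SELECT email FROM users WHERE username = ?";

    // Returns the email for the username, or null when no row matches.
    static String findEmail(Connection connection, String username) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(FIND_EMAIL_SQL)) {
            statement.setString(1, username); // bound as data, not parsed as SQL
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next() ? rows.getString("email") : null;
            }
        }
    }
}
```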
 
 
 
Further Reading: [http://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet SQL Injection Prevention Cheat Sheet]
 
 
 
== Preventing OS Injection ==
 
* Avoid sending user controlled data to the OS as much as possible
 
* Ensure that a robust escaping routine is in place to prevent the user from adding additional characters that can be executed by the OS (e.g. the user appends | to the data, followed by another OS command). Remember to use a positive approach when constructing escaping routines.
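
One way to apply a positive approach in Java is sketched below. Note that this swaps in a different technique from an escaping routine: it validates the value against a white list and passes it as a separate argument, so no shell ever parses it. The `ping -c 1` command (Unix-style), the host name pattern, and the class name are assumptions made for this example.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical command builder; command and pattern are assumptions.
final class PingCommand {
    // Letters, digits, dots and hyphens only; must start with a letter or
    // digit so the value cannot be mistaken for a command-line option.
    private static final Pattern HOST =
            Pattern.compile("[A-Za-z0-9][A-Za-z0-9.-]{0,252}");

    // Builds the argument list, or throws when the host is not acceptable.
    static List<String> buildCommand(String host) {
        if (host == null || !HOST.matcher(host).matches()) {
            throw new IllegalArgumentException("Host contains characters that are not allowed.");
        }
        // Each element is passed as one argument; no shell interprets it.
        return List.of("ping", "-c", "1", host);
    }

    // Prepares, but does not start, the process.
    static ProcessBuilder prepare(String host) {
        return new ProcessBuilder(buildCommand(host));
    }
}
```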
 
 
 
Further Reading: [http://www.owasp.org/index.php/Reviewing_Code_for_OS_Injection Reviewing Code for OS Injection]
 
 
 
== Preventing XML Injection ==
 
* In addition to the existing input validation, define a positive approach that escapes/encodes characters that can be interpreted as XML. At a minimum this includes the following: < > " ' &
 
* If accepting raw XML, then more robust validation is necessary. This can be complex. Please contact the infrastructure security team for additional discussion.
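
For the five characters listed above, a minimal Java sketch of escaping text before it is placed into XML element content or attribute values might look like this. The class name is an assumption made for this example, and it does not address raw XML input.

```java
// Hypothetical escaper for XML text and attribute values; name is an assumption.
final class XmlText {
    static String escape(String input) {
        return input
                .replace("&", "&amp;")   // must run first, or later entities get re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }
}
```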
 
 
 
 
 
= Authors and Primary Editors  =
 
 
 
Dave Wichers - dave.wichers [at] aspectsecurity.com
 
 
 
== Other Cheatsheets ==
 
 
 
{{Cheatsheet_Navigation_Body}}
 
 
 
|}
 
 
 
[[Category:Cheatsheets]]
 

Latest revision as of 14:13, 15 July 2019