This site is the archived OWASP Foundation Wiki and is no longer accepting Account Requests.
To view the new OWASP Foundation website, please visit https://owasp.org
Difference between revisions of "Input Validation Cheat Sheet"
m (→Goal of Input Validation) | (Implementing input validation)
Line 10:
  This article is focused on providing clear, simple, actionable guidance for providing Input Validation security functionality in your applications.
− ==
+ == Goals of Input Validation ==
  Input validation is performed to ensure only properly formed data is entering the workflow in an information system, preventing malformed data from persisting in the database and triggering malfunction of various downstream components.
  Input Validation is not the ''primary'' method of preventing [[XSS (Cross Site Scripting) Prevention Cheat Sheet|XSS]], [[SQL Injection Prevention Cheat Sheet|SQL Injection]] and other attacks which are covered in respective [[OWASP Cheat Sheet Series|cheat sheets]] but can significantly contribute to reducing their impact if implemented properly.
+
+ == Input validation strategies==
+ Input validation should be applied on both '''syntactical''' and '''semantic''' level. Syntactic validation should enforce correct syntax of structured fields (e.g. SSN, date, currency symbol) while semantic validation should enforce correctness of their ''values'' in the specific business context (e.g. start date is before end date, price is within expected range).
+
+ == Implementing input validation==
+ Input validation can be implemented using any programming technique that allows effective enforcement of syntactic and semantic correctness, for example:
+
+ * Data type validators available natively in web application frameworks (such as [https://docs.djangoproject.com/en/1.11/ref/validators/ Django Validators], [https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/package-summary.html#doc.Usage.validator Apache Commons Validators] etc)
+ * Validation against [http://json-schema.org/ JSON Schema] and [https://www.w3.org/standards/techs/xmlschema#w3c_all XML Schema (XSD)] for input in these formats
+ * Type conversion (e.g. <code>Integer.parseInt()</code> in Java, <code>int()</code> in Python) with strict exception handling
+ * Minimum and maximum value range check for numerical parameters and dates, minimum and maximum length check for strings
+ * Array of allowed values for small sets of string parameters (e.g. days of week)
+ * Regular expressions for any other structured data covering the whole input string (^...$) and '''not''' using "any character" wildcard (such as "." or "\S")
  == White List Input Validation ==
Revision as of 10:19, 16 May 2017
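Last revision (mm/dd/yy): 05/16/2017

== Introduction ==
This article provides clear, simple, actionable guidance for implementing input validation security functionality in your applications.

== Goals of Input Validation ==
Input validation is performed to ensure that only properly formed data enters the workflow of an information system, preventing malformed data from persisting in the database and triggering malfunctions in downstream components.

Input validation is not the ''primary'' method of preventing [[XSS (Cross Site Scripting) Prevention Cheat Sheet|XSS]], [[SQL Injection Prevention Cheat Sheet|SQL Injection]] and other attacks, which are covered in their respective [[OWASP Cheat Sheet Series|cheat sheets]], but it can significantly reduce their impact if implemented properly.

== Input validation strategies ==
Input validation should be applied at both the '''syntactic''' and the '''semantic''' level. Syntactic validation should enforce the correct syntax of structured fields (e.g. SSN, date, currency symbol), while semantic validation should enforce the correctness of their ''values'' in the specific business context (e.g. start date is before end date, price is within the expected range).

== Implementing input validation ==
Input validation can be implemented using any programming technique that allows effective enforcement of syntactic and semantic correctness, for example:

* Data type validators available natively in web application frameworks (such as [https://docs.djangoproject.com/en/1.11/ref/validators/ Django Validators], [https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/package-summary.html#doc.Usage.validator Apache Commons Validators], etc.)
* Validation against [http://json-schema.org/ JSON Schema] and [https://www.w3.org/standards/techs/xmlschema#w3c_all XML Schema (XSD)] for input in these formats
* Type conversion (e.g. <code>Integer.parseInt()</code> in Java, <code>int()</code> in Python) with strict exception handling
* Minimum and maximum value range checks for numerical parameters and dates, and minimum and maximum length checks for strings
* An array of allowed values for small sets of string parameters (e.g. days of the week)
* Regular expressions for any other structured data, covering the whole input string (^...$) and '''not''' using an "any character" wildcard (such as "." or "\S")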
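As an illustration of a few such techniques, the following is a minimal Java sketch that combines strict type conversion, a range check, an allowed-values set, and one semantic check (start date before end date). The class name <code>BookingInputValidator</code>, the guest limits, and the day codes are hypothetical and chosen only for this example; they are not part of any OWASP library.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Set;

// Hypothetical validator used only to illustrate the techniques above.
final class BookingInputValidator {
    // Assumed business limits, not taken from the cheat sheet.
    private static final int MIN_GUESTS = 1;
    private static final int MAX_GUESTS = 20;
    private static final Set<String> ALLOWED_DAYS =
            Set.of("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN");

    // Syntactic check: strict type conversion followed by a range check.
    static int parseGuests(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("guests is required");
        }
        int guests;
        try {
            guests = Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("guests must be an integer", e);
        }
        if (guests < MIN_GUESTS || guests > MAX_GUESTS) {
            throw new IllegalArgumentException("guests is out of range");
        }
        return guests;
    }

    // Allowed-values check for a small, fixed set of strings.
    static String parseDay(String raw) {
        if (raw == null || !ALLOWED_DAYS.contains(raw)) {
            throw new IllegalArgumentException("day is not an allowed value");
        }
        return raw;
    }

    // Syntactic check: the value must parse as an ISO-8601 date (yyyy-MM-dd).
    static LocalDate parseDate(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("date is required");
        }
        try {
            return LocalDate.parse(raw);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("date must use yyyy-MM-dd", e);
        }
    }

    // Semantic check: the start date must be before the end date.
    static void requireStartBeforeEnd(LocalDate start, LocalDate end) {
        if (!start.isBefore(end)) {
            throw new IllegalArgumentException("start date must be before end date");
        }
    }
}
```

Each method rejects bad input by throwing <code>IllegalArgumentException</code>, so a caller can map every failure to a single error response. A real application would likely use its own exception type and limits.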
== White List Input Validation ==
It is always recommended to prevent attacks as early as possible in the processing of the user’s (attacker's) request. Input validation can be used to detect unauthorized input before it is processed by the application. Developers frequently perform black list validation in order to try to detect attack characters and patterns like the ' character, the string 1=1, or the <script> tag, but this is a massively flawed approach, as it is typically trivial for an attacker to avoid getting caught by such filters. In addition, such filters frequently reject authorized input, like O'Brian, when the ' character is being filtered out. For more information on XSS filter evasion, please see the XSS Filter Evasion Cheat Sheet.

White list validation is appropriate for all input fields provided by the user. White list validation involves defining exactly what IS authorized; by definition, everything else is not authorized. If the data is well structured, like dates, social security numbers, zip codes, e-mail addresses, etc., then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input. If the input field comes from a fixed set of options, like a drop-down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.

The most difficult fields to validate are so-called 'free text' fields, like blog entries. However, even those types of fields can be validated to some degree. For example, you can at least exclude all non-printable characters (except acceptable white space, e.g., CR, LF, tab, space), and define a maximum length for the input field.

Developing regular expressions can be complicated, and is well beyond the scope of this cheat sheet. There are many resources on the internet about how to write regular expressions, including http://www.regular-expressions.info/ and the OWASP Validation Regex Repository.
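The free-text checks described above (a maximum length, and no non-printable characters other than CR, LF, tab and space) might be sketched in Java as follows. The class name <code>FreeTextValidator</code> and the 2000-character limit are assumptions made for this example, and "non-printable" is interpreted here as ISO control characters and unassigned code points, which is one possible reading rather than a definition from the cheat sheet.

```java
// Hypothetical helper illustrating basic free-text validation.
final class FreeTextValidator {
    // Assumed limit, measured in UTF-16 code units; choose one that fits the field.
    private static final int MAX_LENGTH = 2000;

    // Returns true only if the input is present, short enough,
    // and made up entirely of allowed code points.
    static boolean isAcceptable(String input) {
        if (input == null || input.length() > MAX_LENGTH) {
            return false;
        }
        return input.codePoints().allMatch(FreeTextValidator::isAllowedCodePoint);
    }

    private static boolean isAllowedCodePoint(int codePoint) {
        // Acceptable white space is allowed explicitly.
        if (codePoint == '\r' || codePoint == '\n' || codePoint == '\t' || codePoint == ' ') {
            return true;
        }
        // Everything else must be a defined, non-control character.
        return Character.isDefined(codePoint) && !Character.isISOControl(codePoint);
    }
}
```

Note that this check accepts input such as O'Brian, because it defines what is allowed instead of hunting for attack characters.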
In summary, input validation should:
== White List Regular Expression Examples ==
'''Validating a U.S. Zip Code (5 digits plus optional -4)'''
<pre>
^\d{5}(-\d{4})?$
</pre>

'''Validating U.S. State Selection From a Drop-Down Menu'''
<pre>
^(AA|AE|AP|AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)$
</pre>

'''Java Regex Usage Example'''

Example validating the parameter "zip" using a regular expression.
<pre>
import java.io.IOException;
import java.util.regex.Pattern;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Backslashes must be doubled inside a Java string literal.
private static final Pattern zipPattern = Pattern.compile("^\\d{5}(-\\d{4})?$");

public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws IOException {
    try {
        String zipCode = request.getParameter("zip");
        // getParameter() returns null when the parameter is missing.
        if (zipCode == null || !zipPattern.matcher(zipCode).matches()) {
            throw new YourValidationException("Improper zipcode format.");
        }
        // .. do what you want here, after it has been validated ..
    } catch (YourValidationException e) {
        response.sendError(HttpServletResponse.SC_BAD_REQUEST, e.getMessage());
    }
}
</pre>

Some white list validators have also been predefined in various open source packages that you can leverage. For example:

== Client Side vs Server Side Validation ==
Be aware that any JavaScript input validation performed on the client can be bypassed by an attacker who disables JavaScript or uses a web proxy. Ensure that any input validation performed on the client is also performed on the server.

== Validating Rich User Content ==
It is very difficult to validate rich content submitted by a user. For more information, please see the cheat sheet on Sanitizing HTML Markup with a Library Designed for the Job.

== Preventing XSS and Content Security Policy ==
Detailed information on XSS prevention here: OWASP XSS Prevention Cheat Sheet

== File Upload Validation ==
Many websites allow users to upload files, such as profile pictures and other content. This section provides guidance on offering that feature securely.

=== Upload Verification ===

=== Upload Storage ===

=== Public Serving of Uploaded Content ===

=== Beware of "special" files ===

=== Upload Verification ===
== Email Address Validation ==
=== Email Validation Basics ===
Many web applications do not treat email addresses correctly due to common misconceptions about what constitutes a valid address. Specifically, it is completely valid to have a mailbox address which:

At the time of writing, RFC 5321 is the current standard defining SMTP and what constitutes a valid mailbox address. Please note that email addresses should be considered public data.

Many web applications contain computationally expensive and inaccurate regular expressions that attempt to validate email addresses. Recent changes to the landscape mean that the number of false negatives will increase, particularly due to:
Following RFC 5321, best practice for validating an email address would be to:
The only way to ensure that an address is deliverable is to send the user an email and have the user take action to confirm receipt. Beyond confirming that the email address is valid and deliverable, this also provides a positive acknowledgement that the user has access to the mailbox and is likely to be authorized to use it. This does not mean that other users cannot access this mailbox, for example when the user makes use of a service that generates a throwaway email address.
=== Address Normalization ===
Because the local-part of an email address is, in fact, case sensitive, it is important to store and compare email addresses correctly. To normalise an email address input, you would convert the domain part ONLY to lowercase. Unfortunately, this makes input harder to normalise and to match correctly to a user's intent. It is reasonable to only accept one unique capitalisation of an otherwise identical address; however, in this case it is critical to:
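The domain-only lowercasing described in this section can be sketched in Java as follows. This is a minimal illustration under stated assumptions: the class name <code>EmailNormalizer</code> is hypothetical, the domain is taken to be everything after the last @ character, and no other validation of the address is attempted.

```java
import java.util.Locale;

// Hypothetical helper illustrating domain-only normalisation.
final class EmailNormalizer {
    // Lowercases the domain part and leaves the local-part untouched.
    static String normalize(String address) {
        if (address == null) {
            throw new IllegalArgumentException("address is required");
        }
        // Split on the last '@' so that an '@' inside the local-part is preserved.
        int at = address.lastIndexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            throw new IllegalArgumentException("address needs a local-part and a domain");
        }
        String localPart = address.substring(0, at);
        String domain = address.substring(at + 1);
        // Locale.ROOT avoids locale-specific case mappings.
        return localPart + "@" + domain.toLowerCase(Locale.ROOT);
    }
}
```

For example, <code>EmailNormalizer.normalize("John.Smith@Example.COM")</code> returns <code>John.Smith@example.com</code>, keeping the capitalisation of the local-part exactly as entered.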
== Authors and Primary Editors ==
Dave Wichers - dave.wichers [at] aspectsecurity.com