Last revision (mm/dd/yy): 01/10/2016

Introduction

This article provides clear, simple, actionable guidance for implementing input validation security functionality in your applications.

Goal of Input Validation

Input validation is performed to minimize the amount of malformed data entering the system. Input validation is NOT the primary method of preventing XSS or SQL Injection. These are covered in the output encoding and related cheat sheets.

Goal of Output Encoding

Output encoding ensures that attack data is neutralized before being displayed to a user. For more information on XSS defense and output encoding, visit the XSS (Cross Site Scripting) Prevention Cheat Sheet.

Goal of Securing File Uploads

Files uploaded to servers must be secured so that malware, auto-executables, files that change OS configuration, and similar content cannot be uploaded and impact the confidentiality, integrity and availability of the data stored on that server or on other servers.

Goal of Error Handling

Errors should be handled in such a manner that internal details of the system are not disclosed to the end user, for example via a stack trace.

Goal of Secure Application Deployment

Application deployments should ensure that the confidentiality, integrity and availability of the application are not vulnerable to malicious attacks.

White List Input Validation

It is always recommended to prevent attacks as early as possible in the processing of the user’s (attacker's) request. Input validation can be used to detect unauthorized input before it is processed by the application. Developers frequently perform black list validation in order to try to detect attack characters and patterns like the ' character, the string 1=1, or the <script> tag, but this is a massively flawed approach as it is typically trivial for an attacker to avoid getting caught by such filters. Plus, such filters frequently prevent authorized input, like O'Brian, when the ' character is being filtered out.

White list validation is appropriate for all input fields provided by the user. White list validation involves defining exactly what IS authorized, and by definition, everything else is not authorized. If the input is well structured data, like dates, social security numbers, zip codes, e-mail addresses, etc., then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input. If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place. The most difficult fields to validate are so called 'free text' fields, like blog entries. However, even those types of fields can be validated to some degree: you can at least exclude all non-printable characters and define a maximum size for the input field.
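The two harder cases described above, a fixed set of options and a free text field, can be sketched in Java roughly as follows. This is a minimal illustration, not a definitive implementation: the class name, the option values and the 2000 character limit are assumptions made for the example.

```java
import java.util.Set;

final class AllowListExamples {
    // Hypothetical option set; in practice, reuse the values the form was rendered with.
    private static final Set<String> SHIPPING_OPTIONS = Set.of("standard", "express", "overnight");
    // Assumed maximum size for the free text field.
    private static final int MAX_COMMENT_LENGTH = 2000;

    // A fixed-choice field must match exactly one of the offered values.
    static boolean isValidShippingOption(String input) {
        return input != null && SHIPPING_OPTIONS.contains(input);
    }

    // Free text: enforce a maximum size and reject control characters
    // other than tab, line feed and carriage return.
    static boolean isValidComment(String input) {
        if (input == null || input.length() > MAX_COMMENT_LENGTH) {
            return false;
        }
        return input.codePoints().allMatch(cp ->
                cp == '\t' || cp == '\n' || cp == '\r' || !Character.isISOControl(cp));
    }
}
```

Checking membership in a set avoids maintaining a long alternation regex when the list of options already exists in code.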

Developing regular expressions can be complicated, and is well beyond the scope of this cheat sheet. There are lots of resources on the internet about how to write regular expressions, including: http://www.regular-expressions.info/ and the OWASP Validation Regex Repository. The following provides a few examples of ‘white list’ style regular expressions:

White List Regular Expression Examples

Validating a Zip Code (5 digits plus optional -4)

^\d{5}(-\d{4})?$

Validating U.S. State Selection From a Drop-Down Menu

^(AA|AE|AP|AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)$

Java Regex Usage Example

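 Example validating the parameter “zip” using a regular expression.
 
 import java.io.IOException;
 import java.util.regex.Pattern;
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;
 
 public class ZipServlet extends HttpServlet {
 	// Backslashes must be doubled inside a Java string literal.
 	private static final Pattern zipPattern = Pattern.compile("^\\d{5}(-\\d{4})?$");
 
 	public void doPost( HttpServletRequest request, HttpServletResponse response) throws IOException {
 		try {
 			String zipCode = request.getParameter( "zip" );
 			// A missing parameter is null, so reject it before matching.
 			if ( zipCode == null || !zipPattern.matcher( zipCode ).matches() ) {
 				// YourValidationException stands for your own exception type.
 				throw new YourValidationException( "Improper zipcode format." );
 			}
 			// .. do what you want here, after it has been validated ..
 		} catch(YourValidationException e ) {
 			response.sendError( HttpServletResponse.SC_BAD_REQUEST, e.getMessage() );
 		}
 	}
 }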

Some white list validators have also been predefined in various open source packages that you can leverage. For example:

Apache Commons Validator

Input Validation Must:

Be applied to all user controlled data
Define the types of characters that can be accepted (often U+0020 to U+007E, though most special characters could be removed and control characters are almost never needed)
Define a minimum and maximum length for the data (e.g. {1,25} )
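As a rough sketch, the character range and the length bounds from the list above can be combined into a single pattern. The class and method names are hypothetical, and the range U+0020 to U+007E with a length of 1 to 25 is just the example from the list; a real field would usually allow a narrower set.

```java
import java.util.regex.Pattern;

final class PrintableAsciiValidator {
    // Characters U+0020 to U+007E only, between 1 and 25 of them.
    private static final Pattern PRINTABLE_ASCII = Pattern.compile("^[\\x20-\\x7E]{1,25}$");

    static boolean isValid(String input) {
        // matches() requires the whole input to fit the pattern.
        return input != null && PRINTABLE_ASCII.matcher(input).matches();
    }
}
```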

Client Side vs Server Side Validation

Be aware that any JavaScript input validation performed on the client can be bypassed by an attacker that disables JavaScript or uses a Web Proxy. Ensure that any input validation performed on the client is also performed on the server.

Positive Approach

The variations of attacks are enormous. Use regular expressions to define what is good and then deny the input if anything else is received. In other words, we want to use the approach "Accept Known Good" instead of "Reject Known Bad".

Example: A field accepts a username. A good approach would be to verify 
that the whole value matches the regex ^[0-9a-zA-Z]{3,10}$. The data 
is rejected if it doesn't match.

A bad approach would be to build a list of malicious strings and then 
just verify that the username does not contain any of them. This 
approach raises the question: did you think of all possible bad strings?
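The difference between the two approaches can be illustrated with a short Java sketch. The deny list here is deliberately incomplete and exists only to show the weakness; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class UsernameValidation {
    // "Accept Known Good": 3 to 10 letters or digits, nothing else.
    private static final Pattern USERNAME = Pattern.compile("^[0-9a-zA-Z]{3,10}$");

    // "Reject Known Bad": an illustrative deny list that will always miss something.
    private static final List<String> BAD_STRINGS = List.of("<script>", "1=1", "'");

    static boolean acceptKnownGood(String input) {
        return input != null && USERNAME.matcher(input).matches();
    }

    static boolean rejectKnownBad(String input) {
        return input != null && BAD_STRINGS.stream().noneMatch(input::contains);
    }
}
```

With these definitions, a value such as `<ScRiPt>` passes the deny list because the casing differs from the listed string, while the positive check rejects it without needing to know about that variant.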

Robust Use of Input Validation

All data received from the user should be treated as malicious and verified before it is used within the application. This includes the following:

Form data
URL parameters
Hidden fields
Cookie data
HTTP Headers
Essentially anything in the HTTP request

Input Validation

Data received from the user should be validated for the following factors as well:

1. Boundary conditions (out of range values)

2. Length of the data entered (for example, if the input control can accept only 8 characters, the same limit should be enforced when accepting the data, so that the input does not exceed 8 characters).
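A minimal Java sketch of both checks follows. The field names and the 1 to 100 range are assumptions for illustration; the 8 character limit mirrors the example above.

```java
final class RangeValidation {
    static final int MIN_QUANTITY = 1;    // assumed lower bound
    static final int MAX_QUANTITY = 100;  // assumed upper bound
    static final int MAX_CODE_LENGTH = 8; // limit from the example above

    // Parses a numeric field and enforces its boundary conditions.
    static int parseQuantity(String input) {
        if (input == null) {
            throw new IllegalArgumentException("quantity is required");
        }
        final int value;
        try {
            // parseInt also rejects values that overflow an int.
            value = Integer.parseInt(input.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("quantity must be a whole number");
        }
        if (value < MIN_QUANTITY || value > MAX_QUANTITY) {
            throw new IllegalArgumentException("quantity is out of range");
        }
        return value;
    }

    // Enforces on the server the same length limit the input control has.
    static boolean hasValidLength(String input) {
        return input != null && !input.isEmpty() && input.length() <= MAX_CODE_LENGTH;
    }
}
```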

Validating Rich User Content

It is very difficult to validate rich content submitted by a user. Consider more formal approaches such as HTML Purifier (PHP), AntiSamy (Java) or bleach (Python).

Preventing XSS and Content Security Policy

All user controlled data must be encoded when returned in the HTML page to prevent the execution of malicious data (e.g. XSS). For example, <script> would be returned as &lt;script&gt;.
The type of encoding is specific to the context of the page where the user controlled data is inserted. For example, HTML entity encoding is appropriate for data placed into the HTML body. However, user data placed into a script would need JavaScript specific output encoding.

Detailed information on XSS prevention here: OWASP XSS Prevention Cheat Sheet
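To make the HTML body case concrete, here is a minimal hand-written encoder in Java. It is a sketch for illustration only and covers the HTML body context alone; the class and method names are hypothetical, and in a real application a maintained encoding library (for example the OWASP Java Encoder) is the safer choice.

```java
final class HtmlBodyEncoder {
    // Replaces the characters that are significant in HTML with entities.
    // Suitable only for data placed into the HTML body, not into scripts,
    // URLs or CSS, which need their own context-specific encoding.
    static String encodeForHtmlBody(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```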

Output Encoding

Preventing SQL Injection

It's not realistic to always know whether a piece of data is user controlled; therefore, parameterized queries should be used whenever a method/function accepts data and uses this data as part of a SQL statement.
String concatenation to build any part of a SQL statement with user controlled data creates a SQL injection vulnerability.
Parameterized queries are a guaranteed approach to prevent SQL injection.

Further Reading: SQL Injection Prevention Cheat Sheet
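A parameterized query in Java uses java.sql.PreparedStatement with a ? placeholder for each value. The sketch below assumes a hypothetical users table with id and name columns; the class and method names are likewise made up for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

final class UserLookup {
    // Hypothetical table and column names. The SQL text is a constant;
    // no user data is ever concatenated into it.
    static final String FIND_BY_NAME = "SELECT id FROM users WHERE name = ?";

    static Optional<Long> findUserId(Connection conn, String name) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(FIND_BY_NAME)) {
            // The value is bound as data and is not parsed as SQL.
            stmt.setString(1, name);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? Optional.of(rs.getLong("id")) : Optional.empty();
            }
        }
    }
}
```

Placeholders can only stand for values. Table names, column names and keywords cannot be bound this way, so if they must vary they should be chosen from a fixed list in code.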

Preventing OS Injection

Avoid sending user controlled data to the OS as much as possible.
Ensure that a robust escaping routine is in place to prevent the user from adding additional characters that can be executed by the OS (e.g. the user appends | to the malicious data and then executes another OS command). Remember to use a positive approach when constructing escaping routines.

Further Reading: Reviewing Code for OS Injection
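Where an OS command cannot be avoided, one option in Java is to sidestep escaping altogether: validate the value with a positive pattern and pass it as a separate argument through ProcessBuilder, so that no shell ever parses it. Note that this is a different technique from an escaping routine. The sketch below is an assumption-laden example: the ping command with the Unix style -c flag, the host name policy and the class name are all illustrative.

```java
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

final class SafePing {
    // Assumed policy: letters, digits, dots and hyphens, starting and ending
    // with a letter or digit so the value can never look like an option.
    private static final Pattern HOSTNAME =
            Pattern.compile("^[A-Za-z0-9](?:[A-Za-z0-9.-]{0,251}[A-Za-z0-9])?$");

    static List<String> buildCommand(String host) {
        if (host == null || !HOSTNAME.matcher(host).matches()) {
            throw new IllegalArgumentException("Invalid host name");
        }
        // Each list element reaches the program as exactly one argument.
        return List.of("ping", "-c", "1", host);
    }

    static Process run(String host) throws IOException {
        // No shell is involved, so characters such as | or ; have no special meaning.
        return new ProcessBuilder(buildCommand(host)).start();
    }
}
```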

Preventing XML Injection

In addition to the existing input validation, define a positive approach which escapes/encodes characters that can be interpreted as XML. At a minimum this includes the following: < > " ' &
If accepting raw XML, then more robust validation is necessary. This can be complex. Please contact the infrastructure security team for additional discussion.
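One way to get this escaping in Java without writing it by hand is to build the XML through the standard StAX writer, which escapes markup characters such as < and & in character data. This is a sketch under assumptions: the comment element name and the class name are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class XmlCommentWriter {
    // Wraps user supplied text in a single element. The text is written as
    // character data, so it cannot introduce new elements.
    static String toXml(String userComment) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("comment"); // hypothetical element name
        writer.writeCharacters(userComment == null ? "" : userComment);
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }
}
```

Building documents by string concatenation is what makes XML injection possible, so letting the writer own the markup removes that class of mistake for this case.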

File Uploads

Upload Verification

Use input validation to ensure the uploaded filename uses an expected extension type
Ensure the uploaded file is not larger than a defined maximum file size

Upload Storage

Use a new filename to store the file on the OS. Do not use any user controlled text for this filename or for the temporary filename.
Store all user uploaded files on a separate domain (e.g. mozillafiles.net vs mozilla.org). Archives should be analyzed for malicious content (anti-malware, static analysis, etc)
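The verification and storage points above might look like this in Java. The allowed extensions, the 5 MiB limit and the class name are assumptions chosen for the example, not recommended values.

```java
import java.util.Locale;
import java.util.Set;
import java.util.UUID;

final class UploadPolicy {
    private static final Set<String> ALLOWED_EXTENSIONS = Set.of("jpg", "jpeg", "png", "gif"); // assumed list
    private static final long MAX_BYTES = 5L * 1024 * 1024; // assumed 5 MiB limit

    // Returns the lower-cased extension if the upload is acceptable, otherwise throws.
    static String checkUpload(String clientFilename, long sizeInBytes) {
        if (clientFilename == null || sizeInBytes <= 0 || sizeInBytes > MAX_BYTES) {
            throw new IllegalArgumentException("Missing file or size outside the allowed range");
        }
        int dot = clientFilename.lastIndexOf('.');
        String extension = dot < 0 ? "" : clientFilename.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!ALLOWED_EXTENSIONS.contains(extension)) {
            throw new IllegalArgumentException("File type not allowed");
        }
        return extension;
    }

    // The stored name is generated on the server. Nothing from the client
    // supplied name is reused except the already vetted extension.
    static String newStoredName(String vettedExtension) {
        return UUID.randomUUID() + "." + vettedExtension;
    }
}
```

An extension check only constrains the name. It says nothing about the actual content of the file, which has to be verified separately.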

Public Serving of Uploaded Content

Ensure the image is served with the correct content-type (e.g. image/jpeg, application/x-xpinstall)

Beware of "special" files

The upload feature should be using a whitelist approach to only allow specific file types and extensions. However, it is important to be aware of the following file types that, if allowed, could result in security vulnerabilities.
"crossdomain.xml" allows cross-domain data loading in Flash, Java and Silverlight. If permitted on sites with authentication this can permit cross-domain data theft and CSRF attacks. Note this can get pretty complicated depending on the specific plugin version in question, so it's best to just prohibit files named "crossdomain.xml" or "clientaccesspolicy.xml".
".htaccess" and ".htpasswd" provide server configuration options on a per-directory basis, and should not be permitted. See http://en.wikipedia.org/wiki/Htaccess

Upload Verification

Use image rewriting libraries to verify the image is valid and to strip away extraneous content.
Set the extension of the stored image to be a valid image extension based on the detected content type of the image from image processing (e.g. do not just trust the header from the upload).
Ensure the detected content type of the image is within a list of defined image types (jpg, png, etc)
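Image rewriting can be sketched with the javax.imageio classes that ship with the JDK: decode the upload, then write the decoded pixels out again in a format you choose. The class name is hypothetical and the choice of PNG as the output format is an assumption for the example.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

final class ImageRewriter {
    // Decodes the upload and re-encodes only its pixels as PNG.
    // Throws if the bytes cannot be decoded as a supported image.
    static byte[] reencodeAsPng(byte[] uploaded) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(uploaded));
        if (image == null) {
            // ImageIO.read returns null when no registered reader recognises the data.
            throw new IOException("Upload is not a supported image");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }
}
```

Because the stored file is produced by the server from decoded pixels, the stored extension can be set to match the format that was actually written rather than whatever the client claimed.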

Error Handling

Typical types of errors:

The result of business logic conditions not being met
A failure of the environment in which the business logic resides
A failure of upstream or downstream systems upon which the application depends
Technical hardware / physical failure

To address these errors:

Ensure that all method/function calls that return a value have proper error handling and return value checking
Ensure that exceptions and error conditions are properly handled
Ensure that no system errors can be returned to the user
Ensure that the application fails in a secure manner
Ensure resources are released if an error occurs
Ensure that stack trace is not thrown to the user
If the language in question has a finally block, use it. The finally block always runs, whether or not an exception is thrown, and can be used to release resources referenced by the method that threw the exception.

This is very important. For example, if a method obtains a database connection from a pool of connections and an exception occurs without a finally block, the connection object will not be returned to the pool for some time (until the timeout). This can lead to pool exhaustion.
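The pool scenario can be sketched in Java as follows. The Pool class is a toy stand-in written only for this illustration, not a real connection pool, and the generic message text is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PoolExample {
    // Toy stand-in for a connection pool, for illustration only.
    static final class Pool {
        private final Deque<String> free = new ArrayDeque<>();

        Pool(int size) {
            for (int i = 0; i < size; i++) {
                free.push("conn-" + i);
            }
        }

        String acquire() {
            if (free.isEmpty()) {
                throw new IllegalStateException("pool exhausted");
            }
            return free.pop();
        }

        void release(String conn) {
            free.push(conn);
        }

        int available() {
            return free.size();
        }
    }

    static String handle(Pool pool, boolean fail) {
        String conn = pool.acquire();
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure with internal detail");
            }
            return "ok";
        } catch (RuntimeException e) {
            // Log the exception on the server here; the user sees only a generic message.
            return "An unexpected error occurred.";
        } finally {
            // Runs on success and on failure, so the connection always goes back.
            pool.release(conn);
        }
    }
}
```

For resources that implement AutoCloseable, Java's try-with-resources statement gives the same guarantee with less code.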

Secure Deployment

Secure access to configuration files, directories, and resources on the host with authentication and authorisation so that direct access to such artifacts is disallowed
Use a “deny all” rule to deny access to resources on the hosts and then grant access on need basis
In Apache HTTP server, ensure directories like WEB-INF and META-INF are protected. If permissions for a directory and subdirectories are specified in .htaccess file, ensure that it is protected using the “deny all” rule
While using Struts framework, ensure that JSP files are not accessible directly by denying access to *.jsp files in web.xml
Maintain a clean environment. Remove files that contain source code but are not used by the application.
Ensure the production environment does not contain any source code / development tools and that it contains only compiled code / executables.
Remove test code / debug code (that might contain backdoors).
Remove commented code and meta tags as they might contain sensitive data.
If applicable, obfuscate your code to make reverse engineering more difficult

Authors and Primary Editors

Dave Wichers - dave.wichers [at] aspectsecurity.com


Input Validation Cheat Sheet
