XSS (Cross Site Scripting) Prevention Cheat Sheet

From OWASP
Revision as of 00:44, 20 January 2009 by Jeff Williams (talk | contribs) (Introduction)

Introduction

This article provides a simple positive model for preventing XSS through the proper use of output escaping/encoding. While there are a huge number of XSS attack vectors, following a few simple rules can completely defend against this serious attack.

These rules apply to all the different varieties of XSS. Both reflected and stored XSS can be addressed by performing the appropriate escaping on the server side. The use of an escaping/encoding library like the one in ESAPI is strongly recommended, as there are many special cases. DOM Based XSS can be addressed by applying these rules to untrusted data on the client.

For a great cheatsheet on the attack vectors related to XSS, please refer to the excellent XSS Cheat Sheet by RSnake. More background on browser security and the various browsers can be found in the Browser Security Handbook.

Model

This article treats an HTML page like a template, with slots where a developer is allowed to put untrusted data. These slots cover the vast majority of the common places where a developer might want to put untrusted data. Putting untrusted data in other places in the HTML is not allowed. This is a "whitelist" model that denies everything that is not specifically allowed.

Because of the way browsers parse HTML, each of the different types of slots has slightly different security rules. When you put untrusted data into these slots, you need to take certain steps to make sure that the data does not "escape" that slot and break into a context that allows code execution. In a way, this approach treats an HTML document like a parameterized database query - the data is kept separate from the code.

Untrusted data is most often data that comes from the HTTP request, in the form of URL parameters, form fields, headers, or cookies. But data that comes from databases, web services, and other sources is frequently untrusted as well from a security perspective. That is, it might not have been perfectly validated. Therefore, it is best to always escape/encode this data to make sure it can't be used to convey an attack. There is no harm in escaping data - it will still render in the browser properly. Escaping merely prevents attacks from working.
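The claim that escaping does no harm can be checked with a small round trip. The sketch below is only an illustration using Python's standard `html` module, not ESAPI; the sample string and the variable names are made up for this example. `html.unescape` stands in, roughly, for what the browser does when it renders entities as text.

```python
import html

# Hypothetical untrusted input containing markup and an event handler.
untrusted = '<b onmouseover="alert(1)">Tom & Jerry</b>'

# Entity-escape the markup-significant characters (& < > " ').
escaped = html.escape(untrusted, quote=True)

# Decoding the entities yields the original text again, which is what
# the user would see on the page: the data is shown, not executed.
displayed = html.unescape(escaped)
```

The escaped form contains no raw `<` or `"`, so it cannot open a tag or close an attribute, yet the text shown to the user is unchanged.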

This document sets out the most common types of slots and the rules for putting untrusted data into them safely. Based on the various specifications, known XSS vectors, and a great deal of manual testing with all the popular browsers, we have determined that the rules proposed here are safe.

The slots are defined and a few examples of each are provided. Developers SHOULD NOT put data into any other slots without a very careful analysis to ensure that what they are doing is safe. Browser parsing is extremely tricky and many innocuous looking characters can be significant in the right context.

Rules

RULE #0 - Never Insert Untrusted Data Except in Allowed Locations

The first rule is to deny all - don't put untrusted data into your HTML document unless it is within one of the slots defined below. The reason for this rule is that there are so many strange contexts within HTML that the list of escaping rules gets very complicated. There’s no good reason to put untrusted data in these contexts.

 <script>...NO UNTRUSTED HERE...</script>   directly in a script
 
 <!--...NO UNTRUSTED HERE...-->             inside an HTML comment
 
 <div ...NO UNTRUSTED HERE...=test />       in an attribute name
 
 <...NO UNTRUSTED HERE... href="/test" />   in a tag name

Most importantly, never accept actual JavaScript code from an untrusted source and then run it. For example, a parameter named "callback" that contains a JavaScript code snippet cannot be made safe by any amount of escaping.


RULE #1 - Escape Before Inserting into HTML Element Content

Rule #1 is for when you want to put untrusted data directly into the HTML body somewhere. This includes inside normal tags like div, p, b, td, etc...

 <body>...UNTRUSTED HERE...</body>
 
 <div>...UNTRUSTED HERE...</div>
 
 any other normal HTML elements

Escape the following characters with HTML entity encoding to prevent switching into any execution context, such as script, style, or event handlers. Using hex entities is recommended in the spec. In addition to the 5 characters significant in XML, the forward slash is included as it helps to end an HTML element (in a closing tag).

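 & --> &amp;
 < --> &lt;
 > --> &gt;
 " --> &quot;
 ' --> &#x27;     (&apos; is not recommended because it is not defined in HTML 4)
 / --> &#x2F;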

See the ESAPI reference implementation of HTML entity escaping and unescaping.
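As a rough illustration of Rule #1, the following sketch replaces the six characters listed above with entities. It is not the ESAPI implementation; the function name `escape_html_content` and the table `_HTML_ESCAPES` are hypothetical. One assumption to note: the apostrophe is written as `&#x27;` rather than `&apos;`, because `&apos;` is an XML entity that HTML 4 does not define.

```python
# Hypothetical helper illustrating Rule #1; not ESAPI's API.
_HTML_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",  # assumption: hex entity instead of &apos;
    "/": "&#x2F;",
}


def escape_html_content(untrusted: str) -> str:
    """Entity-encode the characters that matter in HTML element content."""
    # Each character is looked up once, so an already-produced '&'
    # from a replacement is never escaped a second time.
    return "".join(_HTML_ESCAPES.get(ch, ch) for ch in untrusted)
```

For example, passing `<script>alert('x')</script>` through this function yields text with no raw angle brackets, quotes, or slashes, so the browser renders it as data instead of starting a script element.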


RULE #2 - Escape Before Inserting into HTML Common Attributes

Rule #2 is for putting untrusted data into typical attribute values like width, name, value, etc... Event handler attributes like onmouseover are not covered by this rule; it is extremely important that they follow Rule #3 for HTML JavaScript Data Values instead.

 <div attr=...UNTRUSTED HERE...>content</div>     inside UNquoted attribute
 
 <div attr='...UNTRUSTED HERE...'>content</div>   inside single quoted attribute
 
 <div attr="...UNTRUSTED HERE...">content</div>   inside double quoted attribute

Escape all characters less than 256 except alphanumeric characters with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. The reason this rule is so broad is that developers frequently leave attributes unquoted. Properly quoted attributes can only be escaped from with the corresponding quote. Unquoted attributes can be broken out of with many characters, including space % * + , - / ; < = > ^ and |.

See the ESAPI reference implementation of HTML entity escaping and unescaping.
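The attribute rule above can be sketched as follows. This is a minimal illustration, not the ESAPI encoder; the name `escape_html_attribute` is hypothetical, and "alphanumeric" is assumed here to mean ASCII letters and digits only, so accented Latin-1 letters are encoded too.

```python
def escape_html_attribute(untrusted: str) -> str:
    """Encode every character below 256 that is not an ASCII
    letter or digit as a hex character reference (&#xHH;)."""
    out = []
    for ch in untrusted:
        code = ord(ch)
        is_ascii_alnum = ch.isascii() and ch.isalnum()
        if code < 256 and not is_ascii_alnum:
            out.append(f"&#x{code:02X};")
        else:
            # ASCII alphanumerics and characters >= 256 pass through.
            out.append(ch)
    return "".join(out)
```

Because the space and the equals sign are encoded, an input such as `x onmouseover=alert(1)` cannot start a new attribute even when the developer left the attribute unquoted. A production encoder also has to decide what to do with control characters, which this sketch simply encodes like any other character.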


RULE #3 - Escape Before Inserting into HTML JavaScript Data Values

Rule #3 concerns script blocks and the JavaScript event handlers that are specified on various HTML elements. The only safe place to put untrusted data into these is into a "data value." Including untrusted data inside these little code blocks is quite dangerous, as it is very easy to switch into an execution context, so use with caution.

 <script>alert('...UNTRUSTED HERE...')</script>     inside a quoted string
 
 <script>x=...UNTRUSTED HERE...</script>            one side of an expression
 
 <div onmouseover=...UNTRUSTED HERE...></div>       inside UNquoted event handler
 
 <div onmouseover='...UNTRUSTED HERE...'></div>     inside quoted event handler
 
 <div onmouseover="...UNTRUSTED HERE..."></div>     inside quoted event handler

Escape all characters less than 256 except alphanumeric characters with the \xHH format to prevent switching out of the data value into the script context or into another attribute. Do not use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser, which runs first. The reason this rule is so broad is that developers frequently leave event handler attributes unquoted. Properly quoted attributes can only be escaped from with the corresponding quote. Unquoted attributes can be broken out of with many characters, including space % * + , - / ; < = > ^ and |. Also, a </script> tag is likely to close the script block even though it is inside a quoted string, because the HTML parser runs before the JavaScript parser.

See the ESAPI reference implementation of JavaScript escaping and unescaping.
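A minimal sketch of the \xHH escaping described above might look like this. The function name `escape_js_data_value` is hypothetical and this is not ESAPI's code. As an assumption, "alphanumeric" is read as ASCII letters and digits, and characters at or above 256 are left unchanged because the rule only speaks about those below 256.

```python
def escape_js_data_value(untrusted: str) -> str:
    r"""Encode every character below 256 that is not an ASCII
    letter or digit as a JavaScript hex escape (\xHH)."""
    out = []
    for ch in untrusted:
        code = ord(ch)
        is_ascii_alnum = ch.isascii() and ch.isalnum()
        if code < 256 and not is_ascii_alnum:
            # No shortcuts such as \" or \' are emitted: quotes become
            # \x22 and \x27, which the HTML parser cannot match.
            out.append(f"\\x{code:02X}")
        else:
            out.append(ch)
    return "".join(out)
```

The result is meant to be placed inside a quoted JavaScript string. Since `<` and `/` are encoded as well, the output never contains a literal `</script>` sequence that the HTML parser could act on.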



RULE #4 - Escape Before Inserting into HTML Style Property Values

Rule #4 is for when you want to put untrusted data into a stylesheet or a style tag. CSS is surprisingly powerful, and can be used for numerous attacks. Therefore, it's important that you only use untrusted data in a property value and not in other places in style data. You should stay away from putting untrusted data into complex properties like url, behavior, and custom (-moz-binding). You should also not put untrusted data into IE’s expression property value, which allows JavaScript.

 <style>selector { property : ...UNTRUSTED HERE...; } </style>     property value
 
 <span style=property : ...UNTRUSTED HERE...;>text</span>          property value

Escape all characters less than 256 except alphanumeric characters with the \HH format. Do not use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser, which runs first. The goal is to prevent switching out of the property value and into another property or attribute, and to prevent switching into an expression or other property value that allows scripting. If the attribute is quoted, breaking out requires the corresponding quote. All attributes should be quoted. Unquoted attributes can be broken out of with many characters, including space % * + , - / ; < = > ^ and |. Also, a </style> tag is likely to close the style block even though it is inside a quoted string, because the HTML parser runs before the CSS parser.

See the ESAPI reference implementation of CSS escaping.
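The CSS escaping rule can be sketched as below. The function name `escape_css_value` is hypothetical and the code is not taken from ESAPI. One deliberate deviation from the two-digit \HH form: a CSS escape consumes up to six hex digits, so a short escape followed by a character such as `a` or `1` would be read as one longer code point. This sketch therefore pads every escape to six digits, which is an assumption of the example rather than something the rule above prescribes.

```python
def escape_css_value(untrusted: str) -> str:
    r"""Encode every character below 256 that is not an ASCII
    letter or digit as a six-digit CSS hex escape (\HHHHHH)."""
    out = []
    for ch in untrusted:
        code = ord(ch)
        is_ascii_alnum = ch.isascii() and ch.isalnum()
        if code < 256 and not is_ascii_alnum:
            # Six digits is the maximum a CSS escape can consume, so
            # the next character is never absorbed into the escape.
            out.append(f"\\{code:06X}")
        else:
            out.append(ch)
    return "".join(out)
```

With this encoding, an input like `red;x:expression(alert(1))` stays a single (invalid) property value, because the semicolon, colon, and parentheses no longer have syntactic meaning to the CSS parser.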


RULE #5 - Escape Before Inserting into HTML URL Attributes

Rule #5 is for when you want to put untrusted data into a link to another location. This includes href and src attributes. There are a few other location attributes, but we recommend against using untrusted data in them. One important note is that using untrusted data in javascript: URLs is a very bad idea, but you could possibly use the HTML JavaScript Data Value rule above.

 <a href=http://...UNTRUSTED HERE...>link</a>         a normal link
 
 <img src='http://...UNTRUSTED HERE...' />            an image source
 
 <script src="http://...UNTRUSTED HERE..." />         a script source

Escape all characters less than 256 except alphanumeric characters with the %HH format. Untrusted data should not be allowed in data: URLs, as there is no good way to use encoding to disable attacks and prevent switching out of the URL. All attributes should be quoted. Unquoted attributes can be broken out of with many characters, including space % * + , - / ; < = > ^ and |. Note that entity encoding is useless in this context.

See the ESAPI reference implementation of URL escaping and unescaping.
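The %HH rule can be sketched as follows. The function name `escape_url_component` is hypothetical and this is not ESAPI's implementation. Two assumptions are made: the input is encoded as UTF-8 before percent-encoding, so characters at or above 128 become multi-byte sequences, and the function is applied only to the untrusted portion of the URL, never to the fixed scheme and host written by the developer.

```python
def escape_url_component(untrusted: str) -> str:
    """Percent-encode every byte that is not an ASCII letter or digit."""
    out = []
    # Assumption: UTF-8 is the byte encoding used for the URL.
    for byte in untrusted.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


# Usage: only the untrusted part is encoded, then joined to a fixed
# prefix. "example.com" is a placeholder host for this illustration.
link = "http://example.com/search?q=" + escape_url_component('a b"x')
```

This is stricter than Python's `urllib.parse.quote`, which leaves characters such as `_`, `.`, `-`, and `~` unencoded; the loop above follows the rule's wording and encodes everything except alphanumerics.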

Encoding Information

Coming soon...