OWASP - User contributions [en]

OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T17:11:31Z

Harish s s: /* Basic Encoding */

{{Template:OWASP Testing Guide v3}}

== Background ==

Character encoding is primarily used to represent characters, numbers and other symbols in a format that a computer can understand, store, and render. In simple terms, it is the mapping between bytes and characters, where the characters may belong to any language, such as English, Chinese, or Greek. ASCII (American Standard Code for Information Interchange), one of the earliest and most common character encoding schemes, initially used 7-bit coded characters. Today, the most common encoding scheme is UTF-8, an encoding of Unicode.

Character encoding also has another use, or rather misuse. It is commonly used to encode malicious injection strings in order to obfuscate them and thus bypass input validation filters, or to take advantage of the way a browser renders a particular encoding scheme.

== Input Encoding – Filter Evasion ==

Web applications usually employ different types of input filtering mechanisms to limit the input that their users can submit. If these input filters are not implemented sufficiently well, it is possible to slip a character or two through them. For instance, a / is represented as 2F (hex) in ASCII, while the same character can also be written as the 2-byte sequence C0 AF, an overlong UTF-8 form that a strict decoder rejects but a lenient one may still map to /. Therefore, it is important for the input filtering control to be aware of the encoding scheme used. If the filter is found to detect UTF-8 encoded injections, a different encoding scheme may be employed to bypass it.
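The C0 AF case can be sketched in a few lines. This is a minimal illustration, not a real decoder: <code>lenient_decode_2byte</code> is a hypothetical, deliberately unsafe helper that mimics a decoder which does not reject overlong forms, while <code>strict_decode</code> relies on Python's built-in UTF-8 codec, which does reject them.

```python
from typing import Optional


def lenient_decode_2byte(data: bytes) -> str:
    """Hypothetical, deliberately unsafe decoder for a 2-byte UTF-8 style
    sequence. It combines the payload bits without checking for overlong
    forms, which a conforming decoder must reject."""
    lead, trail = data
    code_point = ((lead & 0x1F) << 6) | (trail & 0x3F)
    return chr(code_point)


def strict_decode(data: bytes) -> Optional[str]:
    """Decode with Python's built-in UTF-8 codec; return None on invalid input."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


overlong_slash = bytes([0xC0, 0xAF])

# A filter that only looks for the byte 0x2F never sees a slash here.
filter_sees_slash = b"/" in overlong_slash

print(filter_sees_slash)                     # the raw bytes contain no 0x2F
print(strict_decode(overlong_slash))         # strict decoding fails
print(lenient_decode_2byte(overlong_slash))  # lenient decoding yields "/"
```

A filter placed in front of a lenient decoder therefore has to decode (or reject) the input first and only then look for dangerous characters.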

In other words, an encoded injection works because even though an input filter might not recognize or filter an encoded attack, the browser correctly interprets it while rendering the web page.

== Output Encoding – Server & Browser Consensus ==

Web browsers, in order to coherently display a web page, are required to be aware of the encoding scheme used. Ideally, this information should be provided to the browser through HTTP headers (“Content-Type”) as shown below:

<pre><nowiki>Content-Type: text/html; charset=UTF-8</nowiki></pre>

or through an HTML META tag (“META HTTP-EQUIV”), as shown below:

<pre><nowiki><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></nowiki></pre>

It is through these character encoding declarations that the browser understands which set of characters to use when converting bytes to characters.
Note: The content type mentioned in the HTTP header has precedence over the META tag declaration.
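The precedence rule can be sketched as follows. The helper names <code>charset_from_content_type</code> and <code>effective_charset</code> are hypothetical; the parsing itself uses the standard library's <code>email.message.Message</code>, and the sketch assumes the META value has already been extracted from the page.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased charset, or None


def effective_charset(http_header: Optional[str],
                      meta_content: Optional[str]) -> Optional[str]:
    """Hypothetical resolver: the HTTP header wins over the META declaration.
    Returns None when neither declares a charset, which is the case where a
    browser would have to guess or fall back to a default."""
    for value in (http_header, meta_content):
        if value:
            charset = charset_from_content_type(value)
            if charset:
                return charset
    return None


print(effective_charset("text/html; charset=UTF-8",
                        "text/html; charset=ISO-8859-1"))
print(effective_charset("text/html", "text/html; charset=ISO-8859-1"))
print(effective_charset(None, None))
```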

CERT describes the issue as follows:

''Many web pages leave the character encoding ("charset" parameter in HTTP) undefined. In earlier versions of HTML and HTTP, the character encoding was supposed to default to ISO-8859-1 if it wasn't defined. In fact, many browsers had a different default, so it was not possible to rely on the default being ISO-8859-1. HTML version 4 legitimizes this - if the character encoding isn't specified, any character encoding can be used.

If the web server doesn't specify which character encoding is in use, it can't tell which characters are special. Web pages with unspecified character encoding work most of the time because most character sets assign the same characters to byte values below 128. But which of the values above 128 are special? Some 16-bit character-encoding schemes have additional multi-byte representations for special characters such as "<". Some browsers recognize this alternative encoding and act on it. This is "correct" behavior, but it makes attacks using malicious scripts much harder to prevent. The server simply doesn't know which byte sequences represent the special characters''

Therefore, when the browser does not receive the character encoding information from the server, it either attempts to ‘guess’ the encoding scheme or reverts to a default scheme. In some cases, the user explicitly sets the default encoding in the browser to a different scheme. Any such mismatch between the encoding scheme used by the web page (server) and the one used by the browser may cause the browser to interpret the page in an unintended or unexpected manner.
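A small illustration of such a mismatch, assuming a page whose bytes were produced as UTF-8 but are read as ISO-8859-1. The sample word is arbitrary.

```python
# Bytes as the server would send them, encoded as UTF-8.
page_bytes = "café".encode("utf-8")

# Interpretation when both sides agree on UTF-8.
as_utf8 = page_bytes.decode("utf-8")

# Interpretation when the browser assumes ISO-8859-1 instead:
# the two bytes of "é" are read as two separate characters.
as_latin1 = page_bytes.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```

The bytes on the wire are identical in both cases; only the decoding assumption differs.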

==== Encoded Injections ====

The scenarios given below form only a subset of the various ways in which obfuscation can be achieved in order to bypass input filters. Also, the success of encoded injections depends on the browser in use. For example, US-ASCII encoded injections were previously successful in Internet Explorer but not in Firefox. Encoded injections are therefore, to a large extent, browser dependent.

==== Basic Encoding ====

Consider a basic input validation filter that protects against injection of the single quote character. In this case, the following injection would easily bypass the filter:

<pre>
<nowiki><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT></nowiki></pre>

The JavaScript function String.fromCharCode takes the given Unicode values and returns the corresponding string. This is one of the most basic forms of encoded injection. Another vector that can be used to bypass this filter is:

<pre><nowiki><IMG SRC=javascript:alert(&quot;XSS&quot;)></nowiki></pre>

<pre><nowiki><IMG SRC=javascript:alert(&#34;XSS&#34;)></nowiki> (Numeric reference)</pre>

The vectors above use HTML entities to construct the injection string. HTML entity encoding is used to display characters that have a special meaning in HTML. For instance, ‘>’ works as the closing bracket of an HTML tag. In order to actually display this character on the web page, the corresponding HTML character entity should be inserted in the page source. The injections mentioned above are only one way of encoding. There are numerous other ways in which a string can be encoded (obfuscated) in order to bypass the above filter.
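The vectors in this section can be checked against a toy filter. This is a minimal sketch: <code>naive_quote_filter</code> is a hypothetical stand-in for the single quote filter described above, and <code>html.unescape</code> from the standard library plays the role of the browser's entity decoding.

```python
import html


def naive_quote_filter(value: str) -> bool:
    """Hypothetical filter: accept the input only if it has no single quote."""
    return "'" not in value


payload_charcode = "<SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>"
payload_named = "<IMG SRC=javascript:alert(&quot;XSS&quot;)>"
payload_numeric = "<IMG SRC=javascript:alert(&#34;XSS&#34;)>"

# None of the payloads contains a single quote, so all of them pass.
for payload in (payload_charcode, payload_named, payload_numeric):
    print(naive_quote_filter(payload), payload)

# What String.fromCharCode(88,83,83) evaluates to, reproduced with chr().
decoded_from_codes = "".join(chr(code) for code in (88, 83, 83))
print(decoded_from_codes)

# Named and numeric references decode to the same double quote character.
print(html.unescape(payload_named))
print(html.unescape(payload_numeric))
```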

==== Hex Encoding ====

Hex, short for hexadecimal, is a base 16 numbering system, i.e. it uses 16 different digits, 0 to 9 and A to F, to represent values. Hex encoding is another form of obfuscation that is sometimes used to bypass input validation filters. For instance, the hex encoded version of the string
<nowiki><IMG SRC=javascript:alert('XSS')></nowiki> is

<pre>
<nowiki><IMG SRC=%6A%61%76%61%73%63%72%69%70%74%3A%61%6C%65%72%74%28%27%58%53%53%27%29></nowiki></pre>

A variation of the above string is given below. It can be used in case ‘%’ is being filtered:

<pre><nowiki><IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29></nowiki></pre>

Other encoding schemes, such as Base64 and octal, may also be used for obfuscation. Although not every encoding scheme works every time, a bit of trial and error coupled with intelligent manipulation can often reveal the loophole in a weakly built input validation filter.
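Both hex forms shown in this section can be generated and decoded with a few lines. This is a sketch under the assumption that the input is plain ASCII; <code>percent_hex</code> and <code>entity_hex</code> are hypothetical helper names, while <code>urllib.parse.unquote</code> and <code>html.unescape</code> are standard library functions.

```python
import html
from urllib.parse import unquote


def percent_hex(text: str) -> str:
    """Encode every character as %XX (assumes ASCII input)."""
    return "".join(f"%{ord(ch):02X}" for ch in text)


def entity_hex(text: str) -> str:
    """Encode every character as &#xXX without a trailing semicolon
    (assumes ASCII input)."""
    return "".join(f"&#x{ord(ch):02X}" for ch in text)


script = "javascript:alert('XSS')"

percent_form = percent_hex(script)
entity_form = entity_hex(script)

print(f"<IMG SRC={percent_form}>")
print(f"<IMG SRC={entity_form}>")

# Decoding either form gives back the original string.
print(unquote(percent_form))
print(html.unescape(entity_form))
```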

==== UTF-7 Encoding ====

The UTF-7 encoding of <nowiki><SCRIPT>alert('XSS');</SCRIPT></nowiki> is given below:

<pre><nowiki>+ADw-SCRIPT+AD4-alert('XSS');+ADw-/SCRIPT+AD4-</nowiki></pre>

For the above script to work, the browser has to interpret the web page as encoded in UTF-7.
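The effect can be reproduced with Python's built-in <code>utf-7</code> codec. In this sketch, <code>naive_tag_filter</code> is a hypothetical filter that only looks for literal angle brackets in the raw bytes.

```python
utf7_payload = b"+ADw-SCRIPT+AD4-alert('XSS');+ADw-/SCRIPT+AD4-"


def naive_tag_filter(raw: bytes) -> bool:
    """Hypothetical filter: accept the input if it has no literal < or >."""
    return b"<" not in raw and b">" not in raw


# The payload contains no angle brackets, so the filter accepts it.
print(naive_tag_filter(utf7_payload))

# Interpreted as UTF-7, the same bytes become a script element.
decoded = utf7_payload.decode("utf-7")
print(decoded)
```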

==== Multi-byte Encoding ====

Variable-width encoding is another type of character encoding scheme, one that uses codes of varying lengths to encode characters. Multi-byte encoding is a type of variable-width encoding that uses a varying number of bytes to represent a character.
Multi-byte encoding is primarily used to encode characters that belong to a large character set, e.g. Chinese, Japanese and Korean.

Multi-byte encoding has been used in the past to bypass standard input validation functions and carry out cross-site scripting and SQL injection attacks.
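One commonly cited pattern, offered here as an assumption-laden illustration rather than a description of any specific product, involves a byte-wise escaping routine and the GBK encoding. The hypothetical <code>escape_quotes_bytewise</code> function below puts a backslash (0x5C) in front of every single quote byte (0x27) without knowing about multi-byte characters. If the escaped bytes are later interpreted as GBK, the attacker-supplied lead byte 0xBF absorbs the backslash into a two-byte character and the quote is left unescaped.

```python
def escape_quotes_bytewise(raw: bytes) -> bytes:
    """Hypothetical escaping routine: prefix each 0x27 byte with a backslash,
    with no awareness of multi-byte characters."""
    return raw.replace(b"'", b"\\'")


# Attacker input: the byte 0xBF followed by a single quote.
attacker_input = b"\xbf'"

escaped = escape_quotes_bytewise(attacker_input)
print(escaped)  # the bytes 0xBF 0x5C 0x27

# Read as GBK, 0xBF 0x5C forms one character, so the quote stands alone.
decoded = escaped.decode("gbk")
print(len(decoded))
print(decoded[1])
print("\\" in decoded)
```

The escaping and the consumer of the data disagree about where character boundaries lie, which is the same kind of mismatch described in the sections above.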

== References ==
http://ha.ckers.org/xss.html

http://www.cert.org/tech_tips/malicious_code_mitigation.html

http://www.w3schools.com/HTML/html_entities.asp

http://www.iss.net/security_center/advice/Intrusions/2000639/default.htm

http://searchsecurity.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid14_gci1212217_tax299989,00.html

http://www.joelonsoftware.com/articles/Unicode.html




OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T17:01:33Z

Harish s s: /* Basic Encoding */

{{Template:OWASP Testing Guide v3}}

== Background ==

Character encoding is primarily used to represent characters, numbers and other symbols in a format that is suitable for a computer to understand, store, and render. It is, in simple terms, the mapping between characters and bytes - characters belonging to different languages like English, Chinese, Greek or any other known language. One of the earliest and most common character encoding schemes is ASCII (American Standard Code for Information Interchange), which initially used 7-bit coded characters. Today, the most common encoding scheme is UTF-8, an encoding of Unicode.

Character encoding has another use, or rather misuse. It is commonly used to encode malicious injection strings in order to obfuscate them and thus bypass input validation filters, or to take advantage of the way the browser renders a particular encoding scheme.

== Input Encoding – Filter Evasion ==

Web applications usually employ different types of input filtering mechanisms to limit the input that can be submitted by their users. If these input filters are not implemented sufficiently well, it is possible to slip a character or two through them. For instance, a / is represented as 2F (hex) in ASCII, while the same character (/) can also be written as C0 AF, an overlong 2-byte UTF-8 sequence. Such overlong sequences are invalid in UTF-8, but lenient decoders have accepted them and decoded them to the same character. Therefore, it is important for the input filtering control to be aware of the encoding scheme used. If the filter is found to detect UTF-8 encoded injections, a different encoding scheme may be employed to bypass it.
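
The byte pair C0 AF mentioned above is what is usually called an overlong UTF-8 form. The sketch below, written in Python as an illustrative assumption, contrasts a strict decoder (Python's built-in UTF-8 codec) with a hypothetical lenient decoder, lenient_two_byte_decode, that skips the shortest-form check.

```python
overlong = b"\xc0\xaf"


def lenient_two_byte_decode(seq: bytes) -> str:
    """Hypothetical decoder that applies the 2-byte UTF-8 bit layout
    (110xxxxx 10xxxxxx) without rejecting overlong forms."""
    lead, trail = seq
    code_point = ((lead & 0x1F) << 6) | (trail & 0x3F)
    return chr(code_point)


# A strict decoder refuses the sequence outright.
try:
    overlong.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

# A lenient decoder maps it to code point 0x2F, i.e. '/'.
lenient_result = lenient_two_byte_decode(overlong)
```

A filter that looks only for the byte 2F would miss this input, while a lenient component further down the chain would still treat it as a slash.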

In other words, an encoded injection works because even though an input filter might not recognize or filter an encoded attack, the browser correctly interprets it while rendering the web page.

== Output Encoding – Server & Browser Consensus ==

Web browsers, in order to coherently display a web page, are required to be aware of the encoding scheme used. Ideally, this information should be provided to the browser through HTTP headers (“Content-Type”) as shown below:

<pre><nowiki>Content-Type: text/html; charset=UTF-8</nowiki></pre>

<nowiki>or through an HTML META tag (“META HTTP-EQUIV”), as shown below:</nowiki>

<pre><nowiki><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></nowiki></pre>

It is through these character encoding declarations that the browser understands which set of characters to use when converting bytes to characters.
Note: The content type mentioned in the HTTP header has precedence over the META tag declaration.

CERT describes it here as follows:

''Many web pages leave the character encoding ("charset" parameter in HTTP) undefined. In earlier versions of HTML and HTTP, the character encoding was supposed to default to ISO-8859-1 if it wasn't defined. In fact, many browsers had a different default, so it was not possible to rely on the default being ISO-8859-1. HTML version 4 legitimizes this - if the character encoding isn't specified, any character encoding can be used.

If the web server doesn't specify which character encoding is in use, it can't tell which characters are special. Web pages with unspecified character encoding work most of the time because most character sets assign the same characters to byte values below 128. But which of the values above 128 are special? Some 16-bit character-encoding schemes have additional multi-byte representations for special characters such as "<". Some browsers recognize this alternative encoding and act on it. This is "correct" behavior, but it makes attacks using malicious scripts much harder to prevent. The server simply doesn't know which byte sequences represent the special characters''

Therefore in the event of not receiving the character encoding information from the server, the browser either attempts to ‘guess’ the encoding scheme or reverts to a default scheme. In some cases, the user explicitly sets the default encoding in the browser to a different scheme. Any such mismatch in the encoding scheme used by the web page (server) and the browser may cause the browser to interpret the page in a manner that is unintended or unexpected.
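
A small sketch of such a mismatch, in Python as an illustrative assumption: the same bytes are decoded once with the encoding the server intended and once with a different one that a browser might fall back to. The sample word is hypothetical.

```python
# Bytes as the server wrote them, encoded as UTF-8.
page_bytes = "café".encode("utf-8")

# Decoded with the intended scheme, the text round-trips.
as_intended = page_bytes.decode("utf-8")

# Decoded as ISO-8859-1, the two bytes of 'é' become two separate characters.
as_guessed = page_bytes.decode("iso-8859-1")
```

Here the result is only garbled text, but the same mechanism lets bytes that look harmless under one charset turn into markup under another.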

==== Encoded Injections ====

All the scenarios given below form only a subset of the various ways obfuscation can be achieved in order to bypass input filters. Also, the success of encoded injections depends on the browser in use. For example, US-ASCII encoded injections were previously successful in Internet Explorer but not in Firefox. Therefore, it may be noted that encoded injections are, to a large extent, browser dependent.

==== Basic Encoding ====

Consider a basic input validation filter that protects against injection of the single quote character. In this case, the following injection would easily bypass the filter:

<pre>
<nowiki><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT></nowiki></pre>
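
The numbers in the payload above are simply the character codes of the letters X, S and S. The Python sketch below (an illustrative assumption, with hypothetical variable names) rebuilds the string from the codes and, in the other direction, generates the quote-free argument from a plain string.

```python
# Character codes used in the payload above.
codes = [88, 83, 83]

# Equivalent of String.fromCharCode(88, 83, 83).
text = "".join(chr(c) for c in codes)

# Build the quote-free JavaScript expression for an arbitrary string.
payload_arg = "String.fromCharCode(" + ",".join(str(ord(ch)) for ch in "XSS") + ")"
```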

The JavaScript function String.fromCharCode takes the given Unicode values and returns the corresponding string. This is one of the most basic forms of encoded injection. Another vector that can be used to bypass this filter is:

<pre><nowiki><IMG SRC=javascript:alert(&quot;XSS&quot;)></nowiki></pre>

<pre><nowiki><IMG SRC=javascript:alert(&#34;XSS&#34;)> (Numeric reference)</nowiki></pre>

The above uses HTML entities to construct the injection string. HTML entity encoding is used to display characters that have a special meaning in HTML. For instance, ‘>’ closes an HTML tag. In order to actually display this character on the web page, the corresponding HTML character entity should be inserted in the page source. The injections mentioned above are just one way of encoding. There are numerous other ways in which a string can be encoded (obfuscated) in order to bypass the above filter.
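
As a rough illustration of how named and numeric character references resolve to the same character, the sketch below uses html.unescape and html.escape from the Python standard library. Python and the sample strings are assumptions made for illustration.

```python
import html

# The double quote written as a named entity and as a numeric reference.
named = "javascript:alert(&quot;XSS&quot;)"
numeric = "javascript:alert(&#34;XSS&#34;)"

# Both forms resolve to the same text once the references are decoded.
from_named = html.unescape(named)
from_numeric = html.unescape(numeric)

# The legitimate use of entities: displaying special characters as text.
displayed = html.escape("<b>")
```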

==== Hex Encoding ====

Hex, short for hexadecimal, is a base-16 numbering system, i.e. it uses 16 digits (0 to 9 and A to F) to represent values. Hex encoding, which replaces each character with the hexadecimal value of its character code, is another form of obfuscation that is sometimes used to bypass input validation filters. For instance, the hex-encoded version of the string
<nowiki><IMG SRC=javascript:alert('XSS')></nowiki> is

<pre>
<nowiki><IMG SRC=%6A%61%76%61%73%63%72%69%70%74%3A%61%6C%65%72%74%28%27%58%53%53%27%29></nowiki></pre>

A variation of the above string, which uses hexadecimal HTML character references instead, is given below. It can be used in case ‘%’ is being filtered:

<pre><nowiki><IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29></nowiki></pre>

There are other encoding schemes, such as Base64 and octal, that may also be used for obfuscation. Although not every encoding scheme will work every time, a bit of trial and error coupled with intelligent manipulation is likely to reveal the loophole in a weakly built input validation filter.

==== UTF-7 Encoding ====

The UTF-7 encoding of <nowiki><SCRIPT>alert('XSS');</SCRIPT></nowiki> is shown below:

<pre><nowiki>+ADw-SCRIPT+AD4-alert('XSS');+ADw-/SCRIPT+AD4-</nowiki></pre>

For the above script to work, the browser has to interpret the web page as encoded in UTF-7.

==== Multi-byte Encoding ====

Variable-width encoding is another type of character encoding scheme that uses codes of varying lengths to encode characters. Multi-byte encoding is a type of variable-width encoding that uses a varying number of bytes to represent a character.
Multi-byte encoding is primarily used to encode characters that belong to a large character set, e.g. Chinese, Japanese and Korean.

Multi-byte encoding has been used in the past to bypass standard input validation functions and carry out cross-site scripting and SQL injection attacks.

== References ==
http://ha.ckers.org/xss.html

http://www.cert.org/tech_tips/malicious_code_mitigation.html

http://www.w3schools.com/HTML/html_entities.asp

http://www.iss.net/security_center/advice/Intrusions/2000639/default.htm

http://searchsecurity.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid14_gci1212217_tax299989,00.html

http://www.joelonsoftware.com/articles/Unicode.html

OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T17:00:40Z

Harish s s: /* Basic Encoding */

{{Template:OWASP Testing Guide v3}}

== Background ==

Character encoding is primarily used to represent characters, numbers and other symbols in a format that is suitable for a computer to understand, store, and render. It is, in simple terms, the mapping between characters and bytes - characters belonging to different languages like English, Chinese, Greek or any other known language. One of the earliest and most common character encoding schemes is ASCII (American Standard Code for Information Interchange), which initially used 7-bit coded characters. Today, the most common encoding scheme is UTF-8, an encoding of Unicode.

Character encoding has another use, or rather misuse. It is commonly used to encode malicious injection strings in order to obfuscate them and thus bypass input validation filters, or to take advantage of the way the browser renders a particular encoding scheme.

== Input Encoding – Filter Evasion ==

Web applications usually employ different types of input filtering mechanisms to limit the input that can be submitted by their users. If these input filters are not implemented sufficiently well, it is possible to slip a character or two through them. For instance, a / is represented as 2F (hex) in ASCII, while the same character (/) can also be written as C0 AF, an overlong 2-byte UTF-8 sequence. Such overlong sequences are invalid in UTF-8, but lenient decoders have accepted them and decoded them to the same character. Therefore, it is important for the input filtering control to be aware of the encoding scheme used. If the filter is found to detect UTF-8 encoded injections, a different encoding scheme may be employed to bypass it.

In other words, an encoded injection works because even though an input filter might not recognize or filter an encoded attack, the browser correctly interprets it while rendering the web page.

== Output Encoding – Server & Browser Consensus ==

Web browsers, in order to coherently display a web page, are required to be aware of the encoding scheme used. Ideally, this information should be provided to the browser through HTTP headers (“Content-Type”) as shown below:

<pre><nowiki>Content-Type: text/html; charset=UTF-8</nowiki></pre>

<nowiki>or through an HTML META tag (“META HTTP-EQUIV”), as shown below:</nowiki>

<pre><nowiki><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></nowiki></pre>

It is through these character encoding declarations that the browser understands which set of characters to use when converting bytes to characters.
Note: The content type mentioned in the HTTP header has precedence over the META tag declaration.

CERT describes it here as follows:

''Many web pages leave the character encoding ("charset" parameter in HTTP) undefined. In earlier versions of HTML and HTTP, the character encoding was supposed to default to ISO-8859-1 if it wasn't defined. In fact, many browsers had a different default, so it was not possible to rely on the default being ISO-8859-1. HTML version 4 legitimizes this - if the character encoding isn't specified, any character encoding can be used.

If the web server doesn't specify which character encoding is in use, it can't tell which characters are special. Web pages with unspecified character encoding work most of the time because most character sets assign the same characters to byte values below 128. But which of the values above 128 are special? Some 16-bit character-encoding schemes have additional multi-byte representations for special characters such as "<". Some browsers recognize this alternative encoding and act on it. This is "correct" behavior, but it makes attacks using malicious scripts much harder to prevent. The server simply doesn't know which byte sequences represent the special characters''

Therefore in the event of not receiving the character encoding information from the server, the browser either attempts to ‘guess’ the encoding scheme or reverts to a default scheme. In some cases, the user explicitly sets the default encoding in the browser to a different scheme. Any such mismatch in the encoding scheme used by the web page (server) and the browser may cause the browser to interpret the page in a manner that is unintended or unexpected.

==== Encoded Injections ====

All the scenarios given below form only a subset of the various ways obfuscation can be achieved in order to bypass input filters. Also, the success of encoded injections depends on the browser in use. For example, US-ASCII encoded injections were previously successful in Internet Explorer but not in Firefox. Therefore, it may be noted that encoded injections are, to a large extent, browser dependent.

==== Basic Encoding ====

Consider a basic input validation filter that protects against injection of the single quote character. In this case, the following injection would easily bypass the filter:

<pre>
<nowiki><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT></nowiki></pre>

The JavaScript function String.fromCharCode takes the given Unicode values and returns the corresponding string. This is one of the most basic forms of encoded injection. Another vector that can be used to bypass this filter is:

<pre><nowiki><IMG SRC=javascript:alert(&quot;XSS&quot;)></nowiki></pre>

<pre><nowiki><IMG SRC=javascript:alert(&#34;XSS&#34;)> (Numeric reference)</nowiki></pre>

The above uses HTML entities to construct the injection string. HTML entity encoding is used to display characters that have a special meaning in HTML. For instance, ‘>’ closes an HTML tag. In order to actually display this character on the web page, the corresponding HTML character entity should be inserted in the page source. The injections mentioned above are just one way of encoding. There are numerous other ways in which a string can be encoded (obfuscated) in order to bypass the above filter.

==== Hex Encoding ====

Hex, short for hexadecimal, is a base-16 numbering system, i.e. it uses 16 digits (0 to 9 and A to F) to represent values. Hex encoding, which replaces each character with the hexadecimal value of its character code, is another form of obfuscation that is sometimes used to bypass input validation filters. For instance, the hex-encoded version of the string
<nowiki><IMG SRC=javascript:alert('XSS')></nowiki> is

<pre>
<nowiki><IMG SRC=%6A%61%76%61%73%63%72%69%70%74%3A%61%6C%65%72%74%28%27%58%53%53%27%29></nowiki></pre>

A variation of the above string, which uses hexadecimal HTML character references instead, is given below. It can be used in case ‘%’ is being filtered:

<pre><nowiki><IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29></nowiki></pre>

There are other encoding schemes, such as Base64 and octal, that may also be used for obfuscation. Although not every encoding scheme will work every time, a bit of trial and error coupled with intelligent manipulation is likely to reveal the loophole in a weakly built input validation filter.

==== UTF-7 Encoding ====

The UTF-7 encoding of <nowiki><SCRIPT>alert('XSS');</SCRIPT></nowiki> is shown below:

<pre><nowiki>+ADw-SCRIPT+AD4-alert('XSS');+ADw-/SCRIPT+AD4-</nowiki></pre>

For the above script to work, the browser has to interpret the web page as encoded in UTF-7.

==== Multi-byte Encoding ====

Variable-width encoding is another type of character encoding scheme that uses codes of varying lengths to encode characters. Multi-byte encoding is a type of variable-width encoding that uses a varying number of bytes to represent a character.
Multi-byte encoding is primarily used to encode characters that belong to a large character set, e.g. Chinese, Japanese and Korean.

Multi-byte encoding has been used in the past to bypass standard input validation functions and carry out cross-site scripting and SQL injection attacks.

== References ==
http://ha.ckers.org/xss.html

http://www.cert.org/tech_tips/malicious_code_mitigation.html

http://www.w3schools.com/HTML/html_entities.asp

http://www.iss.net/security_center/advice/Intrusions/2000639/default.htm

http://searchsecurity.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid14_gci1212217_tax299989,00.html

http://www.joelonsoftware.com/articles/Unicode.html

OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T16:52:10Z

Harish s s: /* Basic Encoding */

{{Template:OWASP Testing Guide v3}}

== Background ==

Character encoding is primarily used to represent characters, numbers and other symbols in a format that is suitable for a computer to understand, store, and render. It is, in simple terms, the mapping between characters and bytes - characters belonging to different languages like English, Chinese, Greek or any other known language. One of the earliest and most common character encoding schemes is ASCII (American Standard Code for Information Interchange), which initially used 7-bit coded characters. Today, the most common encoding scheme is UTF-8, an encoding of Unicode.

Character encoding has another use, or rather misuse. It is commonly used to encode malicious injection strings in order to obfuscate them and thus bypass input validation filters, or to take advantage of the way the browser renders a particular encoding scheme.

== Input Encoding – Filter Evasion ==

Web applications usually employ different types of input filtering mechanisms to limit the input that can be submitted by their users. If these input filters are not implemented sufficiently well, it is possible to slip a character or two through them. For instance, a / is represented as 2F (hex) in ASCII, while the same character (/) can also be written as C0 AF, an overlong 2-byte UTF-8 sequence. Such overlong sequences are invalid in UTF-8, but lenient decoders have accepted them and decoded them to the same character. Therefore, it is important for the input filtering control to be aware of the encoding scheme used. If the filter is found to detect UTF-8 encoded injections, a different encoding scheme may be employed to bypass it.

In other words, an encoded injection works because even though an input filter might not recognize or filter an encoded attack, the browser correctly interprets it while rendering the web page.

== Output Encoding – Server & Browser Consensus ==

Web browsers, in order to coherently display a web page, are required to be aware of the encoding scheme used. Ideally, this information should be provided to the browser through HTTP headers (“Content-Type”) as shown below:

<pre><nowiki>Content-Type: text/html; charset=UTF-8</nowiki></pre>

<nowiki>or through an HTML META tag (“META HTTP-EQUIV”), as shown below:</nowiki>

<pre><nowiki><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></nowiki></pre>

It is through these character encoding declarations that the browser understands which set of characters to use when converting bytes to characters.
Note: The content type mentioned in the HTTP header has precedence over the META tag declaration.

CERT describes it here as follows:

''Many web pages leave the character encoding ("charset" parameter in HTTP) undefined. In earlier versions of HTML and HTTP, the character encoding was supposed to default to ISO-8859-1 if it wasn't defined. In fact, many browsers had a different default, so it was not possible to rely on the default being ISO-8859-1. HTML version 4 legitimizes this - if the character encoding isn't specified, any character encoding can be used.

If the web server doesn't specify which character encoding is in use, it can't tell which characters are special. Web pages with unspecified character encoding work most of the time because most character sets assign the same characters to byte values below 128. But which of the values above 128 are special? Some 16-bit character-encoding schemes have additional multi-byte representations for special characters such as "<". Some browsers recognize this alternative encoding and act on it. This is "correct" behavior, but it makes attacks using malicious scripts much harder to prevent. The server simply doesn't know which byte sequences represent the special characters''

Therefore in the event of not receiving the character encoding information from the server, the browser either attempts to ‘guess’ the encoding scheme or reverts to a default scheme. In some cases, the user explicitly sets the default encoding in the browser to a different scheme. Any such mismatch in the encoding scheme used by the web page (server) and the browser may cause the browser to interpret the page in a manner that is unintended or unexpected.

==== Encoded Injections ====

All the scenarios given below form only a subset of the various ways obfuscation can be achieved in order to bypass input filters. Also, the success of encoded injections depends on the browser in use. For example, US-ASCII encoded injections were previously successful in Internet Explorer but not in Firefox. Therefore, it may be noted that encoded injections are, to a large extent, browser dependent.

==== Basic Encoding ====

Consider a basic input validation filter that protects against injection of the single quote character. In this case, the following injection would easily bypass the filter:

<pre>
<nowiki><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT></nowiki></pre>

The JavaScript function String.fromCharCode takes the given Unicode values and returns the corresponding string. This is one of the most basic forms of encoded injection. Another vector that can be used to bypass this filter is:

<pre><nowiki><IMG SRC=javascript:alert(&quot;XSS&quot;)></nowiki></pre>

<pre><nowiki><IMG SRC=javascript:alert(&#34;XSS&#34;)> (Numeric reference)</nowiki></pre>

The above uses HTML entities to construct the injection string. HTML entity encoding is used to display characters that have a special meaning in HTML. For instance, ‘>’ closes an HTML tag. In order to actually display this character on the web page, the corresponding HTML character entity should be inserted in the page source. The injections mentioned above are just one way of encoding. There are numerous other ways in which a string can be encoded (obfuscated) in order to bypass the above filter.

==== Hex Encoding ====

Hex, short for hexadecimal, is a base-16 numbering system, i.e. it uses 16 digits (0 to 9 and A to F) to represent values. Hex encoding, which replaces each character with the hexadecimal value of its character code, is another form of obfuscation that is sometimes used to bypass input validation filters. For instance, the hex-encoded version of the string
<nowiki><IMG SRC=javascript:alert('XSS')></nowiki> is

<pre>
<nowiki><IMG SRC=%6A%61%76%61%73%63%72%69%70%74%3A%61%6C%65%72%74%28%27%58%53%53%27%29></nowiki></pre>

A variation of the above string, which uses hexadecimal HTML character references instead, is given below. It can be used in case ‘%’ is being filtered:

<pre><nowiki><IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29></nowiki></pre>

There are other encoding schemes, such as Base64 and octal, that may also be used for obfuscation. Although not every encoding scheme will work every time, a bit of trial and error coupled with intelligent manipulation is likely to reveal the loophole in a weakly built input validation filter.

==== UTF-7 Encoding ====

The UTF-7 encoding of <nowiki><SCRIPT>alert('XSS');</SCRIPT></nowiki> is shown below:

<pre><nowiki>+ADw-SCRIPT+AD4-alert('XSS');+ADw-/SCRIPT+AD4-</nowiki></pre>

For the above script to work, the browser has to interpret the web page as encoded in UTF-7.

==== Multi-byte Encoding ====

Variable-width encoding is another type of character encoding scheme that uses codes of varying lengths to encode characters. Multi-byte encoding is a type of variable-width encoding that uses a varying number of bytes to represent a character.
Multi-byte encoding is primarily used to encode characters that belong to a large character set, e.g. Chinese, Japanese and Korean.

Multi-byte encoding has been used in the past to bypass standard input validation functions and carry out cross-site scripting and SQL injection attacks.

== References ==
http://ha.ckers.org/xss.html

http://www.cert.org/tech_tips/malicious_code_mitigation.html

http://www.w3schools.com/HTML/html_entities.asp

http://www.iss.net/security_center/advice/Intrusions/2000639/default.htm

http://searchsecurity.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid14_gci1212217_tax299989,00.html

http://www.joelonsoftware.com/articles/Unicode.html

OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T16:51:30Z

Harish s s: /* Basic Encoding */

{{Template:OWASP Testing Guide v3}}

== Background ==

Character encoding is primarily used to represent characters, numbers and other symbols in a format that is suitable for a computer to understand, store, and render. It is, in simple terms, the mapping between characters and bytes - characters belonging to different languages like English, Chinese, Greek or any other known language. One of the earliest and most common character encoding schemes is ASCII (American Standard Code for Information Interchange), which initially used 7-bit coded characters. Today, the most common encoding scheme is UTF-8, an encoding of Unicode.

Character encoding has another use, or rather misuse. It is commonly used to encode malicious injection strings in order to obfuscate them and thus bypass input validation filters, or to take advantage of the way the browser renders a particular encoding scheme.

== Input Encoding – Filter Evasion ==

Web applications usually employ different types of input filtering mechanisms to limit the input that can be submitted by their users. If these input filters are not implemented sufficiently well, it is possible to slip a character or two through them. For instance, a / is represented as 2F (hex) in ASCII, while the same character (/) can also be written as C0 AF, an overlong 2-byte UTF-8 sequence. Such overlong sequences are invalid in UTF-8, but lenient decoders have accepted them and decoded them to the same character. Therefore, it is important for the input filtering control to be aware of the encoding scheme used. If the filter is found to detect UTF-8 encoded injections, a different encoding scheme may be employed to bypass it.

In other words, an encoded injection works because even though an input filter might not recognize or filter an encoded attack, the browser correctly interprets it while rendering the web page.

== Output Encoding – Server & Browser Consensus ==

Web browsers, in order to coherently display a web page, are required to be aware of the encoding scheme used. Ideally, this information should be provided to the browser through HTTP headers (“Content-Type”) as shown below:

<pre><nowiki>Content-Type: text/html; charset=UTF-8</nowiki></pre>

<nowiki>or through an HTML META tag (“META HTTP-EQUIV”), as shown below:</nowiki>

<pre><nowiki><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></nowiki></pre>

It is through these character encoding declarations that the browser understands which set of characters to use when converting bytes to characters.
Note: The content type mentioned in the HTTP header has precedence over the META tag declaration.

CERT describes it here as follows:

''Many web pages leave the character encoding ("charset" parameter in HTTP) undefined. In earlier versions of HTML and HTTP, the character encoding was supposed to default to ISO-8859-1 if it wasn't defined. In fact, many browsers had a different default, so it was not possible to rely on the default being ISO-8859-1. HTML version 4 legitimizes this - if the character encoding isn't specified, any character encoding can be used.

If the web server doesn't specify which character encoding is in use, it can't tell which characters are special. Web pages with unspecified character encoding work most of the time because most character sets assign the same characters to byte values below 128. But which of the values above 128 are special? Some 16-bit character-encoding schemes have additional multi-byte representations for special characters such as "<". Some browsers recognize this alternative encoding and act on it. This is "correct" behavior, but it makes attacks using malicious scripts much harder to prevent. The server simply doesn't know which byte sequences represent the special characters''

Therefore in the event of not receiving the character encoding information from the server, the browser either attempts to ‘guess’ the encoding scheme or reverts to a default scheme. In some cases, the user explicitly sets the default encoding in the browser to a different scheme. Any such mismatch in the encoding scheme used by the web page (server) and the browser may cause the browser to interpret the page in a manner that is unintended or unexpected.

==== Encoded Injections ====

All the scenarios given below form only a subset of the various ways obfuscation can be achieved in order to bypass input filters. Also, the success of encoded injections depends on the browser in use. For example, US-ASCII encoded injections were previously successful in Internet Explorer but not in Firefox. Therefore, it may be noted that encoded injections are, to a large extent, browser dependent.

==== Basic Encoding ====

Consider a basic input validation filter that protects against injection of the single quote character. In this case, the following injection would easily bypass the filter:

<pre>
<nowiki><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT></nowiki></pre>

The JavaScript function String.fromCharCode takes the given Unicode values and returns the corresponding string. This is one of the most basic forms of encoded injection. Another vector that can be used to bypass this filter is:

<pre><nowiki><IMG SRC=javascript:alert(&quot;XSS&quot;)></nowiki></pre>

<pre><nowiki><IMG SRC=javascript:alert(&#34;XSS&#34;)> (Numeric reference)</nowiki></pre>

The above uses HTML entities to construct the injection string. HTML entity encoding is used to display characters that have a special meaning in HTML. For instance, ‘>’ closes an HTML tag. In order to actually display this character on the web page, the corresponding HTML character entity should be inserted in the page source. The injections mentioned above are just one way of encoding. There are numerous other ways in which a string can be encoded (obfuscated) in order to bypass the above filter.

==== Hex Encoding ====

Hex, short for hexadecimal, is a base-16 numbering system, i.e. it uses 16 digits (0 to 9 and A to F) to represent values. Hex encoding, which replaces each character with the hexadecimal value of its character code, is another form of obfuscation that is sometimes used to bypass input validation filters. For instance, the hex-encoded version of the string
<nowiki><IMG SRC=javascript:alert('XSS')></nowiki> is

<pre>
<nowiki><IMG SRC=%6A%61%76%61%73%63%72%69%70%74%3A%61%6C%65%72%74%28%27%58%53%53%27%29></nowiki></pre>

A variation of the above string, which uses hexadecimal HTML character references instead, is given below. It can be used in case ‘%’ is being filtered:

<pre><nowiki><IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29></nowiki></pre>

There are other encoding schemes, such as Base64 and octal, that may also be used for obfuscation. Although not every encoding scheme will work every time, a bit of trial and error coupled with intelligent manipulation is likely to reveal the loophole in a weakly built input validation filter.

==== UTF-7 Encoding ====

The UTF-7 encoding of <nowiki><SCRIPT>alert('XSS');</SCRIPT></nowiki> is shown below:

<pre><nowiki>+ADw-SCRIPT+AD4-alert('XSS');+ADw-/SCRIPT+AD4-</nowiki></pre>

For the above script to work, the browser has to interpret the web page as encoded in UTF-7.

==== Multi-byte Encoding ====

Variable-width encoding is another type of character encoding scheme that uses codes of varying lengths to encode characters. Multi-byte encoding is a type of variable-width encoding that uses a varying number of bytes to represent a character.
Multi-byte encoding is primarily used to encode characters that belong to a large character set, e.g. Chinese, Japanese and Korean.

Multi-byte encoding has been used in the past to bypass standard input validation functions and carry out cross-site scripting and SQL injection attacks.

== References ==
http://ha.ckers.org/xss.html

http://www.cert.org/tech_tips/malicious_code_mitigation.html

http://www.w3schools.com/HTML/html_entities.asp

http://www.iss.net/security_center/advice/Intrusions/2000639/default.htm

http://searchsecurity.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid14_gci1212217_tax299989,00.html

http://www.joelonsoftware.com/articles/Unicode.html

OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T16:50:30Z

Harish s s:


OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T16:44:58Z

Harish s s:


OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T14:00:37Z

Harish s s: /* Output Encoding – Server & Browser Consensus */

OWASP Testing Guide Appendix D: Encoded Injection

2008-08-27T13:41:08Z

Harish s s:

OWASP Testing Guide v3 Table of Contents

2008-08-27T04:34:03Z

Harish s s:

__NOTOC__

This is the draft table of contents of the new Testing Guide.
You can download the stable version [http://www.owasp.org/index.php/Image:OWASP_Testing_Guide_v2_pdf.zip here]

Back to the OWASP Testing Guide Project:
http://www.owasp.org/index.php/OWASP_Testing_Project

Testing Guide v3 (draft)
Updated: 25th August 2008
(new)--> new articles, (toimp)--> needs to improve or to review, (xxx%) --> current state of the article

'''T A B L E o f C O N T E N T S'''
----

==[[Testing Guide Foreword|(toimp)Foreword by OWASP Chair]]==

==[[Testing Guide Frontispiece |(toimp: M.Meucci)1. Frontispiece]]==

'''[[Testing Guide Frontispiece|(toimp: M.Meucci)1.1 About the OWASP Testing Guide Project]]'''

'''[[About The Open Web Application Security Project|1.2 About The Open Web Application Security Project]]'''

==[[Testing Guide Introduction|2. Introduction]]==

'''2.1 The OWASP Testing Project'''

'''2.2 Principles of Testing'''

'''2.3 Testing Techniques Explained'''

(new: M. Morana 100%) 2.4 [https://www.owasp.org/index.php/Testing_Guide_Introduction#Security_Requirements_Test_Derivation Security requirements test derivation],[https://www.owasp.org/index.php/Testing_Guide_Introduction#Functional_and_Non_Functional_Test_Requirements functional and non functional test requirements], and [https://www.owasp.org/index.php/Testing_Guide_Introduction#Test_Cases_Through_Use_and_Misuse_Cases test cases through use and misuse cases]

(new: M. Morana 100%) 2.4.1 [https://www.owasp.org/index.php/Testing_Guide_Introduction#Security_Tests_Integrated_in_Developers_and_Testers_Workflow Security tests integrated in developers and testers workflows]

(new: M. Morana 100%) 2.4.2 [https://www.owasp.org/index.php/Testing_Guide_Introduction#Developers.27_Security_Tests Developers' security tests: unit tests and component level tests]

(new: M. Morana 100%) 2.4.3 [https://www.owasp.org/index.php/Testing_Guide_Introduction#Functional_Testers.27_Security_Tests Functional testers' security tests: integrated system tests, tests in UAT, and production environment]

(new: M. Morana 100%) 2.5 [https://www.owasp.org/index.php/Testing_Guide_Introduction#Security_Test_Data_Analysis_and_Reporting Security test data analysis and reporting: root cause identification and business/role case test data reporting]

==[[The OWASP Testing Framework|3. The OWASP Testing Framework]]==

'''3.1. Overview'''

'''3.2. Phase 1: Before Development Begins '''

'''3.3. Phase 2: During Definition and Design'''

'''3.4. Phase 3: During Development'''

'''3.5. Phase 4: During Deployment'''

'''3.6. Phase 5: Maintenance and Operations'''

'''3.7. A Typical SDLC Testing Workflow '''

==[[Web Application Penetration Testing |4. (M.Meucci) Web Application Penetration Testing ]]==

[[Testing: Introduction and objectives|'''4.1 Introduction and Objectives''']]

[[Testing Checklist| (new: M.Meucci - 100% ) 4.1.1 Testing Checklist]]

[[Testing: Information Gathering|'''4.2 Information Gathering''']]

[[Testing: Spiders Robots and Crawlers|(C.Heinrich)4.2.1 Spiders, Robots and Crawlers]]

[[Testing: Search engine discovery|(C.Heinrich)4.2.2 Search Engine Discovery/Reconnaissance]]

[[Testing: Identify application entry points| (new: K.Horvath - 100%) 4.2.3 Identify application entry points]]

[[Testing for Web Application Fingerprint|4.2.4 Testing for Web Application Fingerprint]]

[[Testing for Application Discovery|4.2.5 Application Discovery]]

[[Testing for Error Code|4.2.6 Analysis of Error Codes]]

[[Testing for configuration management|''' (new) 4.3 Configuration Management Testing''']]

[[Testing for SSL-TLS| 4.3.1 SSL/TLS Testing (SSL Version, Algorithms, Key length, Digital Cert. Validity)]]

[[Testing for DB Listener|4.3.2 DB Listener Testing]]

[[Testing for infrastructure configuration management| (new) 4.3.3 Infrastructure Configuration Management Testing]]

[[Testing for application configuration management|4.3.4 Application Configuration Management Testing]]

[[Testing for file extensions handling|4.3.5 Testing for File Extensions Handling]]

[[Testing for old_file|4.3.6 Old, Backup and Unreferenced Files]]

[[Testing for Admin Interfaces|(imp: A. Goodman - 100%) 4.3.7 Infrastructure and Application Admin Interfaces]]

[[Testing for HTTP Methods and XST| (imp: A. van der Stock - 100%)4.3.8 Testing for HTTP Methods and XST]]

[[Testing for business logic|'''(K.Horvath - 100%) 4.4 Business Logic Testing''']]

[[Testing for authentication|'''(M.Meucci - 100%) 4.5 Authentication Testing''']]

[[Testing for credentials transport|(new: G.Ingrosso - 100%) 4.5.1 Credentials transport over an encrypted channel]]

[[Testing for user enumeration|(new: M.Meucci, M.Mella - 90%) 4.5.2 Testing for user enumeration]]

[[Testing for Default or Guessable User Account|(K.Horvath - 100% - adam updated) 4.5.3 Testing for Guessable (Dictionary) User Account]]

[[Testing for Brute Force|4.5.4 Brute Force Testing]]

[[Testing for Bypassing Authentication Schema|4.5.5 Testing for bypassing authentication schema]]

[[Testing for Vulnerable Remember Password and Pwd Reset|4.5.6 Testing for vulnerable remember password and pwd reset]]

[[Testing for Logout and Browser Cache Management|4.5.7 Testing for Logout and Browser Cache Management Testing]]

[[Testing for Captcha|(new: P.Luptak - 100% ) 4.5.8 Testing for CAPTCHA]]

[[Testing Multiple Factors Authentication| (new: G.Fedon - 100%) 4.5.9 Testing Multiple Factors Authentication]]

[[Testing for Race Conditions| (new: A. Goodman - 100%) 4.5.10 Testing for Race Conditions]]

[[Testing for Authorization|'''(new: M.Meucci - 100%) 4.6 Authorization testing''']]

[[Testing for Path Traversal|(new) 4.6.1 Testing for path traversal]]

[[Testing for Bypassing Authorization Schema|(new: M.Meucci - 100%)4.6.2 Testing for bypassing authorization schema]]

[[Testing for Privilege escalation|(new: Cecil Su, M.Meucci - 100%)4.6.3 Testing for Privilege Escalation]]

[[Testing for Session Management|'''4.7 Session Management Testing''']]

[[Testing for Session_Management_Schema|(new: M.Meucci - 100%) 4.7.1 Testing for Session Management Schema]]

[[Testing for cookies attributes| (new: K.Horvath - 100%) 4.7.2 Testing for Cookies attributes]]

[[Testing for Session Fixation| (M.Meucci - 100% (updated by adam)) 4.7.3 Testing for Session Fixation]]

[[Testing for Exposed Session Variables|4.7.4 Testing for Exposed Session Variables ]]

[[Testing for CSRF|4.7.5 Testing for CSRF]]

[[Testing for HTTP Exploit|4.7.6 Testing for HTTP Exploit ]]

[[Testing for Data Validation|'''4.8 Data Validation Testing''']]

[[Testing for Reflected Cross site scripting|(new: A. Coronel -100%)4.8.1 Testing for Reflected Cross Site Scripting]]

[[Testing for Stored Cross site scripting|(new: R. Suggi Liverani - 100%)4.8.2 Testing for Stored Cross Site Scripting]]

[[Testing for DOM-based Cross site scripting|(new: A.Agarwwal, Kuza55 - 80%) 4.8.3 Testing for DOM based Cross Site Scripting]]

[[Testing for Cross site flashing|(new: A.Agarwwal, S.Di Paola - 0%)4.8.4 Testing for Cross Site Flashing]]

[[Testing for SQL Injection| 4.8.5 Testing for SQL Injection ]]

[[Testing for Oracle|4.8.5.1 Oracle Testing ]]

[[Testing for MySQL|4.8.5.2 MySQL Testing ]]

[[Testing for SQL Server|4.8.5.3 SQL Server Testing]]

[[Testing for MS Access|(new:A.Parata - 100%) 4.8.5.4 MS Access Testing]]

[[OWASP_Backend_Security_Project_Testing_PostgreSQL|4.8.5.5 (new: D.Bellucci 100% from OWASP BSP) Testing PostgreSQL]]

[[Testing for LDAP Injection|4.8.6 Testing for LDAP Injection]]

[[Testing for ORM Injection|4.8.7 Testing for ORM Injection]]

[[Testing for XML Injection|4.8.8 Testing for XML Injection]]

[[Testing for SSI Injection|4.8.9 Testing for SSI Injection]]

[[Testing for XPath Injection|4.8.10 Testing for XPath Injection]]

[[Testing for IMAP/SMTP Injection|4.8.11 IMAP/SMTP Injection]]

[[Testing for Code Injection|4.8.12 Testing for Code Injection]]

[[Testing for Command Injection|4.8.13 Testing for Command Injection]]

[[Testing for Buffer Overflow|4.8.14 Testing for Buffer overflow]]

[[Testing for Heap Overflow|4.8.14.1 Testing for Heap overflow]]

[[Testing for Stack Overflow|4.8.14.2 Testing for Stack overflow]]

[[Testing for Format String|4.8.14.3 Testing for Format string]]

[[Testing for Incubated Vulnerability|4.8.15 Testing for incubated vulnerabilities]]

[[Testing for Denial of Service|'''4.9 Testing for Denial of Service''']]

[[Testing for SQL Wildcard Attacks|(new: F.Mavituna - 100%) 4.9.1 Testing for SQL Wildcard Attacks]]

[[Testing for DoS Locking Customer Accounts|4.9.2 Testing for DoS Locking Customer Accounts]]

[[Testing for DoS Buffer Overflows|4.9.3 Testing for DoS Buffer Overflows]]

[[Testing for DoS User Specified Object Allocation|4.9.4 Testing for DoS User Specified Object Allocation]]

[[Testing for User Input as a Loop Counter|4.9.5 Testing for User Input as a Loop Counter]]

[[Testing for Writing User Provided Data to Disk|4.9.6 Testing for Writing User Provided Data to Disk]]

[[Testing for DoS Failure to Release Resources|4.9.7 Testing for DoS Failure to Release Resources]]

[[Testing for Storing too Much Data in Session|4.9.8 Testing for Storing too Much Data in Session]]

[[Testing for Web Services|(toimp: M.Meucci -100%) '''4.10 Web Services Testing''']]

[[Testing: WS Information Gathering|(new: M.Meucci -100%) 4.10.1 WS Information Gathering]]

[[Testing WSDL|(new: M.Meucci -100%) 4.10.2 Testing WSDL]]

[[Testing for XML Structural|(toimp: M.Meucci -100%)4.10.3 XML Structural Testing ]]

[[Testing for XML Content-Level|4.10.4 XML Content-level Testing ]]

[[Testing for WS HTTP GET parameters/REST attacks|4.10.5 HTTP GET parameters/REST Testing ]]

[[Testing for Naughty SOAP Attachments|4.10.6 Naughty SOAP attachments ]]

[[Testing for WS Replay|4.10.7 Replay Testing ]]

[[Testing_for_AJAX:_introduction|'''4.11 AJAX Testing''']]

[[Testing for AJAX Vulnerabilities|4.11.1 AJAX Vulnerabilities]]

[[Testing for AJAX|4.11.2 How to test AJAX]]

==[[Writing Reports: value the real risk |(toimp: Mat)5. Writing Reports: value the real risk ]]==

[[How to value the real risk |5.1 How to value the real risk]]

[[How to write the report of the testing |5.2 How to write the report of the testing]]

==[[Appendix A: Testing Tools |Appendix A: Testing Tools ]]==

* Black Box Testing Tools
* Source Code Analyzers
* Other Tools

==[[OWASP Testing Guide Appendix B: Suggested Reading | Appendix B: Suggested Reading]]==
* Whitepapers
* Books
* Useful Websites

==[[OWASP Testing Guide Appendix C: Fuzz Vectors | Appendix C: Fuzz Vectors]]==

* Fuzz Categories
** Recursive fuzzing
** Replacive fuzzing
* Cross Site Scripting (XSS)
* Buffer Overflows and Format String Errors
** Buffer Overflows (BFO)
** Format String Errors (FSE)
** Integer Overflows (INT)
* SQL Injection
** Passive SQL Injection (SQP)
** Active SQL Injection (SQI)
* LDAP Injection
* XPATH Injection

==[[OWASP Testing Guide Appendix D: Encoded Injection | (new: Harish Sureddy.)Appendix D: Encoded Injection]]==

----

[[Category:OWASP Testing Project]]

OWASP Testing Guide Appendix D: Encoded Injection

2008-06-28T12:15:53Z

Harish s s: /* References */

== Background ==

Character encoding is primarily used to represent characters, numbers and other symbols in a format that is suitable for a computer to understand, store, and render. It is, in simple terms, the conversion of bytes into characters - characters belonging to different languages like English, Chinese, Greek or any other known language. One of the earliest common character encoding schemes is ASCII (American Standard Code for Information Interchange), which initially used 7-bit coded characters. Today, the most common encoding scheme is Unicode (UTF-8).

Character encoding can be used in different ways to attack a web application:

== Case 1 ==

Web applications usually employ different types of input filtering mechanisms to limit the input that can be submitted by their users. If these input filters are not implemented sufficiently well, it is possible to slip a character or two through them. For instance, a / is represented as 2F (hex) in ASCII, while the same character (/) can also be written as C0 AF, an overlong (and strictly invalid) 2-byte UTF-8 sequence that lenient decoders still accept. Therefore, it is important for the input filtering control to be aware of the encoding scheme used. If the filter is found to be detecting UTF-8 encoded injections, a different encoding scheme may be employed to bypass the filter.
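
The C0 AF sequence mentioned above is an overlong form: a strict UTF-8 decoder rejects it, while a decoder that skips the overlong check maps it to the slash. The Python sketch below contrasts the two; <code>lenient_two_byte_decode</code> is a hypothetical stand-in for such a permissive decoder.

```python
def lenient_two_byte_decode(seq: bytes) -> str:
    """Hypothetical decoder that omits the overlong-form check for 2-byte UTF-8."""
    lead, trail = seq
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))

overlong_slash = b"\xc0\xaf"

# The permissive decoder yields the slash character.
lenient_result = lenient_two_byte_decode(overlong_slash)
print(repr(lenient_result))

# Python's strict UTF-8 decoder refuses the same bytes.
try:
    overlong_slash.decode("utf-8")
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True
print(strict_rejected)
```

A filter that searches the raw bytes for 2F sees nothing, so the outcome depends entirely on which kind of decoder sits behind it.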

== Case 2 ==

Web browsers, in order to coherently display a web page, are required to be aware of the encoding scheme used. Ideally, this information should be provided to the browser through HTTP headers (“Content-Type”) as shown below:

<pre>
<nowiki>Content-Type: text/html; charset=UTF-8</nowiki>
</pre>

or through HTML META tag (“META HTTP-EQUIV”), as shown below:

<pre>
<nowiki><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></nowiki>
</pre>

It is through these character encoding declarations that the browser understands which set of characters to use when converting bytes to characters.
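
As a small sketch of how a client might read the declared encoding, the Python snippet below extracts the <code>charset</code> parameter from a Content-Type value using the standard library's <code>email.message.Message</code>. The helper name <code>declared_charset</code> is hypothetical, and real browsers apply additional rules that are not modelled here.

```python
from email.message import Message

def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

print(declared_charset("text/html; charset=UTF-8"))
print(declared_charset("text/html; charset=ISO-8859-1"))
# No declaration: the client is left to guess or to fall back to a default.
print(declared_charset("text/html"))
```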

However, if it does not receive this information from the server, the browser either attempts to ‘guess’ the encoding scheme or reverts to a default scheme. In some cases, the user explicitly sets the default encoding in the browser to a different scheme. Any such mismatch between the encoding scheme used by the web page and the one assumed by the browser may cause the browser to interpret the page in a manner that is unintended or unexpected. This behavior of the browser is sometimes exploited to bypass output encoding mechanisms, i.e. HTML entity encoding (&lt; for ‘<’, &gt; for ‘>’).
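
The effect of such a mismatch can be seen with any non-ASCII character. In the Python sketch below, the sample character and the two encodings are assumptions chosen for illustration: the same bytes give different text depending on which encoding the reader assumes.

```python
# The server encodes a single accented character as UTF-8 (two bytes).
sent_bytes = "\u00e9".encode("utf-8")

# A client that assumes UTF-8 recovers the character.
as_utf8 = sent_bytes.decode("utf-8")

# A client that assumes ISO-8859-1 sees two unrelated characters instead.
as_latin1 = sent_bytes.decode("iso-8859-1")

print(len(sent_bytes), as_utf8, as_latin1)
```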

== Various Encodings ==

Consider the following scenarios, which better illustrate the idea of filter bypass using character encoding.

==== Encoded Injections ====

The scenarios given below form only a subset of the various ways obfuscation can be achieved in order to bypass input filters. Also, the success of encoded injections depends on the browser in use. For example, US-ASCII encoded injections were previously successful only in Internet Explorer but not in Firefox. Encoded injections are therefore, to a large extent, browser dependent.

==== Basic Encoding ====

Consider a basic input validation filter that protects against injection of the single quote character. In this case the following injection would easily bypass this filter:

<pre>
<nowiki><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT></nowiki>
</pre>
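
What the vector above computes can be mirrored in Python as a quick check. This is only an equivalent of the character-code lookup, not of the browser behavior: the three code points spell the string without any quote character appearing in the input.

```python
codes = [88, 83, 83]

# Build the string from its code points, as String.fromCharCode does in JavaScript.
text = "".join(chr(code) for code in codes)
print(text)

# The injected source itself carries no single or double quote.
vector = "<SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>"
print("'" in vector, '"' in vector)
```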

The JavaScript function String.fromCharCode takes the given Unicode values and returns the corresponding string. This is one of the most basic forms of encoded injections. Another vector that can be used to bypass this filter is:

<pre>
<nowiki><IMG SRC=javascript:alert(&quot;XSS&quot;)>

<IMG SRC=javascript:alert(&#34;XSS&#34;)> (Numeric reference)</nowiki>
</pre>

The vectors above use HTML entities to construct the injection string. HTML entity encoding is used to display characters that have a special meaning in HTML. For instance, ‘>’ closes an HTML tag, so in order to display this character literally on the web page the corresponding HTML character entity must be inserted in the page source. The injections mentioned above are only one way of encoding; there are numerous other ways in which a string can be encoded (obfuscated) in order to bypass the above filter.

==== Hex Encoding ====

==== US-ASCII Encoding ====

==== UTF 8 Encoding ====

==== Multi-byte Encoding ====

== References ==

OWASP Testing Guide Appendix D: Encoded Injection

2008-06-28T12:15:24Z

Harish s s: /* Case 1 */

OWASP Testing Guide Appendix D: Encoded Injection

2008-06-28T12:13:55Z

Harish s s: /* Case 1 */

OWASP Testing Guide Appendix D: Encoded Injection

2008-06-28T12:13:11Z

Harish s s: /* Background */

OWASP Testing Guide Appendix D: Encoded Injection

2008-06-28T12:08:36Z

Harish s s:

OWASP Testing Guide Appendix D: Encoded Injection

2008-05-30T06:28:32Z

Harish s s: New page: Character Encoding

Character Encoding