XML Security Cheat Sheet
- 1 Introduction
- 2 Malformed XML Documents
- 3 Authors and Primary Editors
Introduction
Specifications for XML and XML schemas include multiple security flaws. At the same time, these specifications provide the tools required to protect XML applications. This creates a complex scenario for developers and a fun environment for hackers. Even though we use XML schemas to define the security of XML documents, they can be used to perform a variety of attacks: file retrieval, server-side request forgery, port scanning, or brute forcing. This cheat sheet analyzes how to infer new attack vectors from current vulnerabilities, and how those vectors can affect common libraries and software. It also provides recommendations for the safe deployment of applications relying on XML.
Malformed XML Documents
The W3C XML specification defines a set of principles that XML documents must follow to be considered well formed. When a document violates any of these principles, the violation must be treated as a fatal error and the data it contains is considered malformed. Many tactics will produce a malformed document: removing an ending tag, rearranging the order of elements into a nonsensical structure, introducing forbidden characters, and so on. The XML parser should stop execution once it detects a fatal error. The document should not undergo any additional processing, and the application should display an error message.
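As a minimal sketch of this fail-fast behavior, the Python function below uses the standard library's xml.etree.ElementTree, whose expat-based parser raises ParseError at the first fatal error and has no recovery mode. The function name parse_strict is hypothetical. The sketch only covers well-formedness; it is not a complete hardening recipe for untrusted input, for which the "XML vulnerabilities" section of the Python documentation is a better starting point.

```python
import xml.etree.ElementTree as ET


def parse_strict(xml_text: str) -> ET.Element:
    """Parse a document and refuse to continue if it is not well formed.

    ElementTree (backed by expat) raises ParseError on the first fatal
    error; we re-raise it as ValueError so callers can report the problem
    without ever receiving a partially built tree.
    """
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Stop processing here: no recovery, no partial result.
        raise ValueError(f"rejected malformed XML: {exc}") from exc


if __name__ == "__main__":
    print(parse_strict("<book><item>ABC101</item></book>").tag)  # book
    try:
        parse_strict("<book><item>ABC101</book>")  # mismatched end tag
    except ValueError as err:
        print(err)
```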
More Time Required
A malformed document may affect the consumption of Central Processing Unit (CPU) resources. In certain scenarios, processing a malformed document may take longer than processing a well-formed one. When this happens, an attacker may launch an asymmetric resource consumption attack, exploiting the longer processing time to cause a Denial of Service (DoS). The following variables should be analyzed when exploring this behavior:
- Parser inner workings: Each parser has its own particularities, which may make it more or less susceptible to malformed documents and therefore to longer processing times.
- Document size: Processing a large well-formed document requires more time than doing the same for a smaller well-formed document. If the parser is susceptible, this also applies to malformed documents.
- Parser limitations: Parsers may be limited to processing no more than a certain amount of certain data types. Maximum limits for elements, attributes, or entities may be set by default or by the developers. For example, the Java API for XML Processing (JAXP) limits each element to no more than 10,000 attributes by default [3].
- Architecture: The amount of computational resources available to the XML parser.
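The following Python sketch applies two of these variables defensively: it rejects documents above a size cap before parsing, and it emulates a per-element attribute limit similar to the JAXP one by counting attributes in an expat start-element handler. The limit values, the LimitExceeded exception, and parse_with_limits are illustrative assumptions, not settings of any real parser.

```python
import xml.parsers.expat

# Hypothetical limits chosen for illustration; tune them for your application.
MAX_DOCUMENT_BYTES = 1_000_000
MAX_ATTRIBUTES_PER_ELEMENT = 10_000  # mirrors the JAXP default mentioned above


class LimitExceeded(Exception):
    """Raised when a document exceeds one of the configured limits."""


def parse_with_limits(data: bytes) -> int:
    """Stream-parse `data`, enforcing size and attribute limits.

    Returns the number of elements seen. Raises LimitExceeded (or
    xml.parsers.expat.ExpatError for malformed input) instead of continuing.
    """
    if len(data) > MAX_DOCUMENT_BYTES:
        # Reject before the parser spends any CPU time on the document.
        raise LimitExceeded(f"document is {len(data)} bytes")

    parser = xml.parsers.expat.ParserCreate()
    element_count = 0

    def start_element(name, attrs):
        nonlocal element_count
        element_count += 1
        if len(attrs) > MAX_ATTRIBUTES_PER_ELEMENT:
            # Exceptions raised in expat handlers abort Parse() and propagate.
            raise LimitExceeded(f"<{name}> has {len(attrs)} attributes")

    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    return element_count
```

Checking the byte length first means oversized input is refused without any parsing work; the handler-based check aborts mid-stream as soon as one element crosses the limit.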
Apache Xerces-J may serve as an example of this type of vulnerability; in this case, malformed data caused the XML parser "...to consume CPU resource for several minutes before the data [was] eventually rejected. This behavior can be used to launch a denial of service attack against any Java server application, which processes XML data supplied by remote users." An attacker could combine this vulnerability with an XML flood attack that submits multiple documents.
To avoid this attack, you must confirm that your version of the XML processor does not take significantly more time to process malformed documents than well-formed ones.
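One rough way to perform this check is to time the parser on a well-formed document and on a malformed document of similar size. The sketch below does this with Python's xml.etree.ElementTree; time_parse and build_pair are hypothetical helpers, and the malformed variant (a missing final end tag) is only one of many shapes worth testing. What counts as "additional time" is a judgment call for your application, and a single benchmark on one machine does not prove that a parser is safe.

```python
import time
import xml.etree.ElementTree as ET


def time_parse(xml_text: str, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) to parse or reject xml_text."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            ET.fromstring(xml_text)
        except ET.ParseError:
            pass  # for malformed input, the time to reject is what we measure
        best = min(best, time.perf_counter() - start)
    return best


def build_pair(n: int) -> tuple[str, str]:
    """Build a well-formed document and a malformed one of similar size.

    The malformed variant drops the final end tag, so the error is only
    detected after the whole body has been read.
    """
    body = "<item>x</item>" * n
    return f"<root>{body}</root>", f"<root>{body}"


if __name__ == "__main__":
    good, bad = build_pair(100_000)
    t_good, t_bad = time_parse(good), time_parse(bad)
    print(f"well-formed: {t_good:.4f}s  malformed: {t_bad:.4f}s  "
          f"ratio: {t_bad / t_good:.2f}")
```

Substitute the parser and the document shapes your application actually receives; the ratio between the two timings is more informative than the absolute numbers.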
Applications Processing Malformed Data
Certain XML parsers have the ability to recover malformed documents. They can be instructed to try their best to return a valid tree with all the content that they can manage to parse, regardless of the document’s noncompliance with the specifications. Since the specification defines no rules for the recovery process, the approach and results may differ from one parser to another. Accepting recovered documents can therefore lead to unexpected data-integrity issues.
The following three scenarios illustrate attack vectors that target a parser running in recovery mode:
Malformed Document to Malformed Document Containing Unexpected Characters
According to the XML specification, the string -- (double-hyphen) must not occur within comments. Using the recovery mode of lxml and PHP, the following document will remain the same after being recovered:
<element> <!-- one <!-- another comment comment --> </element>
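Rather than recovering, a strict parser should reject this document. As a sketch, Python's xml.etree.ElementTree (backed by expat) has no recovery mode and refuses the double-hyphen comment; is_well_formed is a hypothetical helper. If you use lxml, its XMLParser keeps recover=False by default, and leaving it that way preserves the same strict behavior.

```python
import xml.etree.ElementTree as ET

# The document from the example above: "--" appears inside the comment.
DOC = "<element> <!-- one <!-- another comment comment --> </element>"


def is_well_formed(xml_text: str) -> bool:
    """Return True only if a strict (non-recovering) parse succeeds."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(DOC))                                          # False
print(is_well_formed("<element> <!-- one comment --> </element>"))  # True
```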
Well-Formed Document to Well-Formed Document using Normalization
Certain parsers may normalize the contents of your CDATA sections [6]. This means that they replace the special characters contained in the CDATA section with their escaped (entity) equivalents, even though this is not required:
<element> <![CDATA[<script>a=1;</script>]]> </element>
CDATA normalization is not applied consistently across parsers. Libxml could transform this document into its canonical version, but although the result is well formed, its contents may be considered malformed depending on the situation:
<element> &lt;script&gt;a=1;&lt;/script&gt; </element>
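The same effect is easy to observe with Python's xml.etree.ElementTree, which does not preserve CDATA markers: the section is delivered as ordinary text and re-serialized with entity escaping. This sketch shows the general behavior, not libxml specifically. The character data itself is unchanged, so comparisons should be made on parsed values rather than on raw serialized bytes, and the text should still be treated as untrusted before it is embedded in HTML.

```python
import xml.etree.ElementTree as ET

ORIGINAL = "<element> <![CDATA[<script>a=1;</script>]]> </element>"

root = ET.fromstring(ORIGINAL)

# The parser exposes the CDATA content as ordinary character data...
print(repr(root.text))  # ' <script>a=1;</script> '

# ...and re-serialization escapes it instead of restoring the CDATA wrapper.
print(ET.tostring(root, encoding="unicode"))
# <element> &lt;script&gt;a=1;&lt;/script&gt; </element>
```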
Malformed Document to Well-Formed Document Including Content Modification
The contents of certain malformed documents could be altered after being recovered. Consider the scenario where a book is on sale unless the value of its "onsale" element is no:
<book> <item>ABC101</item> <value>10</value> <onsale&>no</onsale> <onsalevalue>5</onsalevalue> </book>
The onsale start tag above contains an & character, which is not allowed there. After document recovery, the value of that element may change:
<book> <item>ABC101</item> <value>10</value> <onsale/> >no <onsalevalue>5</onsalevalue> </book>
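The danger is that business logic written for the well-formed case may silently misinterpret the recovered one. In the Python sketch below, naive_is_on_sale implements the rule literally and treats the recovered, empty onsale element as "on sale", while strict_is_on_sale rejects malformed input and fails closed on any value other than the two it expects. Both function names and the "yes" value are hypothetical; the example above only defines "no".

```python
import xml.etree.ElementTree as ET

MALFORMED = ("<book> <item>ABC101</item> <value>10</value> "
             "<onsale&>no</onsale> <onsalevalue>5</onsalevalue> </book>")
RECOVERED = ("<book> <item>ABC101</item> <value>10</value> "
             "<onsale/> >no <onsalevalue>5</onsalevalue> </book>")


def naive_is_on_sale(xml_text: str) -> bool:
    # "On sale unless onsale is 'no'": an empty or missing element slips through.
    node = ET.fromstring(xml_text).find("onsale")
    return node is None or node.text != "no"


def strict_is_on_sale(xml_text: str) -> bool:
    # Reject malformed input outright (ET.ParseError), then accept only
    # the two expected values ("yes" is a hypothetical allowed value).
    node = ET.fromstring(xml_text).find("onsale")
    value = node.text.strip() if node is not None and node.text else None
    if value not in ("yes", "no"):
        raise ValueError(f"unexpected onsale value: {value!r}")
    return value == "yes"


if __name__ == "__main__":
    print(naive_is_on_sale(RECOVERED))  # True: wrongly treated as on sale
    for doc in (MALFORMED, RECOVERED):
        try:
            strict_is_on_sale(doc)
        except (ET.ParseError, ValueError) as err:
            print("rejected:", type(err).__name__)
```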