Extensible Markup Language (XML) files are plain-text files that describe how data is structured, stored and transported between applications, servers and networks. If you open an XML file, you’ll see the data wrapped in tags that look much like HTML tags. However, unlike HTML, XML files don’t intrinsically accomplish anything. They essentially act as data carriers: a program called a parser (or XML processor) reads the file, breaks it down into its components and loads those components into your computer’s memory so that they are available for use in an application or on a network, immediately or at a later date. XML also lets a file declare external entities, which are references to content stored outside the file itself. If the parser is weakly configured and resolves those references without restriction, your data can become vulnerable to an XML External Entities (XXE) attack.
XXE attacks can lead to denial of service, disclosure of internal files, scanning of internal systems and, in some cases, remote code execution, all of which undermine the security of your computer system. Understanding the relationship between XML files, parsing, and weak parsing is essential to understanding what an XXE attack is and why such an attack can put your company at risk.
First, to parse means to break down components so that they can be comprehended in a meaningful way. If you’ve ever diagrammed a sentence, you have parsed the components of the sentence so that you can better understand the sentence’s structure. It’s the same principle if you are parsing XML code. Most programming platforms ship with standard parsing interfaces, such as the Document Object Model (DOM), which represents a parsed XML file as a tree of nodes. This parsed information can be used by applications and networks regardless of programming language or platform. That’s the beauty of modern XML. However, that beauty can be a double-edged sword.
Take this simple XML file, for instance. It has three components:
<name>John Smith</name>
<occupation>software engineer</occupation>
<birthday>January 7</birthday>
The parser (DOM-based, or whatever processor your particular system uses to parse the XML code) identifies each of those components (name, occupation, birthday) and puts them into the computer’s memory, where the information is readily available to whatever consumes it. That consumer can be an application on a network that needs to pull the information for a specific purpose, such as an employee birthday list. In XML terminology, though, an external entity is something narrower: a declaration inside the XML file that tells the parser to fetch content from a location outside the file.
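As a rough illustration of that parsing step, the sketch below loads the three components into memory with Python’s standard-library DOM implementation. The tag names (name, occupation, birthday), the employee root element and the helper read_fields are assumptions made for this example, and the input is treated as trusted.

```python
from xml.dom import minidom

# Hypothetical sample: the three components wrapped in an assumed <employee> root.
EMPLOYEE_XML = """<employee>
  <name>John Smith</name>
  <occupation>software engineer</occupation>
  <birthday>January 7</birthday>
</employee>"""


def read_fields(xml_text):
    """Parse trusted XML into a DOM tree and return {tag name: text}."""
    document = minidom.parseString(xml_text)
    root = document.documentElement
    fields = {}
    for node in root.childNodes:
        # Skip the whitespace text nodes that sit between the elements.
        if node.nodeType != node.ELEMENT_NODE:
            continue
        # Join the element's direct text children to get its content.
        text = "".join(
            child.data
            for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        fields[node.tagName] = text
    return fields


if __name__ == "__main__":
    print(read_fields(EMPLOYEE_XML))
```

Once the fields are in a dictionary like this, an application such as the birthday list can look up the values it needs by tag name.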
But what if your company hasn’t updated its system in years, or has only updated parts of it? Examples might include a non-profit working with donated equipment or an older business that hasn’t had the need, staff or funding to update part or all of its computer system. These types of legacy computer systems often have older XML processors that resolve external entities by default, and those entities are specified with Uniform Resource Identifiers (URIs).
A URI identifies a resource, such as a file on a server or an address on a network; the familiar URL is one kind of URI. If an XML file that you parse has been injected with an entity pointing to a URI of the attacker’s choosing, the parser fetches whatever that URI refers to and places it among your XML file’s components. In this way the attacker’s input infiltrates your computer system, and it can cause your company’s files to be exposed, scanned, or extracted. For instance, the attacker can add a declaration that points the parser at a sensitive file on your server, so that the end result is definitely not an employee birthday list.
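To make that concrete, here is a sketch of what such an injected declaration can look like, together with a small inspection helper. The payload, the entity name xxe, the file path and the function list_external_entities are all illustrative assumptions. The helper uses Python’s low-level expat bindings only to report which external URIs a document declares; it does not fetch them.

```python
from xml.parsers import expat

# Illustrative payload: the DOCTYPE declares an external entity named "xxe"
# that points at a local file, and the document body then references it.
XXE_SAMPLE = (
    "<!DOCTYPE employee [\n"
    '  <!ENTITY xxe SYSTEM "file:///etc/passwd">\n'
    "]>\n"
    "<employee><name>&xxe;</name></employee>"
)


def list_external_entities(xml_text):
    """Return (entity name, URI) pairs for external entities declared in xml_text."""
    found = []
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # External entities carry a system identifier (a URI) instead of a value.
        if system_id is not None:
            found.append((name, system_id))

    parser.EntityDeclHandler = on_entity_decl
    # No external-entity handler is installed, so nothing is fetched here.
    parser.Parse(xml_text, True)
    return found


if __name__ == "__main__":
    for name, uri in list_external_entities(XXE_SAMPLE):
        print(f"entity {name!r} points to {uri}")
```

A parser that does resolve the entity would substitute the contents of the referenced file wherever &xxe; appears, which is how the data ends up somewhere it was never meant to go.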
Often you don’t know if your files are vulnerable or compromised until an issue comes up that alerts you, such as an XML upload from an untrusted source that probes your data or causes a denial of service. Developers and other manual testers also don’t routinely test for XXE attacks. Penetration testing homes in on your system’s path of least resistance by manually injecting malicious URIs, tags, and other data into your XML files to exploit them. It’s easy to do because XML files are plain text that can be opened, read and edited with any plain or dedicated text editor. This is where XML’s beauty becomes that double-edged sword.
XML tags aren’t predetermined like HTML tags are. HTML tags describe how content is displayed, whereas the XML author “invents” tags that define the file’s structure for use by applications. You would think that since the XML tag structure is so individualized, nothing from outside could attack it. However, since it is quite easy to inject new tags and declarations into your XML files, XXE attacks can infiltrate your files even if those files are deeply embedded or nested in your source code.
Prevention is Key to Resolving XXE Attacks Before They Happen
There are several ways to prevent XXE attacks:
Avoid serializing sensitive data, which is an easy target to attack; where serialization cannot be avoided, keep the data format simple and restrict access with special permissions
Patch or upgrade XML processors in legacy systems
Validate server-side input against an allowlist (“whitelist”) of specific criteria
Update web application firewalls to keep untrusted networks away from server-side or Web services
Validate incoming XML data: make sure that it is well-formed and matches the structure you expect
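One widely used hardening step in the same spirit as the measures above, though not spelled out in the list, is to refuse any document that carries a DOCTYPE declaration, since that is where external entities are declared. The sketch below is a minimal version of that idea built on Python’s standard expat bindings. The names DoctypeForbidden and reject_doctype are hypothetical, and third-party packages such as defusedxml take a similar approach in a more complete form.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration (hypothetical name)."""


def reject_doctype(xml_text):
    """Raise DoctypeForbidden if xml_text declares a DOCTYPE; otherwise return None."""
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as the DOCTYPE starts, before any entity is declared.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    # Malformed XML still raises expat.ExpatError, which callers should also handle.
    parser.Parse(xml_text, True)


if __name__ == "__main__":
    untrusted = (
        "<!DOCTYPE employee [\n"
        '  <!ENTITY xxe SYSTEM "file:///etc/passwd">\n'
        "]>\n"
        "<employee><name>&xxe;</name></employee>"
    )
    try:
        reject_doctype(untrusted)
    except DoctypeForbidden as error:
        print("rejected:", error)
```

Running a check like this before handing an upload to the main parser means documents that try to declare entities never reach it. The trade-off is that legitimate files relying on a DTD are rejected as well, so it fits best where incoming XML is not expected to carry one.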
The first line of defense against XXE attacks is to formally train developers in what these attacks are and how to recognize potential vulnerabilities, so that they know how to minimize or negate the impact on your business. You don’t want attacks like the “billion laughs” attack, which overwhelms a parser with exponentially expanding nested entities, to have the last laugh on you.