As new technologies emerge and become well established, so do threats against those technologies. Blind SQL injection attacks are a well-known and widely recognized form of code injection attack, but there are many other forms, some not so well documented or understood. An emerging code injection attack is the XPath injection attack, which takes advantage of the loose typing and forgiving nature of XPath parsers to let attackers piggyback malicious XPath queries on URLs, forms, or other inputs in order to gain access to privileged information and change it.
This article looks at how XPath attacks are usually carried out and provides an example in Java™ and XML environments. It discusses how to detect such threats, looks at what you can do to mitigate the threat, and finally discusses what you can do in response to a suspected penetration.
The focus of this article is a specific type of code injection attack: the Blind XPath injection. If you are not familiar with XPath 1.0 or need a primer, look at the W3 Schools XPath tutorial (see Resources for a link). Also, you can find a number of articles on working with XPath in a variety of languages on developerWorks (see Resources for links). The examples in this article focus on XPath 1.0 but also work with XPath 2.0. XPath 2.0 actually expands the possible issues facing you.
This article also supplies a Java code example that was developed to work with the Java JDK 5.0. The concepts and topics in this article apply across platforms to any application that uses XPath, but to run the specific code sample you will have to use the JDK 5.0.
One of the more common attacks or threats to Web applications is some form of code injection, which Wikipedia defines as:
... a technique to introduce (or "inject") code into a computer program or system by taking advantage of the unenforced and unchecked assumptions the system makes about its inputs. The purpose of the injected code is typically to bypass or modify the originally intended functionality of the program. When the functionality bypassed is system security, the results can be disastrous.
Any quick perusal of Web sites such as the Web Application Security Consortium or Security Focus (see Resources for links) will show a multitude of attacks using some form of code injection from JavaScript to SQL injection to other forms of code injection attacks. An emerging threat, first outlined by Amit Klein in a paper in 2004, is the blind XPath injection attack (see Resources). This attack functions almost exactly like the blind SQL injection attack but, unlike SQL injection attacks, few people know about XPath injection attacks or take precautions against them. Like the SQL injection attack, you can often easily deal with the threat if you follow best practices to develop secure applications.
Generally, most Web applications use relational databases to store and retrieve information. For example, if you have a Web site that requires authentication, you might have a table called users with a unique ID, a login name, a password, and perhaps some other sort of information like a role. A SQL query to retrieve a user from a users table might look like Listing 1.
Listing 1. SQL query to retrieve a user from a users table
Select * from users where loginID='foo' and password='bar'
In this query the user has to give the loginID and the password as input. If an attacker enters ' or 1=1 or ''=' in both the loginID field and the password field, the query formed will be something like Listing 2.
Listing 2. Query formed from attacker entries
Select * from users where loginID='' or 1=1 or ''='' and password='' or 1=1 or ''=''
This will always result in a match so that the attacker gains entry to the system. XPath injection works much the same way. Assume, though, that instead of a table called users, you have an XML file that contains user information that looks like Listing 3.
Listing 3. users.xml
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>
    <firstname>Ben</firstname>
    <lastname>Elmore</lastname>
    <loginID>abc</loginID>
    <password>test123</password>
  </user>
  <user>
    <firstname>Shlomy</firstname>
    <lastname>Gantz</lastname>
    <loginID>xyz</loginID>
    <password>123test</password>
  </user>
  <user>
    <firstname>Jeghis</firstname>
    <lastname>Katz</lastname>
    <loginID>mrj</loginID>
    <password>jk2468</password>
  </user>
  <user>
    <firstname>Darien</firstname>
    <lastname>Heap</lastname>
    <loginID>drano</loginID>
    <password>2mne8s</password>
  </user>
</users>
In XPath, a similar statement to the SQL query is shown in Listing 4.
Listing 4. XPath statement matching the SQL query
//users/user[loginID/text()='abc' and password/text()='test123']
And to do the same sort of attack to bypass authentication, you might do something like Listing 5.
Listing 5. Bypassing authentication
//users/user[loginID/text()='' or 1=1 and password/text()='' or 1=1]
You might have a method such as doLogin in your Java application that performs the authentication, again using the XML document in Listing 3. It might look like Listing 6.
Listing 6. XpathInjectionExample.java
import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XpathInjectionExample {

    public boolean doLogin(String loginID, String password)
            throws ParserConfigurationException, SAXException, IOException,
                   XPathExpressionException {
        // Parse the user store from Listing 3
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse("users.xml");

        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        // VULNERABLE: user input is concatenated directly into the expression
        XPathExpression expr = xpath.compile("//users/user[loginID/text()='" + loginID
                + "' and password/text()='" + password + "']/firstname/text()");
        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;

        // Print first names to the console
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }
        // Any match counts as a successful login
        return nodes.getLength() >= 1;
    }
}
For Listing 6, if you pass in a login and password such as loginID = 'abc' and password = 'test123', the method returns true (and, in this example, prints the matching first names to the console). If, however, you pass in a value like ' or 1=1 or ''=' for both fields, you will always get a return value of true, because XPath ends up seeing a string like the one shown in Listing 7.
Listing 7. String
//users/user[loginID/text()='' or 1=1 or ''='' and password/text()='' or 1=1 or ''='']
This will logically result in a query that always returns true and will always allow the attacker to gain access.
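The effect is easy to reproduce outside of a full application. The following is a minimal sketch, assuming a trimmed-down inline copy of the users document and a hypothetical class name InjectionDemo (neither is part of the original sample). It builds the query by concatenation, as Listing 6 does, and counts how many users the expression matches.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original article's code.
class InjectionDemo {
    // Assumption: a shortened, inline version of the document in Listing 3.
    static final String USERS_XML =
          "<users>"
        + "<user><firstname>Ben</firstname><loginID>abc</loginID>"
        + "<password>test123</password></user>"
        + "<user><firstname>Shlomy</firstname><loginID>xyz</loginID>"
        + "<password>123test</password></user>"
        + "</users>";

    // Builds the expression by string concatenation (the vulnerable pattern).
    static String buildQuery(String loginID, String password) {
        return "//users/user[loginID/text()='" + loginID
             + "' and password/text()='" + password + "']/firstname/text()";
    }

    // Returns the number of firstname text nodes the concatenated query selects.
    static int countMatches(String loginID, String password) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                buildQuery(loginID, password),
                new InputSource(new StringReader(USERS_XML)),
                XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String payload = "' or 1=1 or ''='";
        System.out.println(buildQuery(payload, payload));
        System.out.println("valid login:    " + countMatches("abc", "test123"));
        System.out.println("wrong password: " + countMatches("abc", "nope"));
        System.out.println("injected input: " + countMatches(payload, payload));
    }
}
```

Because and binds more tightly than or in XPath, the injected predicate reduces to a chain of or terms that contains 1=1, so every user node in the document matches, not just one.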
Another even more likely and possibly more troubling attack in XPath is the ability of attackers to exploit XPath to manipulate XML documents on the fly in an application.
Extracting the XML document structure
The query used to bypass authentication can also be used to extract information about the XML document. Suppose an attacker guesses that the name of the first sub-node in the XML document is loginID and wants to confirm it. The attacker enters the input in Listing 8.
Listing 8. Input entered by attacker
abc' or name(//users/loginID[1]) = 'loginID' or 'a'='b
In place of 1=1 in Listing 7, the expression given in Listing 8 checks if the first subnode's name is loginID. The query formed is shown in Listing 9.
Listing 9. Query
string(//users[loginID/text()='abc' or name(//users/loginID[1]) = 'loginID' or 'a'='b' and password/text()=''])
By trial and error, the attacker can check the various child nodes of the XML document and gather information by seeing if this XPath expression results in a successful authentication. An attacker might then potentially write a simple script that sends various XPath injections and extracts an XML document from a system as mentioned in Klein's paper.
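To make the trial-and-error idea concrete, here is a rough sketch of how such a script might recover a value one character at a time using nothing but the true or false result of a login attempt. It is an illustration under stated assumptions, not Klein's actual tooling: the class BlindExtractionDemo, the inline document, the vulnerableLogin stand-in, and the 32-character limit are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original article's code.
class BlindExtractionDemo {
    // Assumption: a shortened, inline version of the document in Listing 3.
    static final String USERS_XML =
          "<users>"
        + "<user><loginID>abc</loginID><password>test123</password></user>"
        + "<user><loginID>xyz</loginID><password>123test</password></user>"
        + "</users>";
    static final XPathFactory FACTORY = XPathFactory.newInstance();

    // Stand-in for the vulnerable application: the caller learns only
    // whether the login "succeeded" (at least one user node matched).
    static boolean vulnerableLogin(String loginID, String password) {
        String query = "//users/user[loginID/text()='" + loginID
                     + "' and password/text()='" + password + "']";
        try {
            return (Boolean) FACTORY.newXPath().evaluate(
                query, new InputSource(new StringReader(USERS_XML)),
                XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wraps an arbitrary XPath condition so that the login succeeds
    // exactly when the condition is true.
    static boolean probe(String condition) {
        return vulnerableLogin("' or " + condition + " or 'a'='b", "x");
    }

    // Recovers the string value of a path by testing one character at a time.
    static String extract(String path) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
        StringBuilder found = new StringBuilder();
        for (int pos = 1; pos <= 32; pos++) {          // arbitrary upper bound
            char match = 0;
            for (char c : alphabet.toCharArray()) {
                if (probe("substring(" + path + "," + pos + ",1)='" + c + "'")) {
                    match = c;
                    break;
                }
            }
            if (match == 0) {
                break;  // no candidate matched: past the end of the value
            }
            found.append(match);
        }
        return found.toString();
    }

    public static void main(String[] args) {
        System.out.println("root element is users: " + probe("name(/*[1])='users'"));
        System.out.println("first user's password: " + extract("//user[1]/password"));
    }
}
```

Each probe costs one request, so an attacker needs only a few hundred requests to read a short value. Unusual bursts of failed logins containing quotes or XPath function names are therefore a useful signal to watch for.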
Since XPath injection attacks are much like SQL injection attacks, you can prevent them with many of the same methods used to prevent SQL injection attacks. Not surprisingly, most of these preventative methods are the same ones you can and should use to prevent other typical code injection attacks.
No matter what the application, environment, or language, you should follow these best practices:
- Assume all input is suspect.
- Validate not only the type of data but also its format, length, range, and contents (for example, a simple regular expression such as if (/^"*^';&<>()/) should find most suspect special characters).
- Validate data both on the client and the server because client validation is extremely easy to circumvent.
- Follow a consistent written and [missing word] strategy toward application security based on secure software development best practices (see Apache's excellent list for Web Services in Resources).
- Test your applications for known threats before you release them. The article "Fuzz Testing", available in Resources, shows you how to do this.
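As one way to apply the validation advice above on the server side, the following sketch checks a login ID against a whitelist pattern before it ever reaches an XPath expression. The class name InputValidator and the policy itself (3 to 20 letters, digits, or underscores) are assumptions chosen for illustration; pick rules that fit your own data.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original article's code.
class InputValidator {
    // Assumed policy: 3 to 20 characters, ASCII letters, digits, underscore only.
    private static final Pattern LOGIN_ID = Pattern.compile("[A-Za-z0-9_]{3,20}");

    // Accepts input only if the whole string fits the whitelist pattern.
    static boolean isValidLoginID(String input) {
        return input != null && LOGIN_ID.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidLoginID("abc"));
        System.out.println(isValidLoginID("' or 1=1 or ''='"));
    }
}
```

A whitelist of allowed characters tends to be more robust than a blacklist of forbidden ones, because it rejects quotes, parentheses, and anything else you did not explicitly anticipate.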
Unlike most database applications, XPath does not support the concept of parameterized queries, but you can mimic the concept using other APIs such as XQuery. Rather than build expressions as strings that are then passed to the XPath parser for dynamic execution at run time, as shown in Listing 10, you can parameterize your query by creating an external file that holds it, as in Listing 11.
Listing 10. Strings passed to the XPath parser
"//users/user[loginID/text()='" + loginID + "' and password/text()='" + password + "']"
In Listing 11, parameterize your query by creating an external file that holds your query.
Listing 11. dologin.xq
declare variable $loginID as xs:string external;
declare variable $password as xs:string external;
//users/user[loginID=$loginID and password=$password]
You could then perform the same login check as in Listing 6, with slight modification, as shown in Listing 12.
Listing 12. XQuery snippet
Document doc = new Builder().build("users.xml");
XQuery xquery = new XQueryFactory().createXQuery(new File("dologin.xq"));
// Bind values to the external variables declared in dologin.xq
Map vars = new HashMap();
vars.put("loginID", "abc");
vars.put("password", "test123");
Nodes results = xquery.execute(doc, null, vars).toNodes();
for (int i = 0; i < results.size(); i++) {
    System.out.println(results.get(i).toXML());
}
This keeps the explicit variables $loginID and $password from being processed as executable expressions at run time, so your execution logic and data stay separated. Unfortunately, this form of query parameterization is not part of XPath itself, but it is freely available in open source parsers such as Saxon (see Resources for a link). Some other parsers allow this sort of functionality, and it can be a solid way to protect against XPath injection.
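If you want to stay within the standard Java API rather than add an XQuery library, a similar separation of logic and data is possible. XPath 1.0 expressions can contain variable references such as $loginID, and the javax.xml.xpath package (part of the JDK since 5.0) lets host code supply their values through an XPathVariableResolver. The sketch below is one possible arrangement, assuming an inline copy of the users document and a hypothetical class name ParameterizedXPathDemo.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original article's code.
class ParameterizedXPathDemo {
    // Assumption: a shortened, inline version of the document in Listing 3.
    static final String USERS_XML =
          "<users>"
        + "<user><loginID>abc</loginID><password>test123</password></user>"
        + "<user><loginID>xyz</loginID><password>123test</password></user>"
        + "</users>";

    // The expression is a constant; user input never becomes part of it.
    static final String QUERY =
        "//users/user[loginID/text()=$loginID and password/text()=$password]";

    static boolean doLogin(String loginID, String password) {
        if (loginID == null || password == null) {
            return false;
        }
        final Map<String, Object> vars = new HashMap<String, Object>();
        vars.put("loginID", loginID);
        vars.put("password", password);

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Supply variable values as data, looked up by local name.
        xpath.setXPathVariableResolver(new XPathVariableResolver() {
            public Object resolveVariable(QName name) {
                return vars.get(name.getLocalPart());
            }
        });
        try {
            return (Boolean) xpath.evaluate(
                QUERY, new InputSource(new StringReader(USERS_XML)),
                XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String payload = "' or 1=1 or ''='";
        System.out.println("valid login:    " + doLogin("abc", "test123"));
        System.out.println("injected input: " + doLogin(payload, payload));
    }
}
```

With this arrangement the injected text is compared as an ordinary string value, so quotes and operators inside it are never parsed as part of the expression.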
Data inspection at the Web server
To protect against both XPath injection and other forms of code injection, you should check all data passed from your Web server to your backend services. For example, with Apache you could use a Mod_Security filter such as SecFilterSelective THE_REQUEST "(\'|\")" to look for single quotes and double quotes in strings and disallow them. You might use this same approach to filter and disallow other forms of special characters such as ("*^';&><</), which are all characters that can be used for various injection attacks. This approach might be very good for some applications that perhaps use REST- or SOAP-based XML services, but in other cases, it might not be possible. As always, the best approach is intelligent secure design from the initial design through implementation of your application.
Most organizations think of threat detection and threat denial but rarely do they plan, with a qualified security professional, what to do if or when their systems are breached. You should always assume the worst case scenario and plan for it.
How you respond to a suspected penetration depends greatly on your organization and the type of system that is penetrated, but usually the best thing to do is bring your systems offline and wait until a professional forensic engineer can come to inspect the system. Sometimes people immediately take systems offline and reimage their drives, but this wipes the evidence of the crime as well as possible information on other compromises the intruder has made to this system. If possible, always try to preserve the state of the system for a security expert to review.
Most applications that use XML will not be vulnerable to XPath injection attacks, and XML applications should not be considered more at risk just because a specific vulnerability is found. At the same time, with the increased adoption of new platforms such as Ajax and RIA platforms such as FLEX or Open Laszlo, as well as the federation of XML services from organizations such as Google, all of which rely heavily on XML for everything from communication with backend services to persistence, you the developer need to stay aware of the threats and risks created by these approaches.
Fortunately, while the specific threats are new, the problems and principles to solve them are not. Following good security best practices will help you protect yourself not only from XPath injection attacks but other forms of attacks as well.
Learn
- The Java XPath API (Elliotte Rusty Harold, developerWorks, July 2006): Read about querying XML from Java programs.
- W3 Schools XPath tutorial: Read a primer on XPath 1.0.
- Get started with XPath 2.0 (Benoît Marchal, developerWorks, May 2006): Learn how to easily write more sophisticated requests with the new data model.
- The Web Application Security Consortium: Visit the Web site for an international group of experts, industry practitioners, and organizational representatives who produce open source and widely agreed upon best-practice security standards for the World Wide Web.
- Security Focus: Find a host of resources for security experts.
- Blind XPath Injection: Read Amit Klein's description of the Blind XPath injection attack.
- Apache: See an excellent article on Web services security best practices.
- Fuzz testing (Elliotte Harold, developerWorks, September 2006): See what happens when you deliberately inject random bad data into an application to see what breaks and learn to use defensive coding techniques.
- developerWorks XML zone: Find hundreds more XML resources.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- The technology bookstore: Browse for books on these and other technical topics.
Get products and technologies
- List of XML and Database products: Review and sample a variety of products.
- Saxon: Get this open source parser.
- IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
Discuss
- Participate in the discussion forum.
- XML zone discussion forums: Participate in any of several XML-centered forums.
- developerWorks blogs: Check out these blogs and get involved in the developerWorks community.
Robi Sen is the Vice President of Services for Department 13 LLC, an IT consultancy. He spends most of his time helping customers, from Fortune 500 companies to start-ups, define and manage their technology challenges. He has written widely on various technologies and often lectures and presents at various conventions.