
Avoid the dangers of XPath injection

Become aware of the risks to better protect your XML applications

Robi Sen (rsen@department13.com), Vice President of Services, Department13
Robi Sen is the Vice President of Services for Department 13 LLC an IT consultancy. He spends most of his time helping customers from Fortune 500 companies to start-ups define and manage their technology challenges. He has written widely on various technologies and often lectures and presents at various conventions.

Summary:  With the proliferation of simple XML APIs, Web services, and Rich Internet Applications (RIAs), more organizations have adopted XML as a data format for everything from configuration files to remote procedure calls. Some have even used XML documents in place of more traditional flat files or relational databases. But like any other application or technology that accepts data submitted by outside users, XML applications can be susceptible to code injection attacks, specifically XPath injection attacks.

Date:  17 Jul 2007
Level:  Intermediate

Introduction

As new technologies emerge and become well established, so do threats against them. Blind SQL injection attacks are a well-known and widely recognized form of code injection attack, but there are many other forms, some not so well documented or understood. An emerging one is the XPath injection attack, which takes advantage of the loose typing and forgiving nature of XPath parsers to let malcontents piggyback malicious XPath queries on URLs, forms, or other inputs in order to gain access to privileged information.

This article looks at how XPath attacks are usually carried out and provides an example in Java™ and XML environments. It discusses how to detect such threats, looks at what you can do to mitigate the threat, and finally discusses what you can do in response to a suspected penetration.


Getting started

The focus of this article is a specific type of code injection attack: Blind XPath injection. If you are not familiar with XPath 1.0 or need a primer, look at the W3Schools XPath tutorial (see Resources for a link). You can also find a number of articles on working with XPath in a variety of languages on developerWorks (see Resources for links). The examples in this article target XPath 1.0, but they also work with XPath 2.0, which actually expands the range of possible issues you face.

This article also supplies a Java code example developed for Java JDK 5.0. While the concepts and topics in this article are cross-platform, to run the specific code sample you need JDK 5.0 or later, the first release to include the javax.xml.xpath API.


Code injection

One of the more common attacks or threats to Web applications is some form of code injection, which Wikipedia defines as:

... a technique to introduce (or "inject") code into a computer program or system by taking advantage of the unenforced and unchecked assumptions the system makes about its inputs. The purpose of the injected code is typically to bypass or modify the originally intended functionality of the program. When the functionality bypassed is system security, the results can be disastrous.

Any quick perusal of Web sites such as the Web Application Security Consortium or Security Focus (see Resources for links) will show a multitude of attacks using some form of code injection, ranging from JavaScript injection to SQL injection and beyond. An emerging threat, first outlined by Amit Klein in a paper in 2004, is the blind XPath injection attack (see Resources). This attack functions almost exactly like the blind SQL injection attack, but unlike SQL injection, few people know about XPath injection attacks or take precautions against them. As with SQL injection, you can often deal with the threat easily if you follow best practices for developing secure applications.


The XPath attack

Most Web applications use relational databases to store and retrieve information. For example, if you have a Web site that requires authentication, you might have a table called users with a unique ID, a login name, a password, and perhaps some other information such as a role. A SQL query to retrieve a user from a users table might look like Listing 1.


Listing 1. SQL query to retrieve a user from a users table
                
Select * from users where loginID='foo' and password='bar' 

In this query the user has to supply the loginID and the password as input. If an attacker enters ' or '1'='1 in both the loginID field and the password field, the query formed will be something like Listing 2.


Listing 2. Query formed from attacker entries

Select * from users where loginID='' or '1'='1' and password='' or '1'='1'

Because the final OR condition is always true, the WHERE clause matches every row and the attacker gains entry to the system. XPath injection works much the same way. Suppose that instead of a table called users, you have an XML file of user information like Listing 3.


Listing 3. users.xml
                
<?xml version="1.0" encoding="UTF-8"?> 
<users> 
      <user>  
          <firstname>Ben</firstname>
          <lastname>Elmore</lastname> 
          <loginID>abc</loginID> 
          <password>test123</password> 
      </user> 
      <user>  
          <firstname>Shlomy</firstname>
          <lastname>Gantz</lastname>
          <loginID>xyz</loginID> 
          <password>123test</password> 
      </user> 
      <user>  
          <firstname>Jeghis</firstname>
          <lastname>Katz</lastname>
          <loginID>mrj</loginID> 
          <password>jk2468</password> 
      </user> 
      <user>  
          <firstname>Darien</firstname>
          <lastname>Heap</lastname>
          <loginID>drano</loginID> 
          <password>2mne8s</password> 
      </user> 
 </users> 

In XPath, a similar statement to the SQL query is shown in Listing 4.


Listing 4. XPath statement matching the SQL query
                
//users/user[loginID/text()='abc' and password/text()='test123']

To mount the same sort of attack and bypass authentication, an attacker might supply input that produces an expression like Listing 5.


Listing 5. Bypassing authentication
                
//users/user[loginID/text()='' or 1=1 and password/text()='' or 1=1]
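To see the difference between Listing 4 and Listing 5 concretely, here is a minimal sketch that evaluates both expressions with the standard JAXP XPath API against an in-memory copy of the users document from Listing 3. The class name XPathBypassDemo and the helper countMatches are illustrative only. Because and binds more tightly than or in XPath, the predicate in Listing 5 always ends in "or 1=1" and is therefore true for every user.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBypassDemo {

    // In-memory copy of the users.xml data from Listing 3 (whitespace removed)
    static final String USERS_XML =
          "<users>"
        + "<user><firstname>Ben</firstname><lastname>Elmore</lastname>"
        + "<loginID>abc</loginID><password>test123</password></user>"
        + "<user><firstname>Shlomy</firstname><lastname>Gantz</lastname>"
        + "<loginID>xyz</loginID><password>123test</password></user>"
        + "<user><firstname>Jeghis</firstname><lastname>Katz</lastname>"
        + "<loginID>mrj</loginID><password>jk2468</password></user>"
        + "<user><firstname>Darien</firstname><lastname>Heap</lastname>"
        + "<loginID>drano</loginID><password>2mne8s</password></user>"
        + "</users>";

    // Returns how many nodes the given XPath expression selects in USERS_XML.
    static int countMatches(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(USERS_XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Listing 4: legitimate credentials select exactly one user
        String normal = "//users/user[loginID/text()='abc' and password/text()='test123']";
        // Listing 5: the injected "or 1=1" clauses make the predicate true for every user
        String injected = "//users/user[loginID/text()='' or 1=1 and password/text()='' or 1=1]";
        System.out.println("Listing 4 matches: " + countMatches(normal));   // 1
        System.out.println("Listing 5 matches: " + countMatches(injected)); // 4
    }
}
```

Running main should report one match for the legitimate query and four, every user in the document, for the injected one.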

Your Java application might have a method such as doLogin that performs authentication against the XML document in Listing 3. It might look like Listing 6.


Listing 6. XpathInjectionExample.java
                
import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XpathInjectionExample {

    // WARNING: deliberately vulnerable. User input is concatenated
    // straight into the XPath expression.
    public boolean doLogin(String loginID, String password)
            throws ParserConfigurationException, SAXException, IOException,
                   XPathExpressionException {

        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse("users.xml");

        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        XPathExpression expr = xpath.compile(
                "//users/user[loginID/text()='" + loginID + "'"
                + " and password/text()='" + password + "']/firstname/text()");
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

        // Print the first names of the matching users to the console
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        // Any match at all counts as a successful login
        return nodes.getLength() >= 1;
    }
}

With Listing 6, if you pass in a valid login and password such as loginID = 'abc' and password = 'test123', the method returns true (and, in this example, prints the matching user's first name to the console). But if you pass in a value such as ' or 1=1 or ''=' for both fields, you always get a return value of true, because XPath ends up seeing a string like the one shown in Listing 7.


Listing 7. Resulting XPath expression
                
//users/user[loginID/text()='' or 1=1 or ''='' and password/text()='' or 1=1 or ''='']

The predicate in this expression is always true, so it selects every user, doLogin returns true, and the attacker gains access.
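The sketch below reproduces the concatenation step of Listing 6 so you can see exactly what string the XPath engine receives. It is a minimal variant that assumes the user data is held in a string rather than read from users.xml; ConcatenatedLoginDemo and buildExpression are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConcatenatedLoginDemo {

    // In-memory copy of the users.xml data from Listing 3 (whitespace removed)
    static final String USERS_XML =
          "<users>"
        + "<user><firstname>Ben</firstname><lastname>Elmore</lastname>"
        + "<loginID>abc</loginID><password>test123</password></user>"
        + "<user><firstname>Shlomy</firstname><lastname>Gantz</lastname>"
        + "<loginID>xyz</loginID><password>123test</password></user>"
        + "<user><firstname>Jeghis</firstname><lastname>Katz</lastname>"
        + "<loginID>mrj</loginID><password>jk2468</password></user>"
        + "<user><firstname>Darien</firstname><lastname>Heap</lastname>"
        + "<loginID>drano</loginID><password>2mne8s</password></user>"
        + "</users>";

    // Builds the expression the same way Listing 6 does: by string concatenation.
    static String buildExpression(String loginID, String password) {
        return "//users/user[loginID/text()='" + loginID
                + "' and password/text()='" + password + "']/firstname/text()";
    }

    // Same success test as doLogin in Listing 6: any selected node means "logged in".
    static boolean doLogin(String loginID, String password) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(USERS_XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(buildExpression(loginID, password), doc, XPathConstants.NODESET);
            return nodes.getLength() >= 1;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String attack = "' or 1=1 or ''='";
        // Prints the Listing 7 expression with /firstname/text() appended
        System.out.println(buildExpression(attack, attack));
        System.out.println(doLogin("abc", "test123")); // true
        System.out.println(doLogin("abc", "wrong"));   // false
        System.out.println(doLogin(attack, attack));   // true
    }
}
```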

An even more likely, and possibly more troubling, attack is for attackers to exploit XPath to extract the structure and contents of the XML documents an application uses.


Extracting the XML document structure

The query used to bypass authentication can also be used to extract information about the XML document. Suppose an attacker guesses that the third child element of each user element is named loginID and wants to confirm it. The attacker enters the input in Listing 8 as the login ID and leaves the password blank.


Listing 8. Input entered by attacker

nobody' or name(//users/user[1]/*[3])='loginID' or 'a'='b

In place of 1=1 in Listing 7, the expression given in Listing 8 checks whether the name of the third child of the first user element is loginID. The login ID nobody matches no user, and the trailing 'a'='b' is always false, so the login succeeds only if the guess is correct. The query formed is shown in Listing 9.


Listing 9. Query

//users/user[loginID/text()='nobody' or name(//users/user[1]/*[3])='loginID' or 'a'='b' and password/text()='']/firstname/text()

By trial and error, the attacker can probe the various child nodes of the XML document and gather information by seeing whether each injected expression results in a successful authentication. An attacker might then write a simple script that sends a series of XPath injections and extracts an XML document from a system, as described in Klein's paper.
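The trial-and-error process lends itself to automation. The following sketch, intended for testing your own applications, treats a simulated vulnerable login as a yes/no oracle and recovers the names of the child elements of the first user. Everything here is assumed for illustration: vulnerableLogin stands in for a remote login form, the candidate names are guesses, and each probe uses the non-existent login ID nobody so that only the injected condition decides the outcome.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BlindXPathProbe {

    // In-memory copy of the users.xml data from Listing 3 (whitespace removed)
    static final String USERS_XML =
          "<users>"
        + "<user><firstname>Ben</firstname><lastname>Elmore</lastname>"
        + "<loginID>abc</loginID><password>test123</password></user>"
        + "<user><firstname>Shlomy</firstname><lastname>Gantz</lastname>"
        + "<loginID>xyz</loginID><password>123test</password></user>"
        + "</users>";

    // Hypothetical stand-in for the vulnerable doLogin of Listing 6. The caller
    // learns only success or failure, which is what makes the attack "blind".
    static boolean vulnerableLogin(String loginID, String password) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(USERS_XML)));
            String expr = "//users/user[loginID/text()='" + loginID
                    + "' and password/text()='" + password + "']/firstname/text()";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength() >= 1;
        } catch (Exception e) {
            return false; // a malformed probe simply looks like a failed login
        }
    }

    // Wraps a yes/no XPath condition in the quote-balanced injection pattern.
    static boolean ask(String condition) {
        return vulnerableLogin("nobody' or " + condition + " or 'a'='b", "");
    }

    // Recovers the child element names of the first user by trial and error.
    static List<String> discoverUserFields(String[] candidates) {
        // First find how many child elements there are (capped at 50 probes).
        int childCount = 0;
        while (childCount < 50 && !ask("count(//users/user[1]/*)=" + childCount)) {
            childCount++;
        }
        // Then test each candidate name at each position.
        List<String> found = new ArrayList<String>();
        for (int pos = 1; pos <= childCount; pos++) {
            for (String name : candidates) {
                if (ask("name(//users/user[1]/*[" + pos + "])='" + name + "'")) {
                    found.add(name);
                    break;
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String[] guesses = {"id", "firstname", "lastname", "email", "loginID", "password", "role"};
        System.out.println(discoverUserFields(guesses)); // [firstname, lastname, loginID, password]
    }
}
```

Each call to ask costs one login attempt, which is one reason rate limiting and logging of failed logins help expose this kind of probing.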


XPath injection prevention

Because XPath injection attacks are much like SQL injection attacks, you can prevent them with many of the same methods. Not surprisingly, most of these preventive methods are the same ones you can and should use to prevent other typical code injection attacks.

Validation

No matter what the application, environment, or language, you should follow these best practices:

  • Assume all input is suspect.
  • Validate not only the type of data but also its format, length, range, and contents (for example, a simple regular expression character class such as ["*^';&<>()] finds most suspect special characters).
  • Validate data both on the client and the server because client validation is extremely easy to circumvent.
  • Follow a consistent, written strategy toward application security based on secure software development best practices (see Apache's excellent list for Web services in Resources).
  • Test your applications for known threats before you release them. The article "Fuzz Testing", available in Resources, shows you how to do this.
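As a minimal illustration of server-side checks on format, length, and contents, the sketch below combines an allow-list pattern for login IDs with a scan for the suspect characters listed above. The policy of 1 to 32 letters, digits, or underscores is an assumption; substitute whatever your application actually requires. The class name LoginInputValidator is hypothetical.

```java
import java.util.regex.Pattern;

class LoginInputValidator {

    // Assumed policy for illustration: 1 to 32 letters, digits, or underscores.
    private static final Pattern LOGIN_ID = Pattern.compile("[A-Za-z0-9_]{1,32}");

    // The suspect characters from the list above, written as a character class.
    private static final Pattern SUSPECT = Pattern.compile("[\"*^';&<>()]");

    // Allow-list check: the whole string must match, which also enforces the length range.
    static boolean isValidLoginID(String input) {
        return input != null && LOGIN_ID.matcher(input).matches();
    }

    // Deny-list check: true if any suspect character appears anywhere in the input.
    static boolean containsSuspectCharacters(String input) {
        return input != null && SUSPECT.matcher(input).find();
    }

    public static void main(String[] args) {
        String attack = "' or 1=1 or ''='";
        System.out.println(isValidLoginID("abc"));             // true
        System.out.println(isValidLoginID(attack));            // false
        System.out.println(containsSuspectCharacters(attack)); // true
    }
}
```

Passwords are deliberately left out of the character check here, because legitimate passwords often contain such characters; keeping them out of the expression text altogether, as described under Parameterization, is the safer route.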

Parameterization

Database APIs such as JDBC support parameterized queries. XPath 1.0 has no dedicated equivalent, although an expression can reference variables (such as $loginID) whose values are supplied by the host environment rather than parsed as part of the expression. You can also mimic parameterized queries using other APIs such as XQuery. Rather than building expressions as strings that are then passed to the XPath parser for dynamic execution at run time, as shown in Listing 10, you can parameterize your query by creating an external file that holds it, like Listing 11.


Listing 10. Strings passed to the XPath parser

"//users/user[loginID/text()='" + loginID + "' and password/text()='" + password + "']"

Listing 11 shows such a query file, written in XQuery.


Listing 11. dologin.xq
                
declare variable $loginID as xs:string external;
declare variable $password as xs:string external;
//users/user[loginID=$loginID and password=$password]

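You can then run the query in Listing 11 from Java, supplying the variable values at execution time, as shown in Listing 12. The snippet is written against the XOM document model (Builder, Document, Nodes) and the Nux XQuery API (XQueryFactory, XQuery).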


Listing 12. XQuery snippet
                
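import java.io.File;
import java.util.HashMap;
import java.util.Map;
import nu.xom.*;                  // XOM: Builder, Document, Nodes
import nux.xom.xquery.*;          // Nux: XQuery, XQueryFactory

// Parse users.xml with XOM and compile the external query from Listing 11
Document doc = new Builder().build(new File("users.xml"));
XQuery xquery = new XQueryFactory().createXQuery(new File("dologin.xq"));
// Keys must match the variable names declared in dologin.xq ($loginID, $password)
Map<String, String> vars = new HashMap<String, String>();
vars.put("loginID", "abc");
vars.put("password", "test123");
Nodes results = xquery.execute(doc, null, vars).toNodes();
for (int i = 0; i < results.size(); i++) {
    System.out.println(results.get(i).toXML());
}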

This keeps the explicit variables $loginID and $password from being processed as executable expressions at run time, so your execution logic and data stay separated. External variables are a standard part of XQuery and are freely available in open source processors such as Saxon (see Resources for a link). Other processors offer similar functionality, and it can be a solid way to protect against XPath injection.
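In plain Java without an XQuery processor, a comparable separation is available through JAXP: javax.xml.xpath.XPath.setXPathVariableResolver binds variable references such as $loginID to values supplied by your code. The sketch below rewrites the Listing 6 login that way; ParameterizedXPathLogin is a hypothetical class name, and the user data is assumed to be held in a string for brevity.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParameterizedXPathLogin {

    // In-memory copy of the users.xml data from Listing 3 (whitespace removed)
    static final String USERS_XML =
          "<users>"
        + "<user><firstname>Ben</firstname><lastname>Elmore</lastname>"
        + "<loginID>abc</loginID><password>test123</password></user>"
        + "<user><firstname>Shlomy</firstname><lastname>Gantz</lastname>"
        + "<loginID>xyz</loginID><password>123test</password></user>"
        + "</users>";

    static boolean doLogin(final String loginID, final String password) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(USERS_XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $loginID and $password to the raw input. The values reach the
            // engine as strings and are never parsed as XPath syntax.
            xpath.setXPathVariableResolver(new XPathVariableResolver() {
                public Object resolveVariable(QName name) {
                    if ("loginID".equals(name.getLocalPart())) return loginID;
                    if ("password".equals(name.getLocalPart())) return password;
                    return null;
                }
            });
            // The expression text is constant; set the resolver before compiling.
            XPathExpression expr = xpath.compile(
                    "//users/user[loginID/text()=$loginID and password/text()=$password]");
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            return nodes.getLength() >= 1;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String attack = "' or 1=1 or ''='";
        System.out.println(doLogin("abc", "test123")); // true
        System.out.println(doLogin(attack, attack));   // false
    }
}
```

Because the expression text never changes, input such as ' or 1=1 or ''=' is compared as an ordinary string and simply fails to match, and a legitimate value containing an apostrophe no longer breaks the query.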

Data inspection at the Web server

To protect against both XPath injection and other forms of code injection, you should check all data passed from your Web server to your backend services. For example, with Apache you could use a ModSecurity filter such as SecFilterSelective THE_REQUEST "(\'|\")" to look for single and double quotes in requests and disallow them. You might use the same approach to filter and disallow other special characters such as ("*^';&<>/), all of which can be used in various injection attacks. This approach might work very well for some applications, such as those that use REST- or SOAP-based XML services, but in other cases, such as applications whose legitimate input can contain quotation marks, it might not be possible. As always, the best approach is intelligent, secure design from the initial design through implementation of your application.
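A rough Java equivalent of that Web-server filter, useful mostly to show the trade-off, is sketched below. It screens every request parameter for single or double quotes, mirroring the SecFilterSelective rule above; RequestScreen and firstSuspectParameter are hypothetical names, not part of ModSecurity or any servlet API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class RequestScreen {

    // Same idea as the ModSecurity rule: flag a single or double quote anywhere.
    private static final Pattern QUOTES = Pattern.compile("['\"]");

    // Returns the name of the first parameter whose value contains a quote,
    // or null if every value passes.
    static String firstSuspectParameter(Map<String, String> params) {
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String value = entry.getValue();
            if (value != null && QUOTES.matcher(value).find()) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> attack = new LinkedHashMap<String, String>();
        attack.put("loginID", "' or 1=1 or ''='");
        attack.put("password", "x");
        System.out.println(firstSuspectParameter(attack));   // loginID

        Map<String, String> customer = new LinkedHashMap<String, String>();
        customer.put("lastname", "O'Brien");
        System.out.println(firstSuspectParameter(customer)); // lastname
    }
}
```

The second call shows the cost: the legitimate surname O'Brien is rejected as well, which is why filtering at the Web server works best as one layer among several.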


What if?

Most organizations think about threat detection and threat denial, but they rarely plan, with a qualified security professional, what to do if or when their systems are breached. You should always assume the worst-case scenario and plan for it.

The right response depends greatly on your organization and the type of system that is penetrated, but usually the best thing to do is take the affected systems offline and wait until a professional forensic engineer can inspect them. Some people immediately take systems offline and reimage their drives, but this wipes out the evidence of the crime as well as possible information about other compromises the intruder has made to the system. If possible, always try to preserve the state of the system for a security expert to review.


Summary

Most applications that use XML will not be vulnerable to XPath injection attacks, and XML applications should not be considered more at risk just because a specific vulnerability has been found. At the same time, with the increasing adoption of Ajax and of RIA platforms such as Flex and OpenLaszlo, as well as the federation of XML services from organizations such as Google, all of which rely heavily on XML for everything from communication with backend services to persistence, you, the developer, need to stay aware of the threats and risks these approaches create.

Fortunately, while the specific threats are new, the problems and principles to solve them are not. Following good security best practices will help you protect yourself not only from XPath injection attacks but other forms of attacks as well.


Resources

Learn

  • The Java XPath API (Elliotte Rusty Harold, developerWorks, July 2006): Read about querying XML from Java programs.

  • W3Schools XPath tutorial: Read a primer on XPath 1.0.

  • Get started with XPath 2.0 (Benoît Marchal, developerWorks, May 2006): Learn how to easily write more sophisticated requests with the new data model.

  • The Web Application Security Consortium: Visit the Web site for an international group of experts, industry practitioners, and organizational representatives who produce open source and widely agreed upon best-practice security standards for the World Wide Web.

  • Security Focus: Find a host of resources for security experts.

  • Blind XPath Injection: Read Amit Klein's description of the Blind XPath injection attack.

  • Apache: See an excellent article on Web services security best practices.

  • Fuzz testing (Elliotte Rusty Harold, developerWorks, September 2006): See what happens when you deliberately inject random bad data into an application to see what breaks, and learn to use defensive coding techniques.

  • developerWorks XML zone: Find hundreds more XML resources.

  • IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.

  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.

  • developerWorks technical events and webcasts: Stay current with technology in these sessions.

  • The technology bookstore: Browse for books on these and other technical topics.
