More advanced XPath
Most of the time, you can do what you need with just the techniques already covered. It's not unusual, however, to run into a situation that requires you to be more specific. This section shows you how to use predicates to select nodes based on specific criteria, and introduces you to some of the functions built into XPath.
Often, you want not just any node, but a specific node based on specific conditions. You saw an example of that earlier, when you used the expression /recipes/recipe[2]//instructions. That's really the abbreviated version of /recipes/recipe[position() = 2]//instructions, and what it means is that you're asking the XPath processor to go through each recipes element (of which there is, of course, only one), and for each recipes element go through each recipe element. For each recipe element, it checks whether the expression position() = 2 is true. (In other words, is this the second recipe in the list?) If that statement, called the predicate, is true, the processor uses that node and moves on, returning any instructions elements beneath it.
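The positional predicate can be tried out with a minimal sketch in Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath that includes numeric position predicates. The SAMPLE document below is an assumed, simplified stand-in for the tutorial's recipes file; its step text is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the recipes document (not the tutorial's real file).
SAMPLE = """<recipes>
  <recipe recipeId="1">
    <name>Gush'gosh</name>
    <instructions><step>Brown the meat.</step></instructions>
  </recipe>
  <recipe recipeId="2">
    <name>A balanced breakfast</name>
    <instructions><step>Pour the cereal.</step></instructions>
  </recipe>
</recipes>"""

root = ET.fromstring(SAMPLE)  # root is the <recipes> element

# Same idea as /recipes/recipe[2]//instructions, written relative to root:
# keep only the second recipe child, then collect its instructions descendants.
second = root.findall("./recipe[2]//instructions")

for instructions in second:
    print(instructions.findtext("step"))
```

ElementTree paths are evaluated relative to the element you call them on, which is why the path starts at ./recipe rather than /recipes/recipe.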
You can do a variety of things with predicates. For example, you might return only recipes that have a name: /recipes/recipe[name]. This expression tests only for the existence of a name element that is a child of the recipe element. You can also look for specific values. For example, you can return only the recipe named "A balanced breakfast": //recipe[name="A balanced breakfast"].
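Both kinds of predicate, the existence test and the value test, fall within the XPath subset that Python's xml.etree.ElementTree understands, so they can be sketched as follows. The SAMPLE document is an assumed stand-in, and the third, unnamed recipe is purely hypothetical, added so that the existence test has something to filter out.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in document; recipe 3 is hypothetical and has no name.
SAMPLE = """<recipes>
  <recipe recipeId="1"><name>Gush'gosh</name></recipe>
  <recipe recipeId="2"><name>A balanced breakfast</name></recipe>
  <recipe recipeId="3"><instructions/></recipe>
</recipes>"""

root = ET.fromstring(SAMPLE)

# /recipes/recipe[name]: keep recipes that have a name child at all.
with_name = root.findall("./recipe[name]")

# //recipe[name="A balanced breakfast"]: keep recipes whose name has this text.
breakfast = root.findall(".//recipe[name='A balanced breakfast']")

print([r.get("recipeId") for r in with_name])
print([r.tag for r in breakfast])
```

In both cases the result is a list of recipe elements, not name elements, because the predicate only filters the step it is attached to.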
Note that the predicate only tells the processor whether to return the actual node, so in this case, what is returned is the recipe element and not the name. On the other hand, you can tell the processor to return only the name of the first recipe with either of these two expressions (see Listing 22).
Listing 22. Returning only the name of the first recipe
//recipe[@recipeId='1']/name
//name[parent::recipe[@recipeId='1']]
In the case of the first expression, you first select all of the recipe elements, then keep only the one that has a recipeId attribute of 1. Once you find that node, you move to its child node called name, and return that. In the case of the second expression, you find all of the name elements, then select only the one that has a parent with a recipeId attribute of 1. In either case, you get the same output (see Listing 23).
Listing 23. Output
<?xml version="1.0" encoding="UTF-8"?>
<name>Gush'gosh</name>
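The two routes to the same name element can be compared with a short Python sketch. The first route maps directly onto ElementTree's XPath subset. The second has to be done by hand, because ElementTree cannot test a parent's attributes from inside a predicate, so a parent lookup table (parent_of, a name chosen here for illustration) stands in for that step. The SAMPLE document is an assumed stand-in.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the recipes document.
SAMPLE = """<recipes>
  <recipe recipeId="1"><name>Gush'gosh</name></recipe>
  <recipe recipeId="2"><name>A balanced breakfast</name></recipe>
</recipes>"""

root = ET.fromstring(SAMPLE)

# Route 1: filter the recipes by attribute, then step down to the name child.
route1 = root.findall(".//recipe[@recipeId='1']/name")

# Route 2: start from every name element and keep those whose parent is a
# recipe with recipeId 1. ElementTree elements do not know their parent,
# so build a child-to-parent map first.
parent_of = {child: parent for parent in root.iter() for child in parent}
route2 = [
    name for name in root.iter("name")
    if parent_of[name].tag == "recipe"
    and parent_of[name].get("recipeId") == "1"
]

print([n.text for n in route1])
print([n.text for n in route2])
```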
XPath also provides a number of different functions. Some of them relate to the nodes themselves, such as those that look at position, some of them manipulate strings, some relate to numbers, such as sums, and some relate to boolean values.
The nodeset-related functions help you to do things like choose a particular node based on position. For example, you can specifically request the last recipe: //recipe[last()]. The expression selects all of the recipe elements, and then returns only the last one. You can also use functions on their own, as opposed to as part of a predicate. For example, you can specifically request the count of recipe elements: count(//recipe).
You've already looked at the position() function and how that works. Other nodeset-related functions include id(), name(), local-name(), and namespace-uri().
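As a rough illustration of last() and of counting nodes, here is a sketch using Python's xml.etree.ElementTree. Its XPath subset accepts last() inside a predicate, but it has no count() function, so the length of the returned list stands in for it. The SAMPLE document is an assumed stand-in.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the recipes document.
SAMPLE = """<recipes>
  <recipe recipeId="1"><name>Gush'gosh</name></recipe>
  <recipe recipeId="2"><name>A balanced breakfast</name></recipe>
</recipes>"""

root = ET.fromstring(SAMPLE)

# recipe[last()]: only the final recipe among its siblings.
last_recipe = root.findall("./recipe[last()]")

# ElementTree cannot evaluate count(//recipe); counting the matches in
# Python gives the same number.
recipe_count = len(root.findall(".//recipe"))

print(last_recipe[0].findtext("name"))
print(recipe_count)
```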
Most of the string functions are for manipulating strings rather than testing them, with the notable exception of the contains() function. The contains() function tells you whether a particular string is part of a larger whole. For example, you can return only the nodes that contain a particular string, such as: //recipe[contains(name, 'breakfast')]
This expression returns the recipe element that has the string "breakfast" in its name element.
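The effect of a contains() predicate can be approximated in Python. ElementTree's XPath subset does not evaluate XPath functions, so this sketch filters in Python instead: the helper recipes_containing is a hypothetical name, and the substring test uses Python's in operator on the text of each recipe's first name child. The SAMPLE document is an assumed stand-in.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the recipes document.
SAMPLE = """<recipes>
  <recipe recipeId="1"><name>Gush'gosh</name></recipe>
  <recipe recipeId="2"><name>A balanced breakfast</name></recipe>
</recipes>"""

root = ET.fromstring(SAMPLE)


def recipes_containing(root, text):
    """Return recipe elements whose first name child contains text."""
    return [
        recipe for recipe in root.iter("recipe")
        # findtext returns None when there is no name child at all.
        if text in (recipe.findtext("name") or "")
    ]


matches = recipes_containing(root, "breakfast")
print([r.findtext("name") for r in matches])
```

Like contains() in XPath, the comparison is case-sensitive, so "Breakfast" would not match here.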
The substring() function enables you to select a specific range of characters from a string. For example, the expression substring(//recipe/name, 1, 5) returns Gush'. The first argument is the complete string (when it is a node-set, as here, the string value of the first node in document order is used), the second is the position of the first character, counting from 1, and the third is the number of characters to return.
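Because substring() counts from 1 and takes a length rather than an end index, it is easy to get wrong when you are used to zero-based slicing. The following Python sketch emulates the XPath 1.0 rule for ordinary finite numbers (it ignores NaN and infinite arguments). The function names xpath_round and xpath_substring are hypothetical helpers, not part of any library.

```python
import math


def xpath_round(value):
    """Round to the nearest integer, with ties going toward positive infinity."""
    return math.floor(value + 0.5)


def xpath_substring(text, start, length=None):
    """Emulate XPath 1.0 substring() for finite numeric arguments.

    A character at 1-based position p is kept when
    round(start) <= p < round(start) + round(length).
    """
    first = xpath_round(start)
    end = len(text) + 1 if length is None else first + xpath_round(length)
    # Clamp the window to the positions that actually exist in the string.
    low = max(first, 1)
    high = min(end, len(text) + 1)
    return text[low - 1:high - 1] if high > low else ""


print(xpath_substring("Gush'gosh", 1, 5))
print(xpath_substring("12345", 0, 3))
```

The second call shows a consequence of the rule: a start position of 0 lies before the string, so one of the three requested positions is lost and only two characters come back.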
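Other string functions include concat(), starts-with(), substring-before(), substring-after(), string-length(), normalize-space(), and translate().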
Numeric functions include the number() function, which converts a value into a numeric value so that other functions can act on it. Numeric functions also include sum(), floor(), ceiling(), and round(). For example, you can find the sum of all of the recipeId values with the expression: sum(//recipe/@recipeId)
True, there's not much reason to do such a calculation, but it's the only numeric value in the sample document.
The floor() function finds the greatest integer that is less than or equal to the supplied value, while the ceiling() function goes in the other direction, finding the smallest integer that is greater than or equal to it. The round() function rounds to the nearest integer in the traditional way (see Listing 24).
Listing 24. Results of numeric functions
floor(42.7) = 42
ceiling(42.7) = 43
round(42.7) = 43
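The numeric functions can be mirrored in Python as a quick check. This is a sketch under a few assumptions: the SAMPLE document is a stand-in whose second recipeId value of 2 is assumed, xpath_round is a hypothetical helper, and the sum is computed in Python because ElementTree does not evaluate sum(). XPath 1.0 rounds ties toward positive infinity, which differs from Python's built-in round().

```python
import math
import xml.etree.ElementTree as ET

# Assumed stand-in for the recipes document; the second recipeId is assumed.
SAMPLE = """<recipes>
  <recipe recipeId="1"><name>Gush'gosh</name></recipe>
  <recipe recipeId="2"><name>A balanced breakfast</name></recipe>
</recipes>"""

root = ET.fromstring(SAMPLE)


def xpath_round(value):
    """Nearest integer, with ties going toward positive infinity."""
    return math.floor(value + 0.5)


print(math.floor(42.7), math.ceil(42.7), xpath_round(42.7))

# Ties are where XPath and Python disagree: Python's round() sends 2.5
# to the nearest even integer, while the XPath rule rounds it up.
print(xpath_round(2.5), round(2.5))

# Equivalent of sum(//recipe/@recipeId): convert each attribute to a
# number (as number() would) and add them up.
id_total = sum(float(r.get("recipeId")) for r in root.iter("recipe"))
print(id_total)
```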
Boolean functions are most useful when you get into conditional expressions, which you'll look at in Conditional processing. Perhaps the most useful is the not() function, which negates a test, so you can use it to tell whether a particular node does not exist. For example, the expression //recipe[contains(name, 'breakfast')] returns every recipe that has the string "breakfast" in its name element. But what if you wanted all of the recipes that were not for breakfast? You could use the expression: //recipe[not(contains(name, 'breakfast'))]
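Negating a test can be sketched in Python as well. ElementTree evaluates neither not() nor contains(), so the filter below is written in plain Python with the test inverted. The SAMPLE document is an assumed stand-in.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the recipes document.
SAMPLE = """<recipes>
  <recipe recipeId="1"><name>Gush'gosh</name></recipe>
  <recipe recipeId="2"><name>A balanced breakfast</name></recipe>
</recipes>"""

root = ET.fromstring(SAMPLE)

# Keep every recipe whose name does NOT contain "breakfast".
# A recipe with no name child also passes, just as not(contains(...))
# is true in XPath when the name node-set is empty.
not_breakfast = [
    recipe for recipe in root.iter("recipe")
    if "breakfast" not in (recipe.findtext("name") or "")
]

print([r.findtext("name") for r in not_breakfast])
```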
Other boolean functions include true() and false(), which return constants, and boolean(), which converts a value into a boolean so it can be used as a test value.