XML Path language

XML Path Language (XPath) is a language for addressing parts of an XML document. It is a W3C Recommendation, and it is well known and commonly used in XML applications. XPath is used here to specify location path expressions, which cover most areas of an XML document.

The following are considered nodes in the XPath language:
  • Document root
  • Elements
  • Attributes that are not namespace declarations
  • Processing instructions (PIs)
  • Comments
  • Text
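
As an illustration of the node kinds listed above, the following minimal Python sketch walks a small document with the standard library's xml.dom.minidom and reports each XPath node it finds. The function name xpath_nodes, the KIND table, and the sample document are hypothetical names introduced for this example only; they are not part of the parser described here. Namespace declarations are skipped by checking for attribute names that are "xmlns" or begin with "xmlns:".

```python
from xml.dom import Node, minidom

# Map DOM node types to the XPath node kinds listed above.
KIND = {
    Node.DOCUMENT_NODE: "document root",
    Node.ELEMENT_NODE: "element",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.TEXT_NODE: "text",
}

# Of the kinds above, only elements and PIs carry a name.
NAMED = (Node.ELEMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE)


def xpath_nodes(node):
    """Yield (kind, name) for each XPath node at or below `node`."""
    kind = KIND.get(node.nodeType)
    if kind is not None:
        yield kind, (node.nodeName if node.nodeType in NAMED else None)
    if node.nodeType == Node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            # Namespace declarations are not attribute nodes in XPath.
            if attr.name == "xmlns" or attr.name.startswith("xmlns:"):
                continue
            yield "attribute", attr.name
    for child in node.childNodes:
        yield from xpath_nodes(child)


# Hypothetical sample document, kept free of whitespace-only text.
SAMPLE = '<?xml version="1.0"?><!--note--><a xmlns:p="urn:x" id="1"><?pi data?>hi</a>'
nodes = list(xpath_nodes(minidom.parseString(SAMPLE)))
for kind, name in nodes:
    print(kind, name)
```

In this sketch the XML declaration and the xmlns:p declaration produce no entries, which matches the list above.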
During a parse, a construct becomes XPath identifiable once enough characters have been parsed to uniquely recognize the type of node and its name (if it has one). The following are the points where this occurs for each node:
Document root
This is identifiable when the first non-XML declaration structure is discovered. This will never have a corresponding name.
Element
This is identifiable after the element type is fully parsed. This will be when whitespace, a '>' character, or a '/' character is encountered after the element type.
Attribute
This is identifiable after the attribute name is fully parsed. This will be when either whitespace or an '=' character is encountered after the attribute name.
PI
This is identifiable after the PI target is fully parsed. This will be when either whitespace or a '?' character is encountered after the PI target.
Comment
This is identifiable after the beginning of the comment markup is parsed. This will be after the "<!--" characters are encountered.
Text
This is identifiable after the “>” of markup within element nodes.
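
The identification points for elements, PIs, and comments can be sketched as a check on the characters buffered so far. This is a simplified illustration, not the parser's actual implementation: the function identify_markup and the NAME pattern are hypothetical, names are restricted to ASCII, and the buffer is assumed to start at a '<' that does not begin the XML declaration or a doctype declaration.

```python
import re

# Simplified ASCII-only name pattern; real XML names allow more characters.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"

# An element is identifiable once whitespace, '>' or '/' follows its type.
ELEMENT = re.compile(rf"<({NAME})[\s>/]")
# A PI is identifiable once whitespace or '?' follows its target.
PI = re.compile(rf"<\?({NAME})[\s?]")
# A comment is identifiable as soon as its opening markup is complete.
COMMENT = re.compile(r"<!--")


def identify_markup(buffered):
    """Return (kind, name) if the buffered markup is identifiable, else None.

    `buffered` holds the characters parsed so far, starting at a '<'.
    None means more characters are needed before the node is identifiable.
    """
    if COMMENT.match(buffered):
        return ("comment", None)
    match = PI.match(buffered)
    if match:
        return ("processing instruction", match.group(1))
    match = ELEMENT.match(buffered)
    if match:
        return ("element", match.group(1))
    return None
```

For example, identify_markup("<book") returns None because the element type could still continue, while identify_markup("<book>") reports an element named book.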
Some structures within an XML document are not identifiable using the XPath language. The following constructs are not XPath identifiable:
XML declaration
The path location string will be 0 length. This is recognized when "<?xml " is encountered at the beginning of the document and before the subsequent "?>" is encountered.
Doctype declaration
The path location string will be 0 length. This is recognized when "<!DOCTYPE " is encountered and before the subsequent ].
Namespace declarations
The path location string will denote the containing element node. This is recognized when "xmlns" is encountered where an attribute name is expected.
While text nodes are XPath identifiable, they will not be uniquely denoted. Instead, the location path will denote the containing parent element node.
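
The rules above can be illustrated with a short Python sketch over xml.dom.minidom nodes. The function location_path and the slash-separated path format ("/a/b", "/a/@id") are assumptions made for this example; the actual format of the path location string is not specified in this section. Comments and PIs are left out of the sketch.

```python
from xml.dom import Node, minidom


def location_path(node):
    """Return an illustrative location path string for a minidom node."""
    kind = node.nodeType
    if kind == Node.DOCUMENT_TYPE_NODE:
        # Doctype declaration: not XPath identifiable, zero-length path.
        return ""
    if kind == Node.DOCUMENT_NODE:
        return "/"
    if kind == Node.TEXT_NODE:
        # Text is not uniquely denoted; use the containing element.
        return location_path(node.parentNode)
    if kind == Node.ATTRIBUTE_NODE:
        owner = location_path(node.ownerElement)
        if node.name == "xmlns" or node.name.startswith("xmlns:"):
            # Namespace declaration: denote the containing element.
            return owner
        return f"{owner}/@{node.name}"
    if kind == Node.ELEMENT_NODE:
        parent = location_path(node.parentNode)
        return parent.rstrip("/") + "/" + node.nodeName
    raise ValueError("node kind not covered by this sketch")


# Hypothetical sample document.
doc = minidom.parseString('<a xmlns:p="urn:x" id="1"><b>hi</b></a>')
a = doc.documentElement
text = a.firstChild.firstChild
print(location_path(text))
print(location_path(a.getAttributeNode("xmlns:p")))
print(location_path(a.getAttributeNode("id")))
```

Under these assumptions, the text node and its parent element b share one path, and the xmlns:p declaration shares the path of element a.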

For more information on the format of the auxiliary information records, see Metadata records.