Considerations when specifying text() nodes
The inclusion of text() nodes with XML pattern expressions may affect the generation of index entries. Use /text() consistently in index definitions and predicates.
- How specification of the text() node affects index keys
- Consider the following sample XML document fragment:
<company name="Company1"> <emp id="31201" salary="60000" gender="Female"> <name><first>Laura</first><last>Brown</last></name> <dept id="M25"> Finance </dept> </emp> </company>
If the following index is created with text() specified at the end of the pattern, no index entries will be inserted because the name elements in the sample XML document fragments do not contain text themselves. Text is found only in the child elements, first and last.
CREATE INDEX nameindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/name/text()' AS SQL VARCHAR(30)
However, if the next index is created with the element name specified at the end of the pattern, the text from the first and last child elements will be concatenated in the inserted index entries.
CREATE INDEX nameindex on company(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/name' AS SQL VARCHAR(30)
The presence or absence of the text() node will affect index entry generation for non leaf elements, but not for leaf elements. If you are indexing leaf elements, specifying text() is not recommended. If you do specify text(), queries must also use text() for successful index matching. In addition, schema validation applies only to elements and not to text nodes.
Caution must be used when specifying XML patterns that can match elements which are non leaf nodes without text(). The concatenation of descendant element text nodes can cause unexpected results. Especially specifying //* with an XML pattern will most likely index a non leaf element.
In some cases, concatenation can be useful for indexes using VARCHAR. For example, an index on /title in the following document fragment may be useful for ignoring the bold formatting within the title:
<title>This is a <bold>great</bold> book about XML</title>
A query predicate to look for a specific employee name could be written as follows:
db2-fn:xmlcolumn('COMPANY.COMPANYDOCS')/company/emp[name='LauraBrown']
Whitespace is significant in the predicate and the document. If a space is inserted between 'Laura' and Brown' in the predicate, nothing will be returned for the following query, since the sample XML document fragment itself does not contain a space between the first and last names:
db2-fn:xmlcolumn('COMPANY.COMPANYDOCS')/company/emp[name='Laura Brown']
- Indexes for queries with a compound equal predicate
- The following query retrieves company information for employees
who are in the Finance or Marketing departments.
SELECT companydocs FROM companyinfo WHERE XMLExists('$x/company/emp[dept/text()='Finance' or dept/text()='Marketing']' PASSING companydocs AS "x")
To be compatible, the index over XML data needs to index the text node for the department of each employee among the indexed nodes and to store values as a VARCHAR type.
The query can use this index over XML data:
CREATE INDEX empindex on companyinfo(companydocs) GENERATE KEY USING XMLPATTERN '/company/emp/dept/text()' AS SQL VARCHAR(30)