I have read a lot of the documentation about various indexes and types of queries. However, I've not seen clear explanations about which type of index to use for particular kinds of queries.
For my application, I need to query rather small XML documents. One table has about 3.5 million records in it; the 2 other tables about 2 million. I need to search the text() of the nodes for text, or attributes for a particular integer range or value. I need to use the XQuery functions "starts-with", "ends-with", and "matches" for the text. I would like this query to return results in less than a second.
What indexes should I implement? What type of query should I use (XQuery/SQL)?
I have been trying to find some good examples of queries using the @xpath syntax, but I've not found any. The one query I tried returned an error (which I can't an explanation for):
SQL20423N, The error message is ""IQQS0032E The query cannot be pro". SQLSTATE=38H10
Is there a Technical Article about the proper way to use @xpath?
This topic has been locked.
14 replies Latest Post - 2011-06-17T11:53:15Z by ruchi04
Pinned topic Best indexes and query to use?
Answered question This question has been answered.
Unanswered question This question has not been answered yet.
Updated on 2011-06-17T11:53:15Z at 2011-06-17T11:53:15Z by ruchi04
Re: Best indexes and query to use?2010-02-02T18:44:58Z in response to kevinncHi Kevin,
have you seen the following article:
"Exploit XML indexes for XML query performance in DB2 9"
Among various general guidelines for using XML indexes, this article contains a section on querying text nodes.
Both XQuery and SQL/XML can use the same XML indexes.
Full-text search predicates such as "ends-with", "contains", and so on require the use of a full-text index. DB2 offers two options for full-text search, the "Net Search Extender", and the "DB2 Text Search" features. The following resources provide more information:
Here are two simple examples to get you started with full-text search. The examples are taken from Chapter 19 of the DB2 pureXML Cookbook, and assume that "doc" is the name of an XML column in the table "orders":
Example of a DB2 Net Search Extender query:
SELECT doc FROM orders WHERE CONTAINS(doc, 'SECTION("/order/item/name") "Deluxe" ') = 1;
Example of a DB2 Text Search query:
SELECT doc FROM orders WHERE CONTAINS(doc, '@xpath:' '/order/item/name[. contains("Deluxe")]' ' ')=1;
Re: Best indexes and query to use?2010-02-19T04:58:09Z in response to MatthiasNicolaThanks Matthias,
I had read all but one of those articles (I've now read them all). Conspicuously absent from them are the XQuery functions, "ends-with", "starts-with", etc. I am going to have to use a SQL/XQuery query, as I need to select other data as well as utilize the aforementioned XQuery functions.
I've been doing a lot of "playing around" with queries and exploring tools such as db2expln and db2exfmt. I don't believe some of my queries are utilizing the DB2 Text Search index. Each of these should produce the same results (unless I messed up the syntax):
SELECT DOC FROM ORDERS WHERE CONTAINS(DOC, '@xpath:''/order/item/name'' ')=1
XQUERY for $i in db2-fn:sqlquery("select DOC FROM ORDERS WHERE CONTAINS(DOC, '@xpath:''//namehttp://. contains("Deluxe")''')=1") where $i//name return $i
These take about 45 seconds, each. I assume these two are doing the same thing, as any XQuery function took about the same amount of time ("starts-with", "ends-with"). That is, it pulls out the XML via sqlquery and then processes it with the XQuery function, right?
xquery for $i in db2-fn:xmlcolumn("ORDERS.DOC") where $i//name return $i
This one takes about 8 minutes. I assume it's not using an index at all. I have a SQL VARCHAR index on "name", but it didn't seem to use it. Should/could it have used the VARCHAR index for #3?
SELECT DOC FROM ORDERS WHERE CONTAINS(DOC, '@xpath:''//name'' ')=1 fetch first 50 rows only
This one produced results almost immediately, but doesn't give me the XQuery functions.
Also, I seemed to have run into a roadblock with searching for integer attribute values. According to db2exfmt, it used a text index on an integer value:
SELECT DOC FROM ORDERS WHERE CONTAINS(DOC, '@xpath:''//days
@shipping > 2'' ')=1
I have a SQL double index on //days/@shipping, but it didn't seem to use it. Can a text index "know" that 3 is greater than 2?
For my initial testing (a few months ago, v9.5), I had used Net Search Extender with a custom document model. However, it had a flaw. Specifically, there wasn't a way to search for the text() of the node. I.e., if I had an index on this (<XMLFieldDefinition name="street" locator="//street" />):
And I did a search for "street":
SELECT doc FROM ORDERS WHERE CONTAINS(DOC, SECTIONS("street")“street”)=1
...it would return both documents, even though I only wanted the second one. Is there a way to just search the text()?
So, I'm not sure where to go from here. I'd like my results returned in a matter of seconds. I might have some more bottle-necks (like memory, disk I/O), but the fact that #4 give me results very quickly give me hope. Should I add more indexes? SQL Select into a TEMPORARY table and then process with XQuery? Are there indexes I can use to do substring matches (XQuery starts-with/ends-with/contains)? Any help/suggestions are appreciated.
Re: Best indexes and query to use?2010-02-22T08:35:30Z in response to kevinncHi Kevin,
I also posted some text search examples in my blog, including examples of expressing starts-with and ends-with predicates for the DB2 text index:http://nativexmldatabase.com/2010/02/21/get-started-with-xml-full-text-search-in-db2/
Let me make sure I understand the intention of your queries. Your queries 1), 3), and 4) do not specify a value to search for. So it looks like your intention is to find all documents that have a "name" element regardless of its value, is that correct? I don't know exactly what your XML documents look like, but chances are that all or many of them contain such a "name" element, in which case the use of an index is not very helpful, and the number of matching documents certainly influences the query response time quite a bit.
There are differences between "structural predicates", which look for the existence of some element, and "value predicates", which look for specific values within the XML elements.
Based on your query 3), here are examples of what I call a structural vs. a value predicate:
Structural predicate: where $i//name
Value predicate: where $i//name = "Kevin"
Structural predicates tend to be less common than value predicates.
Some comments on your queries:
Your 1st this query should use the DB2 Text Search index, if the index has been created and updated correctly.
In XQuery you can also use the function db2-fn:xmlcolumn-contains, which makes the query a little bit simpler. Here is an example:
xquery for $i in db2-fn:xmlcolumn-contains( 'ORDER.DOC', '@xpath:' '/Order/item/name[. contains("Shovel")]' '' ) return $i/Order;
The function db2-fn:xmlcolumn-contains specifies a text search predicate that is evaluated by the text index. The index indentifies a matching set of documents, hopefully a small subset of all documents in your table. The surrounding XQuery then operates on those documents only. The key thing is that the text index avoids a table scan over all documents.
Your 3rd query cannot use an XML index of type VARCHAR as is, because the predicate is a structural predicate, not a value predicate. But, it's easy to express a structural predicate as an equivalent value predicate. The following query looks for "name" elements whose value is greater than the empty string. This means, any document that contains a "name" element will be found:
xquery for $i in db2-fn:xmlcolumn( "ORDERS.DOC") where $i //name >= "" return $i;
This query can potentially use the following index:
create index idx1 on orders(doc) generate keys using xmlpattern '//name' as sql varchar(200);
However, if all or many documents contain a "name" element, i.e. if the predicate is not very selectve, then DB2 may decide that a table scan provides better performance than using an index.
Also note that using the double slash // tends to be more costly than using a fully specified xpath. For best performance, use fully specified xpaths where possible.
How many documents do you have in your table and how many of them contain a "name" element? The selectivity of your search predicate has a big impact on performance! If most or all of your documents contain a "name" element, then Query 4 is faster than the other queries simply because it only returns 50 matches, and not all matches.
Yes, the text index can evaluate numeric predicates on XML attributes. In your example, you coded the numeric predicate as a text search predicate using the "contains" function, and hence only the text index can be used.
Alternatively you can code a regular XML predicates:
select doc from order where xmlexists( '$DOC//days[@shipping > 2]'); xquery for $i in db2-fn:xmlcolumn( "ORDER.DOC") where $i //days/@shipping > 2 return $i;
Such queries can then use the following index:
create index idx2 on orders(doc) generate keys using xmlpattern '//days/@shipping' as sql double;
Hope this helps. Let us know how things go.
Re: Best indexes and query to use?2010-02-23T02:51:46Z in response to MatthiasNicolaThanks, Matthias,
Sorry, you're right, my queries make no sense. :-) I think the forum coding screwed it up*. Basically, with #1-4, I wanted to test different queries that would "pull out all documents where 'name' contained the word 'Deluxe' in it". Let's see if this comes through better:
SELECT DOC FROM ORDERS WHERE CONTAINS(DOC, '@xpath:' '/order/item/name[. contains("Deluxe")]' ' ')=1
XQUERY for $i in db2-fn:sqlquery( "select DOC FROM ORDERS WHERE CONTAINS(DOC, '@xpath:''//name[. contains("Deluxe ")''')=1") where $i //name[contains(., "Deluxe")] return $i
XQUERY for $i in db2-fn:xmlcolumn( "ORDERS.DOC") where $i //name[contains(., "Deluxe")] return $i
SELECT DOC FROM ORDERS WHERE CONTAINS(DOC, '@xpath:' '//name[contains(., "Deluxe")]' ' ')=1 fetch first 50 rows only
5) I think this one came through "well enough". I did some testing with the newly formated query and it is fast.
I had a Net Search Extender that I was starting to play with. Now that I see a Text Search will get me closer, I'm reindexing using it. I will post another post when it gets done...Thanks again!!
- Why doesn't the Forum Help page tell me how to put stuff in pretty code-blocks? Why can't I preview a message prior to posting? Any why is the "Save draft" button grayed out for me? And why can't I edit/delete my message once posted? Arrrgh! (Sorry for the rant...I know this isn't your area, but this would make life easier for us both.)
Re: Best indexes and query to use?2010-02-23T03:29:45Z in response to kevinncHi Kevin,
I fully understand your frustration about the editing and display capabilities of this forum. For me I find that the forum works better in IE than in Firefox. When you edit a post you should see a legend with editing help on the right, and there should be a "preview" button at the top of the text box where you are writing. The "preview" helps. I never see the "preview" button in Firefox. Sometimes the preview button is missing even in IE, and when that happens I don't even start writing but discard the post right away and try again. On the 2nd try the "preview" button usually appears.
There is also a "Browse..." button to attach files. This is a nice option to provide your code samples without having to deal with the formatting.
Now that I see your queries better, I can offer a few more comments.
Query 1 can use the text index and should be fine. If this is slow, check the selectivity of the predicate and the size of the result set.
Query 2 can use the text index for the contains function in the SQL "where" clause within the db2-fn:sqlquery function, because that predicate is written for the text index. Query 2 cannot use any index for the XQuery contains function in the "where" clause of the flwor expression. That second contains predicate is redudant and can be omitted in this query.
Query 3 cannot use any index because the contains predicate is not written for the text index. Query 2 and 3 would better use db2-fn:xmlcolumn-contains.
Query 4: same as Query 1 but faster due to smaller result set.
Good luck! And yes, do post again when you have progress.
Re: Best indexes and query to use?2010-02-25T09:00:33Z in response to MatthiasNicolaHi Matthias,
LOL. Ok, now I see the Plain Text Help and the Preview in Firefox. My default browser is Safari (I'm running Mac OS X) and nothing showed up. Even with FF, I had to do the Discard trick to see both. Ugh.
As for the other questions/suggestions. I don't know the structure of my XML, so I will have to use the double-slash // to search the document. Most of my documents in each table have many of the same elements in them. There are 3 tables--one has about 3.5 million rows in it (the table I'm running tests on); another one has about 470,000; the smallest one has about 160,000. Each document is pretty small, about 20-50k.
Thanks for the blog post. It's interesting that in your last example that uses "starts-with", you return all documents that "contains" with the db2-fn:xmlcolumn-contains function, and then use the XQuery function to filter the desired ones out. I'm not sure how well that will work for me, as it may be a lot of docs that are pulled out unnecessarily. I may have to try to FETCH a certain number of rows, process it with the XQuery, see if I have my desired number of results and FETCH again. (Or, I may use my application to pull out the desired results.) I am pulling out whole words, so I won't use a wildcard. Using a wildcard in my tests slowed things considerably (as I'd expect).
Like you say, #1 and #4 are the same. But, #4 returns results almost immediately. #1 has a long pause prior to returning all of the results. It gets to about 75% done and it pauses for several seconds. However, if I add "OPTIMIZE FOR 100 ROWS", #1 is just as fast. Why? (FWIW, many of the queries I've tried return fewer than 100 results.) Is there any harm in using OPTIMIZE FOR n ROWS? Also, is there a way to use OPTIMIZE FOR with 'db2-fn:xmlcolumn-contains'?
Another thing I've noticed is that when trying to return a subset of documents via XQuery, it takes a lot longer than if I return the entire set. Using the 'subsequence' function:
XQUERY let $results := ( for $i...return $i) return subsequence($results, 1, 50)
Without the 'subsequence', my query is quite fast. With the subsequence, it's very slow. Why? I've also tried using the position() function of returning a sub-set of results, but it was also slow. What's the best way of limiting the number of results when using XQuery?
Re: Best indexes and query to use?2010-02-27T23:27:16Z in response to kevinncHi Kevin,
the "optimize for n rows" clause can affect the query execution plan that is chosen by the DB2 optimizer, which may explain some of the differences you are seeing. As far as I understand your usage scenario, I think there is no harm in using "optimize for n rows". Applications that use text search often only want to display the first 20 or 50 results to the users, so "optimize for n rows" can make a lot of sense.
In your example with the subsequence() function, I suspect that the entire result of the "for $i ...return $i" gets computed and assigned to the variable $results before the desired subsequence is computed. That's probably something that could be optimized in DB2. For now you can wrap your XQuery in an XMLTABLE function and use the SQL clause "fetch first n rows only":
from xmltable('for $i in ... return $i')
fetch first 50 rows only;
Maybe this works for you.
Re: Best indexes and query to use?2011-06-08T11:56:34Z in response to kevinncI have been having the following problem with text search for querying an xml column:
my tag/attribute name have a "name.number" format for eg: <PID.28><CWE.1>Nationality</CWE.1></PID.28>
On trying to query xpath it gives me error: SQL20423N, The error message is ""IQQS0032E The query cannot be pro
SELECT MESSAGE_ID, MESSAGE_TYPE
'@xpath:''/ORM_O01/ORM_O01PATIENT/PID/PID.28/CWE.1 ''') = 1
However, it gives correct result for the ones without a dot in atttribute name.
SELECT MESSAGE_ID, MESSAGE_TYPE
'@xpath:''/ORM_O01/ORM_O01PATIENT/PID/PID13/XTN2 ''') = 1
I am trying to find out if there is a way to escape the "dot" in xml attribte name while querying for the xpath?
Bryan_P 120000JRSS27 PostsACCEPTED ANSWER
Re: Best indexes and query to use?2011-06-09T19:53:03Z in response to kevinnchi Ruchi04,
It is generally not advised to use certain characters, such a "dot", in XML element names because certain software can misinterpret the character. The following statement is from XMLSCHOOLS.COM:
"Avoid "." characters. If you name something "first.name," some software may think that "name" is a property of the object "first." "
That being said, you have two choices (the first may not be feasible):
1. replace the element names that contain "." with a more suitable name
2. escape (as you asked) - again using W3SCHOOLS as a reference it shows "." as the escape for the period (dot) character.
BTW, your post got appended to a previous discussion topic so it might be overlooked by others.
Re: Best indexes and query to use?2011-06-10T09:39:04Z in response to Bryan_PHi Bryan,
Its not possible to replace the dot with something else.. Its basically a HL7 message that is converted into xml.
I tried using .. It did not work for me.
SELECT author, bookno
'@xpath:''/bookinfo/message/message.11/pt.1/''') = 1
I am not sure if this escape sequence works for db2 text search
Re: Best indexes and query to use?2011-06-10T16:43:34Z in response to ruchi04Hi,
although using dots in element names is not a good practice, I think that the DB2 Text Search ought to be able to handle it. This might be a bug in DB2 Text Search and might require a fix. We're looking into it, but please do open a PMR if you can.
Re: Best indexes and query to use?2011-06-17T11:53:15Z in response to MatthiasNicolaHi Matthias,
Do we have a work around for escaping the dot in node name..I have tried a couple of escape sequences. It did not work. Also it is not possible for us to change the XML attribute name.