It's been a great year and a very busy December for me. I have been getting many questions on BTS in the past week, and it is good to see it being used so much. As this will be the last blog entry of 2010, I will discuss a question I was asked about an error that BTS can generate:
(BTSB0) - bts clucene error: Too Many Clauses
This error is generated when CLucene rewrites a search expression
into more than 1024 clauses (the default limit).
Below I discuss what causes this error and how it can be addressed.
First, the cause: if it happens, it usually happens on a wildcard query.
In a previous blog entry I discussed how the input stream of characters is analyzed
into tokens. The CLucene index, on which BTS is based, maintains a dictionary of these tokens on a per-field basis. When a wildcard or fuzzy
query is specified on a field, that field's dictionary is searched for all the tokens that match
the wildcard or fuzzy expression, and the expression is then rewritten as a number
of simple term searches joined with the
Boolean OR operator.
I have a data-set with a large number of distinct tokens that I use for testing. If I search this index
with the wildcard search:
slo*
the query is rewritten as
(slob or slobber or sloe or slog or slogan or sloop)
In this example we see that the wildcard search is replaced with a search expression of six simple term clauses. The wildcard search sl* generates a search expression with some 956 simple terms. And finally, if I search for just s*, there are 32798 matching terms and, with the default limit, the search fails with the Too Many Clauses error.
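To make the rewrite step concrete, here is a minimal Python sketch of the idea. It is not CLucene's actual code: the function name, the exception class, and the toy token set are all made up for illustration, and shell-style matching from the standard library stands in for the real wildcard matcher. It expands a wildcard against a field's token dictionary and refuses to build the expression when the number of clauses exceeds the limit.

```python
from fnmatch import fnmatchcase

# Default clause limit described in this post; the name is illustrative.
DEFAULT_MAX_CLAUSE_COUNT = 1024


class TooManyClauses(Exception):
    """Hypothetical stand-in for the Too Many Clauses error."""


def rewrite_wildcard(pattern, dictionary, max_clause_count=DEFAULT_MAX_CLAUSE_COUNT):
    """Rewrite a wildcard pattern as an OR of the matching dictionary tokens."""
    # Collect every token in the field's dictionary that matches the pattern.
    matches = sorted(t for t in dictionary if fnmatchcase(t, pattern))
    # Each matching token becomes one simple term clause.
    if len(matches) > max_clause_count:
        raise TooManyClauses(
            f"{len(matches)} clauses exceeds the limit of {max_clause_count}"
        )
    return "(" + " or ".join(matches) + ")"


# Toy per-field token dictionary (hypothetical data, not my test index).
tokens = {"slab", "slim", "slob", "slobber", "sloe", "slog", "slogan", "sloop", "slug"}

print(rewrite_wildcard("slo*", tokens))
```

With this toy dictionary the pattern slo* expands to the same six terms shown above. Passing a small limit, for example rewrite_wildcard("s*", tokens, max_clause_count=4), raises the exception, which mirrors what happens when s* expands past the default limit on a real index.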
CLucene has this limit because, as we see with s*, some wildcard expansions can be very large and can use a lot of virtual memory. The limit is a parameter that helps control memory usage. The good news is that it is tunable on a per-index basis. In BTS, if you have indexes for which you want to allow very large query rewrites, you can specify the max_clause_count parameter. For my test index, if I want to search on the wildcard s*, I need to create the index with a max_clause_count larger than 32798. For example:
create index bts_idx on bts_tab (text bts_char_ops)
using bts (max_clause_count="40000") in sbspace1;
Keep in mind that these queries can result in more memory usage, and you may see the server allocating more virtual segments. The number of virtual segments attached can be monitored with onstat -g seg.
For those celebrating the holiday season, I wish you all the best. I will be back here next year with some more interesting information on extensibility in the Informix Server.