Transaction considerations

Consider your environment when deciding how often to update your text indexes from the underlying data in the database tables.

A large consideration for traditional database users is the concept of transaction boundaries and transaction processing. A classic example is a bank transaction where money is transferred from one account to another. The transfer is considered to be a single transaction; either the transfer occurs or it does not. The customer would not appreciate having the money removed from one account but not show up in the other account until some time later. Conversely, the bank would not want the money to show up in both accounts for some amount of time until the transaction was finally completed. The idea is that if a change is made, it is reflected immediately.

There are many examples in information management where delays are more acceptable and even expected. In a traditional data warehouse design, the contents of the data warehouse often lag the operational data by hours or more. This acceptance of delay is based on a combination of realistic expectations of a data warehouse and a desire for predictable, coherent data.

Unstructured text searches lie somewhere between these two boundaries. Practically speaking, indexing text documents is an intensive process to analyze and break down the underlying meaning of the words in the document. Text searches allow a search for ‘mice' to find documents with ‘mouse'. This search result happens due to the ability of the indexing technique to break down words into their underlying meanings. This analysis is done at the time when a document is indexed in order to make subsequent searches as fast as possible.

Users of a traditional database index expect the index to reflect the state of the data in the database table. This same expectation does not hold for a text index. The contents of the text index reflect the state of the table based on the time when the last update (SYSTS_UPDATE) was performed.

In a highly changing environment, it is unlikely that the text index would reflect the current state of the table at any given time. However, in a more predictable environment where the database table is updated less frequently or in batch mode, the text index updates can be timed to perform after the table update, accurately reflecting the state of the table.

It is important to have the appropriate expectation for a text index. Use the UPDATE FREQUENCY option on the text index SYSPROC.SYSTS_CREATE or SYSPROC.SYSTS_UPDATE, or explicitly call the SYSTS_UPDATE procedure to update the contents of the text index appropriately.

For more static environments such as bulk data loads, it makes sense to time the text index update to take place after the bulk load is completed. For more transaction-oriented environments, the UPDATE FREQUENCY value can be set to a short duration or the SYSTS_UPDATE procedure can be invoked frequently. It is normally true that the more frequent the update, the more workload is placed on the machine.