Text search index creation, updates and property alterations

Text search index creation is the process of defining the properties of a text index. After you create a text search index, you must update it by adding data from the table that it is associated with. You can also alter some properties of the text search index later, such as the UPDATE FREQUENCY or UPDATE MINIMUM parameters.

You can use a text search index to search through the data in a text column using text search functions. A text search index consists of significant terms that are extracted from text documents. The primary key of the table row is used in the index to identify the source of the terms.

Immediately after its creation, a text search index contains no data. You add data to a text search index by using the db2ts UPDATE INDEX command or the SYSTS_UPDATE administrative SQL routine. The first index update, also known as initial update, adds all text documents in a text column to the text search index. Subsequent updates, also known as incremental updates, synchronize the data in the base table with the text search index.

In the following example, a user creates a text search index called MYSCHEMA.PRODUCTINDEX on the PRODUCT table in the SAMPLE database. Creating a text search index and then performing initial and incremental updates shows that the index is empty until the user performs an initial update and that as the user adds data to the table, an incremental update must be run to add the new data to the text search index.
Figure 1. Creating a text search index and then performing initial and incremental updates
Creating a text search index and then performing initial and incremental updates
DB2® Text Search provides two methods for synchronizing a text index with its table:
  • The basic synchronization method uses triggers that automatically store information about new, changed, and deleted documents in a staging table. There is one staging table for each text index.

    Figure 2. Incremental update with triggers
    Incremental update with triggers

    Because the basic method uses only triggers, updates that are not recognized by triggers are ignored, for example, loading data with the LOAD command and attaching or detaching the ranges of a range-partitioned table.

  • The extended synchronization method uses a trigger to store information about changed documents in a staging table but captures information about new and deleted documents through integrity processing and stores that information in a text-maintained auxiliary staging table. If you attach a partition or load data, you must then issue the SET INTEGRITY command on the base table to make data available in the auxiliary staging table. As for the case when a partition is detached, the staging table then requires another SET INTEGRITY command to make the data accessible for processing. Alternatively, a RESET PENDING command on the base table can be used to make the data accessible in all its auxiliary staging tables. The base table is fully accessible for read and write operations while the command is executing. If you detach a partition, you must issue the RESET PENDING command on the base table or the SET INTEGRITY command on each of the staging tables.

    Figure 3. Incremental update with triggers and integrity processing
    Incremental update with triggers and integrity processing

Some database operations implicitly or explicitly invalidate the text search index. An explicit invalidation will set the status of the text search index INDSTATUS='INVALID' in the SYSIBMTS.TSINDEXES administrative view, for example, the command ALTER DATABASE PARTITION GROUP. An implicit invalidation occurs when content changes bypass the staging mechanism, for example, if a LOAD INSERT is used without the extended staging infrastructure. An implicit invalidation will not mark the text search index as invalid.

You can update the text index by using a manual or automatic option. The automatic option uses an update schedule with specified days and times. You can manually update the text search index by issuing the UPDATE INDEX FOR TEXT command or the SYSPROC.SYSTS_UPDATE procedure. The text search index is updated asynchronously, that is, outside the transaction that inserts, updates, or deletes data in the database. Asynchronous text search index update processing improves throughput and concurrency because multiple updates can be batched and applied to a copy of the affected text index segments. The text search index is then only locked for read access for a short period of time while the updated index segments are put in place of the original.

Text search indexes are reorganized automatically as needed; in addition, you can explicitly trigger a reorganization with the adminTool or re-create an index with the ALLROWS option when you update it.