DB2 Text Search enables you to issue SQL and XQuery statements to perform text search queries on data stored in a DB2 database. After creating a text index on a table column, the initial update will populate the index from the table content. However, unlike other database indices, text indices are not updated immediately when a document is inserted, updated, or deleted. Instead, to ensure proper search results, an incremental update must be applied to synchronize the content of the text index with the content of the table column. Instead of manually starting incremental updates, text indices can be scheduled for automatic updates.
DB2 Text Search provides two interfaces to initiate an index update: one through a command-line interface, the other a stored procedure interface. Any scheduling mechanism that allows invoking such an operation can be used to integrate text search index updates into scheduling workflows. For example, for a text index TEST.MYTXIDX in database TESTDB, the update can be started with one of the following operations:
db2ts update test.mytxidx for text connect to testdbcall sysproc.systs_update('TEST', 'MYTXIDX', '', 'en_US', ?)call sysproc.systs_admin_cmd('update index test.mytxidx for text', 'en_US', ?)
As an integrated scheduling mechanism, DB2 Text Search in DB2 10.1 provides the
option to schedule tasks for the DB2 Administrative Task Scheduler by specifying an
update frequency when creating or altering a text search index. The automatically
created schedule tasks execute a call to sysproc.systs_admin_cmd('update index indexname for
text with update minimum', language ,
?). The DB2 Administrative Task Scheduler will trigger the update task according to the
specified frequency as long as the database is active. This occurs independent of
the text search server status.
For indices that require other update options (for example, if the allrows option is used to force a rebuild instead of an
incremental update), the scheduling tasks must be created manually.
How to use the DB2 Administrative Task Scheduler
This section provides a short summary of core scheduler features pertinent to text search index scheduling. For further details, see The administrative task scheduler.
The administrative task scheduler is disabled by default and must, therefore, be enabled before text index updates can be scheduled. To enable the scheduler:
- Set the
DB2_ATS_ENABLEregistry variable to YES, TRUE, 1 or ON. - Ensure that the tablespace
SYSTOOLSPACEexists.
After setting the DB2_ATS_ENABLE registry variable, the instance must be restarted for the variable to take effect. Similar to several other tools and procedures, the scheduler requires the SYSTOOLSPACE tablespace. If this tablespace does not yet exist for the database, refer to Setting up the administrative scheduler for information about how to create the SYSTOOLSPACE manually. Without an existing SYSTOOLSPACE tablespace, specifying an update frequency to schedule automatic index updates fails with an error.
If the DB2_ATS_ENABLE registry variable is set to OFF, the update frequency can be
specified and schedule tasks are created, however, the scheduler will not be active
and no index updates triggered.
Creating, altering, and deleting schedule tasks
For scheduler tasks that are automatically created through the text search
interface, a generated taskname is assigned, and the update frequency is
automatically converted into cron format. The taskname is generated from the prefix TSSCHED_ together with a
unique indexidentifier (also used for staging and event tables).
When creating or altering a text index, specify an update frequency via the days of the week, hours, and minutes when the update should run. For example:
- To run every 30 minutes at the hour and half hour
db2ts "create index indexname for text on tablecolumn update frequency D(*) H(*) M(0,30)"db2 "call sysproc.systs_create(indexschema, indexname, tablecolumn, 'UPDATE FREQUENCY D(*) H(*) M(0,30)', 'en_US',?)"
- To modify a schedule, here to run every 20 minutes starting at 10 past the hour, use:
db2ts "alter index indexname for text update frequency D(*) H(*) M(10,30,50)"db2 "call sysproc.systs_alter(indexschema, indexname, 'UPDATE FREQUENCY D(*) H(*) M(10,30,50)', 'en_US',?)"
- To remove a scheduler task, use:
db2ts "alter index indexname for text update frequency NONE"db2 "call sysproc.systs_alter(indexschema, indexname, 'UPDATE FREQUENCY NONE','en_US',?)"
For details about the update frequency options, see SYSTS_ALTER procedure - Change the update characteristics of an index.
Depending on the privileges of the user administrative commands that affect the frequency parameter can have a different result for the task status. If a user does not have DBADMIN privileges, a schedule should therefore only be modified or removed by the user who created the schedule. This will prevent orphaned tasks and related cleanup steps. The following table summarizes these effects.
Table 1. Effect of user privilege on schedule task status for schedule frequency modifications
| Same user or DBADMIN | Different user | |||
|---|---|---|---|---|
| Task exists, Yes | Task exists, No | Task exists, Yes | Task exists, No | |
| Create with frequency | Modify1 | Add task | ---- | ---- |
| Alter - add frequency | Modify1 | Add task | ---- | ---- |
| Alter - change frequency | Modify task | Add task | Warning 'existing task' | Add task2 |
| Alter - remove frequency | Drop task | Skip2 | Orphaned task | n/a2 |
| Drop index (with frequency) | Drop task | Skip2 | Orphaned task | n/a2 |
1 If a schedule task exists already with the same name, the task will be modified to execute an index update with the specified schedule.
2 In these cases, a task should exist but doesn't — for example, when it was explicitly deleted.
Schedule tasks and task status
Update status information can be found in the event table for the text index and the
schedule task status table. The eventtablename
has the prefix TSEVENT_ followed by
an identifier nnnnnn. The same identifier is also used for the scheduler taskname.
- The event tablename can be found with the query
db2 "select eventtablename from sysibmts.tsindexes where indexschema = 'indexschema' and indexname = 'indexname'" - View update messages:
db2 "select message from sysibmts.tsevent_nnnnnn" - View your scheduled tasks:
db2 "select * from systools.admin_task_list" - View scheduled tasks created for text indices
db2 "select * from systools.admin_task_list where name like 'TSSCHED_%'" - View the status of schedule tasks:
db2 "select name, status, sqlcode from systools.admin_task_status"
What to consider when planning an update schedule
Independent of the scheduling mechanism, each text index update operation requires that the user has the appropriate data access privileges, as well as the SYSTS_MGR role. In addition, if the integrated scheduling mechanism is used, the user must have execution privileges to add, change, and drop scheduler tasks. The new scheduler task will be created with the access privileges of the user who creates the task.
Text search index schedule tasks adhere to the DB2 Administrative Task Scheduler access management. Users with the SYSTS_MGR role, but without DBADMIN privilege, will only see their own schedules, not those created by other users with the SYSTS_MGR role. So it is recommended that a dedicated user manages text search schedules. This also helps keep track of workload impact.
Update frequencies are usually arranged to accommodate incremental updates. If an update frequency is specified when creating the text index, the first scheduled task will execute an initial update. Depending on the size of the workload for the initial update, it is recommended to create the text search index without a schedule, execute the initial update and only then add the schedule to automatically trigger incremental updates.
The DB2 Administrative Task Scheduler uses a cron format and allows specifying day of
the week, hour and minute, and month and day. If the desired schedule
cannot be expressed with text search index update frequency settings, create a
schedule task manually as needed with the DB2 Administrative Task Scheduler tools
and use the stored procedure sysproc.systs_admin_cmd('update
index indexname for text options',
'language',?). An option you may specify, for
example, is with update minimum or updateautocommit options.
To avoid schedule conflicts, do not mix automatically generated and manually created schedule tasks for a text search index.
Text search index updates are processed as batch updates and during that time pose significant load on the system. Consider the configured maximum number of allowed parallel updates for the system when defining update frequencies for multiple text indices. Remember that for multi-partition systems, a single text index update is by default processed in parallel for all associated collections on the participating nodes, the update process count therefore must account for the number of collections.
Each incremental update operation checks for conflicting operations and if the index
is locked by another process, the error CIE0357E Conflicting
administration operation in progress is returned. Otherwise, it proceeds
to check whether the rows to be updated exceed the update minimum. If there are
enough rows to process, a cutoff timestamp is applied and changes made
after the cutoff will be processed with the next update.
To minimize processing costs (for example, in a scenario where data changes trickle in very unpredictably), it can be advantageous to manually create multiple schedule tasks with non-conflicting timing and different update minimum thresholds, like checking every 10 minutes for a large update minimum and every 30 minutes without or with a low update minimum.
- To verify whether a schedule task was triggered, use the schedule task status. A message for the event table will only be created, if data for indexing is available.
- If expected data is not available in search results, check that the index update
was executed:
- Verify that the environment variable is set and the database is active.
- Check the event table for error messages.
- For incremental updates, determine whether the number of entries in the staging tables exceeds the specified update minimum.
- Verify the schedule task in
SYSTOOLS.ADMIN_TASK_LISTand the task status inSYSTOOLS.ADMIN_TASK_STATUS.
- To remove a generated schedule, use the
db2ts alter indexindexnamefor text update frequency nonecommand or the corresponding stored procedure. If a generated scheduled task gets deleted by mistake (withsystools.admin_task_remove, for example), recreate the schedule task by specifying the update frequency in the alter command. - If the text index was dropped or the text index was altered and the frequency
removed by a user who is not the creator of the index and does not have DBADMIN
privilege, remove the orphaned schedule task as a user with DBADMIN privilege.
Use the
systools.admin_task_removeprocedure, for example,db2 "call systools.admin_task_remove('TSSCH_111713',NULL)"
Notes on upgrades from earlier releases
- When upgrading existing text search indices with a frequency setting to DB2 10.1, the upgrade procedure for text search indices attempts to create the schedule tasks automatically. This requires that the DB2 Administrative Task Scheduler setup has been enabled for the instance. Make sure SYSTOOLSPACE exists before upgrading the database.
- If the schedule tasks cannot be created during the upgrade, use the alter index for text operation to add the schedule tasks.
Sample scenario: Schedule a text index
Listing 1. Create table BOOKS and insert sample data
CREATE TABLE books ( isbn VARCHAR(18) NOT NULL PRIMARY KEY,
author VARCHAR(300), title LONG VARCHAR, year INTEGER, bookinfo XML)
INSERT INTO books VALUES ('123-014014014', 'Joe Climber','Climber''s
Mountain Tops',2001,XMLPARSE(DOCUMENT '<bookinfo> <author>Joe Climber</author>
<title>Climber''s Mountain Tops</title><story>This vivid description of
Joe Climber''s experiences when tackling the mountains of his native Mountainland
lets you hold your breath when you follow Joe on his adventures in the regions
where the air is thin and the weather is treacherous. Includes beautiful color photos of
Moun-tainland''s mountain ranges.</story> <year>1995</year>
<price>16.00</price> <pages<176</pages></bookinfo>'));
INSERT INTO books VALUES ('678-014014078', 'Joe Smith', 'The Range', 2001,
XMLPARSE(DOCUMENT '<bookinfo> <author>Joe Smith</author>
<title>The Range</title> <story>All you need to know about
kitchen ranges. A pictured description based on the most recent ergonomics
studies for everybody who is involved in food preparation in the home.</story>
<year>1991</year> <price>6.00</price> <pages>76
</pages> </bookinfo>'));
|
Note that text indices require that a primary key is defined for the table.
Create and schedule a text index
The following command creates a text index with name title_indx and a scheduler task to update the text index every five minutes, 24x7, if there are at least two entries that need to be updated. When the scheduler starts the update for the first time, it will execute an initial update for the new index independent of the number of rows in the table.
Listing 2. Create a text index with update schedule on column 'title'
db2ts "create index db2ts.title_indx for text on books(title) update frequency
D(*) H(*) M(0,5,10,15,20,25,30,35,40,45,50,55) update minimum 2 connect to testdb"
|
Create a text index on column title and schedule for incremental updates
Initial updates usually involve a significantly larger workload than incremental updates and, therefore, often do not fit well with schedules intended to manage incremental updates. To avoid that a large workload is processed for an initial update at an inopportune time, run the first update manually and only add the schedule after successful completion of the initial update.
In the following example, the schedule is again set to run an incremental update every five minutes if at least two entries need to be processed.
Listing 3. Create a text index on column title and schedule for automatic incremental updates
db2ts "create index db2ts.title_indx for text on books(title) connect to testdb"
db2ts "update index db2ts.title_indx for text connect to testdb"
db2ts "alter index db2ts.title_indx for text update frequency
D(*) H(*) M(0,5,10,15,20,25,30,35,40,45,50,55) update minimum 2 connect to testdb"
|
View update and schedule information
Retrieve the event table name from the text index catalog and record the name for later use. Take note of the numeric suffix, this is the unique identifier also used for the schedule tasks.
Listing 4. get event table name
db2 "select substr(eventtablename,1,30) as eventtable from sysibmts.tsindexes
where indexname='TITLE_INDX'"
EVENTTABLE
------------------------------
SYSTSEVENT_111713
1 record(s) selected.
|
The event table contains status messages with statistics information as well as warnings or errors that occurred during the update processing. NOTE: Message output shortened for readability.
Listing 5. view messages in the event table
db2 "select substr(message,1,100) from sysibmts.TSEVENT_111713"
1
--------------------------------------------------------------------------------
CIE00003 Index update started.
CIE00011 Index build started.
CIE00012 Index build ended.
CIE00004 Index update ended.
CIE00005I DB2 Text Search Index update statistics: "2" number of rows processed.
5 record(s) selected.
|
Schedule information is available through the procedures and views for the DB2 Administrative Task Scheduler.
Listing 6. view schedule task
db2 "select substr(name,1,12) as name, taskid, substr(schedule,1,50) as schedule from
systools.admin_task_list where name ='TSSCH_111713'"
NAME TASKID SCHEDULE
------------ ------ --------------------------------------------------
TSSCH_111713 1 0,5,10,15,20,25,30,35,40,45,50,55 * * * *
1 record(s) selected.
|
Listing 7. view task status for the text index schedule
db2 "select status,sqlcode from systools.admin_task_status where name='TSSCH_111713'"
STATUS SQLCODE
---------- -----------
COMPLETE 0
COMPLETE 0
COMPLETE 0
3 record(s) selected.
|
DB2 Text Search indices are updated asynchronously. Index updates can be started manually or scheduled to be automatically triggered.
- Index updates can be scheduled by using the command-line or the stored procedure interface. An integrated scheduling administration is provided, serving as the most common scheduling scenario and is based on the DB2 Administrative Task Scheduler.
- When planning text index schedule frequencies, use the integrated schedule administration only when a single schedule task is sufficient for a text index; otherwise, create the update schedule tasks for this index manually.
- To avoid resource contention, adjust the maximum number of parallel text index updates to match the Text Search server resource configuration, and stagger multiple text index updates appropriately.
- Review the event table for statistical information to adjust schedule frequencies, if necessary.
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample DB2 CLP script | scheduling_textindexes.zip | 4KB | HTTP |
Information about download methods
Learn
- "DB2
Text Search, Part 1: Full text search" is a step-by-step guide
through a simple full-text search scenario.
- "DB2
Text Search, Part 2: Understanding linguistic processing for full text search in DB2" explains
why documents get included in search results.
- "DB2 Text Search, Part 3: What to consider when planning text index backups" discusses procedures to
back up and restore text indices.
- "DB2 Text Search, Part 4: High-availability setups with DB2 Text Search" provides examples how to
include text indices in a fail-over scenario.
- Visit the developerWorks
resource page for DB2 for Linux, UNIX, and Windows to read articles and
tutorials and connect to other resources to expand your DB2 skills.
- Learn more about Information Management at the developerWorks Information Management
zone. Find technical documentation,
how-to articles, education, downloads, product information, and
more.
- Stay current with
developerWorks technical events and webcasts.
- Follow developerWorks on
Twitter.
Get products and technologies
- Build your next
development project with
IBM trial software,
available for download directly from developerWorks.
- Now you can use
DB2 for free. Download DB2 Express-C, a no-charge
version of DB2 Express Edition for the community that offers the same core
data features as DB2 Express Edition and provides a solid base to build
and deploy applications.
Discuss
- Participate in the discussion forum.
- Check out the
developerWorks
blogs and get involved in the
developerWorks community.
Marion Behnen is a senior software engineer within the IBM Software Group and works as a technical lead for full-text search in DB2. She has more than 20 years of experience with database application development, business reporting, data warehousing, and business process integration. Prior to joining IBM, she was involved with many aspects of business process and data integration, in particular for the manufacturing industry.





