DB2 Text Search, Part 5: Scheduling updates for DB2 Text Search indices

IBM® DB2® Text Search enables users to create full-text indices in a DB2 for Linux®, UNIX®, and Windows® database. Full-text indices are updated asynchronously, and it is necessary to explicitly and regularly start text index updates to ensure that an index properly reflects the content of the base table. A convenient mechanism to automate this is to use a scheduler to start regular updates in the required frequency. This article provides information about scheduling options and discusses troubleshooting the default scheduling mechanism.

Marion Behnen (mbehnen@us.ibm.com), DB2 Text Search Development, IBM

Marion Behnen is a senior software engineer within the IBM Software Group and works as a technical lead for full-text search in DB2. She has more than 20 years of experience with database application development, business reporting, data warehousing, and business process integration. Prior to joining IBM, she was involved with many aspects of business process and data integration, in particular for the manufacturing industry.



Kiran Kumar (kambaldh@in.ibm.com), DB2 Text Search QA, IBM

Kiran KumarKiran Kumar is a software engineer with IBM India Software Labs. He is a certified Advanced DB2 administrator and quality engineer with two years experience of full-text search in DB2, including DB2 Text Search.



27 September 2012

Also available in Chinese

Introduction

DB2 Text Search enables you to issue SQL and XQuery statements to perform text search queries on data stored in a DB2 database. After creating a text index on a table column, the initial update will populate the index from the table content. However, unlike other database indices, text indices are not updated immediately when a document is inserted, updated, or deleted. Instead, to ensure proper search results, an incremental update must be applied to synchronize the content of the text index with the content of the table column. Instead of manually starting incremental updates, text indices can be scheduled for automatic updates.


Scheduling mechanisms

DB2 Text Search provides two interfaces to initiate an index update: one through a command-line interface, the other a stored procedure interface. Any scheduling mechanism that allows invoking such an operation can be used to integrate text search index updates into scheduling workflows. For example, for a text index TEST.MYTXIDX in database TESTDB, the update can be started with one of the following operations:

  • db2ts update test.mytxidx for text connect to testdb
  • call sysproc.systs_update('TEST', 'MYTXIDX', '', 'en_US', ?)
  • call sysproc.systs_admin_cmd('update index test.mytxidx for text', 'en_US', ?)

As an integrated scheduling mechanism, DB2 Text Search in DB2 10.1 provides the option to schedule tasks for the DB2 Administrative Task Scheduler by specifying an update frequency when creating or altering a text search index. The automatically created schedule tasks execute a call to sysproc.systs_admin_cmd('update index indexname for text with update minimum', language , ?). The DB2 Administrative Task Scheduler will trigger the update task according to the specified frequency as long as the database is active. This occurs independent of the text search server status.

For indices that require other update options (for example, if the allrows option is used to force a rebuild instead of an incremental update), the scheduling tasks must be created manually.


How to use the DB2 Administrative Task Scheduler

This section provides a short summary of core scheduler features pertinent to text search index scheduling. For further details, see The administrative task scheduler.

Enabling the scheduler

The administrative task scheduler is disabled by default and must, therefore, be enabled before text index updates can be scheduled. To enable the scheduler:

  1. Set the DB2_ATS_ENABLE registry variable to YES, TRUE, 1 or ON.
  2. Ensure that the tablespace SYSTOOLSPACE exists.

After setting the DB2_ATS_ENABLE registry variable, the instance must be restarted for the variable to take effect. Similar to several other tools and procedures, the scheduler requires the SYSTOOLSPACE tablespace. If this tablespace does not yet exist for the database, refer to Setting up the administrative scheduler for information about how to create the SYSTOOLSPACE manually. Without an existing SYSTOOLSPACE tablespace, specifying an update frequency to schedule automatic index updates fails with an error.

If the DB2_ATS_ENABLE registry variable is set to OFF, the update frequency can be specified and schedule tasks are created, however, the scheduler will not be active and no index updates triggered.

Creating, altering, and deleting schedule tasks

For scheduler tasks that are automatically created through the text search interface, a generated taskname is assigned, and the update frequency is automatically converted into cron format. The taskname is generated from the prefix TSSCHED_ together with a unique indexidentifier (also used for staging and event tables).

When creating or altering a text index, specify an update frequency via the days of the week, hours, and minutes when the update should run. For example:

  • To run every 30 minutes at the hour and half hour
    • db2ts "create index indexname for text on tablecolumn update frequency D(*) H(*) M(0,30)"
    • db2 "call sysproc.systs_create(indexschema, indexname, tablecolumn, 'UPDATE FREQUENCY D(*) H(*) M(0,30)', 'en_US',?)"
  • To modify a schedule, here to run every 20 minutes starting at 10 past the hour, use:
    • db2ts "alter index indexname for text update frequency D(*) H(*) M(10,30,50)"
    • db2 "call sysproc.systs_alter(indexschema, indexname, 'UPDATE FREQUENCY D(*) H(*) M(10,30,50)', 'en_US',?)"
  • To remove a scheduler task, use:
    • db2ts "alter index indexname for text update frequency NONE"
    • db2 "call sysproc.systs_alter(indexschema, indexname, 'UPDATE FREQUENCY NONE','en_US',?)"

For details about the update frequency options, see SYSTS_ALTER procedure - Change the update characteristics of an index.

Depending on the privileges of the user administrative commands that affect the frequency parameter can have a different result for the task status. If a user does not have DBADMIN privileges, a schedule should therefore only be modified or removed by the user who created the schedule. This will prevent orphaned tasks and related cleanup steps. The following table summarizes these effects.

Table 1. Effect of user privilege on schedule task status for schedule frequency modifications
Same user or DBADMINDifferent user
Task exists, YesTask exists, NoTask exists, YesTask exists, No
Create with frequencyModify1Add task--------
Alter - add frequencyModify1Add task--------
Alter - change frequencyModify taskAdd taskWarning 'existing task'Add task2
Alter - remove frequencyDrop taskSkip2Orphaned taskn/a2
Drop index (with frequency)Drop taskSkip2Orphaned taskn/a2

1 If a schedule task exists already with the same name, the task will be modified to execute an index update with the specified schedule.

2 In these cases, a task should exist but doesn't — for example, when it was explicitly deleted.

Schedule tasks and task status

Update status information can be found in the event table for the text index and the schedule task status table. The eventtablename has the prefix TSEVENT_ followed by an identifier nnnnnn. The same identifier is also used for the scheduler taskname.

  • The event tablename can be found with the query db2 "select eventtablename from sysibmts.tsindexes where indexschema = 'indexschema' and indexname = 'indexname'"
  • View update messages: db2 "select message from sysibmts.tsevent_nnnnnn"
  • View your scheduled tasks: db2 "select * from systools.admin_task_list"
  • View scheduled tasks created for text indices db2 "select * from systools.admin_task_list where name like 'TSSCHED_%'"
  • View the status of schedule tasks: db2 "select name, status, sqlcode from systools.admin_task_status"

What to consider when planning an update schedule

Authorizations

Independent of the scheduling mechanism, each text index update operation requires that the user has the appropriate data access privileges, as well as the SYSTS_MGR role. In addition, if the integrated scheduling mechanism is used, the user must have execution privileges to add, change, and drop scheduler tasks. The new scheduler task will be created with the access privileges of the user who creates the task.

Text search index schedule tasks adhere to the DB2 Administrative Task Scheduler access management. Users with the SYSTS_MGR role, but without DBADMIN privilege, will only see their own schedules, not those created by other users with the SYSTS_MGR role. So it is recommended that a dedicated user manages text search schedules. This also helps keep track of workload impact.

Frequency and workload

Update frequencies are usually arranged to accommodate incremental updates. If an update frequency is specified when creating the text index, the first scheduled task will execute an initial update. Depending on the size of the workload for the initial update, it is recommended to create the text search index without a schedule, execute the initial update and only then add the schedule to automatically trigger incremental updates.

The DB2 Administrative Task Scheduler uses a cron format and allows specifying day of the week, hour and minute, and month and day. If the desired schedule cannot be expressed with text search index update frequency settings, create a schedule task manually as needed with the DB2 Administrative Task Scheduler tools and use the stored procedure sysproc.systs_admin_cmd('update index indexname for text options', 'language',?). An option you may specify, for example, is with update minimum or updateautocommit options.

To avoid schedule conflicts, do not mix automatically generated and manually created schedule tasks for a text search index.

Text search index updates are processed as batch updates and during that time pose significant load on the system. Consider the configured maximum number of allowed parallel updates for the system when defining update frequencies for multiple text indices. Remember that for multi-partition systems, a single text index update is by default processed in parallel for all associated collections on the participating nodes, the update process count therefore must account for the number of collections.

Each incremental update operation checks for conflicting operations and if the index is locked by another process, the error CIE0357E Conflicting administration operation in progress is returned. Otherwise, it proceeds to check whether the rows to be updated exceed the update minimum. If there are enough rows to process, a cutoff timestamp is applied and changes made after the cutoff will be processed with the next update.

To minimize processing costs (for example, in a scenario where data changes trickle in very unpredictably), it can be advantageous to manually create multiple schedule tasks with non-conflicting timing and different update minimum thresholds, like checking every 10 minutes for a large update minimum and every 30 minutes without or with a low update minimum.


Troubleshooting

  • To verify whether a schedule task was triggered, use the schedule task status. A message for the event table will only be created, if data for indexing is available.
  • If expected data is not available in search results, check that the index update was executed:
    1. Verify that the environment variable is set and the database is active.
    2. Check the event table for error messages.
    3. For incremental updates, determine whether the number of entries in the staging tables exceeds the specified update minimum.
    4. Verify the schedule task in SYSTOOLS.ADMIN_TASK_LIST and the task status in SYSTOOLS.ADMIN_TASK_STATUS.
  • To remove a generated schedule, use the db2ts alter index indexname for text update frequency none command or the corresponding stored procedure. If a generated scheduled task gets deleted by mistake (with systools.admin_task_remove , for example), recreate the schedule task by specifying the update frequency in the alter command.
  • If the text index was dropped or the text index was altered and the frequency removed by a user who is not the creator of the index and does not have DBADMIN privilege, remove the orphaned schedule task as a user with DBADMIN privilege. Use the systools.admin_task_remove procedure, for example, db2 "call systools.admin_task_remove('TSSCH_111713',NULL)"

Notes on upgrades from earlier releases

  • When upgrading existing text search indices with a frequency setting to DB2 10.1, the upgrade procedure for text search indices attempts to create the schedule tasks automatically. This requires that the DB2 Administrative Task Scheduler setup has been enabled for the instance. Make sure SYSTOOLSPACE exists before upgrading the database.
  • If the schedule tasks cannot be created during the upgrade, use the alter index for text operation to add the schedule tasks.

Sample scenario: Schedule a text index

Create and populate a table

Listing 1. Create table BOOKS and insert sample data
CREATE TABLE books ( isbn VARCHAR(18) NOT NULL PRIMARY KEY,
author VARCHAR(300), title LONG VARCHAR, year INTEGER, bookinfo XML)

INSERT INTO books VALUES ('123-014014014', 'Joe Climber','Climber''s
Mountain Tops',2001,XMLPARSE(DOCUMENT '<bookinfo> <author>Joe Climber</author>
<title>Climber''s Mountain Tops</title><story>This vivid description of
Joe Climber''s experiences when tackling the mountains of his native Mountainland
lets you hold your breath when you follow Joe on his adventures in the regions
where the air is thin and the weather is treacherous. Includes beautiful color photos of 
Moun-tainland''s mountain ranges.</story> <year>1995</year> 
<price>16.00</price> <pages<176</pages></bookinfo>'));
                
INSERT INTO books VALUES ('678-014014078', 'Joe Smith', 'The Range', 2001,
XMLPARSE(DOCUMENT '<bookinfo> <author>Joe Smith</author>
<title>The Range</title> <story>All you need to know about
kitchen ranges. A pictured description based on the most recent ergonomics
studies for everybody who is involved in food preparation in the home.</story>
<year>1991</year> <price>6.00</price> <pages>76
</pages> </bookinfo>'));

Note that text indices require that a primary key is defined for the table.

Create and schedule a text index

The following command creates a text index with name title_indx and a scheduler task to update the text index every five minutes, 24x7, if there are at least two entries that need to be updated. When the scheduler starts the update for the first time, it will execute an initial update for the new index independent of the number of rows in the table.

Listing 2. Create a text index with update schedule on column 'title'
db2ts "create index db2ts.title_indx for text on books(title) update frequency
    D(*) H(*) M(0,5,10,15,20,25,30,35,40,45,50,55) update minimum 2 connect to testdb"

Create a text index on column title and schedule for incremental updates

Initial updates usually involve a significantly larger workload than incremental updates and, therefore, often do not fit well with schedules intended to manage incremental updates. To avoid that a large workload is processed for an initial update at an inopportune time, run the first update manually and only add the schedule after successful completion of the initial update.

In the following example, the schedule is again set to run an incremental update every five minutes if at least two entries need to be processed.

Listing 3. Create a text index on column title and schedule for automatic incremental updates
db2ts "create index db2ts.title_indx for text on books(title) connect to testdb"

db2ts "update index db2ts.title_indx for text connect to testdb"

db2ts "alter index db2ts.title_indx for text update frequency 
    D(*) H(*) M(0,5,10,15,20,25,30,35,40,45,50,55) update minimum 2 connect to testdb"

View update and schedule information

Retrieve the event table name from the text index catalog and record the name for later use. Take note of the numeric suffix, this is the unique identifier also used for the schedule tasks.

Listing 4. get event table name
db2 "select substr(eventtablename,1,30) as eventtable from sysibmts.tsindexes 
where indexname='TITLE_INDX'"
                
EVENTTABLE
------------------------------
SYSTSEVENT_111713
                
1 record(s) selected.

The event table contains status messages with statistics information as well as warnings or errors that occurred during the update processing. NOTE: Message output shortened for readability.

Listing 5. view messages in the event table
db2 "select substr(message,1,100) from sysibmts.TSEVENT_111713"
                
1
--------------------------------------------------------------------------------
CIE00003 Index update started.
CIE00011 Index build started.
CIE00012 Index build ended.
CIE00004 Index update ended.
CIE00005I DB2 Text Search Index update statistics: "2" number of rows processed. 
                
5 record(s) selected.

Schedule information is available through the procedures and views for the DB2 Administrative Task Scheduler.

Listing 6. view schedule task
db2 "select substr(name,1,12) as name, taskid, substr(schedule,1,50) as schedule from 
systools.admin_task_list where name ='TSSCH_111713'"
                
NAME            TASKID      SCHEDULE
------------   ------      --------------------------------------------------
TSSCH_111713        1       0,5,10,15,20,25,30,35,40,45,50,55 * * * *
                
1 record(s) selected.
Listing 7. view task status for the text index schedule
db2 "select status,sqlcode from systools.admin_task_status where name='TSSCH_111713'"
                
STATUS     SQLCODE
---------- -----------
COMPLETE             0
COMPLETE             0
COMPLETE             0
                
3 record(s) selected.

Conclusion

DB2 Text Search indices are updated asynchronously. Index updates can be started manually or scheduled to be automatically triggered.

  • Index updates can be scheduled by using the command-line or the stored procedure interface. An integrated scheduling administration is provided, serving as the most common scheduling scenario and is based on the DB2 Administrative Task Scheduler.
  • When planning text index schedule frequencies, use the integrated schedule administration only when a single schedule task is sufficient for a text index; otherwise, create the update schedule tasks for this index manually.
  • To avoid resource contention, adjust the maximum number of parallel text index updates to match the Text Search server resource configuration, and stagger multiple text index updates appropriately.
  • Review the event table for statistical information to adjust schedule frequencies, if necessary.

Download

DescriptionNameSize
Sample DB2 CLP scriptscheduling_textindexes.zip4KB

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=837462
ArticleTitle=DB2 Text Search, Part 5: Scheduling updates for DB2 Text Search indices
publish-date=09272012