DB2 Text Search, Part 3: What to consider when planning text index backups

IBM® DB2® Text Search enables users to create full-text search applications in a DB2 for Linux®, UNIX®, and Windows® database. Full-text search is fully integrated into database query processing, but text-search indices are file system-based, and backup requires synchronization with DB2 backup and recovery facilities. It is necessary to combine database backups and text-search index backups to prepare for recovery scenarios. This article provides information about how database objects and text-search indices relate and provides recommendations for backup and recovery procedures.

Share:

Marion Behnen (mbehnen@us.ibm.com), DB2 Text Search Development, IBM

Marion Behnen is a senior software engineer within the IBM Software Group and works as a technical lead for full-text search in DB2. She has more than 20 years of experience with database application development, business reporting, data warehousing, and business process integration. Prior to joining IBM, she was involved with many aspects of business process and data integration, in particular for the manufacturing industry.



Kavya Rao (kavyarao@in.ibm.com), DB2 Text Search Support, IBM

Kavya Rao is a software engineer with IBM India Software Labs. She is a certified DB2 administrator and a support specialist with several years' experience of full-text search in DB2, including DB2 Net Search Extender and DB2 Text Search.



23 February 2012

Also available in Chinese

Introduction

Text-search indices are tied to database tables, but managed by the text-search server process on the file system. The text-search server process runs independently from the database server and text-search indices are, therefore, not included in the DB2 Backup and Restore operations. Except for scenarios with very small text-search indices, which can easily and quickly be recreated after restoring a database, it is necessary to manually synchronize the DB2 backup and the text-search index backup to be able to achieve a consistent state when a restore operation is necessary. As illustrated in the following figure, such a synchronization needs to take several types of data into account.

Figure 1. Basic single-node scenario
Image shows basic scenario for text indices in a single-node setup

The following objects are relevant:

  • Database
    • Base table — The database table that owns the text search index.
    • Text-search index tables — The staging table and event tables for the text-search index.
    • Text-search index catalog — The administrative tables for DB2 Text Search containing metadata for the text-search solution and for each text-search index.
  • File system
    • Text-search index collection — The files that contain the index data for a text-search index (names are prefixed with the instance and database name); each collection is in a dedicated directory and consists of multiple segments.
    • TSS-Configuration — (Selected) content of .../db2tss/ config
      • ciedem.dat — Contains the scheduler data (per database instance)
      • systemSettings.xml — Lists the existing collections (per database instance)
      • Synonym dictionaries
      • Configuration files

The logfiles in db2tss/logs are for informational purposes. They are not mandatory for a system recovery. The distribution of base table, text-search index catalog tables and text-search index administration tables is mostly relevant in selective backup and restore scenarios.


Backup options

DB2 provides several backup methods:

  • Full backup (online or offline)
  • Incremental backup includes all changes since the last full backup
  • Delta backup includes all changes since the last backup (any type)
  • Logs contain transactions since the last backup (restoring from a full online backup requires the logs)

The DB2 Backup includes text-search index catalog data and text-search index administration data. However, the actual text-search index collection content is not included. Instead, the text-search index collection backup relies on file backup mechanisms. It is possible, for example, to back up all files that belong to a text-search index collection or only those with a recent timestamp. Be aware that there is no online backup option provided by the text-search server to control parallel activity for the text-search index. Therefore, it must be explicitly ensured that there are no write operations on a text-search index during an index backup. The following data must be synchronized for a backup:

  • Per index
    • DB2
      • The database table that owns the text-search index
      • The staging table for the text-search index
        Note: Event tables contain historical data and their content is not critical for a synchronization of base table and text-search index in a recovery scenario provided previous document errors have been addressed.
    • File system
      • Collection data for the text-search indices
      • Text-search index metadata
  • Per database
    • Catalog tables with metadata for DB2 Text Search
      Note: Command locks may exist after a restore operation and must be removed explicitly to continue regular processing
  • Per instance
    • File system
      • Scheduler data
      • Text-search server metadata

For an integrated solution of database and text-search index backups, offline backups provide the advantage that there are no contention aspects to consider for parallel activities that involve text-search data. On the other hand, online backups may be more complex to synchronize, but require less downtime. The following sections show sample procedures and discuss details for these options.

Basic backup procedure for offline backup

Use the following steps to back up a database with DB2 Text Search indices:

  1. Get a current list of text-search index locations for DB2 Text Search indices: db2 "select indschema, indname, collectiondirectory, collectionnameprefix from sysibmts.tsindexes". If no specific location is defined, the collections reside in the default location in the database instance path.
  2. Ensure that no DB2 Text Search UPDATE INDEX command is running. If necessary:
    • Disable scheduled updates
    • Check text-search index event tables for update completion messages

    Note: If an index update is interrupted, all non-committed work is lost and must be repeated. This includes the case that the text-search server has already committed work, but the database has not finished processing the update.
  3. Stop the DB2 Text Search services. If the database instance contains multiple databases enabled for text search, make sure to block text-search index updates in all such databases in this instance before stopping the text-search server: db2ts stop for text. See discussion for reducing downtimes in a separate section.
  4. Back up
    • The database
      • For further information about database backup methods and criteria, refer to the DB2 Information Center for Linux, UNIX, and Windows.
    • The text-search index locations identified in step 1 with your preferred method for file backups
      • Include any synonym dictionaries explicitly, if not located in the db2tss/config directory.
    • The configuration directory
      • UNIX: <instancedir> /sqllib/db2tss/config
      • Windows: <instancedatadir>\db2tss\config; default location C:\Documents and Settings\All Users\Application Data\IBM\DB2\DB2COPY1\DB2\db2tss\config
      Configuration data can change when text-search indices are added, altered, or deleted, or through changes via the Configuration Tool.
    • Optional: The text-search server log directory (only to preserve history; not relevant for text-search index data)
      • UNIX: <instancedir>/sqllib/db2tss/log
      • Windows: < instancedatadir >\db2tss\log
  5. Restart DB2 Text Search services: db2ts start for text

The above procedure relies on a combined database and text-search index backup. However, it is often not feasible to backup a text-search index after each update, and further steps are, therefore, necessary to enable a synchronization after incremental updates.


Incorporating incremental updates

Text-search indices are updated with an asynchronous update process. After an initial update that processes the data from the base table, all subsequent changes are captured in a staging table and processed with an incremental update. The staging table is backed up together with the base table via a DB2 backup command. The following figure illustrates some scenarios in which database backups and text-search index backups might be synchronized or taken at different times.

Figure 2. Base and staging table changing with updates
Image shows that changes in the base table are propagated to the staging table and from there to the text index

As shown in Figure 2, without further synchronization measures, incremental updates can lead to an inconsistency in a restore scenario, if a text-search index update was applied to a text-search index without a corresponding text-search index backup. For example, if a database backup and a text-search index backup were completed at T0, and another database backup is taken at T4, the text-search index update at T2 will have deleted the staging table entries, and only new changes after T3 are available. A restore that wants to achieve T5 with a roll-forward will miss previously processed entries. The following options may be used to handle such a scenario:

  1. Restore only without roll-forward and ensure matching database and text-search index backups
  2. Back up the text-search index immediately after each text-search index update
  3. Save the staging table content before running the index update or create a shadow staging table that keeps staging entries that have already been applied to a text-search index, until the text-search index has been backed up.

A restriction as in option 1, that requires that restores are only done without using database transaction logs severely limits options for recovery scenarios. Therefore, it is not considered further here.

Including a text-search index backup with each text-search index update is an option for small indices or low-frequency updates — for example, if a text-search index is only updated once a day, it may be feasible to back it up immediately when the text-search index update is complete.

For high-frequency incremental updates or large text-search index file sizes, it is often not feasible to immediately back up a text-search index with each update. Instead, a method is needed to allow synchronizing the base table with its text-search index. One such method is to append the staging table content into a file and to reload the file content if necessary. The copy must be backed up with each index update and can be cleaned up with the next index backup. Another option is to create a shadow staging table that keeps the processed entries until they are no longer needed. For example, a delete trigger on the text-search index's staging table copies the entries into the shadow table when the current update process deletes the processed entries from the staging table. If a restore is necessary, the content of the shadow table is added back into the staging table, and the next incremental index update will process the entries again. The shadow table can be cleared after each backup of the text-search index.


Backup scenarios

Online backup

The online option controls whether backed-up tables are available for certain updates. An online database backup might cause a contention with a text-search index update. However, this contention is no different from other database applications with long-running tasks. What happens in a contention scenario?

  1. The online backup locks a table needed by the text-search index update. In this case, the text-search index update fails and must be restarted.
  2. The online backup succeeds and has backed up parts of the staging table content, which are then deleted as part of the text-search index update. In this case, restoring this state will simply cause a replay of text-search index updates processed before (i.e., the impact is limited to additional processing time). Document errors for missing documents may occur when a deleted document is reprocessed, and these errors can be ignored.
  3. The online backup fails due to some access contention with the parallel index update. While this is rare, in order to reduce the risk of contention for access to database resources and subsequent failures, avoid parallel schedules for online database backups and text-search index updates.

Selective backup

For a non-partitioned full-text search setup as supported in DB2 9.5 and 9.7, if no explicit table space is specified, the staging and event tables for a text-search index are created in the same table space as the base data table. If a selective backup (or later restore) of table spaces is a requirement, it is important to ensure that base table and staging table are backed up together. This implies that if multiple staging tables are in the same table space, all related base tables must be backed up. For example, if BaseTable1 is in TableSpace1, BaseTable2 is in TableSpace2, and the staging tables for TextIndex1 (on BaseTable1) as well as for Textindex2 (on BaseTable2) are in TableSpace3, then all three table spaces must be backed up together.

A further aspect is the text-search index catalog, which by default is in the Userspace1 table space. To avoid mismatches, it is necessary to use a non-selective backup (full, incremental, or delta) after creating or dropping a text-search index. A selective restore could otherwise lead to the case that the text-search index catalog contains the metadata for a text-search index, but the selective restore does not include the corresponding administration tables. If a text-search index was dropped, but a selective restore restores the administrative tables for this text-search index, they will persist. For these reasons, a selective backup or restore is not recommended if text-search indices are affected.


Recovery scenarios

Basic recovery procedure

For a basic recovery procedure, use the following steps to restore the database and the text-search indices:

  1. Make sure no DB2 Text Search administrative command is running.
  2. Stop the DB2 Text Search services: db2ts stop for text.
  3. Restore the database and the backup of the text-search index locations (text-search index and metadata) to the same path as before.
  4. Restart DB2 Text Search services: db2ts start for text.

This basic procedure is suitable if a matching database and text-search index backup are available. Following are some considerations for other common recovery scenarios.

Restore with forward recovery

The changes to the base table are captured in the staging table for the text-search index. The content of the staging table reflects the changes that have not yet been applied to the text-search index. If the staging table is backed up together with the base table and later restored together with the base table, the base table and the staging table contain the content up to the roll-forward recovery point. After the next index update, the text-search index will only match the base table content if:

  • Either no index update was applied between the database backup operation and the roll-forward recovery point
  • Or a new text-search index backup was performed after the index update, and the latest backup before the roll-forward recovery point is used

The timestamp of the text-search index backup used for the recovery must be after the last text-search index update or, if a shadow staging table is available, any gap covered by entries in the shadow staging table and before the roll-forward recovery point.

Do not use a text-search index backup with a timestamp more recent than the roll-forward recovery point. In this case, the text-search index may contain data that has no match in the database, or may miss data restored to the base table. These entries will not be updated with the next index update because there are no corresponding staging-table entries anymore, not even if a shadow table is used.

Restore without forward recovery

The difference to the more-generic scenario with forward recovery is only the recovery point itself. Otherwise, the same basic principles apply regarding the relationship of base table, staging table, and text-search index.

Restore from incremental or delta backup

Incremental backups take less time than full backups and are a commonly used feature for database backups. If the text-search index is backed up after each text search index update:

  • Apply full database backup
  • Apply incremental database (delta) backups
  • Restore text-search index from the index backup with the most recent timestamp before the last incremental database backup
  • Run an index update

If index updates have been applied to the text-search index between the last full backup and the incremental backup, the staging-table entries triggering the text-search index update will be lost. For such a scenario, use the consistency options discussed earlier.

Restore INTO target database

There is no support to restore a database with active text-search indices into a different target database, as the dependencies for a consistent state include using the same database instance name, the same database name, the same text-search server configuration, and the same file system structure for any text-search index locations. Other cases result in inconsistencies. For example:

  • If the target database is enabled and indices exist, existing catalog data might be overwritten such that existing text-search indices are inaccessible or unusable.
  • If text-search index locations are impacted, text-search index catalog updates are required.
  • The file name of the collection is constructed from instance name, database name, and collection name. If the database name changes, an attempt to access the collection will miss the corresponding files.

Recovery sample

From a text-search index perspective, what matters most in a recovery scenario is the synchronization of the three participants' database table, staging table, and text-search index. Using the illustration in Figure 3, following are some typical scenarios.

Figure 3. Sample backup timelines
Sample combinations of database backups and text index backups

A database backup was applied at T1, T4, T5, and T7, a text-search index backup at T1, T3, and T7. Database transactions occur continuously. Note that it does not matter whether the database backup was incremental or not; only the use (or not) of transaction logs is relevant. Restore to timeline:

  • T1 — Database and text-search index update match, the text-search index is current, no changes need to be applied to the text-search index.
  • T2 — The staging table has captured changes that must be applied to the base table. Using the database backup at T1 with a roll-forward recovery to T2 together with the text-search index backup at T1, will then only require a text-search index update to match the database and the text-search index content.
  • T3 — The text index backup from T3 must be used with the database backup from T1, including roll-forward recovery to T3 to achieve a consistent state.
  • T4 — The staging table contains the changes needed to update the text index backup taken at T3 to timeline T4.
  • T5 — The staging table changes between T3 and T5 must be saved to achieve a consistent state.
  • T6 — Same as T5, as the additional changes to the staging table are available with the database backup at T5 and a roll-forward with transaction logs to T6.

Multiple database scenario

A single Text Search server is assigned per database instance. If an instance contains multiple enabled databases, two additional challenges arise:

  • How to treat instance-level configuration data
  • How to minimize downtimes due to backup operations

Instance-level configuration data

If the database is backed up together with the text-search server configuration and later restored with this configuration data, the following scenarios may occur:

  • No text-search server configuration change was applied between backup and restore.
  • A schedule was changed for a text-search index.
  • A text-search index was dropped or created.
  • The text-search server configuration was changed with the ConfigurationTool.

To minimize inconsistencies, retain the current text-search server configuration. This will ensure that other enabled databases in the instance will not be affected. However, steps are needed to adapt the text-search server configuration to the restored database:

  1. Remove the file ciedem.dat in db2tss/config. The scheduler file will be automatically recreated with the schedules defined in the enabled databases.
  2. If a text-search index had been dropped between backup and restore, the text-search catalog data in the newly restored database will contain a catalog entry without a corresponding text-search index collection. Drop the text-search index.
  3. If a text-search index had been created between backup and restore, the text-search catalog data in the newly restored database will miss the entry for this index, but the text-search index collection still exists on the text-search server. Use the cleanup command to remove the orphaned collection.

Minimizing text server downtime for backups

It is critical for a successful text-search index backup to prevent unwanted operations on a text-search index. One option is to shut down the text-search server associated with the instance, as indicated in the sample backup procedures. However, especially when multiple databases are served by the text-search server, this option implies potentially significant downtimes that may make shutting down the text-search server unfeasible.

The main reason to shut down the text-search server is to prevent conflicting access to text-search index files. If the goal is to back up a specific text-search index, it is possible to avoid shutting down the server if it can be ensured that no other operation holds the text index files. Refrain from manual administration tasks (manual reorganization, adding synonyms, etc.) and make sure that no text-search index update is running for this text index. Set the frequency to NONE with an ALTER INDEX command to remove the schedule and check the command locks to ensure that no index update is in progress.


Conclusion

Planning for a comprehensive backup solution for text indices that include database and file system objects is crucial to achieving a consistent system in a recovery scenario. Remember:

  • Text-search indices are not included in the database backup.
  • Use a file-system backup method for text-search indices at least with each full database backup.
  • Prevent concurrent write access to the text index during any index backup.
  • Back up the text-search index after each update, or save a copy of the staging table content until the next index backup is complete.

Download

DescriptionNameSize
Simple backup/restore sampleSample_Backup.zip2KB

Resources

Learn

Get products and technologies

  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=793989
ArticleTitle=DB2 Text Search, Part 3: What to consider when planning text index backups
publish-date=02232012