DB2 Text Search, Part 4: High-availability setups with DB2 Text Search

What to consider when planning for setups that involve text indices

DB2 Text Search enables creating full-text search applications in IBM® DB2® for Linux®, UNIX®, and Windows® (LUW). For scenarios where high availability of the full-text search applications is required, a fail-over strategy must take into account that a DB2 Text Search setup involves two components: the DB2 database server and the text-search server. Text-search index catalog data and text-search index administration data are stored in the database and included in the DB2 High Availability Disaster Recovery (HADR) strategy. However, the text-search index collections are managed by the text-search server and are located on the file system. An extension of the database HADR strategy is, therefore, required to include the objects managed by the text-search server.

Share:

Marion Behnen (mbehnen@us.ibm.com), DB2 Text Search Development, IBM

Marion Behnen is a senior software engineer within the IBM Software Group and works as a technical lead for full-text search in DB2. She has more than 20 years of experience with database application development, business reporting, data warehousing, and business process integration. Prior to joining IBM, she was involved with many aspects of business process and data integration, in particular for the manufacturing industry.



Kavya Rao (kavyarao@in.ibm.com), DB2 Text Search Support, IBM

Kavya Rao is a software engineer with IBM India Software Labs. She is a certified DB2 administrator and a support specialist with several years' experience of full-text search in DB2, including DB2 Net Search Extender and DB2 Text Search.



01 March 2012

Also available in Chinese

Introduction

DB2 Text Search indexes are hosted in the DB2 database and linked to text index collection data managed with the text-search server configured for the database instance.

A database failure can affect consistency for text-search index collections, and the database server and the text-search server can be affected by hardware, software, or network failures, or by other factors like human influence or environmental impact (power failures, for example). A comprehensive high-availability (HA) solution is necessary to ensure that all components of the system are available in a planned timeframe and to re-synchronize the database content with the text-search index content on a standby system.

The following sections discuss basic elements to enable text-search setups for an HA scenario. They are based on a DB2 HADR scenario, where the primary server transfers logs to the standby server via TCP/IP. Transaction logs are transmitted and applied on the standby server. The primary server acts as TCP Server and the standby server as client. If the primary server fails, the standby server takes over the role of primary.

Figure 1. Basic setup for text search in an HA scenario
Image shows primary server connected to standby server through data synchronization

Fail-over scenarios

A DB2 database enabled for text search connects through a local host TCP/IP connection to the text-search server. The text-search server runs as a separate process and manages the file system-based text-search index collections.

In general, the following scenarios can be distinguished:

  1. A failure affects database availability and leads to a fail-over to the standby server.
  2. A failure affects the text-search server availability, but the database server is unaffected.
  3. A failure affects availability of the text-search index collection.

The focus of the following sections is how to ensure that text-search functionality can be executed consistently after a database fail-over to the standby server. The other cases are briefly discussed here.

Text-search server availability

Provided that the installation and configuration of the text-search server are correct, possible reasons for unavailability of the text-search server include:

  • Security system changes, such as authorization or access privilege changes, can prevent starting the text-search server.
  • Network issues like port or socket configuration can cause communication failures.
  • Runtime issues, such as the Java™ process, getting hung, killed, or crashing.

In each of these cases, the issue must be fixed on the primary system, and the text-search server associated with the instance must be restarted with the proper authority and the proper configuration.

Text-search index availability

If a text-search index becomes unavailable, two scenarios are most relevant:

  • A hardware error — for example, a disk crash of a custom location. This type of error is not specific to text-search indexes. Hardware errors of this kind should, for example, be addressed by choosing disk devices that support proven technologies for disk availability.
  • A human error — an unintended execution of a collection cleanup, for example.

To address index availability, the HA strategy must include a backup plan that allows restoring a text-search index collection. File permission changes may render a text-search index unavailable for access as well, but this is not a relevant scenario for this context.


Text-search server setup for a database fail-over scenario

If the machine that hosts the standby database system has different resource properties from the machine that hosts the primary database system, it might be necessary to have configuration differences for the text-search server. For example, configuration settings that impact resource usage, or the port number used for the local host TCP/IP connection between database server and text-search server.

For a smooth transition, prepare a configuration script that adjusts values for the standby text-search server. When a fail-over occurs: Ship the current text-search server configuration from the primary system to the standby system and run the configuration script to adapt configuration values before starting the text-search server on the standby system.

The re-configuration and startup of the text-search server may be integrated into a script that is part of an automatically managed solution. Note that HADR configuration settings do not apply to text-search indices, but they may affect the integrated strategy. For example, setting the parameter logindexbuild to ON will not keep track of text-search index builds.

The choice of HADR synchronization mode may cause a slight increase of the risk for a failed text-search index update. When an asynchronous mode is configured (ASYNC or SUPERASYNC), recent transactions might be lost on a standby system. In this case, it is possible that a text-search index update was committed by the text-search server and on the primary server, but the transactions to wrap up the text-search index update in the database are not applied to the secondary server. The scenario is to be treated like the corresponding update failure in a non-HA setup, and the next incremental update will re-process the data.

For all HA scenarios, to distribute disk I/O workload, do not share devices between the logging file system, the table-space files, and the text-search index collections.


Text-search indexes in a database fail-over scenario

Using a shared disk for text-search indexes

Text-search index collections can be created in a custom location. The location can be on a shared disk that is accessible from the primary server as well as the secondary server. The file system must be set up identically for the primary and secondary system to access the collections with the same absolute path information.

Figure 2. Text indices on a shared disk
Image shows that text indexes are stored on a shared disk to allow access from a standby text server

Depending on the reason for the database fail-over, the text-search server process on the primary system might still be active. If so, make sure to shut down the text-search server on the primary system before a startup of the text-search server on the secondary system.

The advantage of this setup is that it requires only space for one set of text-search indices and has a fast uptime because the text-search server on the standby system will have immediate access to the collections after starting. The disadvantage is that using a shared disk may be slower.

To resume full-text search on the fail-over system:

  • Refresh the text-search server configuration.
  • Identify and treat interrupted text-search administrative operations, if any.
  • Start the text-search server.

Using a text-search index copy on the standby system

Another approach to make text-search index collections available to a standby server is to ship text-search index files to the standby system. Again, the file system and folder structure must be set up on the standby system to mirror the primary system.

Figure 3. Text index copy on standby system
Image shows that text indices are copied to the standby system

Similarly to backup considerations, all files or only changed files may be shipped to the standby system. In a standby-with-copy scenario, use a backup strategy that ensures that either (changed) text-search index files are shipped after each index update or a shadow staging table keeps processed staging table entries until after a shipment. With such a strategy, the content of the database on the standby system and the content of the text-search index collections will be consistent. (See Part 3 of this series for information about backup strategies and staging table considerations.)

In this solution, the standby system is essentially used as repository for the most recent text-search index backup. To resume full-text search on the fail-over system:

  • Refresh the text-search server configuration.
  • Identify and treat interrupted text-search administrative operations, if any.
  • If staging table copies or shadow staging tables are used, move entries back to the original staging tables.
  • Start the text-search server.
  • Run an incremental text-search index update to apply changes from the staging table.

Using a restore strategy for text-search indexes on the standby system

If time is not critical and the standby server is not appropriate to serve as a backup repository, a variation of the standby-with-copy scenario is to use DB2 HADR to manage the database fail-over and restore the text-search index collections from a backup.

Figure 4. Using an index backup to restore the standby system
Image shows that text indices are restored to the standby system

To resume full-text search on the fail-over system:

  • Refresh the text-search server configuration.
  • Restore the text-search index collections from the current backup.
  • Identify and treat interrupted text-search administrative operations, if any.
  • If staging table copies or shadow staging tables are used, move entries back to the original staging tables.
  • Start the text-search server.
  • Run an incremental text-search index update to apply changes from the staging table.

Handling interrupted administrative operations

A critical element to enabling text-search on the standby server is to ensure that the text-search server on the standby system is configured to match the text-search server on the primary system. Similar to database management configuration and database configuration, text-search server configuration data is not automatically propagated from the primary to the standby system, but must be handled manually. Text-search server configuration data is on database instance level and persisted in configuration files located in .../sqllib/db2tss/config.

Independent of whether a shared disk, a standby-with-copy, or standby-with-restore solution is used to include text-search indices in a fail-over scenario, a fail-over from primary to standby server may involve incomplete text-search index administrative operations. This has implications for consistency between database content and text-search index content, and text-search server index metadata.

For example, an operation may be uncommitted on the text-search server and on the database server at the time of failure. Another example is that an operation may already have committed on the text-search server but not yet committed in the database, and the database did not send a rollback or undo operation request to the text-search server.

Before starting the text-search server to resume operations on the standby system, it is prudent to review the command locks view for interrupted operations:

  1. Create index. If a CREATE INDEX FOR TEXT operation was active at the time of the database failure:
    • Remove the command lock.
    • Find and remove superfluous collection entries by using CLEANUP FOR TEXT. Starting the text server does not depend on this step. Cleanup can be done when it is convenient.
  2. Drop the index. If a DROP INDEX FOR TEXT operation was active at the time of the database failure:
    • Remove the command lock.
    • Repeat the operation.
  3. Update the index. If an UPDATE INDEX FOR TEXT operation was active at the time of the database failure:
    • Remove the command lock.
    • Restart the text-search index update to reprocess the data.

    The text-search index collection status will be the last-committed state before the time of failure. This matches the last backed-up state of the text-search index combined with the saved staging table content up to the last successful incremental update.

    If the update operation was committed by the text-search server, the collection will contain the updated data, even if the index update is marked as failed in the database. In this case, if the update operation has not already committed all transactions in the database (excluding the command lock update), the same update scope will again be applied with the next index update processed at the time of failure.


Conclusion

To include text-search indices into a high-availability strategy for DB2, it is necessary to ensure that the standby system is properly configured for text search and has access to the text-search index collections. Keep in mind that:

  • The text-search server configuration on the standby system must match the configuration on the primary system, except for those configuration values that address system resource settings.
  • The text server on the secondary system must have access to the text-search index collections in a file structure that matches the primary system.
  • In a shared disk setup, the primary text server and the secondary text server must not use the text indices concurrently.

Download

DescriptionNameSize
Simple HA sampleSample_HA.txt2KB

Resources

Learn

Get products and technologies

  • You can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=795743
ArticleTitle=DB2 Text Search, Part 4: High-availability setups with DB2 Text Search
publish-date=03012012