Configuring DB2 Text Search in a partitioned environment

DB2® Text Search enables DB2 database applications to perform full text-search by using embedded full text-search clauses in SQL and XQuery statements. This allows you to create powerful text-retrieval programs. DB2 Text Search supports full-text search in both non-partitioned and partitioned database environments. Partitioned setups are often used for large workloads, and as the text search index is partitioned according to the partitioning of the table, careful planning of configuration and administration tasks is needed to account for search performance and high availability requirements. The article describes the concepts behind the text index partitioning scheme and the impact on administration, the configuration for text search of a sample partitioned database setup, and discusses monitoring features and workload control options.

Nikunja B Das (nikundas@in.ibm.com), Senior Software Engineer, IBM

Nikunja DasNikunja B Das has more than 7 years of experience in software development and QA. He is a certified DB2 Database Administrator that has good working skills on DB2 and Text Search. In his current role, he is responsible for handling the customer requests for DB2 Text Search and Net Search Extender at India software labs.



Venkata S Vanukuru (vvanukur@in.ibm.com), Software Developer, IBM

Venkata S VanukuruVenkata S Vanukuru has been with the IBM software labs for the past 5 years. He is a certified DB2 Application Developer and Advanced Database Administrator. His areas of expertise include DB2 and Text Search, Linux, UNIX, and Windows Install Up and Running. He is currently working with Netezza Appliance Development at India software labs.



29 August 2013

Introduction to Database Partitioning Feature and Text Search

The Database Partitioning Feature (DPF) is a value-added option available with DB2 that extends the capability of DB2 into the parallel, multi-partition environment, improving the performance and the scalability of very large databases. A database partition can be either logical or physical. Logical database partitioning is having a partitioned database on a single machine with multiple logical nodes sharing the same resources where physical partitions consist of two or more physical servers, and the database is partitioned across these servers where each partition has its own resources.

With larger amounts of searchable structured, semi-structured, and unstructured data stored in databases, the need for an integrated full-text search capability to find interesting data in partitioned environments has become vital. DB2 Text Search enables DB2 database applications to perform full-text search by using embedded full text-search predicates in SQL and XQuery statements. This allows creating powerful text-retrieval programs. DB2 Text Search supports full-text search in both non-partitioned and partitioned database environments. Partitioned setups are often used for large workloads, and as the text index is partitioned according to the partitioning of the table, you need to plan the configuration and administration tasks carefully to account for search performance and high availability requirements.


Install and configure DB2 Text Search in a partitioned environment

Installing and configuring DB2 Text Search needs few more steps from setting up a partitioned environment for DB2. The DB2 installation configure the local text server. For a partitioned setup, you can use this for testing purposes and small workloads. However, for more control of workload distribution, consider installing and configuring the text server in a stand-alone setup.

Follow these simple steps to have the Text Search up and running in a partitioned environment for Linux®/UNIX® systems:

  1. Install the DB2 v10.1 server in the same installation path on all the nodes.
  2. Mount the DB2 instance owner’s home directory across all the nodes.
  3. Update the /etc/hosts file with the hostnames of participating nodes.
  4. Create the DB2 instance using db2icrt command with -j option to configure Text Search with the required service name and port number.
  5. Edit the db2nodes.cfg file to have the hosts added.
  6. Update the /etc/services file for port entries, svcename with the port information,.rhosts profile with the hostnames, and instance userid.
  7. Set the required database manager configuration parameters.
  8. Start the DB2 and the Text Search instance services.

Follow these steps for installing and configuring the stand-alone text search server:

  1. Create an ECMTS_HOME directory ECMText Search in /opt/ibm (for Linux or UNIX) or in C:\Program Files\IBM (for Windows®). The installation directory should have read, write, and execute permission for Linux or UNIX and for Windows, it needs administrator authority.
  2. Download the DB2 Accessories Suite for the required platform from the IBM DB2 Accessories Suite for Linux, UNIX, and Windows website. Extract it to a temporary directory, which depends on the different releases and different ECMTS version.
  3. It depends on the Build number returned by the configuration tool
  4. Check and edit the license and ecmts_response.txt files to customize your settings.
  5. Use the setup file to install the stand-alone Text Search server, using these commands for installing:
    • For Linux and UNIX:./[ecmts_setup_file_name] -i silent -f ecmts_response.txt
    • For Windows :[ecmts_setup_file_name] -i silent -f ecmts_response.txt
  6. Verify that the installation is successful. Check the installation log file and the bin, lib, config, and resource folders created in the [ECMTS_HOME] directory.
  7. Change the configuration file config.xml according to the installation path and configpath of the stand-alone server
  8. Start the server by using the startup.sh script present in the bin folder of the [ECMTS_HOME] directory.
  9. After the installation of the stand-alone DB2 Text Search server, use the configuration tool to customize some default properties. Ensure that the stand-alone DB2 Text Search server is shut down and that the text search services are stopped before configuring the properties, such as the system level properties and the security properties for the system.
  10. Enable Standalone TS server by updating the catalog table sysibmts.tsservers with the server info like HOST, PORT, TOKEN, KEY, LOCALE, SERVERTYPE, SERVERSTATUS after the enablement of the database for text search.
  11. Flush the package cache.
  12. Execute the SYSTS_CONFIGURE procedure.

The command in Listing 1 is used to check if DB2 Text Search component is installed on a given system. If Text Search is installed, the command returns the product name along with its version; otherwise, the output is empty.

Listing 1. Running db2ls command to check if Text Search component is installed
db2ls -q -b  /opt/ibm/db2/V10.1_DPF| grep "TEXT_SEARCH"
TEXT_SEARCH     10.1.0.0          0   DB2 Text Search

Listing 2 displays the configuration values like the server build version, build number, authentication token, port number(default port 55000), and so on. In the output of the printAll option, the authentication token is an empty string if DB2 Text Search is not configured; otherwise, there is a valid authentication token that is used to communicate with the server.

Listing 2. Check for DB2 Text Search server configuration
configTool printAll -configPath [absolute-path-to-configuration-folder]
Example: configTool printAll –configPath $HOME/sqllib/db2tss/config
                
Where $HOME is the home directory of DB2 instance owner

You can start DB2 Text Search using the command in Listing 3:

Listing 3. Start DB2 Text Search command
db2ts start for text
CIE00001 Operation completed successfully

If the command in Listing 3 fails, you can also start the Text Search server via startup.sh, shown in Listing 4, to see more detailed error messages:

Listing 4. Starting Text Search server using startup.sh
db2inst1@beta4:/home/db2inst1/sqllib/db2tss/bin> ./startup.sh
IQQG0337I The server started successfully. The server is listening on port 55005

Perform the step in Listing 5 to shut down the Text Search server:

Listing 5. Shutdown script to stop the Text Search server
db2inst1@beta4:/home/db2inst1/sqllib/db2tss/bin> ./shutdown.sh
IQQG0088I The shutdown request is being sent: localhost:55005.
IQQG0089I The shutdown request was sent successfully with a response code of 200.
IQQG0025I Shutdown is complete.

Associate the text server for the database

To perform full-text search on columns containing textual data, you need to enable the database for DB2 Text Search using the command db2ts enable database for text or the SYSPROC.SYSTS_ENABLE stored procedure. This creates administrative tables and views under the schema SYSIBMTS, which contains the metadata and default settings for DB2 Text Search. The SYSIBMTS.TSSERVERS table contains information about the text server that is assigned to serve the full-text search requests in the database. You must provide the values for hostname, port number, and token for multi-partitioned databases and stand-alone text servers, and you have to update them for any scenario each time the configuration changes.

Sample screen shots for updating the TSSERVERS table and verifying the update has been highlighted in Listing 6 and Listing 7, respectively:

Listing 6. Updating TSSERVERS catalog table
db2 "UPDATE SYSIBMTS.TSSERVERS 
SET (HOST, PORT, TOKEN) =('beta4.in.ibm.com', 55005, ' mcQPjAU=')"
DB20000I  The SQL command completed successfully
Listing 7. Verifying TSSERVERS catalog table
db2 "select SERVERPORT,SERVERHOST,SERVERAUTHTOKEN from SYSIBMTS.TSSERVERS"
                
SERVERPORT   SERVERHOST            SERVERAUTHTOKEN
----------- -----------------------------------------------
55005          beta4.in.ibm.com        mcQPjAU=
1 record(s) selected.

The values in the TSSERVERS table should match the values returned by the configTool. After successfully updating the SYSIBMTS.TSSERVERS table, execute the SYSTS_CONFIGURE procedure as shown in Listing 8. This stored procedure SYSPROC.SYSTS_CONFIGURE creates db2tssrv.cfg under absolute-path to the configuration folder; this file is used during the starting of the TS server. It contains information like host name, port number, type of Text Search server, and so on.

Listing 8. Execution of stored procedure SYSTS_CONFIGURE
db2 => CALL SYSPROC.SYSTS_CONFIGURE ('', 'en_US',?)
                
Value of output parameters
--------------------------
Parameter Name: MESSAGE
Parameter Value: CIE00001 Operation completed successfully.
Return Status = 0

Text indexes, collections, and their relationship in a partitioned environment

Text search indexes are distributed in a pattern that matches the base tables on which they are created. For each database partition, a text index partition, which is referred to as a collection, is created. A text index comprises as many collections as the number of database partitions spanned by the table that is indexed. This pattern facilitates text search maintenance by allowing text search index updates with parallel execution on all index partitions and allows query optimization by pushing down query execution to localize it per partition. The staging tables used for multi-collection text search index updates are per index rather than per collection. They are distributed in a manner similar to the base table as depicted in Figure 1:

Figure 1. Text Search server in a partitioned environment
Text Search Server in a partitioned environment

Parallel update versus serial update

In a partitioned environment, you can perform index updates either concurrently on all the collections or in a sequential fashion. With parallel mode, the execution of index updates is distributed to the database partitions and executes independently on each node. In serial mode, index updates happen in a sequential fashion (node after node) and abort if a failure is encountered. Serial mode execution usually takes more time but requires significantly less resources. Figure 2 shows both serial and parallel modes:

Figure 2. Serial and parallel update diagrammatic view
SERIAL AND PARALLEL UPDATE DIAGRAMATIC VIEW

The mode of execution is controlled by the column SERIALUPDATE of the system table SYSIBMTS.SYSTSINDEXES. This column takes two values: 0 (indicating a parallel index update) and 1 (indicating a serial update). By default, index update is always concurrent in case of partitioned environment.

Listing 9. Query for index update mode
db2 "select SERIALUPDATE from SYSIBMTS.SYSTSINDEXES"
SERIALUPDATE
------------
0
1 record(s) selected.

To change the index update mode from serial to parallel and vice versa, use the administration command alter, shown in Listing 10:

Listing 10. Altering index update mode
db2ts "alter index db2ts.myTextIndex for text index 
configuration (SERIALUPDATE 1) connect to tssample"
CIE00001 Operation completed successfully

Workload control options

In partitioned environments, you can configure DB2 Text Search to use either an integrated or stand-alone server. In the case of an integrated server, the Text Search server is bundled within the DB2 installation and both the DB2 and text server reside on the same host, which might cause performance problems. With a stand-alone or decoupled text server, the DB2 and text server reside on different hosts, which allows the resource heavy text search server processing to be separated from the host where DB2 server is located. For DPF environments, it is recommended to use remote text server installation.

You can also control the resource utilization due to Text Search by changing the update mode from parallel to serial, as parallel index update is a resource intensive operation and configuration.

In a DPF environment, it is common to add and drop the partitions. After adding/deleting the partitions, the table data and the index data needs to be redistributed. For redistribution of text indexes, you need to run the text index update command with the FOR DATA REDISTRIBUTION option. This is a resource intensive operation and should be carefully planned.


Status checking and monitoring

There are administration tools and commands that you can use to check the status of the text server and monitor the index operations.

The command in Listing 11 checks the DB2 Text Search server status. It also lists the port number to which it is configured.

Listing 11. Check for text server status
db2ts start for text STATUS
CIE00016I DB2 Text Search server "0" - "localhost":"55008":   STARTED

The DB2 catalog table SYSTSINDEXPARTS, shown in Listing 12, contains the details of all the index partitions along with their lock status and update status. If there is no update running on the partition, the LOCKSTATUS value is 0. Along with lock status, lock status time and update status time are also captured in Listing 12.

Listing 12. Query the details of index partitions
db2 "select * from sysibmts.SYSTSINDEXPARTS"
INDEXIDENT. PART. LOCKSTAT. UPDSTAT. UPDTTIME.                  LOCKSTATUSTIME
------------------ ----------- ----------- ------------ -------------------------- -------
TS032018    0     9         2        2013-07-19-23.50.05.302256 2013-07-19-23.53.50.507770
TS032018    1     9         2        2013-07-19-23.50.05.418621 2013-07-19-23.53.51.046285
TS032018    2     9         2        2013-07-19-23.50.05.540262 2013-07-19-23.53.50.795073
TS032018    3     0         2        2013-07-19-23.50.05.624204 2013-07-19-23.53.58.603899
4 record(s) selected.


db2 "select * from sysibmts.SYSTSINDEXPARTS"
INDEXIDENT. PART. LOCKSTAT. UPDSTAT. UPDTTIME.                  LOCKSTATUSTIME
------------------ ----------- ----------- ------------ -------------------------- -------
TS032018    0     0         1        2013-07-19-23.50.05.302256 2013-07-19-23.57.01.022721
TS032018    1     0         1        2013-07-19-23.50.05.418621 2013-07-20-00.00.05.804951
TS032018    2     0         1        2013-07-19-23.50.05.540262 2013-07-19-23.59.53.504174
TS032018    3     0         2        2013-07-19-23.50.05.624204 2013-07-19-23.53.58.603899
4 record(s) selected.

The TSDEFAULTS view, shown in Listing 13, contains the default values of the Text Search configuration. For example, the value of MAXCONCURRENTUPDATES controls the number of collection updates that can be executed in parallel at any given time. For multiple partition setups, the number of collections for each text index is determined according to the table distribution. However, only active partition updates count. Secondly, the value of MAXCONCURRENTCOLLECTIONS controls the number of collections that can be created. For multi-partition setups, the number of collections per text index matches the table distribution.

Listing 13. Query to check the maximum number of concurrent updates that can be performed and number of collections that can be created
db2 "select * from SYSIBMTS.TSDEFAULTS"
DEFAULTNAME                    DEFAULTVALUE
CODEPAGE                         1208
LANGUAGE                         en_US
FORMAT                           TEXT
UPDATEFREQUENCY                  NONE
UPDATEMINIMUM                    1
UPDATEAUTOCOMMIT                 0
CJKSEGMENTATION                  NGRAM
AUXLOGPART                       ON
AUXLOGNORM                       OFF
DOCUMENTRESULTQUEUESIZE          10000
LOGTIMETHRESHOLD                 0
MAXCONCURRENTUPDATES             8
MAXCONCURRENTCOLLECTIONS         160
MAXDOCUMENTSIZEINMB              100
14 record(s) selected

The query in Listing 14 gets the details on the collection, index, and index schema:

Listing 14. Query to get the details on the collection, index, and index schema
db2 "select * from sysibmts.TSCOLLECTIONNAMES"
INDSCHEMA               COLLECTIONNAME                      INDNAME
DB2TS                db2inst1_TSSAMPLE_TS572117_0000      MYTEXTINDEX
DB2TS                db2inst1_TSSAMPLE_TS572117_0001      MYTEXTINDEX
DB2TS                db2inst1_TSSAMPLE_TS572117_0002      MYTEXTINDEX
DB2TS                db2inst1_TSSAMPLE_TS572117_0003      MYTEXTINDEX
4 record(s) selected.

The adminTool, shown in Listing 15, is a useful utility that can check the number of documents indexed per index partition (collection) and size of the each collection.

Listing 15. Query to get the details about the collection and number of documents indexed across the different partitions
adminTool status -configPath /home/db2inst1/sqllib/db2tss/config
CollectionName  IndexSize       NumOfDocuments
Base            18,185B 0
db2inst1_SAMPLE_TS300108_0000      317,274.643K    111,708
db2inst1_SAMPLE_TS300108_0001      354,017.358K    109,700
db2inst1_SAMPLE_TS300108_0002      371,898.271K    113,797
db2inst1_SAMPLE_TS300108_0003      372,592.924K    111,452
db2inst1_SAMPLE_TS300108_0004      376,857.687K    112,589

While the index update is in progress (shown in Listing 16), you can use TSLOCKS to query the command lock information at the database and index level. Each row of this table indicates the administration command in progress. Once the update is successful, the table is empty (shown in Listing 17). This table should not be cleared while an administration command is in progress.

Listing 16. During an index update
db2 "select * from sysibmts.TSLOCKS"
COMMAND       LOCKSCOPE  INDSCHEMA        INDNAME    PARTITION   LOCKCREATETIME
------------------------------ ---------- ----------------------------------- -----------
UPDATE INDEX  INDEX      ADMIN            INDEX      0          2013-07-19-23.53.49.717093
UPDATE INDEX  INDEX      ADMIN            INDEX      0          2013-07-19-23.53.50.507770
UPDATE INDEX  INDEX      ADMIN            INDEX      1          2013-07-19-23.53.51.046285
UPDATE INDEX  INDEX      ADMIN            INDEX      2          2013-07-19-23.53.50.795073
4 record(s) selected.
Listing 17. After successful completion of the index update operation
db2 "select * from sysibmts.TSLOCKS"
                
COMMAND   LOCKSCOPE  INDSCHEMA  INDNAME   PARTITION   LOCKCREATETIME
------------------------------ ---------- --------------------------
0 record(s) selected

Common usage scenario

HOST1:

  1. Install the stand-alone server and configure it for text search.
  2. Start the server running the startup script (startup.sh / startup.bat).

HOST2:

  1. Start the DB2 database manager and DB2 Text Search:
    db2start
    08/23/2013 23:18:30     2   0   SQL1063N  DB2START processing was successful.
    08/23/2013 23:18:30     1   0   SQL1063N  DB2START processing was successful.
    08/23/2013 23:18:37     0   0   SQL5043N  DB2START processing was successful.
    08/23/2013 23:18:37     3   0   SQL5043N  DB2START processing was successful.
    SQL1063N  DB2START processing was successful.
                            
    db2ts "start for text"
    CIE00001 Operation completed successfully.
  2. Create a sample database and enable it for DB2 Text Search:
    db2 "create database tssample"
    DB20000I  The CREATE DATABASE command completed successfully.
                            
    db2ts "enable database for text connect to tssample"
    CIE00001 Operation completed successfully.
                            
    db2 "connect to tssample"
                            
    Database Connection Information
                            
    Database server        = DB2/LINUXX8664 10.1.0
    SQL authorization ID   = ABAKY
    Local database alias   = TSSAMPLE
  3. Update the catalog table sysibmts.tsservers with server information and run the procedure sysproc.systs_configure:
    db2 "UPDATE SYSIBMTS.TSSERVERS
    SET (HOST, PORT, TOKEN) =('beta4.in.ibm.com', 55005, ' mcQPjAU=')"
    DB20000I  The SQL command completed successfully.
                            
    db2 => CALL SYSPROC.SYSTS_CONFIGURE ('', 'en_US',?)
                            
    Value of output parameters
    --------------------------
    Parameter Name  : MESSAGE
    Parameter Value : CIE00001 Operation completed successfully.
                            
    Return Status = 0
  4. Create a table with a text column and insert some data into it
    db2 "CREATE TABLE books (isbn VARCHAR(18) not null PRIMARY KEY,
    author VARCHAR(30), story VARCHAR(50), year Integer)"
    DB20000I  The SQL command completed successfully.
                            
    db2 "INSERT INTO books VALUES 
    ('0-13-086755-2','Mike','The cat hunts some mice.', 2000)"
    DB20000I  The SQL command completed successfully.
                            
    db2 "INSERT INTO books VALUES 
    ('0-13-086755-3','Peter','Some men were standing beside the table.',1999)"
    DB20000I  The SQL command completed successfully.
  5. Create an index on the text column and perform an index update:
    db2ts "CREATE INDEX db2ts.myTextIndex FOR TEXT ON books (story) CONNECT TO tssample"
    CIE00001 Operation completed successfully.
                            
    db2ts "UPDATE INDEX db2ts.myTextIndex FOR TEXT CONNECT TO tssample"
    CIE00001 Operation completed successfully.
  6. Perform a search on the indexed content:
    db2 "SELECT author, story FROM books 
    WHERE CONTAINS (story, '\"cat\"') = 1 AND YEAR >= 2000"
    AUTHOR                         STORY
    
    ------------------------------ -----------------------------
    Mike                           The cat hunts some mice
    1 record(s) selected.

Troubleshooting hints

  1. Problem: Index update fails with an error CIE0460E Parallel index update errors across multiple database partit. SQLSTATE=38H14

    Solution: Update the SYSIBMTS.TSSERVERS including the hostname, port number, and token, and execute the SYSTS_CONFIGURE procedure.

  2. Problem: Stand-alone Text Search server failing to start using the startup script with the error IQQD0056E An error occurred when starting the server at port 8191.IQQG0335E The server cannot start. See the error log file for more details.ExtendedException: D0056E.CONNECT_ERROR - D0056E.CONNECT_ERROR

    Solution: To resolve this error message, perform either of the following steps:

    • Set the port number to a new value by issuing the following command: configTool configureParams -adminHTTPPort new_port_value -configPath ecmts_install_directory\config\config.xml.
    • After setting the port number to a new value, try running the startup script again. Uninstall the stand-alone server and then reinstall the server by setting the port number to a new value that is defined in the ecmts_response.txt response file.
  3. Problem: Index update fails with an error IQQI0061E The document with ID [AAgAAAAAAAohJg==] cannot be indexed due to inadequate disk space or permissions. Verify that there is sufficient disk space and permissions to access the disk. Then run the indexing process again. IQQG0020E java.io.IOException:No space left on device"

    Solution: Increase the disk space and run the index update again.

  4. Problem: Trying to create the index without starting the TS server create index will fail with error CIE00756E Connect failed on host.

    Solution: Run the command to start the TS server by using db2ts "start for Text" and then rerun the create index command.

  5. Problem: IQQG0037W message in a query about a missing collection after a data redistribution.

    Solution: Ensure that the FOR DATA REDISTRIBUTION option is used the next time a text search UPDATE INDEX command is issued.


Conclusion

The article describes the concepts behind the text index partitioning scheme and its impact on administration. It also details the steps to install and configure DB2 Text Search in a partitioned environment, the configuration for text search of a sample partitioned database setup, text indexes and collections, monitoring features, workload control options, and some of the troubleshooting hints for DB2 Text Search partitioned setup.


Acknowledgements

Thanks to Marion Behnen and Sobha Rani Cheruku for reviewing this article.

Resources

Learn

Discuss

  • Get involved in the My developerWorks community . Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=942335
ArticleTitle=Configuring DB2 Text Search in a partitioned environment
publish-date=08292013