Introduction to Database Partitioning Feature and Text Search
The Database Partitioning Feature (DPF) is a value-added option available with DB2 that extends the capability of DB2 into the parallel, multi-partition environment, improving the performance and the scalability of very large databases. A database partition can be either logical or physical. Logical database partitioning is having a partitioned database on a single machine with multiple logical nodes sharing the same resources where physical partitions consist of two or more physical servers, and the database is partitioned across these servers where each partition has its own resources.
With larger amounts of searchable structured, semi-structured, and unstructured data stored in databases, the need for an integrated full-text search capability to find interesting data in partitioned environments has become vital. DB2 Text Search enables DB2 database applications to perform full-text search by using embedded full text-search predicates in SQL and XQuery statements. This allows creating powerful text-retrieval programs. DB2 Text Search supports full-text search in both non-partitioned and partitioned database environments. Partitioned setups are often used for large workloads, and as the text index is partitioned according to the partitioning of the table, you need to plan the configuration and administration tasks carefully to account for search performance and high availability requirements.
Install and configure DB2 Text Search in a partitioned environment
Installing and configuring DB2 Text Search needs few more steps from setting up a partitioned environment for DB2. The DB2 installation configure the local text server. For a partitioned setup, you can use this for testing purposes and small workloads. However, for more control of workload distribution, consider installing and configuring the text server in a stand-alone setup.
Follow these simple steps to have the Text Search up and running in a partitioned environment for Linux®/UNIX® systems:
- Install the DB2 v10.1 server in the same installation path on all the nodes.
- Mount the DB2 instance owner’s home directory across all the nodes.
- Update the /etc/hosts file with the hostnames of participating nodes.
- Create the DB2 instance using db2icrt command with -j option to configure Text Search with the required service name and port number.
- Edit the db2nodes.cfg file to have the hosts added.
- Update the /etc/services file for port entries, svcename with the port information,.rhosts profile with the hostnames, and instance userid.
- Set the required database manager configuration parameters.
- Start the DB2 and the Text Search instance services.
Follow these steps for installing and configuring the stand-alone text search server:
- Create an ECMTS_HOME directory ECMText Search in /opt/ibm (for Linux or UNIX) or in C:\Program Files\IBM (for Windows®). The installation directory should have read, write, and execute permission for Linux or UNIX and for Windows, it needs administrator authority.
- Download the DB2 Accessories Suite for the required platform from the IBM DB2 Accessories Suite for Linux, UNIX, and Windows website. Extract it to a temporary directory, which depends on the different releases and different ECMTS version.
- It depends on the Build number returned by the configuration tool
- Check and edit the license and ecmts_response.txt files to customize your settings.
- Use the setup file to install the stand-alone Text Search server,
using these commands for installing:
- For Linux and UNIX:
./[ecmts_setup_file_name] -i silent -f ecmts_response.txt
- For Windows :
[ecmts_setup_file_name] -i silent -f ecmts_response.txt
- For Linux and UNIX:
- Verify that the installation is successful. Check the installation log file and the bin, lib, config, and resource folders created in the [ECMTS_HOME] directory.
- Change the configuration file config.xml according to the installation path and configpath of the stand-alone server
- Start the server by using the startup.sh script present in the bin folder of the [ECMTS_HOME] directory.
- After the installation of the stand-alone DB2 Text Search server, use the configuration tool to customize some default properties. Ensure that the stand-alone DB2 Text Search server is shut down and that the text search services are stopped before configuring the properties, such as the system level properties and the security properties for the system.
- Enable Standalone TS server by updating the catalog table sysibmts.tsservers with the server info like HOST, PORT, TOKEN, KEY, LOCALE, SERVERTYPE, SERVERSTATUS after the enablement of the database for text search.
- Flush the package cache.
- Execute the SYSTS_CONFIGURE procedure.
The command in Listing 1 is used to check if DB2 Text Search component is installed on a given system. If Text Search is installed, the command returns the product name along with its version; otherwise, the output is empty.
Listing 1. Running db2ls command to check if Text Search component is installed
db2ls -q -b /opt/ibm/db2/V10.1_DPF| grep "TEXT_SEARCH" TEXT_SEARCH 10.1.0.0 0 DB2 Text Search
Listing 2 displays the configuration values like the server build version, build number, authentication token, port number(default port 55000), and so on. In the output of the printAll option, the authentication token is an empty string if DB2 Text Search is not configured; otherwise, there is a valid authentication token that is used to communicate with the server.
Listing 2. Check for DB2 Text Search server configuration
configTool printAll -configPath [absolute-path-to-configuration-folder] Example: configTool printAll –configPath $HOME/sqllib/db2tss/config Where $HOME is the home directory of DB2 instance owner
You can start DB2 Text Search using the command in Listing 3:
Listing 3. Start DB2 Text Search command
db2ts start for text CIE00001 Operation completed successfully
If the command in Listing 3 fails, you can also start the Text Search server via startup.sh, shown in Listing 4, to see more detailed error messages:
Listing 4. Starting Text Search server using startup.sh
db2inst1@beta4:/home/db2inst1/sqllib/db2tss/bin> ./startup.sh IQQG0337I The server started successfully. The server is listening on port 55005
Perform the step in Listing 5 to shut down the Text Search server:
Listing 5. Shutdown script to stop the Text Search server
db2inst1@beta4:/home/db2inst1/sqllib/db2tss/bin> ./shutdown.sh IQQG0088I The shutdown request is being sent: localhost:55005. IQQG0089I The shutdown request was sent successfully with a response code of 200. IQQG0025I Shutdown is complete.
Associate the text server for the database
To perform full-text search on columns containing textual data, you need to
enable the database for DB2 Text Search using the command
db2ts enable database for text or
the SYSPROC.SYSTS_ENABLE stored procedure. This creates administrative tables and
views under the schema SYSIBMTS, which contains the metadata and default settings for
DB2 Text Search. The SYSIBMTS.TSSERVERS table contains information about the text
server that is assigned to serve the full-text search requests in the
database. You must provide the
values for hostname, port number, and token for multi-partitioned
databases and stand-alone text servers, and you have to update them for any scenario each
time the configuration changes.
Sample screen shots for updating the TSSERVERS table and verifying the update has been highlighted in Listing 6 and Listing 7, respectively:
Listing 6. Updating TSSERVERS catalog table
db2 "UPDATE SYSIBMTS.TSSERVERS SET (HOST, PORT, TOKEN) =('beta4.in.ibm.com', 55005, ' mcQPjAU=')" DB20000I The SQL command completed successfully
Listing 7. Verifying TSSERVERS catalog table
db2 "select SERVERPORT,SERVERHOST,SERVERAUTHTOKEN from SYSIBMTS.TSSERVERS" SERVERPORT SERVERHOST SERVERAUTHTOKEN ----------- ----------------------------------------------- 55005 beta4.in.ibm.com mcQPjAU= 1 record(s) selected.
The values in the TSSERVERS table should match the values returned by the configTool. After successfully updating the SYSIBMTS.TSSERVERS table, execute the SYSTS_CONFIGURE procedure as shown in Listing 8. This stored procedure SYSPROC.SYSTS_CONFIGURE creates db2tssrv.cfg under absolute-path to the configuration folder; this file is used during the starting of the TS server. It contains information like host name, port number, type of Text Search server, and so on.
Listing 8. Execution of stored procedure SYSTS_CONFIGURE
db2 => CALL SYSPROC.SYSTS_CONFIGURE ('', 'en_US',?) Value of output parameters -------------------------- Parameter Name: MESSAGE Parameter Value: CIE00001 Operation completed successfully. Return Status = 0
Text indexes, collections, and their relationship in a partitioned environment
Text search indexes are distributed in a pattern that matches the base tables on which they are created. For each database partition, a text index partition, which is referred to as a collection, is created. A text index comprises as many collections as the number of database partitions spanned by the table that is indexed. This pattern facilitates text search maintenance by allowing text search index updates with parallel execution on all index partitions and allows query optimization by pushing down query execution to localize it per partition. The staging tables used for multi-collection text search index updates are per index rather than per collection. They are distributed in a manner similar to the base table as depicted in Figure 1:
Figure 1. Text Search server in a partitioned environment
Parallel update versus serial update
In a partitioned environment, you can perform index updates either concurrently on all the collections or in a sequential fashion. With parallel mode, the execution of index updates is distributed to the database partitions and executes independently on each node. In serial mode, index updates happen in a sequential fashion (node after node) and abort if a failure is encountered. Serial mode execution usually takes more time but requires significantly less resources. Figure 2 shows both serial and parallel modes:
Figure 2. Serial and parallel update diagrammatic view
The mode of execution is controlled by the column SERIALUPDATE of the system table SYSIBMTS.SYSTSINDEXES. This column takes two values: 0 (indicating a parallel index update) and 1 (indicating a serial update). By default, index update is always concurrent in case of partitioned environment.
Listing 9. Query for index update mode
db2 "select SERIALUPDATE from SYSIBMTS.SYSTSINDEXES" SERIALUPDATE ------------ 0 1 record(s) selected.
To change the index update mode from serial to parallel and vice versa,
use the administration command
alter, shown in Listing 10:
Listing 10. Altering index update mode
db2ts "alter index db2ts.myTextIndex for text index configuration (SERIALUPDATE 1) connect to tssample" CIE00001 Operation completed successfully
Workload control options
In partitioned environments, you can configure DB2 Text Search to use either an integrated or stand-alone server. In the case of an integrated server, the Text Search server is bundled within the DB2 installation and both the DB2 and text server reside on the same host, which might cause performance problems. With a stand-alone or decoupled text server, the DB2 and text server reside on different hosts, which allows the resource heavy text search server processing to be separated from the host where DB2 server is located. For DPF environments, it is recommended to use remote text server installation.
You can also control the resource utilization due to Text Search by changing the update mode from parallel to serial, as parallel index update is a resource intensive operation and configuration.
In a DPF environment, it is common to add and drop the partitions. After adding/deleting the partitions, the table data and the index data needs to be redistributed. For redistribution of text indexes, you need to run the text index update command with the FOR DATA REDISTRIBUTION option. This is a resource intensive operation and should be carefully planned.
Status checking and monitoring
There are administration tools and commands that you can use to check the status of the text server and monitor the index operations.
The command in Listing 11 checks the DB2 Text Search server status. It also lists the port number to which it is configured.
Listing 11. Check for text server status
db2ts start for text STATUS CIE00016I DB2 Text Search server "0" - "localhost":"55008": STARTED
The DB2 catalog table SYSTSINDEXPARTS, shown in Listing 12, contains the details of all the index partitions along with their lock status and update status. If there is no update running on the partition, the LOCKSTATUS value is 0. Along with lock status, lock status time and update status time are also captured in Listing 12.
Listing 12. Query the details of index partitions
db2 "select * from sysibmts.SYSTSINDEXPARTS" INDEXIDENT. PART. LOCKSTAT. UPDSTAT. UPDTTIME. LOCKSTATUSTIME ------------------ ----------- ----------- ------------ -------------------------- ------- TS032018 0 9 2 2013-07-19-23.50.05.302256 2013-07-19-126.96.36.1997770 TS032018 1 9 2 2013-07-19-23.50.05.418621 2013-07-19-23.53.51.046285 TS032018 2 9 2 2013-07-19-23.50.05.540262 2013-07-19-188.8.131.525073 TS032018 3 0 2 2013-07-19-23.50.05.624204 2013-07-19-184.108.40.2063899 4 record(s) selected. db2 "select * from sysibmts.SYSTSINDEXPARTS" INDEXIDENT. PART. LOCKSTAT. UPDSTAT. UPDTTIME. LOCKSTATUSTIME ------------------ ----------- ----------- ------------ -------------------------- ------- TS032018 0 0 1 2013-07-19-23.50.05.302256 2013-07-19-23.57.01.022721 TS032018 1 0 1 2013-07-19-23.50.05.418621 2013-07-20-00.00.05.804951 TS032018 2 0 1 2013-07-19-23.50.05.540262 2013-07-19-220.127.116.114174 TS032018 3 0 2 2013-07-19-23.50.05.624204 2013-07-19-18.104.22.1683899 4 record(s) selected.
The TSDEFAULTS view, shown in Listing 13, contains the default values of the Text Search configuration. For example, the value of MAXCONCURRENTUPDATES controls the number of collection updates that can be executed in parallel at any given time. For multiple partition setups, the number of collections for each text index is determined according to the table distribution. However, only active partition updates count. Secondly, the value of MAXCONCURRENTCOLLECTIONS controls the number of collections that can be created. For multi-partition setups, the number of collections per text index matches the table distribution.
Listing 13. Query to check the maximum number of concurrent updates that can be performed and number of collections that can be created
db2 "select * from SYSIBMTS.TSDEFAULTS" DEFAULTNAME DEFAULTVALUE CODEPAGE 1208 LANGUAGE en_US FORMAT TEXT UPDATEFREQUENCY NONE UPDATEMINIMUM 1 UPDATEAUTOCOMMIT 0 CJKSEGMENTATION NGRAM AUXLOGPART ON AUXLOGNORM OFF DOCUMENTRESULTQUEUESIZE 10000 LOGTIMETHRESHOLD 0 MAXCONCURRENTUPDATES 8 MAXCONCURRENTCOLLECTIONS 160 MAXDOCUMENTSIZEINMB 100 14 record(s) selected
The query in Listing 14 gets the details on the collection, index, and index schema:
Listing 14. Query to get the details on the collection, index, and index schema
db2 "select * from sysibmts.TSCOLLECTIONNAMES" INDSCHEMA COLLECTIONNAME INDNAME DB2TS db2inst1_TSSAMPLE_TS572117_0000 MYTEXTINDEX DB2TS db2inst1_TSSAMPLE_TS572117_0001 MYTEXTINDEX DB2TS db2inst1_TSSAMPLE_TS572117_0002 MYTEXTINDEX DB2TS db2inst1_TSSAMPLE_TS572117_0003 MYTEXTINDEX 4 record(s) selected.
The adminTool, shown in Listing 15, is a useful utility that can check the number of documents indexed per index partition (collection) and size of the each collection.
Listing 15. Query to get the details about the collection and number of documents indexed across the different partitions
adminTool status -configPath /home/db2inst1/sqllib/db2tss/config CollectionName IndexSize NumOfDocuments Base 18,185B 0 db2inst1_SAMPLE_TS300108_0000 317,274.643K 111,708 db2inst1_SAMPLE_TS300108_0001 354,017.358K 109,700 db2inst1_SAMPLE_TS300108_0002 371,898.271K 113,797 db2inst1_SAMPLE_TS300108_0003 372,592.924K 111,452 db2inst1_SAMPLE_TS300108_0004 376,857.687K 112,589
While the index update is in progress (shown in Listing 16), you can use TSLOCKS to query the command lock information at the database and index level. Each row of this table indicates the administration command in progress. Once the update is successful, the table is empty (shown in Listing 17). This table should not be cleared while an administration command is in progress.
Listing 16. During an index update
db2 "select * from sysibmts.TSLOCKS" COMMAND LOCKSCOPE INDSCHEMA INDNAME PARTITION LOCKCREATETIME ------------------------------ ---------- ----------------------------------- ----------- UPDATE INDEX INDEX ADMIN INDEX 0 2013-07-19-22.214.171.1247093 UPDATE INDEX INDEX ADMIN INDEX 0 2013-07-19-126.96.36.1997770 UPDATE INDEX INDEX ADMIN INDEX 1 2013-07-19-23.53.51.046285 UPDATE INDEX INDEX ADMIN INDEX 2 2013-07-19-188.8.131.525073 4 record(s) selected.
Listing 17. After successful completion of the index update operation
db2 "select * from sysibmts.TSLOCKS" COMMAND LOCKSCOPE INDSCHEMA INDNAME PARTITION LOCKCREATETIME ------------------------------ ---------- -------------------------- 0 record(s) selected
Common usage scenario
- Install the stand-alone server and configure it for text search.
- Start the server running the startup script (
- Start the DB2 database manager and DB2 Text Search:
db2start 08/23/2013 23:18:30 2 0 SQL1063N DB2START processing was successful. 08/23/2013 23:18:30 1 0 SQL1063N DB2START processing was successful. 08/23/2013 23:18:37 0 0 SQL5043N DB2START processing was successful. 08/23/2013 23:18:37 3 0 SQL5043N DB2START processing was successful. SQL1063N DB2START processing was successful. db2ts "start for text" CIE00001 Operation completed successfully.
- Create a sample database and enable it for DB2 Text Search:
db2 "create database tssample" DB20000I The CREATE DATABASE command completed successfully. db2ts "enable database for text connect to tssample" CIE00001 Operation completed successfully. db2 "connect to tssample" Database Connection Information Database server = DB2/LINUXX8664 10.1.0 SQL authorization ID = ABAKY Local database alias = TSSAMPLE
- Update the catalog table sysibmts.tsservers with server information
and run the procedure
db2 "UPDATE SYSIBMTS.TSSERVERS SET (HOST, PORT, TOKEN) =('beta4.in.ibm.com', 55005, ' mcQPjAU=')" DB20000I The SQL command completed successfully. db2 => CALL SYSPROC.SYSTS_CONFIGURE ('', 'en_US',?) Value of output parameters -------------------------- Parameter Name : MESSAGE Parameter Value : CIE00001 Operation completed successfully. Return Status = 0
- Create a table with a text column and insert some data into it
db2 "CREATE TABLE books (isbn VARCHAR(18) not null PRIMARY KEY, author VARCHAR(30), story VARCHAR(50), year Integer)" DB20000I The SQL command completed successfully. db2 "INSERT INTO books VALUES ('0-13-086755-2','Mike','The cat hunts some mice.', 2000)" DB20000I The SQL command completed successfully. db2 "INSERT INTO books VALUES ('0-13-086755-3','Peter','Some men were standing beside the table.',1999)" DB20000I The SQL command completed successfully.
- Create an index on the text column and perform an index update:
db2ts "CREATE INDEX db2ts.myTextIndex FOR TEXT ON books (story) CONNECT TO tssample" CIE00001 Operation completed successfully. db2ts "UPDATE INDEX db2ts.myTextIndex FOR TEXT CONNECT TO tssample" CIE00001 Operation completed successfully.
- Perform a search on the indexed content:
db2 "SELECT author, story FROM books WHERE CONTAINS (story, '\"cat\"') = 1 AND YEAR >= 2000" AUTHOR STORY ------------------------------ ----------------------------- Mike The cat hunts some mice 1 record(s) selected.
Problem: Index update fails with an error CIE0460E Parallel index update errors across multiple database partit. SQLSTATE=38H14
Solution: Update the SYSIBMTS.TSSERVERS including the hostname, port number, and token, and execute the SYSTS_CONFIGURE procedure.
Problem: Stand-alone Text Search server failing to start using the startup script with the error IQQD0056E An error occurred when starting the server at port 8191.IQQG0335E The server cannot start. See the error log file for more details.ExtendedException: D0056E.CONNECT_ERROR - D0056E.CONNECT_ERROR
Solution: To resolve this error message, perform either of the following steps:
- Set the port number to a new value by issuing the
configTool configureParams -adminHTTPPort new_port_value -configPath ecmts_install_directory\config\config.xml.
- After setting the port number to a new value, try running the startup script again. Uninstall the stand-alone server and then reinstall the server by setting the port number to a new value that is defined in the ecmts_response.txt response file.
- Set the port number to a new value by issuing the following command:
Problem: Index update fails with an error IQQI0061E The document with ID [AAgAAAAAAAohJg==] cannot be indexed due to inadequate disk space or permissions. Verify that there is sufficient disk space and permissions to access the disk. Then run the indexing process again. IQQG0020E java.io.IOException:No space left on device"
Solution: Increase the disk space and run the index update again.
Problem: Trying to create the index without starting the TS server create index will fail with error CIE00756E Connect failed on host.
Solution: Run the command to start the TS server by using db2ts "start for Text" and then rerun the create index command.
Problem: IQQG0037W message in a query about a missing collection after a data redistribution.
Solution: Ensure that the FOR DATA REDISTRIBUTION option is used the next time a text search UPDATE INDEX command is issued.
The article describes the concepts behind the text index partitioning scheme and its impact on administration. It also details the steps to install and configure DB2 Text Search in a partitioned environment, the configuration for text search of a sample partitioned database setup, text indexes and collections, monitoring features, workload control options, and some of the troubleshooting hints for DB2 Text Search partitioned setup.
Thanks to Marion Behnen and Sobha Rani Cheruku for reviewing this article.
- Some key topics from the Information Center include:
- In "DB2 Text Search, Part 1: Full text search (developerWorks, April 2010), get the details on the basics of DB2 text search, including text index creation and administration. Explore specific features of DB2 full-text search like setting up a database to support text search and walking through a scenario to get some experience for setting up your own text searches.
- "Enable C++ applications for Web services using XML-RPC " (developerWorks, June 2006) is a step-by-step guide to exposing C++ methods as services.
- In the XML area on developerWorks, get the resources you need to advance your XML skills, including DTDs, schemas, and XSLT.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
- Follow developerWorks on Twitter .
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
- Get involved in the My developerWorks community . Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.