DB2 Text Search enables an IBM DB2 for Linux, UNIX, and Windows database user to create applications with full text-search capabilities by embedding full text-search clauses in SQL and XQuery statements. In this tutorial, you'll set up a database to support text search, and walk through a scenario to get some experience for setting up your own text searches.

Note: This tutorial has been updated to better present examples and code commands to help prevent users less familiar with DB2 from encountering problems.

Prashant Shettar (pshettar@in.ibm.com), System Software Engineer, IBM

Prashant Shettar is a Systems Software Engineer at IBM’s Information Management Group (part of IBM Software Group) in Bangalore, India.



Marion Behnen (mbehnen@us.ibm.com), DB2 Text Search Development, IBM

Marion Behnen is a Senior Software Engineer within the IBM Software Group and works as a technical lead for full-text search in DB2. She has more than 20 years of experience with database application development, business reporting, data warehousing, and business process integration. Prior to joining IBM, she was involved with many aspects of business process and data integration, in particular for the manufacturing industry.


developerWorks Contributing author
        level

Shantanu Mundkur (mundkur@us.ibm.com), DB2 Text Search Development, IBM

Shantanu Mundkur is an Advisory Software Engineer at IBM's Silicon Valley Laboratory. He is part of the IBM DB2 for Linux, UNIX and, Windows development team and works in the area of full-text search. He has a broad range of software product development experience, and has designed and implemented features for various database servers including DB2 and Informix.



15 April 2010 (First published 16 October 2008)

Also available in Vietnamese

Before you start

Learn what to expect from this tutorial, and how to get the most out of it.

About this tutorial

DB2 Text Search is an integrated component of DB2 9.5 and is powered by the IBM OmniFind™ Text Search server. It provides the following features:

  • Full text search in text, HTML, and XML documents, including Boolean and wildcard search
  • Fully integrated SQL, SQL/XML, and XQuery support, including XPath syntax subset to search XML documents
  • Linguistic processing with optional synonyms definition
  • Asynchronous index update with scheduling option

This tutorial demonstrates how to use the basic features of DB2 Text Search to search in plain text and XML documents stored in the database and guides you through the following tasks:

  • Preparing the database setup
  • Creating text-search indices for text data and XML documents
  • Populating text-search indices
  • Searching in plain text
  • Determining the relevance of results
  • Searching in XML documents
  • Updating text-search indices
  • Cleaning up

System requirements and prerequisites

To complete the following steps, you must have at least IBM DB2 9.5 for Linux, UNIX, and Windows, Fix Pack 1 installed on the system. Note that you cannot use DB2 Text Search in a partitioned database environment on a DB2 instance with multiple partitions.

To proceed through the tutorial using a step-by-step approach, the commands can be executed as DB2 CLP interactive commands using the db2 -t command. The db2ts text-search index administration commands have to be run at the operating system prompt. However, these commands can be executed in the same window by using the shell command (!). In this mode, every statement would also have to use the statement termination character—a semicolon (;).

To run the examples in this article, make sure that you have DBADM authority on the DB2 database server that you want to use. If you create the database as described in the next section, this authority is automatically assigned. The DBADM authority is needed to manage the text-search instance services and to create and modify the database used in the examples.


Set up a database for text search

Steps to set up a test database

  1. Specify the default database. DB2 Text Search provides the option to set an environment variable to specify the database that should be used for text-search index administration commands. If the necessary user authorizations are in place, you can issue db2ts commands without specifying a connection clause. The environment variable can be set as follows:
    Listing 1. Setting DB2DBDFT to specify the default database
    Linux/UNIX (ksh):
        export DB2DBDFT=mytstest
    Windows:
        SET DB2DBDFT=mytstest
  2. Create a database. In this step, a database is created that is suitable to store XML documents. Issue the following command at the DB2 CLP command prompt:
    Listing 2. Command to create the database
    CREATE DATABASE mytstest USING CODESET UTF-8 TERRITORY US
  3. Start the text-search instance services. To start the text-search instance services, use the command:
    Listing 3. Command to start text-search services
    db2ts "START FOR TEXT"

    On Linux and UNIX operating systems, this command starts a daemon. This daemon controls the scheduling of text-search index updates on the DB2 database server. The text-search server is also started and runs as an independent process. The daemon and the text-search server are started under the authorization name of the DB2 instance owner.

    On Windows operating systems, this command starts the DB2TS service. The DB2TS service starts the text-search server and henceforth controls the scheduling of text-search index updates. The user running the command must meet the Windows requirements for starting a service.

  4. Enable the database for text search. DB2 Text Search manages information about text-search indices to support administration tasks like index updates or gathering status. Enabling the database creates the system tables and views that contain this information.
    Listing 4. Command to enable the database for text search
    db2ts "ENABLE DATABASE FOR TEXT"

    Note that if DB2DBDFT has not been set, the enablement command needs to include the connection information, as shown in the following example:

    Listing 5. Command to enable the database for text with specific connection information
    db2ts "ENABLE DATABASE FOR TEXT CONNECT TO mytstest USER db2admin USING mypswd"
  5. Connect to the database mytstest:
    Listing 6. Connect to the database mytstest
    CONNECT TO mytstest
  6. Create and populate a table:
    Listing 7. Table books
    CREATE TABLE books ( isbn VARCHAR(18) NOT NULL PRIMARY KEY,
                         author VARCHAR(30), 
                         title VARCHAR(128), 
                         year INTEGER, 
                         bookinfo XML)

    Note that to enable creating text-search indices and running text-search queries, a primary key must be specified for the data table.

  7. Insert the following data into the books table:
    Listing 8. Insert data into table books
    INSERT INTO books VALUES
        ('123-014014014', 'Joe Climber', 'Climber''s Mountain Tops', 1995,
         XMLPARSE(DOCUMENT
             '<bookinfo>  
              <author>Joe Climber</author> 
              <title>Climber''s Mountain Tops</title>
              <story>This vivid description of Joe Climber''s experiences 
                  when tackling the mountains of his native Mountainland lets you 
                  hold your breath when you follow Joe on his adventures in the 
                  regions where the air is thin and the weather is treacherous. 
                  Includes beautiful color photos of Mountainland''s mountain 
                  ranges.
              </story> 
              <year>1995</year> 
              <price>16.00</price>
              <pages>176</pages>
              </bookinfo>'))
    
    INSERT INTO books VALUES
        ('678-014014078', 'Joe Smith', 'The Range', 1991, 
         XMLPARSE(DOCUMENT
             '<bookinfo> 
              <author>Joe Smith</author>
              <title>The Range</title>
              <story>All you need to know about kitchen ranges. A pictured 
                  description based on the most recent ergonomics studies for 
                  everybody who is involved in food preparation in the home.
              </story> 
              <year>1991</year> 
              <price>6.00</price>
              <pages>76</pages>
              </bookinfo>'))
    
    INSERT INTO books VALUES
        ('111-223334444', 'Sam Climber', 'Top of the Mountain: Mountain Lore', 1966,
         XMLPARSE(DOCUMENT
             '<bookinfo> 
              <author>Sam Climber</author>
              <title>Top of the Mountain: Mountain Lore</title>
              <story>Sam Climber has traveled through the world to gather 
                  stories about mountains. This compendium includes the best 
                  stories and is beautifully illustrated.
              </story>
              <year>1966</year>
              <price>20.00</price>
              <pages>449</pages>
              </bookinfo>'))
     
     INSERT INTO books VALUES
         ('777-010101010', 'Samantha Smitt', 'The Database Compendium', 2001,
          XMLPARSE(DOCUMENT
              '<bookinfo> 
               <author>Samantha Smitt</author> 
               <title>The Database Compendium</title>
               <story>Follow Samantha into the world of database management. 
                   Covers a wide range of the most popular database architectures.
               </story> 
               <year>2001</year> 
               <price>19.00</price>
               <pages>222</pages>
               </bookinfo>'))
    
      INSERT INTO books VALUES
          ('123-918273645', 'Joanne Miller', 'The Travel Companion', 2005, 
           XMLPARSE(DOCUMENT
               '<bookinfo> 
                <author>Joanne Miller</author> 
                <title>The Travel Companion </title>
                <story>If you like to travel, this is the book for you. 
                    Joanne Miller has compiled her funny and often surprising diary 
                    entries about places and people which were previously published 
                    as travel log in ''Here and There''
                </story> 
                <year>2005</year>
                <price>31.00</price>
                <pages>321</pages>
                </bookinfo>'))

Create text-search indices

After the database has been enabled for text search, you can create text-search indices on text data types and XML data types. For each text-search index, a text-search collection is created on the file system, which holds the significant terms that are extracted from the documents.

DB2 Text Search supports the following document formats:

  • Text: Plain text document format
  • HTML: Hypertext Markup Language
  • XML: Extensible Markup Language

To create a plain text-search index on the "title" column, run the following command:

Listing 9. Command to create a text-search index on column 'title'
db2ts "CREATE INDEX myschema.mytitleidx FOR TEXT ON books(title)"

To create an index on the "bookinfo" XML document column, run:

Listing 10. Command to create a text-search index on column 'bookinfo'
db2ts "CREATE INDEX myschema.myxmlidx FOR TEXT ON books(bookinfo)"

Note that a text-search index that is created on an XML data type column is automatically created using the document format "XML", which enables using XQUERY/XPath syntax in your search queries. For other data types, the default format is "TEXT."

After completion of the command, you find an empty index, new insert, update, and delete triggers on the data column to record the changes to the data, and additional system catalog tables to manage the new text-search index.

To view the index characteristics, use the informational views provided by DB2 text search:

Listing 11. Query informational views
SELECT * FROM sysibmts.tscollectionnames
SELECT * FROM sysibmts.tsindexes

Note that an event view and a staging view have been created for the new text-search index. The event view holds informational and error messages about index administration operations, while the staging view represents the change log for the data updates, which is used to synchronize the content of the data table with the content of the text-search index.

Populate text-search indices

After creating a text-search index, an empty index structure has been prepared. Updating the text-search index for the first time (initial update) will load and index the data from the column for which the text-search index was created. DB2 Text Search uses linguistic processing to parse the texts and documents.

To populate the text-search indices, use the following commands:

Listing 12. Command to populate the text-search indices
db2ts "UPDATE INDEX myschema.mytitleidx FOR TEXT" 
db2ts "UPDATE INDEX myschema.myxmlidx FOR TEXT"

After successful completion, the text-search index is marked such that each subsequent update command is executed as an incremental update that uses the change log recorded in the staging view. Incremental updates are applied asynchronously, that is, only when an UPDATE INDEX command is issued, the text search index is synchronized with the base table rows. The update can be started manually, as shown here, or controlled through the scheduling mechanism. For details about scheduling options, see the DB2 Text Search User Guide.


Search with DB2 Text Search

The following built-in DB2 search functions are available for full text search in DB2:

  • Query for a word or phrase with CONTAINS() or xmlcolumn-contains()
  • Get the relevance of a document with SCORE()

Words or phrases can be combined with Boolean operators (AND, OR, NOT) and masked with wildcards (?, *) to limit or extend the search scope.

Search in plain text

Use the CONTAINS keyword to submit a search query:

Listing 13. Basic syntax for text search
SELECT author, year, SUBSTR(title,1,30)  
FROM books 
WHERE CONTAINS(title, 'mountain') = 1



AUTHOR                         YEAR        3
------------------------------ ----------- ------------------------------
Joe Climber                           1995 Climber's Mountain Tops
Sam Climber                           1966 Top of the Mountain: Mountain

  2 record(s) selected.

The CONTAINS function searches the text-search index for the specified search argument and returns whether or not a match was found. If the document contains a match for the search argument, the result is 1, otherwise it is 0.

DB2 Text Search automatically uses stemmed forms in the search. In the preceding example, the search therefore returns not only occurrences of the term "mountain" but of "mountains" and other derivative terms.

To find rows that contain a combination of words, use operators AND, OR, and NOT to include or exclude documents, with AND being the default to combine multiple terms. The following query returns rows that contain both the terms "mountain" and "top" in the column title, no matter where each term occurs in the column.

Listing 14. Use of default Boolean operator AND in the text-search function
SELECT author, year, SUBSTR(title,1,30)
FROM books
WHERE CONTAINS(title, 'mountain top') = 1 
 
 
AUTHOR                         YEAR        3
------------------------------ ----------- ------------------------------
Joe Climber                           1995 Climber's Mountain Tops
Sam Climber                           1966 Top of the Mountain: Mountain

  2 record(s) selected.

Note that embedding the terms in quotes will search for the exact phrase "mountain tops":

Listing 15. Finding an exact match by using quotes for the search argument
SELECT author, year, SUBSTR(title,1,30)
FROM books
WHERE CONTAINS(title, '"mountain tops"') = 1
   
   
AUTHOR                         YEAR        3
------------------------------ ----------- ------------------------------
Joe Climber                           1995 Climber's Mountain Tops

  1 record(s) selected.

The wildcards ? (for a single character) and * (for multiple characters) return documents that match a partial term:

Listing 16. Search using wildcards
SELECT author, year, SUBSTR(title,1,30)
FROM books
WHERE CONTAINS(title, 'comp*') = 1


AUTHOR                         YEAR        3
------------------------------ ----------- ------------------------------
Samantha Smitt                        2001 The Database Compendium
Joanne Miller                         2005 The Travel Companion

  2 record(s) selected.

Using a wildcard at the beginning of a search term should be avoided, as it has a significant adverse impact on query performance.

If you want to retrieve a sample of results, you can limit the number of documents with the RESULTLIMIT argument:

Listing 17. Limiting the number of documents with the RESULTLIMIT argument
SELECT author, year, SUBSTR(title,1,30)
FROM books
WHERE CONTAINS(title, 'mountain', 'RESULTLIMIT=1') = 1 


AUTHOR                         YEAR        3
------------------------------ ----------- ------------------------------
Joe Climber                           1995 Climber's Mountain Tops

  1 record(s) selected.

Retrieve results based on their relevance

Use the SCORE function to determine how well a result matches the search term(s) compared to other results for the query.

Listing 18. Getting the relevance of documents by using the SCORE function
SELECT title
FROM books
WHERE CONTAINS(title, 'mountain') = 1
ORDER BY SCORE(title, 'mountain') DESC
 
 
TITLE
-------------------------------------------------------------------
Top of the Mountain: Mountain Lore
Climber's Mountain Tops

  2 record(s) selected.

The score for a document is expressed as a value between 0 and 1 and takes into account the number of occurrences of the search term(s) in a document compared with other documents in the result set, that is, it is dynamically calculated for the current result set.

Search XML documents

You can use several different options to search in XML documents:

  • Searching with SQL/XML:
    Listing 19. Use of text-search functionality in combination with XMLQUERY()
    SELECT 
    xmlquery('$bi//author' passing bookinfo as "bi")  
    FROM books WHERE CONTAINS(bookinfo, 'range') = 1 
     
     
    1
    -------------------------------------------------------------------
    <author>Joe Climber</author>
    <author>Joe Smith</author>
    <author>Samantha Smitt</author>
      3 record(s) selected.
  • Searching with XQuery
    Listing 20. Text search in the XQuery context
    xquery db2-fn:xmlcolumn-contains('BOOKS.BOOKINFO', 'range')/bookinfo/author
     
    
     1
     -------------------------------------------------------------------
    <author>Joe Climber</author>
    <author>Joe Smith</author>
    <author>Samantha Smitt</author>
      3 record(s) selected.
  • Searching with XPath clauses
    Listing 21. Text search with XPath expression
    Example: 1
    SELECT author, year, SUBSTR(title,1,30)
    FROM books 
    WHERE CONTAINS(bookinfo,
                   '@xpath:''/bookinfo/story [. contains("range")]''') = 1 
    
    
    
    AUTHOR                         YEAR        3
    ------------------------------ ----------- ------------------------------
    Joe Climber                           1995 Climber's Mountain Tops
    Joe Smith                             1991 The Range
    Samantha Smitt                        2001 The Database Compendium
    
      3 record(s) selected.    
    
    Example: 2
    xquery 
    db2-fn:xmlcolumn-contains(
        'BOOKS.BOOKINFO',
        '@xpath:''/bookinfo/story[. contains("range")]''')/bookinfo/author
    
    
    <author>Joe Climber</author>
    <author>Joe Smith</author>
    <author>Samantha Smitt</author>
    
      3 record(s) selected

Update text-search indices

Changes to the data column (insert, update, delete of a row) are tracked in the staging view and applied to the text-search index with an incremental index update.

Listing 22. Insert another row into the table 'books'
INSERT INTO books VALUES
    ('456-456456456', 'John Doe', 'The Database Book', 2005, 
     XMLPARSE(DOCUMENT
         '<bookinfo> 
          <author>John Doe</author> 
          <title>The Database Book</title>
          <story>The ultimate book about contemporary databases.
          </story>
          <year>2005</year>
          <price>55.00</price>
          <pages>176</pages>
          </bookinfo>'))

To synchronize the text-search index with the data column, update the indices as follows:

Listing 23. Populate the text-search indices
db2ts "UPDATE INDEX myschema.mytitleidx FOR TEXT"
db2ts "UPDATE INDEX myschema.myxmlidx FOR TEXT"

In case of warnings or errors, review the event view for the text search index:

Listing 24. Review the event view for the text-search index
SELECT * FROM <eventviewschema>.<eventviewname>

with <eventviewschema> and <eventviewname> values as listed by db2 "SELECT eventviewschema, eventviewname FROM sysibmts.tsindexes".

After a successful index update, the new data is available in the text search index and can be searched:

Listing 25. Text search with XPath expression
SELECT author, year, SUBSTR(title,1,30)
FROM books
WHERE CONTAINS(bookinfo,
               '@xpath:''/bookinfo/story [. contains("database")]''') = 1 



AUTHOR                         YEAR        3
------------------------------ ----------- ------------------------------
Samantha Smitt                        2001 The Database Compendium
John Doe                              2005 The Database Book

  2 record(s) selected.

Clean up

Dropping a table or a database does not automatically drop the text search collections. Before dropping a table or database, use the following steps to ensure all artifacts are removed:

Listing 26. Command to drop text-search indices
db2ts "DROP INDEX myschema.mytitleidx FOR TEXT"
db2ts "DROP INDEX myschema.myxmlidx FOR TEXT"

Disabling the database for text-search drops the system tables and views that maintain text-search metadata:

Listing 27. Command to disable the database for text search
db2ts "DISABLE DATABASE FOR TEXT"

The disable command fails if a text-search index still exists in the database. You can enforce the command by using the force option disable database for text force, but it is recommended to remove text search indices with the drop index for text command.

Stop the text-search instance services to stop the processes that were started for supporting full text search:

Listing 28. Command to stop text-search services
db2ts "STOP FOR TEXT"

Finally, you can drop the test database to conclude this tutorial:

Listing 29. Disconnect from and drop the test database
 DISCONNECT CURRENT
 DROP DATABASE mytstest

Conclusion

This tutorial has guided you through the setup of the DB2 Text Search environment, preparing and updating text-search indices, and using the CONTAINS and SCORE functions. To find out more about DB2 Text Search capabilities, refer to the DB2 Text Search User Guide.

Acknowledgements

The authors would like to thank Roland Seuchter for his help in developing this tutorial.


Download

DescriptionNameSize
Sample scripts for this tutorialsamplescripts.zip8KB

Resources

Learn

Get products and technologies

  • Download a free trial version of DB2 9 for Linux, UNIX, and Windows.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
  • Download IBM product evaluation versions and get your hands on application development tools and middleware products from IBM Information Management, Lotus®, Rational®, Tivoli®, and WebSphere®.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=482473
ArticleTitle=DB2 Text Search, Part 1: Full text search
publish-date=04152010