Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

DB2 Text Search, Part 1: Full text search

Prashant Shettar (pshettar@in.ibm.com), System Software Engineer, IBM
Prashant Shettar is a Systems Software Engineer at IBM’s Information Management Group (part of IBM Software Group) in Bangalore, India.
Marion Behnen (mbehnen@us.ibm.com), DB2 Text Search Development, IBM
Marion Behnen is a Senior Software Engineer within the IBM Software Group and works as a technical lead for full-text search in DB2. She has more than 20 years of experience with database application development, business reporting, data warehousing, and business process integration. Prior to joining IBM, she was involved with many aspects of business process and data integration, in particular for the manufacturing industry.
Shantanu Mundkur (mundkur@us.ibm.com), DB2 Text Search Development, IBM
Shantanu Mundkur is an Advisory Software Engineer at IBM's Silicon Valley Laboratory. He is part of the IBM DB2 for Linux, UNIX and, Windows development team and works in the area of full-text search. He has a broad range of software product development experience, and has designed and implemented features for various database servers including DB2 and Informix.

Summary:  DB2 Text Search enables an IBM DB2 for Linux, UNIX, and Windows database user to create applications with full text-search capabilities by embedding full text-search clauses in SQL and XQuery statements. In this tutorial, you'll set up a database to support text search, and walk through a scenario to get some experience for setting up your own text searches.

Note: This tutorial has been updated to better present examples and code commands to help prevent users less familiar with DB2 from encountering problems.

View more content in this series

Date:  15 Apr 2010 (Published 16 Oct 2008)
Level:  Introductory PDF:  A4 and Letter (250 KB | 15 pages)Get Adobe® Reader®

Activity:  51512 views
Comments:  

Create text-search indices

After the database has been enabled for text search, you can create text-search indices on text data types and XML data types. For each text-search index, a text-search collection is created on the file system, which holds the significant terms that are extracted from the documents.

DB2 Text Search supports the following document formats:

  • Text: Plain text document format
  • HTML: Hypertext Markup Language
  • XML: Extensible Markup Language

To create a plain text-search index on the "title" column, run the following command:


Listing 9. Command to create a text-search index on column 'title'

db2ts "CREATE INDEX myschema.mytitleidx FOR TEXT ON books(title)"

To create an index on the "bookinfo" XML document column, run:


Listing 10. Command to create a text-search index on column 'bookinfo'

db2ts "CREATE INDEX myschema.myxmlidx FOR TEXT ON books(bookinfo)"

Note that a text-search index that is created on an XML data type column is automatically created using the document format "XML", which enables using XQUERY/XPath syntax in your search queries. For other data types, the default format is "TEXT."

After completion of the command, you find an empty index, new insert, update, and delete triggers on the data column to record the changes to the data, and additional system catalog tables to manage the new text-search index.

To view the index characteristics, use the informational views provided by DB2 text search:


Listing 11. Query informational views

SELECT * FROM sysibmts.tscollectionnames
SELECT * FROM sysibmts.tsindexes

Note that an event view and a staging view have been created for the new text-search index. The event view holds informational and error messages about index administration operations, while the staging view represents the change log for the data updates, which is used to synchronize the content of the data table with the content of the text-search index.

Populate text-search indices

After creating a text-search index, an empty index structure has been prepared. Updating the text-search index for the first time (initial update) will load and index the data from the column for which the text-search index was created. DB2 Text Search uses linguistic processing to parse the texts and documents.

To populate the text-search indices, use the following commands:


Listing 12. Command to populate the text-search indices
				
db2ts "UPDATE INDEX myschema.mytitleidx FOR TEXT" 
db2ts "UPDATE INDEX myschema.myxmlidx FOR TEXT"

After successful completion, the text-search index is marked such that each subsequent update command is executed as an incremental update that uses the change log recorded in the staging view. Incremental updates are applied asynchronously, that is, only when an UPDATE INDEX command is issued, the text search index is synchronized with the base table rows. The update can be started manually, as shown here, or controlled through the scheduling mechanism. For details about scheduling options, see the DB2 Text Search User Guide.

3 of 10 | Previous | Next

Comments



static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=482473
TutorialTitle=DB2 Text Search, Part 1: Full text search
publish-date=04152010
author1-email=pshettar@in.ibm.com
author1-email-cc=
author2-email=mbehnen@us.ibm.com
author2-email-cc=
author3-email=mundkur@us.ibm.com
author3-email-cc=