DB2 crawlers

You use the DB2 crawler to include IBM® DB2® databases in a collection.

If you use IBM Federation Server to federate and create nickname tables for the following database system types, you can use the DB2 crawler to crawl the tables through the nicknames:
  • CA-Datacom
  • IBM DB2 for z/OS®
  • DB2 for iSeries
  • IBM Informix®
  • IMS
  • Oracle
  • Microsoft SQL Server
  • Software AG Adabas
  • Sybase
  • VSAM
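
To see what a federated nickname looks like in practice, the following minimal JDBC sketch creates one. The host, database, credentials, and the ORASRV server definition (an assumed, already-configured federated Oracle source) are hypothetical; the wrapper and server setup on the federation server is assumed to exist. The DB2 crawler can then crawl CRAWL.ORDERS_NICK like a local table.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateNickname {
        public static void main(String[] args) throws Exception {
            // Hypothetical federation server, database, and credentials.
            String url = "jdbc:db2://fedserver.example.com:50000/FEDDB";
            try (Connection conn = DriverManager.getConnection(url, "db2admin", "secret");
                 Statement stmt = conn.createStatement()) {
                // ORASRV is an assumed, already-defined federated server for an
                // Oracle source. The nickname makes the remote SALES.ORDERS table
                // appear as a local table that the DB2 crawler can crawl.
                stmt.executeUpdate(
                    "CREATE NICKNAME CRAWL.ORDERS_NICK FOR ORASRV.SALES.ORDERS");
            }
        }
    }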

You must configure a separate crawler for each database server that you want to crawl. When you configure the crawler, you specify options that control how the crawler crawls all of the databases on that server. You also select the specific tables that you want to crawl in each database.

The tables that you select for crawling must be database tables, nickname tables, or views. The DB2 crawler does not support joined tables.
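
Because views are supported but joins are not, one workaround is to define a view over the join and select the view for crawling. A minimal sketch, with hypothetical connection details and table names:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateCrawlableView {
        public static void main(String[] args) throws Exception {
            // Hypothetical host, database, credentials, and table names.
            String url = "jdbc:db2://dbserver.example.com:50000/SALESDB";
            try (Connection conn = DriverManager.getConnection(url, "db2admin", "secret");
                 Statement stmt = conn.createStatement()) {
                // The crawler cannot crawl a join directly, but it can crawl a
                // view, which exposes the joined rows as a single object.
                stmt.executeUpdate(
                    "CREATE VIEW CRAWL.CUSTOMER_ORDERS AS "
                  + "SELECT C.CUST_ID, C.NAME, O.ORDER_ID, O.ORDER_DATE "
                  + "FROM SALES.CUSTOMERS C "
                  + "JOIN SALES.ORDERS O ON O.CUST_ID = C.CUST_ID");
            }
        }
    }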

Crawler connection credentials

When you create the crawler, you can specify credentials that allow the crawler to connect to the sources to be crawled. You can also configure connection credentials when you specify general security settings for the system. If you use the latter approach, multiple crawlers and other system components can use the same credentials. For example, the search servers can use the credentials when determining whether a user is authorized to access content.

Crawler server configuration

Before you can crawl database tables, you must install the 64-bit version of the DB2 Client on the crawler server. You must then run the escrdb2 script on the crawler server. This script, which is provided with Watson Explorer Content Analytics, enables the DB2 crawler to communicate with database servers.
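
A quick way to verify that the crawler server can reach a database is a plain JDBC connection. This check is independent of the crawler itself, which uses the DB2 client configured by the escrdb2 script; the host, port, database, and credentials below are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class CrawlerConnCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical host, port, database, and credentials.
            String url = "jdbc:db2://dbserver.example.com:50000/SALESDB";
            try (Connection conn = DriverManager.getConnection(url, "crawluser", "secret")) {
                System.out.println("Connected to "
                    + conn.getMetaData().getDatabaseProductVersion());
            }
        }
    }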

Event publishing

If you use IBM Data Event Publisher, and if you associate the databases that you want to crawl with publishing queue maps, the DB2 crawler can use the maps to crawl updates to the database tables.

A publishing queue map identifies a WebSphere MQ queue that receives XML messages when updates to a database table are published. The crawler listens to the queue for information about these published events and updates the crawl space when tables are updated (the first time that the crawler crawls a table, the crawler crawls all of the documents).
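
The crawler's listener is internal to the product, but the mechanism it relies on is ordinary WebSphere MQ message consumption. The following rough sketch assumes the WebSphere MQ classes for JMS are on the classpath; the queue manager, channel, queue name, and credentials are hypothetical.

    import javax.jms.Connection;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    import com.ibm.mq.jms.MQQueueConnectionFactory;
    import com.ibm.msg.client.wmq.WMQConstants;

    public class PublishingQueuePeek {
        public static void main(String[] args) throws Exception {
            // All names are hypothetical; take the real values from your
            // WebSphere MQ setup and the publishing queue map.
            MQQueueConnectionFactory cf = new MQQueueConnectionFactory();
            cf.setHostName("mqhost.example.com");
            cf.setPort(1414);
            cf.setQueueManager("QM1");
            cf.setChannel("SYSTEM.DEF.SVRCONN");
            cf.setTransportType(WMQConstants.WMQ_CM_CLIENT);

            Connection conn = cf.createConnection("mquser", "mqpass");
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("DB2.EVENT.QUEUE");
                MessageConsumer consumer = session.createConsumer(queue);
                conn.start();
                // Each published table change arrives as an XML message; a
                // consumer such as the crawler reads it to learn what changed.
                Message msg = consumer.receive(30000);
                if (msg instanceof TextMessage) {
                    System.out.println(((TextMessage) msg).getText());
                }
            } finally {
                conn.close();
            }
        }
    }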

Event publishing makes new and changed documents available for searching sooner than if the crawler picked them up only on its regular schedule.

If some or all of the tables are configured to use event publishing, you can specify information that enables the crawler to access WebSphere MQ and the publishing queue maps when you configure the crawler.

Before you can use event publishing with a DB2 crawler, you must also ensure that WebSphere MQ and Data Event Publisher are configured on the server to be crawled, and that the WebSphere MQ client module is configured on the crawler server.

Configuration overview

When you create the crawler, a wizard helps you do these tasks:
  • Specify properties that control how the crawler operates and uses system resources. The crawler properties control how the crawler crawls all of the databases on a particular database server.
  • Specify information about the types of databases that you want to crawl.

    If you plan to crawl remote databases that are not cataloged on the local database server, you must start the DB2 Administration Server on the remote server before you can use the DB2 crawler to crawl those databases. You must also specify the host name and port of the remote database server when you configure the crawler (see the connection sketch after this list).

  • Specify the databases that you want to crawl.

    If you set up a DB2 crawler with an IBM DB2 client that does not support Configuration Assistant, the discovery processes cannot discover databases. In that case, you must specify the names of the databases to crawl when you configure the crawler.

  • Specify user IDs and passwords that enable the crawler to access databases that use access controls.
  • Set up a schedule for crawling the databases.
  • Select the tables that you want to crawl in each database.
    Attention: To optimize the performance of the discovery processes (and to prevent the crawler configuration process from timing out), choose to crawl all tables only if the database does not contain many tables or if the tables do not contain many columns. If you select some tables to crawl now, you can edit the crawl space later and add more tables to the collection (see the discovery sketch after this list).
  • Select the tables that are to be crawled when updates to them are published in an event publishing queue, and specify information that enables the crawler to access the event publishing queue.
  • Configure document-level security options. If security was enabled when the collection was created, the crawler can associate security data with documents in the index. This data enables applications to enforce access controls based on the stored access control lists or security tokens.
  • Specify options for making the columns in specific tables searchable. For example, you can enable certain columns to be used in parametric queries or specify which columns can be returned in the search results.
  • Specify options for searching document content. In each table, one column can be treated as though it contains the main content of the document. To help applications find and retrieve this content, such as content in .doc or .pdf documents, provide the crawler with information about that content.

    For example, if a column in the table contains large binary content (such as documents with the BLOB or CLOB data type), you can configure the crawler to treat one of those columns as content. BLOB or CLOB columns are not treated like content fields unless you configure the crawler to handle them as such. If a binary column is configured to be a content field, the binary content is analyzed as text and treated like the body of the document. Users can query the content with free text queries and view the content in document summaries in the results.

    You can improve retrievability by specifying the MIME type, such as application/msword or application/pdf, and by specifying options for determining the code page that was used to encode the content (see the content-column sketch after this list).
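
The following connection sketch relates to the remote-database step. In plain JDBC terms, a cataloged database is reached through an alias that the local DB2 client resolves from its database directory, while an uncataloged database needs an explicit host and port, which is analogous to what the crawler configuration asks for. Names and credentials are hypothetical, and the alias form requires the native DB2 client libraries.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class CatalogedVsDirect {
        public static void main(String[] args) throws Exception {
            // Cataloged: the local DB2 client resolves the SALESDB alias from
            // its database directory (requires local cataloging).
            try (Connection viaCatalog =
                     DriverManager.getConnection("jdbc:db2:SALESDB", "crawluser", "secret")) {
                System.out.println("cataloged connection OK");
            }

            // Not cataloged: give the host and port directly, as you do in the
            // crawler configuration for remote, uncataloged databases.
            try (Connection direct = DriverManager.getConnection(
                     "jdbc:db2://remote.example.com:50000/SALESDB", "crawluser", "secret")) {
                System.out.println("direct connection OK");
            }
        }
    }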
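
The following discovery sketch relates to the table-selection step. Enumerating metadata costs one query for the tables plus one more per table for its columns, which is broadly what a discovery pass must do and why databases with many tables or wide tables are slow to discover. The connection details and the SALES schema are hypothetical.

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class DiscoverTables {
        public static void main(String[] args) throws Exception {
            // Hypothetical host, database, credentials, and schema.
            String url = "jdbc:db2://dbserver.example.com:50000/SALESDB";
            try (Connection conn = DriverManager.getConnection(url, "crawluser", "secret")) {
                DatabaseMetaData md = conn.getMetaData();
                // One result set for the tables, then one more per table for its
                // columns: cost grows with tables x columns, which is why the
                // wizard warns against discovering everything at once.
                try (ResultSet tables = md.getTables(null, "SALES", "%",
                                                     new String[] {"TABLE", "VIEW"})) {
                    while (tables.next()) {
                        String name = tables.getString("TABLE_NAME");
                        int columns = 0;
                        try (ResultSet cols = md.getColumns(null, "SALES", name, "%")) {
                            while (cols.next()) {
                                columns++;
                            }
                        }
                        System.out.println(name + ": " + columns + " columns");
                    }
                }
            }
        }
    }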
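
The following content-column sketch relates to the document-content step. Reading a BLOB column is plain JDBC; the configured MIME type and code page are what let a parser turn those bytes into searchable text. The table, column, and connection details are hypothetical (SALES.DOCS with a DOC_BODY BLOB column).

    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ReadContentColumn {
        public static void main(String[] args) throws Exception {
            // Hypothetical host, database, credentials, table, and column.
            String url = "jdbc:db2://dbserver.example.com:50000/SALESDB";
            try (Connection conn = DriverManager.getConnection(url, "crawluser", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                     "SELECT DOC_BODY FROM SALES.DOCS WHERE DOC_ID = ?")) {
                ps.setInt(1, 42);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        try (InputStream body = rs.getBinaryStream("DOC_BODY")) {
                            // The crawler would hand these bytes to a parser; the
                            // configured MIME type (for example application/pdf) and
                            // code page tell the parser how to extract searchable text.
                            byte[] head = new byte[8];
                            int n = body.read(head);
                            System.out.println("read " + n + " header bytes of document 42");
                        }
                    }
                }
            }
        }
    }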