Crawler administration

You configure crawlers for the different types of data that you want to include in a collection. A single collection can contain any number of crawlers.

A crawler has two primary functions. When you configure a crawler, its discovery processes determine which sources are available in the data source, such as the names of all of the views and folders in a Lotus Notes® database. After you start the crawler, its crawler processes copy data from the sources that you select to a data store so that the parser can prepare the content for indexing and searching.

Configuring crawlers

You use the administration console to create, edit, and delete crawlers. Typically, an expert in the type of data being crawled configures the crawler. For example, to set up a crawler to crawl Lotus Notes data sources, the collection administrator should either be a Notes® administrator or work closely with someone who is knowledgeable about the databases that are being crawled.

When you create a crawler, a wizard helps you specify crawler properties (options that describe the crawler and set limits on how it uses system resources) and add sources to the crawl space (the set of data sources that a particular crawler is to crawl).

You can change existing crawlers at any time, editing crawler properties or parts of the crawl space as needed. Crawler wizards also help you to make these changes.

Populating a new crawler with base values

You can create a crawler by using the system default values or by copying values that are specified for another crawler of the same type. If you use an existing crawler as the base for a new crawler, you can quickly create multiple crawlers that have similar properties and then configure them, for example, to crawl different sources or operate on different crawling schedules.

By copying a crawler, you can divide the crawling workload among multiple crawlers that use the same crawling rules. For example, you might copy a Notes crawler because you want to use the same properties and field crawling rules with a different Lotus Notes server. The only differences might be the databases that each crawler crawls and document-level security settings.

Combining crawler types in a collection

Crawlers are designed to gather information from specific types of data sources. When you configure crawlers for a collection, you must decide how to combine these different data source types so that users can easily search your enterprise data. For example, if you want users to be able to search Microsoft Windows file systems and Microsoft Exchange Server public folders with a single query, create a collection that includes Windows file system crawlers and Exchange Server crawlers.

When you combine multiple types of crawlers in a single collection, ensure that all of the crawlers can use the same static ranking method. (You specify the static ranking method when you create the collection.) For example, if you combine Web sources (which use document links as a ranking factor) and NNTP sources (which typically use the document date as a ranking factor), the quality of the search results might be degraded.

Configuring document-level security

If you enable security for a collection when you create it, you can configure document-level security options. Each crawler can associate security tokens with the documents that it crawls. If you specify that you want to use document-level security when you configure the crawler, the crawler associates the security tokens that you specify with each document, and these tokens are added to the index with the documents.

If you enable security in your custom enterprise search applications, your applications can use the security tokens that the crawlers associated with documents to authenticate users. This capability enables you to restrict access to some documents in a collection and to allow other documents to be searched by all users. For example, in one collection you might allow all users to access all of the documents in your Microsoft Exchange Server public folders, but allow only users with specific user IDs to access documents in your Lotus Notes databases.
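For illustration, the following minimal Java sketch shows how an enterprise search application might apply such tokens when it filters results for a user. The IndexedDocument class, the token values, and the filtering rule are hypothetical and are not part of the enterprise search APIs; the sketch only shows the general idea of matching a user's tokens against the tokens that a crawler associated with each document.

    import java.util.List;
    import java.util.Set;

    // Minimal sketch, assuming a hypothetical model in which each indexed
    // document carries the security tokens that its crawler associated with it.
    public class SecurityFilterSketch {

        // Hypothetical representation of an indexed document and its tokens.
        record IndexedDocument(String uri, Set<String> securityTokens) {}

        // A document is visible if it has no tokens (searchable by everyone)
        // or if the user holds at least one of its tokens.
        static boolean isVisible(IndexedDocument doc, Set<String> userTokens) {
            return doc.securityTokens().isEmpty()
                || doc.securityTokens().stream().anyMatch(userTokens::contains);
        }

        public static void main(String[] args) {
            List<IndexedDocument> results = List.of(
                new IndexedDocument("exchange://public/announcements.msg", Set.of()),
                new IndexedDocument("notes://hr/salaries.nsf", Set.of("hr-admins")));
            Set<String> userTokens = Set.of("all-employees");

            // Only the unrestricted Exchange document is printed for this user.
            results.stream()
                   .filter(doc -> isVisible(doc, userTokens))
                   .forEach(doc -> System.out.println(doc.uri()));
        }
    }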

You can apply custom business rules to determine the value of the security tokens by encoding the rules in a Java™ class. When you configure crawler properties, you specify the name of the plug-in that you want the crawler to use when it crawls documents. The security tokens that your plug-in adds are stored in the index and can be used to control access to documents.
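The following sketch suggests what such a business rule might look like when it is encoded in a Java class. The DepartmentTokenPlugin class, its tokensFor method, and the field names are hypothetical; the actual plug-in interface that your class must implement is defined by the product and is not shown here.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Sketch of a business rule that derives security tokens from document
    // fields. The class and method names are hypothetical stand-ins for the
    // plug-in interface that the product defines.
    public class DepartmentTokenPlugin {

        // Rule: documents from the (hypothetical) "finance" department are
        // restricted to the finance group; all other documents are readable
        // by every employee.
        public Set<String> tokensFor(Map<String, String> documentFields) {
            Set<String> tokens = new HashSet<>();
            if ("finance".equalsIgnoreCase(documentFields.get("department"))) {
                tokens.add("finance-group");
            } else {
                tokens.add("all-employees");
            }
            return tokens; // the crawler stores these tokens in the index
        }

        public static void main(String[] args) {
            DepartmentTokenPlugin plugin = new DepartmentTokenPlugin();
            System.out.println(plugin.tokensFor(Map.of("department", "finance")));
            // prints [finance-group]
        }
    }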

When you configure certain types of crawlers, you can specify additional security controls. For example, you can specify that you want to validate users during query processing. If you enable this option, the user's credentials are compared to current access control lists that are maintained by the data sources to be searched. This validation of current credentials can be done instead of or in addition to validation that is based on security tokens in the index.
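As a rough illustration of the difference between the two approaches, the following Java sketch combines an optional check of the tokens that are stored in the index with a query-time lookup against the data source's current access control list. The AclLookup interface and all names in the sketch are hypothetical; in the product, this validation is performed by the search runtime, not by application code.

    import java.util.Set;
    import java.util.function.BiPredicate;

    // Sketch of combining index-based and query-time security checks.
    // AclLookup models a hypothetical call to the back-end data source's
    // current access control list.
    public class QueryTimeValidationSketch {

        // test(userId, documentUri) returns true if the user may read the document.
        interface AclLookup extends BiPredicate<String, String> {}

        static boolean mayReturn(String userId, Set<String> userTokens,
                                 String documentUri, Set<String> indexTokens,
                                 AclLookup liveAcl, boolean alsoCheckIndexTokens) {
            // Optional pre-filter on the security tokens stored in the index.
            if (alsoCheckIndexTokens
                    && !indexTokens.isEmpty()
                    && indexTokens.stream().noneMatch(userTokens::contains)) {
                return false;
            }
            // Validation against the data source's current access control list.
            return liveAcl.test(userId, documentUri);
        }

        public static void main(String[] args) {
            // Toy ACL: only "alice" may read documents in this data source.
            AclLookup acl = (user, uri) -> "alice".equals(user);

            System.out.println(mayReturn("alice", Set.of("all-employees"),
                "notes://hr/reviews.nsf", Set.of("all-employees"), acl, true)); // true
            System.out.println(mayReturn("bob", Set.of("all-employees"),
                "notes://hr/reviews.nsf", Set.of("all-employees"), acl, true)); // false
        }
    }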