FileNet P8 crawlers

You can configure a FileNet P8 crawler to include IBM® FileNet® P8 folders and document classes in a collection.

Crawler connection credentials

When you create the crawler, you can specify credentials that allow the crawler to connect to the sources to be crawled. You can also configure connection credentials when you specify general security settings for the system. If you use the latter approach, multiple crawlers and other system components can use the same credentials. For example, the search servers can use the credentials when determining whether a user is authorized to access content.

Social search

If you create a collection that supports social search, you can collect person information that enables users to explore relationships between people, documents, and tags. For example, users can see person cards for people relevant to a query, see recommendations for related documents and people, explore relationships through a social network graph, and explore tags through a weighted tag cloud. For more information, read about support for social search and creating person crawlers.

Compound documents

If a document contains multiple parts, and you want all parts of the document to be treated as a single document in the search results, you can configure the crawler to support compound documents. In this case, a parent document that contains child documents can be searched as a single document. If the search terms are found, all of the child documents are listed with the parent document in the search results. If support for compound documents is not enabled in the crawler configuration, the parent and child documents are searched separately and returned as separate documents in the search results. For more information, read about support for crawling compound documents.
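
The effect of enabling compound document support can be sketched as grouping matching child documents under their parent so that the set is returned as one result. The following Java sketch is purely illustrative; the class, record, and field names are assumptions for this example and are not part of the product API.

```java
import java.util.*;

// Illustrative sketch (not product code): with compound document support,
// child documents that match a query are grouped under their parent and
// returned as a single result; a document with no parent stands alone.
public class CompoundGrouping {
    record Hit(String docId, String parentId) {}  // parentId is null for standalone documents

    // Group matching documents by parent ID, or by their own ID if standalone.
    static Map<String, List<Hit>> groupByParent(List<Hit> hits) {
        Map<String, List<Hit>> results = new LinkedHashMap<>();
        for (Hit h : hits) {
            String key = (h.parentId() == null) ? h.docId() : h.parentId();
            results.computeIfAbsent(key, k -> new ArrayList<>()).add(h);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("child-1", "parent-A"),
            new Hit("child-2", "parent-A"),
            new Hit("doc-B", null));
        // Two results: parent-A (listing its two children) and doc-B.
        System.out.println(groupByParent(hits).keySet());  // prints [parent-A, doc-B]
    }
}
```

Without compound document support, each of the three hits above would instead appear as a separate result.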

Crawler server configuration

To integrate IBM FileNet P8 with Watson Explorer Content Analytics, you must install the IBM FileNet Content Engine Client on the crawler server. Complete this task before you use the administration console to configure a FileNet P8 crawler.

Configuration overview

When you configure the crawler, you specify options for crawling documents in folders and document classes. To create or change a FileNet P8 crawler, log in to the administration console.

When you create the crawler, a wizard helps you do these tasks:
  • Specify properties that control how the crawler operates and uses system resources. The crawler properties control how the crawler crawls all of the documents that you add to the crawl space.
  • Specify information that enables the crawler to access the FileNet P8 server. When you configure the connection, you specify the Content Engine Web Service URL and a valid user name and password. You can also specify the location of an SSL keystore file on the local file system of the crawler server. This value is set as the javax.net.ssl.trustStore property for the SSL connection to the Content Engine Web Service server and is used to authenticate the server through a trusted certificate authority (CA).
  • Select the folders or document classes that you want to include in the crawl space. When you select the targets to be crawled, the top level of the hierarchy tree shows the object stores on the server. On the second level, the Root Folder and Root Class are displayed. If you select the Root Folder, a list of root folders in the object store is displayed on the third level. If you select a Root Class, a list of root document classes in the object store is displayed on the third level. After you expand the root folders and root classes, select one or more folders or document classes to crawl.
  • Specify options for including and excluding specific documents, and options for how content can be searched.
  • Select the metadata fields that you want to include and map them to index fields:
    • When you crawl document classes, the crawler detects the properties of the classes. You can populate these properties as crawler metadata fields by specifying field properties.
    • When you crawl folders, the crawler automatically adds these common properties as metadata fields: content_size, crawled_date, docname, filename, and lastmodified_date. To add other properties as crawler metadata, edit the crawl space and specify the property names as crawler metadata fields.
  • Set up a schedule for crawling individual folders and document classes.
  • Configure document-level security options. If security was enabled when the collection was created, the crawler can associate security data with documents in the index. This data enables applications to enforce access controls based on the access control lists or security tokens.

    You can also select an option to validate user credentials when a user submits a query. In this case, instead of comparing user credentials to indexed security data, the system compares the credentials to current access control lists that are maintained by the original data source.
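
The javax.net.ssl.trustStore property mentioned in the connection step above is a standard Java system property. The sketch below shows what configuring the SSL keystore location amounts to; the class name, file path, and password are illustrative examples, not values from the product.

```java
// Minimal sketch, assuming a keystore file already exists on the crawler
// server. Setting javax.net.ssl.trustStore tells the JVM which keystore of
// trusted CA certificates to use when it validates the Content Engine Web
// Service server certificate during the SSL handshake.
public class TrustStoreConfig {

    // Example path only; use the keystore location on your crawler server.
    static final String TRUST_STORE_PATH = "/opt/IBM/es/config/cewsTrustStore.jks";

    public static void configure(String trustStorePath, String trustStorePassword) {
        System.setProperty("javax.net.ssl.trustStore", trustStorePath);
        System.setProperty("javax.net.ssl.trustStorePassword", trustStorePassword);
    }

    public static void main(String[] args) {
        configure(TRUST_STORE_PATH, "changeit");
        // Any SSL connection opened after this point uses the configured keystore.
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

In the administration console you supply only the keystore file location; the crawler applies it to its own SSL connections, so no code like this is required on your part.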