IBM Content Analytics with Enterprise Search, Version 3.0.0

Crawler plug-ins for non-web sources

Data source crawler plug-ins are Java applications that can change the content or metadata of crawled documents. You can configure a data source crawler plug-in for all non-web crawler types.

With the crawler plug-in for non-web data source crawlers, you can add, change, or delete crawled content or metadata. You can also create a plug-in for extracting files from archive files and extend that plug-in to enable users to view the extracted content when they view the search results.
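As a rough illustration of the archive case, the extraction step can be sketched with the standard java.util.zip classes. This is a sketch under stated assumptions, not the product's archive plug-in interface: the class name ArchiveExtractionSketch and the methods extractEntries and buildSampleZip are hypothetical, and the sketch covers only how the files inside a ZIP archive might be read into memory before a plug-in hands them on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; it does not use or represent the product's plug-in API.
final class ArchiveExtractionSketch {

    // Reads every file entry of a ZIP archive into a map of entry name to bytes.
    static Map<String, byte[]> extractEntries(byte[] zipBytes) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            byte[] buffer = new byte[4096];
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content of their own
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int read;
                // read() returns -1 at the end of the current entry
                while ((read = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                files.put(entry.getName(), out.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    // Builds a small in-memory ZIP archive so that the sketch runs without external files.
    static byte[] buildSampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("notes/readme.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("data.csv"));
            zip.write("a,b".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = extractEntries(buildSampleZip());
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            System.out.println(file.getKey() + " (" + file.getValue().length + " bytes)");
        }
    }
}
```

A real plug-in would pass each extracted file back to the crawler through the product's own interfaces, which this sketch does not model.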

After you specify a Java class as the crawler plug-in, the crawler calls that class for each document that it crawls.

For each document, the crawler passes to your Java class the document identifier, the security tokens, the metadata, and the content that an administrator specified. Your Java class can return a new or modified set of security tokens, metadata, and content.
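The per-document flow described here can be sketched in plain Java. This is a minimal sketch with hypothetical stand-in types: CrawledDocument, DocumentPlugin, and ExamplePlugin are names invented for illustration and are not the classes in the product's plug-in API. The token and metadata values are also made up. The sketch shows only the shape of the exchange: the plug-in receives the identifier, security tokens, metadata, and content, and returns a modified set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; NOT the product's plug-in API.
final class CrawlerPluginSketch {

    // Hypothetical container for what the crawler passes to the plug-in.
    static final class CrawledDocument {
        final String documentId;
        final List<String> securityTokens;
        final Map<String, String> metadata;
        final String content;

        CrawledDocument(String documentId, List<String> securityTokens,
                        Map<String, String> metadata, String content) {
            this.documentId = documentId;
            this.securityTokens = new ArrayList<>(securityTokens); // defensive copies
            this.metadata = new LinkedHashMap<>(metadata);
            this.content = content;
        }
    }

    // Hypothetical callback that the crawler would invoke once per document.
    interface DocumentPlugin {
        CrawledDocument update(CrawledDocument doc);
    }

    // Example plug-in: adds a token, adds and deletes metadata, changes content.
    static final class ExamplePlugin implements DocumentPlugin {
        @Override
        public CrawledDocument update(CrawledDocument doc) {
            List<String> tokens = new ArrayList<>(doc.securityTokens);
            tokens.add("group:auditors");           // add a security token

            Map<String, String> metadata = new LinkedHashMap<>(doc.metadata);
            metadata.put("department", "finance");  // add a metadata field
            metadata.remove("draft");               // delete a metadata field

            String content = doc.content.trim();    // change the content

            // The document identifier is passed through unchanged.
            return new CrawledDocument(doc.documentId, tokens, metadata, content);
        }
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        tokens.add("group:staff");
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Quarterly report");
        metadata.put("draft", "true");

        CrawledDocument original =
                new CrawledDocument("doc-1", tokens, metadata, "  Revenue grew.  ");
        CrawledDocument updated = new ExamplePlugin().update(original);

        System.out.println(updated.documentId);
        System.out.println(updated.securityTokens);
        System.out.println(updated.metadata);
        System.out.println(updated.content);
    }
}
```

The sketch returns a new object rather than mutating its input, so the original crawled values stay available for comparison. Whether the actual plug-in interface works that way is an assumption.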

Restriction: The crawler plug-in allows you to add security tokens, but it does not allow you to access the native access control lists (ACLs) that are collected by the crawlers that are provided with IBM® Content Analytics with Enterprise Search.
