IBM Content Analytics with Enterprise Search, Version 3.0.0                  

Creating a crawler plug-in for SharePoint, FileNet P8, and Agent for Windows file system data sources

You can create a Java class that programmatically updates the metadata values and the content of documents that are crawled from SharePoint, FileNet P8, and Agent for Windows file system data sources.

When the crawler is started, a CrawlerPlugin object is instantiated with the default constructor, and the init method is called one time. When the crawler is stopped, the term method is called and the object is destroyed. Unlike other non-web crawler plug-ins, the plug-in for SharePoint, FileNet P8, and Agent for Windows file system data sources does not run in a separate, forked process. The plug-in always runs in the same process as the crawler.
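The following Java sketch models this lifecycle. Because the real CrawlerPlugin API in ilel-crawler.jar is not reproduced here, it uses hypothetical stand-ins (PluginLifecycleSketch, SketchPlugin, RecordingPlugin, runCrawler) and a String in place of the real document type. It also assumes, for illustration only, that updateDocument is called once per crawled document between init and term.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only. All names here are hypothetical and model the
 * lifecycle described above; they are not part of ilel-crawler.jar.
 */
public class PluginLifecycleSketch {

    /** Hypothetical stand-in for the abstract CrawlerPlugin base class. */
    abstract static class SketchPlugin {
        public void init() { }   // default implementation does nothing
        public void term() { }   // default implementation does nothing
        public abstract String updateDocument(String content); // must be implemented
    }

    /** Example plug-in that records each lifecycle call it receives. */
    static class RecordingPlugin extends SketchPlugin {
        final List<String> calls = new ArrayList<>();

        @Override public void init() { calls.add("init"); }
        @Override public void term() { calls.add("term"); }

        @Override public String updateDocument(String content) {
            calls.add("update");
            return content.trim();
        }
    }

    /**
     * Simulates one crawler session in a single JVM, mirroring the
     * in-process behavior: construct, init once, update each document
     * (assumed), then term when the crawler stops.
     */
    public static List<String> runCrawler(List<String> documents) {
        RecordingPlugin plugin = new RecordingPlugin(); // default constructor
        plugin.init();                                  // called one time at start
        try {
            for (String doc : documents) {
                plugin.updateDocument(doc);             // assumed: once per document
            }
        } finally {
            plugin.term();                              // called when the crawler stops
        }
        return plugin.calls;
    }

    public static void main(String[] args) {
        System.out.println(runCrawler(List.of(" a ", " b ")));
        // prints [init, update, update, term]
    }
}
```

Because the plug-in shares the crawler's process, any state it keeps in fields (such as the calls list here) lives for the whole crawler session.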

To create a Java class for use as a crawler plug-in:
  1. Extend com.ibm.ilel.crawler.plugin.CrawlerPlugin and implement or override the following methods:
    • init()
    • term()
    • updateDocument()

    The CrawlerPlugin class is abstract. The init and term methods have default implementations that do nothing, so override them only if your plug-in needs setup or cleanup work. The updateDocument method is abstract, so you must implement it.

    To resolve the CrawlerPlugin class when you compile, use the following JAR file for your platform:
    • AIX® or Linux: $ES_INSTALL_ROOT/lib/ilel-crawler.jar
    • Windows: %ES_INSTALL_ROOT%\lib\ilel-crawler.jar
  2. Compile your class and package it in a JAR file. Include the ilel-crawler.jar file in the class path when you compile.
  3. In the administration console, follow these steps:
    1. Edit the appropriate collection.
    2. Select the Crawl page and edit the crawler properties for the crawler that will use the custom Java class.
    3. Specify the following items:
      • The fully qualified class name of the implemented Java class, for example, com.ibm.plugins.MyPlugin. Do not include a file extension such as .class or .java.
      • The fully qualified class path for the JAR file and for the directory that contains all files that the Java class requires. Include the name of the JAR file in the path, for example, C:\plugins\Plugins.jar. If you specify multiple JAR files, use the path separator for your platform (a colon on AIX or Linux, a semicolon on Windows), as shown in the following examples:
        • AIX or Linux: /home/esadmin/plugins/Plugins.jar:/home/esadmin/plugins/3rdparty.jar
        • Windows: C:\plugins\Plugins.jar;C:\plugins\3rdparty.jar
  4. On the Crawl page, click Monitor. Then, click Stop and Start to restart the session for the crawler that you edited. Click Details and start a full crawl.
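The class name and class path that you enter in step 3 are easy to get wrong. Assuming you want to check these values before you enter them, the following Java sketch joins JAR paths with java.io.File.pathSeparator (":" on AIX and Linux, ";" on Windows) and rejects class names that end in .class or .java. CrawlerPropertySketch, joinClassPath, and isValidClassName are hypothetical helpers, not part of the product.

```java
import java.io.File;
import java.util.List;

/**
 * Hypothetical helpers for preparing the crawler-property values in step 3.
 * Not part of IBM Content Analytics with Enterprise Search.
 */
public class CrawlerPropertySketch {

    /** Joins JAR paths with the path separator of the platform running this JVM. */
    public static String joinClassPath(List<String> jarPaths) {
        return String.join(File.pathSeparator, jarPaths);
    }

    /**
     * Returns true if the value is a dotted Java class name with no
     * .class or .java file extension and no empty segments.
     */
    public static boolean isValidClassName(String name) {
        if (name.endsWith(".class") || name.endsWith(".java")) {
            return false; // the console expects a class name, not a file name
        }
        for (String part : name.split("\\.", -1)) {
            if (part.isEmpty() || !Character.isJavaIdentifierStart(part.charAt(0))) {
                return false;
            }
            for (int i = 1; i < part.length(); i++) {
                if (!Character.isJavaIdentifierPart(part.charAt(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Separator depends on the platform where this runs
        System.out.println(joinClassPath(List.of(
                "/home/esadmin/plugins/Plugins.jar",
                "/home/esadmin/plugins/3rdparty.jar")));
        System.out.println(isValidClassName("com.ibm.plugins.MyPlugin"));       // true
        System.out.println(isValidClassName("com.ibm.plugins.MyPlugin.class")); // false
    }
}
```

Because File.pathSeparator reflects the platform of the JVM that runs the code, run such a check on the same platform as the crawler server.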
If the crawler stops when it is loading the plug-in, view the log file and verify that the class name and class path that you specified in the crawler properties are correct, and that all necessary JAR files are specified in the plug-in class path.

Metadata field definitions: If your crawler plug-in adds a new metadata field, you must also create an index field and add the metadata field to the collection by configuring the parsing and indexing options in the administration console. The name of the metadata field must be the same as the name of the index field.
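To illustrate the naming rule, the following Java sketch adds a hypothetical department metadata field and edits the document content. SketchDocument and the static updateDocument method are hypothetical stand-ins: the real updateDocument signature and document type are defined in ilel-crawler.jar and are not reproduced here. The fieldsWithoutIndexField helper, also hypothetical, lists metadata names that have no index field of the same name, assuming exact, case-sensitive matching.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch. These types stand in for the real plug-in API,
 * whose signatures are not shown here.
 */
public class MetadataPluginSketch {

    /** Hypothetical stand-in for a crawled document. */
    public static final class SketchDocument {
        public final Map<String, String> metadata = new HashMap<>();
        public String content;

        public SketchDocument(String content) { this.content = content; }
    }

    /** Hypothetical new metadata field; an index field with this exact name must exist. */
    static final String NEW_FIELD = "department";

    /** Mimics an updateDocument implementation: adds metadata and edits content. */
    public static SketchDocument updateDocument(SketchDocument doc) {
        doc.metadata.put(NEW_FIELD, "Finance");                        // illustrative value
        doc.content = doc.content.replace("CONFIDENTIAL", "[removed]"); // illustrative edit
        return doc;
    }

    /** Returns metadata names (sorted) that have no same-named index field. */
    public static Set<String> fieldsWithoutIndexField(Map<String, String> metadata,
                                                      Set<String> indexFields) {
        Set<String> missing = new TreeSet<>(metadata.keySet());
        missing.removeAll(indexFields); // assumes exact, case-sensitive matching
        return missing;
    }

    public static void main(String[] args) {
        SketchDocument doc = updateDocument(new SketchDocument("CONFIDENTIAL: Q3 report"));
        System.out.println(doc.metadata); // {department=Finance}
        System.out.println(doc.content);  // [removed]: Q3 report
        // "department" has a matching index field, so nothing is reported
        System.out.println(fieldsWithoutIndexField(doc.metadata, Set.of("title", "department"))); // []
    }
}
```

A non-empty result from fieldsWithoutIndexField points to a metadata field that still needs a matching index field in the administration console.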


Last updated: May 2012

© Copyright IBM Corporation 2004, 2012.