You can create a Java class
to programmatically update the metadata values and the document
content of SharePoint, FileNet P8, and Agent for Windows file system
data sources.
When the crawler is started, a CrawlerPlugin object
is instantiated with the default constructor and the init method
is called one time. When the crawler is stopped, the term method
is called and the object is destroyed. Unlike other non-web crawler
plug-ins, the plug-in process is not forked for SharePoint, FileNet
P8, and Agent for Windows file system data sources. The plug-in always
runs in the same process as the crawler.
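The lifecycle described above can be sketched as follows. This is a minimal illustration only: StandInCrawlerPlugin is a hypothetical local stand-in for the product class, and the updateDocument signature (a metadata map in, a metadata map out), the MyPlugin fields, and the "reviewed" field name are all assumptions made for this example. The real class and its signatures are defined in ilel-crawler.jar and may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the product's abstract plug-in class.
// The real signatures may differ; check the API shipped in ilel-crawler.jar.
abstract class StandInCrawlerPlugin {
    public void init() throws Exception { }  // does nothing by default
    public void term() throws Exception { }  // does nothing by default

    // Assumed signature for illustration: metadata in, updated metadata out.
    public abstract Map<String, String> updateDocument(Map<String, String> metadata)
            throws Exception;
}

// Example plug-in that adds one metadata field to every document.
class MyPlugin extends StandInCrawlerPlugin {
    boolean initialized = false;
    int documentsSeen = 0;

    @Override
    public void init() {
        // Called one time, after the crawler starts.
        initialized = true;
        documentsSeen = 0;
    }

    @Override
    public void term() {
        // Called when the crawler stops, before the object is destroyed.
        initialized = false;
    }

    @Override
    public Map<String, String> updateDocument(Map<String, String> metadata) {
        documentsSeen++;
        // Work on a copy so that the caller's map is left untouched.
        Map<String, String> updated = new LinkedHashMap<>(metadata);
        updated.put("reviewed", "true"); // hypothetical metadata field
        return updated;
    }
}

class PluginLifecycleDemo {
    public static void main(String[] args) throws Exception {
        StandInCrawlerPlugin plugin = new MyPlugin(); // default constructor
        plugin.init();

        Map<String, String> document = new LinkedHashMap<>();
        document.put("title", "Quarterly report");
        System.out.println(plugin.updateDocument(document));

        plugin.term();
    }
}
```

Because the plug-in runs in the crawler's own process, state that is kept in fields, such as the counter in this sketch, lives for the whole crawler session between init and term.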
To create a Java class
for use as a crawler plug-in:
- Extend com.ibm.ilel.crawler.plugin.CrawlerPlugin and
implement the following methods:
  - init()
  - term()
  - updateDocument()
  The CrawlerPlugin class is an abstract class. The init method and the
  term method have default implementations that do nothing, so you
  override them only if you need to. The updateDocument method
  is an abstract method, so you must implement it.
  For name resolution, use the JAR file for your platform:
  - AIX® or Linux: $ES_INSTALL_ROOT/lib/ilel-crawler.jar
  - Windows: %ES_INSTALL_ROOT%\lib\ilel-crawler.jar
- Compile the implemented code and create a JAR file for
it. Add the ilel-crawler.jar file to the class
path when you compile.
- In the administration console, follow these steps:
  - Edit the appropriate collection.
  - Select the Crawl page and edit
  the crawler properties for the crawler that will use the custom Java class.
  - Specify the following items:
    - The fully qualified class name of the implemented Java class, for example, com.ibm.plugins.MyPlugin.
    When you specify the class name, ensure that you do not specify the
    file extension, such as .class or .java.
    - The fully qualified class path for the JAR file and the directory
    in which all files that are required by the Java class are located.
    Ensure that you include the name of the JAR file in your path declaration,
    for example, C:\plugins\Plugins.jar. If you
    need to specify multiple JAR files, ensure that you use the correct
    separator depending on your platform, as shown in the following examples:
      - AIX or Linux: /home/esadmin/plugins/Plugins.jar:/home/esadmin/plugins/3rdparty.jar
      - Windows: C:\plugins\Plugins.jar;C:\plugins\3rdparty.jar
- On the Crawl page, click Monitor.
Then, click Stop and Start to
restart the session for the crawler that you edited. Click Details and
start a full crawl.
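The platform-dependent separator used in the class path examples of the procedure is the same one that Java exposes as File.pathSeparator. The following sketch shows one way to build the class path string without hard-coding the separator. The PluginClassPath class and the relative JAR names are hypothetical and exist only for this illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class PluginClassPath {
    // Joins the JAR paths with the separator of the platform that the JVM
    // runs on: ":" on AIX or Linux, ";" on Windows.
    static String join(List<String> jarPaths) {
        return String.join(File.pathSeparator, jarPaths);
    }

    public static void main(String[] args) {
        // Hypothetical JAR names; use the full paths of your own JAR files.
        System.out.println(join(Arrays.asList("Plugins.jar", "3rdparty.jar")));
    }
}
```

The result is only correct for the platform on which the code runs, so build the string on the same kind of system that hosts the crawler.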
If the crawler stops when it is loading the plug-in, view
the log file and verify that:
- The class name and class path that you specified in the crawler
properties page are correct.
- All necessary libraries are specified for the plug-in class path.
- The crawler plug-in does not throw a CrawlerPluginException error.
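Before you restart the crawler, the first two checks can be approximated outside the product with a small pre-flight test. The following sketch tries to load a class by name from a class path string by using the standard URLClassLoader. The PluginLoadCheck class is hypothetical, and the crawler itself might load plug-ins differently, so treat a positive result as a hint, not as proof. The sketch does not verify that the class extends CrawlerPlugin.

```java
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PluginLoadCheck {
    // Returns true if className can be loaded from the entries in classPath.
    // Entries are separated by the platform separator (":" or ";").
    static boolean canLoad(String className, String classPath) {
        List<URL> urls = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue; // ignore empty entries
            }
            try {
                urls.add(new File(entry).toURI().toURL());
            } catch (MalformedURLException e) {
                return false;
            }
        }
        try (URLClassLoader loader = new URLClassLoader(urls.toArray(new URL[0]))) {
            // Load without initializing, so that static initializers do not run.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError | IOException e) {
            // Class missing, a required library missing, or loader close failed.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PluginLoadCheck <class name> <class path>
        String className = args.length > 0 ? args[0] : "java.lang.String";
        String classPath = args.length > 1 ? args[1] : "";
        System.out.println(className + " loadable: " + canLoad(className, classPath));
    }
}
```

A false result for your plug-in class usually points to a misspelled class name, a class name that includes a file extension, or a JAR file that is missing from the class path.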
Metadata field definitions: If you want to add a new metadata field in your crawler
plug-in, you must create an index field and add the metadata field
to the collection by configuring parsing and indexing options in the
administration console. Ensure that the name of the metadata field
is the same as the name of the index field.