You can create a Java class that programmatically updates the security token values, metadata, and document content of data sources other than web sources.
When the crawler is started, the plug-in process is forked.
An AbstractCrawlerPlugin object is instantiated
with the default constructor, and the init, isMetadataUsed,
and isContentUsed methods are each called once. When
the crawler is stopped, the term method is called
and the object is destroyed.
To create a Java class for use as
a crawler plug-in with content-related functions:
- Extend com.ibm.es.crawler.plugin.AbstractCrawlerPlugin and
implement the following methods:
init()
isMetadataUsed()
isContentUsed()
term()
updateDocument()
The AbstractCrawlerPlugin class
is an abstract class. By default, the init and term methods
do nothing, and the isMetadataUsed and isContentUsed methods
return false. The updateDocument method is
abstract, so you must implement it.
For name resolution,
add the ES_INSTALL_ROOT/lib/dscrawler.jar file to your class path.
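The class described above can be sketched as follows. This is a minimal, self-contained sketch: the stub AbstractCrawlerPlugin class stands in for com.ibm.es.crawler.plugin.AbstractCrawlerPlugin from dscrawler.jar, and the parameter and return types of updateDocument are placeholders. Check the Javadoc in dscrawler.jar for the exact signatures in your release.

```java
// Stand-in for com.ibm.es.crawler.plugin.AbstractCrawlerPlugin so this
// sketch compiles on its own. In a real plug-in, import the class from
// dscrawler.jar instead of defining it here.
abstract class AbstractCrawlerPlugin {
    public void init() { }                              // default: does nothing
    public void term() { }                              // default: does nothing
    public boolean isMetadataUsed() { return false; }   // default: false
    public boolean isContentUsed() { return false; }    // default: false
    // Abstract in the real class; the Object types here are placeholders.
    public abstract Object updateDocument(Object document);
}

// A plug-in implements all five methods described in the steps above.
class MyPlugin extends AbstractCrawlerPlugin {
    @Override
    public void init() {
        // Acquire resources (configuration, connections) once per session.
    }

    @Override
    public boolean isMetadataUsed() {
        return true;    // ask the crawler to pass metadata to updateDocument
    }

    @Override
    public boolean isContentUsed() {
        return true;    // ask the crawler to pass document content as well
    }

    @Override
    public Object updateDocument(Object document) {
        // Inspect and update security tokens, metadata, or content here.
        return document;    // return the (possibly modified) document
    }

    @Override
    public void term() {
        // Release resources when the crawler is stopped.
    }
}
```

The crawler calls init, isMetadataUsed, and isContentUsed once at startup, updateDocument for each crawled document, and term at shutdown.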
- Compile the implemented code and package it in a JAR file.
Add the ES_INSTALL_ROOT/lib/dscrawler.jar file
to the class path when you compile.
- In the administration console, follow these steps:
- Edit the appropriate collection.
- Select the Crawl page and edit
the crawler properties for the crawler that will use the custom Java class.
- Specify the following items:
- The fully qualified class name of the implemented Java class, for example, com.ibm.plugins.MyPlugin.
When you specify the class name, ensure that you do not specify the
file extension, such as .class or .java.
- The fully qualified class path for the JAR file and the directory
in which all files that are required by the Java class are located.
Ensure that you include the name of the JAR file in your path declaration,
for example, C:\plugins\Plugins.jar. If you
need to specify multiple JAR files, ensure that you use the correct
separator depending on your platform, as shown in the following examples:
- AIX® or Linux: /home/esadmin/plugins/Plugins.jar:/home/esadmin/plugins/3rdparty.jar
- Windows: C:\plugins\Plugins.jar;C:\plugins\3rdparty.jar
- On the Crawl page, click Monitor.
Then, click Stop and Start to
restart the session for the crawler that you edited. Click Details and
start a full crawl.
If the crawler stops when it is loading the plug-in, view
the log file and verify that:
- The class name and class path that you specified in the crawler
properties page are correct.
- All necessary libraries are specified for the plug-in class path.
- The crawler plug-in does not throw a CrawlerPluginException error.
Tip: If a crawler throws a NullPointerException
after it is configured to use a custom crawler plug-in, override com.ibm.es.crawler.plugin.AbstractCrawlerPlugin#isMetadataUsed() to
return true instead of false.
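The workaround in the tip amounts to a one-method override. The stub base class below stands in for com.ibm.es.crawler.plugin.AbstractCrawlerPlugin so the sketch is self-contained; in a real plug-in you would add this override to your existing subclass.

```java
// Stand-in for com.ibm.es.crawler.plugin.AbstractCrawlerPlugin.
abstract class AbstractCrawlerPlugin {
    public boolean isMetadataUsed() { return false; }   // default in the real class
}

class SafePlugin extends AbstractCrawlerPlugin {
    @Override
    public boolean isMetadataUsed() {
        return true;    // avoids the NullPointerException described in the tip
    }
}
```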
Metadata field definitions: If you
want to add a new metadata field in your crawler plug-in, you must
create an index field and add the metadata field to the collection
by configuring parsing and indexing options in the administration
console. Ensure that the name of the metadata field is the same as
the name of the index field.
The following methods in the FieldMetadata
class are deprecated. These field characteristics are overwritten
by field definitions in the parser configuration:
public void setSearchable(boolean b)
public void setFieldSearchable(boolean b)
public void setParametricSearchable(boolean b)
public void setAsMetadata(boolean b)
public void setResolveConflict(String string)
public void setContent(boolean b)
public void setExactMatch(boolean b)
public void setSortable(boolean b)
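Because the deprecated setters are overridden by the parser configuration, a plug-in typically only needs to supply a field name and value, where the name matches an index field that you create in the administration console. The FieldMetadata constructor shown below is an assumption, and the stub class stands in for com.ibm.es.crawler.plugin.FieldMetadata so the sketch is self-contained; check the Javadoc in dscrawler.jar for the exact signature in your release.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for com.ibm.es.crawler.plugin.FieldMetadata. The (name, value)
// constructor is an assumption for illustration.
class FieldMetadata {
    private final String name;
    private final String value;
    FieldMetadata(String name, String value) { this.name = name; this.value = value; }
    String getName() { return name; }
    String getValue() { return value; }
}

class MetadataExample {
    // Builds the metadata list a plug-in might attach to a document in
    // updateDocument. The field name "department" is a hypothetical example;
    // it must match the name of an index field that you created by
    // configuring parsing and indexing options in the administration console.
    static List<FieldMetadata> buildMetadata() {
        List<FieldMetadata> metadata = new ArrayList<>();
        metadata.add(new FieldMetadata("department", "engineering"));
        return metadata;
    }
}
```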