IBM Content Analytics with Enterprise Search, Version 3.0.0

Extending the archive plug-in to view extracted files

You can create a crawler plug-in that enables users to view documents that are extracted from archive files, such as .zip, .tar, or .rar files.

IBM® Content Analytics with Enterprise Search provides Java APIs for implementing a crawler plug-in that extracts archive entries from archive files that are crawled by data source crawlers. The fetch capabilities, however, do not allow users to view the extracted files. You can extend the archive plug-in so that users can fetch and view documents that are extracted from archive files. To implement the plug-in, you use the same implementation that you use for other data source crawler plug-ins.

You cannot use this plug-in with the Agent for Windows file systems, FileNet P8, or SharePoint crawlers.

To register the plug-in, update the customcommunication.properties file and add the following properties:

es.ext.dirs.type=classpath
archive.plugin.type=classname;.extension

where:

type: Specifies the identifier of the archive document type, such as rar or lzh. You can also choose your own identifier.
classpath: Specifies the list of paths for the class path that is required to run your archive plug-in. Separate the paths with a semicolon (;) on Windows or a colon (:) on AIX® or Linux.
classname: Specifies the class name of your archive plug-in.
extension: Specifies the file extension. Your archive plug-in is invoked for the files that match this extension.
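
The two properties described above can be checked before the crawler is restarted. The following is a minimal sketch, not part of the product API: the class ArchivePluginRegistration and its parse method are hypothetical names, and the sketch assumes that the file follows standard java.util.Properties syntax. It reads the es.ext.dirs.type and archive.plugin.type entries for one type and splits the second value into its class name and extension.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper (not an IBM API): reads one archive plug-in registration. */
public class ArchivePluginRegistration {
    public final String type;       // identifier used in the property keys, e.g. "rar"
    public final String classpath;  // value of es.ext.dirs.<type>
    public final String className;  // first part of archive.plugin.<type>
    public final String extension;  // second part of archive.plugin.<type>, e.g. ".rar"

    private ArchivePluginRegistration(String type, String classpath,
                                      String className, String extension) {
        this.type = type;
        this.classpath = classpath;
        this.className = className;
        this.extension = extension;
    }

    /** Parses properties text, assuming java.util.Properties syntax. */
    public static ArchivePluginRegistration parse(String propertiesText, String type) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        String classpath = props.getProperty("es.ext.dirs." + type);
        String plugin = props.getProperty("archive.plugin." + type);
        if (classpath == null || plugin == null) {
            throw new IllegalArgumentException(
                "Both es.ext.dirs." + type + " and archive.plugin." + type + " are required");
        }

        // The value has the form classname;.extension
        String[] parts = plugin.split(";", 2);
        if (parts.length != 2 || parts[0].trim().isEmpty()
                || !parts[1].trim().startsWith(".")) {
            throw new IllegalArgumentException(
                "archive.plugin." + type + " must have the form classname;.extension");
        }
        return new ArchivePluginRegistration(type, classpath, parts[0].trim(), parts[1].trim());
    }

    public static void main(String[] args) {
        // Backslashes are doubled twice here: once for Java, once for the properties syntax.
        String text = "es.ext.dirs.rar=C:\\\\rarplugin;C:\\\\rarplugin\\\\rarplugin.jar\n"
                    + "archive.plugin.rar=RarFile;.rar\n";
        ArchivePluginRegistration reg = parse(text, "rar");
        System.out.println(reg.className + " handles " + reg.extension
                + " using " + reg.classpath);
    }
}
```

A check of this kind only confirms that the entries are well formed. It does not confirm that the named class exists on the listed class path.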

The following example shows a sample customcommunication.properties file that registers an archive plug-in named RarFile to view documents extracted from .rar files:

# extension files and directories
es.ext.dirs=C:\\Program Files\\IBM\\es\\lib\\es.repo.jar;C:\\Program Files\\IBM\\es\\lib\\rdsutil.jar;C:\\Program Files\\IBM\\es\\lib\\ESSearchServer.jar;C:\\Program Files\\IBM\\es\\lib\\trevi.tokenizer.jar;C:\\Program Files\\IBM\\es\\lib\\es.workmgr.jar;C:\\Program Files\\IBM\\es\\lib\\dscrawler.jar;

es.ext.dirs.rar=C:\\rarplugin;C:\\rarplugin\\rarplugin.jar;
archive.plugin.rar=RarFile;.rar
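
Windows paths in this file are written with doubled backslashes. Assuming the file is read with standard java.util.Properties rules (an assumption; the product's loader is not described here), a single backslash starts an escape sequence, so a path segment such as \rarplugin.jar would be read as a carriage return followed by arplugin.jar. The sketch below illustrates the difference. BackslashEscapeDemo and its load method are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical demo (not an IBM API): shows how backslashes are read from a properties line. */
public class BackslashEscapeDemo {

    /** Loads a single properties line and returns the value of es.ext.dirs.rar. */
    public static String load(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("es.ext.dirs.rar");
    }

    public static void main(String[] args) {
        // File text: es.ext.dirs.rar=C:\\rarplugin\\rarplugin.jar
        String doubled = load("es.ext.dirs.rar=C:\\\\rarplugin\\\\rarplugin.jar");
        // File text: es.ext.dirs.rar=C:\\rarplugin\rarplugin.jar
        String single = load("es.ext.dirs.rar=C:\\\\rarplugin\\rarplugin.jar");

        System.out.println(doubled);               // prints C:\rarplugin\rarplugin.jar
        System.out.println(single.contains("\r")); // prints true: \r became a carriage return
    }
}
```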
