You can create a crawler plug-in that enables users to view documents that are extracted from archive files, such as .zip, .tar, or .rar files.
IBM® Content Analytics with Enterprise Search provides Java APIs for implementing a crawler plug-in that extracts archive entries from archive files that are crawled by data source crawlers. However, the fetch capabilities do not allow users to view the extracted files. You can extend the archive plug-in so that users can fetch and view documents that are extracted from archive files. You implement this plug-in in the same way as other data source crawler plug-ins.
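The following Java sketch illustrates the two operations involved: listing the entries of an archive, which is what the crawler extracts, and reading a single entry back by name, which is what a fetch must do so that a user can view it. It uses the standard java.util.zip package rather than the product's archive plug-in API, whose interfaces are not shown here, and handles only .zip files. The class and method names (ArchiveEntrySketch, listEntries, fetchEntry, buildSampleArchive) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch, NOT the IBM Content Analytics archive plug-in API.
 * Shows the two operations an archive plug-in with fetch support needs,
 * using java.util.zip: enumerate entries, and read one entry by name.
 */
public class ArchiveEntrySketch {

    /** Lists the names of all non-directory entries in a .zip archive. */
    public static List<String> listEntries(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    /** Returns the content of the named entry, or null if the archive has no such entry. */
    public static byte[] fetchEntry(byte[] archive, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // Reads only up to the end of the current entry
                    return zin.readAllBytes();
                }
            }
        }
        return null;
    }

    /** Builds a small in-memory .zip archive for demonstration. */
    public static byte[] buildSampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("docs/readme.txt"));
            zout.write("Hello from inside the archive".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("docs/notes.txt"));
            zout.write("Second entry".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        // The archive is complete only after the ZipOutputStream is closed
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = buildSampleArchive();
        System.out.println(listEntries(archive)); // [docs/readme.txt, docs/notes.txt]
        byte[] content = fetchEntry(archive, "docs/readme.txt");
        System.out.println(new String(content, StandardCharsets.UTF_8)); // Hello from inside the archive
    }
}
```

A real plug-in would stream entries from the crawled file instead of holding the whole archive in memory; the byte array keeps the sketch self-contained.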
You cannot use this plug-in with the Agent for Windows file systems, FileNet P8, or SharePoint crawlers.
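Specify the archive plug-in with properties in the following format:
es.ext.dirs.type=classpath
archive.plugin.type=classname;.extension
where:
type: an identifier for the archive type, such as rar. Use the same value in both properties.
classpath: the directories and JAR files that contain the plug-in, separated by semicolons.
classname: the name of the class that implements the plug-in, such as RarFile.
extension: the file extension of the archive files that the plug-in processes, such as rar.
For example:
# extension files and directories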
es.ext.dirs=C:\\Program Files\\IBM\\es\\lib\\es.repo.jar;C:\\Program Files\\IBM\\es\\lib\\rdsutil.jar;C:\\Program Files\\IBM\\es\\lib\\ESSearchServer.jar;C:\\Program Files\\IBM\\es\\lib\\trevi.tokenizer.jar;C:\\Program Files\\IBM\\es\\lib\\es.workmgr.jar;C:\\Program Files\\IBM\\es\\lib\\dscrawler.jar;
es.ext.dirs.rar=C:\\rarplugin;C:\\rarplugin\\rarplugin.jar;
archive.plugin.rar=RarFile;.rar
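To show how these properties fit together, the following hypothetical Java sketch loads them with java.util.Properties and pairs each archive.plugin.type entry with its es.ext.dirs.type class path. It assumes that the settings use standard Java properties syntax, which the doubled backslashes in the example suggest; how the product itself reads these properties is not described here, and ArchivePluginConfigSketch and PluginSpec are illustrative names only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical sketch, NOT part of the IBM Content Analytics API.
 * Groups es.ext.dirs.<type> and archive.plugin.<type> properties
 * into one record per archive type.
 */
public class ArchivePluginConfigSketch {

    /** One archive plug-in registration, for example type "rar" handled by class RarFile. */
    public record PluginSpec(String type, List<String> classPath, String className, String extension) {}

    public static List<PluginSpec> parse(Properties props) {
        String prefix = "archive.plugin.";
        List<PluginSpec> specs = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            String type = key.substring(prefix.length());
            // Value format: classname;.extension
            String[] parts = props.getProperty(key).split(";", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected classname;.extension for " + key);
            }
            // Class path entries are separated by semicolons; skip empty entries
            // such as the one a trailing ';' would otherwise produce
            List<String> classPath = new ArrayList<>();
            for (String entry : props.getProperty("es.ext.dirs." + type, "").split(";")) {
                if (!entry.isEmpty()) {
                    classPath.add(entry);
                }
            }
            specs.add(new PluginSpec(type, classPath, parts[0].trim(), parts[1].trim()));
        }
        return specs;
    }

    public static void main(String[] args) throws IOException {
        // Properties text with doubled backslashes, as in the example above;
        // Properties.load turns each "\\" into a single backslash.
        String config = String.join("\n",
                "es.ext.dirs.rar=C:\\\\rarplugin;C:\\\\rarplugin\\\\rarplugin.jar;",
                "archive.plugin.rar=RarFile;.rar");
        Properties props = new Properties();
        props.load(new StringReader(config));
        for (PluginSpec spec : parse(props)) {
            System.out.println(spec);
        }
    }
}
```

In Java properties syntax, a backslash starts an escape sequence and \r is read as a carriage return, which is why every backslash in a Windows path is doubled.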