Crawler plug-ins are Java™ programs, written against the crawler plug-in application programming interfaces (APIs), that you can use to change the content or metadata of crawled documents.
You can apply business and security rules to enforce document-level security, and you can add, update, or delete the crawled metadata and document content that are associated with documents in an index. The data source crawler plug-in APIs cannot be used with the web crawler.
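As a sketch of how a data source crawler plug-in might enforce a document-level security rule, the following self-contained example assigns a security token based on the document's source path and records an audit field in the metadata. The class and method names here (CrawledDocument, updateDocument) are illustrative stand-ins, not the product's API; the actual types and signatures are defined in the Javadoc documentation that ships with the product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the document object that the crawler hands to a
// plug-in; the real type is defined in the product Javadoc and may differ.
class CrawledDocument {
    final Map<String, String> metadata = new HashMap<>();
    final List<String> securityTokens = new ArrayList<>();
    byte[] content;
}

// A plug-in that applies a security rule: documents under a restricted path
// get a narrower group token than everything else, and every document gets
// an audit metadata field.
class SecurityTokenPlugin {
    CrawledDocument updateDocument(CrawledDocument doc) {
        String path = doc.metadata.getOrDefault("path", "");
        if (path.contains("/restricted/")) {
            doc.securityTokens.add("group:legal");
        } else {
            doc.securityTokens.add("group:all-employees");
        }
        doc.metadata.put("processedBy", "SecurityTokenPlugin");
        return doc;
    }
}

public class PluginDemo {
    public static void main(String[] args) {
        CrawledDocument doc = new CrawledDocument();
        doc.metadata.put("path", "/shares/restricted/contract.pdf");
        new SecurityTokenPlugin().updateDocument(doc);
        System.out.println(doc.securityTokens);               // [group:legal]
        System.out.println(doc.metadata.get("processedBy"));  // SecurityTokenPlugin
    }
}
```

In a real plug-in, the crawler would call the update method once per crawled document, so rules like this run before the document and its tokens are written to the index.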
You can also create a plug-in that extracts entries from archive files. The extracted files can then be parsed individually and included in collections.
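The extraction step for a ZIP-format archive can be sketched with the standard java.util.zip classes: expand the archive into individual (name, content) entries so that each entry can be handed to the parser as its own document. The surrounding harness, which builds a small archive in memory, is only there to make the example self-contained; a real plug-in would receive the archive bytes from the crawler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ArchiveExtractor {
    // Expand a ZIP archive into (name, content) pairs so that each entry
    // can be parsed individually and included in a collection.
    static List<String[]> extract(byte[] archive) throws Exception {
        List<String[]> entries = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                if (e.isDirectory()) continue;  // only files become documents
                entries.add(new String[] { e.getName(),
                        new String(zin.readAllBytes(), StandardCharsets.UTF_8) });
            }
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Build a small archive in memory to stand in for a crawled file.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        for (String[] entry : extract(buf.toByteArray())) {
            System.out.println(entry[0] + ": " + entry[1]);  // a.txt: hello
        }
    }
}
```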
With a web crawler plug-in, you can add fields to the HTTP request header that is sent to the origin server to request a document. After the document is downloaded, you can view its content, security tokens, and metadata. You can add to, delete from, or replace any of these fields, or stop the document from being parsed.
Web crawler plug-ins support two kinds of filtering: prefetch and postparse. You can specify only a single Java class as the web crawler plug-in. However, because the prefetch and postparse behaviors are defined in two separate Java interfaces, and because a Java class can implement any number of interfaces, that one class can implement either behavior or both.
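The two-interface design can be sketched as follows. The interface and method names here (PrefetchPlugin, PostparsePlugin, and their callbacks) are illustrative assumptions, not the product's actual API; the real interfaces are documented in the Javadoc that ships with the product. The sketch shows one class implementing both behaviors: the prefetch side adds a request header field, and the postparse side adds metadata and can stop the document from being parsed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the two web crawler plug-in interfaces;
// the real interface and method names may differ.
interface PrefetchPlugin {
    // Called before the document is requested; may mutate the request headers.
    void prefetch(Map<String, String> requestHeaders);
}

interface PostparsePlugin {
    // Called after download; return false to stop the document from being parsed.
    boolean postparse(Map<String, String> metadata, byte[] content);
}

// A single class can implement either interface or, as here, both.
class MyWebCrawlerPlugin implements PrefetchPlugin, PostparsePlugin {
    @Override
    public void prefetch(Map<String, String> requestHeaders) {
        requestHeaders.put("X-Crawler-Tag", "intranet-crawl");  // add a header field
    }

    @Override
    public boolean postparse(Map<String, String> metadata, byte[] content) {
        metadata.put("length", String.valueOf(content.length)); // add metadata
        return !"true".equals(metadata.get("noindex"));         // optionally skip parsing
    }
}

public class WebPluginDemo {
    public static void main(String[] args) {
        MyWebCrawlerPlugin plugin = new MyWebCrawlerPlugin();

        Map<String, String> headers = new HashMap<>();
        plugin.prefetch(headers);
        System.out.println(headers.get("X-Crawler-Tag"));  // intranet-crawl

        Map<String, String> meta = new HashMap<>();
        boolean parse = plugin.postparse(meta, "hello".getBytes());
        System.out.println(meta.get("length") + " " + parse);  // 5 true
    }
}
```

Because both behaviors live on one object, the crawler can register the single configured class once and invoke whichever interface methods it implements at each stage.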
For detailed information about each plug-in API, see the Javadoc documentation in the following directory: ES_INSTALL_ROOT/docs/api/.