Invoking the Light Crawler

The introduction to the Light Crawler in Infrastructure for Watson Explorer Engine API Applications noted that the Light Crawler is essentially a special mode of the standard Watson Explorer Engine Crawler. In Light Crawler mode, the Crawler Service tracks less data about URLs that have been enqueued and are being processed, and allows URL status information to be deleted from the Watson Explorer Engine logs. These capabilities make the Light Crawler attractive for use in embedded or other OEM Watson Explorer Engine API applications.

Light Crawler mode is enabled by default in collections that are based on the default-push or default-broker-push collections. The next two sections explain how to activate Light Crawler mode for collections that you create in the Watson Explorer Engine administration tool or programmatically.

Warning: After you create a collection that uses Light Crawler mode and index its associated data, do not remove the Light Crawler options from that collection and continue to use it with the standard Watson Explorer Engine crawler. This scenario is unsupported.