WebCenter Content Advanced Configuration and Performance Tuning

This section describes advanced configuration and performance tuning techniques that can help optimize the connector for your environment.

Increasing Crawler Aggressiveness
You can improve the performance of a connector by enabling multithreading and then reducing the delay between requests. This puts more load on the server that hosts the resource you are crawling, but allows the connector to crawl that resource more quickly.
Note: Multithreading is not supported in all connectors, and enabling it increases memory consumption on both Watson™ Explorer Engine and the resource server. Consider increasing the size of the Java heap to prevent Java out-of-memory errors (OutOfMemoryError).

To optimize for speed, set the value of the Delay setting to 0 in the Global Settings > Crawling aggressiveness > Configuration > Crawling tab for the associated search collection. A value of 0 eliminates any delay between successive calls to the resource server, and also causes the connector to create as many threads as it can to submit and service those requests.

Note: Setting the Delay option to 0 can introduce additional errors, because the resource server or Watson Explorer Engine may not be able to keep up with incoming requests. It is still useful to try this setting when tuning a connector for performance, because it shows the theoretical maximum performance for the crawl.

To balance crawl speed against server load, set the Delay setting to a value greater than 0 and less than the default value of 100 (this value is expressed in milliseconds). You may also want to increase the Concurrent requests to the same host setting in the Crawling aggressiveness section above its default value of 1. This setting controls how many threads the connector creates when it starts.
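
To get a rough feel for how the Delay and Concurrent requests to the same host settings interact, the following Python sketch estimates crawl time under a deliberately simple model. It is not how Watson Explorer Engine schedules requests. It assumes that each thread fetches one URL, waits for the response, and then pauses for the delay before its next request. The function name, its parameters, and the average response time are all hypothetical values chosen for illustration.

```python
import math


def estimated_crawl_seconds(url_count, delay_ms, concurrent_requests, avg_response_ms):
    """Estimate crawl time in seconds under a simplified model (an assumption).

    Model: URLs are split evenly across threads, and each thread spends
    avg_response_ms + delay_ms on every URL it fetches.
    """
    if url_count < 0 or delay_ms < 0:
        raise ValueError("url_count and delay_ms must not be negative")
    if concurrent_requests < 1:
        raise ValueError("concurrent_requests must be at least 1")
    if avg_response_ms <= 0:
        raise ValueError("avg_response_ms must be positive")

    # The busiest thread handles this many URLs.
    requests_per_thread = math.ceil(url_count / concurrent_requests)
    return requests_per_thread * (avg_response_ms + delay_ms) / 1000.0


if __name__ == "__main__":
    # Compare the default settings with two more aggressive combinations.
    for delay_ms, threads in [(100, 1), (25, 4), (0, 4)]:
        seconds = estimated_crawl_seconds(10000, delay_ms, threads, avg_response_ms=50)
        print(f"Delay={delay_ms} ms, concurrent requests={threads}: about {seconds} s")
```

The model ignores errors, retries, and server saturation, so treat the numbers only as a starting point for experiments against your own resource server.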

Note: Some of these settings are replicated in the configuration settings for certain connectors, both to highlight their relevance and to enable setting connector-specific Delay values. Settings that are replicated in the seed for a connector take precedence over the crawler settings, but only apply to URLs that are destined for that connector. This enables the use of different settings in multiple connectors that contribute to a single search collection.
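
The precedence rule in the note above can be sketched in Python as follows. This is an illustration only. The function name, the dictionary of per-connector delays, and the use of a URL prefix to decide which connector a URL is destined for are all assumptions. The product determines this internally and may do so differently.

```python
def effective_delay_ms(url, crawler_delay_ms, connector_delays_ms):
    """Return the delay that applies to a URL under the assumed precedence rule.

    connector_delays_ms maps a hypothetical URL prefix (standing in for
    "URLs destined for this connector") to the Delay value set in that
    connector's seed.
    """
    for url_prefix, delay_ms in connector_delays_ms.items():
        if url.startswith(url_prefix):
            # A Delay replicated in the connector seed takes precedence.
            return delay_ms
    # No connector-specific value applies, so the crawler setting is used.
    return crawler_delay_ms


if __name__ == "__main__":
    # Hypothetical hosts; only the first one has a connector-specific Delay.
    overrides = {"http://wcc.example.com/": 20}
    for url in ["http://wcc.example.com/doc/1", "http://other.example.com/page"]:
        print(url, effective_delay_ms(url, 100, overrides))
```

In this sketch, URLs handled by another connector in the same search collection keep the crawler-level Delay, which matches the behavior the note describes.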