Connector Performance Tuning

Increasing Crawler Aggressiveness
Another way to improve the performance of a connector is to enable multithreading and then reduce the delay between requests. This will put more load on the server where the resource that you are crawling is stored, but will allow the connector to crawl that resource more quickly.
Note: Multithreading is not supported in all connectors, and enabling it increases memory consumption on both Watson™ Explorer Engine and the resource server. Consider increasing the size of the Java heap to prevent Java OutOfMemory errors.

To optimize for speed, set the value of the Delay setting to 0 in the Global Settings > Crawling aggressiveness > Configuration > Crawling tab for the associated search collection. Setting this value to 0 eliminates any delay between successive calls to the resource server, and also causes the connector to create as many threads as it can to submit and service those requests.

Note: Setting the Delay option to 0 can cause additional errors to be introduced because the resource server or Watson Explorer Engine may not be able to keep up with incoming requests. However, it is still useful to try this setting when tuning a connector for performance, because this setting will provide the theoretical maximum performance for the crawl.

To tune for more balanced speed, set the Delay setting to a value greater than 0 and less than the default value of 100 (expressed in milliseconds). You may also want to raise the value of the Concurrent requests to the same host setting in the Crawling aggressiveness section above its default value of 1. This setting controls how many threads the connector creates when starting.
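To reason about candidate values before trying them, the trade-off between Delay and concurrent requests can be sketched as simple arithmetic. The following is a rough model and not the connector's actual scheduling logic: it assumes each thread waits for the Delay after every response, and it takes an average response time that you would have to measure yourself. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates an upper bound on the crawl request rate.
// Assumption: each thread waits `delayMs` after every response before
// sending its next request. The real connector may schedule differently.
class CrawlRateEstimate {

    /**
     * Upper bound on requests per second for the given settings.
     *
     * @param delayMs       value of the Delay setting, in milliseconds (0 or more)
     * @param concurrent    value of "Concurrent requests to the same host" (1 or more)
     * @param avgResponseMs assumed average response time, in milliseconds (more than 0)
     */
    static double maxRequestsPerSecond(long delayMs, int concurrent, double avgResponseMs) {
        if (delayMs < 0 || concurrent < 1 || avgResponseMs <= 0) {
            throw new IllegalArgumentException("invalid settings");
        }
        // Under the assumed model, one thread finishes a request every
        // (response time + delay) milliseconds.
        double msPerRequestPerThread = avgResponseMs + delayMs;
        return concurrent * 1000.0 / msPerRequestPerThread;
    }

    public static void main(String[] args) {
        // Defaults from the text: Delay = 100 ms, 1 concurrent request.
        System.out.println(maxRequestsPerSecond(100, 1, 50.0));
        // A more aggressive candidate: Delay = 25 ms, 4 concurrent requests.
        System.out.println(maxRequestsPerSecond(25, 4, 50.0));
    }
}
```

With an assumed 50 ms response time, the defaults give a ceiling of roughly 6.7 requests per second and the second candidate roughly 53.3. Treat these numbers only as a starting point, because the resource server may not sustain the higher rate.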

Note: Some of these settings are replicated in the configuration settings for certain connectors, both to highlight their relevance and to enable setting connector-specific Delay values. Settings that are replicated in the seed for a connector take precedence over the crawler settings, but only apply to URLs that are destined for that connector. This enables the use of different settings in multiple connectors that contribute to a single search collection.
Threading
The number of threads the connector uses for crawling can be adjusted to balance memory allocation and speed. There are many variables that determine how aggressively you can connect to the associated resource server(s). You will develop a comfort level for your particular environment by adjusting the number of threads that the connector uses and monitoring its performance. Typically, using between 3 and 5 threads is sufficient for most environments. The default value for threading is 1.
Minimizing Error Level Logging
Another setting that can have a performance impact is error level logging. By default, error level logging is turned off. When it is turned on, large log files can accumulate and cause connector performance to suffer. In most cases, you should only enable debug mode and trace level logging when doing error level logging. Letting log files build up over time in a production environment is not recommended.
Tip: For more detailed information about advanced logging configuration settings, see the online resources for Log4j.
Analyzing applications with JConsole
JConsole is a graphical, JMX-compliant tool that connects to a running JVM and can therefore be used to analyze information about a connector. For more detailed instructions on using JConsole, see the online JConsole documentation.
Analyzing applications with JMX
The Java Management Extensions (JMX) supply tools for managing and monitoring applications, system objects, devices, and service-oriented networks. To use JMX, you must first enable the management port in a connector seed. Once that port is enabled, you can use a variety of JMX-compliant tools to analyze exactly what the connector is doing while it is running.

For example, the Java virtual machine (JVM), which has built-in instrumentation, enables you to monitor and manage the performance of a connector using JMX. To enable the JMX agent and configure its operation, you must set certain system properties when you start the JVM. For detailed instructions, see the online resources for using JMX.
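As an illustration of the built-in instrumentation, the sketch below reads heap usage and the live thread count from the JVM's platform MXBeans. It inspects the JVM it runs in. It does not connect to a connector. A remote tool would read the same beans through the management port, and how the connector seed maps that port to the standard JDK properties (such as com.sun.management.jmxremote.port) is not covered here. The class name JvmSnapshot is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reads values from the platform MXBeans of the
// JVM that runs this code (not from a remote connector process).
class JvmSnapshot {

    /** Heap memory currently in use by this JVM, in bytes. */
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Maximum heap size in bytes, or -1 if the maximum is undefined. */
    static long maxHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("Heap used (bytes): " + usedHeapBytes());
        System.out.println("Heap max (bytes):  " + maxHeapBytes());
        System.out.println("Live threads:      " + liveThreadCount());
    }
}
```

Watching these two figures while you change thread and delay settings is one way to see whether a more aggressive configuration is approaching the heap limit.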

Profiling applications with VisualVM
VisualVM is another visual tool that integrates several command-line Java Development Kit (JDK) tools and offers lightweight profiling capabilities, such as monitoring the memory use of a connector over time. For more detail on using VisualVM, consult the online VisualVM resources.
Reducing Memory Footprint
If you want to reduce the memory footprint of a connector, you may opt to turn off caching. However, caching can have a dramatic impact on speed. Therefore, instead of disabling the cache, you can adjust values in the advanced cache settings portion of the connector seed. Conversely, if you have a lot of memory, you can opt to increase the cache settings and heap size to prevent out-of-memory errors. Another option to consider is flushing the cache once security updates to the resource that you are crawling have been indexed, which helps improve the overall performance of the connector.
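The trade-off between cache size and memory can be illustrated with a generic bounded cache. This is not the connector's cache implementation, and this section does not list the advanced cache settings. The sketch only assumes that one of them limits how many entries are kept. The class name BoundedCache is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a size-limited cache: a smaller limit
// uses less memory but evicts entries sooner, so more lookups miss.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true keeps the least recently used entry eldest.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the limit is exceeded.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "first");
        cache.put("b", "second");
        cache.get("a");           // "a" becomes the most recently used entry
        cache.put("c", "third");  // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

Calling clear() on such a cache corresponds to the flush described above: it releases the cached entries at the cost of repopulating them on later lookups.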