This section describes connector debugging techniques and methods that can help resolve
common connector problems.
- Validate user and password
- Often simple connector issues can be attributed to basic account permission errors.
Confirm that you are using an account with the appropriate permissions to crawl your
repository and that you are using the right password for it.
- Check the documentation
- Be sure that you have correctly configured all installed Watson Explorer components and your connector, and that no missing steps or incorrect configuration settings are preventing the connector from working.
Tip: Rights functions for user collections are common connector pitfalls.
- Eliminate resource-side errors
- It is reasonable to start by assuming that the issue lies with a Watson Explorer Engine connector. At the
same time, make the administrator of the resource that you are crawling aware of any
problems crawling that resource. The administrator may already know about the issue and have a
patch available. It never hurts to check.
- Test multi-threaded versus single-threaded
- To determine whether a connector issue is related to multithreading, set the thread count to
1 and then test a new crawl. If the error still occurs with a single thread,
multithreading is not the source of the problem. Setting the thread count to
1 also makes the log easier to read.
- Enable bootstrap logging
- If a connector is not starting at all, enable bootstrap logging to determine where the
failure occurs when the connector is initiated. Bootstrap logging can be enabled in the
Watson Explorer Engine administration tool's seed
configuration screen.
To activate bootstrap logging, do the following:
- From the seed configuration page of your site collection, go to . The crawling configuration page displays.
- In the Seeds section, click edit and
expand the Advanced - Logging collapsible menu.
- Check the enable connector bootstrap logging box.
Additionally, enter Log4j settings in the Connector Logging
Configuration text box.
- Click OK.
- Enable connector logging
- If bootstrap logging is not available, you can enable a logging
condition instead. To add a logging condition to the connector seed, do the following:
- In the Watson Explorer Engine administration tool,
select Add A New Condition from the section.
A pop-up window displays with a list of new
conditions.
- Scroll down and select connector logging.
Your goal is to capture a stack trace, which can help pinpoint what might be
causing your connector problems.
- Enable Log4j logging levels
- Log4j enables you to activate different levels of logging without modifying the
application binary, thus avoiding a heavy performance cost. Logging behavior can be
controlled by editing a Log4j configuration file.
Key logging levels that can be
applied using the Log4j utility are the following:
- OFF - The OFF level has the highest possible rank
and is intended to turn off logging.
- FATAL - The FATAL level designates very severe
error events that will presumably lead the application to abort.
- ERROR - The ERROR level designates error events
that might still allow the application to continue running.
- WARN - The WARN level designates potentially
harmful situations.
- INFO - The INFO level designates informational
messages that highlight the progress of the application at coarse-grained level.
- DEBUG - The DEBUG level designates fine-grained
informational events that are most useful to debug an application.
- TRACE - The TRACE level designates finer-grained
informational events than the DEBUG level.
- ALL - The ALL level has the lowest possible rank and is
intended to turn on all logging.
For more detailed information about Log4j and its configuration, see the online
resources for Log4j.
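The ranking in the list above determines which messages are written: a message is logged only when its level ranks at or above the configured threshold. The following Python sketch models that rule. The names `LEVELS`, `is_enabled`, and `emitted_levels` are illustrative and are not part of the Log4j API; the ordering is taken directly from the list above.

```python
# Illustrative model of Log4j level ranking (hypothetical names, not the Log4j API).
# Levels are ordered from lowest rank (ALL) to highest rank (OFF).
LEVELS = ["ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"]


def is_enabled(threshold, message_level):
    """Return True if a message at message_level passes the threshold."""
    if message_level in ("ALL", "OFF"):
        raise ValueError("ALL and OFF are thresholds, not message levels")
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


def emitted_levels(threshold):
    """List the message levels that a given threshold lets through."""
    return [level for level in LEVELS[1:-1] if is_enabled(threshold, level)]
```

For example, `emitted_levels("WARN")` returns `["WARN", "ERROR", "FATAL"]`, so a threshold of WARN suppresses INFO, DEBUG, and TRACE messages.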
- Enable Oakland HTTP wire logging
- Enabling logging for wire-level activity is useful for Watson Explorer Engine connectors that use HTTP connections. This is
because the wire log records all data transmitted to and from your server(s) when
executing HTTP requests. The wire log uses the org.apache.http.wire
logging category, which should only be enabled to debug problems. Be aware that wire
logging will produce a large amount of log data.
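As a rough illustration, the Python sketch below generates the logging setting for the wire category. It assumes that your connector accepts Log4j 1.x-style `log4j.logger.<category>=<LEVEL>` properties, which this section does not confirm; verify the Log4j version and syntax that your connector uses. The function names are hypothetical.

```python
# Sketch: build Log4j settings text that turns HTTP wire logging on or off.
# Assumption: the connector reads Log4j 1.x-style properties of the form
# "log4j.logger.<category>=<LEVEL>". Check this against your installation.
WIRE_CATEGORY = "org.apache.http.wire"  # category named in the text above


def log4j_logger_lines(levels_by_category):
    """Render one 'log4j.logger.<category>=<LEVEL>' line per category."""
    lines = []
    for category, level in sorted(levels_by_category.items()):
        lines.append(f"log4j.logger.{category}={level.upper()}")
    return "\n".join(lines)


def wire_logging_settings(enabled=True):
    """Return settings that switch wire logging on (DEBUG) or off (OFF)."""
    return log4j_logger_lines({WIRE_CATEGORY: "DEBUG" if enabled else "OFF"})
```

Calling `wire_logging_settings()` yields the single line `log4j.logger.org.apache.http.wire=DEBUG`. Because of the log volume, switch the category back to `OFF` once the problem has been captured.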
- Check for missing JAR files
- Be sure that you have all the JAR files needed. If the connector was installed correctly, the necessary JAR files should
have been copied to the right location by default.
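If a log shows a missing class, one way to check the JAR files is to search them for that class. JAR files are ZIP archives, so the Python sketch below can inspect them with the standard library. The function name and the directory you pass in are placeholders; this section does not state where your connector's JAR files are installed.

```python
import zipfile
from pathlib import Path


def jars_containing(jar_dir, class_name):
    """Find which JAR files in jar_dir contain a fully qualified Java class.

    Returns (found, unreadable): the names of JARs that contain the class,
    and the names of files that could not be opened as ZIP archives.
    """
    # A class such as org.apache.http.HttpHost is stored in the archive
    # as the entry org/apache/http/HttpHost.class.
    entry = class_name.replace(".", "/") + ".class"
    found, unreadable = [], []
    for jar_path in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    found.append(jar_path.name)
        except zipfile.BadZipFile:
            # A corrupt or truncated JAR can also cause class-loading errors.
            unreadable.append(jar_path.name)
    return found, unreadable
```

An empty `found` list suggests that a JAR is missing, while more than one match can point to conflicting copies of the same class.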
- Open JMX port to profile resources
- Java Management Extensions (JMX) supply tools for managing and monitoring applications, system
objects, devices, and service-oriented networks.
To enable the JMX agent and configure its
operation, you must set certain system properties when you start the Java virtual machine
(JVM). For detailed instructions, consult help resources for using JMX and other JMX-compliant
tools.
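As a sketch of what those system properties can look like, the Python helper below assembles the `com.sun.management.jmxremote` flags used by Oracle and OpenJDK JVMs. It assumes that the JVM running your connector honors these properties. The function name is hypothetical, and how the flags are passed to the connector's JVM is not covered in this section.

```python
def jmx_jvm_flags(port, authenticate=True, ssl=True):
    """Build JVM system-property flags that enable a remote JMX agent.

    Assumption: the target JVM supports the com.sun.management.jmxremote
    properties (Oracle and OpenJDK do).
    """
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        f"-Dcom.sun.management.jmxremote.authenticate={str(authenticate).lower()}",
        f"-Dcom.sun.management.jmxremote.ssl={str(ssl).lower()}",
    ]
```

For example, `" ".join(jmx_jvm_flags(9010))` produces a flag string for an arbitrarily chosen port 9010. Disable authentication or SSL only on an isolated test system.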
- Packet trace with Wireshark
- If you are familiar with Wireshark and its advanced packet trace capabilities, it can
be used instead of, or to augment, any packet tracing capabilities in the connector that
you are using. Consult your Wireshark help resources for using the more powerful features
of Wireshark tracing.
- Profile resources
- Use common performance testing methods to determine how fast the connector performs under a particular workload. Profiling
the resources used under various workloads serves to pinpoint bugs relating to scalability, reliability, and resource usage.
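A minimal timing harness can give a first impression before you reach for a full profiler. In the Python sketch below, `task` is a placeholder for whatever triggers the workload that you want to measure, for example a request against a test repository. The sketch measures wall-clock time only, not memory or CPU usage.

```python
import statistics
import time


def time_workload(task, runs=5):
    """Run task() several times and summarize wall-clock durations in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Comparing the summaries for small and large workloads shows whether run time grows roughly in proportion to the load or much faster, which can hint at a scalability problem.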
- Replicate in development environment
- Replicate the
production environment issue in your development environment and test for the same bug.
- Reproduce without connector
- Another simple way to determine whether the connector is the source of the error is to
probe the remote resource without it. If you are unable to contact the remote
resource without the connector, there may be a problem with your environment rather than
with the connector. Common tools used to help in this regard include the following:
- Curl is a command line tool for sending and receiving files using URL syntax. Since
Curl is used by many Watson Explorer Engine connectors, it
is a great tool to help pinpoint the source of problems when crawling associated
resource sites.
- Check that your problems are not browser specific. To do so, attempt to display
search results in modern browsers such as Firefox, Internet Explorer, Chrome, and
Safari. Test in the browser versions that are relevant to your users.
- Ping and Traceroute can be used to send packets to the remote data
resource and report on the responses, which can be useful for testing your
network connection. Consult your operating system documentation on how to locate and
execute the ping and traceroute utilities that are available in your environment.
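The same kind of check can be scripted. The Python sketch below probes a URL in two stages, a TCP connection followed by an HTTP GET, so that you can tell a network problem from an HTTP-level problem. It is an illustration that uses only the standard library, not a replacement for the tools above, and the function name `probe` is hypothetical.

```python
import http.client
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def probe(url, timeout=5.0):
    """Check a URL in two stages: TCP connect, then an HTTP GET.

    Returns (stage, detail). stage is "tcp" or "http" if that stage
    failed, and "ok" if both stages succeeded.
    """
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # Stage 1: can the host and port be reached at all?
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            pass
    except OSError as exc:
        return ("tcp", str(exc))
    try:
        # Stage 2: does the server answer an HTTP request?
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return ("ok", f"HTTP {response.status}")
    except urllib.error.HTTPError as exc:
        # The server responded, but with an error status such as 401 or 404.
        return ("http", f"HTTP {exc.code}")
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        return ("http", str(exc))
```

A `"tcp"` result points to the network or to the host itself, while an `"http"` result with a status such as 401 or 403 points back to credentials and permissions on the repository.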
- Adjust crawler delay
- In , set the Delay value to 1.
This increases the rate of requests to your server, which can help expose potential problems.
Note: We do not recommend setting the delay to 0. Doing so
can cause excessive resource usage on your crawling server, repository server, or
both.
- Validate web services
- Check that all web services are performing correctly and that all the needed web
services are activated on the server(s) where the data you are crawling is hosted. You can
use a web test to test web services. Check online resources for writing specific web tests
based on your environment.