OneDrive for Business Connector Debug Reference

This section describes connector debugging techniques and methods that can help resolve common connector problems.

Validate user and password
Often simple connector issues can be attributed to basic account permission errors. Confirm that you are using an account with the appropriate permissions to crawl your repository and that you are using the right password for it.
Check the documentation
Be sure that you have correctly configured all installed Watson™ Explorer components and your connector, and that there are no missing steps, or incorrect configuration settings, which might be causing a problem in using the connector.
Tip: Rights functions for user collections are common connector pitfalls.
Eliminate resource-side errors
It is reasonable to start by assuming that the issue lies with the Watson Explorer Engine connector. At the same time, tell the administrator of the resource that you are crawling about any problems crawling that resource. The administrator may already be aware of the issue and have a patch available. It never hurts to check.
Test multi-threaded versus single-threaded
To determine if a connector issue is related to multithreading, set the thread count to 1 and then test a new crawl. If the same error still occurs with a single thread, multithreading is not the source of the problem. Setting the thread count to 1 also has the benefit of making the log easier to read.
Enable bootstrap logging
If a connector is not starting at all, enable bootstrap logging to determine where the failure occurs when the connector is initiated. Bootstrap logging can be enabled in the Watson Explorer Engine administration tool's seed configuration screen.

To activate bootstrap logging, do the following:

  1. From the seed configuration page of your site collection, go to Configuration > Crawling. The crawling configuration page displays.
  2. In the Seeds section, click edit and expand the Advanced - Logging collapsible menu.
  3. Check the enable connector bootstrap logging box. Additionally, enter Log4j settings in the Connector Logging Configuration text box.
  4. Click OK.
Enable connector logging
If Bootstrap Logging is not available, you can enable a logging condition. To add a logging condition to the connector seed, do the following:
  1. In the Watson Explorer Engine administration tool, select Add A New Condition from the Configuration > Crawling > Conditional Settings section.

    A pop-up window displays with a list of new conditions.

  2. Scroll down and select connector logging.

Your goal is to capture a stack trace, which can help pinpoint what might be causing your connector problems.

Enable Log4j logging levels
Log4j enables you to activate different levels of logging without modifying the application binary, thus avoiding a heavy performance cost. Logging behavior can be controlled by editing a Log4j configuration file.

Key logging levels that can be applied using the Log4j utility are the following:

  • OFF - The OFF level has the highest possible rank and is intended to turn off logging.
  • FATAL - The FATAL level designates very severe error events that will presumably lead the application to abort.
  • ERROR - The ERROR level designates error events that might still allow the application to continue running.
  • WARN - The WARN level designates potentially harmful situations.
  • INFO - The INFO level designates informational messages that highlight the progress of the application at coarse-grained level.
  • DEBUG - The DEBUG level designates fine-grained informational events that are most useful to debug an application.
  • TRACE - The TRACE level designates finer-grained informational events than the DEBUG level.
  • ALL - The ALL level has the lowest possible rank and is intended to turn on all logging.

For more detailed information about Log4j and its configuration, see the online resources for Log4j.
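
The ranking described above can be modelled in a few lines. The following is a minimal Python sketch of the ranking only, not Log4j itself. LEVELS and is_enabled are illustrative names, and the one behavior assumed is that an event is logged when its level ranks at or above the configured threshold.

```python
# Levels ordered from lowest rank (most output) to highest rank (no output),
# mirroring the list above. This models the ranking only; it is not Log4j.
LEVELS = ["ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"]


def is_enabled(threshold, event_level):
    """Return True if an event at event_level passes the configured threshold."""
    if event_level in ("ALL", "OFF"):
        raise ValueError("ALL and OFF are thresholds, not event levels")
    return LEVELS.index(event_level) >= LEVELS.index(threshold)


def levels_logged(threshold):
    """List the event levels that a given threshold lets through."""
    return [lvl for lvl in LEVELS[1:-1] if is_enabled(threshold, lvl)]
```

For example, levels_logged("WARN") yields WARN, ERROR, and FATAL, while levels_logged("OFF") yields nothing.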

Enable Apache HTTP wire logging
Enabling logging for wire-level activity is useful for Watson Explorer Engine connectors that use HTTP connections. This is because the wire log records all data transmitted to and from your server(s) when executing HTTP requests. The wire log uses the org.apache.http.wire logging category, which should only be enabled to debug problems. Be aware that wire logging will produce a large amount of log data.
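
As an illustration, the setting for the wire category might be produced by a sketch like the one below. It assumes that the connector accepts Log4j 1.x properties syntax (log4j.logger.<category>=<LEVEL>), which this section does not confirm, and render_logger_settings is a hypothetical helper rather than part of the product.

```python
ALLOWED_LEVELS = {"OFF", "FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE", "ALL"}


def render_logger_settings(levels):
    """Render category/level pairs as Log4j 1.x-style properties lines.

    Assumption: the target accepts Log4j 1.x properties syntax.
    """
    lines = []
    for category, level in levels.items():
        if level not in ALLOWED_LEVELS:
            raise ValueError(f"unknown level: {level}")
        lines.append(f"log4j.logger.{category}={level}")
    return "\n".join(lines)


# Turn the wire category on for a debugging session.
wire_on = render_logger_settings({"org.apache.http.wire": "DEBUG"})
# Turn it off again afterwards, because the wire log grows quickly.
wire_off = render_logger_settings({"org.apache.http.wire": "OFF"})
```

Keeping both an "on" and an "off" variant to hand makes it less likely that wire logging is left enabled after the debugging session ends.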
Check for missing JAR files
Be sure that you have all the JAR files needed. If the connector was installed correctly, the necessary JAR files should have been copied to the right location by default.
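
A quick way to confirm this is to compare the contents of the connector's library directory against a list of expected files. The following Python sketch assumes that you know the directory and the expected file names. Both are placeholders here, and check_jars is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def check_jars(lib_dir, expected_names):
    """Return (missing, unreadable) JAR file names for a library directory."""
    lib_path = Path(lib_dir)
    missing, unreadable = [], []
    for name in expected_names:
        jar = lib_path / name
        if not jar.is_file():
            missing.append(name)
        elif not zipfile.is_zipfile(jar):
            # A JAR is a ZIP archive; a file that fails this check is
            # likely truncated or corrupted.
            unreadable.append(name)
    return missing, unreadable


# Example call with placeholder values; substitute the real ones:
#   missing, unreadable = check_jars("./connector-lib", ["example-connector.jar"])
```

A file reported as unreadable is often the result of an interrupted copy, in which case reinstalling the connector is usually simpler than repairing the file by hand.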
Open JMX port to profile resources
Java Management Extensions (JMX) supply tools for managing and monitoring applications, system objects, devices, and service-oriented networks.

To enable the JMX agent and configure its operation, you must set certain system properties when you start the Java virtual machine (JVM). For detailed instructions, consult help resources for using JMX and other JMX-compliant tools.
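
System properties of this kind are normally passed as -D flags on the JVM command line. The Python sketch below builds the standard remote-agent flags. The port number is a placeholder, jmx_jvm_flags is a hypothetical helper, and how the flags reach the connector's JVM is not covered by this section.

```python
def jmx_jvm_flags(port, authenticate=True, ssl=True):
    """Build JVM -D flags that enable the remote JMX agent on the given port."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        f"-Dcom.sun.management.jmxremote.authenticate={str(authenticate).lower()}",
        f"-Dcom.sun.management.jmxremote.ssl={str(ssl).lower()}",
    ]


# Placeholder port. Disabling authentication and SSL is only reasonable
# on an isolated test system.
flags = jmx_jvm_flags(9010, authenticate=False, ssl=False)
```

Once the JVM is running with these flags, a JMX-compliant tool can attach to the chosen port to inspect memory, threads, and other resources.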

Packet trace with Wireshark
If you are familiar with Wireshark and its advanced packet trace capabilities, it can be used instead of, or to augment, any packet tracing capabilities in the connector that you are using. Consult your Wireshark help resources for using the more powerful features of Wireshark tracing.
Profile resources
Use common performance testing methods to determine how fast the connector performs under a particular workload. Profiling the resources used under various workloads serves to pinpoint bugs relating to scalability, reliability, and resource usage.
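
One simple form of such a test is to time the same operation at several workload sizes and compare the throughput. The following Python sketch shows the idea under stated assumptions: fake_crawl is a stand-in for a real crawl of n documents, and measure_throughput is an illustrative name, not a product feature.

```python
import time


def measure_throughput(work, workloads):
    """Time work(n) for each workload size n and return items per second."""
    results = {}
    for n in workloads:
        start = time.perf_counter()
        work(n)
        elapsed = time.perf_counter() - start
        results[n] = n / elapsed if elapsed > 0 else float("inf")
    return results


def fake_crawl(n):
    # Placeholder workload: stands in for fetching and processing n documents.
    return sum(len(str(i)) for i in range(n))


# Example: compare throughput at three workload sizes.
#   for size, rate in measure_throughput(fake_crawl, [1000, 10000, 100000]).items():
#       print(f"{size} items: {rate:.0f} items/s")
```

If throughput falls sharply as the workload grows, that points toward a scalability or resource problem rather than a functional bug.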
Replicate in development environment
Replicate the production environment issue in your development environment and test for the same bug.
Reproduce without connector
Another simple test to determine if the connector is the source of the error is to attempt to probe the remote resource without it. If you are unable to contact the remote resource without the connector, there may be a problem with your environment rather than with the connector. Common tools used to help in this regard include the following:
  • Curl is a command line tool for sending and receiving files using URL syntax. Since Curl is used by many Watson Explorer Engine connectors, it is a great tool to help pinpoint the source of problems when crawling associated resource sites.
  • Check that your problems are not browser specific. To do so, attempt to display search results in modern browsers such as Firefox, Internet Explorer, Chrome, and Safari. Test in the browser versions that are relevant to your users.
  • Ping and Traceroute can be used to send packets of information to the remote data resource for the purpose of retrieving information, which can be useful for testing your network connection. Consult your operating system documentation on how to locate and execute the ping and traceroute utilities that are available in your environment.
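
The same kind of direct probe can also be scripted. The following Python sketch uses only the standard library and separates "the server answered with an error status" from "the server could not be reached at all". The probe function and its result keys are illustrative, and the URL in the usage comment is a placeholder.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Fetch url directly and report the outcome without raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return {"reachable": True, "status": response.status, "error": None}
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 401 or 404.
        return {"reachable": True, "status": exc.code, "error": str(exc.reason)}
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, or TLS problem.
        return {"reachable": False, "status": None, "error": str(exc)}


# Example call with a placeholder URL; substitute the resource you crawl:
#   print(probe("https://repository.example.com/"))
```

A result with reachable set to False suggests an environment or network problem, whereas an error status from a reachable server more often points to credentials or permissions.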
Adjust crawler delay
In Global Settings > Crawler Aggressiveness, set the Delay value to 1. This increases the rate of requests to your server, which can help expose potential problems.
Note: We do not recommend setting the delay to 0. Doing so can cause excessive resource usage on your crawling server, repository server, or both.
Validate web services
Check that all web services are performing correctly and that all the needed web services are activated in the server(s) where the data you are crawling is hosted. You can use a Web test to test Web services. Check online resources for writing specific web tests based on your environment.