Using the Audit Log

Watson™ Explorer Engine is designed to guarantee that all enqueued content will be processed by the Watson Explorer Engine crawler. However, successfully enqueueing a URL does not guarantee that the data the URL points to will be indexed, because problems can still arise when that data is retrieved and converted. To provide a record of the final status of each enqueue (or delete) operation, Watson Explorer Engine supports a log known as the audit log. The audit log must be enabled on a per-collection basis and can only be accessed and managed through the Watson Explorer Engine SOAP/REST API. It provides historical information about enqueue and delete operations, whereas the normal crawler logs only provide information about the state of the current operation(s).

Which enqueue and delete operations are recorded in the audit log for a collection is determined by the value of the audit-log option. You can set this option in the Advanced section of the Configuration > Crawling tab for a search collection, or as a crawl-option if you are writing an application using the Watson Explorer Engine platform SOAP/REST API. Possible values for the audit-log option are all, unsuccessful, and none (the default value). Use the following guidelines to choose a value:

  • If you want to check the results of all enqueues or crawled URLs, use the all value for the audit-log option so that you can determine whether, and when, data was indexed.

    This value is particularly useful when you use Light Crawler mode, or an enqueue synchronization mode other than indexed or indexed-no-sync, for two reasons:

    • Light Crawler mode disables normal crawler logging, so you cannot check the status of an enqueue by browsing the normal crawler logs.
    • The synchronous reply to an enqueue that uses a synchronization mode other than indexed or indexed-no-sync does not contain status information, so the user cannot tell when the data is indexed.
  • If you only need to identify enqueues that were not processed successfully so that you can resubmit them, use the unsuccessful value for the audit-log option.
  • If your priority is minimizing disk space consumption or maximizing raw application performance, use the none value for the audit-log option. This value (the default) disables the audit log.
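The effect of each audit-log value can be summarized as a simple filter on operation outcomes. The following Python sketch is a hypothetical model for illustration only: the AuditLog enum and the is_recorded function are not part of the Watson Explorer Engine API, and outcomes are simplified to succeeded or failed.

```python
from enum import Enum


class AuditLog(Enum):
    """Hypothetical model of the audit-log option values (not a product API)."""
    ALL = "all"
    UNSUCCESSFUL = "unsuccessful"
    NONE = "none"  # the default; disables the audit log


def is_recorded(setting: AuditLog, succeeded: bool) -> bool:
    """Return True if an enqueue/delete outcome would get an audit log entry
    under this simplified model."""
    if setting is AuditLog.ALL:
        return True
    if setting is AuditLog.UNSUCCESSFUL:
        return not succeeded
    return False  # AuditLog.NONE records nothing


# Hypothetical outcomes: (URL, whether processing succeeded).
outcomes = [("http://example.com/a", True), ("http://example.com/b", False)]
logged = [url for url, ok in outcomes if is_recorded(AuditLog.UNSUCCESSFUL, ok)]
print(logged)  # ['http://example.com/b']
```

With unsuccessful, only the failed URL appears, which is the set an application would resubmit.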

The amount of detail that is recorded in the audit log for each enqueue or delete operation in a collection is determined by the value of the audit-log-detail option, which you can set in the Advanced section of the Configuration > Crawling tab for a search collection. Possible values for the audit-log-detail option are full (the default value), medium, and minimal. Choose a value based on how much information you need to see in the log, keeping in mind that more detailed logging consumes more disk space:

  • If you want to see full details and statistics for all audit log entries, you should use the full value.
  • If you only want to see full details and statistics for audit log entries corresponding to errors or warnings, you should use the medium value.
  • If you only want to see whether an audit log entry corresponds to a success or a failure, you should use the minimal value.
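The three audit-log-detail values differ in which outcomes receive full details. This Python sketch models that rule under stated assumptions: the Outcome enum, the detail_for function, and the labels "reduced" and "status-only" are hypothetical names for illustration. The exact content of a medium-level entry for a success is not modeled here; the table referenced at the end of this section is the authoritative source.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical classification of an audit log entry."""
    SUCCESS = "success"
    WARNING = "warning"
    ERROR = "error"


def detail_for(detail_setting: str, outcome: Outcome) -> str:
    """Return the (hypothetical) level of detail recorded for one entry.

    full    -> full details and statistics for every entry
    medium  -> full details only for errors and warnings, less otherwise
    minimal -> only whether the entry is a success or a failure
    """
    if detail_setting == "full":
        return "full"
    if detail_setting == "medium":
        if outcome in (Outcome.WARNING, Outcome.ERROR):
            return "full"
        return "reduced"
    if detail_setting == "minimal":
        return "status-only"
    raise ValueError(f"unknown audit-log-detail value: {detail_setting!r}")


for setting in ("full", "medium", "minimal"):
    print(setting, [detail_for(setting, o) for o in Outcome])
```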

The point at which entries are written to the audit log is determined by the value of the audit-log-when option, which you can set via the Audit log recorded option in the Advanced section of the Configuration > Crawling tab for a search collection, or as a crawl-option if you are writing an application using the Watson Explorer Engine platform SOAP/REST API. Possible values for the audit-log-when option are the following:

  • finished (the default value) - the audit log entry is written when the crawler finishes processing a URL
  • replicated - the audit log entry is written when any index updates and metadata updates associated with a URL are replicated to all subscribed clients
  • finished-or-replicated - an audit log entry is written when the crawler finishes processing a URL, and another is written when any index updates associated with that URL are replicated to all subscribed clients
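The following Python sketch models how many entries each audit-log-when value produces for a single URL. The entries_written function and the event names "finished" and "replicated" are hypothetical labels for the two points described above, not identifiers from the product API.

```python
from __future__ import annotations


def entries_written(when: str, events: list[str]) -> list[str]:
    """Return the events that produce an audit log entry (hypothetical model).

    `events` is the ordered list of lifecycle points a URL reaches:
    "finished"   - the crawler finished processing the URL
    "replicated" - the URL's index updates reached all subscribed clients
    """
    triggers = {
        "finished": {"finished"},
        "replicated": {"replicated"},
        "finished-or-replicated": {"finished", "replicated"},
    }
    if when not in triggers:
        raise ValueError(f"unknown audit-log-when value: {when!r}")
    return [event for event in events if event in triggers[when]]


lifecycle = ["finished", "replicated"]
for value in ("finished", "replicated", "finished-or-replicated"):
    print(value, entries_written(value, lifecycle))
```

Under this model, finished-or-replicated is the only value that yields two entries for the same URL, one at each point.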

Suggestions for when to use different values are the following:

  • If you only need to know when the URL associated with an audit log entry has been indexed on the server to which you enqueued the data, select finished.
  • If you only need to know when the index data and associated metadata for a URL were replicated successfully to all clients of the server to which you enqueued the data, select replicated.
  • If your application logic requires that you know when a URL has been indexed on the server and replicated to all subscribed clients, select finished-or-replicated.

See the section of the Watson Explorer Engine SOAP/REST API Developer's Guide entitled Logging and Examining Enqueue/Delete Status in Watson Explorer Engine for additional information about the audit log, including a table that provides detailed information about different audit log settings and the level of detail provided for each logged event.

Tip: When multiple applications enqueue to the same collection, specify a different value for the originator attribute in each application. When you query or purge the audit log, unique originator values let you identify the audit log entries for the items that each application enqueued.
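To illustrate why distinct originator values matter, this Python sketch filters a list of audit log entries by originator. The AuditEntry class, its fields, the entries_for function, and the originator values "billing-app" and "crm-app" are all hypothetical. Real entries are queried and purged through the SOAP/REST API described in the Developer's Guide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical, simplified stand-in for an audit log entry."""
    originator: str
    url: str
    succeeded: bool


def entries_for(entries: list[AuditEntry], originator: str) -> list[AuditEntry]:
    """Select the entries for items that one application enqueued."""
    return [e for e in entries if e.originator == originator]


log = [
    AuditEntry("billing-app", "http://example.com/invoice/1", True),
    AuditEntry("crm-app", "http://example.com/contact/7", False),
    AuditEntry("billing-app", "http://example.com/invoice/2", False),
]

# Failed items from the billing application only, e.g. to resubmit them.
to_resubmit = [e.url for e in entries_for(log, "billing-app") if not e.succeeded]
print(to_resubmit)  # ['http://example.com/invoice/2']
```

If both applications had used the same originator value, the failed crm-app entry could not be told apart from the billing-app failures.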