Logging and Examining Enqueue/Delete Status

Watson Explorer Engine is designed to guarantee that all enqueued content is processed by the Watson Explorer Engine Crawler. Enqueueing a URL does not by itself guarantee successful indexing, because problems can still arise when the data associated with the enqueue request is retrieved and converted. However, a record of the final status of every enqueue (or delete) operation is always available in a Watson Explorer Engine log known as the audit log.

Enqueue operations are written to the audit log only after the enqueued data has been added to the index (and is therefore searchable), regardless of the synchronization mode that was used for the enqueue operation.

The type of data that is recorded in the audit log for any enqueue or delete operation is determined by the value of the audit-log crawler option. (In the Watson Explorer Engine administration tool, this option is located on the Configuration > Crawling tab of a search collection, in the Advanced section.) Possible values for the audit-log option are all, unsuccessful, and none, as summarized in Table 1.

In addition to configuring the types of entries that are written to the audit log, you can control the level of detail associated with those entries. The level of detail is determined by the value of the audit-log-detail option on the crawler for a given collection. Possible values for the audit-log-detail option are Full, Medium, and Minimal, as summarized in Table 1.

For example, the following audit log entry does not provide any detailed information about the crawl-url or index-atomic operation that was specified in the enqueue request. This is referred to as a simple audit log entry:

<audit-log-entry enqueue-id="1" originator="MyApp" status="unsuccessful"/>

The following audit log entry provides detailed information about an index-atomic operation that was specified in the enqueue request. This is referred to as a detailed audit log entry:

<audit-log-entry enqueue-id="1" originator="MyApp" status="unsuccessful">
  <index-atomic state="error" ... >
    <crawl-url url="http://somewhere.com/file1" state="error" ... />
    <crawl-url url="http://somewhere.com/file2" state="success" ... />
  </index-atomic>
</audit-log-entry>
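
As a rough illustration of how an application might read such entries, the following Python sketch parses one audit log entry and lists the URLs whose crawl-url state is "error". It is only a sketch under stated assumptions: the sample is a simplified copy of the detailed entry above with the elided attributes left out, the summarize_entry function is a hypothetical helper (not part of Watson Explorer Engine), and treating state="error" as the failure marker is inferred from the example rather than from a documented schema.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the detailed entry shown above. Attributes that the
# original elides with "..." are omitted here rather than guessed.
SAMPLE_ENTRY = """
<audit-log-entry enqueue-id="1" originator="MyApp" status="unsuccessful">
  <index-atomic state="error">
    <crawl-url url="http://somewhere.com/file1" state="error"/>
    <crawl-url url="http://somewhere.com/file2" state="success"/>
  </index-atomic>
</audit-log-entry>
"""


def summarize_entry(xml_text):
    """Hypothetical helper: return (enqueue_id, status, failed_urls).

    Works for both simple entries (no child elements) and detailed
    entries. Assumes a failed URL is marked with state="error", as in
    the example entry.
    """
    entry = ET.fromstring(xml_text.strip())
    failed_urls = [
        node.get("url")
        for node in entry.iter("crawl-url")
        if node.get("state") == "error"
    ]
    return entry.get("enqueue-id"), entry.get("status"), failed_urls


if __name__ == "__main__":
    enqueue_id, status, failed_urls = summarize_entry(SAMPLE_ENTRY)
    print(enqueue_id, status, failed_urls)
```

For a simple audit log entry, the same helper returns an empty list of failed URLs, because no crawl-url elements are present to inspect.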

Combining different values for the audit-log and audit-log-detail options provides a great deal of control over the type and amount of data that is preserved in the audit log for a collection. The following table summarizes these combinations to simplify locating the combination that best satisfies your requirements.

Table 1. Audit Log Settings and Associated Detail Levels
| audit-log Setting | audit-log-detail - Full | audit-log-detail - Medium | audit-log-detail - Minimal |
| --- | --- | --- | --- |
| all | Detailed information about every enqueue/delete. | Detailed information for unsuccessful enqueues/deletes. Simple information for successful enqueues/deletes. | Simple information for all enqueues/deletes, regardless of success or failure. |
| unsuccessful | Detailed information for unsuccessful enqueues/deletes. No information for successful enqueues/deletes. | Detailed information for unsuccessful enqueues/deletes. No information for successful enqueues/deletes. | Simple information for unsuccessful enqueues/deletes. No information for successful enqueues/deletes. |
| none | No information for any enqueue/delete. | No information for any enqueue/delete. | No information for any enqueue/delete. |
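
The combinations in Table 1 can also be expressed as a small decision function, which may help when checking which kind of entry to expect for a given configuration. The Python sketch below is illustrative only: entry_detail is a hypothetical name, and the lowercase spellings "full", "medium", and "minimal" for the detail levels are assumptions based on the table headings rather than confirmed option values.

```python
def entry_detail(audit_log, audit_log_detail, succeeded):
    """Return "detailed", "simple", or None (no entry is written).

    Mirrors Table 1. The setting names are passed as lowercase strings;
    the exact spellings accepted by the crawler are an assumption.
    """
    if audit_log not in ("all", "unsuccessful", "none"):
        raise ValueError(f"unknown audit-log setting: {audit_log!r}")
    if audit_log_detail not in ("full", "medium", "minimal"):
        raise ValueError(f"unknown audit-log-detail setting: {audit_log_detail!r}")

    # "none" records nothing; "unsuccessful" skips successful operations.
    if audit_log == "none":
        return None
    if audit_log == "unsuccessful" and succeeded:
        return None

    # An entry is written; the detail level decides its form.
    if audit_log_detail == "full":
        return "detailed"
    if audit_log_detail == "minimal":
        return "simple"
    # "medium": detail only for failures, simple entries for successes.
    return "simple" if succeeded else "detailed"


if __name__ == "__main__":
    for setting in ("all", "unsuccessful", "none"):
        for detail in ("full", "medium", "minimal"):
            for ok in (True, False):
                print(setting, detail, ok, entry_detail(setting, detail, ok))
```

Running the script prints one line per table cell and outcome, which can be compared against Table 1 directly.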

In addition to the level of detail associated with each audit log entry for local enqueue and delete operations, Watson Explorer Engine provides an audit log configuration option that is designed for use with distributed indexing. Certain values of the audit-log-when setting cause audit log entries to track the replication status of the operations associated with those entries, adding information to the audit log when an operation is replicated to the other server(s) that share some or all of a distributed index. Possible values for the audit-log-when option are the following:

See the next section for information about configuring any of the options that were discussed in this section.