Managing Log Data for Collections

The Watson™ Explorer Engine platform crawler and indexer both use log data to provide per-collection status and history information for enqueued URLs. Each Watson Explorer Engine search collection directory also contains a central log file, log.sqlt. This file is an SQLite database that stores a variety of administrative and internal information about the crawling process for that search collection. See the section of the Watson Explorer Engine API Developer's Guide entitled Per-Collection Log Data for more detailed information about the contents of this file. The data in this log file is used internally by the crawler and indexer and should not be modified.
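Because log.sqlt is an ordinary SQLite database, its table layout can be examined without modifying it. The following Python sketch is illustrative only and is not part of Watson Explorer Engine. The function name list_log_tables is hypothetical, and the sketch assumes you have file-system access to the collection directory. It opens the file in SQLite's read-only URI mode (mode=ro) and queries the standard sqlite_master catalog, so it never writes to the database. It does not describe what the tables contain; for that, see Per-Collection Log Data in the API Developer's Guide. Even a read-only connection holds brief shared locks while it reads, which may delay the crawler's writes, so consider running it while the collection is idle.

```python
import sqlite3
from pathlib import Path


def list_log_tables(db_path):
    """Return the table names in an SQLite file such as log.sqlt (hypothetical helper).

    The file is opened read-only, so this function cannot modify it.
    """
    # Build a file: URI and request read-only mode; SQLite raises
    # OperationalError instead of creating the file if it does not exist.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master is SQLite's built-in schema catalog.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, list_log_tables("/path/to/collection/log.sqlt") returns the table names in alphabetical order; the path shown is a placeholder.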

Note: The size of an existing log.sqlt file can be reduced manually, but only with guidance from customer support. Depending on the state of the associated search collection, this may involve deleting rows, dropping and recreating tables, and vacuuming the database. Because this procedure cannot be tested in every possible Watson Explorer Engine deployment and can result in a corrupted or unusable index, use it only as a last resort.

Watson Explorer Engine platform applications also support a log file that enables them to identify the status of an enqueue or delete operation for any URL. Watson Explorer Engine is designed to guarantee that all enqueued content will be processed by the Watson Explorer Engine Crawler. Enqueueing a URL does not by itself guarantee successful indexing, because problems can still arise when retrieving and converting the data associated with that enqueue request. However, a record of the final status of every enqueue (or delete) operation is always available in a Watson Explorer Engine log known as the audit log. The type of data recorded in the audit log for an enqueue or delete operation is determined by the value of the audit-log option, which is located in the Advanced section of a search collection's Configuration > Crawling tab.

Unlike the crawler or indexer log files, the content in a collection's audit log can be deleted by calling the search-collection-audit-log-purge function, as explained in the Watson Explorer Engine API Developer's Guide.

See Using the Audit Log for general information about configuring and using the audit log. See the section of the Watson Explorer Engine API Developer's Guide entitled Logging and Examining Enqueue/Delete Status in Watson Explorer Engine for detailed information about API functions that are associated with the Watson Explorer Engine audit log.