Monitoring indexing with IBM Content Collector P8 Content Search Services Support

You can improve indexing performance by monitoring the indexing process and the queues, and then adjusting the configuration parameters of the IBM Content Search Services server accordingly.

Monitor the following information:

The IBM Content Search Services log files
The Content Engine log files, in particular the p8_server_error.log file
The IBM Content Collector P8 Content Search Services Support log files
The input and output queues for indexing
Specific database tables

IBM Content Search Services log files

Enable IBM Content Search Services server logging as described in the section about configuring server logging in the FileNet® P8 documentation.

The IBM Content Search Services server generates logging information during server startup, indexing, and searching. By default, IBM Content Search Services log files are written to the <ContentSearchServicesInstallDir>\log directory.

IBM FileNet Content Engine log files

By default, Content Engine logs indexing failures to the p8_server_error.log file. If you enable tracing for the CBR subsystem, the Content Engine server also writes trace information to the p8_server_trace.log file. For details, see the section about Content Engine log files in the FileNet P8 documentation.

To monitor indexing throughput, enable CBR summary tracing. You can then parse the p8_server_trace.log file to find out how many objects were sent to the IBM Content Search Services server for indexing, as in the following example:

# Process the trace files in reverse name order.
# $HOSTNAME is set by bash; in other shells, use $(hostname) instead.
for filename in $(ls -r /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/FileNet/server1/*trace*)
do
   base=$(basename "$filename")
   # Per-file results (overwritten on each run)
   grep "Dispatching" "$filename" > "/qaTools/results/$HOSTNAME.ce2cbrDispatch.$base"
   grep "Deleted" "$filename" > "/qaTools/results/$HOSTNAME.ce2cbrDelUpdates.$base"
   grep "NumberOfObjects" "$filename" > "/qaTools/results/$HOSTNAME.ce2cbrIndexing.$base"
   # Cumulative results across all trace files (appended)
   grep "Dispatching" "$filename" >> "/qaTools/results/$HOSTNAME.ce2cbrDispatch.txt"
   grep "Deleted" "$filename" >> "/qaTools/results/$HOSTNAME.ce2cbrDelUpdates.txt"
   grep "NumberOfObjects" "$filename" >> "/qaTools/results/$HOSTNAME.ce2cbrIndexing.txt"
done

Import the results into a spreadsheet and create graphs of the data to get an impression of the indexing throughput.
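Before importing, it can help to condense the per-file results into one small CSV. The following is a minimal sketch, assuming result files like those written by the loop above. The function name summarize_counts and the output layout are illustrative and not part of the product. Each count is the number of matching trace lines in a file, which is only a rough proxy for the number of objects.

```shell
# summarize_counts DIR PATTERN
# Prints "file,lines" followed by one row per file in DIR whose name
# matches the glob PATTERN. Hypothetical helper for illustration only.
summarize_counts() {
    dir=$1
    pattern=$2
    echo "file,lines"
    for f in "$dir"/$pattern; do
        [ -f "$f" ] || continue              # skip when the glob matches nothing
        count=$(wc -l < "$f" | tr -d ' ')    # strip padding that some wc versions add
        echo "$(basename "$f"),$count"
    done
}
```

For example, summarize_counts /qaTools/results "*.ce2cbrDispatch.*trace*" > dispatch_summary.csv writes one row per trace file and leaves out the cumulative .txt file.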

Content Search Services Support log files

To enable logging, tracing, and dump options for debug purposes in Content Search Services Support, you must set the respective options in the Content Search Services Support configuration in the IBM Content Collector Configuration Manager.

The log, trace, and timing options for Content Search Services Support are common to all of the configured source document preprocessors and when enabled apply to all documents that are preprocessed by Content Search Services Support. Dump options can be set specifically for each source document preprocessor.

Logging

Enable logging by selecting a log level of either Information or Trace in the common log settings configuration window. If logging is enabled, Content Search Services Support writes log information to the specified log file directory. The log files use timestamps in their names.

Timing

Enable timing to have additional timing information written to the log files. Timing information will only be written to log files if logging is enabled.

Tracing

Enable tracing by setting the indexing configuration option Tracing. If tracing is enabled, detailed trace information is written to the log files. The names of log files that contain trace information also include timestamps.

Also specify the number of trace files that are generated by setting the indexing configuration option TraceFileCount and determine the maximum size of a trace file by setting the indexing configuration option TraceFileLimitInMB.

Dump files

Enable the creation of dump files by selecting one or more options:

Create dump files for all input documents: Content Search Services Support writes dump files of all the input documents in the doc subdirectory of the specified dump directory. A copy of the original document is created in the original (binary) format and encoding; each dump file name contains the document identifier.
Create dump files of the XML file for indexing: Content Search Services Support writes dump files of all generated XML documents in the xml subdirectory of the specified dump directory. A copy of the generated XML documents is created; each dump file name contains the document identifier.
Create dump files for the textual content of attachments: Content Search Services Support writes dump files of all the textual content of the embedded attachments in the txt subdirectory of the specified dump directory. A copy of the textual content of the embedded attachment is created; each dump file name contains the document identifier and the attachment name as returned by the document conversion filter services in IBM Content Search Services.

Also specify the directory to which dump files are written; the files are placed in the corresponding subdirectories of that directory. The default directory is ./log/dump (the dump directory in the log subdirectory where IBM Content Search Services is installed).
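When you debug a single document, you can search the dump subdirectories for its identifier. The following is a minimal sketch, assuming a Unix-like shell and the doc, xml, and txt subdirectories described above. The function name find_dumps is hypothetical, and the exact file naming beyond "contains the document identifier" is not specified here.

```shell
# find_dumps DUMP_DIR DOC_ID
# Lists files in the doc, xml, and txt subdirectories of DUMP_DIR whose
# names contain DOC_ID. DOC_ID is used inside a glob, so identifiers that
# contain characters such as * or [ need escaping.
find_dumps() {
    dump_dir=$1
    doc_id=$2
    for sub in doc xml txt; do
        [ -d "$dump_dir/$sub" ] || continue   # a subdirectory exists only if its option was enabled
        find "$dump_dir/$sub" -type f -name "*${doc_id}*" | sort
    done
}
```

For example, find_dumps ./log/dump followed by a document identifier prints every dump file that was written for that document.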

Monitoring queues

Monitoring queues while documents are being indexed can help you identify which parameters in the IBM Content Search Services server configuration to tune. You can also use the information to graph the throughput at a finer granularity that reflects what the IBM Content Search Services server is actually doing.

To monitor queues, add the element <monitorQueues>value</monitorQueues> to the <ContentSearchServicesInstallDir>\config\config.xml file. For the value, specify a non-negative integer that sets the interval, in seconds, at which queue status information is written; a value of zero disables queue monitoring. Then restart the IBM Content Search Services server.

Queue status information is written to a CSV file in the <ContentSearchServicesInstallDir>\log directory. The QueueStatus.csv file provides the following columns of information:

Current time
Total number of processed documents
Total size of processed documents (in KB)
Number of documents in the input queue
Input queue size (in bytes)
Number of documents in the output queue
Output queue size (in bytes)
Number of documents that are waiting for preprocessing
Number of documents that are currently being preprocessed
Number of documents that are waiting to be indexed
Number of documents that are currently being indexed
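To turn the cumulative counters into a throughput series, you can compute the difference between consecutive samples. The following is a minimal sketch, assuming that the file is comma-separated, that the columns appear in the order listed above, and that the time value contains no commas. The function name queue_deltas is hypothetical.

```shell
# queue_deltas CSV_FILE
# For each sample after the first, prints:
#   time,documents_processed_since_previous_sample,documents_in_input_queue
# Rows whose second column is not a plain integer (for example, a header
# row) are skipped.
queue_deltas() {
    awk -F',' '
        $2 ~ /^[ ]*[0-9]+[ ]*$/ {
            total = $2 + 0
            if (seen) printf "%s,%d,%d\n", $1, total - prev, $4
            prev = total
            seen = 1
        }
    ' "$1"
}
```

Dividing each difference by the monitorQueues interval gives documents per second for that sample.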

The QueueStatus.csv file size continues to grow when queue monitoring is enabled. You can disable queue monitoring by specifying a value of zero for the monitorQueues parameter or by removing the <monitorQueues>value</monitorQueues> element from the config.xml file. Remember that any changes to the configuration file require a restart of the IBM Content Search Services server.

Important: The QueueStatus.csv file is re-created each time the server is started. If you want to store the information, create a backup copy of the file before you restart the server.
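The backup step can be scripted so that it is not forgotten before a restart. The following is a minimal sketch, assuming a Unix-like shell. The function name backup_queue_status and the timestamped file name are illustrative choices, not product conventions.

```shell
# backup_queue_status LOG_DIR BACKUP_DIR
# Copies LOG_DIR/QueueStatus.csv to BACKUP_DIR/QueueStatus.<timestamp>.csv
# and prints the path of the copy. Returns non-zero if the file is missing.
backup_queue_status() {
    src="$1/QueueStatus.csv"
    [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
    mkdir -p "$2" || return 1
    dest="$2/QueueStatus.$(date +%Y%m%d-%H%M%S).csv"
    cp -p "$src" "$dest" && echo "$dest"
}
```

Call the function with the log directory of the IBM Content Search Services installation and a backup directory of your choice, and then restart the server.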

Monitoring database tables

Use database queries to monitor these tables regularly:

IndexRequests
ContentQueue

As a best practice, run queries against these tables in a loop and collect the output in a log file.
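Such a loop might look like the following minimal sketch. It assumes that you have a command-line SQL client wrapped in a command that takes one SQL statement as its argument and prints the result. That wrapper, the function name poll_tables, and the row-count query are all assumptions for illustration; depending on your database, the table names might need a schema qualifier.

```shell
# poll_tables QUERY_CMD INTERVAL_SECONDS ITERATIONS LOG_FILE
# Runs a row-count query against each table and appends one timestamped
# line per table to LOG_FILE. QUERY_CMD is a placeholder for your own
# wrapper around a SQL client.
poll_tables() {
    query_cmd=$1
    interval=$2
    iterations=$3
    log_file=$4
    i=0
    while [ "$i" -lt "$iterations" ]; do
        for table in IndexRequests ContentQueue; do
            result=$("$query_cmd" "SELECT COUNT(*) FROM $table")
            echo "$(date '+%Y-%m-%d %H:%M:%S') $table $result" >> "$log_file"
        done
        i=$((i + 1))
        # Wait between rounds, but not after the last one
        if [ "$i" -lt "$iterations" ]; then
            sleep "$interval"
        fi
    done
}
```

Row counts that keep growing while indexing is running can indicate that requests arrive faster than they are processed.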