Monitoring indexing with IBM Content Collector P8 Content Search Services Support
You can improve indexing performance by monitoring the indexing process, monitoring queues, and adjusting the configuration parameters of an IBM Content Search Services server accordingly. You can monitor the following sources of information:
- The IBM Content Search Services log files
- The Content Engine log files, in particular the p8_server_error.log file
- The IBM Content Collector P8 Content Search Services Support log files
- The input and output queues for indexing
- Specific database tables
IBM Content Search Services log files
Enable IBM Content Search Services server logging as described in the section about configuring server logging in the FileNet® P8 documentation.
The IBM Content Search Services server generates logging information during server startup, indexing, and searching. By default, IBM Content Search Services log files are written to the <ContentSearchServicesInstallDir>\log directory.
IBM FileNet Content Engine log files
By default, Content Engine logs indexing failures to the p8_server_error.log file. If you enable tracing for the CBR subsystem, the Content Engine server also writes trace information to the p8_server_trace.log file. For details, see the section about Content Engine log files in the FileNet P8 documentation.
The following sample script extracts the lines that contain "Dispatching", "Deleted", and "NumberOfObjects" from the trace log files:
# Extract dispatch, delete, and indexing entries from each Content Engine trace log.
# Adjust the profile path and the results directory to your environment.
for filename in $(ls -r /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/FileNet/server1/*trace*)
do
  name=$(basename "$filename")
  # Per-file results
  grep "Dispatching" "$filename" > "/qaTools/results/$HOSTNAME.ce2cbrDispatch.$name"
  grep "Deleted" "$filename" > "/qaTools/results/$HOSTNAME.ce2cbrDelUpdates.$name"
  grep "NumberOfObjects" "$filename" > "/qaTools/results/$HOSTNAME.ce2cbrIndexing.$name"
  # Cumulative results across all trace files
  grep "Dispatching" "$filename" >> "/qaTools/results/$HOSTNAME.ce2cbrDispatch.txt"
  grep "Deleted" "$filename" >> "/qaTools/results/$HOSTNAME.ce2cbrDelUpdates.txt"
  grep "NumberOfObjects" "$filename" >> "/qaTools/results/$HOSTNAME.ce2cbrIndexing.txt"
done
Import the results into a spreadsheet and create graphs of the data to get an impression of the indexing throughput.
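Before importing, it can help to condense a result file into one row per minute. The following is a minimal sketch, not part of the product: the function name count_per_minute is hypothetical, and it assumes that every extracted line begins with an ISO-style timestamp such as 2024-01-31T10:15:42.123. Check the actual layout of your trace log lines before relying on it.

```shell
# Hypothetical helper: count extracted log lines per minute and print CSV rows.
# Assumption: each line starts with a timestamp like 2024-01-31T10:15:42.123,
# so the first 16 characters of field 1 identify the minute.
count_per_minute() {
  # $1 = a result file produced by one of the grep commands
  awk '{ counts[substr($1, 1, 16)]++ }
       END { for (minute in counts) print minute "," counts[minute] }' "$1" | sort
}
```

Calling count_per_minute on one of the cumulative result files prints rows of the form minute,count, which a spreadsheet can chart directly.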
Content Search Services Support log files
To enable logging, tracing, and dump options for debug purposes in Content Search Services Support, you must set the respective options in the Content Search Services Support configuration in the IBM Content Collector Configuration Manager.
The log, trace, and timing options for Content Search Services Support are common to all configured source document preprocessors and, when enabled, apply to all documents that Content Search Services Support preprocesses. Dump options can be set separately for each source document preprocessor.
- Logging
- Enable logging by selecting a log level of either Information or Trace in the common log settings configuration window. If logging is enabled, Content Search Services Support writes log information to the specified log file directory. The log files use timestamps in their names.
- Timing
- Enable timing to have additional timing information written to the log files. Timing information will only be written to log files if logging is enabled.
- Tracing
- Enable tracing by setting the indexing configuration option Tracing. If tracing is enabled, detailed trace information is written to the log files. The names of log files that contain trace information also include timestamps.
Also specify the number of trace files to generate by setting the indexing configuration option TraceFileCount, and set the maximum size of a trace file with the indexing configuration option TraceFileLimitInMB.
- Dump files
- Enable the creation of dump files by selecting one or more options:
- Create dump files for all input documents
- Content Search Services Support writes dump files of all the input documents in the doc subdirectory of the specified dump directory. A copy of the original document is created in the original (binary) format and encoding; each dump file name contains the document identifier.
- Create dump files of the XML file for indexing
- Content Search Services Support writes dump files of all generated XML documents in the xml subdirectory of the specified dump directory. A copy of the generated XML documents is created; each dump file name contains the document identifier.
- Create dump files for the textual content of attachments
- Content Search Services Support writes dump files of all the textual content of the embedded attachments in the txt subdirectory of the specified dump directory. A copy of the textual content of the embedded attachment is created; each dump file name contains the document identifier and the attachment name as returned by the document conversion filter services in IBM Content Search Services.
Also specify the name of the directory to which dump files are written in the corresponding subdirectories. The default directory is ./log/dump (the dump directory in the log subdirectory of the location where IBM Content Search Services is installed).
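Because dump files are written for every processed document, the dump directory can fill up quickly. As a rough sketch for keeping an eye on it, the following hypothetical helper (count_dump_files is not a product command) counts the files in the doc, xml, and txt subdirectories described above. The path that you pass to it is whatever dump directory you configured.

```shell
# Hypothetical helper: count dump files in each subdirectory of a dump directory.
# $1 = dump directory (for example ./log/dump)
count_dump_files() {
  for sub in doc xml txt; do
    if [ -d "$1/$sub" ]; then
      # tr removes the padding that some wc implementations add
      n=$(find "$1/$sub" -type f | wc -l | tr -d ' ')
    else
      n=0
    fi
    echo "$sub,$n"
  done
}
```

A subdirectory that does not exist is reported with a count of 0, which is what you would expect when the corresponding dump option is not selected.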
Monitoring queues
Monitoring queues while documents are being indexed can help you identify areas for tuning by adapting specific parameters in the IBM Content Search Services server configuration. You can also use the information to graph the throughput at a finer granularity that reflects what the IBM Content Search Services server is actually doing.
To monitor queues, add the element <monitorQueues>value</monitorQueues> to the <ContentSearchServicesInstallDir>\config\config.xml file. For the value, specify a non-negative integer that indicates the print frequency (in seconds). Then, restart the IBM Content Search Services server.
When queue monitoring is enabled, the server writes the following information to the QueueStatus.csv file at the specified interval:
- Current time
- Total number of processed documents
- Total size of processed documents (in KB)
- Number of documents in the input queue
- Input queue size (in bytes)
- Number of documents in the output queue
- Output queue size (in bytes)
- Number of documents that are waiting for preprocessing
- Number of documents that are currently being preprocessed
- Number of documents that are waiting to be indexed
- Number of documents that are currently being indexed
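The values listed above lend themselves to a simple throughput calculation. The following sketch assumes that the queue status output (for example, the QueueStatus.csv file) contains comma-separated rows with the fields in the order listed, without a header row. Both the field order and the missing header are assumptions, and queue_throughput is a hypothetical helper name, so verify the layout of your file first.

```shell
# Hypothetical helper: documents processed between consecutive samples.
# Assumptions: comma-separated rows, no header row,
# field 1 = current time, field 2 = total number of processed documents.
queue_throughput() {
  # $1 = queue status file
  awk -F',' 'NR > 1 { print $1 "," ($2 - prev) }
             { prev = $2 }' "$1"
}
```

Each output row pairs a sample time with the number of documents processed since the previous sample, which gives the throughput per monitoring interval.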
The QueueStatus.csv file size continues to grow when queue monitoring is enabled. You can disable queue monitoring by specifying a value of zero for the monitorQueues parameter or by removing the <monitorQueues>value</monitorQueues> element from the config.xml file. Remember that any changes to the configuration file require a restart of the IBM Content Search Services server.
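To confirm whether queue monitoring is currently configured, you can inspect the config.xml file from a script. The following is a minimal sketch under the assumption that the monitorQueues element sits on a single line; monitor_queues_status is a hypothetical helper, not a product command.

```shell
# Hypothetical helper: report the monitorQueues setting found in a config.xml file.
# Assumption: the element is on one line, as in <monitorQueues>30</monitorQueues>.
monitor_queues_status() {
  # $1 = path to config.xml
  value=$(sed -n 's:.*<monitorQueues>\([0-9][0-9]*\)</monitorQueues>.*:\1:p' "$1" | head -n 1)
  if [ -z "$value" ] || [ "$value" -eq 0 ]; then
    # Element missing or set to zero: monitoring is off
    echo "disabled"
  else
    echo "every ${value}s"
  fi
}
```

The helper only reads the file. A changed value still takes effect only after the server is restarted.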
Monitoring database tables
Monitor the following database tables:
- IndexRequests
- ContentQueue
As a best practice, run queries against these tables in a loop and collect the output in a log file.
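One way to run such a loop is sketched below. The helper poll_query is hypothetical and deliberately database-neutral: it takes the query command as arguments, because the command-line client, connection settings, and exact SQL depend on your database and are not specified here.

```shell
# Hypothetical helper: run a query command repeatedly and append
# timestamped output to a log file.
# Usage: poll_query <iterations> <sleep_seconds> <logfile> <command> [args...]
poll_query() {
  iterations=$1
  interval=$2
  logfile=$3
  shift 3
  i=0
  while [ "$i" -lt "$iterations" ]; do
    # Prefix each sample with the time at which it was taken
    printf '%s ' "$(date '+%Y-%m-%dT%H:%M:%S')" >> "$logfile"
    "$@" >> "$logfile" 2>&1
    i=$((i + 1))
    # Sleep between samples, but not after the last one
    if [ "$i" -lt "$iterations" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, poll_query 360 10 indexrequests.log followed by your database client command that counts the rows in the IndexRequests table would collect one sample every 10 seconds for an hour.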