IBM FileNet P8, Version 5.2

Text extraction worker threads (Content Platform Engine)

The indexing rate depends on the number of concurrent worker threads that perform text extraction. To achieve the best indexing rate, set this number as high as system resources permit.

Important: This topic is concerned with the text extraction that Content Platform Engine performs as part of the indexing process. In some cases, IBM® Content Search Services can perform the text extraction step. For more information, see Indexable document types and text extraction.

The following table shows the resources that can be overtaxed if the number of concurrent worker threads for a Content Platform Engine server is too high.

Table 1. System resources that can be overtaxed by text extraction workers
Resource: CPU
Comment: Each text extraction worker consumes CPU cycles.

Use your operating system tools to monitor the CPU consumption of the text extraction processes during peak usage. These processes are named ibmfndcm. Set the number of concurrent worker threads so that CPU utilization is roughly 70%.

Resource: Memory
Comment: Each text extraction worker consumes memory. The amount depends on the size of the documents that are being indexed: larger documents require more memory.

This memory is free physical memory on the server, not the memory that you allocated to the Content Platform Engine JVM, because Content Platform Engine delegates text extraction work to an external process. For information about this external process, see Indexable document types and text extraction.

Use your operating system tools to monitor the amount of free physical memory during peak usage. Set the number of concurrent worker threads so that most, but not all, of the free physical memory is consumed.

Important: If text extraction workers demand more free physical memory than is available, your system can become overloaded and unusable.
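The CPU and memory guidance in Table 1 can be turned into a rough first estimate of the worker count. The following Python sketch is illustrative only: the function suggest_worker_count and its parameters are hypothetical, it assumes that CPU utilization grows roughly linearly with the number of workers, and it assumes a 10% memory reserve as a stand-in for "mostly but not wholly consumed". You supply the measurements (current worker count, CPU utilization, free physical memory, and the typical memory footprint of one ibmfndcm process) from your operating system tools; the sketch does not read them itself.

```python
import math


def suggest_worker_count(
    current_workers: int,
    observed_cpu_pct: float,
    free_mem_mb: float,
    per_worker_mem_mb: float,
    target_cpu_pct: float = 70.0,
    mem_headroom: float = 0.10,
) -> int:
    """Hypothetical helper: rough starting point for concurrent text extraction workers.

    Assumes CPU utilization scales linearly with the number of workers and that
    each ibmfndcm process needs about per_worker_mem_mb at peak. Both are
    simplifications; confirm any new value by monitoring again.
    """
    if current_workers <= 0 or observed_cpu_pct <= 0 or per_worker_mem_mb <= 0:
        raise ValueError("measurements must be positive")

    # CPU bound: scale the current worker count toward the ~70% utilization target.
    cpu_bound = math.floor(current_workers * target_cpu_pct / observed_cpu_pct)

    # Memory bound: workers can use what they hold now plus what is still free.
    # Keep an assumed fraction in reserve so free memory is not wholly consumed.
    worker_pool_mb = free_mem_mb + current_workers * per_worker_mem_mb
    mem_bound = math.floor(worker_pool_mb * (1.0 - mem_headroom) / per_worker_mem_mb)

    # The tighter constraint wins; never suggest fewer than one worker.
    return max(1, min(cpu_bound, mem_bound))


# Example: 8 workers drive CPU to 40%, 1024 MB is free, ~256 MB per ibmfndcm process.
print(suggest_worker_count(8, 40.0, 1024, 256))  # 10 (memory is the limit)
```

Because both bounds rest on simplifying assumptions, treat the result as a value to try and then confirm by monitoring CPU and free physical memory again under peak load.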

If the indexing rate is low even though the number of concurrent worker threads is as high as system resources permit, other tuning might be required. For more information, see Parameters that influence performance.

Administration console properties

In the administration console, the following properties determine the maximum number of concurrent text extraction workers per Content Platform Engine server:

Table 2. Properties that determine the maximum number of text extraction workers
Property: Maximum worker threads for extracting
Effect on number of workers per server: This property directly sets the maximum number of concurrent text extraction workers per server. For example, if you set the value of this property to 20, the maximum number of concurrent text extraction workers per server is 20.

Property: Maximum worker threads per batch for extracting
Effect on number of workers per server: This property determines the number of concurrent workers for the server in the following way:

Concurrent workers = (value of this property) * (number of concurrent batches)

For example, suppose that the server is concurrently processing three index batches. If the value of this property is 2, the number of concurrent text extraction workers is 6.

Per index area, the number of index batches that the server can process concurrently is limited by the Maximum worker threads for indexing property. For example, assume the following values:
  • Maximum worker threads for indexing: 4
  • Number of index areas that are serviced by the server: 3
If the value of this property is 2, the maximum number of concurrent text extraction workers for the server is given by the following calculation:

2 * 4 * 3 = 24 concurrent workers at most
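The two calculations in the preceding table can be expressed as a short Python sketch. The function and parameter names are hypothetical: per_batch stands for the Maximum worker threads per batch for extracting property, and indexing_threads stands for the Maximum worker threads for indexing property.

```python
def concurrent_extraction_workers(per_batch: int, concurrent_batches: int) -> int:
    """Workers in use = (Maximum worker threads per batch for extracting)
    * (number of index batches the server is processing right now)."""
    return per_batch * concurrent_batches


def indirect_worker_maximum(per_batch: int, indexing_threads: int, index_areas: int) -> int:
    """Ceiling reached when every index area serviced by the server runs as many
    batches as Maximum worker threads for indexing allows."""
    max_concurrent_batches = indexing_threads * index_areas
    return concurrent_extraction_workers(per_batch, max_concurrent_batches)


print(concurrent_extraction_workers(2, 3))  # 6
print(indirect_worker_maximum(2, 4, 3))     # 24
```

The first function gives the number of workers in use at a given moment; the second gives the upper bound that the per-batch property imposes when all batch slots are busy.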

These two properties limit the number of text extraction workers, the first directly and the second indirectly. The smaller of the two maximums is the actual maximum for the server. For example, if the directly determined maximum is 20 and the indirectly determined maximum is 24, the actual maximum is 20.
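A minimal Python sketch of this rule follows. The function name is hypothetical; it takes the directly determined maximum and the indirectly determined maximum (from the calculation in Table 2) as inputs and also reports which property is the limiting one, which can help you decide which property to adjust.

```python
from __future__ import annotations


def effective_worker_maximum(direct_max: int, indirect_max: int) -> tuple[int, str]:
    """Hypothetical helper: return the actual per-server maximum of concurrent
    text extraction workers and the property that imposes it."""
    if direct_max <= indirect_max:
        return direct_max, "Maximum worker threads for extracting"
    return indirect_max, "Maximum worker threads per batch for extracting"


# Example from the text: direct maximum 20, indirect maximum 24.
print(effective_worker_maximum(20, 24))
# (20, 'Maximum worker threads for extracting')
```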

Tip: For information about accessing these properties, see Accessing subsystem configuration properties.



Last updated: October 2013

© Copyright IBM Corporation 2014.