Index-specific parameters for Db2 Text Search index updates

You can configure the following collection-specific parameters to improve performance:
  • MaxMergeDocs
  • MaxMergeMB
  • MergeFactor
  • BufferSize

You can modify the indexing parameters for a particular collection by editing the ECMTS_HOME\config\collections\collection_name\collection.xml file. To change the default settings for collections that are created in the future, set the values of these parameters in the ECMTS_HOME\config\defaults\collection.xml file.
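The two file locations described above can be sketched as small path helpers. This is only an illustration: the function names and the example installation directory (C:\ecmts) and collection name (sales) are hypothetical, and the sketch assumes that ECMTS_HOME stands for the installation directory on a Windows system. It does not read or change the contents of collection.xml.

```python
from pathlib import PureWindowsPath


def collection_config_path(ecmts_home: str, collection_name: str) -> PureWindowsPath:
    """Path of the collection.xml file for one existing collection."""
    return (PureWindowsPath(ecmts_home) / "config" / "collections"
            / collection_name / "collection.xml")


def default_config_path(ecmts_home: str) -> PureWindowsPath:
    """Path of the collection.xml file that holds defaults for new collections."""
    return PureWindowsPath(ecmts_home) / "config" / "defaults" / "collection.xml"


if __name__ == "__main__":
    # Hypothetical installation directory and collection name.
    home = r"C:\ecmts"
    print(collection_config_path(home, "sales"))
    print(default_config_path(home))
```

Consider making a backup copy of the file before you edit it, so that you can restore the previous settings if indexing behavior changes unexpectedly.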

  • The MaxMergeDocs parameter defines the largest segment (measured by the number of documents) that can be merged with other segments in the index. There is a trade-off between overall indexing throughput and segment merge time.

    If you specify a low value for the MaxMergeDocs parameter (for example, 100,000 documents), segments are limited in size. In this case, segment merges are quicker and indexing flows more smoothly without time-outs. However, if you index a very large amount of content, many segments accumulate and indexing throughput degrades over time.

    If you specify a high value for the MaxMergeDocs parameter (for example, 100,000,000 or 500,000,000 documents), you get fewer segments (until the index becomes very large) and the overall indexing throughput is better. However, segment merges take more time and you might encounter time-outs during indexing.

    Typically, the value of MaxMergeDocs should be higher for collections of small documents and lower for collections of large documents.

  • The MaxMergeMB parameter defines the largest segment, measured by the physical size of the file, that can be merged with other segments in the index.

    There is a trade-off between overall indexing throughput and segment merge time. If you specify a low value for the MaxMergeMB parameter (for example, 500 MB), segments are limited in size. In this case, segment merges are quicker and indexing flows more smoothly. However, if you index a very large amount of content, many segments accumulate, and both indexing throughput and search performance degrade over time.

    If you specify a high value for the MaxMergeMB parameter, for example 50,000 MB or 100,000 MB, you get fewer segments (until the index becomes very large) and the overall indexing throughput is better. However, segment merges take more time and you might encounter time-outs during indexing.

  • The MergeFactor parameter defines the number of segments that are merged at a time and also controls the total number of segments that can accumulate in the index. There is a trade-off between frequent, small merges (for example, 2 segments at a time) and less frequent, large merges (for example, 10 segments at a time). You can specify a smaller value for the MergeFactor parameter to avoid time-outs. Modifying the merge factor does not typically affect performance.

  • The BufferSize parameter specifies the amount of RAM that can be used for buffering added documents before the documents are flushed as a new segment. There is a trade-off between frequent, small flushes to disk and less frequent, large flushes to disk. In some cases, you can improve performance by increasing the value of the BufferSize parameter. For example, when you index a single collection of small documents, increasing the buffer size improves performance, especially for the first 100,000 documents in the index.
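The trade-offs described in the list above can be explored with a small simulation. The following is a minimal sketch, not the merge algorithm that the text search server actually uses: it assumes a simple level-based policy in which MergeFactor adjacent segments of the same level are merged, and a segment that holds more than MaxMergeDocs documents is never merged again. The function name simulate_indexing and the docs_per_flush argument are hypothetical. The docs_per_flush argument stands in for BufferSize, on the assumption that a larger buffer holds more documents before each flush.

```python
def simulate_indexing(total_docs, docs_per_flush, merge_factor, max_merge_docs):
    """Simulate flushes and cascading merges under a simplified merge policy.

    Returns a dict with the final segment sizes (in documents), the number
    of flushes, the number of merges, and the size of the largest merge.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    if docs_per_flush < 1:
        raise ValueError("docs_per_flush must be at least 1")

    segments = []  # (doc_count, level) tuples, oldest segment first
    flushes = merges = largest_merge = 0
    remaining = total_docs

    while remaining > 0:
        # A full buffer (or the last partial one) is flushed as a new segment.
        flushed = min(docs_per_flush, remaining)
        remaining -= flushed
        flushes += 1
        segments.append((flushed, 0))

        # Merge the newest merge_factor segments while they share a level
        # and none of them exceeds the document cap.
        while len(segments) >= merge_factor:
            tail = segments[-merge_factor:]
            same_level = all(level == tail[0][1] for _, level in tail)
            mergeable = all(docs <= max_merge_docs for docs, _ in tail)
            if not (same_level and mergeable):
                break
            merged_docs = sum(docs for docs, _ in tail)
            segments[-merge_factor:] = [(merged_docs, tail[0][1] + 1)]
            merges += 1
            largest_merge = max(largest_merge, merged_docs)

    return {
        "segments": [docs for docs, _ in segments],
        "flushes": flushes,
        "merges": merges,
        "largest_merge": largest_merge,
    }


if __name__ == "__main__":
    # Toy numbers chosen for readability, not realistic collection sizes.
    for label, args in [
        ("high document cap", (1000, 10, 10, 1_000_000)),
        ("low document cap", (1000, 10, 10, 50)),
        ("small merge factor", (1000, 10, 2, 1_000_000)),
        ("larger buffer", (1000, 100, 10, 1_000_000)),
    ]:
        print(label, simulate_indexing(*args))
```

With these toy inputs, the low document cap leaves 10 segments of 100 documents each, whereas the high cap ends with a single 1,000-document segment at the cost of one merge that rewrites all 1,000 documents. Increasing docs_per_flush from 10 to 100 reduces the number of flushes from 100 to 10. Real behavior depends on document sizes, hardware, and the actual merge implementation, so treat the output as an aid to reasoning rather than as a prediction.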