Text extractors administration

IBM® Content Classification analyzes content items such as documents and emails by using predefined text extractors for different data formats. To improve performance, you can configure additional text-extraction servers.

When you install Content Classification, you must configure at least one text-extraction server per computer. In some cases, you might want to configure additional text-extraction servers. For example, if you run multiple knowledge base and decision plan instances on a computer and all of the instances use the same text-extraction server, text extraction can become a bottleneck. Configuring additional text-extraction servers can improve performance by helping to ensure that enough extraction capacity is available to process content.

However, adding too many text-extraction servers can degrade the overall performance of your classification system. If the computer does not have sufficient system resources, each additional text-extraction server might reduce the resources that are available to the knowledge base and decision plan instances.

In general, if you run only one knowledge base or decision plan instance on a computer, or if the content that you classify includes few emails and PDF documents, you do not need to configure additional text-extraction servers.