Managing text indexing preprocessor definitions

Create and manage text indexing preprocessor definitions to control which documents use custom preprocessing instead of the normal text extraction process.

About this task

A text indexing preprocessor definition associates a text indexing preprocessor action with a document class. When documents of that class with content-based retrieval (CBR) enabled enter the indexing queue, the associated preprocessor action runs instead of the normal text extraction process, and the preprocessor output is then indexed.

A text indexing preprocessor definition specifies:
  • The document class to which the preprocessing applies
  • The text indexing preprocessor action that performs the preprocessing
  • Whether the definition is enabled or disabled
  • An optional display name to identify the definition

When you work with text indexing preprocessor definitions, note the following characteristics:
  • If you set multiple text indexing preprocessor definitions for a class, each definition must reference a different action. If you add a second definition that references the same action, an error occurs when you save the class definition.
  • Text indexing preprocessor definitions are applied implicitly to subclasses in the class hierarchy. If multiple definitions within a class hierarchy reference the same action, the preprocessor for that action is called only once.
  • You can update or remove a text indexing preprocessor definition only from the class where it was originally defined. (You cannot update or remove the definition from subclasses of that class.)
  • Text indexing preprocessor definitions apply to documents with content-based retrieval (CBR) enabled that enter the indexing queue.
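
The characteristics above can be illustrated with a small, self-contained model. This is only a sketch and not the product API: the names PreprocessorDefinition, DocumentClass, add_definition, remove_definition, and effective_actions are hypothetical and exist only for this illustration. It also assumes that an action runs if at least one enabled definition in the hierarchy references it, and that subclass definitions are visited before parent definitions; the text above does not specify either point.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PreprocessorDefinition:
    """Hypothetical stand-in for a text indexing preprocessor definition."""
    action: str                         # name of the preprocessor action (illustrative)
    enabled: bool = True                # whether the definition is enabled
    display_name: Optional[str] = None  # optional label for the definition


@dataclass
class DocumentClass:
    """Hypothetical stand-in for a document class in a class hierarchy."""
    name: str
    parent: Optional["DocumentClass"] = None
    definitions: List[PreprocessorDefinition] = field(default_factory=list)

    def add_definition(self, definition: PreprocessorDefinition) -> None:
        # Within one class, each definition must reference a different action.
        if any(d.action == definition.action for d in self.definitions):
            raise ValueError(
                f"{self.name} already has a definition for action {definition.action!r}"
            )
        self.definitions.append(definition)

    def remove_definition(self, action: str) -> None:
        # A definition can be removed only from the class where it was defined,
        # so only this class's own list is searched, never a parent's.
        for d in self.definitions:
            if d.action == action:
                self.definitions.remove(d)
                return
        raise ValueError(f"{self.name} does not define a definition for {action!r}")

    def effective_actions(self) -> List[str]:
        # Definitions apply implicitly to subclasses: walk up the hierarchy.
        # An action referenced by several definitions is listed (called) once.
        # Assumption: one enabled definition is enough for the action to run.
        actions: List[str] = []
        cls: Optional[DocumentClass] = self
        while cls is not None:
            for d in cls.definitions:
                if d.enabled and d.action not in actions:
                    actions.append(d.action)
            cls = cls.parent
        return actions
```

In this model, a subclass that defines its own definition for an action already referenced by a parent class still yields that action only once from effective_actions, and calling remove_definition on the subclass for an action that only the parent defines raises an error.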

Text indexing preprocessor definitions provide the following benefits:
  • Control which document classes use custom preprocessing
  • Apply custom text processing logic to specific document types
  • Improve search accuracy by replacing or augmenting the extracted text
  • Standardize text preprocessing across all documents of a particular class
  • Enable or disable preprocessing without modifying the underlying preprocessor action code

Administrators who work with text indexing preprocessor definitions typically perform the following tasks.