Managing text indexing preprocessor definitions
Create and manage text indexing preprocessor definitions to control which documents use custom preprocessing instead of the normal text extraction process.
About this task
A text indexing preprocessor definition associates a text indexing preprocessor action with a document class. When documents of that class with content-based retrieval (CBR) enabled enter the indexing queue, the associated preprocessor action runs instead of the normal text extraction process, and the preprocessor output is then indexed.
A text indexing preprocessor definition specifies:
- The document class to which the preprocessing applies
- The text indexing preprocessor action that performs the preprocessing
- Whether the definition is enabled or disabled
- An optional display name to identify the definition
When you work with text indexing preprocessor definitions, note the following characteristics:
- If you set multiple text indexing preprocessor definitions on a class, each definition must reference a different action. If you add a second definition that references the same action, an error occurs when you save the class definition.
- Text indexing preprocessor definitions are applied implicitly to subclasses in the class hierarchy. If multiple definitions within a class hierarchy reference the same action, the preprocessor for that action is called only once.
- You can update or remove a text indexing preprocessor definition only from the class where it was originally defined. You cannot update or remove the definition from a subclass of that class.
- Text indexing preprocessor definitions apply to documents with content-based retrieval (CBR) enabled that enter the indexing queue.
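The following Python sketch models the characteristics listed above as a small in-memory example. It is an illustration only and does not use the product API: the names `PreprocessorDefinition`, `DocumentClass`, `add_definition`, `remove_definition`, and `effective_actions`, and the action names, are hypothetical. The sketch also assumes that a disabled definition is skipped when the actions for a class are resolved, which the characteristics above do not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PreprocessorDefinition:
    """Hypothetical stand-in for a text indexing preprocessor definition."""
    action: str                         # name of the preprocessor action
    enabled: bool = True                # whether the definition is enabled
    display_name: Optional[str] = None  # optional label for the definition


@dataclass
class DocumentClass:
    """Hypothetical stand-in for a document class in a class hierarchy."""
    name: str
    parent: Optional["DocumentClass"] = None
    definitions: List[PreprocessorDefinition] = field(default_factory=list)

    def add_definition(self, definition: PreprocessorDefinition) -> None:
        # Definitions set on the same class must each reference a different action.
        if any(d.action == definition.action for d in self.definitions):
            raise ValueError(
                f"{self.name} already has a definition for {definition.action}"
            )
        self.definitions.append(definition)

    def remove_definition(self, action: str) -> None:
        # Only definitions set directly on this class can be removed here;
        # definitions inherited from a superclass are not searched.
        for d in self.definitions:
            if d.action == action:
                self.definitions.remove(d)
                return
        raise ValueError(f"{action} is not defined on {self.name}")

    def effective_actions(self) -> List[str]:
        # Collect this class and its superclasses, then walk from the root down.
        chain = []
        cls: Optional[DocumentClass] = self
        while cls is not None:
            chain.append(cls)
            cls = cls.parent
        actions: List[str] = []
        for cls in reversed(chain):
            for d in cls.definitions:
                # Assumption: disabled definitions are skipped.
                # An action referenced more than once in the hierarchy is listed once.
                if d.enabled and d.action not in actions:
                    actions.append(d.action)
        return actions


# Example hierarchy: ScannedInvoice is a subclass of Invoice.
invoice = DocumentClass("Invoice")
scanned = DocumentClass("ScannedInvoice", parent=invoice)

invoice.add_definition(PreprocessorDefinition("RedactAction"))
invoice.add_definition(PreprocessorDefinition("NormalizeAction"))
scanned.add_definition(PreprocessorDefinition("RedactAction"))  # same action, different class

print(scanned.effective_actions())  # ['RedactAction', 'NormalizeAction']
```

In this model, `RedactAction` is referenced on both classes but appears only once in the result for `ScannedInvoice`, adding a second `RedactAction` definition to `Invoice` raises an error, and `NormalizeAction` cannot be removed through `ScannedInvoice` because it was defined on `Invoice`.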
Text indexing preprocessor definitions provide the following benefits:
- Control which document classes use custom preprocessing
- Apply custom text processing logic to specific document types
- Improve search accuracy by replacing or augmenting the extracted text
- Standardize text preprocessing across all documents of a particular class
- Enable or disable preprocessing without modifying the underlying preprocessor action code
Administrators who work with text indexing preprocessor definitions typically perform the following tasks: