Managing text indexing preprocessor actions

Create and manage text indexing preprocessor actions that contain the Java™ code for preprocessing documents.

About this task

A text indexing preprocessor action contains the Java code that performs the document preprocessing. The preprocessor runs instead of the normal text extraction process, and the preprocessor output is then indexed. The code is stored outside the action object, either as a code module or as a class or JAR file.

When you work with text indexing preprocessor actions, note the following characteristics:

  • Text indexing preprocessor actions are independently persistable, so that they can be created separately from other objects and then assigned to one or more text indexing preprocessor definitions.
  • A Java-implemented text indexing preprocessor can be placed in a code module, where it can coexist with other action handler types, such as an event action, lifecycle action, or document classifier.
  • If your Java-based action is not included in a code module, the class or JAR file must be located on the local Content Platform Engine application server. In addition, it must be included in the application server class path.
  • A text indexing preprocessor action can be enabled or disabled. Disabling an action affects every class in the system that has a text indexing preprocessor definition referencing that action. Therefore, to disable text indexing preprocessing for a single class, disable that class's text indexing preprocessor definition. To disable a particular action everywhere in the system, disable the text indexing preprocessor action itself.
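The enable and disable behavior described in the last item can be illustrated with a small, self-contained model. This is only a sketch of the described semantics, not the product API: the types PreprocessorAction and PreprocessorDefinition, the isEffective method, and the class names "Invoice" and "Contract" are hypothetical stand-ins invented for illustration. It assumes that preprocessing runs for a class only when both the definition and the action it references are enabled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a text indexing preprocessor action (not the product API).
final class PreprocessorAction {
    final String name;
    boolean enabled = true;

    PreprocessorAction(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for a definition that ties one class to one shared action.
final class PreprocessorDefinition {
    final String className;
    final PreprocessorAction action;
    boolean enabled = true;

    PreprocessorDefinition(String className, PreprocessorAction action) {
        this.className = className;
        this.action = action;
    }

    // Assumption: preprocessing runs only if the definition AND its action are enabled.
    boolean isEffective() {
        return enabled && action.enabled;
    }
}

class PreprocessorEnablementSketch {
    // Returns the names of the classes for which preprocessing would currently run.
    static List<String> effectiveClasses(List<PreprocessorDefinition> definitions) {
        List<String> result = new ArrayList<>();
        for (PreprocessorDefinition definition : definitions) {
            if (definition.isEffective()) {
                result.add(definition.className);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One action shared by the definitions of two classes.
        PreprocessorAction action = new PreprocessorAction("ExtractText");
        PreprocessorDefinition invoice = new PreprocessorDefinition("Invoice", action);
        PreprocessorDefinition contract = new PreprocessorDefinition("Contract", action);
        List<PreprocessorDefinition> definitions = List.of(invoice, contract);

        System.out.println(effectiveClasses(definitions)); // [Invoice, Contract]

        // Disabling one definition affects only that class.
        invoice.enabled = false;
        System.out.println(effectiveClasses(definitions)); // [Contract]

        // Disabling the shared action affects every class that references it.
        invoice.enabled = true;
        action.enabled = false;
        System.out.println(effectiveClasses(definitions)); // []
    }
}
```

In this model, turning off a definition is the narrow switch (one class), while turning off the action is the broad switch (every definition that references it), which mirrors the guidance in the list above.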

When you work with text indexing preprocessor actions, you typically perform the following tasks: