Working with Text Indexing Preprocessors

For an overview of text indexing preprocessors, see Text Indexing Preprocessors.

Creating a Text Indexing Preprocessor Handler

To create a text indexing preprocessor handler, you must implement the preprocess method of the Java™ TextIndexingPreprocessor interface. The following examples show Java and JavaScript implementations. Each one reads three string properties from the document (InvoiceNumber, CustomerID, and InvoiceDate) and adds every non-empty value to the indexing fields so that the value becomes searchable.

Note: The Content Engine uses a default constructor to create a text indexing preprocessor handler object. Do not include a constructor in your implementation.
For more information, see Action Handlers. To view a sample source implementation of TextIndexingPreprocessorHandler, go to this Content Engine directory:
  • Windows: C:\Program Files\Filenet\Content Engine\samples
  • non-Windows: /opt/IBM/FileNet/ContentEngine/samples
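
The constructor restriction in the note above can be illustrated with plain Java reflection. The following is a minimal sketch, not Content Engine code: it assumes, for illustration only, that a server instantiates a handler class through its no-argument constructor. The names HandlerLoadingSketch, GoodHandler, BadHandler, and canInstantiate are hypothetical and do not belong to the Content Engine API.

```java
import java.lang.reflect.Constructor;

// Hypothetical illustration only: none of these names belong to the Content Engine API.
class HandlerLoadingSketch {

    // No constructor is declared, so the compiler supplies a default one.
    static class GoodHandler { }

    // Only a parameterized constructor is declared, so no default constructor exists.
    static class BadHandler {
        BadHandler(String setting) { }
    }

    // Returns true if the class can be instantiated through a no-argument constructor.
    static boolean canInstantiate(Class<?> handlerClass) {
        try {
            Constructor<?> ctor = handlerClass.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here when no no-argument constructor exists.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("GoodHandler: " + canInstantiate(GoodHandler.class));
        System.out.println("BadHandler: " + canInstantiate(BadHandler.class));
    }
}
```

In this sketch, the class that declares no constructor can be created and the class that declares only a parameterized constructor cannot, which is why leaving constructors out of a handler is the safe choice.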

Java Example


package sample.actionhandler;

import java.util.Map;
import java.util.Set;
import java.util.HashSet;
import com.filenet.api.engine.TextIndexingPreprocessor;
import com.filenet.api.core.Document;
import com.filenet.api.property.Properties;

public class TextIndexingPreprocessorHandler implements TextIndexingPreprocessor {
    
    public void preprocess(Document document, Map fields) {
        // Get document properties
        Properties props = document.getProperties();
        
        // Extract invoice number and add to indexing fields
        String invoiceNumber = props.getStringValue("InvoiceNumber");
        if (invoiceNumber != null && !invoiceNumber.isEmpty()) {
            Set<String> values = new HashSet<String>();
            values.add(invoiceNumber);
            fields.put("invoice_number", values);
        }
        
        // Extract customer ID and add to indexing fields
        String customerId = props.getStringValue("CustomerID");
        if (customerId != null && !customerId.isEmpty()) {
            Set<String> values = new HashSet<String>();
            values.add(customerId);
            fields.put("customer_id", values);
        }
        
        // Extract invoice date and add to indexing fields
        String invoiceDate = props.getStringValue("InvoiceDate");
        if (invoiceDate != null && !invoiceDate.isEmpty()) {
            Set<String> values = new HashSet<String>();
            values.add(invoiceDate);
            fields.put("invoice_date", values);
        }
    }
}
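
The three extraction blocks in the handler above repeat the same null-and-empty check. One possible way to factor that out is sketched below using only java.util types, so that it can be run without the Content Engine libraries. IndexFieldSketch and addField are hypothetical names, and the literal strings in main stand in for the values that props.getStringValue would return.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Content Engine API.
class IndexFieldSketch {

    // Stores a single-value set under fieldName when value is non-null and non-empty.
    // Returns true if the field was added, false if the value was skipped.
    static boolean addField(Map<String, Set<String>> fields, String fieldName, String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        Set<String> values = new HashSet<String>();
        values.add(value);
        fields.put(fieldName, values);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fields = new HashMap<String, Set<String>>();
        // The literals below stand in for values read from document properties.
        addField(fields, "invoice_number", "INV-1001");
        addField(fields, "customer_id", "");     // skipped: empty string
        addField(fields, "invoice_date", null);  // skipped: null
        System.out.println(fields);              // only invoice_number is present
    }
}
```

Inside a handler, each extraction block would then reduce to one call, for example addField(fields, "invoice_number", props.getStringValue("InvoiceNumber")), assuming the fields map accepts sets of strings as in the example above.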

JavaScript Example


// TextIndexingPreprocessor implementation in JavaScript
function preprocess(document, fields) {
    // Get document properties
    var props = document.getProperties();
    
    // Extract invoice number and add to indexing fields
    var invoiceNumber = props.getStringValue("InvoiceNumber");
    if (invoiceNumber != null && invoiceNumber.length > 0) {
        var values = new java.util.HashSet();
        values.add(invoiceNumber);
        fields.put("invoice_number", values);
    }
    
    // Extract customer ID and add to indexing fields
    var customerId = props.getStringValue("CustomerID");
    if (customerId != null && customerId.length > 0) {
        var values = new java.util.HashSet();
        values.add(customerId);
        fields.put("customer_id", values);
    }
    
    // Extract invoice date and add to indexing fields
    var invoiceDate = props.getStringValue("InvoiceDate");
    if (invoiceDate != null && invoiceDate.length > 0) {
        var values = new java.util.HashSet();
        values.add(invoiceDate);
        fields.put("invoice_date", values);
    }
}

Creating a CmTextIndexingPreprocessorAction Object

A CmTextIndexingPreprocessorAction object identifies the text indexing preprocessor handler to start on a document of a class that is associated with a text indexing preprocessor. The following Java and C# examples show how to create two CmTextIndexingPreprocessorAction objects, one that references a handler that is implemented with Java and one that references a handler that is implemented with JavaScript.

For the first CmTextIndexingPreprocessorAction object, the handler that is implemented with Java is contained in a code module. The example therefore retrieves the CodeModule object and sets it on the CodeModule property of the CmTextIndexingPreprocessorAction object.

For the second CmTextIndexingPreprocessorAction object, the handler is implemented with JavaScript, so the script is set on the ScriptText property of the object.

Java Example


// Create text indexing preprocessor action for Java handler.
CmTextIndexingPreprocessorAction tpaJava = Factory.CmTextIndexingPreprocessorAction.createInstance(os, "CmTextIndexingPreprocessorAction");

// Get CodeModule object with Java component.
CodeModule cm = Factory.CodeModule.getInstance(os, "CodeModule",
                new Id("{1DFCEDCC-B734-45AD-93D6-03874E8F1288}") ); 

// Set CodeModule property.
tpaJava.set_CodeModule(cm);

// Set ProgId property with fully qualified name of handler class.
tpaJava.set_ProgId("sample.actionhandler.TextIndexingPreprocessorHandler");

// Set other properties and save.
tpaJava.set_IsEnabled(false);
tpaJava.set_DisplayName("Invoice metadata extraction");
tpaJava.save(RefreshMode.REFRESH);

// Create text indexing preprocessor action for JavaScript handler.
CmTextIndexingPreprocessorAction tpaJavascript = Factory.CmTextIndexingPreprocessorAction.createInstance(os, "CmTextIndexingPreprocessorAction");

// Set ProgId property to script type identifier.
tpaJavascript.set_ProgId("Javascript");

// Call method to read script from a file, and set the script text on the action object.
String inputScript = readScriptText();
tpaJavascript.set_ScriptText(inputScript);

// Set other properties and save.
tpaJavascript.set_IsEnabled(true);
tpaJavascript.set_DisplayName("Invoice metadata extraction");
tpaJavascript.save(RefreshMode.REFRESH);
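
The example above calls readScriptText() without showing it. A minimal sketch of such a helper follows, using only the Java standard library. It is an assumption about how the script could be loaded, not the implementation from the product samples: unlike the call above, this version takes the file path as a parameter, and the names ScriptReaderSketch and roundTrip are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the Content Engine API.
class ScriptReaderSketch {

    // Reads the whole script file as UTF-8 text.
    static String readScriptText(Path scriptPath) {
        try {
            return new String(Files.readAllBytes(scriptPath), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read script: " + scriptPath, e);
        }
    }

    // Demonstration only: writes the text to a temporary file and reads it back.
    static String roundTrip(String scriptSource) {
        try {
            Path tmp = Files.createTempFile("preprocess", ".js");
            try {
                Files.write(tmp, scriptSource.getBytes(StandardCharsets.UTF_8));
                return readScriptText(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("function preprocess(document, fields) { }"));
    }
}
```

The returned string is what would be passed to set_ScriptText. Reading with an explicit character set avoids depending on the platform default encoding.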

C# Example


// Create text indexing preprocessor action for Java handler.
ICmTextIndexingPreprocessorAction tpaJava = Factory.CmTextIndexingPreprocessorAction.CreateInstance(os, "CmTextIndexingPreprocessorAction");

// Get CodeModule object with Java component.
ICodeModule cm = Factory.CodeModule.GetInstance(os, "CodeModule",
                 new Id("{1DFCEDCC-B734-45AD-93D6-03874E8F1288}") ); 

// Set CodeModule property.
tpaJava.CodeModule = cm;

// Set ProgId property with fully qualified name of handler class.
tpaJava.ProgId = "sample.actionhandler.TextIndexingPreprocessorHandler";

// Set other properties and save.
tpaJava.IsEnabled = false;
tpaJava.DisplayName = "Invoice metadata extraction";
tpaJava.Save(RefreshMode.REFRESH);

// Create text indexing preprocessor action for JavaScript handler.
ICmTextIndexingPreprocessorAction tpaJavascript = Factory.CmTextIndexingPreprocessorAction.CreateInstance(os, "CmTextIndexingPreprocessorAction");

// Set ProgId property to type of script.
tpaJavascript.ProgId = "Javascript";

// Call method to read script from a file, and set the script text on the action object.
String inputScript = readScriptText();
tpaJavascript.ScriptText = inputScript;

// Set other properties and save.
tpaJavascript.IsEnabled = true;
tpaJavascript.DisplayName = "Invoice metadata extraction";
tpaJavascript.Save(RefreshMode.REFRESH);

Creating a CmTextIndexingPreprocessorDefinition Object

After you create the text indexing preprocessor handler and the CmTextIndexingPreprocessorAction object, you are ready to create a CmTextIndexingPreprocessorDefinition object. In the following Java and C# examples, the CmTextIndexingPreprocessorDefinition object is set on the SubscribableClassDefinition object that represents the user-defined Invoice class.

Java Example


// Create text indexing preprocessor definition object.
CmTextIndexingPreprocessorDefinition tpDef = Factory.CmTextIndexingPreprocessorDefinition.createInstance(os);
tpDef.set_DisplayName("Invoice - Metadata extraction");
tpDef.set_IsEnabled(true);

// Get CmTextIndexingPreprocessorAction and set on definition object.
CmTextIndexingPreprocessorAction action = Factory.CmTextIndexingPreprocessorAction.getInstance(os, "CmTextIndexingPreprocessorAction",
     new Id("{45A990F2-1B0D-4CF2-AB3E-9B6B06A5410E}") );
tpDef.set_TextIndexingPreprocessorAction(action);

// Create text indexing preprocessor definition list object and add definition object.
CmTextIndexingPreprocessorDefinitionList tpdList = Factory.CmTextIndexingPreprocessorDefinition.createList();
tpdList.add(tpDef);

// Get Invoice class definition and set definition list object on it. 
SubscribableClassDefinition objClassDef = Factory.SubscribableClassDefinition.getInstance(os, new Id("{3ADC7781-ED70-43E4-97C3-40CF7DE2D565}") );
objClassDef.set_TextIndexingPreprocessorDefinitions(tpdList);

objClassDef.save(RefreshMode.NO_REFRESH);

C# Example


// Create text indexing preprocessor definition object.
ICmTextIndexingPreprocessorDefinition tpDef = Factory.CmTextIndexingPreprocessorDefinition.CreateInstance(os);
tpDef.DisplayName = "Invoice - Metadata extraction";
tpDef.IsEnabled = true;

// Get ICmTextIndexingPreprocessorAction and set on definition object.
ICmTextIndexingPreprocessorAction action = Factory.CmTextIndexingPreprocessorAction.GetInstance(os, "CmTextIndexingPreprocessorAction",
     new Id("{0551C740-3502-46D1-9DB0-98CBDBD70232}") );
tpDef.TextIndexingPreprocessorAction = action;

// Create text indexing preprocessor definition list object and add definition object.
ICmTextIndexingPreprocessorDefinitionList tpdList = Factory.CmTextIndexingPreprocessorDefinition.CreateList();
tpdList.Add(tpDef);

// Get Invoice class definition and set definition list object on it. 
ISubscribableClassDefinition objClassDef = Factory.SubscribableClassDefinition.GetInstance(os, new Id("{3ADC7781-ED70-43E4-97C3-40CF7DE2D565}") );
objClassDef.TextIndexingPreprocessorDefinitions = tpdList;

objClassDef.Save(RefreshMode.NO_REFRESH);

Retrieving Text Indexing Preprocessor Actions

The following Java and C# examples show how to retrieve a collection of CmTextIndexingPreprocessorAction objects from an object store.

Java Example


// Retrieve text indexing preprocessor actions from the object store.
FilterElement fe = new FilterElement(null, null, null, PropertyNames.TEXT_INDEXING_PREPROCESSOR_ACTIONS, null);
PropertyFilter pf = new PropertyFilter();
pf.addIncludeProperty(fe);
os.fetchProperties(pf);
CmTextIndexingPreprocessorActionSet tpActionSet = os.get_TextIndexingPreprocessorActions();

// Iterate the action set.
Iterator iter = tpActionSet.iterator();
while (iter.hasNext() )
{
   CmTextIndexingPreprocessorAction tpAction = (CmTextIndexingPreprocessorAction)iter.next();
   System.out.println("Action: " + tpAction.get_DisplayName() + " | IsEnabled: " + tpAction.get_IsEnabled() );
}

C# Example


// Retrieve text indexing preprocessor actions from the object store.
FilterElement fe = new FilterElement(null, null, null, PropertyNames.TEXT_INDEXING_PREPROCESSOR_ACTIONS, null);
PropertyFilter pf = new PropertyFilter();
pf.AddIncludeProperty(fe);
os.FetchProperties(pf);
ICmTextIndexingPreprocessorActionSet tpActionSet = os.TextIndexingPreprocessorActions;

// Iterate the action set.
foreach (ICmTextIndexingPreprocessorAction tpAction in tpActionSet)
{
   System.Console.WriteLine("Action: " + tpAction.DisplayName + " | IsEnabled: " + tpAction.IsEnabled );
}

Retrieving Text Indexing Preprocessor Definitions

The following Java and C# examples show how to retrieve a list of CmTextIndexingPreprocessorDefinition objects from a class definition.

Java Example


// Fetch Invoice class definition and get list of CmTextIndexingPreprocessorDefinition objects. 
SubscribableClassDefinition scd = Factory.SubscribableClassDefinition.fetchInstance(os, new Id("{3ADC7781-ED70-43E4-97C3-40CF7DE2D565}"), null);
CmTextIndexingPreprocessorDefinitionList tpdList = scd.get_TextIndexingPreprocessorDefinitions();

// Iterate the definition list.
Iterator iter = tpdList.iterator();
while (iter.hasNext() )
{
   CmTextIndexingPreprocessorDefinition tpDef = (CmTextIndexingPreprocessorDefinition)iter.next();
   System.out.println("Definition: " + tpDef.get_DisplayName() + " | IsEnabled: " + tpDef.get_IsEnabled() );
}

C# Example


// Fetch Invoice class definition and get list of CmTextIndexingPreprocessorDefinition objects. 
ISubscribableClassDefinition scd = Factory.SubscribableClassDefinition.FetchInstance(os, new Id("{3ADC7781-ED70-43E4-97C3-40CF7DE2D565}"), null);
ICmTextIndexingPreprocessorDefinitionList tpdList = scd.TextIndexingPreprocessorDefinitions;

// Iterate the definition list.
foreach (ICmTextIndexingPreprocessorDefinition tpDef in tpdList)
{
   System.Console.WriteLine("Definition: " + tpDef.DisplayName + " | IsEnabled: " + tpDef.IsEnabled );
}