Working with Text Indexing Preprocessors
For an overview of text indexing preprocessors, see Text Indexing Preprocessors.
Creating a Text Indexing Preprocessor Handler
To create a text indexing preprocessor handler, you must implement the preprocess method of the Java™ TextIndexingPreprocessor interface. The following examples show Java and JavaScript implementations. Each one reads custom metadata from the InvoiceNumber, CustomerID, and InvoiceDate document properties and, for every non-empty value, adds a single-element set to the indexing fields (invoice_number, customer_id, and invoice_date) so that those values can be searched.
For the TextIndexingPreprocessorHandler sample source, go to this Content Engine directory:
- Windows: C:\Program Files\Filenet\Content Engine\samples
- non-Windows: /opt/IBM/FileNet/ContentEngine/samples
Java Example
package sample.actionhandler;
import java.util.Map;
import java.util.Set;
import java.util.HashSet;
import com.filenet.api.engine.TextIndexingPreprocessor;
import com.filenet.api.core.Document;
import com.filenet.api.property.Properties;
public class TextIndexingPreprocessorHandler implements TextIndexingPreprocessor {
    // fields is declared as a raw Map, so the put() calls below are unchecked.
    @SuppressWarnings("unchecked")
    public void preprocess(Document document, Map fields) {
        // Get document properties
        Properties props = document.getProperties();
        // Extract invoice number and add to indexing fields
        String invoiceNumber = props.getStringValue("InvoiceNumber");
        if (invoiceNumber != null && !invoiceNumber.isEmpty()) {
            Set<String> values = new HashSet<String>();
            values.add(invoiceNumber);
            fields.put("invoice_number", values);
        }
        // Extract customer ID and add to indexing fields
        String customerId = props.getStringValue("CustomerID");
        if (customerId != null && !customerId.isEmpty()) {
            Set<String> values = new HashSet<String>();
            values.add(customerId);
            fields.put("customer_id", values);
        }
        // Extract invoice date and add to indexing fields
        String invoiceDate = props.getStringValue("InvoiceDate");
        if (invoiceDate != null && !invoiceDate.isEmpty()) {
            Set<String> values = new HashSet<String>();
            values.add(invoiceDate);
            fields.put("invoice_date", values);
        }
    }
}
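The handler above repeats the same null-and-empty check for each property. As a minimal sketch (not part of the IBM sample), the property-to-field mapping can instead be kept in a table and applied in one loop. So that it compiles and runs without the Content Engine libraries, the sketch assumes a plain Map<String, String> as a stand-in for the document's Properties collection; the names IndexFieldMapper, PROPERTY_TO_FIELD, and mapFields are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch only: a plain Map<String, String> stands in for the FileNet
// Properties collection so this compiles without the Content Engine JARs.
final class IndexFieldMapper {

    // Document property name -> indexing field name, as in the handler above.
    static final Map<String, String> PROPERTY_TO_FIELD = new LinkedHashMap<>();
    static {
        PROPERTY_TO_FIELD.put("InvoiceNumber", "invoice_number");
        PROPERTY_TO_FIELD.put("CustomerID", "customer_id");
        PROPERTY_TO_FIELD.put("InvoiceDate", "invoice_date");
    }

    private IndexFieldMapper() {}

    // Adds one single-element set per non-empty property value;
    // missing or empty properties produce no field.
    static void mapFields(Map<String, String> properties, Map<String, Set<String>> fields) {
        for (Map.Entry<String, String> entry : PROPERTY_TO_FIELD.entrySet()) {
            String value = properties.get(entry.getKey());
            if (value != null && !value.isEmpty()) {
                Set<String> values = new HashSet<>();
                values.add(value);
                fields.put(entry.getValue(), values);
            }
        }
    }
}
```

In a real handler, each value would be read with props.getStringValue(propertyName) rather than Map.get. With the table in place, indexing another property means adding one entry instead of another copy of the if block.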
JavaScript Example
// TextIndexingPreprocessor implementation in JavaScript
function preprocess(document, fields) {
    // Get document properties
    var props = document.getProperties();
    // Extract invoice number and add to indexing fields.
    // String() converts the value returned from Java to a JavaScript string,
    // so .length is a number whichever script engine runs the handler.
    var invoiceNumber = props.getStringValue("InvoiceNumber");
    if (invoiceNumber != null && String(invoiceNumber).length > 0) {
        var values = new java.util.HashSet();
        values.add(invoiceNumber);
        fields.put("invoice_number", values);
    }
    // Extract customer ID and add to indexing fields
    var customerId = props.getStringValue("CustomerID");
    if (customerId != null && String(customerId).length > 0) {
        var values = new java.util.HashSet();
        values.add(customerId);
        fields.put("customer_id", values);
    }
    // Extract invoice date and add to indexing fields
    var invoiceDate = props.getStringValue("InvoiceDate");
    if (invoiceDate != null && String(invoiceDate).length > 0) {
        var values = new java.util.HashSet();
        values.add(invoiceDate);
        fields.put("invoice_date", values);
    }
}
Creating a TextIndexingPreprocessorAction Object
A CmTextIndexingPreprocessorAction object identifies the text indexing preprocessor handler to start on a document whose class is associated with a text indexing preprocessor. The following Java and C# examples show how to create two CmTextIndexingPreprocessorAction objects: one that references a handler implemented in Java and one that references a handler implemented in JavaScript.
For the first CmTextIndexingPreprocessorAction object, the Java handler is contained in a code module, so the example retrieves the CodeModule object and sets it on the CodeModule property of the action. The ProgId property is set to the fully qualified name of the handler class.
For the second CmTextIndexingPreprocessorAction object, the handler is implemented in JavaScript, so the ProgId property is set to Javascript and the script itself is set on the object's ScriptText property.
Java Example
// Create text indexing preprocessor action for Java handler.
CmTextIndexingPreprocessorAction tpaJava = Factory.CmTextIndexingPreprocessorAction.createInstance(os, "CmTextIndexingPreprocessorAction");
// Get CodeModule object with Java component.
CodeModule cm = Factory.CodeModule.getInstance(os, "CodeModule",
new Id("{1DFCEDCC-B734-45AD-93D6-03874E8F1288}") );
// Set CodeModule property.
tpaJava.set_CodeModule(cm);
// Set ProgId property with fully qualified name of handler class.
tpaJava.set_ProgId("sample.actionhandler.TextIndexingPreprocessorHandler");
// Set other properties and save.
tpaJava.set_IsEnabled(false);
tpaJava.set_DisplayName("Invoice metadata extraction");
tpaJava.save(RefreshMode.REFRESH);
// Create text indexing preprocessor action for JavaScript handler.
CmTextIndexingPreprocessorAction tpaJavascript = Factory.CmTextIndexingPreprocessorAction.createInstance(os, "CmTextIndexingPreprocessorAction");
// Set ProgId property to script type identifier.
tpaJavascript.set_ProgId("Javascript");
// Call method to read script from a file, and set the script text on the action object.
String inputScript = readScriptText();
tpaJavascript.set_ScriptText(inputScript);
// Set other properties and save.
tpaJavascript.set_IsEnabled(true);
tpaJavascript.set_DisplayName("Invoice metadata extraction");
tpaJavascript.save(RefreshMode.REFRESH);
C# Example
// Create text indexing preprocessor action for Java handler.
ICmTextIndexingPreprocessorAction tpaJava = Factory.CmTextIndexingPreprocessorAction.CreateInstance(os, "CmTextIndexingPreprocessorAction");
// Get CodeModule object with Java component.
ICodeModule cm = Factory.CodeModule.GetInstance(os, "CodeModule",
new Id("{1DFCEDCC-B734-45AD-93D6-03874E8F1288}") );
// Set CodeModule property.
tpaJava.CodeModule = cm;
// Set ProgId property with fully qualified name of handler class.
tpaJava.ProgId = "sample.actionhandler.TextIndexingPreprocessorHandler";
// Set other properties and save.
tpaJava.IsEnabled = false;
tpaJava.DisplayName = "Invoice metadata extraction";
tpaJava.Save(RefreshMode.REFRESH);
// Create text indexing preprocessor action for JavaScript handler.
ICmTextIndexingPreprocessorAction tpaJavascript = Factory.CmTextIndexingPreprocessorAction.CreateInstance(os, "CmTextIndexingPreprocessorAction");
// Set ProgId property to type of script.
tpaJavascript.ProgId = "Javascript";
// Call method to read script from a file, and set the script text on the action object.
String inputScript = readScriptText();
tpaJavascript.ScriptText = inputScript;
// Set other properties and save.
tpaJavascript.IsEnabled = true;
tpaJavascript.DisplayName = "Invoice metadata extraction";
tpaJavascript.Save(RefreshMode.REFRESH);
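Both action examples call a readScriptText() helper that is not shown. The following minimal Java sketch is one possible implementation. It assumes the script is stored in a UTF-8 encoded file and, unlike the call in the examples, takes the file path as a parameter; the class name ScriptFiles is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the examples call readScriptText() without showing it.
// This version takes the script file path as a parameter and assumes UTF-8.
final class ScriptFiles {

    private ScriptFiles() {}

    // Reads the whole file and returns it as a single String,
    // ready to pass to set_ScriptText(String).
    static String readScriptText(Path scriptFile) throws IOException {
        byte[] bytes = Files.readAllBytes(scriptFile);
        String text = new String(bytes, StandardCharsets.UTF_8);
        if (text.trim().isEmpty()) {
            // Fail early rather than saving an action with no script body.
            throw new IOException("Script file is empty: " + scriptFile);
        }
        return text;
    }
}
```

For example: tpaJavascript.set_ScriptText(ScriptFiles.readScriptText(Paths.get("invoice-preprocessor.js"))); where the file name is a placeholder. Rejecting an empty file is a design choice of this sketch, not a documented Content Engine requirement.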
Creating a TextIndexingPreprocessorDefinition Object
After you create the text indexing preprocessor handler and the CmTextIndexingPreprocessorAction object, you are ready to create a CmTextIndexingPreprocessorDefinition object. In the following Java and C# examples, the CmTextIndexingPreprocessorDefinition object references the action, is added to a CmTextIndexingPreprocessorDefinitionList, and the list is set on the SubscribableClassDefinition object that represents the user-defined Invoice class.
Java Example
// Create text indexing preprocessor definition object.
CmTextIndexingPreprocessorDefinition tpDef = Factory.CmTextIndexingPreprocessorDefinition.createInstance(os);
tpDef.set_DisplayName("Invoice - Metadata extraction");
tpDef.set_IsEnabled(true);
// Get CmTextIndexingPreprocessorAction and set on definition object.
CmTextIndexingPreprocessorAction action = Factory.CmTextIndexingPreprocessorAction.getInstance(os, "CmTextIndexingPreprocessorAction",
new Id("{45A990F2-1B0D-4CF2-AB3E-9B6B06A5410E}") );
tpDef.set_TextIndexingPreprocessorAction(action);
// Create text indexing preprocessor definition list object and add definition object.
CmTextIndexingPreprocessorDefinitionList tpdList = Factory.CmTextIndexingPreprocessorDefinition.createList();
tpdList.add(tpDef);
// Get Invoice class definition and set definition list object on it.
SubscribableClassDefinition objClassDef = Factory.SubscribableClassDefinition.getInstance(os, new Id("{3ADC7781-ED70-43E4-97C3-40CF7DE2D565}") );
objClassDef.set_TextIndexingPreprocessorDefinitions(tpdList);
objClassDef.save(RefreshMode.NO_REFRESH);
C# Example
// Create text indexing preprocessor definition object.
ICmTextIndexingPreprocessorDefinition tpDef = Factory.CmTextIndexingPreprocessorDefinition.CreateInstance(os);
tpDef.DisplayName = "Invoice - Metadata extraction";
tpDef.IsEnabled = true;
// Get ICmTextIndexingPreprocessorAction and set on definition object.
ICmTextIndexingPreprocessorAction action = Factory.CmTextIndexingPreprocessorAction.GetInstance(os, "CmTextIndexingPreprocessorAction",
new Id("{0551C740-3502-46D1-9DB0-98CBDBD70232}") );
tpDef.TextIndexingPreprocessorAction = action;
// Create text indexing preprocessor definition list object and add definition object.
ICmTextIndexingPreprocessorDefinitionList tpdList = Factory.CmTextIndexingPreprocessorDefinition.CreateList();
tpdList.Add(tpDef);
// Get Invoice class definition and set definition list object on it.
ISubscribableClassDefinition objClassDef = Factory.SubscribableClassDefinition.GetInstance(os, new Id("{3ADC7781-ED70-43E4-97C3-40CF7DE2D565}") );
objClassDef.TextIndexingPreprocessorDefinitions = tpdList;
objClassDef.Save(RefreshMode.NO_REFRESH);
Retrieving Text Indexing Preprocessor Actions
The following Java and C# examples show how to retrieve a
collection of CmTextIndexingPreprocessorAction objects
from an object store.
Java Example
// Retrieve text indexing preprocessor actions from the object store.
FilterElement fe = new FilterElement(null, null, null, PropertyNames.TEXT_INDEXING_PREPROCESSOR_ACTIONS, null);
PropertyFilter pf = new PropertyFilter();
pf.addIncludeProperty(fe);
os.fetchProperties(pf);
CmTextIndexingPreprocessorActionSet tpActionSet = os.get_TextIndexingPreprocessorActions();
// Iterate the action set.
Iterator iter = tpActionSet.iterator();
while (iter.hasNext())
{
    CmTextIndexingPreprocessorAction tpAction = (CmTextIndexingPreprocessorAction) iter.next();
    System.out.println("Action: " + tpAction.get_DisplayName() + " | IsEnabled: " + tpAction.get_IsEnabled());
}
C# Example
// Retrieve text indexing preprocessor actions from the object store.
FilterElement fe = new FilterElement(null, null, null, PropertyNames.TEXT_INDEXING_PREPROCESSOR_ACTIONS, null);
PropertyFilter pf = new PropertyFilter();
pf.AddIncludeProperty(fe);
os.FetchProperties(pf);
ICmTextIndexingPreprocessorActionSet tpActionSet = os.TextIndexingPreprocessorActions;
// Iterate the action set.
foreach (ICmTextIndexingPreprocessorAction tpAction in tpActionSet)
{
    System.Console.WriteLine("Action: " + tpAction.DisplayName + " | IsEnabled: " + tpAction.IsEnabled);
}
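The retrieval loops print every action. If only enabled actions are of interest, the same loop can filter on IsEnabled. The following sketch is self-contained: ActionInfo is a hypothetical stand-in that holds the two values the loop reads from each CmTextIndexingPreprocessorAction, and it stores the flag as a Boolean so that a missing value counts as not enabled. The names ActionReport and enabledActionNames are also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: ActionInfo is a hypothetical stand-in for the DisplayName and
// IsEnabled values read from each CmTextIndexingPreprocessorAction.
final class ActionReport {

    static final class ActionInfo {
        final String displayName;
        final Boolean isEnabled;

        ActionInfo(String displayName, Boolean isEnabled) {
            this.displayName = displayName;
            this.isEnabled = isEnabled;
        }
    }

    private ActionReport() {}

    // Returns the display names of enabled actions in iteration order;
    // a null flag is treated as not enabled.
    static List<String> enabledActionNames(Iterable<ActionInfo> actions) {
        List<String> names = new ArrayList<>();
        for (ActionInfo action : actions) {
            if (Boolean.TRUE.equals(action.isEnabled)) {
                names.add(action.displayName);
            }
        }
        return names;
    }
}
```

Comparing with Boolean.TRUE.equals avoids a NullPointerException from unboxing when the flag is absent.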
Retrieving Text Indexing Preprocessor Definitions
The following Java and C# examples show how to
retrieve a list of CmTextIndexingPreprocessorDefinition objects
from a class definition.
Java Example
// Fetch Invoice class definition and get list of CmTextIndexingPreprocessorDefinition objects.
SubscribableClassDefinition scd = Factory.SubscribableClassDefinition.fetchInstance(os, new Id("{3ADC7781-ED70-43E4-97C3-40CF7DE2D565}"), null);
CmTextIndexingPreprocessorDefinitionList tpdList = scd.get_TextIndexingPreprocessorDefinitions();
// Iterate the definition list.
Iterator iter = tpdList.iterator();
while (iter.hasNext())
{
    CmTextIndexingPreprocessorDefinition tpDef = (CmTextIndexingPreprocessorDefinition) iter.next();
    System.out.println("Definition: " + tpDef.get_DisplayName() + " | IsEnabled: " + tpDef.get_IsEnabled());
}
C# Example
// Fetch Invoice class definition and get list of CmTextIndexingPreprocessorDefinition objects.
ISubscribableClassDefinition scd = Factory.SubscribableClassDefinition.FetchInstance(os, new Id("{3ADC7781-ED70-43E4-97C3-40CF7DE2D565}"), null);
ICmTextIndexingPreprocessorDefinitionList tpdList = scd.TextIndexingPreprocessorDefinitions;
// Iterate the definition list.
foreach (ICmTextIndexingPreprocessorDefinition tpDef in tpdList)
{
    System.Console.WriteLine("Definition: " + tpDef.DisplayName + " | IsEnabled: " + tpDef.IsEnabled);
}