IBM FileNet P8, Version 5.2

Working with Document Classification-related Objects

To enable documents of a specified MIME type to be automatically classified, you need the following:

A document classifier to apply a Content Engine class to the documents of the specified MIME type. See Creating a Document Classifier.
A DocumentClassificationAction object that associates the specified MIME type with the document classifier. See Creating a DocumentClassificationAction Object.

You can also retrieve DocumentClassificationAction objects.

To submit a document for classification and to view its classification status, you must check in a document with autoclassification enabled. See Autoclassifying a Document.

For an overview of automatic document classification, see Document Classifications.

Creating a Document Classifier

To create a document classifier, you must implement the DocumentClassifier interface as a Java™ or JavaScript component. A classifier implementation determines the Content Engine class to which a checked-in Document object belongs, and then applies the class to the object. Typically, this involves parsing the content of the Document object and mapping metadata from the content to properties of the Content Engine class.

The following examples show Java and JavaScript implementations, each of which classifies documents of MIME type "text/pdf". The classify method retrieves the document's PDF content as an InputStream object and uses a third-party API to parse it. The method then tests the subject field of the PDF content. If the subject indicates that the PDF document is a loan application, the method calls changeClass to apply the "PdfLoanApplication" class to the Document object, and maps metadata from the PDF content to properties of the "PdfLoanApplication" class. If the PDF document is not a loan application, the Document object keeps its default class.

See Restrictions and Best Practices for additional implementation information. To view the sample DocumentClassifier source code that is packaged with the Content Engine, go to this Content Engine directory:

Windows: C:\Program Files\Filenet\Content Engine\samples
non-Windows: /opt/IBM/FileNet/ContentEngine/samples

Java Example

package sample.actionhandler;
import com.filenet.api.core.*;
import com.filenet.api.engine.DocumentClassifier;
import com.filenet.api.exception.*;
import java.io.*;
import com.ticdoc.pdfextract.*; // 3rd-party API for parsing PDF documents

public class DocClassifyHandler implements DocumentClassifier
{
    public void classify(Document doc)
    {
        try
        {
            // Get PDF content from the document passed to this method.
            InputStream IS= doc.accessContentStream(0);

            // Use 3rd-party API to get PDF document metadata.
            PDFDocument pdfDoc = PDFDocument.load(IS);
            PDFDocumentInformation pdfProperties = pdfDoc.getDocumentInformation();
            pdfDoc.close();
            
            // Get subject of PDF document.
            String pdfSubject = pdfProperties.getSubject();
            
            // Classify based on PDF subject. The literal-first comparison is
            // null-safe, so a PDF without a subject keeps its default class.
            if ( "loan application".equalsIgnoreCase(pdfSubject) )
            {
                // Apply new class.
                doc.changeClass("PdfLoanApplication");

                // Get PDF properties to be mapped to document.
                String pdfloanType = pdfProperties.getLoanType();
                String pdfApplicantName = pdfProperties.getApplicant();
                String pdfDateSubmitted = pdfProperties.getModificationDate().getTime().toString();

                // Set properties for Document stored in object store.
                doc.getProperties().putValue("LoanType", pdfloanType);
                doc.getProperties().putValue("ApplicantName", pdfApplicantName);
                doc.getProperties().putValue("ApplicationDate", pdfDateSubmitted);
                doc.getProperties().putValue("DocumentTitle", "PDF Loan Application");

                // Set security owner based on loan type. The literal-first
                // comparisons are null-safe if the PDF has no loan type.
                if ( "home loan application".equalsIgnoreCase(pdfloanType) )
                    doc.set_Owner("GEvans");
                else if ( "auto loan application".equalsIgnoreCase(pdfloanType) )
                    doc.set_Owner("EMesker");
            }
        }
        catch(Exception e)
        {
            throw new RuntimeException(e);
        }
     }
}

JavaScript Example


importPackage(java.lang);
importPackage(Packages.com.filenet.api.core);
importPackage(Packages.com.ticdoc.pdfextract); // 3rd-party API for parsing PDF documents

function classify(doc)
{
   try { 
      // Get PDF content from document passed to this method.
      var IS= doc.accessContentStream(0);
      
      // Use 3rd-party API to get PDF document metadata.
      var pdfDoc = PDFDocument.load(IS);
      var pdfProperties = pdfDoc.getDocumentInformation();
      pdfDoc.close();

      // Get subject of PDF document.
      var pdfSubject = pdfProperties.getSubject();

      // Classify based on PDF subject. The null check lets a PDF without a
      // subject keep its default class.
      if ( pdfSubject != null && pdfSubject.equalsIgnoreCase("loan application") )
      {
         // Apply new class.
         doc.changeClass("PdfLoanApplication");

         // Get PDF properties to be mapped to document.
         var pdfloanType = pdfProperties.getLoanType();
         var pdfApplicantName = pdfProperties.getApplicant();
         var pdfDateSubmitted = pdfProperties.getModificationDate().getTime().toString();

         // Set properties for Document stored in object store.
         doc.getProperties().putValue("LoanType", pdfloanType);
         doc.getProperties().putValue("ApplicantName", pdfApplicantName);
         doc.getProperties().putValue("ApplicationDate", pdfDateSubmitted);
         doc.getProperties().putValue("DocumentTitle", "PDF Loan Application");

         // Set security owner based on loan type.
         if ( pdfloanType != null && pdfloanType.equalsIgnoreCase("home loan application") )
            doc.set_Owner("GEvans");
         else if ( pdfloanType != null && pdfloanType.equalsIgnoreCase("auto loan application") )
            doc.set_Owner("EMesker");
      }
   }
   catch (e) {
      throw new RuntimeException(e);
   }
}
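
The decisions that both classifiers make (which class to apply for a given subject, and which owner to set for a given loan type) can be separated from the Content Engine calls so that they can be tested without a server. The following is a minimal sketch of that idea, not part of the Content Engine API: the LoanClassificationRules class and its method names are hypothetical, and only the string values are taken from the examples above.

```java
import java.util.Optional;

// Hypothetical helper (not part of the Content Engine API). It isolates the
// decisions made inside classify() from the calls that modify the Document.
final class LoanClassificationRules {

    private LoanClassificationRules() {}

    // Returns the class to apply for a PDF subject, or empty when the
    // document should keep its default class. A null subject is treated
    // as "not a loan application".
    static Optional<String> targetClass(String pdfSubject) {
        if ("loan application".equalsIgnoreCase(pdfSubject)) {
            return Optional.of("PdfLoanApplication");
        }
        return Optional.empty();
    }

    // Returns the owner to set for a loan type, or empty when neither
    // owner rule from the example applies.
    static Optional<String> ownerFor(String loanType) {
        if ("home loan application".equalsIgnoreCase(loanType)) {
            return Optional.of("GEvans");
        }
        if ("auto loan application".equalsIgnoreCase(loanType)) {
            return Optional.of("EMesker");
        }
        return Optional.empty();
    }
}
```

A classify implementation could call targetClass with the parsed subject and invoke changeClass only when a value is present, which keeps the default class for every other document.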

Creating a DocumentClassificationAction Object

A DocumentClassificationAction object identifies the document classifier to be launched when a document is checked in with autoclassification enabled. The following Java and C# code examples show how to create a DocumentClassificationAction object and set its properties. The MimeType property associates the DocumentClassificationAction object with documents of the same MIME type. In the examples, this property is set to "text/pdf", so when documents of this MIME type are checked in with autoclassification enabled, the document classifier associated with this DocumentClassificationAction object is launched.

A document classifier is associated with a DocumentClassificationAction through the ProgId and, conditionally, CodeModule properties. For a JavaScript-implemented classifier, you must set the ProgId property to "Javascript". For a Java-implemented classifier, you must set the ProgId property to the fully qualified name of the document classifier. The following examples assume a Java-implemented document classifier.

If, as shown in the examples, the document classifier is contained within a CodeModule stored in an object store, you must also get the CodeModule object, then assign it to the CodeModule property of the DocumentClassificationAction object. Note that you cannot set the CodeModule property to a reservation (in progress) version of CodeModule. For more information, see Creating a CodeModule Object.

Note: Do not set the CodeModule property if you set the application server's class path to the location of the document classifier.

When saved, a DocumentClassificationAction object is stored in the Document Classification Actions folder of a Content Engine object store.
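
The ProgId and CodeModule rules described in this section can be summarized in a few lines of code. The following is a minimal sketch under stated assumptions, not part of the Content Engine API: the ClassifierBinding class, its Kind enum, and its method names are hypothetical, and the sketch only restates the rules given above.

```java
// Hypothetical helper (not part of the Content Engine API) that restates the
// ProgId and CodeModule rules for a DocumentClassificationAction.
final class ClassifierBinding {

    // The three ways a classifier is made available, as described in this topic.
    enum Kind { JAVASCRIPT, JAVA_IN_CODE_MODULE, JAVA_ON_CLASS_PATH }

    private ClassifierBinding() {}

    // Returns the value to assign to the ProgId property.
    static String progIdFor(Kind kind, String qualifiedClassName) {
        if (kind == Kind.JAVASCRIPT) {
            return "Javascript";
        }
        if (qualifiedClassName == null || qualifiedClassName.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "A Java classifier needs its fully qualified class name");
        }
        return qualifiedClassName;
    }

    // Returns true only when the CodeModule property must be set, that is,
    // when the Java classifier is stored in a CodeModule in an object store.
    static boolean requiresCodeModule(Kind kind) {
        return kind == Kind.JAVA_IN_CODE_MODULE;
    }
}
```

For the classifier in this topic, progIdFor(Kind.JAVA_IN_CODE_MODULE, "sample.actionhandler.DocClassifyHandler") returns the class name unchanged, and requiresCodeModule returns true.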

Java Example

...
   // Create document classification action.
   DocumentClassificationAction docClassAction = Factory.DocumentClassificationAction.createInstance(os, 
              ClassNames.DOCUMENT_CLASSIFICATION_ACTION);
                          
   // Set MIME type that associates action to documents of same MIME type.
   docClassAction.set_MimeType("text/pdf");
   
   // Set ProgId property with fully qualified name of classifier.
   docClassAction.set_ProgId("sample.actionhandler.DocClassifyHandler");
   
   // Get CodeModule object.
   CodeModule cm = Factory.CodeModule.getInstance( os, 
              ClassNames.CODE_MODULE, new Id("{C45954D4-5DBB-460B-B890-78D6F4CFA40B}") ); 
   // Set CodeModule property.
   docClassAction.set_CodeModule(cm);
   
   docClassAction.set_DisplayName("DocumentClassificationAction");
   docClassAction.save(RefreshMode.REFRESH);
}

C# Example

...
   // Create document classification action.
   IDocumentClassificationAction docClassAction = Factory.DocumentClassificationAction.CreateInstance(os, 
              ClassNames.DOCUMENT_CLASSIFICATION_ACTION);
                          
   // Set MIME type that associates action to documents of same MIME type.
   docClassAction.MimeType = "text/pdf";

   // Set ProgId property with fully qualified name of classifier.
   docClassAction.ProgId = "sample.actionhandler.DocClassifyHandler";

   // Get CodeModule object.
   ICodeModule cm = Factory.CodeModule.GetInstance( os,
               ClassNames.CODE_MODULE, new Id("{C45954D4-5DBB-460B-B890-78D6F4CFA40B}")); 
   // Set CodeModule property.
   docClassAction.CodeModule = cm;

   docClassAction.DisplayName = "DocumentClassificationAction";
   docClassAction.Save(RefreshMode.REFRESH);
}

Retrieving DocumentClassificationAction Objects

You can get a single DocumentClassificationAction object with a Factory.DocumentClassificationAction method. You can also get a collection of DocumentClassificationAction objects (DocumentClassificationActionSet) by retrieving the DocumentClassificationActions property on an ObjectStore object.

The following Java and C# examples show how to retrieve a DocumentClassificationActionSet collection from an object store. The examples iterate over the set and, for each DocumentClassificationAction object in the collection, retrieve the object's MimeType, ProgId, and CodeModule properties. Note that a document classifier referenced by a DocumentClassificationAction object might not be contained within a CodeModule stored in an object store. This is the case for a JavaScript-implemented classifier or a Java-implemented classifier specified in the class path of the application server where the Content Engine is running.

Java Example

...
   DocumentClassificationActionSet actionSet = os.get_DocumentClassificationActions();
   DocumentClassificationAction actionObject;
   Iterator iter = actionSet.iterator();
   while ( iter.hasNext() ) 
   {
      actionObject = (DocumentClassificationAction)iter.next();
      System.out.println("DocumentClassificationAction: " + 
            actionObject.get_DisplayName() +
            "\n  MimeType is " + actionObject.get_MimeType() +
            "\n  ProgId is " + actionObject.get_ProgId() );
      String cmName = actionObject.get_CodeModule() != null ?
            actionObject.get_CodeModule().getProperties().getStringValue("Name") :
            "not assigned to this action";
      System.out.println("  CodeModule is " + cmName);
   }
}

C# Example

...
   IDocumentClassificationActionSet actionSet = os.DocumentClassificationActions;
   IDocumentClassificationAction actionObject;
   System.Collections.IEnumerator iter = actionSet.GetEnumerator();
   while (iter.MoveNext())
   {
      actionObject = (IDocumentClassificationAction)iter.Current;
      System.Console.WriteLine("IDocumentClassificationAction: " + 
            actionObject.DisplayName +
            "\n  MimeType is " + actionObject.MimeType +
            "\n  ProgId is " + actionObject.ProgId );
      String cmName = actionObject.CodeModule != null ?
            actionObject.CodeModule.Properties.GetStringValue("Name") :
            "not assigned to this action";
      System.Console.WriteLine("  CodeModule is " + cmName);
   }
}
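
When you want to know whether a MIME type is already covered by a DocumentClassificationAction object, it can help to index the retrieved values by MIME type. The following is a minimal sketch of that idea, not part of the Content Engine API: the ActionIndex class and its methods are hypothetical, plain strings stand in for the values read from each action's MimeType and DisplayName properties, and comparing MIME types as exact strings is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the Content Engine API). Plain strings
// stand in for property values read from DocumentClassificationAction objects.
final class ActionIndex {

    // Maps each MIME type to the display names of the actions recorded for it.
    private final Map<String, List<String>> namesByMimeType = new LinkedHashMap<>();

    // Records one action, for example from inside the retrieval loop.
    void add(String mimeType, String displayName) {
        namesByMimeType.computeIfAbsent(mimeType, key -> new ArrayList<>()).add(displayName);
    }

    // Returns true if at least one recorded action uses this MIME type.
    // Assumption: MIME types are compared as exact strings.
    boolean hasActionFor(String mimeType) {
        return namesByMimeType.containsKey(mimeType);
    }

    // Returns the display names recorded for a MIME type, or an empty list.
    List<String> actionNamesFor(String mimeType) {
        List<String> names = namesByMimeType.get(mimeType);
        return names == null ? Collections.<String>emptyList() : names;
    }
}
```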

Autoclassifying a Document

You can automatically classify documents with MIME types for which a classification infrastructure has been previously set up. That is, for a particular MIME type, a corresponding document classifier and a DocumentClassificationAction object must exist.

The following Java and C# examples show how to submit a document of MIME type "text/pdf" for automatic classification. The previous sections in this topic include code examples of a document classifier and a DocumentClassificationAction object that support this MIME type.

In the examples, a Document object is created for a PDF document, and the object's properties are set, most notably the ContentElements property and the MimeType property. The ContentElements property is set to the PDF content of the document, and this content will later be parsed by the document classifier. Note that the value of the Document object's MimeType property must match the value of the DocumentClassificationAction object's MimeType property. The Document object is then checked in, with the checkin method specifying the AUTO_CLASSIFY constant.

The examples also include code to monitor the classification process by reading the checked-in document's ClassificationStatus property, which is set to a DocClassificationStatus constant. When an autoclassification request is made, the initial ClassificationStatus value is CLASSIFICATION_PENDING. The code repeatedly checks the status until the property's value changes.

Note that because a document classifier runs as an asynchronous action, an autoclassification request is initially queued and is represented by a DocumentClassificationQueueItem object. This queued state corresponds to the CLASSIFICATION_PENDING status.
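
Because classification is asynchronous, a loop that rereads the status without pausing can send many requests to the server and never ends if the status stays pending. The following is a minimal sketch of a bounded polling loop that pauses between reads. It is not part of the Content Engine API: the StatusPoller class, its method, and its parameters are hypothetical, and a Supplier stands in for the code that refreshes the document and reads its ClassificationStatus property.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of the Content Engine API) that polls a
// status value with a pause between reads and an upper bound on attempts.
final class StatusPoller {

    private StatusPoller() {}

    // Reads the status up to maxAttempts times, sleeping sleepMillis between
    // reads, and returns the first value that differs from pendingValue.
    // If every read is still pending, or the thread is interrupted, the
    // last value read is returned.
    static <T> T pollWhilePending(Supplier<T> readStatus, T pendingValue,
                                  int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        T status = readStatus.get();
        for (int attempt = 1; attempt < maxAttempts && pendingValue.equals(status); attempt++) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                return status;
            }
            status = readStatus.get();
        }
        return status;
    }
}
```

In the Java example that follows, the supplier could call doc.refresh() and then return doc.get_ClassificationStatus(), with DocClassificationStatus.CLASSIFICATION_PENDING passed as the pending value. The attempt limit and the sleep interval are choices for the application and are not prescribed by this topic.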

Java Example

...
   Document doc = Factory.Document.createInstance(os, "Document");
   FileInputStream fileIS = new FileInputStream("C:\\EclipseWorkspace\\Documents\\loanapplication.pdf");

   // Create content transfer list.
   ContentTransferList contentList = Factory.ContentTransfer.createList();
   ContentTransfer ctNew = Factory.ContentTransfer.createInstance();
   ctNew.setCaptureSource(fileIS);
   contentList.add(ctNew);

   // Set content on Document object.
   doc.set_ContentElements(contentList);
   
   // Set Document properties.
   doc.getProperties().putValue("DocumentTitle", "PDF Document");
   doc.set_MimeType("text/pdf");
   
   // Check in document and commit to server.
   doc.checkin(AutoClassify.AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
   doc.save(RefreshMode.REFRESH);

   // Check classification status during auto classify.
   while (doc.get_ClassificationStatus() == DocClassificationStatus.CLASSIFICATION_PENDING)
   {
      System.out.println( "Classification status is " + doc.get_ClassificationStatus() );
      doc.refresh();
   }
   System.out.println("Classification status is " + doc.get_ClassificationStatus() );
}

C# Example

...
   IDocument doc = Factory.Document.CreateInstance(os, "Document");
   Stream fileStream = File.OpenRead(@"C:\EclipseWorkspace\Documents\loanapplication.pdf");

   // Create content transfer list.
   IContentTransferList contentList = Factory.ContentTransfer.CreateList();
   IContentTransfer ctNew = Factory.ContentTransfer.CreateInstance();
   ctNew.SetCaptureSource(fileStream);
   contentList.Add(ctNew);

   // Set content on Document object.
   doc.ContentElements = contentList;
   
   // Set Document properties.
   doc.Properties["DocumentTitle"] = "PDF Document";
   doc.MimeType = "text/pdf";

   // Check in document and commit to server.
   doc.Checkin(AutoClassify.AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
   doc.Save(RefreshMode.REFRESH);

   // Check classification status during auto classify.
   while (doc.ClassificationStatus == DocClassificationStatus.CLASSIFICATION_PENDING)
   {
      System.Console.WriteLine("Classification status is " + doc.ClassificationStatus);
      doc.Refresh();
   }
   System.Console.WriteLine("Classification status is " + doc.ClassificationStatus);
}