Document Classification
Documents that are checked into the Content Engine require a class. You can classify a document manually by selecting its class, or the Content Engine can classify it automatically when the document is checked in. The Content Engine provides an extensible framework that automatically assigns an incoming document of a specified MIME type to a target document class. The framework then sets selected properties of that target class from values found in the incoming document. A classification component, or classifier, does the work of assigning a document class. One such classifier, the XML classifier, is packaged with the Content Engine. For more information, see Classification Flowchart and Understanding the XML Classifier.
You can also plug custom classifiers that are implemented with JavaScript or Java™ into the document classification framework. For information, see Understanding Automatic Document Classification.
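The following sketch gives a rough idea of how property values can be set from an incoming XML document. It uses the JDK's DOM and XPath APIs. It is not the packaged XML classifier or its configuration format. The following details are all assumptions for illustration:

- the rule that the root element selects the target class
- the `invoice` element
- the `Invoice` class name
- the XPath-to-property mappings

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical mapping rules, not the packaged XML classifier's configuration:
// the root element name selects the target class, and each XPath expression
// selects the value for one property of that class.
class XmlPropertyMapper {
    static final Map<String, String> CLASS_BY_ROOT = Map.of("invoice", "Invoice");
    static final Map<String, String> XPATH_BY_PROPERTY = Map.of(
            "InvoiceNumber", "/invoice/@number",
            "Vendor", "/invoice/vendor");

    // Parses the incoming content; wraps the parser's checked exceptions.
    static Document parse(String content) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("content is not well-formed XML", e);
        }
    }

    // Returns the target class for the root element, or null when no rule matches.
    static String targetClass(Document xml) {
        return CLASS_BY_ROOT.get(xml.getDocumentElement().getTagName());
    }

    // Evaluates each XPath rule against the document and collects the property values.
    static Map<String, String> extractProperties(Document xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> props = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> rule : XPATH_BY_PROPERTY.entrySet()) {
                props.put(rule.getKey(), xpath.evaluate(rule.getValue(), xml));
            }
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
        return props;
    }
}
```

For `<invoice number="A-17"><vendor>Acme</vendor></invoice>`, this model selects the `Invoice` class and the following property values:

- `InvoiceNumber` = `A-17`
- `Vendor` = `Acme`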
Custom Classifier Requirements
To plug a custom classifier into the document classification framework, complete these tasks:
- Implement the DocumentClassifier interface of the Content Engine Java API as a JavaScript or Java component. Your classifier must assign the applicable class to the Document object that is passed to it. Typically, this involves extracting metadata from the content of the Document object and mapping it to the properties that the object inherits from its class. You can implement your classifier to support one or more MIME types.
  For a classifier that is implemented in Java, you can package the class in a JAR file. You can then check in the class or JAR file as a CodeModule object in a Content Engine object store. Alternatively, you can place the classifier on the class path of the application server where the Content Engine runs. A document classifier runs asynchronously on the Content Engine.
- Create a DocumentClassificationAction object. Instantiate the object with the Factory.DocumentClassificationAction.createInstance method, and then set the following required properties:
  - DisplayName: the display name that identifies the document classification action.
  - MimeType: the MIME type that associates documents of a given content type with the classifier implemented to handle documents of that type. A MIME type can map to only one classifier. Therefore, the MimeType property of each DocumentClassificationAction object must differ from the MimeType property of every other DocumentClassificationAction object.
  - ProgId: the program ID that identifies the classifier implemented in JavaScript or Java. If you checked a Java classifier into an object store, you must also set the DocumentClassificationAction object's CodeModule property.
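The classifier contract can be modeled in plain Java. This is a minimal, self-contained sketch rather than the Content Engine API. `SimpleDocument` and `SketchClassifier` are hypothetical stand-ins for the Document object and the DocumentClassifier interface, so the sketch compiles without the Content Engine JARs. The following are also assumptions:

- the "Key: value" memo format
- the `Memo` class name
- the `Subject` and `Author` property names

For the actual interface signature, see Working with Document Classification-related Objects.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a Content Engine Document: a class name,
// a MIME type, raw content, and a bag of properties.
class SimpleDocument {
    String className;
    final String mimeType;
    final byte[] content;
    final Map<String, String> properties = new HashMap<>();

    SimpleDocument(String className, String mimeType, byte[] content) {
        this.className = className;
        this.mimeType = mimeType;
        this.content = content;
    }
}

// Hypothetical stand-in for the DocumentClassifier interface.
interface SketchClassifier {
    Set<String> supportedMimeTypes();
    void classify(SimpleDocument doc);
}

// Classifies plain-text memos whose content starts with "Key: value" header lines.
class MemoClassifier implements SketchClassifier {
    @Override
    public Set<String> supportedMimeTypes() {
        // One classifier may handle several MIME types.
        return Set.of("text/plain", "text/x-memo");
    }

    @Override
    public void classify(SimpleDocument doc) {
        String text = new String(doc.content, StandardCharsets.UTF_8);
        Map<String, String> headers = new HashMap<>();
        for (String line : text.split("\n")) {
            int colon = line.indexOf(':');
            if (line.isBlank() || colon < 0) {
                break; // headers end at the first blank or non-header line
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        if (!headers.containsKey("Subject")) {
            throw new IllegalArgumentException("not a memo: no Subject header");
        }
        // Assign the target class, then map extracted metadata to its properties.
        doc.className = "Memo";
        doc.properties.put("Subject", headers.get("Subject"));
        doc.properties.put("Author", headers.getOrDefault("From", "unknown"));
    }
}
```

In this model, a classifier that cannot recognize the content throws an exception. The Content Engine API, not this sketch, defines how a real classifier reports failure.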
For code examples on implementing a document classifier and on creating a DocumentClassificationAction object, see Working with Document Classification-related Objects. For more information, see Action Handlers.
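The required properties and the one-MIME-type-to-one-classifier rule can be modeled as a small registry. This is a hypothetical sketch, not Factory.DocumentClassificationAction:

- `ClassificationActionSpec` mirrors the DisplayName, MimeType, ProgId, and optional CodeModule properties.
- `ClassificationActionRegistry` rejects a second action for a MIME type that is already mapped.
- The ProgId `com.example.MemoClassifier` is made up.

This sketch does not cover whether, or how, the Content Engine itself reports such a conflict.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of a DocumentClassificationAction's properties.
// codeModuleId is null when the classifier is on the application server class path.
record ClassificationActionSpec(String displayName, String mimeType, String progId, String codeModuleId) {
    ClassificationActionSpec {
        Objects.requireNonNull(displayName, "DisplayName is required");
        Objects.requireNonNull(mimeType, "MimeType is required");
        Objects.requireNonNull(progId, "ProgId is required");
    }
}

// Enforces the rule that a MIME type maps to only one classifier.
class ClassificationActionRegistry {
    private final Map<String, ClassificationActionSpec> byMimeType = new HashMap<>();

    void register(ClassificationActionSpec action) {
        if (byMimeType.containsKey(action.mimeType())) {
            throw new IllegalStateException("MIME type already mapped: " + action.mimeType());
        }
        byMimeType.put(action.mimeType(), action);
    }

    // Returns the action registered for the MIME type, or null if there is none.
    ClassificationActionSpec lookup(String mimeType) {
        return byMimeType.get(mimeType);
    }
}
```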
Document Check-in Process
You can automatically classify documents whose content type matches the MimeType property of an existing DocumentClassificationAction object.
To automatically classify a new document, create a Document object and complete these tasks:
- Set the Document object's MimeType property to match the value of the DocumentClassificationAction object's MimeType property. The Document object's MimeType property is not required, but setting it is recommended to ensure that the intended document classifier is started. If you don't set the MimeType property, the Content Engine maps the content element's file extension to a MIME type, and the result might not be what you expect. See About MIME Types.
- Call the checkin method of the Document object, specifying the AUTO_CLASSIFY constant.
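A small resolver can illustrate why setting MimeType explicitly is recommended. This is a hypothetical model. `MimeTypeResolver` and its extension table are assumptions. The Content Engine's actual extension-to-MIME-type mapping is described in About MIME Types.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of how a document's effective MIME type is chosen.
// The extension table is illustrative only; the Content Engine's own mapping may differ.
class MimeTypeResolver {
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "xml", "text/xml",
            "txt", "text/plain",
            "pdf", "application/pdf");

    static Optional<String> effectiveMimeType(String explicitMimeType, String fileName) {
        if (explicitMimeType != null && !explicitMimeType.isEmpty()) {
            return Optional.of(explicitMimeType); // an explicitly set MimeType is used as-is
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to map
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }
}
```

Consider a DocumentClassificationAction registered for the made-up type `application/x-invoice+xml`. If the document's MimeType is left unset, an `inv.xml` content element resolves to `text/xml` in this model, so the intended classifier would not be selected.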
The Document object is checked into the object store with an initial class, and the object's ClassificationStatus property is set to CLASSIFICATION_PENDING.
Document classification is an asynchronous action. Therefore, the auto-classification request is queued as a DocumentClassificationQueueItem object. The Classification Manager dequeues each classification request and processes it. First, it obtains the MIME type from the target document. Next, it locates the DocumentClassificationAction object that is registered for that MIME type. Finally, it starts the classifier that is identified in that DocumentClassificationAction object.
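The queue-and-dispatch flow can be modeled in a few lines of Java. The sketch is hypothetical:

- `QueuedDoc` stands in for the Document object.
- The `ArrayDeque` stands in for DocumentClassificationQueueItem objects.
- A map from MIME type to a `Consumer` stands in for the registered DocumentClassificationAction objects.

For clarity, the sketch runs synchronously, whereas the Content Engine processes the queue asynchronously.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical document model: id, MIME type, current class name, and status.
class QueuedDoc {
    final String id;
    final String mimeType;
    String className;
    String status = "NONE";

    QueuedDoc(String id, String mimeType, String initialClass) {
        this.id = id;
        this.mimeType = mimeType;
        this.className = initialClass;
    }
}

// Models check-in with AUTO_CLASSIFY plus the Classification Manager's dequeue step.
class ClassificationManagerSketch {
    private final Deque<QueuedDoc> queue = new ArrayDeque<>();                     // queued requests
    private final Map<String, Consumer<QueuedDoc>> classifiers = new HashMap<>(); // MIME type -> classifier

    void registerAction(String mimeType, Consumer<QueuedDoc> classifier) {
        classifiers.put(mimeType, classifier);
    }

    // Check-in keeps the initial class, marks the document pending, and queues the request.
    void checkinAutoClassify(QueuedDoc doc) {
        doc.status = "CLASSIFICATION_PENDING";
        queue.addLast(doc);
    }

    // Dequeues one request, finds the classifier for the document's MIME type, and runs it.
    // Returns false when the queue is empty. An item with no matching action is simply
    // skipped here; what the Content Engine does in that case is not modeled.
    boolean processNext() {
        QueuedDoc doc = queue.pollFirst();
        if (doc == null) {
            return false;
        }
        Consumer<QueuedDoc> classifier = classifiers.get(doc.mimeType);
        if (classifier != null) {
            classifier.accept(doc);
        }
        return true;
    }

    int pending() {
        return queue.size();
    }
}
```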
A classifier operates with the same access permissions as the user who initiates the document check-in.
When document classification is complete, the Content Engine updates the Document object's ClassificationStatus property to indicate success or failure. If classification fails, the document keeps its initial class. If classification succeeds, the document is assigned a new class and a ClassifyCompleteEvent object is triggered.
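The outcome handling can be sketched as follows. `ClassificationState`, `ClassifiedDoc`, and `ClassificationOutcome` are hypothetical names. The Content Engine API defines the real ClassificationStatus values and the ClassifyCompleteEvent subscription mechanism. The sketch treats a thrown exception as a classification failure, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical outcome states; the real ClassificationStatus values come from the Content Engine API.
enum ClassificationState { PENDING, SUCCEEDED, FAILED }

class ClassifiedDoc {
    String className;
    ClassificationState state = ClassificationState.PENDING;

    ClassifiedDoc(String initialClass) {
        this.className = initialClass;
    }
}

class ClassificationOutcome {
    // Listeners stand in for subscriptions to ClassifyCompleteEvent.
    private final List<Consumer<ClassifiedDoc>> onClassifyComplete = new ArrayList<>();

    void subscribe(Consumer<ClassifiedDoc> listener) {
        onClassifyComplete.add(listener);
    }

    // Runs the classifier; restores the initial class on failure, notifies listeners on success.
    void run(ClassifiedDoc doc, Consumer<ClassifiedDoc> classifier) {
        String initialClass = doc.className;
        try {
            classifier.accept(doc);
        } catch (RuntimeException e) {
            doc.className = initialClass; // failure: the initial class remains
            doc.state = ClassificationState.FAILED;
            return;
        }
        doc.state = ClassificationState.SUCCEEDED;
        for (Consumer<ClassifiedDoc> listener : onClassifyComplete) {
            listener.accept(doc);
        }
    }
}
```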