Understanding automatic document classification

The FileNet® P8 Platform object model describes how objects receive their properties and security from the class on which they are based. The automatic classification framework is a means whereby the values assigned to some of those properties are derived from the content of the document itself.

An incoming document of a specified MIME (content) type can be automatically assigned to a target document class, and selected properties of that target class can be set from values that are found in the incoming document. A classification component, or classifier, does the work of assigning the document class. One classifier that is packaged with Content Platform Engine is the XML Classifier, which automatically classifies documents of content type text/xml. For more information about the XML Classifier, see Understanding the XML Classifier.

In addition to the XML Classifier, you can add custom classifiers to your site. A custom classifier implements the Java™ DocumentClassifier interface. Custom classifiers are typically coded by a developer and managed by an administrator by using the administration console. For more information about creating custom Java-implemented classifiers, see Document Classification Concepts.

The automatic document classification process

You specify automatic classification when you create a document. After the document is checked in, Document Classification Manager determines which classification component, or classifier, to use to classify the document based on the document's content type (MIME type). Document Classification Manager starts the appropriate classifier; after control returns from the classifier, Document Classification Manager sets the classification status on the target document to indicate success or failure, depending on the error status that is returned by the classifier.
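The dispatch logic described above can be sketched in plain Java. This is a minimal illustration, not the Content Platform Engine API: the names DispatchSketch, Doc, Classifier, Manager, register, and process are hypothetical, and treating a content type with no registered classifier as a failure is an assumption made only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration only; NOT the Content Platform Engine API.
class DispatchSketch {
    enum Status { NOT_CLASSIFIED, PENDING, FAILED, COMPLETE }

    static class Doc {
        final String mimeType;
        String className = "Document";          // class assigned at creation
        Status status = Status.NOT_CLASSIFIED;
        Doc(String mimeType) { this.mimeType = mimeType; }
    }

    interface Classifier {
        // Assigns the target class (and possibly property values); throws on failure.
        void classify(Doc doc) throws Exception;
    }

    static class Manager {
        private final Map<String, Classifier> byMimeType = new HashMap<>();

        void register(String mimeType, Classifier classifier) {
            byMimeType.put(mimeType, classifier);
        }

        // Picks the classifier by content type, runs it, then records the outcome.
        void process(Doc doc) {
            doc.status = Status.PENDING;
            Classifier classifier = byMimeType.get(doc.mimeType);
            try {
                if (classifier == null) {
                    // Assumption for this sketch: an unmatched content type counts as a failure.
                    throw new IllegalStateException("no classifier for " + doc.mimeType);
                }
                classifier.classify(doc);
                doc.status = Status.COMPLETE;
            } catch (Exception e) {
                doc.status = Status.FAILED;
            }
        }
    }
}
```

In this sketch, a classifier registered for text/xml that sets className leaves the document with status COMPLETE, while a classifier that throws leaves it with status FAILED.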

For information about the steps involved when you use the XML Classifier, see Classification flowchart.

Asynchronous classification

Automatic classification is processed asynchronously. Asynchronous processing means that classification occurs within a separate transaction and that the server does not wait for the classification operation to complete before returning control to the client. When you request automatic classification of a document during check-in, the check-in action typically completes before the automatic classification transaction does.

Content Platform Engine ensures that asynchronous automatic classification requests are executed. A request is never lost: each request results in either the successful classification of the target document or the recording of an appropriate error.
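The asynchronous behavior can be illustrated with a small, self-contained Java sketch. It is not the server's implementation: AsyncClassificationSketch, Doc, checkIn, and the single-threaded worker are hypothetical stand-ins, and the returned future exists only so that a demonstration can wait for the outcome. The sketch shows the check-in call returning while the status is still pending, and every request ending in either success or a recorded error.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical model for illustration only; NOT the Content Platform Engine API.
class AsyncClassificationSketch {
    enum Status { NOT_CLASSIFIED, PENDING, FAILED, COMPLETE }

    static class Doc {
        volatile Status status = Status.NOT_CLASSIFIED;
        volatile String error;                   // set only when classification fails
    }

    // Stands in for the server's background processing.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Queues the classification and returns immediately; the caller does not wait.
    CompletableFuture<Void> checkIn(Doc doc, Consumer<Doc> classifier) {
        doc.status = Status.PENDING;
        return CompletableFuture.runAsync(() -> {
            try {
                classifier.accept(doc);
                doc.status = Status.COMPLETE;
            } catch (RuntimeException e) {
                // The request is not lost: the failure is recorded on the document.
                doc.error = e.getMessage();
                doc.status = Status.FAILED;
            }
        }, worker);
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

A client that calls checkIn sees the PENDING status until the background task finishes, at which point the status is either COMPLETE or FAILED.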

Security

A document's security and system-generated properties (such as the date and time and initial storage location) are determined by the document class, or subclass, that the document is initially assigned to before it is automatically classified. These settings are not changed when automatic classification assigns the document to the target document class.

Asynchronous automatic classification executes with the access permissions of the user who initiates the classification request. For example, if you request that a document be asynchronously automatically classified, the classifier that carries out the operation executes as if you had invoked it directly. The classifier can access any object that you have access rights to, and it cannot access any object for which you have insufficient access rights.

Classification status

The document object includes a read-only property called Classification Status whose value is automatically assigned by the system. This property indicates the automatic classification status of a document, which gives client applications a way to discover whether a particular document was submitted for automatic classification and whether the operation was successful. The property can have one of the following values:

Not classified
The document was not submitted for automatic classification.
Pending
The document was submitted for automatic classification and the result is pending.
Failed
The document was submitted for automatic classification but classification failed.
Complete
The document was submitted for automatic classification and classification was successfully completed.
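
A client application might interpret these four values along the following lines. This is a hedged sketch: the enum ClassificationStatus, its helper methods, and nextStep are hypothetical names invented for illustration, and they mirror the values listed above, not the actual property type exposed by the product API.

```java
// Hypothetical model for illustration only; NOT the Content Platform Engine API.
class ClassificationStatusSketch {
    enum ClassificationStatus {
        NOT_CLASSIFIED, PENDING, FAILED, COMPLETE;

        // True if the document was submitted for automatic classification at all.
        boolean wasSubmitted() {
            return this != NOT_CLASSIFIED;
        }

        // True once classification has ended, successfully or not.
        boolean isFinished() {
            return this == FAILED || this == COMPLETE;
        }
    }

    // Suggests what a client could do next for each status value.
    static String nextStep(ClassificationStatus status) {
        switch (status) {
            case NOT_CLASSIFIED: return "nothing to wait for";
            case PENDING:        return "check again later";
            case FAILED:         return "report the failure";
            case COMPLETE:       return "use the classified document";
            default: throw new IllegalArgumentException("unknown status: " + status);
        }
    }
}
```

Because classification is asynchronous, a client that reads the status immediately after check-in should expect to see the pending value and check again later.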