Using Content Classification to classify documents

IBM® Content Classification is an enterprise platform for a wide range of applications that require unstructured content to be categorized automatically. For more information, see the IBM Content Classification information center.

About this task

IBM Content Classification automates the organization of unstructured content by analyzing the full text of documents and email and applying rules. When Content Collector is integrated with IBM Content Classification version 8.6 or higher, you can use the classification schema of an existing Content Classification knowledge base with your repository, so that documents are classified according to their content. With IBM Content Classification version 8.7 or higher, you can also use decision plans, which combine rules and classification schemas to analyze content or to associate content with multiple knowledge bases. For example, you can use a decision plan to extract metadata from documents.

Content Classification provides a classification or additional metadata that Content Collector can then use, for example to determine whether an item should be captured and how it should be processed as it is captured and filed. With Content Classification, you can use advanced, content-based classification rather than relying solely on document metadata or source location metadata.

For example, to meet compliance and records management initiatives, or to prepare collections of data for legal discovery, Content Classification can distinguish important email from email that has no business value. Based on the content analysis, Content Collector determines the appropriate action to take. An email that discusses a patent application might be copied to a FileNet® P8 folder and declared as a record in IBM Enterprise Records. By contrast, an email that discusses patent leather might be filtered out and not archived.

If you use a Content Classification decision plan to classify your documents, Content Classification can provide further data in addition to a categorization. You can access all metadata fields that are filled by Content Classification and use them in Content Collector.

For example, if Content Classification extracts a reference number from all documents in one category, Content Collector can use this reference number to file each document into a folder that is specific to that reference number.
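In Content Collector, this kind of filing is configured in the task route rather than written as code. The following Python sketch only models the decision logic conceptually. The category name "Contracts", the metadata field "ReferenceNumber", and the folder root "/Contracts" are hypothetical examples, not names defined by either product.

```python
from typing import Dict, Optional

# Hypothetical folder root; in a real task route the target folder is
# configured in Content Collector, not computed by Python code.
FOLDER_ROOT = "/Contracts"


def target_folder(category: str, metadata: Dict[str, str]) -> Optional[str]:
    """Return a reference-number folder for a classified document.

    Returns None when the document is in another category or when no
    reference number was extracted, meaning "use the default filing rule".
    """
    if category != "Contracts":  # hypothetical category name
        return None
    ref = metadata.get("ReferenceNumber", "").strip()  # hypothetical field
    if not ref:
        return None
    return f"{FOLDER_ROOT}/{ref}"
```

Returning None instead of raising keeps documents without an extracted reference number on the normal filing path rather than stopping the route.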

To use Content Classification in Content Collector, add an IBM Content Classification task to any task route. You can configure the task to use either a Content Classification knowledge base or a Content Classification decision plan.

When you add the IBM Content Classification task to a Microsoft SharePoint task route that is configured to process a version series, add a decision point and rule that routes only the last version of the document into the IBM Content Classification task. Having the IBM Content Classification task process all of the versions of a document results in redundant processing and might yield unexpected results later in the task route.
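The decision point described above passes only the newest version of each version series to the classification task. The following Python sketch models that selection conceptually. It is not Content Collector configuration. The `Item` class and its fields (`version_series_id`, `version_number`, `title`) are hypothetical stand-ins for the properties of a captured SharePoint document version.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Item:
    # Hypothetical model of one captured document version.
    version_series_id: str
    version_number: int
    title: str


def latest_versions(items: List[Item]) -> List[Item]:
    """Keep only the highest-numbered version of each version series.

    Only these items would be routed to the classification task; older
    versions would bypass it, which avoids classifying the same document
    several times.
    """
    latest: Dict[str, Item] = {}
    for item in items:
        current = latest.get(item.version_series_id)
        if current is None or item.version_number > current.version_number:
            latest[item.version_series_id] = item
    return list(latest.values())


items = [
    Item("doc-A", 1, "Specification draft"),
    Item("doc-A", 2, "Specification final"),
    Item("doc-B", 1, "Meeting memo"),
]
to_classify = latest_versions(items)
print([(i.version_series_id, i.version_number) for i in to_classify])
```

Here `to_classify` holds version 2 of doc-A and version 1 of doc-B, so each document is classified exactly once.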

Restriction:

When Microsoft SharePoint list items are processed by the IBM Content Classification task, list item attachments are not included.
For IBM Connections, only documents from the Files application are compatible with Content Classification. Configure your task routes so that only Files documents are passed to the IBM Content Classification task.