IBM® Content Classification uses a combination of text-based analysis and rule-based analysis to determine how each document is classified.
To classify content according to statistics and text-based analysis, you create one or more knowledge bases. A knowledge base consists of categories that can correspond to folders, document classes, item types, or attributes in the source repository. During run time, the classification process analyzes each document and generates scores (from 0 to 100) for each category in the knowledge base. The scores indicate the relevance of each category to the document. The higher the category score, the more likely that the document belongs in that category.
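As a rough illustration of how the category scores might be read, the following Python sketch ranks the scores that were generated for one document. The category names, the score values, and the `rank_categories` helper are hypothetical and are not part of the Content Classification API.

```python
from typing import Dict, List, Tuple


def rank_categories(scores: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (category, score) pairs sorted from most to least relevant."""
    for category, score in scores.items():
        # Scores are expected to fall in the documented 0 to 100 range.
        if not 0 <= score <= 100:
            raise ValueError(f"score for {category!r} is outside 0-100: {score}")
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for a single document (illustrative values only).
example_scores = {"Invoices": 87, "Contracts": 34, "Correspondence": 12}
ranked = rank_categories(example_scores)

# The first entry is the category that the document most likely belongs in.
best_category, best_score = ranked[0]
```

In this sketch, the highest-scoring category is treated as the best match, which mirrors the statement that a higher score means that the document is more likely to belong in that category.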
To classify content according to rules and keyword-based analysis, you create a decision plan. A decision plan consists of rules. Each rule specifies conditions that might cause a document to be classified (classification triggers) and the classification actions to apply. During run time, the classification process evaluates the document content and metadata (document properties). When conditions in the document trigger a rule, the associated actions are applied, such as moving the document to a folder. With IBM FileNet® Content Manager content, for example, an action might be to declare the document as a record in IBM Enterprise Records.
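The relationship between triggers and actions can be sketched in Python as follows. This is a simplified model only: the `Document` and `Rule` classes, the `evaluate` function, the folder path, and the action strings are assumptions made for illustration and do not reflect how decision plans are actually defined or stored in the product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical document: text content plus metadata (document properties)."""
    content: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    """Hypothetical rule: a trigger condition and the actions to apply."""
    name: str
    trigger: Callable[[Document], bool]
    actions: List[str]


def evaluate(rules: List[Rule], doc: Document) -> List[str]:
    """Collect the actions of every rule whose trigger matches the document."""
    applied: List[str] = []
    for rule in rules:
        if rule.trigger(doc):
            applied.extend(rule.actions)
    return applied


rules = [
    # Keyword-based trigger on the document content.
    Rule(
        name="invoice keyword",
        trigger=lambda d: "invoice" in d.content.lower(),
        actions=["move to folder /Finance/Invoices"],  # illustrative path
    ),
    # Metadata-based trigger on a document property.
    Rule(
        name="legal department",
        trigger=lambda d: d.properties.get("department") == "Legal",
        actions=["declare as record"],
    ),
]

doc = Document(
    content="Invoice 1042 for consulting services",
    properties={"department": "Sales"},
)
applied_actions = evaluate(rules, doc)
```

Here only the keyword rule is triggered, because the document text contains "invoice" but the `department` property is not `Legal`.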
To use a knowledge base to classify content, you must associate the knowledge base with a decision plan. A decision plan can refer to one or more knowledge bases for a combination of rule-based and text-based classification. You can create a knowledge base, reference the knowledge base in your decision plan, and define rules that are based on whether the score for a suggested category is above or below a specific confidence threshold.
For example, a new document is analyzed and one of the categories in the knowledge base receives a score of 87, meaning that there is an 87% likelihood that the document belongs in that category. If you configured a rule in your decision plan to move documents to a particular folder when the score for that category exceeds a threshold of 80, the document is moved to the specified folder when it is classified.
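The threshold check in this example can be expressed as a short Python sketch. The `choose_folder` function, the category name, and the folder path are hypothetical, and the sketch assumes that "exceeds" means strictly greater than the threshold.

```python
from typing import Dict, Optional


def choose_folder(
    scores: Dict[str, int],
    category: str,
    folder: str,
    threshold: int = 80,
) -> Optional[str]:
    """Return the target folder if the category score exceeds the threshold.

    Returns None when the score is at or below the threshold, or when the
    knowledge base produced no score for the category.
    """
    score = scores.get(category, 0)
    return folder if score > threshold else None


# Score of 87 for the category, as in the example above (names are illustrative).
target = choose_folder({"Invoices": 87}, "Invoices", "/Finance/Invoices")
```

With a score of 87 and a threshold of 80, `target` is the configured folder. A score of exactly 80 would not trigger the move under the strict comparison that is assumed here.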