Category table: Stealing/Stolen

In some cases, poor knowledge base performance occurs when categories "steal" content items from each other (that is, when inappropriate categories receive high scores for items). Ideally, IBM® Content Classification should assign high scores to the most applicable categories. When this is not the case, it is important to examine which categories are involved.

For a given category, the Stealing/Stolen table shows the percentage of items initially assigned to the selected category that received higher scores in other categories. The table also lists the categories whose content items score higher in the selected category than in the categories initially assigned to them. The numbers in parentheses after the percentage values are the actual numbers of content items.

Figure 1. Stealing/Stolen Table
A sample table that lists the categories with a higher score, the percentage of content items stolen by the category, and the percentage of content items stolen from the category.
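
As a rough illustration of how the figures in such a table could be derived, the following Python sketch computes both columns from a list of (assigned category, highest-scoring category) pairs. It is not part of Content Classification. The function names, the input format, and the choice of denominators (items assigned to the selected category for the "stolen from" percentage, items assigned to the other category for the "stolen by" percentage) are assumptions made for this example.

```python
from collections import Counter


def build_counts(items):
    """Count items per assigned category and per (assigned, top-scoring) mismatch.

    `items` is a hypothetical input: an iterable of
    (assigned_category, highest_scoring_category) pairs.
    """
    totals = Counter()  # assigned category -> number of items
    moved = Counter()   # (assigned, top) -> number of items, only when they differ
    for assigned, top in items:
        totals[assigned] += 1
        if top != assigned:
            moved[(assigned, top)] += 1
    return totals, moved


def stealing_stolen_rows(selected, totals, moved):
    """Return one row per category that steals from, or is stolen from by, `selected`.

    Each row is (other, stolen_from_selected_pct, stolen_from_selected_count,
    stolen_by_selected_pct, stolen_by_selected_count).
    The denominators are an assumption, not documented product behavior.
    """
    categories = set(totals) | {c for pair in moved for c in pair}
    categories.discard(selected)
    rows = []
    for other in sorted(categories):
        lost = moved[(selected, other)]    # selected's items that score higher in `other`
        gained = moved[(other, selected)]  # other's items that score higher in `selected`
        if lost == 0 and gained == 0:
            continue
        lost_pct = 100.0 * lost / totals[selected] if totals[selected] else 0.0
        gained_pct = 100.0 * gained / totals[other] if totals[other] else 0.0
        rows.append((other, lost_pct, lost, gained_pct, gained))
    return rows


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        ("Billing", "Billing"), ("Billing", "Refunds"),
        ("Billing", "Refunds"), ("Billing", "Billing"),
        ("Refunds", "Refunds"), ("Refunds", "Billing"),
        ("Refunds", "Refunds"), ("Refunds", "Refunds"),
        ("Shipping", "Shipping"), ("Shipping", "Shipping"),
    ]
    totals, moved = build_counts(sample)
    for other, lost_pct, lost, gained_pct, gained in stealing_stolen_rows("Billing", totals, moved):
        print(f"{other}: stolen from Billing {lost_pct:.0f}% ({lost}), "
              f"stolen by Billing {gained_pct:.0f}% ({gained})")
```

With the invented sample data, the selected category Billing loses 2 of its 4 items to Refunds and attracts 1 of the 4 items assigned to Refunds, which mirrors the "percentage (count)" layout described above.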

Improving performance

When a category is found to be "stealing" a significant percentage of content items from other categories, examine the categories in question and identify the cause of this behavior. When categories steal from each other, this indicates that Content Classification does not perceive a clear difference between them. In this case, carefully redefine each of the categories, categorize the items in the content set again, and retrain the knowledge base.
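
One way to shortlist category pairs that steal from each other is sketched below in Python. The product does not define what counts as a "significant" percentage, so the `threshold` parameter, the function name, and the `stolen_pct` input format are all hypothetical choices for this example.

```python
def mutual_stealing_pairs(stolen_pct, threshold=10.0):
    """Return category pairs that steal from each other above `threshold` percent.

    `stolen_pct` is a hypothetical mapping from (victim, thief) to the
    percentage of the victim's items that score higher in the thief.
    `threshold` is an arbitrary cutoff, not a product setting.
    """
    pairs = []
    for (victim, thief), pct in stolen_pct.items():
        if victim < thief:  # visit each unordered pair only once
            reverse = stolen_pct.get((thief, victim), 0.0)
            if pct >= threshold and reverse >= threshold:
                pairs.append((victim, thief, pct, reverse))
    return sorted(pairs)


if __name__ == "__main__":
    # Invented percentages, for illustration only.
    example = {
        ("Billing", "Refunds"): 50.0,
        ("Refunds", "Billing"): 25.0,
        ("Shipping", "Billing"): 5.0,
    }
    for a, b, a_to_b, b_to_a in mutual_stealing_pairs(example):
        print(f"{a} <-> {b}: {a_to_b:.0f}% / {b_to_a:.0f}%")
```

Pairs returned by such a check are candidates for the redefinition and retraining steps described above. One-directional stealing is deliberately left out here and would need a separate check.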

The Stealing/Stolen table provides insight into which categories Content Classification perceives to be similar. In some cases, discovering similar categories and the need to differentiate between them can lead to a decision to create a hierarchical knowledge base (using the Knowledge Base Editor), where both categories are placed under a principal node. See Knowledge Base Editor.