Summary report: Knowledge Base Data Sheet

The Knowledge Base Data Sheet summary report provides information that helps you evaluate and troubleshoot your knowledge base.

The report includes the following information:

Number of categories in the knowledge base

The total number of categories specified in the knowledge base.

Number of items in the testing set

The total number of items in the content set used to analyze the knowledge base.

For example, if your main content set contains 1,000 items and you choose the "even/odd" or "first half/second half" option to create and analyze the knowledge base, your testing set contains 500 items.
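The Python sketch below illustrates why both split options produce a 500-item testing set from 1,000 items. It assumes that one half is used to build the knowledge base and the other half is the testing set. The helper `split_content_set` is hypothetical and only mirrors the two option names; it is not part of Content Classification.

```python
def split_content_set(items, method):
    """Split a content set into (training, testing) halves.

    Hypothetical helper: assumes the first returned half builds the
    knowledge base and the second half is the testing set.
    """
    if method == "even/odd":
        # Alternate items: 0-based indices 0, 2, 4, ... vs. 1, 3, 5, ...
        return items[0::2], items[1::2]
    if method == "first half/second half":
        mid = len(items) // 2
        return items[:mid], items[mid:]
    raise ValueError(f"unknown split method: {method}")


content_set = [f"item-{i}" for i in range(1, 1001)]  # 1,000 items
for method in ("even/odd", "first half/second half"):
    training, testing = split_content_set(content_set, method)
    print(method, len(testing))  # 500 for both methods
```

With an odd number of items, the two halves differ in size by one.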

Total cumulative success

A table that indicates how well the knowledge base suggests the correct categories. This is useful information if you plan to use IBM® Content Classification to suggest a configured number of categories for each incoming item.

For example, the following Total Cumulative Success table indicates that for 83.33% of all items, the correct category was among the top three categories suggested by Content Classification:

For each number of top suggested categories (one, two, three, and so on), the table shows the percentage of items whose correct category was among those suggestions.
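The cumulative values in this table can be read as top-k accuracy. The Python sketch below shows one way to compute them, assuming each testing-set item has exactly one correct category and a ranked list of suggested categories. The function name `cumulative_success` and the sample data are invented for illustration, and Content Classification's own calculation might differ (for example, for items with several correct categories).

```python
def cumulative_success(suggestions, correct, max_k):
    """Return {k: percentage of items whose correct category is in the top k}.

    suggestions: for each item, suggested categories ranked best-first (assumed format).
    correct: for each item, its single correct category (assumption).
    """
    n = len(correct)
    table = {}
    for k in range(1, max_k + 1):
        # An item counts as a success at k if its correct category is in its top k.
        hits = sum(1 for ranked, truth in zip(suggestions, correct) if truth in ranked[:k])
        table[k] = round(100.0 * hits / n, 2)
    return table


# Invented sample data: six testing-set items.
suggestions = [
    ["Billing", "Sales", "Support"],   # correct: Billing (rank 1)
    ["Sales", "Support", "Billing"],   # correct: Support (rank 2)
    ["Support", "Billing", "Sales"],   # correct: Support (rank 1)
    ["Billing", "Support", "Sales"],   # correct: Sales (rank 3)
    ["Sales", "Billing", "Support"],   # correct: Sales (rank 1)
    ["Support", "Sales", "Billing"],   # correct: Returns (not suggested)
]
correct = ["Billing", "Support", "Support", "Sales", "Sales", "Returns"]

print(cumulative_success(suggestions, correct, max_k=3))
# {1: 50.0, 2: 66.67, 3: 83.33}
```

Because an item that succeeds at k also succeeds at every larger k, the percentages never decrease as the number of suggested categories grows.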

For a graphical representation of total cumulative success, see Summary graph: Cumulative Success.

Top performing categories (% of testing set/category rating)

A list of the ten best-performing categories. For each category, the following information is displayed:

% of testing set: The percentage of items in the testing set that belong to the category. Indicates the "size" of the category.
category rating: A percentage value that indicates how well the category performed. Higher values indicate better performance.

Poorest performing categories (% of testing set/category rating)

A list of the ten poorest-performing categories. Possible reasons for poor performance are too few items in the training content set or categories that are not distinctly defined.
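The "% of testing set" value follows directly from its definition, but the formula behind the category rating is not described in this report. The Python sketch below therefore computes the percentages and treats the ratings as given input when selecting the best- and poorest-performing categories. The function names and sample values are hypothetical.

```python
from collections import Counter


def percent_of_testing_set(correct):
    """Percentage of testing-set items that belong to each category."""
    counts = Counter(correct)
    n = len(correct)
    return {cat: round(100.0 * c / n, 2) for cat, c in counts.items()}


def top_and_poorest(ratings, n=10):
    """Return (n highest-rated categories, n lowest-rated categories).

    ratings: {category: rating}. How the rating is computed is
    product-specific, so it is treated here as a given input (assumption).
    """
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    # Best first for the top list; worst first for the poorest list.
    return ranked[:n], ranked[::-1][:n]


print(percent_of_testing_set(["Billing", "Billing", "Sales", "Support"]))
# {'Billing': 50.0, 'Sales': 25.0, 'Support': 25.0}
print(top_and_poorest({"Billing": 92.0, "Sales": 71.5, "Support": 48.0}, n=2))
# (['Billing', 'Sales'], ['Support', 'Sales'])
```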

Categories with overlapping intents

A list of the five pairs of categories whose intents overlap the most. You might want to merge two overlapping categories, define them more distinctly, or position them under the same node in the knowledge base.
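This report does not describe how Content Classification measures intent overlap. As a rough stand-in, the Python sketch below approximates overlap by counting how often two categories are confused with each other in the testing set, in either direction, and returns the most-confused pairs. This confusion-count approach, the function name, and the sample data are assumptions for illustration, not the product's method.

```python
from collections import Counter


def most_overlapping_pairs(correct, top_suggestion, n=5):
    """Rank category pairs by how often they are confused with each other.

    Assumption: overlap is approximated by counting items whose top
    suggestion differs from the correct category, in both directions.
    """
    confusions = Counter()
    for truth, guess in zip(correct, top_suggestion):
        if truth != guess:
            # frozenset makes (A, B) and (B, A) count as the same pair.
            confusions[frozenset((truth, guess))] += 1
    return [(tuple(sorted(pair)), count) for pair, count in confusions.most_common(n)]


correct = ["Billing", "Billing", "Refunds", "Refunds", "Sales", "Support"]
top_suggestion = ["Refunds", "Refunds", "Billing", "Refunds", "Support", "Support"]
print(most_overlapping_pairs(correct, top_suggestion))
# [(('Billing', 'Refunds'), 3), (('Sales', 'Support'), 1)]
```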

Categories that may be determined by external factors

A list of categories that might be determined by an external factor rather than by the content of the item, that is, not through natural language processing. For example, a company's knowledge base might have a category that is assigned to incoming email when the sender does not appear in the company's database. This category is assigned regardless of the content of the message.

You can add a field that contains the external data and write a knowledge base rule that specifies the conditions that must be met for the category to be assigned. For information about rules, see Using rules to differentiate between category sets.
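The field-plus-rule approach can be illustrated with plain Python logic. This sketch does not use Content Classification rule syntax; the field name `sender_in_database`, the category name `Unknown Sender`, and the helper `apply_external_rule` are all hypothetical.

```python
def apply_external_rule(item, known_senders):
    """Hypothetical rule: assign 'Unknown Sender' when the sender is not in the database.

    Mirrors the logic of a knowledge base rule; not Content Classification syntax.
    """
    # Step 1: add a field that carries the external data (hypothetical field name).
    item["sender_in_database"] = item["sender"] in known_senders
    # Step 2: the rule decides from that field alone, ignoring the message body.
    if not item["sender_in_database"]:
        return "Unknown Sender"
    # No rule fired: leave classification to the content-based knowledge base.
    return None


email = {"sender": "new.customer@example.com", "body": "Hello"}
print(apply_external_rule(email, known_senders={"alice@example.com"}))
# Unknown Sender
```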