The Knowledge Base Data Sheet summary report provides
information that helps you evaluate and troubleshoot your knowledge
base.
The report includes the following information:
- Number of categories in the knowledge base
- The total number of categories specified in the knowledge base.
- Number of items in the testing set
- The total number of items in the content set used to analyze the
knowledge base.
For example, if your main content set contains 1,000 items and you
choose an "even/odd" or "first half/second half" option to create and
analyze a knowledge base, the number of items in your testing set is 500.
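The arithmetic in this example can be illustrated with a minimal Python sketch. The function names are hypothetical, and the choice of which half is used for training and which for testing is an assumption made for illustration only; it is not taken from the product.

```python
def split_even_odd(items):
    """Split a content set by position: even positions vs. odd positions."""
    training = items[0::2]  # positions 0, 2, 4, ... (assumed training half)
    testing = items[1::2]   # positions 1, 3, 5, ... (assumed testing half)
    return training, testing


def split_first_second_half(items):
    """Split a content set into its first half and its second half."""
    midpoint = len(items) // 2
    return items[:midpoint], items[midpoint:]


# A made-up main content set of 1,000 items.
content_set = [f"item-{i}" for i in range(1000)]

_, testing_even_odd = split_even_odd(content_set)
_, testing_halves = split_first_second_half(content_set)

print(len(testing_even_odd))  # 500
print(len(testing_halves))    # 500
```

With either option, half of the 1,000 items end up in the testing set.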
- Total cumulative success
- A table that indicates how well the knowledge base suggests the
correct categories. This information is useful if you plan to use IBM® Content
Classification to suggest a configured number of categories for each
incoming item.
For example, a Total Cumulative Success table might indicate that for
83.33% of all items, the correct category was among the top three
categories suggested by Content Classification.
For a graphical representation of total cumulative success, refer to the
cumulative success summary graph (see Summary graph: Cumulative Success).
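As a rough illustration of what a cumulative success figure means, the following Python sketch computes, for each number of suggestions N, the percentage of test items whose correct category is among the top N suggestions. The function name, the data layout, and the six sample items are all hypothetical; the sample data was chosen so that the top-three figure works out to 83.33%. This is not the product's own implementation.

```python
def cumulative_success(results, max_rank=3):
    """Return a list of (N, percent) pairs for N = 1..max_rank.

    results is a list of (correct_category, ranked_suggestions) pairs,
    where ranked_suggestions is ordered from best to worst match.
    """
    total = len(results)
    table = []
    for n in range(1, max_rank + 1):
        # Count items whose correct category is within the top n suggestions.
        hits = sum(1 for correct, suggested in results if correct in suggested[:n])
        table.append((n, 100.0 * hits / total if total else 0.0))
    return table


# Made-up test results: (correct category, suggestions in ranked order).
sample_results = [
    ("billing", ["billing", "sales", "support"]),  # correct at rank 1
    ("sales",   ["sales", "billing", "support"]),  # correct at rank 1
    ("support", ["support", "sales", "billing"]),  # correct at rank 1
    ("billing", ["sales", "billing", "support"]),  # correct at rank 2
    ("sales",   ["support", "billing", "sales"]),  # correct at rank 3
    ("support", ["sales", "billing", "returns"]),  # not suggested
]

for n, percent in cumulative_success(sample_results):
    print(f"Top {n}: {percent:.2f}%")
# Top 1: 50.00%
# Top 2: 66.67%
# Top 3: 83.33%
```

Because each row counts every item matched at that rank or better, the percentages never decrease as N grows.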
- Top performing categories (% of testing set/category
rating)
- A list of the ten top-performing categories. For each category,
the following information is displayed:
- % of content set
- The percentage of items in the testing set that belong to the
category. Indicates the "size" of the category.
- category rating
- A percentage value that indicates how well each category performed.
Higher values indicate better performance; lower values indicate poorer
performance.
- Poorest performing categories (% of testing set/category
rating)
- A list of the ten categories with the poorest performance. Possible
reasons for poor performance include too few items in the training
content set or categories that are not distinctly defined.
- Categories with overlapping intents
- A list of five pairs of categories whose intents overlap the most.
You might want to unite two overlapping categories, define the categories
more distinctly, or position the two similar categories under the
same node in the knowledge base.
- Categories that may be determined by external factors
- A list of categories that might be determined by an external
factor, that is, a factor that does not depend on natural language
processing. For example, a company's knowledge base has a category that
is assigned to incoming email if the sender does not appear in the
company's database.
This category is assigned regardless of the content of the message.
You can add a field that contains the external data and write a knowledge
base rule that specifies which conditions must be met for the category
to be assigned. For information about rules, see Using rules to differentiate between category sets.
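The logic of such an external-factor rule can be sketched in Python. This is only a conceptual illustration and does not use the product's rule syntax; the sender field, the `KNOWN_SENDERS` set, the "Unknown sender" category name, and the `assign_category` function are all hypothetical.

```python
# Hypothetical stand-in for the company's sender database.
KNOWN_SENDERS = {"alice@example.com", "bob@example.com"}


def assign_category(message, text_based_suggestions):
    """Apply the external-factor rule first, then fall back to text analysis.

    message is a dict with a "sender" field that holds the external data.
    text_based_suggestions is a ranked list of categories derived from the
    message content.
    """
    # Rule: the category depends only on the sender field, not on the text.
    if message["sender"] not in KNOWN_SENDERS:
        return "Unknown sender"
    # Otherwise use the top text-based suggestion, if there is one.
    return text_based_suggestions[0] if text_based_suggestions else None


print(assign_category({"sender": "eve@example.org", "body": "Invoice query"}, ["billing"]))
# Unknown sender
print(assign_category({"sender": "alice@example.com", "body": "Invoice query"}, ["billing"]))
# billing
```

The first message is assigned "Unknown sender" even though its text points to billing, which mirrors a category that is assigned regardless of message content.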