The Categories Pane
The Categories pane is the area in which you can build and manage your categories. This pane is located in the upper left corner of the Categories and Concepts view. After extracting the concepts and types from your text data, you can begin building categories either automatically, using techniques such as concept inclusion and co-occurrence, or manually. See the topic Building categories for more information.
Each time a category is created or updated, the documents or records can be scored by clicking the Score button to see whether any text matches a descriptor in a given category. If a match is found, the document or record is assigned to that category. The end result is that most, if not all, of the documents or records are assigned to categories based on the descriptors in the categories.
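The scoring idea described above can be pictured with a minimal sketch. It assumes that every descriptor is a plain concept string and that a match is a simple case-insensitive substring test; the product's actual matching relies on its extraction results and also covers types, patterns, and category rules. The function name `score` and the sample data are hypothetical and are not part of the product.

```python
def score(records, categories):
    """Assign each record to every category with at least one matching descriptor.

    Assumption: a descriptor "matches" when it appears as a substring of the
    record text, ignoring case. This is an illustration only.
    """
    assignments = {name: set() for name in categories}
    for record_id, text in records.items():
        lowered = text.lower()
        for name, descriptors in categories.items():
            if any(d.lower() in lowered for d in descriptors):
                assignments[name].add(record_id)
    return assignments


# Hypothetical sample data
records = {
    1: "The battery drains too quickly",
    2: "Great screen and fast delivery",
    3: "No complaints",
}
categories = {
    "Power": ["battery", "charger"],
    "Display": ["screen"],
}

assignments = score(records, categories)

# Records that matched no descriptor in any category remain unassigned.
categorized = set().union(*assignments.values())
unassigned = set(records) - categorized
```

In this sketch, record 3 matches no descriptor and therefore stays unassigned, which is why scoring assigns most, but not necessarily all, records to categories.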
Category Tree Table
The tree table in this pane presents the set of categories, subcategories, and descriptors. The tree also has several columns presenting information for each tree item. The following columns may be available for display:
- Code. Lists the code value for each category. This column is hidden by default. You can display this column through the menus.
- Category. Contains the category tree showing the names of the categories and subcategories. Additionally, if the descriptors toolbar icon is clicked, the set of descriptors is also displayed.
- Descriptors. Provides the number of descriptors that make up the category's definition. This count does not include the number of descriptors in the subcategories. No count is given when a descriptor name is shown in the Category column. You can display or hide the descriptors themselves in the tree through the menus.
- Docs. After scoring, this column provides the number of documents or records that are categorized into a category and all of its subcategories. For example, if 5 records match your top category based on its descriptors, and 7 different records match a subcategory based on its descriptors, the total doc count for the top category is the sum of the two, in this case 12. However, if one of those records matched both the top category and its subcategory, it would be counted only once, and the count would be 11.
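The Docs arithmetic above amounts to counting distinct records across a category and its subcategories. The following sketch reproduces the two cases from the example; the function name `doc_count` and the record IDs are hypothetical and are used only for illustration.

```python
def doc_count(own_matches, subcategory_matches):
    """Count distinct records matched by a category or any of its subcategories."""
    combined = set(own_matches)
    for matches in subcategory_matches:
        combined |= set(matches)  # a record matched twice is still counted once
    return len(combined)


# 5 records match the top category
top = {1, 2, 3, 4, 5}

# Case 1: 7 different records match the subcategory
sub_distinct = {6, 7, 8, 9, 10, 11, 12}
count_distinct = doc_count(top, [sub_distinct])  # 12

# Case 2: 7 records match the subcategory, but record 5 also matches the top category
sub_overlap = {5, 6, 7, 8, 9, 10, 11}
count_overlap = doc_count(top, [sub_overlap])  # 11
```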
When no categories exist, the table still contains two rows. The top row, called All Documents, shows the total number of documents or records. The second row, called Uncategorized, shows the number of documents or records that have yet to be categorized.
For each category in the pane, a small yellow bucket icon precedes the category name. If you double-click a category, or choose View > Category Definitions in the menus, the Category Definitions dialog box opens and presents all of the elements, called descriptors, that make up its definition, such as concepts, types, patterns, and category rules. See the topic About Categories for more information. By default, the category tree table does not show the descriptors in the categories. If you want to see the descriptors directly in the tree rather than in the Category Definitions dialog box, click the toggle button with the pencil icon in the toolbar. When this toggle button is selected, you can expand your tree to see the descriptors as well.
Scoring Categories
The Docs. column in the category tree table displays the number of documents or records that are categorized into that specific category. If the numbers are out of date or are not calculated, an icon appears in that column. You can click Score on the pane toolbar to recalculate the number of documents. Keep in mind that the scoring process can take some time when you are working with larger datasets.
Selecting Categories in the Tree
When making selections in the tree, you can select only sibling categories. That is, if you select top-level categories, you cannot also select a subcategory. Likewise, if you select two subcategories of a given category, you cannot simultaneously select a subcategory of another category. Selecting a noncontiguous category results in the loss of the previous selection.
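One way to picture the sibling rule is as a check that all selected items share the same parent. The sketch below assumes the tree is represented as a mapping from each category name to its parent, with `None` for top-level categories; the function name `are_siblings` and the category names are hypothetical.

```python
def are_siblings(selection, parent_of):
    """Return True when every selected category has the same parent."""
    parents = {parent_of[name] for name in selection}
    return len(parents) <= 1


# Hypothetical tree: two top-level categories with subcategories
parent_of = {
    "Power": None,
    "Display": None,
    "Battery": "Power",
    "Charger": "Power",
    "Screen glare": "Display",
}

top_level_ok = are_siblings(["Power", "Display"], parent_of)       # both top level
same_parent_ok = are_siblings(["Battery", "Charger"], parent_of)   # same parent
mixed_levels = are_siblings(["Power", "Battery"], parent_of)       # parent and child
cross_parent = are_siblings(["Battery", "Screen glare"], parent_of)  # different parents
```

Here the first two selections satisfy the rule, while the last two mix levels or parents and would not be allowed together.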
Displaying in Data and Visualization Panes
When you select a row in the table, you can click the Display button to refresh the Visualization and Data panes with information corresponding to your selection. If a pane is not visible, clicking Display will cause the pane to appear.
Refining Your Categories
Categorization may not yield perfect results for your data on the first try, and there may well be categories that you want to delete or combine with other categories. You may also find, through a review of the extraction results, that some categories you would find useful were not created. If so, you can make manual changes to the results to fine-tune them for your particular context. See the topic Editing and Refining Categories for more information.