Extending categories
Extending is a process through which descriptors are added or enhanced automatically to 'grow' existing categories. The objective is to produce a better category that captures related records or documents that were not originally assigned to that category.
The automatic grouping techniques you select will attempt to identify concepts, TLA patterns, and category rules related to existing category descriptors. These new concepts, patterns, and category rules are then added as new descriptors or added to existing descriptors. The grouping techniques for extending include concept root derivation, concept inclusion, semantic networks (English only), and co-occurrence rules. The Extend empty categories with descriptors generated from the category name method generates descriptors using the words in the category names; therefore, the more descriptive the category names, the better the results.
Extending is a great way to interactively improve your categories. Here are some examples of when you might extend a category:
- After dragging/dropping concept patterns to create categories in the Categories pane
- After creating categories by hand and adding simple category rules and descriptors
- After importing a predefined category file in which the categories had very descriptive names
- After refining the categories that came from the TAP you chose
You can extend a category multiple times. For example, if you imported a predefined category file with very descriptive names, you could extend using the Extend empty categories with descriptors generated from the category name option to obtain a first set of descriptors, and then extend those categories again. However, in other cases, extending multiple times may result in too generic a category if the descriptors are extended wider and wider. Since the build and extend grouping techniques use similar underlying algorithms, extending directly after building categories is unlikely to produce more interesting results.
- If you attempt to extend and do not want to use the results, you can always undo the operation (Edit > Undo) immediately after having extended.
- Extending can produce two or more category rules in a category that match exactly the same set of documents since rules are built independently during the process. If desired, you can review the categories and remove redundancies by manually editing the category description. See the topic Editing Category Descriptors for more information.
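As an illustration of what "redundant" means here, the following minimal sketch groups category rules by the exact set of documents they match. It is not part of the product: the function name find_redundant_rules and the input mapping (rule text to a set of document IDs) are hypothetical, and the sketch assumes you have already obtained the matched document IDs for each rule by some other means.

```python
from collections import defaultdict


def find_redundant_rules(rule_matches):
    """Return groups of rules that match exactly the same set of documents.

    rule_matches is a hypothetical mapping of rule text -> iterable of doc IDs.
    """
    by_docs = defaultdict(list)
    for rule, docs in rule_matches.items():
        # frozenset makes the matched document set usable as a dictionary key
        by_docs[frozenset(docs)].append(rule)
    # Only groups with more than one rule are candidates for manual cleanup
    return [rules for rules in by_docs.values() if len(rules) > 1]


example = {
    "[apple * + .]": {1, 2},
    "[apple + <Positive>]": {2, 1},
    "[pear]": {3},
}
redundant = find_redundant_rules(example)
```

Each returned group lists rules that could be reviewed together; deciding which rule to keep remains a manual editing step.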
To extend categories
- In the Categories pane, select the categories you want to extend.
- From the menus, choose Categories > Extend Categories. Unless you have chosen the option to never prompt, a message box appears.
- Choose whether you want to extend now or edit the settings first.
- Click Extend Now to begin extending categories using the current settings. The process begins and a progress dialog appears.
- Click Edit to review and modify the settings.
After attempting to extend, any categories for which new descriptors were found are flagged by the word Extended in the Categories pane so that you can quickly identify them. The Extended text remains until you either extend again, edit the category in another way, or clear these through the context menu.
Each of the techniques available when building or extending categories is well suited to certain types of data and situations, but often it is helpful to combine techniques in the same analysis to capture the full range of documents or records. In the interactive workbench, the concepts and types that were grouped into a category are still available the next time you build categories. This means that you may see a concept in multiple categories or find redundant categories.
The following areas and fields are available within the Extend Categories: Settings dialog box:
Extend with. Select what input will be used to extend the categories:
- Unused extraction results. This option enables categories to be built from extraction results that are not used in any existing categories. This minimizes the tendency for records to match multiple categories and limits the number of categories produced.
- All extraction results. This option enables categories to be built using any of the extraction results. This is most useful when no or few categories already exist.
Grouping techniques
For short descriptions of each of these techniques, see Advanced linguistic settings. These techniques include:
- Concept root derivation
- Semantic network (English text only, and not used if the Generalize only option is selected.)
- Concept inclusion
- Co-occurrence, with its Minimum number of docs suboption
A number of types are permanently excluded from the semantic networks technique since those types will not produce relevant results. They include <Positive>, <Negative>, <IP>, and other nonlinguistic types.
Maximum search distance. Select how far you want the techniques to search before producing categories. The lower the value, the fewer results you will get; however, these results will be less noisy and are more likely to be significantly linked or associated with each other. The higher the value, the more results you might get; however, these results may be less reliable or relevant. While this option is globally applied to all techniques, its effect is greatest on co-occurrences and semantic networks.
Prevent pairing of specific concepts. Select this checkbox to stop the process from grouping or pairing two concepts together in the output. To create or manage concept pairs, click Manage Pairs... See the topic Managing Link Exception Pairs for more information.
Where possible: Choose whether to simply extend, generalize the descriptors using wildcards, or both.
- Extend and generalize. This option will extend the selected categories and then generalize the descriptors. When you choose to generalize, the product will create generic category rules in categories using the asterisk wildcard. For example, instead of producing multiple descriptors such as [apple tart + .] and [apple sauce + .], using wildcards might produce [apple * + .]. If you generalize with wildcards, you will often get exactly the same number of records or documents as you did before. However, this option has the advantage of reducing the number of category descriptors and simplifying them. Additionally, this option increases the ability to categorize more records or documents using these categories on new text data (for example, in longitudinal/wave studies).
- Extend only. This option will extend your categories without generalizing. It can be helpful to first choose the Extend only option for manually created categories and then extend the same categories again using the Extend and generalize option.
- Generalize only. This option will generalize the descriptors without extending your categories in any other way. Note: Selecting this option disables the Semantic network option; this is because the Semantic network option is only available when a description is to be extended.
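To make the wildcard example concrete, here is a minimal sketch of one way descriptors such as [apple tart + .] and [apple sauce + .] could collapse into [apple * + .]. It is an illustration only, not the product's algorithm: the function name generalize is hypothetical, and the sketch assumes every descriptor has the form "[words + tail]" and that descriptors are grouped by their first word and tail.

```python
from collections import defaultdict


def generalize(descriptors):
    """Collapse descriptors sharing a first word and tail into one wildcard rule.

    Assumes each descriptor looks like "[apple tart + .]" (hypothetical format).
    """
    groups = defaultdict(list)
    for descriptor in descriptors:
        inner = descriptor.strip("[]")
        head, _, tail = inner.partition(" + ")
        first_word = head.split()[0]
        groups[(first_word, tail)].append(descriptor)

    result = []
    for (first_word, tail), members in groups.items():
        if len(members) > 1:
            # Several descriptors differ only after the first word: use a wildcard
            result.append(f"[{first_word} * + {tail}]")
        else:
            # Nothing to merge with, so keep the descriptor unchanged
            result.extend(members)
    return result


generalized = generalize(["[apple tart + .]", "[pear + .]", "[apple sauce + .]"])
```

In this sketch the two apple descriptors are replaced by a single wildcard rule while the pear descriptor is left alone, which mirrors the point above that generalizing reduces the number of descriptors without necessarily changing which records match.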
Other options for extending categories
In addition to selecting the techniques to apply, you can edit any of the following options:
Maximum number of items to extend a descriptor by. When extending a descriptor with items (concepts, types, and other expressions), define the maximum number of items that can be added to a single descriptor. If you set this limit to 10, then no more than 10 additional items can be added to an existing descriptor. If there are more than 10 items to be added, the techniques stop adding new items after the tenth is added. Doing so can make a descriptor list shorter but doesn't guarantee that the most interesting items were used first. You may prefer to cut down the size of the extension without penalizing quality by using the Generalize with wildcards where possible option. This option only applies to descriptors that contain the Booleans & (AND) or ! (NOT).
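The effect of the limit can be sketched as follows. This is a simplified illustration under stated assumptions, not the product's implementation: the function name extend_descriptor and the list-based representation of a descriptor are hypothetical, and candidates are taken in the order given, which is why the most interesting items are not guaranteed to be added first.

```python
def extend_descriptor(existing, candidates, max_items=10):
    """Add at most max_items new items to a descriptor, in candidate order."""
    added = []
    for item in candidates:
        if len(added) >= max_items:
            # Limit reached: remaining candidates are dropped regardless of relevance
            break
        if item not in existing and item not in added:
            added.append(item)
    return existing + added


candidates = [f"c{i}" for i in range(15)]
extended = extend_descriptor(["a"], candidates, max_items=10)
```

With 15 candidates and a limit of 10, the descriptor ends up with its original item plus the first 10 candidates; the last 5 are never considered.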
Also extend subcategories. This option will also extend any subcategories below the selected categories.
Extend empty categories with descriptors generated from the category name. This method applies only to empty categories, which have 0 descriptors. If a category already contains descriptors, it will not be extended in this way. This option attempts to automatically create descriptors for each category based on the words that make up the name of the category. The category name is scanned to see if words in the name match any extracted concepts. If a concept is recognized, it is used to find matching concept patterns, and both are used to form descriptors for the category. This option produces the best results when the category names are both long and descriptive. This is a quick method for generating category descriptors, which in turn enable the category to capture records that contain those descriptors. This option is most useful when you import categories from somewhere else or when you create categories manually with long descriptive names.
Generate descriptors as. This option only applies if the preceding option is selected.
- Concepts. Choose this option to produce the resulting descriptors in the form of concepts, regardless of whether they have been extracted from the source text.
- Patterns. Choose this option to produce the resulting descriptors in the form of patterns, regardless of whether the resulting patterns or any patterns have been extracted.
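The name-based method described above (scan the category name for words that match extracted concepts, then look up concept patterns that involve those concepts) can be sketched roughly as follows. This is an assumption-laden illustration, not the product's logic: the function name descriptors_from_name is hypothetical, concepts are assumed to be single words, patterns are assumed to be tuples of concept and type names, and the sketch does not model the Generate descriptors as choice.

```python
import re


def descriptors_from_name(category_name, concepts, patterns):
    """Return (matched concepts, matching patterns) for a category name."""
    # Split the name into lowercase words, ignoring punctuation
    name_words = set(re.findall(r"[a-z0-9]+", category_name.lower()))
    # Keep the concepts whose text appears as a word in the category name
    matched = [c for c in concepts if c.lower() in name_words]
    matched_lower = {c.lower() for c in matched}
    # Keep the patterns that contain at least one matched concept
    related = [
        p for p in patterns
        if any(part.lower() in matched_lower for part in p)
    ]
    return matched, related


concepts = ["price", "service", "staff"]
patterns = [("price", "<Negative>"), ("staff", "<Positive>"), ("food", "<Positive>")]
matched, related = descriptors_from_name("Price and service complaints", concepts, patterns)
```

A short name such as "Other" would match nothing in this sketch, which is consistent with the advice that long, descriptive category names give the best results.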