Importing Predefined Categories
You can import your predefined categories into IBM® SPSS® Modeler Text Analytics. Before importing, make sure the predefined category file is a Microsoft Excel (*.xls, *.xlsx) file and is structured in one of the supported formats. You can also choose to have the product automatically detect the format for you. The following formats are supported:
- Flat list format: See the topic Flat List Format for more information.
- Compact format: See the topic Compact Format for more information.
- Indented format: See the topic Indented Format for more information.
To Import Predefined Categories
- From the interactive workbench menus, choose Categories > Manage Categories > Import Predefined Categories. An Import Predefined Categories wizard is displayed.
- From the Look In drop-down list, select the drive and folder in which the file is located.
- Select the file from the list. The name of the file appears in the File Name text box.
- Select the worksheet containing the predefined categories from the list. The worksheet name appears in the Worksheet field.
- To begin choosing the data format, click Next.
- Choose the format for your file or choose the option to allow the product to attempt to automatically detect the format. The autodetection works best on the most common formats.
- Flat list format: See the topic Flat List Format for more information.
- Compact format: See the topic Compact Format for more information.
- Indented format: See the topic Indented Format for more information.
- To define the additional import options, click Next. If you choose to have the format automatically detected, you are directed to the final step.
- If one or more rows contain column headers or other extraneous information, select the row number from which you want to start importing in the Start import at row option. For example, if your category names begin on row 7, you must enter the number 7 for this option in order to import the file correctly.
- If your file contains category codes, choose the option Contains category codes. Doing so helps the wizard properly recognize your data.
- Review the color-coded cells and legend to make sure that the data has been correctly identified. Any errors detected in the file are shown in red and referenced below the format preview table. If the wrong format was selected, go back and choose another one. If you need to make corrections to your file, make those changes and restart the wizard by selecting the file again. You must correct all errors before you can finish the wizard.
- To review the set of categories and subcategories that will be imported and to define how to create descriptors for these categories, click Next.
- Review the set of categories that will be imported in the table. If you do not see the keywords you expected to see as descriptors, it may be that they were not recognized during the import. Make sure they are properly prefixed and appear in the correct cell.
- Choose how you want to handle any pre-existing categories in your session.
- Replace all existing categories. This option purges all existing categories and then the newly imported categories are used alone in their place.
- Append to existing categories. This option will import the categories and merge any common categories with the existing categories. When adding to existing categories, you need to determine how you want any duplicates handled. One choice (option: Merge) is to merge any categories being imported with existing categories if they share a category name. Another choice (option: Exclude from import) is to prohibit the import of categories if one with the same name exists.
- Import keywords as descriptors is an option to import the keywords identified in your data as descriptors for the associated category.
- Extend categories by deriving descriptors is an option that will generate descriptors from the words that represent the name of the category, or subcategory, and/or the words that make up the annotation. If the words match extracted results, then those are added as descriptors to the category. This option produces the best results when the category names or annotations are both long and descriptive. This is a quick method for generating the category descriptors that enable the category to capture records that contain those descriptors.
- From field allows you to select the text from which the descriptors will be derived: the names of categories and subcategories, the words in the annotations, or both.
- As field allows you to choose whether to create these descriptors in the form of concepts or TLA patterns. If TLA extraction has not taken place, the pattern options are disabled in this wizard.
- To import the predefined categories into the Categories pane, click Finish.
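
The options for handling pre-existing categories (Replace all existing categories, and Append to existing categories with Merge or Exclude from import) can be easier to compare when written out as logic. The following Python sketch is illustrative only and is not part of the product: the function name import_categories, its parameters, and the representation of a category as a name mapped to a set of descriptors are all hypothetical. It also assumes that merging two categories with the same name combines their descriptors, which this topic does not state explicitly.

```python
from typing import Dict, Set

# Hypothetical representation: category name -> set of descriptors.
Categories = Dict[str, Set[str]]


def import_categories(
    existing: Categories,
    imported: Categories,
    mode: str = "append",       # "replace" or "append"
    duplicates: str = "merge",  # "merge" or "exclude"; used only when appending
) -> Categories:
    """Return the category set that results from an import (illustrative only)."""
    if mode == "replace":
        # Replace all existing categories: only the imported ones remain.
        return {name: set(desc) for name, desc in imported.items()}
    if mode != "append":
        raise ValueError(f"unknown mode: {mode!r}")
    if duplicates not in ("merge", "exclude"):
        raise ValueError(f"unknown duplicates option: {duplicates!r}")

    # Append to existing categories: start from a copy of what is already there.
    result = {name: set(desc) for name, desc in existing.items()}
    for name, desc in imported.items():
        if name not in result:
            # No name clash, so the imported category is simply added.
            result[name] = set(desc)
        elif duplicates == "merge":
            # Assumption: merging combines the descriptors of both categories.
            result[name] |= desc
        # duplicates == "exclude": the imported category is skipped.
    return result


existing = {"Price": {"cost"}, "Service": {"staff"}}
imported = {"Price": {"expensive"}, "Quality": {"durable"}}

merged = import_categories(existing, imported, mode="append", duplicates="merge")
excluded = import_categories(existing, imported, mode="append", duplicates="exclude")
replaced = import_categories(existing, imported, mode="replace")
```

In this sketch, merged keeps Service, adds Quality, and gives Price both descriptors. In excluded, the existing Price category is left untouched and the imported one is skipped, while replaced contains only the two imported categories.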