Using Text Analysis Packages
A text analysis package (TAP) serves as a template for categorizing text responses. A TAP lets you categorize your text data with minimal intervention because it contains the prebuilt category sets and the linguistic resources needed to code a large number of records quickly and automatically. The linguistic resources are used to analyze and mine the text data in order to extract key concepts. Based on the key concepts and patterns found in the text, records are then assigned to categories in the category set you selected from the TAP. You can create your own TAP or update an existing one.
A TAP is made up of the following elements:
- Category Set(s). A category set consists of predefined categories, category codes, descriptors for each category, and a name for the category set as a whole. Descriptors are linguistic elements (concepts, types, patterns, and rules), such as the term "cheap" or the pattern "good price". Descriptors define a category: when the text matches any of a category's descriptors, the document or record is placed into that category.
- Linguistic Resources. Linguistic resources are a set of libraries and advanced resources that are tuned to extract key concepts and patterns. These extracted concepts and patterns, in turn, are used as the descriptors that enable records to be placed into a category in the category set.
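To make the relationship between category sets, descriptors, and records more concrete, here is a minimal conceptual sketch in Python. It is not the SPSS Modeler Text Analytics API: the names `Category`, `CategorySet`, and `categorize` are hypothetical, the sample categories are invented for illustration, and plain case-insensitive substring matching stands in for the product's linguistic extraction of concepts, types, patterns, and rules.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A hypothetical category: a code, a name, and its descriptors."""
    code: int
    name: str
    descriptors: list[str] = field(default_factory=list)


@dataclass
class CategorySet:
    """A hypothetical named collection of categories."""
    name: str
    categories: list[Category]


def categorize(record: str, category_set: CategorySet) -> list[str]:
    """Return the names of all categories with at least one matching descriptor.

    Assumption: a descriptor "matches" when it occurs as a substring of the
    lowercased record. Real linguistic extraction is far more sophisticated.
    """
    text = record.lower()
    return [
        cat.name
        for cat in category_set.categories
        if any(d.lower() in text for d in cat.descriptors)
    ]


# Illustrative data only; these categories are not taken from any shipped TAP.
pricing = Category(code=1, name="Pricing", descriptors=["cheap", "good price"])
service = Category(code=2, name="Service", descriptors=["helpful staff", "slow service"])
survey_set = CategorySet(name="Customer Satisfaction", categories=[pricing, service])

print(categorize("Good price and helpful staff.", survey_set))
# prints ['Pricing', 'Service']
```

As in the description above, a record can fall into more than one category, and a single matching descriptor is enough to place it in a category.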
You can perform the following tasks with text analysis packages:
- Make text analysis packages. See Making Text Analysis Packages for more information.
- Load text analysis packages. Alternatively, you can load an SPSS® Text Analytics for Surveys project (.tas), which is converted to a text analysis package. See Loading Text Analysis Packages for more information.
- Update text analysis packages. See Updating Text Analysis Packages for more information.
After you select the TAP and choose a category set, SPSS Modeler Text Analytics can extract and categorize your records.