Using Text Analysis Packages

A text analysis package, also called a TAP, serves as a template for text response categorization. A TAP lets you categorize your text data with minimal intervention because it contains the prebuilt category sets and the linguistic resources that are needed to code a large number of records quickly and automatically. The linguistic resources are used to analyze and mine the text data in order to extract key concepts. Based on the key concepts and patterns that are found in the text, the records are categorized into the category set that you selected in the TAP. You can also make your own TAP or update an existing one.

A TAP is made up of the following elements:

  • Category Set(s). A category set is made up of predefined categories, category codes, descriptors for each category, and a name for the whole category set. Descriptors are linguistic elements (concepts, types, patterns, and rules) such as the term cheap or the pattern good price. Descriptors define a category: when the text matches any descriptor of a category, the document or record is put into that category.
  • Linguistic Resources. Linguistic resources are a set of libraries and advanced resources that are tuned to extract key concepts and patterns. These extracted concepts and patterns, in turn, are used as the descriptors that enable records to be placed into a category in the category set.
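
To make the relationship between categories, codes, and descriptors concrete, here is a minimal Python sketch. It is an illustration only and does not reflect how SPSS Modeler Text Analytics stores or applies a TAP: the class names (Category, CategorySet), the sample category set, and the plain case-insensitive substring matching are all assumptions made for this example. The real product relies on its linguistic resources to extract concepts, types, patterns, and rules rather than matching raw substrings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Category:
    """Hypothetical category: a code, a name, and its descriptors."""
    code: int
    name: str
    descriptors: List[str] = field(default_factory=list)


@dataclass
class CategorySet:
    """Hypothetical category set: a name plus its predefined categories."""
    name: str
    categories: List[Category]

    def categorize(self, text: str) -> List[str]:
        """Return the names of all categories with at least one matching descriptor.

        Matching here is a naive case-insensitive substring test, which is
        an assumption for illustration, not the product's extraction logic.
        """
        lowered = text.lower()
        return [
            category.name
            for category in self.categories
            if any(d.lower() in lowered for d in category.descriptors)
        ]


# Sample category set reusing the descriptors mentioned in the text.
feedback = CategorySet(
    name="Product feedback",
    categories=[
        Category(code=1, name="Price", descriptors=["cheap", "good price"]),
        Category(code=2, name="Service", descriptors=["helpful staff", "rude"]),
    ],
)

if __name__ == "__main__":
    print(feedback.categorize("Good price, but rude staff"))
    print(feedback.categorize("No comment"))
```

A record can land in more than one category, or in none, depending on which descriptors it matches.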

After you select the TAP and choose a category set, SPSS Modeler Text Analytics can extract and categorize your records.

Note: TAPs can be created and used interchangeably between IBM® SPSS Text Analytics for Surveys and SPSS Modeler Text Analytics. However, scoring on rules in SPSS Modeler Text Analytics might differ depending on whether you load a TAP that was made in SPSS Modeler Text Analytics or one that was made in IBM SPSS Text Analytics for Surveys. We recommend that you use TAPs that are made within SPSS Modeler Text Analytics, because TAPs that are made in IBM SPSS Text Analytics for Surveys might have been created with a different version of the linguistic resources.