Selecting Variables

After selecting the data source, the next step is to specify the variables to be imported. Three types of variables can be imported into a project.

Unique ID Variable (Required)

The ID variable is a unique numeric or string key that identifies each respondent, and it is required to import data. Each imported record (or case) must have a unique ID value. The data file does not need to be sorted by the ID variable to be read successfully; after the data are read into the program, the records can be sorted by various criteria. See the topic Sorting Variables for more information.

Two situations will cause the import to fail:

• Duplicate ID values detected

• Records with blank ID values

Note: If a duplicate ID is detected and you have IBM® SPSS® Statistics installed on your computer, you can use the Identify Duplicate Cases procedure in that product to identify duplicates and then use the options to indicate which records should be retained (primary cases).
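The two failure conditions above can be checked before you start the import. The following is a minimal sketch, not part of the product: the function name check_ids, the field name "id", and the sample records are hypothetical, and it assumes that IDs are compared as text after trimming surrounding whitespace, which may differ from how the import itself compares them.

```python
from collections import Counter


def check_ids(records, id_field="id"):
    """Return (duplicates, blank_rows) for a list of dict records.

    `records` and `id_field` are hypothetical names used for this sketch.
    Assumption: IDs are compared as trimmed strings.
    """
    blank_rows = []
    counts = Counter()
    for row_number, record in enumerate(records, start=1):
        value = record.get(id_field)
        # Treat a missing value or a whitespace-only string as a blank ID.
        if value is None or str(value).strip() == "":
            blank_rows.append(row_number)
        else:
            counts[str(value).strip()] += 1
    # An ID that occurs more than once is reported a single time.
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return duplicates, blank_rows


# Hypothetical sample data: one duplicate ID and one blank ID.
records = [
    {"id": "R001", "q1": "Great service"},
    {"id": "R002", "q1": "Too slow"},
    {"id": "R001", "q1": "Friendly staff"},
    {"id": "", "q1": "No comment"},
]
duplicates, blank_rows = check_ids(records)
print(duplicates)   # ['R001']
print(blank_rows)   # [4]
```

If either list is non-empty, the data would need to be cleaned before the import can succeed.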

Open-Ended Text Variable(s) (Required)

The open-ended text variables represent the text responses to the question(s) in the survey. At least one of these variables is required to import data. These variables can be string or long-string variables in SPSS Statistics, columns containing general or text cells in Microsoft Excel, or text or note fields from databases. Each open-ended text variable will be analyzed separately. There is a 4,000-character limit on the size (width) of each text variable imported from a .SAV file.
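As an illustration of the width limit, the sketch below scans records for responses longer than 4,000 characters. The function name find_overlong_responses, the field names, and the sample data are hypothetical. It counts Python characters, which is an assumption; the product may measure width differently for multibyte text.

```python
MAX_TEXT_WIDTH = 4000  # limit stated above for text variables from a .SAV file


def find_overlong_responses(records, text_fields, max_width=MAX_TEXT_WIDTH):
    """Return (row_number, field, length) for each response longer than max_width."""
    overlong = []
    for row_number, record in enumerate(records, start=1):
        for field in text_fields:
            # A missing or None response is treated as empty text.
            text = record.get(field) or ""
            if len(text) > max_width:
                overlong.append((row_number, field, len(text)))
    return overlong


# Hypothetical sample data: only the q2 response in row 1 exceeds the limit.
records = [
    {"id": "R001", "q1": "Short answer", "q2": "x" * 4001},
    {"id": "R002", "q1": "y" * 4000, "q2": ""},
]
print(find_overlong_responses(records, ["q1", "q2"]))  # [(1, 'q2', 4001)]
```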

Reference Variable(s) (Optional)

The reference variables are additional, optional variables, generally categorical, that can be imported for reference purposes. Reference variables are not used in text analysis but provide supplemental information describing the respondent, which may aid understanding and interpretation. Demographic variables are often included as reference variables, since they can help show which terms or categories are being used by which groups of individuals. Examples are sex, department, occupation, and course of study (for student and training evaluations). After importing, you can view all of the reference variables in the Entire Project view. You can also display reference variables in the Data pane of the Question view. Additionally, you can select reference variables in the bar chart in the visualization pane to drill down to a subset of respondents.

Note: Reference variables read from an SPSS Statistics data file will have variable labels (if supplied) appearing as column headings and their value labels (if supplied) displaying in the cells of the Data pane.
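The labeling behavior described in this note can be pictured with a small sketch. It is only an illustration of the idea, not the product's implementation: the function labeled_view, the dictionaries variable_labels and value_labels, and the sample codes are all hypothetical.

```python
def labeled_view(records, variable_labels, value_labels):
    """Build display rows: variable labels as headings, value labels in cells.

    Falls back to the variable name or the raw value when no label is supplied.
    """
    view = []
    for record in records:
        row = {}
        for name, value in record.items():
            heading = variable_labels.get(name, name)
            row[heading] = value_labels.get(name, {}).get(value, value)
        view.append(row)
    return view


# Hypothetical reference variables: "sex" has labels, "dept" has none.
records = [{"sex": 1, "dept": "HR"}, {"sex": 2, "dept": "IT"}]
variable_labels = {"sex": "Sex of respondent"}
value_labels = {"sex": {1: "Male", 2: "Female"}}

for row in labeled_view(records, variable_labels, value_labels):
    print(row)
# {'Sex of respondent': 'Male', 'dept': 'HR'}
# {'Sex of respondent': 'Female', 'dept': 'IT'}
```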

Selecting variables

To Select Variables and Extraction Options

 From the list of available variables, select the variable that corresponds to the ID variable in your data set and click the arrow button to move it into the Unique ID box. The ID must be a unique number or alphanumeric string that distinguishes one record from another. If your data set contains duplicate IDs, an error message appears. In this case, you must clean your data before trying again.

 From the list of available variables, select one or more variables that correspond to the open-ended response variables and click the arrow button to move the variable(s) into the Open-Ended Text list. The variable(s) will each be imported as a separate question whose responses you will analyze and categorize.

 From the list of available variables, select one or more variables that correspond to the reference variables and click the arrow button to move the variable(s) into the Reference list. Reference variables are not used by the automated category building techniques. However, you can view their content and use them to help you make informed decisions when categorizing your responses.

 To view the variable labels instead of the variable names, click the button below the variable list on the left.

 To change the extraction setting, make a selection in the drop-down list. By default, First question only is selected, which means that if you have selected more than one open-ended text variable, the extraction process starts automatically for the first question only after the wizard ends. Extraction can take some time with larger data sets, so you may choose to extract None or All questions depending on the time available.

 Click Next > once you have selected all of your variables.
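The effect of the extraction setting can be summarized in a short sketch. This is a hypothetical model of the behavior described above, not product code: the function questions_to_extract and the variable names q1 to q3 are invented for illustration, and the option strings "None" and "All" are assumed from the wording of the step rather than confirmed labels.

```python
def questions_to_extract(text_variables, setting="First question only"):
    """Return the open-ended text variables extracted when the wizard ends."""
    if setting == "None":
        return []
    if setting == "First question only":
        # Default: only the first selected text variable is extracted.
        return list(text_variables[:1])
    if setting == "All":
        return list(text_variables)
    raise ValueError(f"Unknown extraction setting: {setting!r}")


# Hypothetical selection of three open-ended text variables.
selected = ["q1", "q2", "q3"]
print(questions_to_extract(selected))          # ['q1']
print(questions_to_extract(selected, "All"))   # ['q1', 'q2', 'q3']
print(questions_to_extract(selected, "None"))  # []
```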