IBM Support

Recoding a categorical SPSS variable into indicator (dummy) variables

Troubleshooting


Problem

What is the SPSS command to transform a nominal variable of n classification groups into a series of n-1 indicator (or "dummy") variables?

Resolving The Problem

There is no single built-in command that does this, but several short command sequences can, and examples are provided below. Of these, the DO REPEAT approach is somewhat more general, or at least easier when the reference category is neither the first nor the last value.

* creating indicator variables.
* most examples below generate indicators from a nominal variable, called cat, that is present in
the active file.

* create 4 indicator variables for categories 1 to 4
of a 5-category variable called cat.

VECTOR nom(4).
LOOP #i = 1 to 4.
COMPUTE nom(#i) = (cat = #i).
END LOOP.
EXECUTE.

* alternatively .
* create 4 indicator variables for categories 2 to 5
of a 5-category variable called cat.

VECTOR ind(4).
LOOP #i = 1 to 4.
COMPUTE ind(#i) = (cat = #i + 1).
END LOOP.
EXECUTE.

* to make the first category the reference category (0 on all indicator
  variables), with variable names that reflect the original category.

NUMERIC dum2 to dum5.
VECTOR dumv = dum2 to dum5.
LOOP #i = 1 to 4.
COMPUTE dumv(#i) = (cat = #i + 1).
END LOOP.
EXECUTE.
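
The reason for creating n-1 rather than n indicators is that, in a model with an intercept, a full set of n indicators would be perfectly collinear with the constant. As a minimal sketch of how the indicators created above might then be used, the following enters dum2 to dum5 as predictors. The dependent variable name y is hypothetical and stands in for whatever outcome variable is in your file.

```spss
* Sketch only, y is a hypothetical dependent variable.
* Category 1 of cat is the reference (0 on dum2 to dum5).
REGRESSION
  /DEPENDENT = y
  /METHOD = ENTER dum2 TO dum5.
```

Each coefficient then estimates the difference between that category and the reference category.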

* creating similar vars as above but using do repeat command.
DO REPEAT iv = indv2 to indv5
/ c = 2 to 5 .
COMPUTE iv = (cat = c).
END REPEAT.
EXECUTE.
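
Whichever approach is used, it can be worth confirming that the indicators line up with the original categories. One possible check, offered as a sketch rather than a required step, is to cross-tabulate cat against each new indicator, using the variable names created in the DO REPEAT example above. Note that a logical expression such as (cat = c) evaluates to 1 when true and 0 when false, and returns system-missing when cat is missing, so cases with a missing cat are missing on every indicator.

```spss
* Each indicator should equal 1 only in the row for its own category.
CROSSTABS
  /TABLES = cat BY indv2 TO indv5.
```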

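With either the LOOP or DO REPEAT approach, you can assign more descriptive variable names that do not share a common stem like 'dum' or 'indv', but you must list the names explicitly in the NUMERIC or DO REPEAT command. The two command sets below are alternatives; each creates indicators for categories 2 to 5 of a categorical variable called faculty.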

NUMERIC Arts Medicine Engineering Law.
VECTOR dumv = Arts to Law.
LOOP #i = 1 to 4.
COMPUTE dumv(#i) = (faculty = #i + 1).
END LOOP.
EXECUTE.

DO REPEAT iv = Arts Medicine Engineering Law
/ c = 2 to 5 .
COMPUTE iv = (faculty = c).
END REPEAT.
EXECUTE.

* if reference category were neither first nor last, but 3rd,
DO REPEAT seems handier than VECTOR and LOOP.

DO REPEAT iv = c3i1 c3i2 c3i4 c3i5 / g = 1 2 4 5 .
COMPUTE iv = (cat = g).
END REPEAT.
EXECUTE.

Suppose that you had a string variable named STATE with 2-character state codes and wanted to create dummy variables for 3 of the states. You would need to place quotes around the string values in the stand-in list, as in the following commands.

DO REPEAT iv = NewYork California Illinois / g = 'NY' 'CA' 'IL' .
COMPUTE iv = (state = g).
END REPEAT.
EXECUTE.

The new dummy variables (NewYork, California, and Illinois) would be numeric indicator variables.

If you want indicator variables for all n values of a categorical variable, any of the above command sets can be easily adapted. This is useful if, for example, you want to use the indicators as arguments to the SUM() function in the AGGREGATE procedure. The first command set would be revised as follows to produce an indicator for each of the 5 categories in the variable CAT.

VECTOR nom(5).
LOOP #i = 1 to 5.
COMPUTE nom(#i) = (cat = #i).
END LOOP.
EXECUTE.
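
As an illustration of the SUM() use mentioned above, the following sketch counts the cases in each category of cat within groups. The break variable region and the result names n_cat1 to n_cat5 are hypothetical; substitute names from your own file. Because each indicator is 0 or 1, its sum within a group is the number of cases in that category.

```spss
* Sketch only, region and n_cat1 to n_cat5 are hypothetical names.
AGGREGATE
  /OUTFILE = * MODE = ADDVARIABLES
  /BREAK = region
  /n_cat1 n_cat2 n_cat3 n_cat4 n_cat5 = SUM(nom1 nom2 nom3 nom4 nom5).
```

With MODE = ADDVARIABLES, the sums are appended to the active dataset rather than replacing it.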

For SPSS Statistics versions 18 and above, there is also an extension procedure called SPSSINC CREATE DUMMIES. This procedure creates a set of (0,1) indicator variables representing the distinct values of one or more variables. It can also create dummies for two- and three-way interaction terms, and it can optionally create macro variables representing sets of dummies. Like the syntax command sets above, it is useful for converting categorical variables into a set of variables appropriate for use in the Regression procedure.

The procedure requires that the Python Essentials for SPSS be installed with the program and that the Python language be installed on your computer. (SPSS Statistics versions 22 and above install the Python Essentials and a version of Python by default as part of the SPSS installation.) Once it is installed, the procedure is available from the menu

Transform->Create Dummy Variables.
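
The extension can also be run from syntax. The following is only a sketch of the kind of command the dialog pastes. The keyword names shown (VARIABLE, ROOTNAME1, and the OPTIONS settings) are believed to match recent versions of the extension but should be treated as assumptions, so paste the syntax from the dialog or consult the help supplied with the extension to confirm them for your installed version. The root name dum is a hypothetical choice.

```spss
* Assumed syntax, confirm against the Paste output of the dialog.
SPSSINC CREATE DUMMIES VARIABLE=cat
  ROOTNAME1=dum
  /OPTIONS ORDER=A USEVALUELABELS=YES OMITFIRST=NO.
```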

Learn more about Python extension procedures by opening Help->Topics and entering the search keywords Python and Extension.


Historical Number

17482

Document Information

Modified date:
16 April 2020

UID

swg21476169