Transformation commands in IBM SPSS Statistics fall into two broad categories: those that are executed when the flow logic reaches them, and those that are executed immediately when read, i.e., before other transformations are executed.
Transformation commands that define variable properties such as VARIABLE LABELS, VALUE LABELS, and MISSING VALUES are generally executed as soon as they are read. That means that they may be executed out of order compared to the execution of the job stream, and they can't be controlled by transformation logic such as DO IF or DO REPEAT. So you can't define properties conditionally, and you can't iterate over them.
Usually you can ignore this behavior, but here's a case where it matters. Suppose you want to create syntax that will eliminate all the variable labels in a dataset (never mind why you might want to do this; it is just an example). The first try might be a DO REPEAT loop.
But you get this puzzling error message.
Error # 4530. Command name: variable label
>This command is not allowed inside the DO REPEAT/ END REPEAT facility.
That's because metadata commands are not subject to the flow control of a loop. In addition, this is a little ugly, because you had to know the names of the first and last variables (in file order) in order even to write the loop command. So this syntax isn't general.
The second try works, but it's way too ugly to contemplate. I couldn't ever bear to write out the whole command.
It works, because you can separate variable specifications with /, and if there is no label in a section, the label is empty, i.e., removed.
This is too painful to write, but it offers a clue. If we could only generate that slash-separated list automatically, we could feed it to this command. That's where SPSSINC SELECT VARIABLES comes in. This is an extension command available from the SPSS Community (www.ibm.com/developerworks/spssdevcentral if you aren't already reading this from the site) and requires the Python Essentials. This command allows you to define an SPSS macro consisting of a list of variables that meet various criteria. It could be an explicit list, but you could filter on variable type, patterns in names, custom attributes, and/or measurement level. With no filtering, it would select all the variables in the dataset. The macro text replaces the macro name reference when it is encountered in the syntax.
So we're almost there. If I run
spssinc select variables macroname = "!all". (note the quotes)
I get a list of all the variables that I could feed to the VARIABLE LABELS command. But that won't quite work, because I need those slashes to separate the items in the list. This will do the trick.
(It is conventional to start macro names with "!" so that they won't collide with variable names or syntax constructs.)
Now all I have to do is this.
This gives me syntax that will work for any data file regardless of the variable names.
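To make the mechanics concrete outside the macro facility, here is a minimal Python sketch of the same idea: join the variable names with slashes so that each name forms its own label specification with no label text. The function name `clear_labels_syntax` and the sample variable names are hypothetical, and the sketch only builds the command text; it doesn't run it.

```python
def clear_labels_syntax(varnames):
    """Build a VARIABLE LABELS command that gives every variable an empty label.

    varnames is assumed to be a non-empty list of variable names in the
    active dataset. (Hypothetical helper, not part of SPSS.)
    """
    if not varnames:
        raise ValueError("need at least one variable name")
    # Each "/"-separated section holds one name and no label text,
    # which, as described above, leaves that variable's label empty.
    return "VARIABLE LABELS " + " / ".join(varnames) + "."


# Made-up variable names for illustration.
print(clear_labels_syntax(["id", "age", "income"]))
# prints: VARIABLE LABELS id / age / income.
```

With the Python Essentials installed, the names could presumably be collected with `spss.GetVariableName(i)` for each `i` below `spss.GetVariableCount()`, and the resulting string passed to `spss.Submit`. The extension command does the equivalent work without any Python on the user's side.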
In order to use this command, you have to install the extension command (and the Essentials if you don't already have that), but then you can forget that it is an extension. It works just like the built-in commands. The more you can generalize your syntax, the less work you have to do, and the less the chance of errors creeping in.
A final note: this example is entirely serendipitous. I created this command for selecting subsets of the variables for automating analysis tasks. Here it is being used to select all the variables. And I thought the only useful separators between the selected variables would be a blank for ordinary variable lists and + for CTABLES expressions, but here / is required.
Programmability Vs Traditional Syntax in IBM SPSS Statistics: Counting Number of Distinct Values in a Case
Recently this problem was posed on the SPSSX-L listserv (linked on the SPSS Community site): Count the number of distinct values in a set of variables for each case. This led to a lively discussion of alternative solutions. Most used traditional syntax. I used Python programmability.
Datasets sometimes need to be restructured between wide form, where there are multiple measurements on a concept expressed as different variables within a case, and long form, where each repeated measurement is a separate case. Commonly, repeated measures, for example, test scores at different points of time for a subject, are stored in wide form, and IBM SPSS Statistics procedures that focus on repeated measures designs tend to require this form. Most data transformations and simple summary statistics, however, are intended for long form. Thus an easy way to convert between these forms is important.
IBM SPSS Statistics provides the commands CASESTOVARS and VARSTOCASES to convert between these forms. These are used by the Restructure Data Wizard, which appears on the Data menu.
SPSS Statistics has a number of transformation functions that work across variables within a case, such as mean, median, and sum, and the COUNT command that counts occurrences of a particular value, but there is no built-in function for counting distinct values. By converting these data from wide form to long form, the problem can be solved with traditional syntax.
The traditional syntax solution, then, has these steps (mainly worked out by David Marso). The syntax can all be generated using the GUI.
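The general shape of that approach, going through long form and then counting, can be sketched in Python with plain lists. This is only an illustration of the restructure-then-count logic, not the SPSS syntax itself. The data, the helpers `wide_to_long` and `distinct_per_case`, and the variable names are made up for the example, and treating `None` as a missing value to be skipped is an assumption.

```python
from collections import defaultdict


def wide_to_long(rows, id_key, value_keys):
    """Turn one record per case into one (id, value) pair per measurement."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append((row[id_key], row[key]))
    return long_rows


def distinct_per_case(long_rows):
    """Count distinct non-missing values for each case id."""
    seen = defaultdict(set)
    for case_id, value in long_rows:
        seen[case_id]  # touch the key so every case appears in the result
        if value is not None:  # assumption: None stands for a missing value
            seen[case_id].add(value)
    return {case_id: len(values) for case_id, values in seen.items()}


# Hypothetical wide-form data: three measurements per case.
wide = [
    {"id": 1, "x1": 3, "x2": 3, "x3": 5},
    {"id": 2, "x1": 1, "x2": 2, "x3": 4},
    {"id": 3, "x1": 7, "x2": None, "x3": 7},
]
counts = distinct_per_case(wide_to_long(wide, "id", ["x1", "x2", "x3"]))
print(counts)  # {1: 2, 2: 3, 3: 1}
```

The first helper plays the role that a wide-to-long restructure plays in SPSS, and the second stands in for removing duplicate id/value pairs and then counting per case.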
The programmability solution is much simpler. It uses the SPSSINC TRANS extension command along with a two-line Python program to arrive at the same result. Here is the Python program followed by the command syntax.
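The listing itself is not reproduced here. As a rough idea of how small such a function can be, the following is a sketch and not necessarily the program used in the post: it relies on Python's built-in `set` to drop duplicates. The name `count_distinct` is hypothetical, and how missing values should be handled is left open.

```python
def count_distinct(*values):
    """Return the number of distinct values among the arguments for one case."""
    # A set keeps only one copy of each value, so its size is the distinct count.
    return len(set(values))


# Made-up values for a single case.
print(count_distinct(3, 3, 5))  # prints 2
```

A function of this shape would be applied once per case, receiving that case's values for the variables of interest as its arguments.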
The program explanation
The Python Essentials and the SPSSINC TRANS extension command can be downloaded from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral if you are not reading this on the site).