IBM SPSS Statistics Version 18 introduced a new variable property: role. The role can be Input, Target, Both, None, Split, or Partition. This new metadata comes from IBM SPSS Modeler and is useful in abstracting and generalizing jobs.
Roles are normally set by the user. Currently, roles simply
supply the initial variable settings in some dialog boxes. But if the roles are set correctly,
it becomes possible to automate and raise the level of abstraction of repetitive
tasks. For example, you might need to produce a standard set of
analyses/reports across a variety of datasets that have a similar structure
but vary in the exact variables they contain or other details. By abstracting
the logic of a job to use roles, measurement levels, custom attributes and other
variable properties, you can reduce the number of versions of a job that need to be
developed and maintained. This can save time and reduce the number of errors.
I have seen customer sites where there are huge numbers of job files - syntax,
templates, macros, scripts, etc. - that are very similar but duplicated and
modified, because the variables coming in are a little different or the coding
of variables is a little different. Once you build a big set of jobs like this,
making improvements or bug fixes becomes a nightmare, not to mention the extra time it takes to do things this way.
The long-standing macro facility provides some
possibilities for abstraction, but it is static and can't use the metadata in a dataset. In contrast, the SPSSINC SELECT VARIABLES command
allows you to define sets of variables based on the metadata rather than just a
hard-coded list of names. It can use explicit names, patterns in names (for example, all the variable names that contain AGE), measurement level,
type (numeric vs string), custom attributes, and, finally, role, to define sets
of variables that can be used in the job. These sets are embodied in, yes, macros. Of course, you could write your own
code to use this sort of information, but SELECT VARIABLES can do a lot of this
without the need to learn programmability. And it has a dialog box interface as shown here.
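To make the idea concrete, here is a minimal Python sketch of metadata-driven selection. It does not reproduce how SPSSINC SELECT VARIABLES is implemented or its syntax: the variable dictionary, the select_variables and define_macro helpers, and the macro name !inputs are all hypothetical, and a plain dict stands in for the metadata that would normally be read from the active dataset. The only real SPSS syntax involved is the DEFINE ... !ENDDEFINE macro wrapper that the sketch emits as text.

```python
# Hypothetical stand-in for a dataset's variable dictionary.
dictionary = {
    "RESPID":   {"role": "None",   "level": "nominal", "type": "numeric", "attributes": {}},
    "AGE":      {"role": "Input",  "level": "scale",   "type": "numeric", "attributes": {"Units": "years"}},
    "AGEGROUP": {"role": "Input",  "level": "ordinal", "type": "numeric", "attributes": {}},
    "REGION":   {"role": "Input",  "level": "nominal", "type": "string",  "attributes": {}},
    "PURCHASE": {"role": "Target", "level": "nominal", "type": "numeric", "attributes": {}},
}


def select_variables(variables, contains=None, role=None, level=None,
                     var_type=None, attribute=None):
    """Return the names of variables that satisfy every criterion given.

    Criteria left as None are ignored. Names come back in dictionary order.
    """
    selected = []
    for name, meta in variables.items():
        if contains is not None and contains.upper() not in name.upper():
            continue
        if role is not None and meta["role"] != role:
            continue
        if level is not None and meta["level"] != level:
            continue
        if var_type is not None and meta["type"] != var_type:
            continue
        if attribute is not None and attribute not in meta["attributes"]:
            continue
        selected.append(name)
    return selected


def define_macro(macro_name, names):
    """Wrap a list of variable names in an SPSS macro definition (as text)."""
    return "DEFINE {}() {} !ENDDEFINE.".format(macro_name, " ".join(names))


age_vars = select_variables(dictionary, contains="AGE")
scale_inputs = select_variables(dictionary, role="Input", level="scale")
inputs_macro = define_macro("!inputs", select_variables(dictionary, role="Input"))
print(inputs_macro)
```

Later syntax in the job could then refer to !inputs instead of a hard-coded list, so the same job text works for any dataset whose roles are set.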
For example, suppose you have a mostly standard questionnaire that is used in many
studies, but it has a few custom questions that vary from study to study, or some variables are sometimes omitted. You
need to produce tabulations and estimate similar models for these studies. By
intelligent use of the metadata, including role, you can perhaps have one master
job rather than dozens. This leaves the analyst or researcher free to focus on
the brain work part of the job rather than the tedious, mechanical, and error-prone
parts. If you have a data supplier who collects and prepares your datasets, you can instruct them on what roles and custom attributes should be defined. Then your analysis syntax can at least in part be based on these properties.
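As a rough illustration of the "one master job" idea, the sketch below generates the same kind of tabulations for two studies whose variables differ. Everything here is an assumption made for the example: the two study dictionaries, the variable names, and the names_with and master_job helpers are hypothetical, and the job only builds FREQUENCIES and CROSSTABS commands as text rather than running them.

```python
# Two hypothetical studies that share a core questionnaire.
study_a = {
    "q1":        {"role": "Input",  "level": "nominal"},
    "q2":        {"role": "Input",  "level": "nominal"},
    "income":    {"role": "Input",  "level": "scale"},
    "satisfied": {"role": "Target", "level": "nominal"},
}
study_b = {
    "q1":        {"role": "Input",  "level": "nominal"},
    "q2_custom": {"role": "Input",  "level": "nominal"},
    "satisfied": {"role": "Target", "level": "nominal"},
}


def names_with(variables, role, level=None):
    """Names of variables with the given role and, optionally, measurement level."""
    return [name for name, meta in variables.items()
            if meta["role"] == role and (level is None or meta["level"] == level)]


def master_job(variables):
    """Build one standard set of commands from roles and measurement levels."""
    categorical = names_with(variables, "Input", "nominal")
    targets = names_with(variables, "Target")
    commands = []
    if categorical:
        commands.append("FREQUENCIES VARIABLES=" + " ".join(categorical) + ".")
    # Cross-tabulate every categorical input against every target.
    for target in targets:
        for name in categorical:
            commands.append("CROSSTABS /TABLES={} BY {}.".format(name, target))
    return commands


job_a = master_job(study_a)
job_b = master_job(study_b)
print("\n".join(job_a))
print("\n".join(job_b))
```

The job logic is written once. Only the metadata differs between studies, so a fix or improvement to master_job applies to every study at the same time.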
Custom attributes, first introduced in SPSS version 14, can also hold metadata such as questionnaire text, interviewer instructions, measurement units, or anything else that is useful in documenting your data or in programmatically manipulating it. In syntax, these can be created with the VARIABLE ATTRIBUTE command. (There is also a DATAFILE ATTRIBUTE command.) Roles can be defined with the VARIABLE ROLE command. Attributes and roles can also be defined in the Data Editor or the Define Variable Properties dialog. They all persist with the saved data. These can be used in Modeler, too.
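If a data supplier keeps roles and attributes in a simple specification, the corresponding syntax can be generated rather than typed by hand. The following is a sketch under stated assumptions: the helper names variable_role_syntax and variable_attribute_syntax, the variable names, and the Units attribute are hypothetical, and the functions only assemble command text for VARIABLE ROLE and VARIABLE ATTRIBUTE.

```python
VALID_ROLES = ("INPUT", "TARGET", "BOTH", "NONE", "PARTITION", "SPLIT")


def variable_role_syntax(roles):
    """Build a VARIABLE ROLE command from a {role: [variable names]} mapping."""
    lines = ["VARIABLE ROLE"]
    for role, names in roles.items():
        keyword = role.upper()
        if keyword not in VALID_ROLES:
            raise ValueError("unknown role: {}".format(role))
        if names:  # skip roles with no variables
            lines.append("  /{} {}".format(keyword, " ".join(names)))
    return "\n".join(lines) + "."


def variable_attribute_syntax(name, attributes):
    """Build a VARIABLE ATTRIBUTE command for one variable.

    Single quotes inside values are doubled so the quoted string stays valid.
    """
    pairs = " ".join(
        "{}('{}')".format(attr, str(value).replace("'", "''"))
        for attr, value in attributes.items()
    )
    return "VARIABLE ATTRIBUTE VARIABLES={} ATTRIBUTE={}.".format(name, pairs)


role_syntax = variable_role_syntax({"Input": ["age", "region"], "Target": ["purchase"]})
attr_syntax = variable_attribute_syntax("age", {"Units": "years"})
print(role_syntax)
print(attr_syntax)
```

Inside Statistics with the Python plug-in installed, the generated text could be passed to spss.Submit to run it. Outside Statistics, it can simply be written to a syntax file.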
In summary, it's all about
generalization and automation. Role is just one more attribute that can be used
in this effort.
SPSSINC SELECT VARIABLES
can be obtained from the SPSS Community and requires the Python Programmability plugin/essentials.