IBM Support

SPSS Command Syntax to delete variables that contain only missing values

Troubleshooting


Problem

I'm using SPSS for Windows and I have a data file with thousands of variables. I'm looking for a way I can automatically or programatically delete all string variables which are blank throughout the entire dataset. In other words, if there are 1000 respondents, then I want to delete all the string variables which are blank for all the 1000 records. I suppose it would be nice to have a way to do this for the numeric or non-string variables as well. Is there any way to do this without scripting?

Resolving The Problem

Yes, there is. This would probably be much easier to do with the SPSS-Python Programmability extension, but it can be performed solely with SPSS command syntax through some creative use of the Output Management System and macros.

NOTE: the following routines assume that you are using at least version 14 because they make use of multiple datasets.

To use the following examples, please create a directory called c:\temp and open the Employee Data.sav sample dataset. Create 2 new variables: 1 string called 'blankstring' and one numeric called 'blanknumeric'. Leave them blank, of course. Save the file as c:\temp\blanktest.sav.

Then, for the deletion of all string variables:

*DeleteEmptyStringVariables.

*the following SET MPRINT command lets you see exactly how the macro
parameters are being passed into the expanded text.
*you may also wish to go to Edit->Options->Viewer and check
'Display commands in the log'.

SET MPRINT = ON.

*first open the file you wish to delete the blank string variables from
and name the dataset.

GET FILE 'c:\temp\blanktest.sav'.
DATASET NAME main.

*The Output Management System (OMS) lets you capture output information
and route it to an external file.
*In this case we want the table result of the DISPLAY DICTIONARY command
to be brought into another SPSS dataset so that we can access the format
information for each variable.

* OMS.
DATASET DECLARE fileinfo.
OMS
/SELECT TABLES
/IF COMMANDS = ["File Information"]
SUBTYPES = ["Variable Information"]
/DESTINATION FORMAT = SAV NUMBERED = TableNumber_
OUTFILE = fileinfo.

DISPLAY DICTIONARY.

OMSEND.

*Now activate the 'fileinfo' dataset and select out those
cases that contain writeformats = Ann, where 'nn' are numbers.
*These will be the string variables in the dataset.


DATASET ACTIVATE fileinfo.

SELECT IF SUBSTR(writeformat,1,1) = 'A' AND INDEX('1234567890',SUBSTR(writeformat,2,1)) > 0.
EXECUTE.

*Now we use the WRITE command to build an external command syntax file
containing lines that will invoke 2 macros per string variable.


WRITE OUTFILE = 'c:\temp\blnkchk.sps'
/1 '!blankcheck ' Var1
/2 '!blankdelete ' Var1
.
EXECUTE.

DATASET CLOSE fileinfo.



*First, we'll define the 'blankcheck' macro to check the variable for blank data and set
another macro value equal to 0 if all blank or * if there were non-blank records.


DEFINE !blankcheck (!POSITIONAL !TOKENS(1))

DATASET ACTIVATE main.
COMPUTE dummy = 1.
COMPUTE flag = 1.
IF !1 = ' ' flag = 0.
EXECUTE.

DATASET DECLARE aggset.
AGGREGATE
/OUTFILE='aggset'
/BREAK=dummy
/blanksum = SUM(flag).

DATASET ACTIVATE aggset.

FORMATS blanksum (F1.0).
WRITE OUTFILE = 'c:\temp\sumdef.sps'
/1 'DEFINE !sumval ()' blanksum ' !ENDDEFINE.'.
EXECUTE.

INCLUDE 'c:\temp\sumdef.sps'.

DATASET ACTIVATE main.
DATASET CLOSE aggset.

!ENDDEFINE.

*Now we define the 'blankdelete' macro to take the 0/* value
created in 'blankcheck' and run a DELETE VARIABLES command
only if it's 0.

DEFINE !blankdelete (!POSITIONAL !TOKEN(1))

!IF ( !EVAL(!sumval) = 0 ) !THEN
DELETE VARIABLES !1.
!IFEND.

!ENDDEFINE.


*And finally, with those two macros now defined, we can invoke them
using the blnkchk.sps command syntax file we wrote earlier.


INCLUDE 'c:\temp\blnkchk.sps'.

**************************************************.

You'll need a slightly different routine to delete the numeric variables, because you'll need a different way to detect a) the numeric variables, and b) whether they are all fully missing (instead of blank).

*DeleteEmptyNonStringVariables.

*the following SET MPRINT command lets you see exactly how the macro
parameters are being passed into the expanded text.
*you may also wish to go to Edit->Options->Viewer and check
'Display commands in the log'.

SET MPRINT = ON.

*first open the file you wish to delete the blank string variables from
and name the dataset.

GET FILE 'c:\temp\blanktest.sav'.
DATASET NAME main.

*The Output Management System (OMS) lets you capture output information
and route it to an external file.
*In this case we want the table result of the DISPLAY DICTIONARY command
to be brought into another SPSS dataset so that we can access the format
information for each variable.

* OMS.
DATASET DECLARE fileinfo.
OMS
/SELECT TABLES
/IF COMMANDS = ["File Information"]
SUBTYPES = ["Variable Information"]
/DESTINATION FORMAT = SAV NUMBERED = TableNumber_
OUTFILE = fileinfo.

DISPLAY DICTIONARY.

OMSEND.

*Now activate the 'fileinfo' dataset and select out those
cases that contain writeformats <> Ann, where 'nn' are numbers.
*These will be the non-string variables in the dataset.


DATASET ACTIVATE fileinfo.

SELECT IF SUBSTR(writeformat,1,1) <> 'A' OR INDEX('1234567890',SUBSTR(writeformat,2,1)) = 0.
EXECUTE.

*Now we use the WRITE command to build an external command syntax file
containing lines that will invoke 2 macros per non-string variable.


WRITE OUTFILE = 'c:\temp\sysmischk.sps'
/1 '!sysmischeck ' Var1
/2 '!sysmisdelete ' Var1
.
EXECUTE.

DATASET CLOSE fileinfo.



*First, we'll define the 'sysmischeck' macro to check the variable for sysmis data and set
another macro value equal to 0 if all sysmis or * if there were non-sysmis records.


DEFINE !sysmischeck (!POSITIONAL !TOKENS(1))

DATASET ACTIVATE main.
COMPUTE dummy = 1.
COMPUTE flag = 1.
IF SYSMIS(!1) flag = 0.
EXECUTE.

DATASET DECLARE aggset.
AGGREGATE
/OUTFILE='aggset'
/BREAK=dummy
/sysmissum = SUM(flag).

DATASET ACTIVATE aggset.

FORMATS sysmissum (F1.0).
WRITE OUTFILE = 'c:\temp\sumdef.sps'
/1 'DEFINE !sumval ()' sysmissum ' !ENDDEFINE.'.
EXECUTE.

INCLUDE 'c:\temp\sumdef.sps'.

DATASET ACTIVATE main.
DATASET CLOSE aggset.

!ENDDEFINE.

*Now we define the 'sysmisdelete' macro to take the 0/* value
created in 'sysmischeck' and run a DELETE VARIABLES command
only if it's 0.

DEFINE !sysmisdelete (!POSITIONAL !TOKEN(1))

!IF ( !EVAL(!sumval) = 0 ) !THEN
DELETE VARIABLES !1.
!IFEND.

!ENDDEFINE.


*And finally, with those two macros now defined, we can invoke them
using the sysmischk.sps command syntax file we wrote earlier.


INCLUDE 'c:\temp\sysmischk.sps'.

DELETE VARIABLES dummy flag.

[{"Product":{"code":"SSLVMB","label":"IBM SPSS Statistics"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":"Not Applicable","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"18.0","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Historical Number

69466

Document Information

More support for:
IBM SPSS Statistics

Software version:
18.0

Document number:
421655

Modified date:
16 April 2020

UID

swg21481480

Manage My Notification Subscriptions