IBM® SPSS® can talk to R. It's something of a well-kept secret, judging from the low level of activity in the R blogosphere on this point. The low level of interest is not surprising: SPSS users are, more often than not, people who use only SPSS for their data analysis; and R users are accustomed to applying ugly hacks as part of doing business with R. An R user who wants to analyse data in .sav format typically opens the file in SPSS, saves it to comma-separated values (CSV) format, and opens the result in R by using the read.csv() function. A cleaner way is to save to SPSS Statistics Portable (POR) format from SPSS and open the result by using the read.spss() function from the foreign package. This method usually works, in the sense that only a few dozen lines of R code are then required to cope with categories, missing values, time variables, and other features that are either lost or damaged in translation. If you need to return data from R back to SPSS, the return journey is more awkward.
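To make the "lost or damaged in translation" point concrete, here is a minimal sketch in plain R (no SPSS required). It uses a small made-up data frame as a stand-in for an SPSS file with a labelled categorical variable; the variable names and labels are assumptions chosen for illustration. It shows that a CSV round trip keeps the label text but silently changes the underlying numeric coding.

```r
# Hypothetical stand-in for an SPSS variable with value labels:
# codes 1, 2, 3 labelled Poor, Average, Excellent.
rating <- factor(c(1, 3, 2), levels = 1:3,
                 labels = c("Poor", "Average", "Excellent"))
original <- data.frame(CustName = c("Mary", "John", "Henry"),
                       Rating = rating)

# Round trip through a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(original, csv_path, row.names = FALSE)
roundtrip <- read.csv(csv_path, stringsAsFactors = TRUE)
unlink(csv_path)

# The labels survive as text, but the level order is now alphabetical,
# so the integer codes no longer match the original coding.
original_levels  <- levels(original$Rating)
roundtrip_levels <- levels(roundtrip$Rating)
print(original_levels)
print(roundtrip_levels)
print(as.integer(original$Rating))
print(as.integer(roundtrip$Rating))
```

Repairing this kind of damage by hand, variable by variable, is the sort of work that accounts for the extra lines of R code mentioned above.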
Tedious data manipulation notwithstanding, you can certainly work with both applications without a plugin to connect them. Is the effort of learning the plugin worth the gain in productivity? Is there a gain in productivity, or are the advantages of a different sort? To these questions, I would answer Yes and Yes. Translating from one data format to another is always tricky and time consuming. When you use R from SPSS, you can apply R functions to SPSS data while you maintain the integrity of the original database.
There is a further advantage to using the R integration plugin: where R and SPSS are both used on the same data, the plugin fosters reproducible research.
Reproducible research
Reproducible research is mainly an organizational principle. Given the original data file and the syntax file, it is possible to recreate every step of the analysis from these two files. Months later, if you need to return to the problem with additional data or a new analysis, it is possible to rebuild the original project. With SPSS, you can maintain a record of every procedure that is run on the data, be it a transformation of the data, the creation of new variables, or an analysis. If R is to play a role in the analysis, either as an assist in recoding variables or to supply a function not currently available in SPSS, maintaining both SPSS and R syntax in the same syntax file has value. You can run SPSS and R code from the same SPSS syntax file and apply it to the same database. Everything stays together.
Extending the functionality of SPSS
In a previous article, I argued that data analysts should learn R. Briefly, most advances in statistics appear first as R packages before they are added to the drop-down menus. R gives the SPSS user more tools for the job, and although you might implement these tools outside of SPSS by exporting the data, data export is never seamless. With the R plugin, you retain all the features of an SPSS database, particularly the labels of category data and the long descriptors.
R extensions
SPSS allows you to create more menu items and add them to the existing menu bar. In particular, R functions can be bundled as extensions and supplied to you through the menu. You can implement a function in R with no knowledge of R programming. Writing extensions goes beyond the scope of this article, but they are an important reason to learn to use the R plugin. Through this plugin, you can supply R functions to SPSS users who are unfamiliar with R.
Finding and installing the plugin
Installing the plugin is fairly straightforward, but the process does contain a few hurdles. For one thing, you must start several pages before the actual download page. You also need to register with IBM developerWorks, if you have not already done so. It's free.
Another hurdle in the installation is that the plugin works with only one version of R, not necessarily the current one. Which version of R you need depends on the version of SPSS you are running. Unfortunately, the download page does not specify which. However, for SPSS version 22, use R 2.15. For SPSS version 21, use R 2.14.0.
Be warned that the R integration plugin is specific about the R version. For SPSS version 21, for example, you must install R 2.14.0. If you install 2.14.1 or 2.14.2, it will not work. During the installation process, the plugin looks for a folder that contains the correct version of R. For example, if you use SPSS version 21 on Windows®, it looks for C:\Program Files\R\R-2.14.0. The installer queries you for the location of R if it can't find the folder that it wants. From this query, you can infer the precise version of R you need:
 Find the appropriate version of R on CRAN (the Comprehensive R Archive Network), then download and install it.
If you already use a different version of R and you want to keep it as your default, be sure to clear the Store version number in registry check box. If you want to install R packages to run with SPSS, you need to install them from the version of R that SPSS uses. R packages that are downloaded for the current version are invisible to the R integration package.
 To find the plugin for download, click Help > Working with R from the menu bar in SPSS to reach the opening page.
 Midway down the page, click the link for SPSS plugins.
SPSS has many plugins, but select the one for R. This link brings you to the login screen for IBM downloads.
 On the login page, log in or register (it's free). Proceed to the download page.
 Each version of SPSS has its own plugin. Find the one for your
version, download it, and install it.
At this stage, if you don't have the correct version of R installed, you see a message that the installer can't find it. Install the version that the installer asks for, and try again.
 If installation is successful, the installer displays a large documentation file.
With the installation of the plugin, this file is available from the SPSS Help menu under Programmability > R plugin. The Working with R menu command now points to more documentation and tutorials.
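Because the plugin is tied to one R version, it can help to confirm which R installation is actually running and where it looks for packages. The following is a minimal sketch that uses only base R functions; the target version string is the one this article gives for SPSS 21 and is used here purely as an example. The same lines can be run in a plain R console or from the R session that SPSS launches.

```r
# Report the version of the R session that is running this code.
running <- getRversion()
print(R.version.string)

# Example target: the version this article gives for SPSS 21.
required <- "2.14.0"
matches <- running == required
print(matches)

# Packages must be installed into a library that this R can see.
lib_paths <- .libPaths()
print(lib_paths)
```

If the reported version is not the one that SPSS expects, packages installed from that R session are unlikely to be visible to the plugin.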
Using R from SPSS
The R integration plugin does two things: It opens communication between SPSS and R, and it provides R with a package of functions with which to translate SPSS data structures into R objects.
Hello R!
Open a syntax file, and type the following lines. Select and run the command by clicking the green arrow:
BEGIN PROGRAM R.
cat("\t\tHello R!\n")
END PROGRAM.
The line BEGIN PROGRAM R. launches R and loads the requisite library of data management functions. It also sets several option variables for R that override any options that you might set in your .First() function.
The first and last lines here follow the conventions of SPSS syntax code and end with a period (.). All code between those two lines is interpreted as R code and must obey the rules of R syntax, so no period marks the end of a line.
When SPSS meets the END PROGRAM. statement, it interprets subsequent commands as SPSS syntax, but it does not quit the R session. Any variables that an R chunk creates are available to subsequent R chunks during the SPSS session.
Reading data into R and returning changes to SPSS
R chunks that are called from SPSS can read and write data from external sources in the usual way. But if you run R from SPSS, it's because you want access to an SPSS database. I created a simple test database to illustrate different data types, available with the downloads. Consider the lines in Listing 1.
Listing 1. Read and write a database
BEGIN PROGRAM R.
# Pull the data into a data frame
testData = spssdata.GetDataFromSPSS()
# Pull the data dictionary into another data frame
testDict = spssdictionary.GetDictionaryFromSPSS()
# Take a look
print(testData)
print(testDict)
# Check what data types the variables of the R data frame have
lapply(testData, class)
# Set up a new SPSS database with the same dictionary
spssdictionary.SetDictionaryToSPSS("Test2", testDict)
# Copy the data to the new SPSS database
spssdata.SetDataToSPSS("Test2", testData)
# Tell SPSS you're done creating data
spssdictionary.EndDataStep()
END PROGRAM.
When you run this code, the output in Listing 2 should appear in an SPSS output file.
Listing 2. Output reading and writing a database
  CustName Age Rating        Date Weight
1     Mary  21      1 13594608000   55.2
2     John  45      3 13594694400   73.4
3    Henry  33      2 13563244800   80.0
                               X1    X2              X3                  X4     X5
varName                  CustName   Age          Rating                Date Weight
varLabel            Customer Name   Age Customer rating Date of first trans Weight
varType                        20     0               0                   0      0
varFormat                     A20    F8              F6             ADATE10   F5.1
varMeasurementLevel       nominal scale         ordinal               scale  scale
$CustName
[1] "factor"
$Age
[1] "numeric"
$Rating
[1] "numeric"
$Date
[1] "numeric"
$Weight
[1] "numeric"
What just happened?
The great strength of SPSS as a data vault lies in the detailed data dictionary that you can create. You can store some of this information (variable types and names) as class and variable names in an R data frame, but not without some loss of detail. The R integration plugin lets you create two data frames from the active SPSS data set: one for the data and one for the data dictionary.
Data conversion from SPSS to R
Look at each variable in turn from the test database and see what happens when it is read into R:
 CustName. This variable is a string variable of length 20 in SPSS, nominal type. It becomes a factor in R.
 Age. This variable is numeric in SPSS, scale type, of length 6 with no decimals. It becomes numeric in R.
 Rating. This variable is numeric of type ordinal. The numeric codes were given descriptive labels in SPSS that are lost in translation. (For more about categorical data, see Working with categories.)
 Date. This variable is a date, formatted dd-mmm-yyyy. It becomes numeric in R. (For more about dates, see Working with dates.)
 Weight. A numeric variable that is formatted in SPSS to have one decimal. It becomes numeric in R.
The data dictionary
The data dictionary can be imported to a data frame in R, as shown in Listing 1. You don't need this dictionary to work on the data in R, but you do need to build a data dictionary to create an SPSS database. The data dictionary is a data frame of character vectors. It has one column for each variable of the SPSS database and one row for each entry in the dictionary. As you can see from the example in Listing 2, a range of format types is available. The complete list is given in the documentation for the R plugin.
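To make the shape of the dictionary concrete, the following sketch rebuilds the dictionary from Listing 2 by hand in plain R, without calling the plugin. The column names X1 to X5 and the entries are copied from that listing; the reading of varType (0 for numeric variables, the string width otherwise) is an inference from the one example shown there, so check it against the plugin documentation before relying on it.

```r
# Rebuild the dictionary printed in Listing 2 by hand, in plain R.
# Each column describes one SPSS variable; each row is one dictionary entry.
entries <- c("varName", "varLabel", "varType", "varFormat",
             "varMeasurementLevel")
testDict <- data.frame(
  X1 = c("CustName", "Customer Name",       "20", "A20",     "nominal"),
  X2 = c("Age",      "Age",                 "0",  "F8",      "scale"),
  X3 = c("Rating",   "Customer rating",     "0",  "F6",      "ordinal"),
  X4 = c("Date",     "Date of first trans", "0",  "ADATE10", "scale"),
  X5 = c("Weight",   "Weight",              "0",  "F5.1",    "scale"),
  row.names = entries,
  stringsAsFactors = FALSE
)

# Look up one entry for every variable, or every entry for one variable.
formats <- unlist(testDict["varFormat", ])
print(formats)
print(testDict[, "X4"])

# Assumption: varType is 0 for numeric variables and the string width
# otherwise, as the Listing 2 example suggests.
is_string <- as.integer(unlist(testDict["varType", ])) > 0
print(testDict["varName", is_string])
```

Because every cell is a character string, editing a dictionary in R is ordinary data frame manipulation, for example changing a format by assigning to testDict["varFormat", "X2"].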
Working with dates
The R integration function spssdata.GetDataFromSPSS(), with no arguments, transforms dates into numbers. The number that you get is the elapsed time in seconds from midnight, 14 October 1582. To convert the date variable for use in R, I might add testData$Date = as.POSIXlt(testData$Date, origin="1582-10-14"). Alternatively, I can take advantage of a useful argument of the GetDataFromSPSS() function (see Listing 3).
Listing 3. Reading dates from SPSS into R
BEGIN PROGRAM R.
# Pull the data into a data frame adjusting for dates
testData = spssdata.GetDataFromSPSS(rDate="POSIXct")
testDict = spssdictionary.GetDictionaryFromSPSS()
print(testData)
END PROGRAM.

  CustName Age Rating       Date Weight
1     Mary  21      1 2013-07-31   55.2
2     John  45      3 2013-08-01   73.4
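The arithmetic behind the date conversion can be checked in plain R, without SPSS. The sketch below takes the raw Date values printed in Listing 2 and converts them by hand. It assumes that SPSS counts seconds from midnight on 14 October 1582 and interprets the values in UTC; under that assumption the results agree with the dates that the plugin returns.

```r
# Raw Date values as printed in Listing 2 (seconds since the SPSS epoch).
spss_seconds <- c(13594608000, 13594694400, 13563244800)

# Assumption: the SPSS epoch is midnight on 14 October 1582, and the
# values are interpreted in UTC to avoid local time zone shifts.
spss_origin <- "1582-10-14"
dates <- as.POSIXct(spss_seconds, origin = spss_origin, tz = "UTC")
date_text <- format(dates, "%Y-%m-%d")
print(date_text)

# Going the other way: seconds from the epoch for a given calendar date.
back <- as.numeric(difftime(as.POSIXct("2013-07-31", tz = "UTC"),
                            as.POSIXct(spss_origin, tz = "UTC"),
                            units = "secs"))
print(back)
```

In day-to-day work the rDate argument is the simpler route; the manual version is mainly useful for checking a suspicious value.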
Writing time data to SPSS
The example in Listing 4 shows how to write date-time data back to SPSS from R. File IBM.csv contains a record of NYSE stock market data for IBM stock, obtained from the well-known finance site on Yahoo.com. Here you see the first few lines of data, reading back from 28 August 2013.
Listing 4. Writing dates from R to SPSS
Date        Open    High    Low     Close   Volume   Adj Close
28/08/2013  182.68  183.47  181.1   182.16  3979200  182.16
27/08/2013  183.63  184.5   182.57  182.74  3179300  182.74
26/08/2013  185.27  187     184.68  184.74  2170400  184.74
23/08/2013  185.34  185.74  184.57  185.42  2292700  185.42
22/08/2013  185.65  186.25  184.25  185.19  2354300  185.19
21/08/2013  184.67  186.57  184.28  184.86  3551000  184.86
I can read the data into SPSS, but the date format is not one that the SPSS date-time wizard supports. R to the rescue! Using R syntax from SPSS, I can open the file from R, convert the date to an appropriate format, and create an SPSS database with the results. Here are the steps:
 1. The default working directory for the R integration plugin is somewhere deep in the SPSS program directory tree. That's not what you want. Set the working directory to the location of your data file so that R can find it.
 2. These lines of code read in the dates, in character format, and convert them to Portable Operating System Interface for UNIX® (POSIX) format, which the plugin translates to SPSS dates counted from 14 October 1582.
 3. The spssdictionary.CreateSPSSDictionary() function automates some features of building up the data dictionary. Format DATE11 invokes date format 28-Aug-2013.
 4. Create the database and populate it.
Listing 5 shows how to carry out these steps.
Listing 5. Reading data directly into R and creating an SPSS database from them
BEGIN PROGRAM R.
# Set the working directory (1)
setwd("C:\\Users\\Catherine\\SPSSWork")
# Read the file and convert the dd/mm/yyyy strings to POSIX dates (2)
IBM = read.csv("IBM.csv", header=TRUE, stringsAsFactors=FALSE)
# strptime() parses the strings; as.POSIXct() needs no format or
# origin argument when it is given the POSIXlt result
PosixDate = as.POSIXct(strptime(IBM$Date, format="%d/%m/%Y"))
# Replace the character date column with the converted dates
IBM.spss = data.frame(Date=PosixDate, IBM[,-1])
head(IBM.spss)
# Create the data dictionary (3)
IBM.dict = spssdictionary.CreateSPSSDictionary(
    c("Date","Trading date","0","DATE11","scale"),
    c("Open","Opening price","0","F8.2","scale"),
    c("High","High price","0","F8.2","scale"),
    c("Low","Low price","0","F8.2","scale"),
    c("Close","Closing price","0","F8.2","scale"),
    c("Volume","Trading volume","0","F8.2","scale"),
    c("AdjClose","Adjusted closing","0","F8.2","scale")
)
# Create the new database (4)
spssdictionary.SetDictionaryToSPSS("IBM", IBM.dict)
spssdata.SetDataToSPSS("IBM", IBM.spss)
spssdictionary.EndDataStep()
END PROGRAM.
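The date-handling steps of Listing 5 can be tried in plain R before involving SPSS at all. The sketch below uses two rows copied from Listing 4 as inline text instead of the real file. It assumes that IBM.csv is comma separated with the header shown, and it fixes the time zone to UTC so that the result is the same on any machine.

```r
# Two rows copied from Listing 4, written as inline CSV text.
# Assumption: the real IBM.csv is comma separated with this header.
csv_text <- "Date,Open,High,Low,Close,Volume,Adj Close
28/08/2013,182.68,183.47,181.1,182.16,3979200,182.16
27/08/2013,183.63,184.5,182.57,182.74,3179300,182.74"

IBM <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# read.csv() turns the "Adj Close" header into a syntactic name.
col_names <- names(IBM)
print(col_names)

# Parse day/month/year strings; the time zone is fixed to UTC here
# so that the result does not depend on the local machine.
parsed <- as.POSIXct(strptime(IBM$Date, format = "%d/%m/%Y", tz = "UTC"))
iso_dates <- format(parsed, "%Y-%m-%d")
print(iso_dates)

# Drop the original character column (index -1) and put the parsed
# dates first.
IBM.spss <- data.frame(Date = parsed, IBM[, -1])
str(IBM.spss)
```

Checking str() at this point is a cheap way to confirm that the Date column is POSIXct and that the remaining six columns are numeric before the data frame is handed to the plugin.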
Working with categories
My simple example did not handle the categorical variable Rating at all well. R got the numeric codes for that variable but not the descriptive labels for the different levels the variable might take: Poor, Average, and Excellent. You can do something about that issue. The factorMode argument that is shown in Listing 6 imports category levels instead of numeric values.
Listing 6. The factorMode argument
BEGIN PROGRAM R.
testData = spssdata.GetDataFromSPSS(rDate="POSIXct", factorMode="labels")
testDict = spssdictionary.GetDictionaryFromSPSS()
print(testData)
END PROGRAM.

  CustName Age    Rating       Date Weight
1     Mary  21      Poor 2013-07-31   55.2
2     John  45 Excellent 2013-08-01   73.4
3    Henry  33   Average 2012-08-02   80.0
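In R terms, the difference between the two ways of reading Rating is the difference between a bare numeric vector and a factor. The plain R sketch below reproduces the mapping by hand. The code-to-label assignment (1 = Poor, 2 = Average, 3 = Excellent) is inferred from Listings 2 and 6, and the use of an ordered factor is a choice made for this illustration; whether the plugin itself returns an ordered factor is not stated here.

```r
# Numeric codes as they arrive by default (see Listing 2).
rating_codes <- c(1, 3, 2)

# Assumed value labels, inferred from Listings 2 and 6:
# 1 = Poor, 2 = Average, 3 = Excellent.
rating_labels <- c("Poor", "Average", "Excellent")

# An ordered factor keeps both the labels and their ranking.
rating <- factor(rating_codes, levels = 1:3, labels = rating_labels,
                 ordered = TRUE)
print(rating)

# The labels and the underlying codes are both still recoverable.
as_text  <- as.character(rating)
as_codes <- as.integer(rating)
print(as_text)
print(as_codes)

# Ordered comparisons work on the labels.
above_poor <- rating > "Poor"
print(above_poor)
```

Keeping the codes recoverable matters on the return journey, because SPSS stores the numeric code and attaches the label through its dictionary.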
Building a dictionary for categorical variables
The factorMode argument gives me a choice, depending on whether I want numeric codes or values for a categorical variable. But I need more if I want to create an SPSS database with categorical data. The solution lies in adding further structure to the data dictionary. The example in Listing 7 illustrates how to build an SPSS database from an R data frame with factors.
The famous iris data set is bundled with base R. It is a data frame with four numeric variables and one factor, denoting one of three species of iris. To build a database in SPSS, I complete the following steps:
 Create a data dictionary for the iris data.
This dictionary is a data frame of five columns (one for each variable of the iris set).
 Create a category dictionary for the factor.
The R structure here is complex. It is a list of length 2. The first component contains the names of the factors. The second component is a list of lists. Each item is a list of length 2: one component for the numeric codes and one component for their labels.
 Begin creation of an SPSS database by "setting" the data and category dictionaries.
 Populate the database.
 End the data step.
 Run the code.
Doing so creates a database in SPSS but does not save it to disk. The active database remains whatever it was.
Listing 7. Building a dictionary for categorical data
BEGIN PROGRAM R.
data(iris)
head(iris)
iris.dict = vector(mode="list", length=5)
# Name the columns
names(iris.dict) = paste("X", 1:5, sep="")
# Fill in the numeric variables
for(i in 1:4){
    iris.dict[[i]] = c(names(iris)[i],"","0","F3.2","scale")
}
#
# Fill information for the category
iris.dict[[5]] = c("Species","Species of Iris","0","F3","nominal")
# Square it off and add row names
iris.dict = data.frame(iris.dict)
row.names(iris.dict) = c("varName","varLabel","varType","varFormat",
    "varMeasurementLevel")
#
# Now build the category dictionary
iris.cat = vector(mode="list", length=2)
names(iris.cat) = c("name","dictionary")
iris.cat$name = "Species"
# Note that the dictionary is a list of lists
# With only one category, the first list has length 1
# The dictionary list contains two lists
#
iris.cat$dictionary = vector(mode="list", length=1)
iris.cat$dictionary[[1]] = list(levels=c(1,2,3), labels=levels(iris$Species))
#
# Now build the SPSS database.
spssdictionary.SetDictionaryToSPSS("Iris", iris.dict, iris.cat)
spssdata.SetDataToSPSS("Iris", iris, iris.cat)
spssdictionary.EndDataStep()
END PROGRAM.
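The nested category dictionary is the part of Listing 7 that is easiest to get wrong, so it can be worth building and inspecting it in plain R first. The sketch below constructs the same structure in a single list() call and prints it with str(). The check_cat_dict() function is a hypothetical helper written for this illustration; it is not part of the plugin.

```r
data(iris)

# Category dictionary for the single factor in iris, built as in Listing 7:
# a list with the factor names and one levels/labels pair per factor.
iris.cat <- list(
  name = "Species",
  dictionary = list(
    list(levels = c(1, 2, 3), labels = levels(iris$Species))
  )
)
str(iris.cat)

# The numeric codes stored inside the R factor line up with the levels
# declared in the dictionary.
codes <- sort(unique(as.integer(iris$Species)))
print(codes)

# Hypothetical helper (not part of the plugin): check that every factor
# named in a category dictionary has as many levels as labels.
check_cat_dict <- function(cat) {
  stopifnot(length(cat$name) == length(cat$dictionary))
  all(vapply(cat$dictionary,
             function(d) length(d$levels) == length(d$labels),
             logical(1)))
}
ok <- check_cat_dict(iris.cat)
print(ok)
```

With more than one factor, name would become a character vector and dictionary would hold one levels/labels list per factor, in the same order.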
Conclusion
The R integration package contains many functions to provide a seamless transfer from SPSS to R. For instance, SPSS allows greater flexibility in defining missing values than R. The R integration package contains functions for managing missing values so that nothing is lost in passing from SPSS to R and back again. Another important feature is the ability to create SPSS extensions that use R. Menu items can be added to the Analysis menu that enable R functions to be run on the active data set without needing to write explicit code in a syntax file. In this way, you can make R functionality available to users who have no knowledge of R. The R integration package has a lot to offer data analysts who use both SPSS and R.
Download
Description                     Name               Size
Sample R code for this article  Rcodeexamples.zip  33KB
Resources
Learn
 See The Comprehensive R Archive Network, the main site for the R project and each R package. The help pages and manuals that are associated with optimx, nlmrt, and Rcgmin are detailed. Numerous references are provided.
 Read Do I need to learn R? (Catherine Dalzell, developerWorks, September 2013) to learn why R is a valuable tool for data analytics that was expressly designed to reflect the way that statisticians think and work.
 Find the resources that you need to improve outcomes and control risk in the developerWorks Business analytics content area.
 Learn more about big data in the developerWorks big data content area. Find technical documentation, how-to articles, education, downloads, product information, and more.
 Follow developerWorks on Twitter.
 Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners to advanced functionality for experienced developers.
Get products and technologies
 Download the R plugin for SPSS.
 Learn more about IBM SPSS Statistics.
Discuss
 Be sure to check out the developerWorks SPSS community.
 Join the developerWorks community, a professional network and unified set of community tools for connecting, sharing, and collaborating.