SPSS Statistics was, as far as I know, the first commercial software to deliver an integration with the R statistical language. It first appeared in SPSS 16, over six years ago at this writing, complementing the Python language integration that first appeared in SPSS 14. This post reviews the rationale and developments in this feature.
R has become the computational language of the statistics profession. It's the way a new statistical algorithm is first published, and the R library contains a vast collection of statistical functions. The R jungle contains many gems, but it has drawbacks, too. This isn't the place to discuss all the good and the bad, but suffice it to say that using R directly imposes a style of statistical analysis based on a programming model that does not always suit an analyst, and the output from an R package is usually not in a format suitable for publication. And, while various point-and-click interfaces can be added on for some R packages, serious usage requires the user to learn the R language, which is not easy.
SPSS Statistics and, as of Version 16, SPSS Modeler let a user keep the ease of use and output presentation capabilities of these products while still tapping the power and packages of R. While a user can write programs in the R language that run within the Statistics or Modeler program, the typical SPSS user takes advantage of R packages that have already been integrated using the published APIs and tools for this purpose, without needing to learn R or deal with it directly. R packages can extend the statistical capabilities of these products without sacrificing the benefits of SPSS software. The R connection requires an extra installation step (R itself and, via the SPSS Community website, the R Essentials), but all the pieces for this are free. Statistics and Modeler can be a great way to deploy the functionality of R.
Organizations and individuals can do their own, private integrations of R packages, but the SPSS Community site provides a means of sharing integrations with everyone. Instructions for sharing are on the front page of the site. For SPSS Statistics, you can start here to see what has been shared. With Statistics version 22 or later, you can also download and install package integrations from the Utilities menu within Statistics without even visiting the site.
The image also shows extensions implemented in Python.
As of this writing, there are 25 R packages that have been integrated by IBM and 10 contributed by users. Package integrations generally include a dialog box interface produced by the Statistics Custom Dialog Builder and traditional SPSS syntax for the package. They produce their output as SPSS pivot tables and R graphic images that appear in the Statistics Viewer along with other output produced by native SPSS commands. For packages that are included in the R Essentials, the dialog and output are usually translated into all the languages that Statistics itself provides.
For Modeler 16, an adaptation of the Custom Dialog Builder is included, and nodes can build models and provide code to be used with those models for scoring. Using the new Hadoop integration, the scoring can be performed on the Hadoop cluster with big performance benefits. As in Statistics, the user of R-based nodes sees the same behavior that comes with native nodes.
Producing a package integration for Statistics is usually easy for someone who knows the R language. It can be as simple as adding a line to fetch the data. Usually the integration will also convert plain text R output to one or more pivot tables, and it may create new datasets. That takes a little longer, but it is still typically a matter of a few hours to a few days. The APIs for all this are covered in detail in the help installed with the R plug-in: Help > Programmability > R Plug-In. Since the package integrations are usually distributed in source form, they can serve as examples. Integration creators can also do translations, but not many are prepared to handle this. The SPSS forums are a good place to ask questions about this technology.
A white paper that discusses the benefits and technology of using R with Statistics provides more details.
In sum, the R integrations for Statistics and Modeler allow access to the large R library but package it in a form that fits in with the native capabilities of these products. It's a win for everyone, and it's all free.