Once we decided on the strategy of embedding an existing programming language, we had to pick one. These were our criteria:
- It had to be licensable on acceptable terms, preferably open source.
- It had to be well suited to the functionality we wanted to provide.
- It had to be easy to learn but not limiting.
- It had to meet various technical requirements for embedding.
- It had to be available on all our target platforms - both client and server.
- It should be reasonably familiar to the SPSS audience or in the domain of statistical analysis.
So we picked -- R. As it turned out, though, R didn't really meet the first three criteria. Licensing was a problem (overcome in version 16, when we integrated it in an architecturally different way). It was well suited for statistical calculation, but it wasn't a good fit for interacting with and controlling SPSS, and it certainly wasn't easy to learn.
Python, though, met all of the criteria except familiarity. There is an index of popularity of programming languages called the TIOBE index. It tracks the top 100(!) languages, and Python is currently number 7 on the list. If you restrict the choices to scripting languages, it is number 3 and is classified in the A group. Furthermore, Python has been gaining popularity in the scientific and statistical computing community. There is a large open source support community around it, and there are many third-party libraries in a wide range of domains.
The other obvious choice would have been Visual Basic. VB is second among scripting languages on the TIOBE index, and it is familiar to many SPSS users through the SaxBasic scripting facility, but it isn't very cross-platform, and, frankly, it just isn't a great language. We did, however, decide to add .NET support on Windows, particularly for Visual Studio developers, and that enables VB.NET.
My personal opinion is that Python is a truly great language. It's easy to learn, very flexible, and very expressive. It has a coherence and generality rare in languages, due to its control by one very gifted language designer working with an active and mature open source community. It has been around for almost 20 years. And there are excellent IDEs, both commercial and free.
The downside is that Python is very different from traditional SPSS syntax. They come from different motivations and purposes, and it can be jarring to go back and forth between them. I've done consulting and training, though, with many users and organizations by now, and those who have been willing to invest some effort in mastering this technology have been very successful and satisfied. Opening up SPSS, the product, in this way lets users control and extend it in ways never before possible.
Many in the SPSS user community, though, have yet to make this investment. Python is a programming language, and SPSS users tend not to be programmers. For non-programmers and those who want a single application language, we created the extension command mechanism. Next time I will write about that.
When we started, SPSS was a little ahead of the curve for once, but we think that Python has passed the tipping point and has become widely accepted.