I have posted to SPSS Developer Central a new Python-based extension command, SPSSINC SUMMARY TTEST, that does t tests when you have only the summary statistics from the samples rather than the full data. Besides being useful in its own right, it illustrates some useful techniques for programmability computations where you need both scalar computations and some SPSS transformation functions.
This command, which is implemented in Python and includes a dialog box interface built with the Custom Dialog Builder, takes as input the counts, means, and standard deviations of the samples and produces several pivot tables with the t test results, confidence intervals, and equal variance test. The output includes asymptotic and exact results for both equal variance and unequal variance cases.
The computations are based on standard formulas, but there are a few tactical issues to work out. First, the formulas require only standard algebra, except that values from the t and F distribution functions and their inverses are required. Those are not available from the Python standard library, although they are available in some third-party Python libraries.
These are, of course, readily available in the IBM SPSS Statistics transformation system. In order to tap the SPSS functions, it is necessary to write a small SPSS dataset with the input values, run some transformation commands on that dataset, and retrieve the values.
The dataset tasks are done most easily with the spss Dataset class. That, however, has to run within an spss DataStep. But the Submit API used to run the transformations cannot be used within a DataStep. Furthermore, the output of the procedure consists of pivot tables, and those can only be produced within a StartProcedure...EndProcedure block. And StartProcedure cannot be called when a Dataset is active.
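The scalar part of the work, everything that happens before a distribution function is needed, can be sketched in plain Python. This is only an illustration using the textbook pooled-variance and Welch-Satterthwaite formulas; the function name and the returned keys are my own inventions here, not names taken from the command's source.

```python
import math


def summary_t_stats(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistics from counts, means, and standard deviations.

    Hypothetical helper for illustration only. It returns the t values and
    degrees of freedom; p values and critical values still need the t
    distribution functions, which the standard library does not provide.
    """
    diff = mean1 - mean2

    # Equal variance case: pool the two sample variances.
    df_pooled = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_pooled
    se_pooled = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

    # Unequal variance case: Welch standard error, Satterthwaite df.
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    return {
        "diff": diff,
        "t_pooled": diff / se_pooled,
        "df_pooled": df_pooled,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }


result = summary_t_stats(10, 5.0, 2.0, 10, 3.0, 2.0)
```

The t values and degrees of freedom that come out of a function like this are exactly the quantities that then have to be handed to the SPSS distribution functions.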
So here's the drill:
- Calculate all the scalar quantities needed for the distribution functions
- Start a DataStep and use the Dataset class to populate a tiny dataset
- End the DataStep and Submit the necessary COMPUTE commands
- Start a DataStep and use the Dataset class to retrieve the results
- End the DataStep and start a StartProcedure block
- Produce the pivot tables and close the procedure block
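The steps above might be sketched roughly as follows. Treat this as an outline under assumptions rather than the command's actual code: the function name, variable names, and table titles are made up, the spss module is passed in as a parameter only so the sketch can be read outside SPSS, and the call names and signatures are as I recall them from the Python integration documentation, so check them against your installed version.

```python
def run_drill(spss, t_value, df):
    """One pass through DataStep, Submit, DataStep, StartProcedure.

    `spss` is the SPSS Python integration module (inside SPSS you would
    `import spss` and pass it in). All names below are hypothetical.
    """
    # Populate a tiny new dataset inside a DataStep.
    spss.StartDataStep()
    ds = spss.Dataset(name=None)        # new, empty dataset
    ds.varlist.append("tval", 0)        # 0 means a numeric variable
    ds.varlist.append("df", 0)
    ds.cases.append([t_value, df])
    spss.SetActive(ds)                  # so the COMPUTEs act on it
    spss.EndDataStep()

    # Submit is legal only now that the DataStep has ended.
    spss.Submit("""
COMPUTE sig = 2 * (1 - CDF.T(ABS(tval), df)).
COMPUTE tcrit = IDF.T(0.975, df).
EXECUTE.
""")

    # Reopen a DataStep to read back the two computed columns.
    spss.StartDataStep()
    ds = spss.Dataset()                 # the active dataset
    # Assumption: cases[row, col] returns a one-element list.
    sig = ds.cases[0, 2][0]
    tcrit = ds.cases[0, 3][0]
    spss.EndDataStep()

    # Pivot tables can only be built inside a procedure block.
    spss.StartProcedure("Summary T Test Sketch")
    table = spss.BasePivotTable("T Test", "SummaryTTest")
    table.SimplePivotTable(
        rowlabels=["Test 1"],
        collabels=["t", "df", "Sig. (2-tailed)", "Critical t"],
        cells=[t_value, df, sig, tcrit],
    )
    spss.EndProcedure()
    return sig, tcrit
```

The point of the sketch is the ordering: no Submit between StartDataStep and EndDataStep, and no StartProcedure until the last DataStep has been closed.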
Once you think your way through the constraints, all this takes only a very small amount of code. You can look at the source to see the details. Although I didn't use it for this example, the spssaux3.py module available from Developer Central includes a function, getComputations, that simplifies the task of getting computational results from SPSS into your Python code. It takes as input a set of values and a set of commands and returns a sequence of results.
There is one other interesting issue with implementing this command. The command syntax allows for carrying out a sequence of t tests, so all the input parameters can be lists. The intermediate calculated values are, therefore, also sequences of values. The most straightforward way to process these would simply be to loop the whole process described above. While that would work, I wanted to avoid creating and destroying many datasets, and, more importantly, I wanted all the output to appear as a single procedure in the SPSS Viewer. That means doing all the preliminary calculations, then generating an SPSS dataset with one row for each test, and then iterating over all those rows to produce the output in a single StartProcedure block.
That gets rather tedious, because you first have to initialize a whole bunch of lists for the intermediate variables, and all the formulas then require subscripts everywhere. Ugly. Instead, I took advantage of Python's dynamic and flexible class structure.
First, I defined a completely empty class, C.
Useless, right? But Python allows you to add variables (attributes) to class instances dynamically, so in my loop I could just write
c.var1 = ...
where c is an instance of class C. No tedious list initialization.
Now to deal with the list nature of the computations, in my outer loop I also assigned a variable to stand for the list element. So the code starts with this.
c = [C() for i in range(numtests)]
for i in range(numtests):
    d = c[i]
Then the computational lines look like this.
d.diff = mean1[i] - mean2[i]
so no subscripting is required on the intermediate results. (I could have packaged up the inputs in the same way, but I left them subscripted since that maps directly onto the input lists.)
Nowhere was it necessary to write lengthy definition or initialization code, and the computations are clearer than if they were littered all over with subscripts.
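Pulled together, the pattern looks something like the following. This is a minimal runnable sketch, not the code from the command: mean1, mean2, numtests, C, c, d, and diff follow the fragments above, while the sample values and the absdiff attribute are invented for illustration.

```python
class C:
    """Deliberately empty: attributes are attached to instances at run time."""
    pass


# Hypothetical inputs: one entry per requested t test.
mean1 = [5.0, 10.0]
mean2 = [3.0, 12.5]
numtests = len(mean1)

# One empty holder object per test, instead of one list per intermediate.
c = [C() for _ in range(numtests)]

for i in range(numtests):
    d = c[i]                          # stands for "the current test"
    d.diff = mean1[i] - mean2[i]      # attribute created on first assignment
    d.absdiff = abs(d.diff)           # later formulas read d.<name>, no subscripts

# The per-test results are still easy to collect afterwards.
diffs = [d.diff for d in c]
```

Each holder carries all the intermediate values for one test, which is convenient later when writing one dataset row per test.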
I'd like to acknowledge the assistance of Marta García-Granero with the statistical computations and the original inspiration for producing this procedure.
By the way, the test for equal variance is not the Levene test, because that test requires the absolute deviations from the mean, which cannot be computed from the summary statistics.