Extension Output node (SPSS Modeler)

Syntax tab

Select your type of syntax – R, Python, or Python for Spark. Then enter or paste your custom script for outputting data. When your syntax is ready, you can run the node. The following options are available for R syntax:

Convert flag fields. Specifies how flag fields are treated. There are two options: "Strings to factor, Integers and Reals to double" and "Logical values (True, False)". If you select Logical values (True, False), the original values of the flag fields are lost. For example, if a field has values Male and Female, these are changed to True and False.
Convert missing values to the R 'not available' value (NA). When selected, any missing values are converted to the R NA value. The value NA is used by R to identify missing values. Some R functions that you use might have an argument that can control how the function behaves when the data contains NA. For example, the function might allow you to choose to automatically exclude records that contain NA. If this option isn't selected, any missing values are passed to R unchanged, and might cause errors when your R script runs.
Convert date/time fields to R classes with special control for time zones. When selected, variables with date or datetime formats are converted to R date/time objects. You must select one of the following options:
- R POSIXct. Variables with date or datetime formats are converted to R POSIXct objects.
- R POSIXlt (list). Variables with date or datetime formats are converted to R POSIXlt objects.
Note: The POSIX formats are advanced options. Use these options only if your R script specifies that datetime fields are treated in ways that require these formats. The POSIX formats don't apply to variables with time formats.

Console Output tab

The Console Output tab contains any output that's received when the R or Python script runs. For example, for an R script, it shows the output from the R console when the script in the R Syntax field on the Syntax tab runs. This output might include R or Python error messages or warnings that are produced when the script runs. The output is primarily useful for debugging the script. The Console Output tab also contains the script from the R Syntax or Python Syntax field.

Every time the Extension Output script runs, the content of the Console Output tab is overwritten with the output that is received from the R or Python console. You can't edit the output.

Note: R or Python error messages or warnings that result from running your Extension Output script are always displayed on the Console Output tab.

Statistical tests

You can configure the Extension Output node to run statistical tests on your data. The following examples are some of the tests that you can run.

To see samples of these tests, you can download the sample stream extension-output-node-str.zip and import it into SPSS Modeler. For more information about importing, see Importing an SPSS Modeler stream. Then, open the Extension Output node properties to see the example syntax.

T-Tests

Description

The t-test determines whether there is a significant difference between the means of two groups. The test is valuable when you have small sample sizes and the population standard deviation is unknown.

Example scenario

A pharmaceutical company wants to test if a new drug reduces blood pressure more effectively than the standard treatment. They randomly assign 25 patients to each treatment group, and they measure blood pressure reduction after 30 days. A t-test can determine whether the difference in mean reduction between groups is statistically significant.

Python libraries and R functions

Python Libraries:

scipy.stats.ttest_ind() - Independent samples t-test (equal or unequal variances)
scipy.stats.ttest_rel() - Paired samples t-test
scipy.stats.ttest_1samp() - One-sample t-test

R Functions:

t.test() - Comprehensive function for all t-test types with options for paired, one-sample, and two-sample tests
var.test() - Test equality of variances (prerequisite check)
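
As a minimal sketch of what an independent-samples t-test computes, the following Python code calculates the Welch (unequal variances) t statistic and its degrees of freedom by using only the standard library. The function name welch_t and the blood pressure values are hypothetical and made up for illustration; they are not taken from the sample stream. In a real script, scipy.stats.ttest_ind() with equal_var=False would typically be used instead, because it also returns the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration; it does not compute a p-value.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical blood pressure reductions (mmHg), invented for this sketch.
new_drug = [12.0, 15.0, 11.0, 14.0, 13.0]
standard = [9.0, 10.0, 8.0, 11.0, 7.0]

t_stat, df = welch_t(new_drug, standard)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A large absolute t value relative to the t-distribution with df degrees of freedom indicates that the group means differ by more than sampling variation alone would explain.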

F-Tests and Analysis of Variance (ANOVA)

Description

The F-test is used to compare variances between two or more groups. It forms the foundation of Analysis of Variance (ANOVA), which extends the t-test concept to situations involving three or more groups. The F-test determines whether the variability between group means is significantly greater than the variability within groups.

Example scenario

A retail chain wants to determine if average customer satisfaction scores differ significantly across five store locations. They collect satisfaction ratings from 50 customers at each location. One-way ANOVA would test if location has a significant effect on satisfaction. If significant, post-hoc tests would identify which specific locations differ from each other.

Python libraries and R functions

Python Libraries:

scipy.stats.f_oneway() - One-way ANOVA F-test
statsmodels.formula.api.ols() - Ordinary Least Squares for ANOVA models
statsmodels.stats.anova.anova_lm() - ANOVA table generation
scipy.stats.levene() - Test for homogeneity of variance

R Functions:

aov() - Analysis of Variance
anova() - ANOVA table for model objects
var.test() - F-test to compare two variances
TukeyHSD() - Tukey's Honest Significant Difference post-hoc test
leveneTest() - Levene's test for homogeneity of variance (car package)
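
The comparison of between-group and within-group variability that is described earlier can be sketched in a few lines of Python. This example uses only the standard library; the function name one_way_anova_f and the satisfaction scores are hypothetical and made up for illustration. In practice, scipy.stats.f_oneway() would typically be used, because it also returns the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df).

    Hypothetical helper for illustration; it does not compute a p-value.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: how far each group mean is
    # from the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical satisfaction scores for three store locations.
store_a = [7.0, 8.0, 9.0]
store_b = [5.0, 6.0, 7.0]
store_c = [8.0, 9.0, 10.0]

f_stat, df_between, df_within = one_way_anova_f([store_a, store_b, store_c])
print(f"F = {f_stat:.3f} on ({df_between}, {df_within}) degrees of freedom")
```

An F statistic well above 1 suggests that the group means vary more than the within-group spread would predict; a post-hoc test is still needed to identify which groups differ.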

Z-Tests

Description

The Z-test is a statistical hypothesis test. It uses the standard normal distribution (Z-distribution) to determine if there is a significant difference between sample and population parameters. The Z-test is useful when the population standard deviation is known or when sample sizes are large enough to ensure the sampling distribution is approximately normal.

Example scenario

An online retailer wants to test if a new website design increases the conversion rate from the historical average of 3.5%. After they implement the new design for one week, they observe 2,450 conversions out of 70,000 visitors. A Z-test for proportions can determine whether the observed conversion rate (3.5%) differs significantly from the historical rate.

Python libraries and R functions

Python Libraries:

statsmodels.stats.weightstats.ztest() - Z-test for means
statsmodels.stats.proportion.proportions_ztest() - Z-test for proportions
scipy.stats.norm.cdf() - Calculate p-values from Z-statistics

R Functions:

BSDA::z.test() - Z-test for means (requires BSDA package)
prop.test() - Test for proportions (uses a chi-square statistic, which equals the squared Z statistic when the continuity correction is turned off with correct = FALSE)
Note: Base R doesn't include a Z-test function for means. Packages such as BSDA or TeachingDemos, or custom functions, are needed.
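
As a minimal sketch of a one-sample Z-test for a proportion, the following Python code uses only the standard library. The function name proportion_ztest and the visitor counts are hypothetical and made up for illustration. The standard error here is computed from the hypothesized rate; library functions such as statsmodels.stats.proportion.proportions_ztest() might use a different variance estimate by default, so results can differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ztest(successes, trials, p0):
    """Two-sided one-sample Z-test for a proportion.

    Hypothetical helper for illustration. Returns (z statistic, p-value).
    """
    p_hat = successes / trials
    # Standard error under the null hypothesis that the true rate is p0.
    se = sqrt(p0 * (1 - p0) / trials)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 400 conversions out of 10,000 visitors,
# tested against a historical conversion rate of 3.5%.
z, p_value = proportion_ztest(successes=400, trials=10_000, p0=0.035)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A p-value below the chosen significance level (for example, 0.05) indicates that the observed rate differs from the historical rate by more than chance alone would explain.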