Extension Output node
With the Extension Output node, you can run scripts that are written in R, Python, or Python for Spark to produce outputs.
After you add the node to your canvas, double-click the node to open its properties.
Syntax tab
- Convert flag fields. Specifies how flag fields are treated. There are two options: "Strings to factor, Integers and Reals to double" and "Logical values (True, False)". If you select Logical values (True, False), the original values of the flag fields are lost. For example, if a field has the values Male and Female, these are changed to True and False.
- Convert missing values to the R 'not available' value (NA). When selected, any missing values are converted to the R NA value. The value NA is used by R to identify missing values. Some R functions that you use might have an argument that can control how the function behaves when the data contains NA. For example, the function might allow you to choose to automatically exclude records that contain NA. If this option isn't selected, any missing values are passed to R unchanged, and might cause errors when your R script runs.
- Convert date/time fields to R classes with special control for time zones. When selected, variables with date or datetime formats are converted to R date/time objects. You must select one of the following options:
- R POSIXct. Variables with date or datetime formats are converted to R POSIXct objects.
- R POSIXlt (list). Variables with date or datetime formats are converted to R POSIXlt objects.
Note: The POSIX formats are advanced options. Use these options only if your R script specifies that datetime fields are treated in ways that require these formats. The POSIX formats don't apply to variables with time formats.
Console Output tab
The Console Output tab contains any output that's received when the R script or Python script runs. For example, if you use an R script, the tab shows the output that is received from the R console when the script in the R Syntax field on the Syntax tab runs. This output might include R or Python error messages or warnings that are produced when the script runs. The output is useful primarily for debugging the script. The Console Output tab also contains the script from the R Syntax or Python Syntax field.
Every time the Extension Output script runs, the content of the Console Output tab is overwritten with the output that is received from the R or Python console. You can't edit the output.
Statistical tests
You can configure the Extension Output node to run statistical tests on your data. The following examples are some of the tests that you can run.
To see samples of these tests, you can download the sample stream extension-output-node-str.zip and import it into SPSS Modeler. For more information about importing, see Importing an SPSS Modeler stream. Then, open the Extension Output node properties to see the example syntax.
- T-Tests
- Description
The t-test determines whether there is a significant difference between the means of two groups. The test is valuable when you have small sample sizes and the standard deviation of the population is unknown.
- Example scenario
A pharmaceutical company wants to test if a new drug reduces blood pressure more effectively than the standard treatment. They randomly assign 25 patients to each treatment group, and they measure blood pressure reduction after 30 days. A t-test can determine whether the difference in mean reduction between groups is statistically significant.
- Python libraries and R functions
Python libraries:
  - scipy.stats.ttest_ind() - Independent samples t-test (equal or unequal variances)
  - scipy.stats.ttest_rel() - Paired samples t-test
  - scipy.stats.ttest_1samp() - One-sample t-test
R functions:
  - t.test() - Comprehensive function for all t-test types with options for paired, one-sample, and two-sample tests
  - var.test() - Test equality of variances (prerequisite check)
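As a rough illustration of what an independent two-sample t-test computes, the following sketch calculates Welch's t statistic (the unequal-variance form) and its approximate degrees of freedom by using only the Python standard library. The function name welch_t and the sample values are hypothetical and aren't taken from the sample stream. In practice, scipy.stats.ttest_ind() with equal_var=False reports this statistic together with a p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance uses the sample (n - 1) denominator.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical blood pressure reductions (mmHg); illustrative values only.
new_drug = [12.1, 9.8, 11.4, 13.0, 10.2, 12.7, 11.9, 10.8]
standard = [8.3, 9.1, 7.6, 10.0, 8.8, 9.4, 7.9, 8.5]

t_stat, df = welch_t(new_drug, standard)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A positive t statistic here means that the first sample has the larger mean. Whether the difference is statistically significant depends on the p-value, which requires the t distribution and is left to a library such as SciPy.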
- F-Tests and Analysis of Variance (ANOVA)
- Description
The F-test is used to compare variances between two or more groups. It forms the foundation of Analysis of Variance (ANOVA), which extends the t-test concept to situations involving three or more groups. The F-test determines whether the variability between group means is significantly greater than the variability within groups.
- Example scenario
A retail chain wants to determine if average customer satisfaction scores differ significantly across five store locations. They collect satisfaction ratings from 50 customers at each location. One-way ANOVA would test if location has a significant effect on satisfaction. If significant, post-hoc tests would identify which specific locations differ from each other.
- Python libraries and R functions
Python libraries:
  - scipy.stats.f_oneway() - One-way ANOVA F-test
  - statsmodels.formula.api.ols() - Ordinary Least Squares for ANOVA models
  - statsmodels.stats.anova.anova_lm() - ANOVA table generation
  - scipy.stats.levene() - Test for homogeneity of variance
R functions:
  - aov() - Analysis of Variance
  - anova() - ANOVA table for model objects
  - var.test() - F-test to compare two variances
  - TukeyHSD() - Tukey's Honest Significant Difference post-hoc test
  - leveneTest() - Levene's test for homogeneity of variance (car package)
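To make the between-group versus within-group comparison concrete, the following sketch computes the one-way ANOVA F statistic by hand with the Python standard library. The function name one_way_anova_f and the satisfaction scores are hypothetical, and only three small groups are used to keep the example short. In practice, scipy.stats.f_oneway() reports this statistic together with a p-value.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical satisfaction scores for three store locations.
stores = [
    [7, 8, 6, 9, 7],
    [5, 6, 5, 7, 6],
    [8, 9, 9, 7, 8],
]

f_stat, df_between, df_within = one_way_anova_f(stores)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F value indicates that the group means vary more than the spread within groups would suggest. Judging significance still requires the F distribution, and identifying which locations differ requires a post-hoc test.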
- Z-Tests
- Description
The Z-test is a statistical hypothesis test. It uses the standard normal distribution (Z-distribution) to determine if there is a significant difference between sample and population parameters. The Z-test is useful when the population standard deviation is known or when sample sizes are large enough to ensure the sampling distribution is approximately normal.
- Example scenario
An online retailer wants to test if a new website design increases the conversion rate from the historical average of 3.5%. After they implement the new design for one week, they observe 2,450 conversions out of 70,000 visitors. A Z-test for proportions can determine whether the observed conversion rate (3.5%) differs significantly from the historical rate.
- Python libraries and R functions
Python libraries:
  - statsmodels.stats.weightstats.ztest() - Z-test for means
  - statsmodels.stats.proportion.proportions_ztest() - Z-test for proportions
  - scipy.stats.norm.cdf() - Calculate p-values from Z-statistics
R functions:
  - BSDA::z.test() - Z-test for means (requires BSDA package)
  - prop.test() - Test for proportions (uses chi-square approximation, equivalent to Z-test for large samples)
Note: Base R doesn't include a Z-test function. Use a package such as BSDA or TeachingDemos, or write a custom function.
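The following sketch shows one way to compute a two-sided, one-proportion Z-test with the Python standard library. The function name one_proportion_ztest is hypothetical, and the standard error is computed from the null proportion, which is an assumption of this sketch. Library functions such as statsmodels.stats.proportion.proportions_ztest() use the sample proportion by default, so their results can differ slightly. Because 2,450 conversions out of 70,000 visitors is exactly 3.5%, those counts give a Z statistic of 0, so the sketch uses a hypothetical count of 2,600 conversions to show a nonzero result.

```python
import math
from statistics import NormalDist


def one_proportion_ztest(successes, trials, p0):
    """Return (z statistic, two-sided p-value) for H0: proportion == p0."""
    p_hat = successes / trials
    # Standard error under the null hypothesis (assumption of this sketch).
    se = math.sqrt(p0 * (1 - p0) / trials)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical count: 2,600 conversions out of 70,000 visitors,
# tested against the historical conversion rate of 3.5%.
z, p_value = one_proportion_ztest(2600, 70000, 0.035)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A positive Z statistic means that the observed rate is higher than the historical rate. The p-value indicates how likely a difference at least this large would be if the true rate were still 3.5%.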