Column statistics content model and pairwise statistics content model
The column statistics content model provides access to statistics that can be computed for each field (univariate statistics). The pairwise statistics content model provides access to statistics that can be computed between pairs of fields or values in a field.
The possible statistics measures are:
Count
UniqueCount
ValidCount
Mean
Sum
Min
Max
Range
Variance
StandardDeviation
StandardErrorOfMean
Skewness
SkewnessStandardError
Kurtosis
KurtosisStandardError
Median
Mode
Pearson
Covariance
TTest
FTest
Some values are only appropriate for single-column statistics, while others are only appropriate for pairwise statistics.
Nodes that will produce these are:
- Statistics node produces column statistics and can produce pairwise statistics when correlation fields are specified.
- Data Audit node produces column statistics and can produce pairwise statistics when an overlay field is specified.
- Means node produces pairwise statistics when comparing pairs of fields or comparing a field's values with other field summaries.
Which content models and statistics are available will depend on both the particular node's capabilities and the settings within the node.
ColumnStatsContentModel API
Return | Method | Description |
---|---|---|
List<StatisticType> | getAvailableStatistics() | Returns the available statistics in this model. Not all fields will necessarily have values for all statistics. |
List<String> | getAvailableColumns() | Returns the column names for which statistics were computed. |
Number | getStatistic(String column, StatisticType statistic) | Returns the statistic values associated with the column. |
void | reset() | Flushes any internal storage associated with this content model. |
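As a rough sketch of how these methods fit together, the helper below walks every available column and statistic and collects the values into a dictionary. The names `collect_column_stats` and `FakeColumnStats` are hypothetical and exist only for illustration; inside Modeler you would pass the object returned by `getContentModel("columnStatistics")` instead of the stand-in. The sketch assumes that `getStatistic` returns `None` when a column has no value for a statistic, which this page does not state explicitly, and the numbers in the demo are made up.

```python
def collect_column_stats(model):
    """Return {column: {statistic_name: value}} for a column statistics model."""
    table = {}
    for column in model.getAvailableColumns():
        row = {}
        for statistic in model.getAvailableStatistics():
            value = model.getStatistic(column, statistic)
            # Assumption: a statistic with no value for this column comes back as None.
            if value is not None:
                row[str(statistic)] = value
        table[column] = row
    return table


class FakeColumnStats(object):
    """Hypothetical stand-in for the content model so the sketch runs outside Modeler."""

    def __init__(self, values):
        self._values = values  # {(column, statistic): number}

    def getAvailableColumns(self):
        return sorted(set(column for column, _ in self._values))

    def getAvailableStatistics(self):
        return sorted(set(statistic for _, statistic in self._values))

    def getStatistic(self, column, statistic):
        return self._values.get((column, statistic))


# Made-up values, used only to exercise the helper.
demo = FakeColumnStats({("Age", "Mean"): 44.3, ("Age", "Max"): 74, ("Na", "Mean"): 0.7})
summary = collect_column_stats(demo)
```

Skipping `None` values reflects the note in the table that not all fields will necessarily have values for all statistics.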
PairwiseStatsContentModel API
Return | Method | Description |
---|---|---|
List<StatisticType> | getAvailableStatistics() | Returns the available statistics in this model. Not all fields will necessarily have values for all statistics. |
List<String> | getAvailablePrimaryColumns() | Returns the primary column names for which statistics were computed. |
List<Object> | getAvailablePrimaryValues() | Returns the values of the primary column for which statistics were computed. |
List<String> | getAvailableSecondaryColumns() | Returns the secondary column names for which statistics were computed. |
Number | getStatistic(String primaryColumn, String secondaryColumn, StatisticType statistic) | Returns the statistic values associated with the columns. |
Number | getStatistic(String primaryColumn, Object primaryValue, String secondaryColumn, StatisticType statistic) | Returns the statistic values associated with the primary column value and the secondary column. |
void | reset() | Flushes any internal storage associated with this content model. |
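A minimal sketch of iterating a pairwise model follows: it requests one statistic for every primary/secondary column pair using the three-argument form of `getStatistic`. The names `collect_pairwise` and `FakePairwiseStats` are hypothetical; inside Modeler you would pass the object returned by `getContentModel("pairwiseStatistics")` and a `StatisticType` value such as `StatisticType.Pearson` rather than a string. As before, treating a missing value as `None` is an assumption, and the demo numbers are made up.

```python
def collect_pairwise(model, statistic):
    """Return {(primary, secondary): value} for one statistic of a pairwise model."""
    pairs = {}
    for primary in model.getAvailablePrimaryColumns():
        for secondary in model.getAvailableSecondaryColumns():
            value = model.getStatistic(primary, secondary, statistic)
            # Assumption: a pair with no value for this statistic comes back as None.
            if value is not None:
                pairs[(primary, secondary)] = value
    return pairs


class FakePairwiseStats(object):
    """Hypothetical stand-in for the content model so the sketch runs outside Modeler."""

    def __init__(self, values):
        self._values = values  # {(primary, secondary, statistic): number}

    def getAvailablePrimaryColumns(self):
        return sorted(set(key[0] for key in self._values))

    def getAvailableSecondaryColumns(self):
        return sorted(set(key[1] for key in self._values))

    def getStatistic(self, primary, secondary, statistic):
        return self._values.get((primary, secondary, statistic))


# Made-up values, used only to exercise the helper.
demo = FakePairwiseStats({
    ("Age", "Na", "Pearson"): 0.1,
    ("Age", "K", "Pearson"): -0.2,
    ("Na", "K", "Pearson"): 0.05,
})
pearson = collect_pairwise(demo, "Pearson")
```

The stand-in does not model the four-argument overload that takes a primary value; that form would need `getAvailablePrimaryValues()` as an extra loop level.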
Nodes and outputs
This table lists nodes that build outputs which include this type of content model.
Node name | Output name | Container ID | Notes |
---|---|---|---|
"means" (Means node) | "means" | "columnStatistics" | |
"means" (Means node) | "means" | "pairwiseStatistics" | |
"dataaudit" (Data Audit node) | "means" | "columnStatistics" | |
"statistics" (Statistics node) | "statistics" | "columnStatistics" | Only generated when specific fields are examined. |
"statistics" (Statistics node) | "statistics" | "pairwiseStatistics" | Only generated when fields are correlated. |
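Because a given output may carry only some of these containers, it can help to probe for both container IDs before using them. The sketch below does that. `available_models` and `FakeOutput` are hypothetical names for illustration; in a real script the argument would be an output object produced by running one of the nodes above. It assumes that `getContentModel` returns `None` for a container that was not generated, which is consistent with the `None` checks in the example script on this page.

```python
# Container IDs taken from the table above.
CONTAINER_IDS = ("columnStatistics", "pairwiseStatistics")


def available_models(output):
    """Return {container_id: content_model} for the containers present on an output."""
    found = {}
    for container_id in CONTAINER_IDS:
        model = output.getContentModel(container_id)
        # Assumption: a container that was not generated is reported as None.
        if model is not None:
            found[container_id] = model
    return found


class FakeOutput(object):
    """Hypothetical stand-in for a node output so the sketch runs outside Modeler."""

    def __init__(self, models):
        self._models = models  # {container_id: content_model}

    def getContentModel(self, container_id):
        return self._models.get(container_id)


# An output that only has column statistics, for example because no fields were correlated.
column_only = FakeOutput({"columnStatistics": object()})
present = available_models(column_only)
```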
Example script
from modeler.api import StatisticType
stream = modeler.script.stream()
# Set up the input data
varfile = stream.createAt("variablefile", "File", 96, 96)
varfile.setPropertyValue("full_filename", "$CLEO/DEMOS/DRUG1n")
# Now create the statistics node. This can produce both
# column statistics and pairwise statistics
statisticsnode = stream.createAt("statistics", "Stats", 192, 96)
statisticsnode.setPropertyValue("examine", ["Age", "Na", "K"])
statisticsnode.setPropertyValue("correlate", ["Age", "Na", "K"])
stream.link(varfile, statisticsnode)
results = []
statisticsnode.run(results)
statsoutput = results[0]
statscm = statsoutput.getContentModel("columnStatistics")
if statscm is not None:
    cols = statscm.getAvailableColumns()
    stats = statscm.getAvailableStatistics()
    print "Column stats:", cols[0], str(stats[0]), " = ", statscm.getStatistic(cols[0], stats[0])
statscm = statsoutput.getContentModel("pairwiseStatistics")
if statscm is not None:
    pcols = statscm.getAvailablePrimaryColumns()
    scols = statscm.getAvailableSecondaryColumns()
    stats = statscm.getAvailableStatistics()
    corr = statscm.getStatistic(pcols[0], scols[0], StatisticType.Pearson)
    print "Pairwise stats:", pcols[0], scols[0], " Pearson = ", corr