Stream, Session, and SuperNode Parameters

Parameters provide a useful way of passing values at runtime, rather than hard coding them directly in a script. Parameters and their values are defined in the same as way for streams, that is, as entries in the parameters table of a stream or SuperNode, or as parameters on the command line. The Stream and SuperNode classes implement a set of functions defined by the ParameterProvider object as shown in the following table. Session provides a getParameters() call which returns an object that defines those functions.

Table 1. Functions defined by the ParameterProvider object
Method	Return type	Description
`p.parameterIterator()`	Iterator	Returns an iterator of parameter names for this object.
`p.getParameterDefinition( parameterName)`	ParameterDefinition	Returns the parameter definition for the parameter with the specified name, or `None` if no such parameter exists in this provider. The result may be a snapshot of the definition at the time the method was called and need not reflect any subsequent modifications made to the parameter through this provider.
`p.getParameterLabel(parameterName)`	string	Returns the label of the named parameter, or `None` if no such parameter exists.
`p.setParameterLabel(parameterName, label)`	Not applicable	Sets the label of the named parameter.
`p.getParameterStorage( parameterName)`	ParameterStorage	Returns the storage of the named parameter, or `None` if no such parameter exists.
`p.setParameterStorage( parameterName, storage)`	Not applicable	Sets the storage of the named parameter.
`p.getParameterType(parameterName)`	ParameterType	Returns the type of the named parameter, or `None` if no such parameter exists.
`p.setParameterType(parameterName, type)`	Not applicable	Sets the type of the named parameter.
`p.getParameterValue(parameterName)`	Object	Returns the value of the named parameter, or `None` if no such parameter exists.
`p.setParameterValue(parameterName, value)`	Not applicable	Sets the value of the named parameter.

In the following example, the script aggregates some Telco data to find which region has the lowest average income data. A stream parameter is then set with this region. That stream parameter is then used in a Select node to exclude that region from the data, before a churn model is built on the remainder.

The example is artificial because the script generates the Select node itself and could therefore have generated the correct value directly into the Select node expression. However, streams are typically pre-built, so setting parameters in this way provides a useful example.

The first part of the example script creates the stream parameter that will contain the region with the lowest average income. The script also creates the nodes in the aggregation branch and the model building branch, and connects them together.

import modeler.api

stream = modeler.script.stream()

# Initialize a stream parameter
stream.setParameterStorage("LowestRegion", modeler.api.ParameterStorage.INTEGER)

# First create the aggregation branch to compute the average income per region
statisticsimportnode = stream.createAt("statisticsimport", "SPSS File", 114, 142)
statisticsimportnode.setPropertyValue("full_filename", "$CLEO_DEMOS/telco.sav")
statisticsimportnode.setPropertyValue("use_field_format_for_storage", True)

aggregatenode = modeler.script.stream().createAt("aggregate", "Aggregate", 294, 142)
aggregatenode.setPropertyValue("keys", ["region"])
aggregatenode.setKeyedPropertyValue("aggregates", "income", ["Mean"])

tablenode = modeler.script.stream().createAt("table", "Table", 462, 142)

stream.link(statisticsimportnode, aggregatenode)
stream.link(aggregatenode, tablenode)

selectnode = stream.createAt("select", "Select", 210, 232)
selectnode.setPropertyValue("mode", "Discard")
# Reference the stream parameter in the selection
selectnode.setPropertyValue("condition", "'region' = '$P-LowestRegion'")

typenode = stream.createAt("type", "Type", 366, 232)
typenode.setKeyedPropertyValue("direction", "churn", "Target")

c50node = stream.createAt("c50", "C5.0", 534, 232)

stream.link(statisticsimportnode, selectnode)
stream.link(selectnode, typenode)
stream.link(typenode, c50node)

The example script creates the following stream.

Figure 1. Stream that results from the example script

The following part of the example script executes the Table node at the end of the aggregation branch.

# First execute the table node
results = []
tablenode.run(results)

The following part of the example script accesses the table output that was generated by the execution of the Table node. The script then iterates through rows in the table, looking for the region with the lowest average income.

# Running the table node should produce a single table as output
table = results[0]

# table output contains a RowSet so we can access values as rows and columns
rowset = table.getRowSet()
min_income = 1000000.0
min_region = None

# From the way the aggregate node is defined, the first column
# contains the region and the second contains the average income
row = 0
rowcount = rowset.getRowCount()
while row < rowcount:
    if rowset.getValueAt(row, 1) < min_income:
        min_income = rowset.getValueAt(row, 1)
        min_region = rowset.getValueAt(row, 0)
    row += 1

The following part of the script uses the region with the lowest average income to set the "LowestRegion" stream parameter that was created earlier. The script then runs the model builder with the specified region excluded from the training data.

# Check that a value was assigned
if min_region != None:
    stream.setParameterValue("LowestRegion", min_region)
else:
    stream.setParameterValue("LowestRegion", -1)

# Finally run the model builder with the selection criteria
c50node.run([])

The complete example script is shown below.

import modeler.api

stream = modeler.script.stream()

# Create a stream parameter
stream.setParameterStorage("LowestRegion", modeler.api.ParameterStorage.INTEGER)

# First create the aggregation branch to compute the average income per region
statisticsimportnode = stream.createAt("statisticsimport", "SPSS File", 114, 142)
statisticsimportnode.setPropertyValue("full_filename", "$CLEO_DEMOS/telco.sav")
statisticsimportnode.setPropertyValue("use_field_format_for_storage", True)

aggregatenode = modeler.script.stream().createAt("aggregate", "Aggregate", 294, 142)
aggregatenode.setPropertyValue("keys", ["region"])
aggregatenode.setKeyedPropertyValue("aggregates", "income", ["Mean"])

tablenode = modeler.script.stream().createAt("table", "Table", 462, 142)

stream.link(statisticsimportnode, aggregatenode)
stream.link(aggregatenode, tablenode)

selectnode = stream.createAt("select", "Select", 210, 232)
selectnode.setPropertyValue("mode", "Discard")
# Reference the stream parameter in the selection
selectnode.setPropertyValue("condition", "'region' = '$P-LowestRegion'")

typenode = stream.createAt("type", "Type", 366, 232)
typenode.setKeyedPropertyValue("direction", "churn", "Target")

c50node = stream.createAt("c50", "C5.0", 534, 232)

stream.link(statisticsimportnode, selectnode)
stream.link(selectnode, typenode)
stream.link(typenode, c50node)


# First execute the table node
results = []
tablenode.run(results)

# Running the table node should produce a single table as output
table = results[0]

# table output contains a RowSet so we can access values as rows and columns
rowset = table.getRowSet()
min_income = 1000000.0
min_region = None

# From the way the aggregate node is defined, the first column
# contains the region and the second contains the average income
row = 0
rowcount = rowset.getRowCount()
while row < rowcount:
    if rowset.getValueAt(row, 1) < min_income:
        min_income = rowset.getValueAt(row, 1)
        min_region = rowset.getValueAt(row, 0)
    row += 1

# Check that a value was assigned
if min_region != None:
    stream.setParameterValue("LowestRegion", min_region)
else:
    stream.setParameterValue("LowestRegion", -1)

# Finally run the model builder with the selection criteria
c50node.run([])