|
The Binning node automatically creates new nominal (set) fields based on the values of one or
more existing continuous (numeric range) fields. For example, you can transform a continuous income
field into a new categorical field containing groups of income as deviations from the mean. Once you
have created bins for the new field, you can generate a Derive node based on the cut points.
|
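To make the idea of grouping income "as deviations from the mean" concrete, here is a minimal plain-Python sketch that runs outside Modeler. It is only an illustration, not the node's implementation. The function name `sdev_bands`, the use of the sample standard deviation, and the band numbering are all assumptions; the exact boundaries and labels the node produces are not specified here.

```python
import math

def sdev_bands(values, count=1):
    """Return each value's whole-standard-deviation band around the mean.

    Band 0 covers [mean, mean + 1 sd), band -1 covers [mean - 1 sd, mean),
    and so on. Bands are clamped to the range -(count + 1) .. count, which
    loosely mirrors choosing one, two, or three standard deviations.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / float(n)
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / float(n - 1))
    if sd == 0:
        return [0] * n  # all values identical: everything sits at the mean
    bands = []
    for v in values:
        band = int(math.floor((v - mean) / sd))
        bands.append(max(-(count + 1), min(count, band)))
    return bands

incomes = [20, 30, 40, 50, 110]
print(sdev_bands(incomes))  # prints [-1, -1, -1, 0, 1]
```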
Example
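# Runs inside SPSS Modeler: get the current stream, then create the node
stream = modeler.script.stream()
node = stream.create("binning", "My node")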
node.setPropertyValue("fields", ["Na", "K"])
node.setPropertyValue("method", "Rank")
node.setPropertyValue("fixed_width_name_extension", "_binned")
node.setPropertyValue("fixed_width_add_as", "Suffix")
node.setPropertyValue("fixed_bin_method", "Count")
node.setPropertyValue("fixed_bin_count", 10)
node.setPropertyValue("fixed_bin_width", 3.5)
node.setPropertyValue("tile10", True)
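The same pattern can be wrapped in a small helper when several nodes need the same settings. The sketch below configures equal-count (tile) binning using only property names and values listed in Table 1. The helper name `configure_equal_count_binning` is hypothetical, and it is an assumption that `tied_values_method` belongs with the equal-count options.

```python
def configure_equal_count_binning(node, fields, extension="_TILE"):
    """Apply equal-count (tile) settings to an existing binning node.

    `node` is any object exposing setPropertyValue(name, value), such as
    the node returned by stream.create("binning", ...) in a Modeler script.
    """
    node.setPropertyValue("fields", fields)
    node.setPropertyValue("method", "EqualCount")
    node.setPropertyValue("equal_count_name_extension", extension)
    node.setPropertyValue("equal_count_add_as", "Suffix")
    node.setPropertyValue("equal_count_method", "RecordCount")
    # Assumed to control tie handling for tiles.
    node.setPropertyValue("tied_values_method", "Next")
    # Request quartile and decile bins.
    node.setPropertyValue("tile4", True)
    node.setPropertyValue("tile10", True)
    return node
```

In a Modeler script this might be called as `configure_equal_count_binning(stream.create("binning", "Tiles"), ["Na", "K"])`.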
Table 1. binningnode properties
binningnode properties |
Data type |
Property description |
fields
|
[field1 field2 ... fieldn]
|
Continuous (numeric range) fields pending transformation. You can bin multiple fields
simultaneously. |
method
|
FixedWidth
EqualCount
Rank
SDev
Optimal
|
Method used for determining cut points for new field bins (categories). |
rcalculate_bins
|
Always
IfNecessary
|
Specifies whether the bins are recalculated and the data reassigned to the relevant bin every
time the node is executed (Always), or whether data is added only to existing bins and any new
bins that have been added (IfNecessary). |
fixed_width_name_extension
|
string
|
The default extension is _BIN. |
fixed_width_add_as
|
Suffix
Prefix
|
Specifies whether the extension is added to the end (suffix) of the field name or to the
start (prefix). With the default settings, a field named income produces a new field named income_BIN. |
fixed_bin_method
|
Width
Count
|
|
fixed_bin_count
|
integer
|
Specifies an integer used to determine the number of fixed-width bins (categories) for the
new field(s). |
fixed_bin_width
|
real
|
Value (integer or real) for calculating width of the bin. |
equal_count_name_extension
|
string
|
The default extension is _TILE. |
equal_count_add_as
|
Suffix
Prefix
|
Specifies whether the extension is added as a suffix or a prefix to the field names generated
by using standard p-tiles. The default extension is _TILE plus N, where N is the tile
number. |
tile4
|
flag
|
Generates four quartile bins, each containing 25% of cases. |
tile5
|
flag
|
Generates five quintile bins. |
tile10
|
flag
|
Generates 10 decile bins. |
tile20
|
flag
|
Generates 20 vingtile bins. |
tile100
|
flag
|
Generates 100 percentile bins. |
use_custom_tile
|
flag
|
|
custom_tile_name_extension
|
string
|
The default extension is _TILEN, where N is the tile number. |
custom_tile_add_as
|
Suffix
Prefix
|
|
custom_tile
|
integer
|
|
equal_count_method
|
RecordCount
ValueSum
|
The RecordCount method seeks to assign an equal number of records to each
bin, while ValueSum assigns records so that the sum of the values in each bin is
equal. |
tied_values_method
|
Next
Current
Random
|
Specifies which bin receives tied values. |
rank_order
|
Ascending
Descending
|
Specifies the rank order: Ascending (the lowest value is marked 1) or
Descending (the highest value is marked 1). |
rank_add_as
|
Suffix
Prefix
|
This option applies to rank, fractional rank, and percentage rank. |
rank
|
flag
|
|
rank_name_extension
|
string
|
The default extension is _RANK. |
rank_fractional
|
flag
|
Ranks cases where the value of the new field equals rank divided by the sum of the weights of
the nonmissing cases. Fractional ranks fall in the range of 0–1. |
rank_fractional_name_extension
|
string
|
The default extension is _F_RANK. |
rank_pct
|
flag
|
Each rank is divided by the number of records with valid values and multiplied by 100.
Percentage fractional ranks fall in the range of 1–100. |
rank_pct_name_extension
|
string
|
The default extension is _P_RANK. |
sdev_name_extension
|
string
|
|
sdev_add_as
|
Suffix
Prefix
|
|
sdev_count
|
One
Two
Three
|
|
optimal_name_extension
|
string
|
The default extension is _OPTIMAL. |
optimal_add_as
|
Suffix
Prefix
|
|
optimal_supervisor_field
|
field
|
Field chosen as the supervisory field to which the fields selected for binning are
related. |
optimal_merge_bins
|
flag
|
Specifies that any bins with small case counts will be added to a larger, neighboring
bin. |
optimal_small_bin_threshold
|
integer
|
|
optimal_pre_bin
|
flag
|
Indicates that the dataset is to be prebinned. |
optimal_max_bins
|
integer
|
Specifies an upper limit to avoid creating an inordinately large number of bins. |
optimal_lower_end_point
|
Inclusive
Exclusive
|
|
optimal_first_bin
|
Unbounded
Bounded
|
|
optimal_last_bin
|
Unbounded
Bounded
|
|
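The descriptions of rank, rank_fractional, and rank_pct in the table can be illustrated with a few lines of plain Python that run outside Modeler. This is a sketch under stated assumptions, not the node's implementation: every case has a weight of 1 (so the sum of weights equals the number of nonmissing cases), missing values are represented by `None`, and tied values share the lowest rank among them. The function name `rank_values` is hypothetical.

```python
def rank_values(values, ascending=True):
    """Return (rank, fractional_rank, percentage_rank) for each value.

    Missing values (None) yield (None, None, None) and are excluded
    from the count of valid cases.
    """
    valid = [v for v in values if v is not None]
    n = len(valid)
    ordered = sorted(valid, reverse=not ascending)
    # Map each distinct value to the first position at which it appears,
    # so tied values share one rank (an assumption about tie handling).
    rank_of = {}
    for position, v in enumerate(ordered, start=1):
        rank_of.setdefault(v, position)
    rows = []
    for v in values:
        if v is None:
            rows.append((None, None, None))
        else:
            r = rank_of[v]
            # Fractional rank: rank / number of valid cases (unit weights).
            # Percentage rank: the same ratio multiplied by 100.
            rows.append((r, r / float(n), 100.0 * r / n))
    return rows

for row in rank_values([10, 30, 20, None]):
    print(row)
```

With `ascending=True` the lowest value receives rank 1, matching the Ascending setting of rank_order; passing `ascending=False` gives the highest value rank 1.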