featureselectionnode properties

The Feature Selection node screens input fields for removal based on a set of criteria (such as the percentage of missing values); it then ranks the importance of the remaining inputs relative to a specified target. For example, given a data set with hundreds of potential inputs, which are most likely to be useful in modeling patient outcomes?
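To make the screening step concrete, the following is a minimal plain-Python sketch of the idea, not Modeler's actual implementation. The function name screen_fields, its parameters, and the sample data are hypothetical, and treating the thresholds as strict "greater than" limits is an assumption.

```python
from collections import Counter

def screen_fields(records, max_missing_pct=80.0, max_single_category_pct=95.0):
    """Return the names of fields that pass both screens (illustrative only)."""
    if not records:
        return []
    total = len(records)
    kept = []
    for field in records[0]:
        values = [row[field] for row in records]
        # Percentage of records in which this field is missing (None)
        missing_pct = 100.0 * sum(1 for v in values if v is None) / total
        # Percentage of records that fall into the most common category
        present = [v for v in values if v is not None]
        if present:
            top_pct = 100.0 * Counter(present).most_common(1)[0][1] / total
        else:
            top_pct = 0.0
        # Assumption: a field is screened when it exceeds either threshold
        if missing_pct > max_missing_pct or top_pct > max_single_category_pct:
            continue
        kept.append(field)
    return kept

# Hypothetical sample: "site" is constant and "notes" is 90% missing
records = [{"age": i, "site": "A", "notes": None} for i in range(10)]
records[0]["notes"] = "x"
kept = screen_fields(records)
print(kept)
```

Only the fields that survive screening would then be ranked against the target.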

Example

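# Get the current stream, then create the Feature Selection node
stream = modeler.script.stream()
node = stream.create("featureselection", "My node")
# Screen fields with too many records in a single category (threshold: 95)
node.setPropertyValue("screen_single_category", True)
node.setPropertyValue("max_single_category", 95)
# Screen fields with too many missing values (threshold: 80)
node.setPropertyValue("screen_missing_values", True)
node.setPropertyValue("max_missing_values", 80)
# Rank categorical predictors using the Likelihood measure
node.setPropertyValue("criteria", "Likelihood")
# Thresholds and label used to rank fields by importance
node.setPropertyValue("unimportant_below", 0.8)
node.setPropertyValue("important_above", 0.9)
node.setPropertyValue("important_label", "Check Me Out!")
# Select only the 15 top-ranked fields
node.setPropertyValue("selection_mode", "TopN")
node.setPropertyValue("top_n", 15)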
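The example above selects fields with TopN. As a hedged sketch of the ImportanceLevel alternative, the helper below sets selection_mode together with the three select_* flags listed in the property table. StubNode and select_by_importance_level are hypothetical names: the stub only records property values so that the sketch runs outside Modeler, and inside a Modeler script you would pass the node returned by stream.create instead.

```python
class StubNode(object):
    """Hypothetical stand-in for a Modeler node; stores properties in a dict."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

    def getPropertyValue(self, name):
        return self.properties[name]


def select_by_importance_level(node, important=True, marginal=False,
                               unimportant=False):
    """Switch the node to ImportanceLevel mode and choose which ranks to keep."""
    node.setPropertyValue("selection_mode", "ImportanceLevel")
    node.setPropertyValue("select_important", important)
    node.setPropertyValue("select_marginal", marginal)
    node.setPropertyValue("select_unimportant", unimportant)
    return node


# Keep important and marginal fields, drop unimportant ones
stub = select_by_importance_level(StubNode(), marginal=True)
print(stub.properties)
```

Grouping the four calls keeps the mode and its dependent flags consistent, since the select_* flags only apply when selection_mode is ImportanceLevel.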
Table 1. featureselectionnode properties
| featureselectionnode Properties | Values | Property description |
| --- | --- | --- |
| target | field | Feature Selection models rank predictors relative to the specified target. Weight and frequency fields are not used. See Common modeling node properties for more information. |
| screen_single_category | flag | If True, screens fields that have too many records falling into the same category relative to the total number of records. |
| max_single_category | number | Specifies the threshold used when screen_single_category is True. |
| screen_missing_values | flag | If True, screens fields with too many missing values, expressed as a percentage of the total number of records. |
| max_missing_values | number | |
| screen_num_categories | flag | If True, screens fields with too many categories relative to the total number of records. |
| max_num_categories | number | |
| screen_std_dev | flag | If True, screens fields with a standard deviation less than or equal to the specified minimum. |
| min_std_dev | number | |
| screen_coeff_of_var | flag | If True, screens fields with a coefficient of variation less than or equal to the specified minimum. |
| min_coeff_of_var | number | |
| criteria | Pearson, Likelihood, CramersV, Lambda | When ranking categorical predictors against a categorical target, specifies the measure on which the importance value is based. |
| unimportant_below | number | Specifies the threshold p values used to rank variables as important, marginal, or unimportant. Accepts values from 0.0 to 1.0. |
| important_above | number | Accepts values from 0.0 to 1.0. |
| unimportant_label | string | Specifies the label for the unimportant ranking. |
| marginal_label | string | |
| important_label | string | |
| selection_mode | ImportanceLevel, ImportanceValue, TopN | |
| select_important | flag | When selection_mode is set to ImportanceLevel, specifies whether to select important fields. |
| select_marginal | flag | When selection_mode is set to ImportanceLevel, specifies whether to select marginal fields. |
| select_unimportant | flag | When selection_mode is set to ImportanceLevel, specifies whether to select unimportant fields. |
| importance_value | number | When selection_mode is set to ImportanceValue, specifies the cutoff value to use. Accepts values from 0 to 100. |
| top_n | integer | When selection_mode is set to TopN, specifies the cutoff value to use. Accepts values from 0 to 1000. |
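Because several properties accept only listed values or documented ranges, it can help to check settings before applying them. The sketch below is a hypothetical helper (check_settings, ALLOWED, and RANGES are not part of the Modeler API) that encodes only the values and ranges stated in the table above; whether Modeler itself rejects out-of-range values in the same way is not covered here.

```python
# Enumerated values listed in the table
ALLOWED = {
    "criteria": {"Pearson", "Likelihood", "CramersV", "Lambda"},
    "selection_mode": {"ImportanceLevel", "ImportanceValue", "TopN"},
}

# Inclusive numeric ranges stated in the table
RANGES = {
    "unimportant_below": (0.0, 1.0),
    "important_above": (0.0, 1.0),
    "importance_value": (0, 100),
    "top_n": (0, 1000),
}


def check_settings(settings):
    """Return a list of problems found in a dict of property settings."""
    problems = []
    for name, value in settings.items():
        if name in ALLOWED:
            if value not in ALLOWED[name]:
                problems.append("%s: %r is not one of %s"
                                % (name, value, sorted(ALLOWED[name])))
        elif name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                problems.append("%s: %r is outside %s to %s"
                                % (name, value, low, high))
    return problems


# Settings taken from the example earlier on this page
good = check_settings({"criteria": "Likelihood", "unimportant_below": 0.8,
                       "important_above": 0.9, "selection_mode": "TopN",
                       "top_n": 15})
# Deliberately invalid settings to show the reported problems
bad = check_settings({"criteria": "Gini", "top_n": 5000})
print(good)
print(bad)
```

Properties that are not listed in ALLOWED or RANGES pass through unchecked, so the helper can be run on a full settings dictionary before calling setPropertyValue for each entry.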