Modeling Node Fields Options
All modeling nodes have a Fields tab, where you can specify the fields to be used in building the model.
Before you can build a model, you need to specify which fields you want to use as targets and as inputs. With a few exceptions, all modeling nodes will use field information from an upstream Type node. If you are using a Type node to select input and target fields, you don't need to change anything on this tab. (Exceptions include the Sequence node and the Text Extraction node, which require that field settings be specified in the modeling node.)
Use type node settings. This option tells the node to use field information from an upstream Type node. This is the default.
Use custom settings. This option tells the node to use field information specified here instead of that given in any upstream Type node(s). After selecting this option, specify the fields below as required.
Note: Not all fields are displayed for all nodes.
- Use transactional format (Apriori, CARMA, MS Association Rules and Oracle Apriori nodes only). Select this check box if the source data is in transactional format. Records in this format have two fields, one for an ID and one for content. Each record represents a single transaction or item, and associated items are linked by having the same ID. Deselect this box if the data is in tabular format, in which items are represented by separate flags, where each flag field represents the presence or absence of a specific item and each record represents a complete set of associated items. See the topic Tabular versus Transactional Data for more information.
- ID. For transactional data, select an ID field from the list. Numeric or symbolic fields can be used as the ID field. Each unique value of this field should indicate a specific unit of analysis. For example, in a market basket application, each ID might represent a single customer. For a Web log analysis application, each ID might represent a computer (by IP address) or a user (by login data).
- IDs are contiguous. (Apriori and CARMA nodes only) If your data are presorted so that all records with the same ID are grouped together in the data stream, select this option to speed up processing. If your data are not presorted (or you are not sure), leave this option unselected and the node will sort the data automatically. Note: If your data are not sorted and you select this option, you may get invalid results in your model.
- Content. Specify the content field(s) for the model. These fields contain the items of interest in association modeling. You can specify multiple flag fields (if data are in tabular format) or a single nominal field (if data are in transactional format).
- Target. For models that require one or more target fields, select the target field or fields. This is similar to setting the field role to Target in a Type node.
- Evaluation. (For Auto Cluster models only.) No target is specified for cluster models; however, you can select an evaluation field to identify its level of importance. In addition, you can evaluate how well the clusters differentiate values of this field, which in turn indicates whether the clusters can be used to predict this field. Note: The evaluation field must be a string with more than one value.
- Inputs. Select the input field or fields. This is similar to setting the field role to Input in a Type node.
- Partition. This option allows you to specify a field used to partition the data into separate samples for the training, testing, and validation stages of model building. By using one sample to generate the model and a different sample to test it, you can get a good indication of how well the model will generalize to larger datasets that are similar to the current data. If multiple partition fields have been defined by using Type or Partition nodes, a single partition field must be selected on the Fields tab in each modeling node that uses partitioning. (If only one partition field is present, it is automatically used whenever partitioning is enabled.) Also note that to apply the selected partition in your analysis, partitioning must also be enabled on the Model Options tab for the node. (Deselecting that option makes it possible to disable partitioning without changing field settings.)
- Splits. For split models, select the split field or fields. This is similar to setting the field role to Split in a Type node. You can designate only fields with a measurement level of Flag, Nominal, Ordinal or Continuous as split fields. Fields chosen as split fields cannot be used as target, input, partition, frequency or weight fields. See the topic Building Split Models for more information.
- Use frequency field. This option enables you to select a field as a frequency weight. Use this if the records in your training data represent more than one unit each--for example, if you are using aggregated data. The field values should be the number of units represented by each record. See the topic Using Frequency and Weight Fields for more information.
Note: If you see the error message Metadata (on input/output fields) not valid, ensure that you have specified all fields that are required, such as the frequency field.
- Use weight field. This option enables you to select a field as a case weight. Case weights are used to account for differences in variance across levels of the output field. See the topic Using Frequency and Weight Fields for more information.
- Consequents. For rule induction nodes (Apriori), select the fields to be used as consequents in the resulting rule set. (This corresponds to fields with role Target or Both in a Type node.)
- Antecedents. For rule induction nodes (Apriori), select the fields to be used as antecedents in the resulting rule set. (This corresponds to fields with role Input or Both in a Type node.)
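The difference between transactional and tabular data, and what "IDs are contiguous" means in practice, can be sketched in a few lines of plain Python. This is an illustration only and does not use any SPSS Modeler API. The function names `ids_are_contiguous` and `transactional_to_tabular` and the sample market-basket records are hypothetical, chosen for this example.

```python
from collections import defaultdict

def ids_are_contiguous(records):
    """Return True if all records sharing an ID appear in one unbroken run."""
    seen = set()
    previous = object()  # sentinel that equals no real ID
    for record_id, _item in records:
        if record_id != previous:
            if record_id in seen:
                return False  # this ID appeared earlier and was interrupted
            seen.add(record_id)
            previous = record_id
    return True

def transactional_to_tabular(records):
    """Turn (ID, item) rows into one row of True/False flags per ID."""
    baskets = defaultdict(set)
    for record_id, item in records:
        baskets[record_id].add(item)
    items = sorted({item for _id, item in records})
    return {
        record_id: {item: item in basket for item in items}
        for record_id, basket in baskets.items()
    }

# Hypothetical transactional data: one record per purchased item.
transactions = [
    (1, "bread"), (1, "milk"),
    (2, "milk"),
    (3, "bread"), (3, "eggs"),
]

tabular = transactional_to_tabular(transactions)
for record_id, flags in tabular.items():
    print(record_id, flags)
```

In the tabular result, each ID becomes a single record with one flag per item, which corresponds to specifying multiple flag fields as content. In the transactional input, the content is a single field holding the item name. A check along the lines of `ids_are_contiguous` is one way to confirm that data is grouped by ID before selecting the IDs are contiguous option.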
Some models have a Fields tab that differs from those described in this section.
- See the topic Sequence Node Fields Options for more information.
- See the topic CARMA Node Fields Options for more information.
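To make the frequency field option described earlier more concrete, the following plain Python sketch shows how treating a field as a frequency weight changes a simple statistic when each record stands for several units. It is not how any modeling node is implemented internally. The function `weighted_mean`, the field names `spend` and `n_customers`, and the sample values are assumptions made up for this illustration.

```python
def weighted_mean(records, value_key, freq_key=None):
    """Mean of value_key, counting each record freq_key times (once if no key is given)."""
    total = 0.0
    count = 0
    for record in records:
        freq = record[freq_key] if freq_key else 1
        total += record[value_key] * freq
        count += freq
    return total / count

# Hypothetical aggregated data: each record summarizes several customers.
aggregated = [
    {"spend": 10.0, "n_customers": 3},
    {"spend": 40.0, "n_customers": 1},
]

unweighted = weighted_mean(aggregated, "spend")
weighted = weighted_mean(aggregated, "spend", "n_customers")
print(unweighted, weighted)
```

Without the frequency field, both records count equally and the mean is 25.0. With `n_customers` used as the frequency, the first record counts three times and the mean drops to 17.5, which reflects the four underlying units rather than the two aggregated rows.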