Setting options for values

The Value mode column under the Type node settings displays a drop-down list of predefined values. Choosing the Specify option on this list and then clicking the gear icon opens a new screen where you can set options for reading, specifying, labeling, and handling values for the selected field.

Many of the controls are common to all types of data. These common controls are discussed here.

Measure. Displays the currently selected measurement level. You can change this setting to reflect how you intend to use the data. For instance, if a field called day_of_week contains numbers that represent individual days, you might want to change its measurement level to Nominal so that you can create a Distribution node that examines each category individually.

Role. Tells modeling nodes whether fields are Input (predictor fields) or Target (predicted fields) for a machine-learning process. Other roles are also available, such as Both, None, Partition, Split, Frequency, and Record ID.

Value mode. Select a mode to determine values for the selected field. Choices for reading values include the following:
  • Read. Select to read values when the node runs.
  • Pass. Select not to read data for the current field.
  • Specify. Select to specify values and labels for the selected field. Used together with value checking, this option lets you define values based on your knowledge of the current field. It activates controls that are specific to each type of field. You can't specify values or labels for a field whose measurement level is Typeless.
  • Extend. Select to add the values that you enter here to the current data values. For example, if field_1 has a range of (0,10) and you enter a range of (8,16), the range is extended to the new maximum of 16 while the original minimum of 0 is kept. The new range is (0,16).
  • Current. Select to keep the current data values.
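The Extend example can be illustrated with a short sketch. It assumes, as the field_1 example describes, that extending a continuous range keeps the smaller of the two minimums and the larger of the two maximums; the function name extend_range is hypothetical and is not part of the product.

```python
def extend_range(current, entered):
    """Hypothetical helper: widen a (min, max) range to cover both ranges."""
    cur_lo, cur_hi = current
    new_lo, new_hi = entered
    # Keep the original minimum unless the entered one is lower, and likewise for the maximum.
    return (min(cur_lo, new_lo), max(cur_hi, new_hi))


# field_1 currently ranges over (0, 10); the user enters (8, 16).
print(extend_range((0, 10), (8, 16)))  # (0, 16)
```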

Value Labels (Add/Edit Labels). In this section you can enter custom labels for each value of the selected field.

Max list length. Only available for data with a measurement level of either Geospatial or Collection. Set the maximum length of the list by specifying the number of elements the list can contain.

Max string length. Only available for Typeless data. Use this field when you're generating SQL to create a table. Enter the length of the longest string in your data; this generates a column in the table that's big enough for the string. If no string length value is available, a default string size is used that may not be appropriate for the data: if the value is too small, errors can occur when writing data to the table, and too large a value can adversely affect performance.
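If you need to work out the value to enter, one approach is to measure the longest string in a sample of the data. The sketch below does that in plain Python; the helpers and the VARCHAR column text are hypothetical illustrations, not what the product generates. Note that Python's len counts characters, whereas some databases size columns in bytes, so multibyte text may need a larger value.

```python
def longest_string_length(values):
    """Length (in characters) of the longest non-null string; 0 if there are none."""
    return max((len(v) for v in values if v is not None), default=0)


def varchar_column(name, values):
    # Hypothetical helper: builds a column definition sized to fit the longest value.
    return f"{name} VARCHAR({longest_string_length(values)})"


cities = ["Oslo", "Kuala Lumpur", None, "Lima"]
print(longest_string_length(cities))   # 12
print(varchar_column("city", cities))  # city VARCHAR(12)
```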

Check. Select a method of coercing values to conform to the specified continuous, flag, or nominal values. This option corresponds to the Check column in the main Type node settings, and a selection made here overrides the one in the main settings. Used with the options for specifying values and labels, value checking lets you make the values in the data conform to expected values. For example, if you specify values as 1 and 0 and then select the Discard option here, all records with values other than 1 or 0 are discarded.
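The Discard example amounts to filtering records against the set of specified values. The following sketch shows that logic on a list of dictionaries; the function name and record layout are hypothetical, and treating a null (None) value as unexpected is an assumption made here, not documented product behavior.

```python
def discard_unexpected(records, field, allowed):
    """Hypothetical helper: keep only records whose `field` value is in `allowed`."""
    # None is not in `allowed`, so records with a null value are also discarded (assumption).
    return [r for r in records if r.get(field) in allowed]


records = [
    {"id": 1, "flag": 1},
    {"id": 2, "flag": 0},
    {"id": 3, "flag": 7},
    {"id": 4, "flag": None},
]
kept = discard_unexpected(records, "flag", {1, 0})
print([r["id"] for r in kept])  # [1, 2]
```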

Define missing values. Select to activate the following controls you can use to declare missing values or blanks in your data.
  • Missing values. Use this field to define specific values (such as 99 or 0) as blanks. The value should be appropriate for the storage type of the field.
  • Range. Used to specify a range of missing values (such as ages 1–17 or greater than 65). If a bound is left blank, the range is unbounded in that direction. For example, if you specify a lower bound of 100 with no upper bound, then all values greater than or equal to 100 are defined as missing. The bound values are inclusive. For example, a range with a lower bound of 5 and an upper bound of 10 includes 5 and 10 in the range definition. You can define a missing value range for any storage type, including date/time and string (in which case the alphabetic sort order is used to determine whether a value is within the range).
  • Null/White space. You can also specify system nulls (displayed in the data as $null$) and white space (string values with no visible characters) as blanks. Note that the Type node also treats empty strings as white space for purposes of analysis, although they are stored differently internally and may be handled differently in certain cases.
Note: To code blanks as undefined or $null$, use the Filler node.
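The blank definitions above can be combined into a single check. This is a minimal sketch of the rules as described (specific missing values, inclusive range bounds where a blank bound is unbounded, and optional null and white-space handling); the function is_blank and its parameters are hypothetical. Python compares strings by code point, which only approximates an alphabetic sort order, and str.strip is used here as an approximation of "no visible characters".

```python
def is_blank(value, missing_values=(), lower=None, upper=None,
             null_is_blank=False, whitespace_is_blank=False):
    """Hypothetical helper: return True if `value` counts as blank."""
    if value is None:
        # System null: blank only if null handling is switched on.
        return null_is_blank
    if whitespace_is_blank and isinstance(value, str) and value.strip() == "":
        # Empty strings and strings of only white space.
        return True
    if value in missing_values:
        return True
    # A range applies only when at least one bound is set.
    if lower is None and upper is None:
        return False
    # Bounds are inclusive; a missing bound leaves that side unbounded.
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


print(is_blank(99, missing_values=(99, 0)))                         # True
print(is_blank(100, lower=100), is_blank(99, lower=100))            # True False
print(is_blank(10, lower=5, upper=10), is_blank(11, lower=5, upper=10))  # True False
print(is_blank("melon", lower="apple", upper="pear"))               # True
print(is_blank("   ", whitespace_is_blank=True))                    # True
```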