Using the Values Dialog Box
Clicking the Values or Missing column of the Types tab displays a drop-down list of predefined values. Choosing the Specify... option on this list opens a separate dialog box where you can set options for reading, specifying, labeling, and handling values for the selected field.
Many of the controls are common to all types of data. These common controls are discussed here.
Measurement Displays the currently selected measurement level. You can change the setting to reflect the way that you intend to use data. For instance, if a field called day_of_week contains numbers that represent individual days, you might want to change this to nominal data in order to create a distribution node that examines each category individually.
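To illustrate why the measurement level matters for a field such as day_of_week, the following Python sketch contrasts a numeric summary with a per-category count. It is not Modeler code; the sample data and variable names are made up for the illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample: day_of_week stored as integer codes 1-7.
day_of_week = [1, 2, 2, 5, 7, 7, 7]

# Treated as continuous, the codes are summarized as one number,
# which says little about individual days.
continuous_summary = mean(day_of_week)

# Treated as nominal, each code is its own category, which is the
# kind of per-category view a distribution graph gives.
nominal_distribution = Counter(day_of_week)

for day, count in sorted(nominal_distribution.items()):
    print(day, count)
```

Changing the measurement level changes only how the values are interpreted; the stored values stay the same.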
Storage Displays the storage type if known. Storage types are unaffected by the measurement level that you choose. To alter the storage type, you can use the Data tab in Fixed File and Variable File source nodes, or a conversion function in a Filler node.
Model Field For fields generated as a result of scoring a model nugget, model field details can also be viewed. These include the name of the target field as well as the role of the field in modeling (whether a predicted value, probability, propensity, and so on).
Values Select a method to determine values for the selected field. Selections that you make here override any selections that you made earlier from the Values column of the Type node dialog box. Choices for reading values include the following:
- Read from data Select to read values when the node is executed. This option is the same as <Read>.
- Pass Select not to read data for the current field. This option is the same as <Pass>.
- Specify values and labels Options here are used to specify values and labels for the selected field. Used with value checking, this option lets you specify values that are based on your knowledge of the current field. This option activates unique controls for each type of field. Options for values and labels are covered individually in subsequent topics. Note: You cannot specify values or labels for a field whose measurement level is Typeless or <Default>.
- Extend values from data Select to append the current data with the values that you enter here. For example, if field_1 has a range of (0,10) and you enter a range of (8,16), the range is extended by adding the 16 without removing the original minimum. The new range would be (0,16). Choosing this option automatically sets the auto-typing option to <Read+>.
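The range extension described for Extend values from data can be sketched in Python as follows. This is an illustration of the arithmetic only, not Modeler's implementation; the function name extend_range is hypothetical.

```python
def extend_range(current, entered):
    """Return a range that covers both the current and the entered range."""
    cur_lo, cur_hi = current
    new_lo, new_hi = entered
    # Keep the smaller minimum and the larger maximum, so existing
    # bounds are never removed, only widened.
    return (min(cur_lo, new_lo), max(cur_hi, new_hi))

# The example from the text: (0,10) extended with (8,16).
extended = extend_range((0, 10), (8, 16))
print(extended)  # (0, 16)
```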
Max list length Only available for data with a measurement level of either Geospatial or Collection. Set the maximum length of the list by specifying the number of elements the list can contain.
Max string length Only available for typeless data; use this field when you are generating SQL to create a table. Enter the length of the largest string in your data; this generates a column in the table that is big enough for the string. If the string length value is not available, a default string size is used that may not be appropriate for the data (for example, if the value is too small, errors can occur when writing data to the table; too large a value could adversely affect performance).
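As a rough sketch of how a maximum string length can feed into generated SQL, the Python below sizes a column from the longest string when one is known and falls back to a default otherwise. The VARCHAR syntax, the fallback of 255, and the function name are assumptions for illustration; the actual SQL that Modeler generates and its default size depend on the database and are not stated here.

```python
DEFAULT_STRING_SIZE = 255  # assumed fallback, not Modeler's documented default

def string_column_sql(column, values=None, default_size=DEFAULT_STRING_SIZE):
    """Build a column definition sized to the longest known string."""
    if values:
        # Size the column to the longest string so no value is truncated.
        size = max(len(v) for v in values)
    else:
        # No length information: use the default, which may be too small
        # (write errors) or too large (wasted space, slower operations).
        size = default_size
    return f"{column} VARCHAR({size})"

ddl = string_column_sql("city", ["Oslo", "San Francisco", "Rome"])
print(ddl)
```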
Check values Select a method of coercing values to conform to the specified continuous, flag, or nominal values. This option corresponds to the Check column in the Type node dialog box, and settings made here override those in the dialog box. Used with the Specify values and labels option, value checking allows you to make values in the data conform to the expected values. For example, if you specify the values 1 and 0 and then use the Discard option, you can discard all records with values other than 1 or 0.
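The effect of the Discard example can be sketched in Python as a filter over records. This models only the behavior described above for the values 1 and 0; the record layout and the function name discard_unexpected are hypothetical, and other check methods are not shown.

```python
def discard_unexpected(records, field, allowed):
    """Keep only the records whose value for field is in the allowed set."""
    return [record for record in records if record[field] in allowed]

# Hypothetical records with a flag field expected to hold 1 or 0.
records = [
    {"id": 1, "flag": 1},
    {"id": 2, "flag": 0},
    {"id": 3, "flag": 2},
    {"id": 4, "flag": 9},
]

kept = discard_unexpected(records, "flag", {0, 1})
print([record["id"] for record in kept])  # [1, 2]
```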
Define blanks Select to activate the following controls that you use to declare missing values or blanks in your data.
- Missing values Use this table to define specific values (such as 99 or 0) as blanks. The value should be appropriate for the storage type of the field.
- Range Used to specify a range of missing values, for example, ages 1–17 or greater than 65. If a bound value is left blank, then the range is unbounded; for example, if a lower bound of 100 is specified with no upper bound, then all values greater than or equal to 100 are defined as missing. The bound values are inclusive; for example, a range with a lower bound of 5 and an upper bound of 10 includes 5 and 10 in the range definition. A missing value range can be defined for any storage type, including date/time and string (in which case the alphabetic sort order is used to determine whether a value is within the range).
- Null/White space You can also specify system nulls (displayed in the data as $null$) and white space (string values with no visible characters) as blanks. Note: The Type node also treats empty strings as white space for purposes of analysis, although they are stored differently internally and may be handled differently in certain cases. Note: To code blanks as undefined or $null$, use the Filler node.
Description Use this text box to specify a field label. These labels appear in various locations, such as in graphs, tables, output, and model browsers, depending on selections you make in the Stream Properties dialog box.
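The blank definitions under Define blanks (specific missing values, an inclusive range with optional bounds, and null or white space) can be sketched as a single Python predicate. This is a simplified model of the rules as described, not Modeler's implementation; the function name is_blank and its parameters are hypothetical, and Python's None stands in for a system null.

```python
def is_blank(value, missing_values=(), lower=None, upper=None,
             null_is_blank=False, white_space_is_blank=False):
    """Return True if value counts as a blank under the given definitions."""
    # System null, modeled here as None.
    if value is None:
        return null_is_blank
    # White space: strings with no visible characters, including "".
    if white_space_is_blank and isinstance(value, str) and value.strip() == "":
        return True
    # Specific values declared as missing, such as 99 or 0.
    if value in missing_values:
        return True
    # No range defined at all.
    if lower is None and upper is None:
        return False
    # Inclusive bounds; a bound left as None makes that side unbounded.
    # Bounds are assumed to match the storage type of the value.
    above_lower = lower is None or value >= lower
    below_upper = upper is None or value <= upper
    return above_lower and below_upper

print(is_blank(100, lower=100))          # True: lower bound is inclusive
print(is_blank(10, lower=5, upper=10))   # True: upper bound is inclusive
print(is_blank(11, lower=5, upper=10))   # False
```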