Setting Options for the User Input Node
The dialog box for a User Input node contains several tools you can use to enter values and define the data structure for synthetic data. For a generated node, the table on the Data tab contains field names from the original data source. For a node added from the Sources palette, the table is blank. Using the table options, you can perform the following tasks:
- Add new fields using the Add a New Field button to the right of the table.
- Rename existing fields.
- Specify data storage for each field.
- Specify values.
- Change the order of fields on the display.
Entering Data
For each field, you can specify values or insert values from the original dataset using the value picker button to the right of the table. See the rules described below for more information on specifying values. You can also choose to leave the field blank; fields left blank are filled with the system null ($null$).
To specify string values, simply type them in the Values column, separated by spaces:
Fred Ethel Martin
Strings that include spaces can be wrapped in double-quotes:
"Bill Smith" "Fred Martin" "Jack Jones"
For numeric fields, you can either enter multiple values in the same manner (listed with spaces between):
10 12 14 16 18 20
Or you can specify the same series of numbers by setting its limits (10, 20) and the steps in between (2). Using this method, you would type:
10,20,2
These two methods can be combined by embedding one within the other, such as:
1 5 7 10,20,2 21 23
This entry will produce the following values:
1 5 7 10 12 14 16 18 20 21 23
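The expansion described above can be sketched in Python. This is only an illustration of the rule as stated here, not the product's own parser: the function name `expand_numeric_values` is hypothetical, and the sketch assumes integer values, a positive step, and an upper limit that is included when the step lands on it (as in the 10,20,2 example).

```python
def expand_numeric_values(spec):
    """Expand a space-separated entry; a low,high,step token becomes a series."""
    values = []
    for token in spec.split():
        if "," in token:
            # Assumption: exactly three comma-separated integers (low, high, step).
            low, high, step = (int(part) for part in token.split(","))
            values.extend(range(low, high + 1, step))
        else:
            values.append(int(token))
    return values


print(expand_numeric_values("1 5 7 10,20,2 21 23"))
# [1, 5, 7, 10, 12, 14, 16, 18, 20, 21, 23]
```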
Date and time values can be entered using the current default format selected in the Stream Properties dialog box, for example:
11:04:00 11:05:00 11:06:00
2007-03-14 2007-03-15 2007-03-16
For timestamp values, which have both a date and time component, double-quotes must be used:
"2007-03-14 11:04:00" "2007-03-14 11:05:00" "2007-03-14 11:06:00"
For additional details see comments on data storage below.
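The effect of double quotes on how an entry is split can be approximated with Python's standard `shlex` module. This is a rough sketch: `shlex` follows POSIX shell quoting rules, which are assumed here to be close enough to the behavior described above but may differ in edge cases such as single quotes or backslashes. The helper name `split_values` is hypothetical.

```python
import shlex


def split_values(entry):
    """Split a Values entry on spaces, keeping double-quoted text together."""
    return shlex.split(entry)


print(split_values('Fred Ethel Martin'))
print(split_values('"Bill Smith" "Fred Martin" "Jack Jones"'))
print(split_values('"2007-03-14 11:04:00" "2007-03-14 11:05:00"'))
# Without quotes, a timestamp splits into a separate date and time.
print(split_values('2007-03-14 11:04:00'))
```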
Generate data. Enables you to specify how the records are generated when you run the stream.
- All combinations. Generates records containing every possible combination of the field values, so each field value will appear in several records. This can generate more data than you want, so you might follow this node with a Sample node.
- In order. Generates records in the order in which the data field values are specified. Each field value only appears in one record. The total number of records is equal to the largest number of values for a single field. Where fields have fewer than the largest number, undefined ($null$) values are inserted.
For example, the following entries generate the records listed in the two tables that follow, first with Generate data set to All combinations and then with it set to In order.
- Age. 30,60,10
- BP. LOW
- Cholesterol. NORMAL HIGH
- Drug. (left blank)
Table 1. Generate data field set to All combinations
Age | BP | Cholesterol | Drug |
---|---|---|---|
30 | LOW | NORMAL | $null$ |
30 | LOW | HIGH | $null$ |
40 | LOW | NORMAL | $null$ |
40 | LOW | HIGH | $null$ |
50 | LOW | NORMAL | $null$ |
50 | LOW | HIGH | $null$ |
60 | LOW | NORMAL | $null$ |
60 | LOW | HIGH | $null$ |
Table 2. Generate data field set to In order
Age | BP | Cholesterol | Drug |
---|---|---|---|
30 | LOW | NORMAL | $null$ |
40 | $null$ | HIGH | $null$ |
50 | $null$ | $null$ | $null$ |
60 | $null$ | $null$ | $null$ |
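The two Generate data modes can be mimicked in Python to check the tables above. This is a minimal sketch of the described behavior, not the product's implementation: the names `fields`, `NULL`, `all_combinations`, and `in_order` are hypothetical, and the Age entry 30,60,10 is written out as its expanded values.

```python
from itertools import product, zip_longest

NULL = "$null$"  # stand-in for the system null

# Field entries from the example, with Age 30,60,10 already expanded.
fields = {
    "Age": [30, 40, 50, 60],
    "BP": ["LOW"],
    "Cholesterol": ["NORMAL", "HIGH"],
    "Drug": [],  # left blank
}


def all_combinations(fields):
    """Every combination of field values; a blank field contributes only null."""
    columns = [values or [NULL] for values in fields.values()]
    return list(product(*columns))


def in_order(fields):
    """One record per position; shorter fields are padded with null."""
    return list(zip_longest(*fields.values(), fillvalue=NULL))


for row in all_combinations(fields):
    print(row)
print()
for row in in_order(fields):
    print(row)
```

With these entries, the first loop prints 8 records (4 x 1 x 2 x 1) and the second prints 4, matching the largest number of values for a single field.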
Data Storage
Storage describes the way data are stored in a field. For example, a field with values of 1 and 0 stores integer data. This is distinct from the measurement level, which describes the usage of the data, and does not affect storage. For example, you may want to set the measurement level for an integer field with values of 1 and 0 to Flag. This usually indicates that 1 = True and 0 = False. While storage must be determined at the source, measurement level can be changed using a Type node at any point in the stream. See the topic Measurement levels for more information.
Available storage types are:
- String. Used for fields that contain non-numeric data, also called alphanumeric data. A string can include any sequence of characters, such as fred, Class 2, or 1234. Note that numbers in strings cannot be used in calculations.
- Integer. A field whose values are integers.
- Real. Values are numbers that may include decimals (not limited to integers). The display format is specified in the Stream Properties dialog box and can be overridden for individual fields in a Type node (Format tab).
- Date. Date values specified in a standard format such as year, month, and day (for example, 2007-09-26). The specific format is specified in the Stream Properties dialog box.
- Time. Time measured as a duration. For example, a service call lasting 1 hour, 26 minutes, and 38 seconds might be represented as 01:26:38, depending on the current time format as specified in the Stream Properties dialog box.
- Timestamp. Values that include both a date and time component, for example 2007-09-26 09:04:00, again depending on the current date and time formats in the Stream Properties dialog box. Note that timestamp values may need to be wrapped in double-quotes to ensure they are interpreted as a single value rather than separate date and time values. (This applies, for example, when entering values in a User Input node.)
- List. Introduced in SPSS® Modeler version 17, along with new measurement levels of Geospatial and Collection, a List storage field contains multiple values for a single record. There are list versions of all of the other storage types.
Table 3. List storage type icons
Storage type |
---|
List of string |
List of integer |
List of real |
List of time |
List of date |
List of timestamp |
List with a depth greater than zero |

In addition, for use with the Collection measurement level, there are list versions of the following measurement levels.
Table 4. List measurement level icons
Measurement level |
---|
List of continuous |
List of categorical |
List of flags |
List of nominal |
List of ordinal |

Lists can be imported into SPSS Modeler in one of three source nodes (Analytic Server, Geospatial, or Variable File), or created within your streams through use of the Derive or Filler field operation nodes.
For more information on Lists and their interaction with the Collection and Geospatial measurement levels, see List storage and associated measurement levels.
Storage conversions. You can convert storage for a field using a variety of conversion functions, such as to_string and to_integer, in a Filler node. See the topic Storage Conversion Using the Filler Node for more information. Note that conversion functions (and any other functions that require a specific type of input such as a date or time value) depend on the current formats specified in the Stream Properties dialog box. For example, if you want to convert a string field with values Jan 2018, Feb 2018, and so forth to date storage, select MON YYYY as the default date format for the stream. Conversion functions are also available from the Derive node, for temporary conversion during a derive calculation. You can also use the Derive node to perform other manipulations, such as recoding string fields with categorical values. See the topic Recoding Values with the Derive Node for more information.
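The dependence of a string-to-date conversion on the active format can be illustrated with a Python analogy. This is not the Modeler conversion functions themselves: it uses Python's `datetime.strptime`, takes `%b %Y` as a rough counterpart of the MON YYYY format, and assumes an English locale for month abbreviations. The helper name `to_date` is hypothetical.

```python
from datetime import datetime


def to_date(text, fmt):
    """Return the date parsed with fmt, or None when the text does not match."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


print(to_date("Jan 2018", "%b %Y"))     # format matches the data
print(to_date("Jan 2018", "%Y-%m-%d"))  # format does not match the data
```

The same string converts successfully under one format and fails under another, so the format has to be set to match the data before converting.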
Reading in mixed data. Note that when reading in fields with numeric storage (integer, real, time, timestamp, or date), any non-numeric values are set to null or system missing. This is because, unlike some applications, IBM® SPSS Modeler does not allow mixed storage types within a field. To avoid this, read in any fields with mixed data as strings, either by changing the storage type in the source node or in the external application as necessary.
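The effect of reading mixed data into a numeric field can be sketched as follows. This is an illustration of the rule only, with Python's `None` standing in for the null or system-missing value; the function name `read_as_integer` and the sample values are hypothetical.

```python
def read_as_integer(raw_values):
    """Read raw text as integers; anything that is not an integer becomes None."""
    result = []
    for raw in raw_values:
        try:
            result.append(int(raw))
        except ValueError:
            result.append(None)  # stand-in for null / system missing
    return result


mixed = ["1", "0", "n/a", "7"]
print(read_as_integer(mixed))  # the non-numeric entry is lost
print(mixed)                   # kept as strings, every entry survives
```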
Note: Generated User Input nodes may already contain storage information garnered from the source node if instantiated. An uninstantiated node does not contain storage or usage type information.
Rules for Specifying Values
For symbolic fields, you should leave spaces between multiple values, such as:
HIGH MEDIUM LOW
For numeric fields, you can either enter multiple values in the same manner (listed with spaces between):
10 12 14 16 18 20
Or you can specify the same series of numbers by setting its limits (10, 20) and the steps in between (2). Using this method, you would type:
10,20,2
These two methods can be combined by embedding one within the other, such as:
1 5 7 10,20,2 21 23
This entry will produce the following values:
1 5 7 10 12 14 16 18 20 21 23