Setting Field Storage and Formatting
Options on the Data tab for Fixed File, Variable File, XML Source, and User Input nodes allow you to specify the storage type for fields as they are imported or created in IBM® SPSS® Modeler. For Fixed File, Variable File, and User Input nodes, you can also specify the field formatting and other metadata.
For data read from other sources, storage is determined automatically but can be changed using a conversion function, such as to_integer, in a Filler node or Derive node.
Field. Use the Field column to view and select fields in the current dataset.
Override. Select the check box in the Override column to activate options in the Storage and Input Format columns.
Data Storage
Storage describes the way data are stored in a field. For example, a field with values of 1 and 0 stores integer data. This is distinct from the measurement level, which describes the usage of the data, and does not affect storage. For example, you may want to set the measurement level for an integer field with values of 1 and 0 to Flag. This usually indicates that 1 = True and 0 = False. While storage must be determined at the source, measurement level can be changed using a Type node at any point in the stream. See the topic Measurement levels for more information.
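As a rough illustration of the distinction between storage and measurement level, the following Python sketch is an analogy only and is not SPSS Modeler code. The `Field` class, its attributes, and the `churn` field are hypothetical names introduced here. It shows that relabelling a 1/0 field as a flag changes how the values are used while the stored values stay integers.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for a field: values are stored, measurement is a label."""
    name: str
    values: list
    measurement: str = "Continuous"

    @property
    def storage(self) -> str:
        # Storage follows from how the values are actually held.
        kinds = {type(v).__name__ for v in self.values}
        return kinds.pop() if len(kinds) == 1 else "mixed"


churn = Field("churn", [1, 0, 0, 1])
storage_before = churn.storage

# Changing the measurement level changes usage, not storage.
churn.measurement = "Flag"
as_flags = [v == 1 for v in churn.values]  # 1 = True, 0 = False

print(storage_before, churn.storage, churn.measurement, as_flags)
```

In this sketch the storage reported before and after the change is the same; only the `measurement` label and the interpretation of the values differ.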
Available storage types are:
- String. Used for fields that contain non-numeric data, also called alphanumeric data. A string can include any sequence of characters, such as fred, Class 2, or 1234. Note that numbers in strings cannot be used in calculations.
- Integer. A field whose values are integers.
- Real. Values are numbers that may include decimals (not limited to integers). The display format is specified in the Stream Properties dialog box and can be overridden for individual fields in a Type node (Format tab).
- Date. Date values specified in a standard format such as year, month, and day (for example, 2007-09-26). The specific format is specified in the Stream Properties dialog box.
- Time. Time measured as a duration. For example, a service call lasting 1 hour, 26 minutes, and 38 seconds might be represented as 01:26:38, depending on the current time format as specified in the Stream Properties dialog box.
- Timestamp. Values that include both a date and time component, for example 2007-09-26 09:04:00, again depending on the current date and time formats in the Stream Properties dialog box. Note that timestamp values may need to be wrapped in double quotes to ensure they are interpreted as a single value rather than separate date and time values. (This applies, for example, when entering values in a User Input node.)
- List. Introduced in SPSS Modeler version 17, along with new measurement levels of Geospatial and Collection, a List storage field contains multiple values for a single record. There are list versions of all of the other storage types.
Table 1. List storage type icons
Storage type:
- List of string
- List of integer
- List of real
- List of time
- List of date
- List of timestamp
- List with a depth greater than zero
In addition, for use with the Collection measurement level, there are list versions of the following measurement levels.
Table 2. List measurement level icons
Measurement level:
- List of continuous
- List of categorical
- List of flags
- List of nominal
- List of ordinal
Lists can be imported into SPSS Modeler in one of three source nodes (Analytic Server, Geospatial, or Variable File), or created within your streams through use of the Derive or Filler field operation nodes.
For more information on Lists and their interaction with the Collection and Geospatial measurement levels, see List storage and associated measurement levels.
Storage conversions. You can convert storage for a field using a variety of conversion functions, such as to_string and to_integer, in a Filler node. See the topic Storage Conversion Using the Filler Node for more information. Note that conversion functions (and any other functions that require a specific type of input such as a date or time value) depend on the current formats specified in the Stream Properties dialog box. For example, if you want to convert a string field with values Jan 2018, Feb 2018, and so forth to date storage, select MON YYYY as the default date format for the stream. Conversion functions are also available from the Derive node, for temporary conversion during a derive calculation. You can also use the Derive node to perform other manipulations, such as recoding string fields with categorical values. See the topic Recoding Values with the Derive Node for more information.
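The dependence of a string-to-date conversion on the current date format can be sketched in Python. This is an analogy under stated assumptions, not SPSS Modeler code: `parse_with_stream_format` is a hypothetical helper, and the `strptime` pattern `%b %Y` is used here to play the role of the MON YYYY stream format.

```python
from datetime import date, datetime


def parse_with_stream_format(text: str, stream_date_format: str) -> date:
    """Parse text as a date using a single stream-wide format (hypothetical helper)."""
    return datetime.strptime(text, stream_date_format).date()


values = ["Jan 2018", "Feb 2018"]

# "%b %Y" stands in for MON YYYY. Note that %b matches English month
# abbreviations only when the process runs under an English or C locale.
parsed = [parse_with_stream_format(v, "%b %Y") for v in values]
print(parsed)

# With a format that does not match the data, the same strings cannot be parsed.
try:
    parse_with_stream_format("Jan 2018", "%Y-%m-%d")
    mismatch_parsed = True
except ValueError:
    mismatch_parsed = False
print(mismatch_parsed)
```

Because the strings carry no day component, `strptime` fills in the first day of the month; how SPSS Modeler treats the missing day is not stated in this topic.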
Reading in mixed data. Note that when reading in fields with numeric storage (integer, real, time, timestamp, or date), any non-numeric values are set to null or system missing. This is because, unlike some applications, IBM SPSS Modeler does not allow mixed storage types within a field. To avoid this, read any fields with mixed data in as strings, either by changing the storage type in the source node or by changing it in the external application, as necessary.
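The effect of reading mixed data with numeric storage can be sketched as follows. This is a minimal Python analogy, not SPSS Modeler code: `read_as_integer` is a hypothetical helper, `None` stands in for null, and the sample column is invented for illustration.

```python
from typing import Optional


def read_as_integer(raw: str) -> Optional[int]:
    """Return an int, or None (standing in for null) if raw is not an integer."""
    try:
        return int(raw.strip())
    except ValueError:
        return None


raw_column = ["12", "7", "n/a", "30", "unknown"]

# Integer storage: non-numeric entries become null and their text is lost.
as_integer = [read_as_integer(v) for v in raw_column]

# String storage: every entry is kept exactly as it was read.
as_string = list(raw_column)

print(as_integer)
print(as_string)
```

Reading the column as strings preserves the non-numeric entries, so they can be inspected or recoded before any later conversion to numeric storage.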
Field Input Format (Fixed File, Variable File, and User Input nodes only)
For all storage types except String and Integer, you can specify formatting options for the selected field using the drop-down list. For example, when merging data from various locales, you may need to specify a period (.) as the decimal separator for one field, while another will require a comma separator.
Input options specified in the source node override the formatting options specified in the Stream Properties dialog box; however, they do not persist later in the stream. They are intended to parse input correctly based on your knowledge of the data. The specified formats are used as a guide for parsing the data as they are read into IBM SPSS Modeler, not to determine how they should be formatted after being read into IBM SPSS Modeler. To specify formatting on a per-field basis elsewhere in the stream, use the Format tab of a Type node. See the topic Field Format Settings Tab for more information.
Options vary depending on the storage type. For example, for the Real storage type, you can select Period (.) or Comma (,) as the decimal separator. For timestamp fields, a separate dialog box opens when you select Specify from the drop-down list. See the topic Setting Field Format options for more information.
For all storage types, you can also select Stream default to use the stream default settings for import. Stream settings are specified in the Stream Properties dialog box.
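A per-field input format with a stream-wide fallback can be sketched in Python. This is an illustration under stated assumptions rather than SPSS Modeler behavior: `parse_real`, the field names, and the `stream_default` value are hypothetical, and the sketch assumes the input contains no digit-grouping separators.

```python
def parse_real(text: str, decimal_separator: str = ".") -> float:
    """Parse a real number written with the given decimal separator."""
    if decimal_separator not in (".", ","):
        raise ValueError("decimal separator must be '.' or ','")
    if decimal_separator == ",":
        # Assumes no grouping separators, so the only comma is the decimal mark.
        text = text.replace(",", ".")
    return float(text)


stream_default = "."                      # hypothetical stream-wide setting
input_format = {"price_de": ","}          # hypothetical per-field override

record = {"price_us": "1234.50", "price_de": "1234,50"}

# Fields without an override fall back to the stream default.
parsed = {
    name: parse_real(value, input_format.get(name, stream_default))
    for name, value in record.items()
}
print(parsed)
```

Both fields parse to the same number here; the override affects only how the text is read, not how the value is displayed afterwards.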
Additional Options
Several other options can be specified using the Data tab:
- To view storage settings for data that are no longer connected through the current node (train data, for example), select View unused field settings. You can clear the legacy fields by clicking Clear.
- At any point while working in this dialog box, click Refresh to reload fields from the data source. This is useful when you are altering data connections to the source node or when you are working between tabs on the dialog box.