File source

The application supports several data file types. When you enter the filename or browse for the file, the application automatically detects the type and expands the Data Source Editor to display additional fields that relate to that type. If the file type is incorrect, or requires a specific version (for example, Microsoft Excel 1997-2003), you can change the type to the one required.

You can click Browse to select a file from the repository, or click Upload local file to select a file from your local file system (if enabled).

If you upload a local file, it will be uploaded to the IBM® SPSS® Modeler Server. In the Upload file dialog, browse to and select the local file you want to upload and then browse to and select the IBM SPSS Modeler Server upload destination. When you select the destination, you can change the file name if desired.

The Upload local file capability is disabled by default. Administrators can use the browser-based IBM SPSS Deployment Manager to enable or disable it, and to raise or lower the maximum file size allowed for uploads.

Text-based data files

When you select a text-based data source, you are prompted to enter further details.

First row has column names. Select this if the names of each column are included as a heading row in the data source.

Encoding. Specifies the text-encoding method used. You can choose between the system default or UTF-8.

Decimal symbol. Specifies how decimals should be represented in the data.

Stream default. The decimal separator defined by the current stream's default setting will be used. This will normally be the decimal separator defined by the computer's locale settings.
Period (.). The period character will be used as the decimal separator.
Comma (,). The comma character will be used as the decimal separator.
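
To illustrate why the decimal symbol matters, the following Python sketch parses the same text value under each setting. The helper name parse_decimal is hypothetical and is not part of the application; it only shows how the chosen symbol changes the way a value is read.

```python
def parse_decimal(text: str, decimal_symbol: str = ".") -> float:
    """Parse a numeric string that uses the given decimal symbol.

    Hypothetical helper for illustration only; not part of the application.
    """
    if decimal_symbol not in (".", ","):
        raise ValueError("decimal_symbol must be '.' or ','")
    # Normalize the chosen decimal symbol to the period that float() expects.
    normalized = text.strip().replace(decimal_symbol, ".")
    return float(normalized)


comma_value = parse_decimal("3,14", decimal_symbol=",")
period_value = parse_decimal("3.14", decimal_symbol=".")
```

In this sketch, reading "3,14" with the period setting raises an error because the comma is not recognized as part of the number, which is the kind of mismatch the setting is meant to avoid.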

Delimiters. Using the check boxes listed for this control, you can specify which characters, such as the comma (,), define field boundaries in the file. You can also specify more than one delimiter, such as ", |" for records that use multiple delimiters. The default delimiter is the comma.

Note: If the comma is also defined as the decimal separator, the default settings here will not work. In cases where the comma is both the field delimiter and the decimal separator, select Other in the Delimiters list. Then manually specify a comma in the entry field.

Select Allow multiple blank delimiters to treat multiple adjacent blank delimiter characters as a single delimiter. For example, if one data value is followed by four spaces and then another data value, this group would be treated as two fields rather than five.
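
The effect of this option can be sketched in Python as follows. The function split_fields is a hypothetical illustration that assumes the space is the only delimiter; it is not how the application is implemented.

```python
import re


def split_fields(record: str, allow_multiple_blanks: bool) -> list:
    """Split a record on spaces, optionally collapsing runs of spaces.

    Hypothetical helper for illustration only.
    """
    if allow_multiple_blanks:
        # A run of adjacent spaces counts as a single delimiter.
        return re.split(r" +", record)
    # Every space is its own delimiter, so adjacent spaces yield empty fields.
    return record.split(" ")


record = "a    b"  # one value, four spaces, another value
collapsed = split_fields(record, allow_multiple_blanks=True)
separate = split_fields(record, allow_multiple_blanks=False)
```

Here collapsed contains two fields, whereas separate contains five (three of them empty).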

Advanced options

EOL comment characters. Specify characters, such as # or !, to indicate annotations in the data. Wherever one of these characters appears in the data file, everything up to but not including the next new-line character will be ignored.
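
As a rough sketch of this behavior, the Python function below removes everything from the first comment character up to, but not including, the new-line character. The name strip_eol_comments is hypothetical, and the sketch assumes that a comment character is honored wherever it appears, including inside quoted values.

```python
def strip_eol_comments(line: str, comment_chars: str = "#!") -> str:
    """Drop an end-of-line comment while keeping the trailing new-line.

    Hypothetical helper for illustration only.
    """
    # Set the new-line aside so that it survives the truncation.
    if line.endswith("\n"):
        body, newline = line[:-1], "\n"
    else:
        body, newline = line, ""
    for position, character in enumerate(body):
        if character in comment_chars:
            return body[:position] + newline
    return line
```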

Specify input fields. Specify the number of input fields to be used from each record.

Specify data format. For File data source types that are of type Variable length fields in plain text, you can use this section to set the input storage type and the format of each field to ensure that values are read correctly. This is similar to the functionality available in IBM SPSS Modeler client. The Override option indicates whether the default is overridden. Selecting Override enables the Storage and Input Format controls. Deselecting Override will change the values back to their original defaults. Input Format applies only to the real, date, time, and timestamp storage types.

Skip header characters. Specify how many characters you want to ignore at the beginning of the first record.

Lines to scan for type. Specify how many lines to scan for specified data types.

Strip lead and trail spaces. Select to discard leading and trailing spaces in strings on import. You can choose to strip from the left, right, both sides, or none.

Invalid characters. Select Discard to remove invalid characters from the data source. Select Replace with to replace invalid characters with the specified symbol (one character only). Invalid characters are null characters or any character that does not exist in the encoding method specified.
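
The two choices can be approximated in Python as shown below. The function clean_invalid is a hypothetical sketch: it assumes that an invalid character is either a null character or a byte sequence that cannot be decoded in the chosen encoding, and it relies on Python's standard decoding with errors="replace" to mark undecodable bytes.

```python
from typing import Optional


def clean_invalid(raw: bytes, encoding: str = "utf-8",
                  replace_with: Optional[str] = None) -> str:
    """Discard invalid characters, or replace each with a single symbol.

    Hypothetical helper for illustration only. Pass replace_with=None to
    discard, or a one-character string to replace.
    """
    if replace_with is not None and len(replace_with) != 1:
        raise ValueError("replace_with must be exactly one character")
    # Undecodable bytes become U+FFFD so they can be handled like nulls.
    text = raw.decode(encoding, errors="replace")
    substitute = "" if replace_with is None else replace_with
    return "".join(
        substitute if character in ("\x00", "\ufffd") else character
        for character in text
    )


discarded = clean_invalid(b"a\x00b\xffc")
replaced = clean_invalid(b"a\x00b\xffc", replace_with="?")
```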

Quotes. Using the drop-down lists, you can specify how single and double quotation marks are treated on import. You can choose to Discard all quotation marks, Include as text by including them in the field value, or Pair and discard to match pairs of quotation marks and remove them. If a quotation mark is unmatched, you will receive an error message. Both Discard and Pair and discard store the field value (without quotation marks) as a string.

Note: When using Pair and discard, spaces are kept. When using Discard, trailing spaces inside and outside quotes are removed (for example, ' " ab c" , "d ef " , " gh i " ' will result in 'ab c, d ef, gh i'). When using Include as text, quotes are treated as normal characters, so leading and trailing spaces will be stripped naturally.
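
The following Python sketch approximates two of these options for double quotation marks. Both function names are hypothetical, and the code makes simplifying assumptions: it splits on the delimiter without regard to quoting, and for Discard it trims spaces on both sides of each value so that it reproduces the example given in the note.

```python
def discard_quotes(record: str, delimiter: str = ",", quote: str = '"') -> list:
    """Approximate Discard: remove all quotation marks and trim spaces.

    Hypothetical helper; assumes the delimiter never occurs inside quotes.
    """
    fields = []
    for raw_field in record.split(delimiter):
        # Drop every quotation mark, then trim the spaces around the value.
        fields.append(raw_field.replace(quote, "").strip())
    return fields


def pair_and_discard(field: str, quote: str = '"') -> str:
    """Approximate Pair and discard for one field: spaces are kept.

    Hypothetical helper; an odd number of quotation marks is an error.
    """
    if field.count(quote) % 2 != 0:
        raise ValueError("unmatched quotation mark")
    return field.replace(quote, "")


example = ' " ab c" , "d ef " , " gh i " '
discarded_fields = discard_quotes(example)
```

With the example record from the note, discarded_fields holds the values ab c, d ef, and gh i.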

Specify input fields. See the topic Selecting input fields for more information.

Excel data files

When you select an Excel data source, you are prompted to enter further details:

First row has column names. Select this if the names of each column are included as a heading row in the data source.

Named range. Enables you to select a named range of cells as defined in the Excel worksheet. If you use a named range, other worksheet and data range settings are no longer applicable and are disabled as a result.

Choose worksheet. Specifies the worksheet to import, either by name or by index.

By name. Select the name of the worksheet you want to import.
By index. Specify the index value for the worksheet you want to import, beginning with 0 for the first worksheet, 1 for the second worksheet, and so on.

Range on worksheet. You can import data beginning with the first non-blank row or with an explicit range of cells.

Range starts on first non-blank row. Locates the first non-blank cell and uses this as the upper left corner of the data range.
Explicit range of cells. Enables you to specify an explicit range by row and column. For example, to specify the Excel range A1:D5, you can enter A1 in the first field and D5 in the second (or alternatively, R1C1 and R5C4). All rows in the specified range are returned, including blank rows.
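
The relationship between the two notations in the example can be sketched in Python. The function a1_to_r1c1 is a hypothetical helper, not part of the application; it converts a single cell reference such as D5 to its row and column form.

```python
import re


def a1_to_r1c1(cell: str) -> str:
    """Convert an A1-style cell reference to R1C1 style.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", cell)
    if match is None:
        raise ValueError(f"not an A1-style reference: {cell!r}")
    letters, row = match.groups()
    column = 0
    for letter in letters.upper():
        # Column letters form a base-26 number in which A=1 and Z=26.
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return f"R{row}C{column}"


range_start = a1_to_r1c1("A1")
range_end = a1_to_r1c1("D5")
```

For the range A1:D5, range_start is R1C1 and range_end is R5C4, matching the alternative entries given above.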

On blank rows. If a blank row is encountered, you can choose whether to skip and ignore the row or choose Return blank rows to continue reading all data to the end of the worksheet, including blank rows.

Specify input fields. See the topic Selecting input fields for more information.

IBM SPSS Statistics data files

When you select an IBM SPSS Statistics data source (.sav or .zsav file), you are prompted to enter further details. If the file is password protected, you will also be prompted to enter the password.

Variable names. Select a method of handling variable names and labels upon import from an IBM SPSS Statistics .sav or .zsav file.

Read names and labels. Select to read in both variable names and labels; this is the default option. Labels may be displayed in charts, model browsers, and other types of output.
Read labels as names. Select to read in the descriptive variable labels from the IBM SPSS Statistics .sav or .zsav file rather than the short field names, and use these labels as variable names.

Values. Select a method of handling values and labels upon import from an IBM SPSS Statistics .sav or .zsav file.

Read data and labels. Select to read in both actual values and value labels; this is the default option.
Read labels as data. Select if you want to use the value labels from the .sav or .zsav file rather than the numerical or symbolic codes used to represent the values. For example, selecting this option for data with a gender field whose values of 1 and 2 actually represent male and female, respectively, will convert the field to a string and import male and female as the actual values.
It is important to consider missing values in your IBM SPSS Statistics data before selecting this option. For example, if a numeric field uses labels only for missing values (0 = No Answer, –99 = Unknown), then selecting the option above will import only the value labels No Answer and Unknown and will convert the field to a string. In such cases, you should import the values themselves.
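
The following Python sketch illustrates the idea behind Read labels as data, together with a simple check you could make before selecting it. Both function names and the sample data are hypothetical. The sketch does not model how the application treats values that have no label; read_labels_as_data raises an error in that case.

```python
def read_labels_as_data(values: list, value_labels: dict) -> list:
    """Replace each coded value with its value label (a string).

    Hypothetical helper; raises KeyError for a value without a label.
    """
    return [value_labels[value] for value in values]


def unlabeled_values(values: list, value_labels: dict) -> list:
    """List the distinct values that have no value label, in sorted order."""
    return sorted(set(values) - set(value_labels))


# Every code has a label, so labels can stand in for the data.
gender = read_labels_as_data([1, 2, 2], {1: "male", 2: "female"})

# Only the missing-value codes are labeled, so the real values have no label.
age_without_labels = unlabeled_values(
    [25, 40, 0, -99], {0: "No Answer", -99: "Unknown"}
)
```

A non-empty result from unlabeled_values, as for the second field here, suggests importing the values themselves instead.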

Specify input fields. See the topic Selecting input fields for more information.