Setting options for the Variable File Node

You set the options on the File tab of the Variable File node dialog box.

File. Specify the name of the file. You can enter a file name or click the ellipsis button (...) to select a file. After you select a file, its path is shown and its contents are displayed, with delimiters, in the panel below it.

The sample text that is displayed from your data source can be copied and pasted into the following controls: EOL comment characters and user-specified delimiters. Use Ctrl-C and Ctrl-V to copy and paste.

Read field names from file. Selected by default, this option treats the first row in the data file as the column labels. If the first row is not a header, deselect this option to give each field a generic name automatically, such as Field1, Field2, and so on, one for each field in the dataset.
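The effect of this option can be illustrated outside the product. The following Python sketch is not how IBM SPSS Modeler reads files; it only mimics the described behavior for comma-delimited text, and the function name read_with_optional_header is hypothetical.

```python
import csv
import io


def read_with_optional_header(text, has_header=True):
    """Return (field_names, rows) for comma-delimited text.

    Hypothetical helper: mimics the described option, not the product code.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    if has_header:
        # First row supplies the column labels; the rest is data.
        return rows[0], rows[1:]
    # No header row: name the columns Field1, Field2, ... and keep every row.
    names = [f"Field{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


sample = "age,income\n34,52000\n41,61000\n"
print(read_with_optional_header(sample, has_header=True))
print(read_with_optional_header(sample, has_header=False))
```

With has_header=False the first line of the sample is kept as a data row, which is why the option should be deselected only when the file really has no header.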

Specify number of fields. Specify the number of fields in each record. The number of fields can be detected automatically as long as the records are new-line terminated. You can also set a number manually.

Skip header characters. Specify how many characters you want to ignore at the beginning of the first record.

EOL comment characters. Specify characters, such as # or !, to indicate annotations in the data. Wherever one of these characters appears in the data file, everything up to but not including the next new-line character will be ignored.
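As an illustration of the rule above, the following Python sketch removes everything from a comment character up to, but not including, the end of the line. It is an approximation written for this page, not the product's parser, and the function name strip_eol_comments is hypothetical.

```python
def strip_eol_comments(text, comment_chars="#!"):
    """Remove end-of-line comments while keeping the newline characters.

    Hypothetical helper: each character in comment_chars starts a comment
    that runs to the end of its line.
    """
    cleaned = []
    for line in text.split("\n"):
        cut = len(line)
        for ch in comment_chars:
            pos = line.find(ch)
            if pos != -1:
                cut = min(cut, pos)  # earliest comment character wins
        cleaned.append(line[:cut])
    return "\n".join(cleaned)


data = "1,2 # note\n3,4\n! whole line\n5,6"
print(repr(strip_eol_comments(data)))
```

Because the rule applies wherever the character appears, a # or ! inside a legitimate data value would also be cut, so choose comment characters that do not occur in the data.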

Strip lead and trail spaces. Select options for discarding leading and trailing spaces in strings on import.

Note: Where strings contain trailing spaces, string comparisons might give different results depending on whether SQL pushback is used.

Invalid characters. Select Discard to remove invalid characters from the data source. Select Replace with to replace invalid characters with the specified symbol (one character only). Invalid characters are null characters or any character that does not exist in the encoding method specified.
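The two choices can be sketched in Python as follows. This is only an approximation of the described behavior under stated assumptions: the function name clean_invalid and its parameters are hypothetical, and any byte that cannot be decoded in the chosen encoding is treated as one invalid character.

```python
def clean_invalid(raw, encoding="ascii", mode="discard", replacement="?"):
    """Decode bytes, discarding or replacing null and undecodable characters.

    Hypothetical helper, not the product implementation.
    """
    if mode not in ("discard", "replace"):
        raise ValueError("mode must be 'discard' or 'replace'")
    if mode == "replace" and len(replacement) != 1:
        raise ValueError("replacement must be exactly one character")
    # errors="replace" marks undecodable input with U+FFFD.
    # Caveat: a genuine U+FFFD already present in the data is also affected.
    text = raw.decode(encoding, errors="replace")
    substitute = "" if mode == "discard" else replacement
    return text.replace("\ufffd", substitute).replace("\x00", substitute)


raw = b"ab\xffc\x00d"
print(clean_invalid(raw, mode="discard"))
print(clean_invalid(raw, mode="replace", replacement="?"))
```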

Encoding. Specifies the text-encoding method used. You can choose between the system default, stream default, or UTF-8.

  • The system default is specified in the Windows Control Panel or, if running in distributed mode, on the server computer.
  • The stream default is specified in the Stream Properties dialog box.

Decimal symbol. Select the type of decimal separator that is used in your data source. The Stream default is the character that is selected on the Options tab of the Stream Properties dialog box. Otherwise, select either Period (.) or Comma (,) to read all data in this dialog box using the chosen character as the decimal separator.

Line delimiter is newline character. Select this option to use the newline character as the line delimiter instead of as a field delimiter. For example, this may be useful if there is an odd number of delimiters in a row that causes the line to wrap. Note that selecting this option means you cannot select Newline in the Delimiters list.

Note: If you select this option, any blank values at the end of data rows will be stripped out.

Delimiters. Using the check boxes listed for this control, you can specify which characters, such as the comma (,), define field boundaries in the file. You can also specify more than one delimiter, such as ", |" for records that use multiple delimiters. The default delimiter is the comma.

Note: If the comma is also defined as the Decimal symbol, the default settings here will not work. In cases where the comma is both the Field delimiter and the Decimal symbol, select Other in the Field delimiters list. Then manually specify a comma in the entry field.

Select Allow multiple blank delimiters to treat multiple adjacent blank delimiter characters as a single delimiter. For example, if one data value is followed by four spaces and then another data value, this group would be treated as two fields rather than five.
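The four-space example above can be reproduced with a short Python sketch. It assumes that space and tab are the blank delimiter characters; the function name split_fields is hypothetical and the code is not taken from the product.

```python
import re


def split_fields(line, allow_multiple_blank=False):
    """Split a line on blank delimiters (assumed here to be space and tab)."""
    if allow_multiple_blank:
        # A run of adjacent blanks counts as a single delimiter.
        return re.split(r"[ \t]+", line)
    # Every blank is its own delimiter, so adjacent blanks yield empty fields.
    return re.split(r"[ \t]", line)


line = "A    B"  # two values separated by four spaces
print(split_fields(line))                             # five fields
print(split_fields(line, allow_multiple_blank=True))  # two fields
```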

Lines to scan for column and type. Specify how many lines and columns to scan for specified data types.

Automatically recognize dates and times. To enable IBM® SPSS® Modeler to automatically attempt to recognize data entries as dates or times, select this check box. For example, an entry such as 07-11-1965 is identified as a date and 02:35:58 is identified as a time; however, ambiguous entries such as 07111965 or 023558 show up as integers because there are no delimiters between the numbers.
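The distinction between delimited and undelimited entries can be sketched with two regular expressions. This is a deliberately narrow illustration: the patterns cover only the formats in the example above, the function name guess_storage is hypothetical, and the actual recognition rules in IBM SPSS Modeler are not described here.

```python
import re

# Assumed patterns, covering only the example formats in the text.
DATE_RE = re.compile(r"\d{2}-\d{2}-\d{4}")
TIME_RE = re.compile(r"\d{2}:\d{2}:\d{2}")
INT_RE = re.compile(r"[+-]?\d+")


def guess_storage(value):
    """Classify a raw text value as date, time, integer, or string."""
    if DATE_RE.fullmatch(value):
        return "date"
    if TIME_RE.fullmatch(value):
        return "time"
    if INT_RE.fullmatch(value):
        # No delimiters between the digits, so the value stays an integer.
        return "integer"
    return "string"


for entry in ["07-11-1965", "02:35:58", "07111965", "023558"]:
    print(entry, guess_storage(entry))
```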

Note: To avoid potential data problems when you use data files from previous versions of IBM SPSS Modeler, this box is turned off by default for information that is saved in versions prior to 13.

Treat square brackets as lists. If you select this check box, the data included between opening and closing square brackets is treated as a single value, even if the content includes delimiter characters such as commas and double quotes. For example, this might include two- or three-dimensional geospatial data, where the coordinates contained within square brackets are processed as a single list item. For more information, see Importing geospatial data into the Variable File Node.
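One way to picture this option is a splitter that ignores delimiters while it is inside square brackets. The following Python sketch assumes a comma delimiter and well-formed brackets; the function name split_outside_brackets is hypothetical and the code is not the product's parser.

```python
def split_outside_brackets(line, delimiter=","):
    """Split on the delimiter, except where it appears inside [...]."""
    fields, current, depth = [], [], 0
    for ch in line:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        if ch == delimiter and depth == 0:
            # Delimiter outside any brackets: close the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# The coordinate pair stays together as one list-valued field.
print(split_outside_brackets("1,[-0.12,51.5],London"))
```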

Quotes. Using the drop-down lists, you can specify how single and double quotation marks are treated on import. You can choose Discard to remove all quotation marks, Include as text to keep them as part of the field value, or Pair and discard to match pairs of quotation marks and remove them. If a quotation mark is unmatched, you will receive an error message. Both Discard and Pair and discard store the field value (without quotation marks) as a string.

Note: When using Pair and discard, spaces are kept. When using Discard, trailing spaces inside and outside quotes are removed (for example, ' " ab c" , "d ef " , " gh i " ' will result in 'ab c, d ef, gh i'). When using Include as text, quotes are treated as normal characters, so leading and trailing spaces will be stripped naturally.

At any point while you are working in this dialog box, click Refresh to reload fields from the data source. This is useful when you are altering data connections to the source node or when you are working between tabs in the dialog box.