XML Source Node

Note: This feature is available in SPSS® Modeler Professional and SPSS Modeler Premium.

Use the XML source node to import data from a file in XML format into an IBM® SPSS Modeler stream. XML is a standard language for data exchange (see http://www.w3.org/standards/xml/), and for many organizations it is the format of choice for this purpose. For example, a government tax agency might want to analyze data from tax returns that were submitted online in XML format.

Importing XML data into an IBM SPSS Modeler stream enables you to perform a wide range of predictive analytics functions on the source. The XML data is parsed into a tabular format in which the columns correspond to the different levels of nesting of the XML elements and attributes. The XML items are displayed in XPath format (see http://www.w3.org/TR/xpath20/).
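The mapping from nested XML to a table can be illustrated outside SPSS Modeler. The following Python sketch is not the product's parser; it only shows the general idea of turning each record into a row whose column names are XPath-like paths. The sample document, the flatten function, and the column naming are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; not taken from the product documentation.
SAMPLE = """<returns>
  <return id="1"><name>Ann</name><tax>120</tax></return>
  <return id="2"><name>Bob</name><tax>95</tax></return>
</returns>"""


def flatten(elem, prefix):
    """Map an element's attributes and leaf descendants to XPath-like column names.

    Simplification: if a child tag repeats inside one record, the last
    occurrence overwrites the earlier ones.
    """
    row = {}
    path = f"{prefix}/{elem.tag}"
    for attr, value in elem.attrib.items():
        row[f"{path}/@{attr}"] = value
    children = list(elem)
    if not children:
        # Leaf element: its text becomes the cell value.
        row[path] = (elem.text or "").strip()
    for child in children:
        row.update(flatten(child, path))
    return row


root = ET.fromstring(SAMPLE)
# Treat each child of the root as one record (one table row).
rows = [flatten(record, f"/{root.tag}") for record in root]
for row in rows:
    print(row)
```

Each row here has the columns /returns/return/@id, /returns/return/name, and /returns/return/tax, one per attribute or leaf element.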

Read a single file By default, SPSS Modeler reads a single file, which you specify in the XML data source field.

Read all XML files in a directory Choose this option if you want to read all the XML files in a particular directory. Specify the location in the Directory field that appears. Select the Include subdirectories check box to additionally read XML files from all the subdirectories of the specified directory.
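To preview which files a directory read is likely to pick up, a short script such as the following can help. It is a minimal sketch that assumes files are identified by a lowercase .xml extension; the matching rule SPSS Modeler actually applies is not stated here, and the list_xml_files function is a hypothetical helper.

```python
from pathlib import Path


def list_xml_files(directory, include_subdirectories=False):
    """Return the .xml files in a directory, optionally descending into subdirectories."""
    base = Path(directory)
    # rglob walks the whole tree; glob looks at the top level only.
    found = base.rglob("*.xml") if include_subdirectories else base.glob("*.xml")
    return sorted(p for p in found if p.is_file())


if __name__ == "__main__":
    for path in list_xml_files(".", include_subdirectories=True):
        print(path)
```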

XML data source Type the full path and file name of the XML source file you want to import, or use the Browse button to find the file.

XML schema (Optional) Specify the full path and file name of an XSD or DTD file from which to read the XML structure, or use the Browse button to find this file. If you leave this field blank, the structure is read from the XML source file. An XSD or DTD file can have more than one root element. In this case, when you change the focus to a different field, a dialog is displayed where you choose the root element you want to use. See the topic Selecting from Multiple Root Elements for more information.
Note: XSD indicators are ignored by SPSS Modeler.
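In an XSD file, each element declared directly under the schema element is a possible root element, which is why a schema can offer more than one choice. The following Python sketch lists those candidates for a small, hypothetical schema; it illustrates the concept and is not how SPSS Modeler itself reads the file.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema elements.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema with two top-level element declarations.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="invoice" type="xs:string"/>
  <xs:element name="receipt">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="line" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(XSD)
# findall with a bare tag matches direct children only, so the nested
# "line" element is not reported as a root candidate.
candidates = [e.get("name") for e in schema.findall(f"{XS}element")]
print(candidates)  # ['invoice', 'receipt']
```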

XML structure A hierarchical tree showing the structure of the XML source file (or the schema, if you specified one in the XML schema field). To define a record boundary, select an element and click the right-arrow button to copy the item to the Records field.

Display attributes Displays or hides the attributes of the XML elements in the XML structure field.

Records (XPath expression) Shows the XPath syntax for an element copied from the XML structure field. This element is then highlighted in the XML structure, and defines the record boundary. Each time this element is encountered in the source file, a new record is created. If this field is empty, the first child element under the root is used as the record boundary.
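The effect of the record boundary can be seen in a small Python sketch. It assumes a simplified model in which the boundary is a path relative to the root, and in which an empty boundary falls back to the first child element under the root, as described above. The sample document and the records function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a header element followed by two data elements.
SAMPLE = """<returns>
  <header><year>2023</year></header>
  <return><name>Ann</name></return>
  <return><name>Bob</name></return>
</returns>"""


def records(root, record_path=None):
    """Return one element per record for the given boundary path."""
    if record_path is None:
        children = list(root)
        if not children:
            return []
        # Empty boundary: use the first child element under the root.
        record_path = children[0].tag
    return root.findall(record_path)


root = ET.fromstring(SAMPLE)
print(len(records(root)))            # 1 (boundary defaults to "header")
print(len(records(root, "return")))  # 2 (one record per "return" element)
```

In this sample the default boundary produces a single record for the header, so setting the boundary explicitly is what yields one record per return.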

Read all data By default, all data in the source file is read into the stream.

Specify data to read Choose this option if you want to import individual elements, attributes, or both. Choosing this option enables the Fields table, where you can specify the data you want to import.

Fields This table lists the elements and attributes selected for import, if you have selected the Specify data to read option. You can either type the XPath syntax of an element or attribute directly into the XPath column, or select an element or attribute in the XML structure and click the right-arrow button to copy the item into the table. To copy all the child elements and attributes of an element, select the element in the XML structure and click the double-arrow button.

  • XPath The XPath syntax of the items to be imported.
  • Location The location in the XML structure of the items to be imported. Fixed path shows the path of the item relative to the element highlighted in the XML structure (or the first child element under the root, if no element is highlighted). Any location denotes an item of the given name at any location in the XML structure. Custom is displayed if you type a location directly into the XPath column.
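The difference between a fixed path and any location can be sketched with Python's ElementTree path syntax, which supports a subset of XPath. This is an analogy only: the sample document is hypothetical, and the search is run from the record element, which may not match the exact scope SPSS Modeler uses for Any location.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in which the name "id" appears at two levels.
SAMPLE = """<orders>
  <order>
    <id>7</id>
    <customer><id>42</id></customer>
  </order>
</orders>"""

root = ET.fromstring(SAMPLE)
record = root.find("order")  # the record boundary element

# Fixed path: one specific location relative to the record element.
fixed = [e.text for e in record.findall("customer/id")]

# Any location: every element with the given name below the record.
anywhere = [e.text for e in record.findall(".//id")]

print(fixed)     # ['42']
print(anywhere)  # ['7', '42']
```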