Overview

Source nodes enable you to import data stored in a number of formats, such as flat files, IBM® SPSS® Statistics (.sav), SAS, Microsoft Excel, and ODBC-compliant relational databases. You can also generate synthetic data using the User Input node.

The Sources palette contains the following nodes:

The Analytic Server source enables you to run a stream on Hadoop Distributed File System (HDFS). The information in an Analytic Server data source can come from a variety of places, such as text files and databases. See the topic Analytic Server Source node for more information.
The Database node can be used to import data from a variety of other packages using ODBC (Open Database Connectivity), including Microsoft SQL Server, Db2, Oracle, and others. See the topic Database source node for more information.
The Variable File node reads data from free-field text files—that is, files whose records contain a constant number of fields but a varied number of characters. This node is also useful for files with fixed-length header text and certain types of annotations. See the topic Variable File Node for more information.
The Fixed File node imports data from fixed-field text files—that is, files whose fields are not delimited but start at the same position and are of a fixed length. Machine-generated or legacy data are frequently stored in fixed-field format. See the topic Fixed File Node for more information.
The Statistics File node reads data from the .sav or .zsav file format used by IBM SPSS Statistics, as well as cache files saved in IBM SPSS Modeler, which use the same format.
The Data Collection node imports survey data from various formats used by market research software conforming to the Data Collection Data Model. A Data Collection Developer Library must be installed to use this node. See the topic Data Collection Node for more information.
The IBM Cognos source node imports data from Cognos Analytics databases.
The IBM Cognos TM1 source node imports data from Cognos TM1 databases.
The SAS File node imports SAS data into IBM SPSS Modeler. See the topic SAS Source Node for more information.
The Excel node imports data from Microsoft Excel in the .xlsx file format. An ODBC data source is not required. See the topic Excel Source node for more information.
The XML source node imports data in XML format into the stream. You can import a single file, or all files in a directory. You can optionally specify a schema file from which to read the XML structure.
The User Input node provides an easy way to create synthetic data—either from scratch or by altering existing data. This is useful, for example, when you want to create a test dataset for modeling. See the topic User Input Node for more information.
The Simulation Generate node provides an easy way to generate simulated data, either from scratch using user-specified statistical distributions or automatically using the distributions obtained from running a Simulation Fitting node on existing historical data. This is useful when you want to evaluate the outcome of a predictive model in the presence of uncertainty in the model inputs.
Use the Geospatial source node to bring map or spatial data into your data mining session. See the topic Geospatial Source Node for more information.
The JSON source node imports data from a JSON file. See the topic JSON Source node for more information.
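
The difference between the free-field files read by the Variable File node and the fixed-field files read by the Fixed File node can be illustrated outside the product. The following is a minimal sketch in plain Python, not IBM SPSS Modeler code: the sample records, the comma delimiter, the LAYOUT table, and both function names are assumptions invented for this illustration.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
# Free-field: same number of fields per record, varying character counts.
FREE_FIELD = "id,name,score\n1,Ann,9.5\n22,Roberto,10\n"
# Fixed-field: no delimiters; each field starts at a fixed column.
FIXED_FIELD = "001Ann      09.5\n022Roberto  10.0\n"

# Hypothetical fixed-field layout: (field name, start offset, length).
LAYOUT = [("id", 0, 3), ("name", 3, 9), ("score", 12, 4)]


def read_free_field(text, delimiter=","):
    """Split each record on a delimiter; field widths may vary."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # first record holds the field names
    return [dict(zip(header, row)) for row in reader]


def read_fixed_field(text, layout):
    """Slice each record at fixed offsets; padding is stripped."""
    records = []
    for line in text.splitlines():
        records.append(
            {name: line[start:start + length].strip()
             for name, start, length in layout}
        )
    return records


FREE_RECORDS = read_free_field(FREE_FIELD)
FIXED_RECORDS = read_fixed_field(FIXED_FIELD, LAYOUT)
```

In the free-field case the delimiter alone determines where a field ends, whereas in the fixed-field case the reader must be told each field's start position and length in advance.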

To begin a stream, add a source node to the stream canvas. Next, double-click the node to open its dialog box. The various tabs in the dialog box allow you to read in data; view the fields and values; and set a variety of options, including filters, data types, field role, and missing-value checking.
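
To make the options named above more concrete, the following sketch shows in plain Python what filtering fields, assigning data types, and checking for missing values amount to conceptually. It does not use or reproduce the IBM SPSS Modeler dialog boxes or scripting API; the sample records, the KEEP, TYPES, and MISSING settings, and the prepare function are all hypothetical.

```python
# Hypothetical records as they might arrive from a source, all values as text.
RAW = [
    {"id": "1", "age": "34", "notes": "ok"},
    {"id": "2", "age": "", "notes": "n/a"},
]

KEEP = ["id", "age"]             # filter: the "notes" field is dropped
TYPES = {"id": int, "age": int}  # data type to apply to each kept field
MISSING = {"", "NA"}             # strings treated as missing values


def prepare(records, keep, types, missing):
    """Filter fields, convert types, and map missing markers to None."""
    prepared = []
    for record in records:
        row = {}
        for name in keep:
            value = record[name]
            # Check for missing values before converting the type.
            row[name] = None if value in missing else types[name](value)
        prepared.append(row)
    return prepared


PREPARED = prepare(RAW, KEEP, TYPES, MISSING)
```

Checking for missing markers before type conversion avoids conversion errors on empty strings, which is one reason these settings are usually decided together when data is first read.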