Analytic Server Source node

The Analytic Server Source node enables you to run a stream on the Hadoop Distributed File System (HDFS). The information in an Analytic Server data source can come from a variety of places, including:

  • Text files on HDFS
  • Databases
  • HCatalog

Typically, a stream with an Analytic Server Source node is executed on HDFS. However, if the stream contains a node that is not supported for execution on HDFS, as much of the stream as possible is "pushed back" to Analytic Server, and SPSS® Modeler Server then attempts to process the remainder of the stream. In that case you will need to subsample very large datasets, for example by placing a Sample node within the stream.
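The pushback behavior described above can be pictured with a small sketch. This is an illustration only, not how SPSS Modeler decides what to push back: it assumes a stream can be treated as a simple ordered list of nodes and that pushback stops at the first unsupported node. The function name, the node names, and the set of supported nodes are all hypothetical.

```python
def split_for_pushback(nodes, supported):
    """Split an ordered list of node names into two parts.

    Assumption (not confirmed by the product documentation): everything
    before the first unsupported node is pushed back to Analytic Server,
    and everything from that node onward is left for SPSS Modeler Server.
    """
    for i, node in enumerate(nodes):
        if node not in supported:
            return nodes[:i], nodes[i:]
    # Every node is supported, so nothing is left for SPSS Modeler Server.
    return list(nodes), []


# Hypothetical node names, used only for illustration.
stream = ["as_source", "filter", "custom_node", "table"]
supported_on_hdfs = {"as_source", "filter", "table"}

pushed_back, remainder = split_for_pushback(stream, supported_on_hdfs)
print("Pushed back to Analytic Server:", pushed_back)
print("Processed by SPSS Modeler Server:", remainder)
```

In this picture, a Sample node would be placed so that the remainder handled by SPSS Modeler Server only receives a reduced dataset.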

If you want to use your own Analytic Server connection instead of the default connection defined by your administrator, deselect Use default Analytic Server and select your connection.

Data source. Provided that you or your SPSS Modeler Server administrator has established a connection, select the data source that contains the data you want to use. A data source contains the files and metadata associated with that source. Click Select to display a list of available data sources. See the topic Selecting a data source for more information.

If you need to create a new data source or edit an existing one, click Launch Data Source Editor....

Using multiple Analytic Server connections can be useful in controlling the flow of data. For example, when using the Analytic Server Source and Export nodes, you may want to use a different Analytic Server connection in each branch of a stream, so that each branch runs on its own Analytic Server and no data is pulled to the IBM® SPSS Modeler Server. However, if a single branch contains more than one Analytic Server connection, the data is pulled from the Analytic Servers to the IBM SPSS Modeler Server.
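The rule about branches and connections can be sketched as a simple check. This is a minimal illustration, assuming each branch can be summarized as a list of the connection names used by its Analytic Server nodes. The function name and the connection names are hypothetical and are not part of any SPSS Modeler API.

```python
def pulls_data_to_modeler_server(branch_connections):
    """Return True if a branch uses more than one distinct Analytic Server
    connection, which is the case where data is pulled to SPSS Modeler Server.
    """
    return len(set(branch_connections)) > 1


# Hypothetical branches: each list holds the connection used by each
# Analytic Server node (Source or Export) in that branch.
branches = {
    "branch_1": ["as_conn_a", "as_conn_a"],  # one connection throughout
    "branch_2": ["as_conn_b", "as_conn_b"],  # a different single connection
    "branch_3": ["as_conn_a", "as_conn_b"],  # mixes two connections
}

for name, connections in branches.items():
    if pulls_data_to_modeler_server(connections):
        print(name, "pulls data to SPSS Modeler Server")
    else:
        print(name, "runs on its own Analytic Server")
```

Here only the branch that mixes two connections would cause data to be pulled to SPSS Modeler Server.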