Creating SPSS Modeler flows

With SPSS Modeler flows in Cloud Pak for Data, you can quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Designed around the long-established SPSS Modeler client software and the industry-standard CRISP-DM model it uses, the flows interface supports the entire data mining process, from data to better business results.

Cloud Pak for Data offers a variety of modeling methods taken from machine learning, artificial intelligence, and statistics. The methods available on the node palette allow you to derive new information from your data and to develop predictive models. Each method has certain strengths and is best suited for particular types of problems.

Using the Flow Editor, you can prepare or shape data, train or deploy a model, or transform data and export it back to a database table or a file. To create an SPSS model, add the Modeler flow asset type to your project, then select SPSS as the flow type.

Service: This service is not available by default. An administrator must install this service on the IBM® Cloud Pak for Data platform. To determine whether the service is installed, open the Services catalog and check whether the service is enabled.

Data format
  • Relational: Tables in relational data sources
  • Tabular: Excel files (.xls, .xlsx), CSV files (.csv), or SPSS Statistics files (.sav). For Excel files, only the first sheet is read.
  • Textual: In the supported relational tables or files
Data size
How can I prepare data?
  • Use automatic data preparation functions
  • Write SQL statements to manipulate data
  • Cleanse, shape, sample, sort, and derive data
How can I analyze data?
  • Visualize data with many chart options
  • Identify the natural language of a text field
How can I build models?
  • Build predictive models
  • Choose from over 40 modeling algorithms, and many other nodes
  • Use automatic modeling functions
  • Model time series or geospatial data
  • Classify textual data
  • Identify relationships between the concepts in textual data
Getting started
To create an SPSS Modeler flow, click Add to project > Modeler flow. See the following information for more details.
Note: Cloud Pak for Data doesn't include SPSS functionality in Peru, Ecuador, Colombia, or Venezuela.

Getting started with creating a flow

  1. Open a project.
  2. If your data isn't already part of the project, go to the Assets tab, click Add to project, and add data to your project's data assets. These data file types are currently supported via the Data Asset import node: .csv, .txt, .json, .xls, .xlsx, and .sav.
  3. While still on the Assets tab, click Add to project again and add a Modeler flow.
  4. Type a name and description for your flow and click Create to create the new flow. Alternatively, use the From file tab to create a flow from an existing file that you saved locally, or use the From example tab to open one of the available example flows.
  5. Click the Palette icon to open the node palette, then drag nodes to the canvas as desired. You can also import an existing SPSS Modeler stream file (.str). Many nodes are available. For a detailed description of each node, see Nodes palette.
    Figure 1. Node palette
    Many node types are available on the palette
  6. Double-click a node to set its properties.
  7. To connect nodes, click the small empty circle icon on a node and drag it onto the node you want to connect to.
    Figure 2. Connecting nodes
    Shows two connected nodes
  8. Continue to add and connect nodes as desired to create your flow.
    Figure 3. Creating a flow
    Shows a flow with several nodes
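
Conceptually, the flow built in the steps above is a directed graph: each connection carries data from one node to the next. The following Python sketch illustrates that idea only. It is not how SPSS Modeler is implemented, and the Flow class, its methods, and the node names are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict


class Flow:
    """Toy model of a flow: named nodes joined by directed connections.

    Hypothetical illustration only; this is not an SPSS Modeler API.
    """

    def __init__(self):
        self.nodes = []                      # nodes in the order they were added
        self.downstream = defaultdict(list)  # node -> nodes it feeds
        self.upstream = defaultdict(list)    # node -> nodes that feed it

    def add_node(self, name):
        if name not in self.nodes:
            self.nodes.append(name)

    def connect(self, source, target):
        """Link source to target, like dragging from one node onto another."""
        self.add_node(source)
        self.add_node(target)
        self.downstream[source].append(target)
        self.upstream[target].append(source)

    def terminal_nodes(self):
        """Return the nodes that have no outgoing connection."""
        return [n for n in self.nodes if not self.downstream[n]]

    def run_order(self, terminal):
        """Return the nodes needed to produce `terminal`, upstream nodes first."""
        order = []
        seen = set()

        def visit(node):
            if node in seen:
                return
            seen.add(node)
            for parent in self.upstream[node]:
                visit(parent)
            order.append(node)

        visit(terminal)
        return order


# Illustrative node names: a data source feeding a Type node,
# which branches to a Table output and a modeling node.
flow = Flow()
flow.connect("Data Asset", "Type")
flow.connect("Type", "Table")
flow.connect("Type", "C5.0")

print(flow.terminal_nodes())    # ['Table', 'C5.0']
print(flow.run_order("Table"))  # ['Data Asset', 'Type', 'Table']
```

In this toy model, only the nodes upstream of the chosen terminal node appear in the run order, so the other branch is left out.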

Options for building a flow

  • You can import a stream (.str) that was created in SPSS Modeler Subscription or SPSS Modeler client. If the imported flow contains one or more import or export nodes, you'll be prompted to convert the nodes. See Importing an SPSS Modeler stream.
  • You can add data to your project to use as source or target nodes in a flow. See Add data to a project.
  • You can take a quick look at a portion of a flow's data by right-clicking a node and selecting Preview. For a more thorough examination, use a Charts node to launch the chart builder, where advanced visualizations let you explore your data from different perspectives and identify patterns, connections, and relationships.
  • You can group related nodes together into a supernode, which is represented by a star icon. Ctrl-click the desired nodes, right-click, and select Create supernode.
  • You can run any terminal node without running the entire flow. Right-click the node and select Run.
  • To view the results of an Outputs node (such as a Table node), run the node and then click the View outputs and versions icon. In the side panel, on the Outputs tab, click the object, such as a table, to open it.
  • To save a version of a flow, click the View outputs and versions icon. In the side panel, on the Versions tab, save the version.
  • After you run a flow that contains a modeling node, a new model nugget is generated. Model nuggets are gold in color. You can right-click the nugget and select View Model to explore the output.
  • You can download a flow to your local machine as an SPSS Modeler stream file (.str) by clicking the Download stream icon.
  • The Control Language for Expression Manipulation (CLEM) is a powerful language for analyzing and manipulating the data that streams through your flows. Data miners use CLEM extensively in flow operations, for tasks as simple as deriving profit from cost and revenue data or as complex as transforming web log data into a set of fields and records with usable information. You can enter CLEM functions as code in the Expression Builder for various nodes, such as Derive and Set To Flag.
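
As a rough illustration of the profit example, assume the data has fields named revenue and cost (hypothetical names). In a Derive node, the CLEM expression would then be along the lines of revenue - cost. The Python sketch below mirrors that per-record calculation outside of SPSS Modeler so that the effect is easy to see. It is not CLEM, and the derive_profit function is an assumption made for illustration.

```python
def derive_profit(records):
    """Add a `profit` field to each record.

    Mirrors what a Derive node with the expression `revenue - cost`
    would compute. The field names are assumptions for illustration.
    """
    derived = []
    for record in records:
        row = dict(record)  # copy so the input records stay unchanged
        row["profit"] = row["revenue"] - row["cost"]
        derived.append(row)
    return derived


# Made-up sample records, used only to demonstrate the calculation.
sample = [
    {"id": 1, "revenue": 120.0, "cost": 75.0},
    {"id": 2, "revenue": 80.0, "cost": 95.0},
]

for row in derive_profit(sample):
    print(row["id"], row["profit"])
# 1 45.0
# 2 -15.0
```

As in a flow, the derived field is added to each record while the original fields are passed along untouched.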