Creating SPSS Modeler flows
With SPSS Modeler flows in Watson Studio, you can quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Designed around the long-established SPSS Modeler client software and the industry-standard CRISP-DM model it uses, the flows interface supports the entire data mining process, from data to better business results.
Watson Studio offers a variety of modeling methods taken from machine learning, artificial intelligence, and statistics. The methods available on the node palette allow you to derive new information from your data and to develop predictive models. Each method has certain strengths and is best suited for particular types of problems.
Using the Flow Editor, you prepare or shape data, train or deploy a model, or transform data and export it back to a database table or a file. To create an SPSS model, add the Modeler flow asset type to your project, then select SPSS as the flow type.
An example project is installed with the product that includes example data and flows. See Example projects.
The following quick reference summarizes the supported data and the tasks you can perform:
- Data format
  - Relational: Tables in relational data sources
  - Tabular: Excel files (.xls, .xlsx), CSV files (.csv), or SPSS Statistics files (.sav). For Excel files, only the first sheet is read.
  - Textual: In the supported relational tables or files
- Data size
- How can I prepare data?
  - Use automatic data preparation functions
  - Write SQL statements to manipulate data
  - Cleanse, shape, sample, sort, and derive data
- How can I analyze data?
  - Visualize data with many chart options
  - Identify the natural language of a text field
- How can I build models?
  - Build predictive models
  - Choose from over 40 modeling algorithms, and many other nodes
  - Use automatic modeling functions
  - Model time series or geospatial data
  - Classify textual data
  - Identify relationships between the concepts in textual data
- Getting started
  - To create an SPSS Modeler flow, follow the steps in Getting started with creating a flow.
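As a quick pre-upload check, the file types this topic names for the Data Asset import node (.csv, .txt, .json, .xls, .xlsx, and .sav) can be tested by extension. The following is a minimal sketch in plain Python, not part of Watson Studio; the function names and sample file names are hypothetical.

```python
from pathlib import Path

# Extensions this topic lists for the Data Asset import node.
SUPPORTED_EXTENSIONS = {".csv", ".txt", ".json", ".xls", ".xlsx", ".sav"}


def is_supported_data_file(filename: str) -> bool:
    """Return True if the file's extension is in the documented list (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Split file names into (supported, unsupported) lists, preserving order."""
    supported = [name for name in filenames if is_supported_data_file(name)]
    unsupported = [name for name in filenames if not is_supported_data_file(name)]
    return supported, unsupported


# Hypothetical file names, used only for illustration.
supported, unsupported = split_by_support(
    ["sales.CSV", "survey.sav", "notes.docx", "budget.xlsx"]
)
```

The check looks only at the extension, so it says nothing about whether the file contents are valid. For Excel files, remember that only the first sheet is read.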
Getting started with creating a flow
- Open a project.
- If your data isn't already part of the project, go to the Assets tab, click Add to project, and add data to your project's data assets. These data file types are currently supported via the Data Asset import node: .csv, .txt, .json, .xls, .xlsx, and .sav.
- While still on the Assets tab, click Add to project again and add a Modeler flow.
- Type a name and description for your flow and click Create to create the new flow. Or you can use the From file tab to create a new flow based on an existing file you saved locally, or use the From example tab to open one of the available example flows.
- Click the Palette icon to open the node palette, then drag nodes to the canvas as desired. Or you can import an existing SPSS Modeler stream file (.str). Many nodes are available. For a detailed description of each node, see Nodes palette.
- Double-click a node to set its properties.
- To connect nodes, click the small empty circle icon on a node and drag it onto the node you want to connect to.
- Continue to add and connect nodes as desired to create your flow.
- Run your flow locally or, if you need more processing power, run it on the IBM Watson Machine Learning Server instead. You can also save and deploy your models to the server, if desired. See Saving and running models on Watson Machine Learning Server.
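Conceptually, the steps above build a directed graph: each node transforms the data it receives from the nodes connected to it, and running a terminal node needs only the nodes upstream of it. The following is a minimal sketch of that idea in plain Python. It is an illustration only, not the Flow Editor's implementation or API; the `Flow` class and the node names are hypothetical.

```python
class Flow:
    """Toy model of a flow: named nodes connected into a directed graph."""

    def __init__(self):
        self.funcs = {}      # node name -> function that produces the node's output
        self.upstream = {}   # node name -> names of nodes feeding into it
        self.run_log = []    # order in which nodes actually ran

    def add_node(self, name, func):
        self.funcs[name] = func
        self.upstream[name] = []

    def connect(self, source, target):
        """Connect source to target, like dragging from one node onto another."""
        self.upstream[target].append(source)

    def run(self, terminal):
        """Run one terminal node, evaluating only the nodes upstream of it."""
        cache = {}

        def evaluate(name):
            if name not in cache:
                inputs = [evaluate(parent) for parent in self.upstream[name]]
                self.run_log.append(name)
                cache[name] = self.funcs[name](*inputs)
            return cache[name]

        return evaluate(terminal)


flow = Flow()
# Hypothetical nodes: one import node, one filter, and two terminal nodes.
flow.add_node("import", lambda: [{"cost": 40, "revenue": 100},
                                 {"cost": 70, "revenue": 60}])
flow.add_node("filter", lambda rows: [r for r in rows if r["revenue"] > r["cost"]])
flow.add_node("table", lambda rows: rows)
flow.add_node("count", lambda rows: len(rows))
flow.connect("import", "filter")
flow.connect("filter", "table")
flow.connect("import", "count")

result = flow.run("table")  # the "count" branch is never evaluated
```

In this sketch, running `table` evaluates `import` and `filter` first and leaves the unrelated `count` branch untouched, which mirrors running a single terminal node rather than the entire flow.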
Options for building a flow
- You can import a stream (.str) that was created in SPSS Modeler Subscription or SPSS Modeler client. If the imported flow contains one or more import or export nodes, you'll be prompted to convert the nodes. See Importing an SPSS Modeler stream.
- You can add data to your project to use as source or target nodes in a flow. See Add data to a project.
- You can take a quick look at a portion of a flow's data by right-clicking a node and selecting Preview. To examine your data more thoroughly, use a Charts node to launch the chart builder, where advanced visualizations let you explore the data from different perspectives and identify patterns, connections, and relationships.
- You can group related nodes together into a supernode, which is represented by a star icon. Ctrl-click the desired nodes, right-click, and select Create supernode.
- You can run any terminal node without running the entire flow. Right-click the node and select Run.
- To view the results of an Outputs node (such as a Table node), run the node and then click the View outputs and versions icon. In the side panel, on the Outputs tab, click the object, such as a table, to open it.
- To save a version of a flow, click the View outputs and versions icon. In the side panel, on the Versions tab, save the version.
- After you run a flow that contains a modeling node, a new model nugget is generated. Model nuggets are gold in color. You can right-click the nugget and select View Model to explore the output.
- You can download a flow to your local machine as an SPSS Modeler stream file (.str) by clicking the Download stream icon.
- In some cases, a task may fail to complete and you'll be prompted to continue running or restart your session. You can also force your session to restart at any time by clicking the Restart session button in the right-hand Information panel.
- The Control Language for Expression Manipulation (CLEM) is a powerful language for analyzing and manipulating the data streams through your flows. Data miners use CLEM extensively in flow operations to perform tasks as simple as deriving profit from cost and revenue data or as complex as transforming web log data into a set of fields and records with usable information. You can enter CLEM functions as code in the Expression Builder for various nodes, such as Derive and Set To Flag.
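To make the "deriving profit from cost and revenue" example concrete: in a Derive node, the CLEM expression would likely be as simple as `revenue - cost`, assuming the data contains fields with those names. The sketch below shows the same record-by-record logic in plain Python, together with a simplified take on what a Set To Flag step does (one true/false field per category value). It is not CLEM and not the product's API; the field names, function names, and sample records are hypothetical.

```python
def derive_profit(record):
    """Return a copy of the record with a derived 'profit' field (revenue minus cost)."""
    derived = dict(record)
    derived["profit"] = record["revenue"] - record["cost"]
    return derived


def set_to_flag(record, field, values):
    """Return a copy of the record with one boolean flag field per listed value."""
    flagged = dict(record)
    for value in values:
        flagged[f"{field}_{value}"] = record[field] == value
    return flagged


# Hypothetical input records, used only for illustration.
records = [
    {"region": "north", "cost": 40, "revenue": 100},
    {"region": "south", "cost": 70, "revenue": 60},
]

derived = [
    set_to_flag(derive_profit(r), "region", ["north", "south"])
    for r in records
]
```

Each function returns a new record instead of changing its input, which loosely matches how a node passes transformed data downstream while the source data stays unchanged.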