Cluster modeling example
With clustering models, you can categorize records into groups with similar characteristics. This can help find natural groups in your data. For example, you might segment customers based on demographic characteristics or purchasing behavior.
This example uses the data file bank_customer_data.txt, which is distributed and installed with the product. A completed version of this example is also provided in the file bank_cluster_model.str. If necessary, contact your administrator for details on installing the sample files. See the topic Sample files for more information.
- Return to the application Launch page and create a new IBM® SPSS® Modeler Advantage project.
- On the Data tab, in the Project Data Sources panel, add a new data source called bank customer data as described in Setting up data sources.
- On the Modeling tab, click Change Model, select Clustering model, and click Save.
- For the data source, select bank customer data.
- Expand the Optional Settings section. For clustering models, additional Auto Cluster Options settings are available for selecting an evaluation field or setting a desired range for the number of clusters to find.
When a clustering model is built, several different clustering models are actually created behind the scenes. These models are compared and ranked by a measure of quality, and the best one is selected for use by the auto cluster model. The two Auto Cluster Options settings give you some control over which model is selected. If you specify an evaluation field, the model that best differentiates the values of that field is selected. If you set a desired range, any model that finds a number of clusters outside that range is discarded.
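The selection logic described above can be sketched in a few lines of Python. This is only an illustration of the filter-then-rank idea, not the product's implementation: the CandidateModel class, the select_best function, the quality scores, and the candidate list are all hypothetical, and the quality measure the product actually uses is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateModel:
    """Hypothetical stand-in for one clustering model built behind the scenes."""
    algorithm: str    # illustrative label only
    n_clusters: int   # number of clusters this candidate found
    quality: float    # assumed quality score; higher means better


def select_best(candidates: List[CandidateModel],
                min_clusters: int = 3,
                max_clusters: int = 15) -> Optional[CandidateModel]:
    """Discard candidates outside the desired range, then pick the top-ranked one."""
    in_range = [c for c in candidates
                if min_clusters <= c.n_clusters <= max_clusters]
    if not in_range:
        return None  # every candidate fell outside the desired range
    return max(in_range, key=lambda c: c.quality)


# Made-up candidates and scores, for illustration only.
candidates = [
    CandidateModel("TwoStep", 3, 0.52),
    CandidateModel("K-means", 2, 0.61),   # too few clusters: discarded
    CandidateModel("Kohonen", 20, 0.40),  # too many clusters: discarded
    CandidateModel("K-means", 5, 0.47),
]

best = select_best(candidates)
print(best.algorithm, best.n_clusters)
```

Note that the candidate with the highest score overall is not selected, because it finds only two clusters and so falls outside the range of 3 to 15.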
- For this example, select the Set a desired range for the number of clusters found option and use the default minimum of 3 and maximum of 15. This will ensure the clustering model won't have too few or too many clusters.
- The rest of the optional settings are the same as for other modeling types. Deselect the Automatically clean up and prepare data for reliable model building option to make the auto clustering example easier to understand.
- Collapse the Optional Settings panel and expand the Clusters panel.
In the Clusters panel, there's a section for manual clusters and a section for auto clusters. Manual clusters allow you to define clusters based on your knowledge of the data. For this example, we'll define a manual cluster for high value customers and a manual cluster for young single customers, as follows.
- Click the Create a new rule icon. Name the rule high value customers, add the following two expressions, and click OK.
Months as a Customer > 12
Income > 65000
- Create another new rule called young single customers with the following expressions.
Age < 35
Marital Status = U
- The two new rules will be listed under the Manual Clusters section. To see how many customers are being caught by these manual clusters, click Clusters record count. Then expand the manual clusters again and look at the Count column to see how many customers were found for each cluster. You'll see that 67 high value customers were found and 33 young single customers were found.
- Next click Find Auto Clusters to automatically find any other clusters that might be of interest. When finished, expand the Auto Clusters section. You can see that three auto clusters were found.
- Click View Auto Clusters to see more detail. The Auto Cluster Results Viewer will be displayed.
The Model Summary page provides basic information about the auto cluster model algorithm that was used (TwoStep, in this example), the number of auto clusters found, and a basic idea of cluster quality.
Other charts are also available. You can hover over various areas in each chart to see more detail, and interact with some of the charts (such as the Predictor Importance chart).
Using the Clusters view (the last chart), you can also compare clusters to look for interesting patterns. In a real-world implementation, you would study the auto clusters and then give them more meaningful names (by clicking each cluster label in the Clusters view). After you update the cluster labels, the new names are used in all other charts and on the Modeling tab. You would then evaluate and test the model, and finally score it once you are happy with it.
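To make the manual cluster step more concrete, the two rules defined earlier (high value customers and young single customers) can be thought of as simple record filters. The following Python sketch assumes that the two expressions in each rule are combined with AND and that each record is a dictionary keyed by field name. The sample records are invented for illustration and are not taken from bank_customer_data.txt.

```python
def is_high_value(rec):
    # Rule "high value customers": both expressions must hold (assumed AND).
    return rec["Months as a Customer"] > 12 and rec["Income"] > 65000


def is_young_single(rec):
    # Rule "young single customers": both expressions must hold (assumed AND).
    return rec["Age"] < 35 and rec["Marital Status"] == "U"


RULES = {
    "high value customers": is_high_value,
    "young single customers": is_young_single,
}


def count_matches(records):
    """Count how many records each manual rule catches, rule by rule."""
    return {name: sum(1 for r in records if rule(r))
            for name, rule in RULES.items()}


# Invented sample records, for illustration only.
records = [
    {"Months as a Customer": 24, "Income": 80000, "Age": 45, "Marital Status": "M"},
    {"Months as a Customer": 6,  "Income": 90000, "Age": 28, "Marital Status": "U"},
    {"Months as a Customer": 36, "Income": 70000, "Age": 30, "Marital Status": "U"},
    {"Months as a Customer": 12, "Income": 65000, "Age": 35, "Marital Status": "M"},
]

counts = count_matches(records)
print(counts)
```

This sketch counts each rule independently, so a record that satisfies both rules is counted under both. How the product assigns such overlapping records is not covered in this example.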
- If desired, save the project as bank_cluster_model.str.
For more details about cluster modeling, see Building a clustering model.