The General Clustering solution plan

You can use the General Clustering solution plan to create mining flows that complete most tasks that involve cluster analysis.

Cluster analysis detects segments, also called clusters, of records that have similar properties. Records that are assigned to the same cluster are as similar as possible, while pairs of records in different clusters are as different as possible. The records might represent customers, stores, products, or any other entity in your data. Clustering is often used to gain an overview of a large data set.
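To make the idea concrete, here is a minimal Python sketch of clustering. It uses k-means, a widely known clustering algorithm that is not necessarily the algorithm the Clusterer uses. The function name `kmeans` and the sample points are hypothetical. Each record is a tuple of numeric attributes, and the algorithm alternates between assigning each record to its nearest cluster centre and moving each centre to the mean of its members.

```python
import math
import random

def kmeans(records, k, iterations=20, seed=0):
    """Plain k-means clustering (illustrative only); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k records chosen at random as the initial cluster centres.
    centroids = [list(r) for r in rng.sample(records, k)]
    labels = [0] * len(records)
    for _ in range(iterations):
        # Assignment step: each record joins the cluster with the nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in records]
        # Update step: move each centre to the mean of its current members.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Two well-separated groups of hypothetical two-attribute records.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

With these points, the first three records end up in one cluster and the last three in the other, so records within a cluster are close together and the two clusters are far apart.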

The General Clustering solution plan creates a mining flow that analyzes your data and can produce the following output:

A table with information about each segment that the mining flow discovers, such as its size or homogeneity.
A table that contains a sample of typical records from each segment.
A table that assigns each record to the segment into which it fits best.
A clustering model that describes the discovered segments. You can use this model to visualize your data set, or to score new records to determine which segment each new record fits best.
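The outputs listed above can be approximated for a finished clustering as in the following Python sketch. It is hypothetical and not the product's implementation: the functions `summarize` and `score` are invented for illustration, homogeneity is approximated by the mean distance of a segment's members to its centre (the solution plan's own measure may differ), and typical records are assumed to be the members closest to the centre.

```python
import math

def summarize(records, labels, centroids, n_typical=2):
    """Return one summary row per segment: size, spread, and typical records."""
    table = []
    for c, centre in enumerate(centroids):
        members = [r for r, lab in zip(records, labels) if lab == c]
        dists = [math.dist(r, centre) for r in members]
        table.append({
            "segment": c,
            "size": len(members),
            # Hypothetical homogeneity proxy: a smaller mean distance means a tighter segment.
            "mean_distance": sum(dists) / len(dists) if dists else 0.0,
            # Typical records: the members that lie closest to the segment centre.
            "typical": sorted(members, key=lambda r: math.dist(r, centre))[:n_typical],
        })
    return table

def score(record, centroids):
    """Assign a record (for example, a new one) to the segment with the nearest centre."""
    return min(range(len(centroids)), key=lambda c: math.dist(record, centroids[c]))

# A finished clustering of six hypothetical records into two segments.
records = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels = [0, 0, 0, 1, 1, 1]
centroids = [(3.1 / 3, 2.9 / 3), (8.0, 8.0)]

for row in summarize(records, labels, centroids):
    print(row["segment"], row["size"], round(row["mean_distance"], 3), row["typical"])
print(score((7.5, 7.0), centroids))  # → 1
```

Scoring a new record uses only the segment centres, which is why a stored model is enough to classify records that were not part of the original data set.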

The solution plan prompts you to select which of these outputs to include in your mining flow.

The solution plan also prompts you to select the preprocessing operations that prepare your data for clustering. The Clusterer requires input tables that contain aggregated data. For example, if your database has a record for every purchase made by every customer, and you want to group your customers by purchase habits, you must construct a virtual table during preprocessing that contains one record per customer, aggregating all of that customer's purchase data.
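A minimal Python sketch of that aggregation step follows, assuming hypothetical purchase rows with `customer_id` and `amount` fields. In the mining flow itself, this work is done by the preprocessing operations, conceptually like an SQL query that groups by customer ID. The derived columns (number of purchases, total and average amount) are illustrative choices, not ones the solution plan prescribes.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: one record per purchase.
purchases = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.5},
    {"customer_id": 2, "amount": 120.0},
    {"customer_id": 3, "amount": 5.0},
    {"customer_id": 3, "amount": 7.5},
    {"customer_id": 3, "amount": 12.5},
]

def aggregate_by_customer(rows):
    """Collapse purchase rows into one summary row per customer."""
    amounts_by_customer = defaultdict(list)
    for row in rows:
        amounts_by_customer[row["customer_id"]].append(row["amount"])
    return {
        cid: {
            "n_purchases": len(amounts),
            "total_spent": sum(amounts),
            "avg_purchase": sum(amounts) / len(amounts),
        }
        for cid, amounts in amounts_by_customer.items()
    }

customers = aggregate_by_customer(purchases)
for cid, summary in customers.items():
    print(cid, summary)
```

Each customer now appears exactly once, so the clustering compares customers rather than individual purchases.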

During preprocessing, you can also merge several tables, select rows and columns from these tables, or add new columns whose values are calculated from the values of other columns. For example, the following SQL expression calculates each customer's age from the customer's birthday. The result is approximate, because the expression subtracts only the year values and ignores whether the birthday has already passed in the current year:

YEAR(CURRENT DATE) - YEAR(BIRTHDAY)
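The following Python sketch illustrates both operations on hypothetical data. It merges a customer table with a table of aggregated purchase totals on the customer ID, keeping only customers that appear in both (like an inner join), and adds two derived columns: `age_approx` mirrors the SQL expression above, while `age` also checks whether this year's birthday has already passed. The table contents and the fixed reference date are assumptions for the example.

```python
from datetime import date

def age_approx(birthday, today):
    """Mirrors YEAR(CURRENT DATE) - YEAR(BIRTHDAY): compares only the years."""
    return today.year - birthday.year

def age_exact(birthday, today):
    """Completed years: subtract 1 if this year's birthday has not happened yet."""
    before_birthday = (today.month, today.day) < (birthday.month, birthday.day)
    return today.year - birthday.year - int(before_birthday)

# Hypothetical tables keyed by customer ID.
customer_table = {
    1: {"birthday": date(1980, 11, 30)},
    2: {"birthday": date(1995, 3, 1)},
    3: {"birthday": date(2001, 7, 4)},
}
spend_table = {1: 55.5, 2: 120.0}  # customer 3 has no purchases

today = date(2024, 6, 15)  # fixed reference date so the example is reproducible

# Merge on customer ID (inner join) and add the derived age columns.
rows = [
    {
        "customer_id": cid,
        "total_spent": spend_table[cid],
        "age_approx": age_approx(c["birthday"], today),
        "age": age_exact(c["birthday"], today),
    }
    for cid, c in customer_table.items()
    if cid in spend_table
]
for row in rows:
    print(row)
```

For customer 1, whose birthday falls later in the year than the reference date, the two columns differ by one (44 versus 43).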
