You can use the General Clustering solution plan to create
mining flows that complete most tasks that involve cluster analysis.
Cluster analysis detects segments of records that have similar
properties. Records that are assigned to the same cluster are as similar
as possible, while pairs of records in different clusters are as different
as possible. The records might represent customers, stores, products,
or any other entity in your data. You often use clustering to gain
an overview of a large data set.
The General Clustering solution plan creates a mining flow that
analyzes your data and can produce the following output:
- A table with information about the segments that the mining flow
discovers, such as size or homogeneity
- A table that contains a sample of typical records of each of the
segments that the mining flow discovers
- A table that assigns records to the segments into which each record
best fits
- A clustering model that describes the segments that the mining
flow discovers. You can use this model to visualize your data set,
or to score new records to determine which segment each new record
fits best.
The solution plan prompts you to select one or more of these
outputs to include in your mining flow.
The solution plan also prompts you to select the preprocessing
operations that you need to prepare your data for clustering. The
Clusterer requires input tables that contain aggregated data. For
example, if your database has a record for every purchase made by
every customer, and you want to group your customers by purchase habits,
you need to construct a virtual table during preprocessing. This virtual
table contains one record for each customer, and that record aggregates
all purchase data for the customer.
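The aggregation step described above can be sketched outside the product as follows. This is a minimal illustration, not the mining flow that the solution plan generates: it assumes a hypothetical PURCHASES table with CUSTOMER_ID and AMOUNT columns, uses an in-memory SQLite database in place of your warehouse database, and the view name CUSTOMER_SUMMARY and its column names are also assumptions.

```python
import sqlite3


def build_customer_summary(purchases):
    """Aggregate one row per purchase into one row per customer.

    `purchases` is a list of (customer_id, amount) tuples. The table,
    view, and column names are hypothetical and used for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PURCHASES (CUSTOMER_ID INTEGER, AMOUNT REAL)")
    conn.executemany("INSERT INTO PURCHASES VALUES (?, ?)", purchases)

    # The view plays the role of the virtual table: it holds no data of
    # its own and returns one aggregated record for each customer.
    conn.execute(
        """
        CREATE VIEW CUSTOMER_SUMMARY AS
        SELECT CUSTOMER_ID,
               COUNT(*)    AS NUM_PURCHASES,
               SUM(AMOUNT) AS TOTAL_SPENT,
               AVG(AMOUNT) AS AVG_AMOUNT
        FROM PURCHASES
        GROUP BY CUSTOMER_ID
        """
    )
    rows = conn.execute(
        "SELECT CUSTOMER_ID, NUM_PURCHASES, TOTAL_SPENT, AVG_AMOUNT "
        "FROM CUSTOMER_SUMMARY ORDER BY CUSTOMER_ID"
    ).fetchall()
    conn.close()
    return rows


# Three purchase records for two customers collapse into two records.
sample_purchases = [(1, 10.0), (1, 30.0), (2, 5.0)]
summary = build_customer_summary(sample_purchases)
```

Each row of the result describes one customer rather than one purchase, which is the shape of input that the paragraph above describes for clustering customers by purchase habits.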
You can also merge several tables, select rows and columns from
these tables, or add new columns whose values are calculated from the
values of other columns. For example, you can add a column that calculates
the approximate ages of your customers from their birth dates with
the following SQL expression, which subtracts the year of birth from
the current year:
YEAR(CURRENT DATE) - YEAR(BIRTHDAY)
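The expression above subtracts calendar years, so the result can be one year higher than the completed age of a customer whose birthday has not yet occurred in the current year. The following sketch makes that difference visible. It is an illustration only: the function names are hypothetical, and the date passed as `today` stands in for the value of CURRENT DATE.

```python
from datetime import date


def calendar_year_difference(birthday, today):
    """Mirror YEAR(CURRENT DATE) - YEAR(BIRTHDAY): compare the years only."""
    return today.year - birthday.year


def age_in_full_years(birthday, today):
    """Return the number of completed years between birthday and today."""
    # The birthday counts as reached if today's (month, day) is on or
    # after the (month, day) of the birth date.
    reached = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if reached else 1)


# A customer born on 31 December 1990, evaluated on 15 January 2024.
birthday = date(1990, 12, 31)
today = date(2024, 1, 15)
approximate_age = calendar_year_difference(birthday, today)  # 34
completed_age = age_in_full_years(birthday, today)           # 33
```

For clustering, the calendar-year difference is often precise enough. If you need completed years, adjust the calculated column accordingly.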