IBM® Db2® for z/OS® models - TwoStep

The TwoStep node implements the TwoStep algorithm, which provides a method for clustering large data sets.

You can use this node to cluster data while respecting resource constraints, such as available memory and time.

The TwoStep algorithm is a database-mining algorithm that clusters data in the following way:

  1. A clustering feature (CF) tree is created. This height-balanced tree stores clustering features for hierarchical clustering, where similar input records become part of the same tree nodes.
  2. The leaves of the CF tree are clustered hierarchically in-memory to generate the final clustering result. The best number of clusters is determined automatically. If you specify a maximum number of clusters, the best number of clusters within the specified limit is determined.
  3. The clustering result is refined in a second step where an algorithm that is similar to the K-Means algorithm is applied to the data.
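
The three stages above can be illustrated with a small, self-contained Python sketch. This is not the Db2 for z/OS implementation. It makes several simplifying assumptions: clustering features are kept in a flat list rather than in a height-balanced CF tree, Euclidean distance is used, and the number of clusters `k` is passed in rather than determined automatically. All function names and the `threshold` parameter are hypothetical and chosen for illustration only.

```python
import math


def centroid(cf):
    """Return the centroid of a clustering feature [count, linear_sum]."""
    n, linear_sum = cf
    return [s / n for s in linear_sum]


def precluster(points, threshold):
    """Stage 1 (simplified): absorb each record into the nearest clustering
    feature whose centroid lies within `threshold`, else start a new one.
    Assumption: a flat list stands in for the CF tree."""
    cfs = []  # each entry is [count, per-dimension linear sum]
    for p in points:
        best, best_d = None, threshold
        for cf in cfs:
            d = math.dist(p, centroid(cf))
            if d <= best_d:
                best, best_d = cf, d
        if best is None:
            cfs.append([1, list(p)])
        else:
            best[0] += 1
            best[1] = [s + x for s, x in zip(best[1], p)]
    return cfs


def merge_to_k(cfs, k):
    """Stage 2 (simplified): repeatedly merge the two clustering features
    with the closest centroids until only k remain.
    Assumption: k is fixed here instead of being chosen automatically."""
    cfs = [[n, list(ls)] for n, ls in cfs]  # work on a copy
    while len(cfs) > k:
        _, i, j = min(
            (math.dist(centroid(cfs[i]), centroid(cfs[j])), i, j)
            for i in range(len(cfs))
            for j in range(i + 1, len(cfs))
        )
        n_j, ls_j = cfs.pop(j)
        cfs[i] = [cfs[i][0] + n_j, [a + b for a, b in zip(cfs[i][1], ls_j)]]
    return [centroid(cf) for cf in cfs]


def refine(points, centers, iterations=10):
    """Stage 3: K-Means-like refinement that reassigns every record to its
    nearest center and then recomputes each center as the group mean."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def two_step_sketch(points, threshold, k):
    """Run the three simplified stages in order and return the centers."""
    cfs = precluster(points, threshold)
    centers = merge_to_k(cfs, k)
    return refine(points, centers)


sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = two_step_sketch(sample, threshold=0.5, k=2)
```

In this sketch, a smaller `threshold` produces more clustering features in the first stage, which leaves more work for the hierarchical merge. A larger value compresses the data more aggressively and uses less memory, at the cost of coarser pre-clusters.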