Question & Answer
Question
What distance functions are available in the clustering modules and how can I use them?
Answer
IBM Netezza Analytics contains two clustering algorithms: k-means and divisive clustering.
Each algorithm provides two main operations:
1. Build a model (stored procedures: KMEANS and DIVCLUSTER)
2. Apply the model to new data (stored procedures: PREDICT_KMEANS and PREDICT_DIVCLUSTER)
When you call a clustering stored procedure, you can specify the distance function that the clustering algorithm uses. For continuous attributes, four distance functions are available:
1. euclidean (the default distance function)
2. manhattan
3. maximum
4. canberra
For nominal attributes, the hamming distance is used.
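To make the difference between these distance functions concrete, the following is a minimal Python sketch of their textbook definitions. It is an illustration only, not the IBM Netezza Analytics implementation: the function names are hypothetical, and the handling of zero denominators in the canberra distance and the unnormalized mismatch count for the hamming distance are assumptions.

```python
from typing import Hashable, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def maximum(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of |a - b| / (|a| + |b|) over all coordinates.

    Assumption: a coordinate where both values are 0 contributes 0.
    """
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


def hamming(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Number of nominal attributes whose values differ (assumed unnormalized)."""
    return sum(a != b for a, b in zip(x, y))


if __name__ == "__main__":
    p, q = (1.0, 2.0), (4.0, 6.0)
    print(euclidean(p, q))  # 5.0
    print(manhattan(p, q))  # 7.0
    print(maximum(p, q))    # 4.0
    print(canberra(p, q))   # 3/5 + 4/8, approximately 1.1
    print(hamming(("Private", "Married"), ("Private", "Single")))  # 1
```

The same pair of points is 5.0 apart under the euclidean distance but 7.0 apart under the manhattan distance, so the choice of distance function can change which cluster center is nearest to a given record.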
The following examples show how to specify the manhattan distance function instead of the default euclidean distance function for the k-means and divisive clustering algorithms:
call nza..KMEANS('model=adult_mdl, intable=nza..adult, outtable=adult_out, id=id, target=income, distance=manhattan, k=3');
OR
call nza..DIVCLUSTER('model=adult_mdl, intable=nza..adult, outtable=adult_out, id=id, target=income, distance=manhattan, maxdepth=3');
where:
- model=adult_mdl - defines the name of the table where the model will be stored
- intable=nza..adult - defines the name of the table containing the input dataset
- outtable=adult_out - defines the name of the table where the cluster assignments will be stored
- id=id - defines the name of the column containing a unique instance identifier in the input table
- target=income - defines the name of the target attribute (the clustering algorithm ignores this column)
- distance=manhattan - defines the name of the distance function that the clustering algorithm uses
- k=3 - defines the number of centers in the k-means algorithm
- maxdepth=3 - defines the maximum number of levels in the cluster tree in the divisive clustering algorithm
Historical Number: NZ153287
Document Information
More support for: IBM PureData System
Software version: 1.0.0
Document number: 460725
Modified date: 17 October 2019
UID: swg21568259