IDAX.KMEANS - Build a K-means clustering model

Use this stored procedure to build a K-means clustering model that clusters the input data into k centers.

Authorization

The privileges held by the authorization ID of the statement must include the IDAX_USER role.

Syntax

IDAX.KMEANS(in parameter_string varchar(32672))
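
As an illustration of how the single string argument might be assembled, the following Python sketch joins <parameter>=<value> pairs with commas and wraps the result in a CALL statement. The model, table, and column names are hypothetical, and the helper build_parameter_string is not part of the product; the parameters themselves are described under Parameter descriptions.

```python
MAX_LENGTH = 32672  # declared size of the parameter_string argument


def build_parameter_string(params):
    """Join <parameter>=<value> pairs with commas into one string."""
    text = ",".join(f"{name}={value}" for name, value in params.items())
    if len(text) > MAX_LENGTH:
        raise ValueError("parameter string exceeds VARCHAR(32672)")
    return text


# Hypothetical model, table, and column names, used only for illustration.
params = {
    "model": "customer_mdl",
    "intable": "customer_churn",
    "outtable": "customer_churn_out",
    "id": "cust_id",
    "k": 3,
    "maxiter": 5,
    "distance": "euclidean",
}

parameter_string = build_parameter_string(params)
call_statement = f"CALL IDAX.KMEANS('{parameter_string}')"
print(call_statement)
```

The printed statement can then be submitted through whatever SQL client is in use. Only model, intable, outtable, and id are mandatory; the remaining entries fall back to their defaults when omitted.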

Parameter descriptions

parameter_string
Mandatory. A single string that contains <parameter>=<value> pairs, which are separated by commas.
Data type: VARCHAR(32672)
The following list shows the parameter values:
model
Mandatory.
The name of the clustering model that is to be built.
Data type: VARCHAR(64)
intable
Mandatory.
The name of the input table.
Data type: VARCHAR(128)
outtable
Mandatory.
The name of the output table in which a cluster is assigned to each record of the input table.
Data type: VARCHAR(128)
id
Mandatory.
The column of the input table that identifies a unique instance ID.
Data type: VARCHAR(128)
distance
Optional.
The distance function.
Allowed values are norm_euclidean and euclidean.
Default: euclidean
Data type: VARCHAR(14)
k
Optional.
The number of cluster centers.
Default: 3
Data type: INTEGER
maxiter
Optional.
The maximum number of iterations to run.
The value must be between 1 and 1000.
Default: 5
Data type: INTEGER
randseed
Optional.
The random seed for the generator.
Default: 12345
Data type: INTEGER
idbased
Optional.
Specifies that the random seed for the generator is based on the value of the id column.
Default: false
Data type: BOOL
incolumn
Optional.
The columns of the input table that have specific properties, separated by semicolons (;).
Each column name is followed by one or more of the following properties:
  • By type nominal (:nom) or by type continuous (:cont). By default, numerical types are continuous, and all other types are nominal.
  • By role :id, :target, :input, or :ignore.
If this parameter is not specified, all columns of the input table have default properties.
Default: none
Data type: VARCHAR(32000)
coldeftype
Optional.
The default type of the input table columns.
Allowed values are nom and cont.
If the parameter is not specified, numeric columns are continuous, and all other columns are nominal.
Default: none
Data type: VARCHAR(4)
coldefrole
Optional.
The default role of the input table columns.
Allowed values are input and ignore.
If the parameter is not specified, all columns are input columns.
Default: input
Data type: VARCHAR(6)
colPropertiesTable
Optional.
The input table where properties of the columns of the input table are stored.
If this parameter is not specified, the properties of the input table columns are detected automatically.
Default: none
Data type: VARCHAR(128)
statistics
Optional.
Indicates the statistics that are to be collected.
Allowed values are: none, columns, values:n, and all.
The following conditions apply:
  • If statistics=none is specified, no statistics are collected.
  • If statistics=columns is specified, statistics on the columns of the input table are collected, for example, mean values.
  • If statistics=values:n is specified, where n is a positive number, statistics on the columns and the column values are collected.
    Up to n column value statistics are collected.
    • If a nominal column contains more than n values, only the statistics for the n most frequent values are kept.
    • If a numeric column contains more than n values, the values are discretized, and the statistics are collected on the discretized values.
  • statistics=all is identical to statistics=values:100.
Default: none
Data type: VARCHAR(32)
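
The incolumn value has its own inner format: column entries separated by semicolons, each column name followed by :type and :role properties. The following Python sketch shows one way such a value might be composed and checked. The column names are hypothetical, build_incolumn is an illustrative helper rather than a product function, and the order of the type and role properties within one entry is an assumption.

```python
# Properties named in the incolumn description: two types and four roles.
ALLOWED_PROPERTIES = {"nom", "cont", "id", "target", "input", "ignore"}


def build_incolumn(column_properties):
    """Format {column: [properties]} as 'COL:prop[:prop];COL:prop...'."""
    entries = []
    for column, properties in column_properties.items():
        unknown = set(properties) - ALLOWED_PROPERTIES
        if unknown:
            raise ValueError(f"unknown properties for {column}: {sorted(unknown)}")
        # Each property is appended to the column name with a leading colon.
        entries.append(column + "".join(f":{prop}" for prop in properties))
    return ";".join(entries)


# Hypothetical columns: AGE is continuous, REGION is nominal, NOTES is ignored.
incolumn_value = build_incolumn({
    "AGE": ["cont"],
    "REGION": ["nom"],
    "NOTES": ["ignore"],
})
incolumn_entry = f"incolumn={incolumn_value}"
print(incolumn_entry)
```

Because the column entries are separated by semicolons rather than commas, the resulting incolumn entry can sit alongside the other comma-separated <parameter>=<value> pairs in the parameter string.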

Returned information

The number of generated clusters, returned as a result set.