Use this stored procedure to build a K-means clustering model that groups the input
data into k clusters, each represented by a cluster center.
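The k, maxiter, randseed, and distance parameters described below can be illustrated with a minimal pure-Python sketch of Lloyd's algorithm, the textbook form of k-means. This is not the IDAX implementation. Several details are assumptions: the initialization (k distinct records chosen at random), the rule that stops once no record changes its cluster, and the interpretation of norm_euclidean as Euclidean distance on columns scaled by their standard deviation. The function names are hypothetical.

```python
import math
import random


def euclidean(a, b, scale=None):
    """Euclidean distance between two records. If `scale` is given, each
    coordinate difference is first divided by the matching scale factor."""
    if scale is None:
        scale = [1.0] * len(a)
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale)))


def column_stddevs(points):
    """Population standard deviation of each column (1.0 for constant
    columns, so that scaling never divides by zero)."""
    n = len(points)
    result = []
    for col in zip(*points):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        result.append(sd if sd > 0 else 1.0)
    return result


def kmeans(points, k=3, maxiter=5, randseed=12345, distance="euclidean"):
    """Hypothetical Lloyd's k-means sketch; returns (centers, assignments)."""
    if not 1 <= maxiter <= 1000:
        raise ValueError("maxiter must be between 1 and 1000")
    if distance not in ("euclidean", "norm_euclidean"):
        raise ValueError("distance must be 'euclidean' or 'norm_euclidean'")
    # ASSUMPTION: norm_euclidean scales each column by its standard deviation.
    scale = column_stddevs(points) if distance == "norm_euclidean" else None
    rng = random.Random(randseed)  # a fixed seed makes runs repeatable
    # ASSUMPTION: start from k distinct records picked at random.
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(maxiter):
        # Assignment step: attach every record to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centers[c], scale))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no record changed its cluster
        assignments = new_assignments
        # Update step: move each center to the mean of its records.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` places the first two records in one cluster and the last two in the other, whichever pair of records the seed selects as initial centers.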
Authorization
The privileges held by the authorization ID of the statement must include the IDAX_USER role.
Syntax
IDAX.KMEANS(in parameter_string varchar(32672))
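Because all settings are passed in the single parameter_string argument, it can help to assemble the string programmatically. The following Python sketch joins <parameter>=<value> pairs with commas and formats an incolumn value with semicolon-separated columns. The helper names, the lowercase true/false spelling of booleans, the COLUMN:property layout for incolumn, and the table and column names in the example are assumptions for illustration only.

```python
MAX_LENGTH = 32672  # parameter_string is VARCHAR(32672)


def build_incolumn(properties):
    """Format an incolumn value: each column name is followed by its
    properties (for example 'nom' or 'ignore'), and columns are separated
    by semicolons. ASSUMPTION: properties are appended as ':property'."""
    return ";".join(
        column + "".join(":" + prop for prop in props)
        for column, props in properties.items()
    )


def build_param_string(params):
    """Join <parameter>=<value> pairs with commas (hypothetical helper)."""
    pairs = []
    for name, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # ASSUMPTION: boolean spelling
        text = str(value)
        if "," in text:
            # A comma inside a value would be read as a pair separator.
            raise ValueError(f"value of {name!r} must not contain a comma")
        pairs.append(f"{name}={text}")
    result = ",".join(pairs)
    if len(result) > MAX_LENGTH:
        raise ValueError("parameter string exceeds VARCHAR(32672)")
    return result


# Hypothetical table and column names.
param_string = build_param_string({
    "model": "customer_km",
    "intable": "CUSTOMERS",
    "outtable": "CUSTOMERS_KM",
    "id": "CUST_ID",
    "k": 4,
    "incolumn": build_incolumn({"REGION": ["nom"], "NOTES": ["ignore"]}),
})
print(param_string)
# model=customer_km,intable=CUSTOMERS,outtable=CUSTOMERS_KM,id=CUST_ID,k=4,incolumn=REGION:nom;NOTES:ignore
```

The resulting string is then passed as the only argument of IDAX.KMEANS.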
Parameter descriptions
- parameter_string
- A mandatory string that contains <parameter>=<value> pairs, separated by commas.
- Data type: VARCHAR(32672)
- The following parameters can be specified:
- model
- Mandatory.
- The name of the clustering model that is to be built.
- Data type: VARCHAR(64)
- intable
- Mandatory.
- The name of the input table.
- Data type: VARCHAR(128)
- outtable
- Mandatory.
- The name of the output table, in which each record of the input table is assigned to a cluster.
- Data type: VARCHAR(128)
- id
- Mandatory.
- The column of the input table that contains a unique identifier for each record.
- Data type: VARCHAR(128)
- distance
- Optional.
- The distance function.
- Allowed values are distance=norm_euclidean and distance=euclidean.
- Default: euclidean
- Data type: VARCHAR(14)
- k
- Optional.
- The number of cluster centers.
- Default: 3
- Data type: INTEGER
- maxiter
- Optional.
- The maximum number of iterations to perform.
- The minimum number is 1. The maximum number is 1000.
- Default: 5
- Data type: INTEGER
- randseed
- Optional.
- The seed for the random number generator.
- Default: 12345
- Data type: INTEGER
- idbased
- Optional.
- Specifies that the random seed for the generator is based on the values of the id column.
- Default: false
- Data type: BOOL
- incolumn
- Optional.
- The input table columns to which specific properties are assigned, separated by a semicolon (;).
- Each column is followed by one or more of the following properties:
- By type: nominal (:nom) or continuous (:cont). By default, numerical types are continuous, and all other types are nominal.
- By role: :id, :target, :input, or :ignore.
- If this parameter is not specified, all columns of the input table have default properties.
- Default: none
- Data type: VARCHAR(32000)
- coldeftype
- Optional.
- The default type of the input table columns.
- Allowed values are nom and cont.
- If the parameter is not specified, numeric columns are continuous, and all other columns are nominal.
- Default: none
- Data type: VARCHAR(4)
- coldefrole
- Optional.
- The default role of the input table columns.
- Allowed values are input and ignore.
- If the parameter is not specified, all columns are input columns.
- Default: input
- Data type: VARCHAR(6)
- colPropertiesTable
- Optional.
- The table in which the properties of the input table columns are stored.
- If this parameter is not specified, the properties of the input table columns are detected automatically.
- Default: none
- Data type: VARCHAR(128)
- statistics
- Optional.
- The statistics that are to be collected.
- Allowed values are none, columns, values:n, and all.
- The following conditions apply:
- If statistics=none is specified, no statistics are collected.
- If statistics=columns is specified, statistics on the columns of the input
table are collected, for example, mean values.
- If statistics=values:n is specified, and n is a positive number, statistics on the columns and on the column values are collected. Statistics are collected for up to <n> values per column.
- If a nominal column contains more than <n> distinct values, only the statistics for the <n> most frequent values are kept.
- If a numeric column contains more than <n> distinct values, the values are discretized, and the statistics are collected on the discretized values.
- statistics=all is identical to statistics=values:100.
- Default: none
- Data type: VARCHAR(32)
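The statistics options can be pictured with the following Python sketch. It interprets a statistics value and then applies the values:n rules to a single column. It only illustrates the rules listed above: the function names are hypothetical, and the equal-width binning used for numeric columns is an assumption, because the discretization method is not specified here.

```python
from collections import Counter


def parse_statistics(spec):
    """Return (collect_column_stats, n) for a statistics setting, where n is
    the maximum number of value statistics per column (0 means none)."""
    if spec == "none":
        return False, 0
    if spec == "columns":
        return True, 0
    if spec == "all":
        return True, 100  # statistics=all is identical to values:100
    if spec.startswith("values:"):
        n = int(spec[len("values:"):])
        if n > 0:
            return True, n
    raise ValueError(f"unsupported statistics value: {spec!r}")


def nominal_value_stats(values, n):
    """Keep the counts of only the n most frequent values."""
    return Counter(values).most_common(n)


def numeric_value_stats(values, n):
    """Count each value if there are at most n distinct values; otherwise
    discretize into n equal-width bins (ASSUMPTION) and count per bin."""
    if n < 1:
        raise ValueError("n must be a positive number")
    if len(set(values)) <= n:
        return sorted(Counter(values).items())
    lo, hi = min(values), max(values)
    width = (hi - lo) / n  # > 0, because more than n >= 1 distinct values exist
    counts = [0] * n
    for v in values:
        # The maximum value would land in bin n, so clamp it into the last bin.
        counts[min(int((v - lo) / width), n - 1)] += 1
    return [((lo + i * width, lo + (i + 1) * width), c) for i, c in enumerate(counts)]
```

For example, with values:2 a nominal column containing a, b, a, c, a, b keeps only the counts for a (3) and b (2), and the numeric values 0 to 10 are reduced to two bins holding 5 and 6 values.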
Returned information
The number of generated clusters, returned as a result set.