IDAX.DECTREE - Build and then prune a decision tree model
Use this stored procedure to build a decision tree model by growing and pruning a tree.
Authorization
The privileges held by the authorization ID of the statement must include the IDAX_USER role.
Syntax
IDAX.DECTREE(in parameter_string varchar(32672))
Parameter descriptions
- parameter_string
- Mandatory one-string parameter that contains pairs of <parameter>=<value> entries that are separated by a comma.
- Data type: VARCHAR(32672)
- The following list shows the parameter values:
- model
- Mandatory.
- The name of the decision tree model that is to be built.
- Data type: VARCHAR(64)
- intable
- Mandatory.
- The name of the input table.
- Data type: VARCHAR(128)
- id
- Mandatory.
- The column of the input table that identifies a unique instance ID.
- Data type: VARCHAR(128)
- target
- Mandatory.
- The column of the input table that represents the class.
- Data type: VARCHAR(128)
- incolumn
- Optional.
- The columns of the input table that have specific properties. Column entries are separated by semicolons (;).
- Each column is followed by one or more of the following properties:
- By type nominal (":nom", ":nominal") or by type continuous (":cont", ":continuous"). By default, numerical types are continuous, and all other types are nominal.
- By role id (":id"), target (":target"), input (":active", ":in", ":input") or ignore (":ignore", ":inactive").
- If this parameter is not specified, all columns of the input table have default properties.
- Default: none
- Data type: VARCHAR(32000)
- coldeftype
- Optional.
- The default type of the input table columns.
- Allowed values are:
- "nom" and "nominal" for type nominal
- "cont" and "continuous" for type continuous
- If the parameter is not specified, numeric columns are continuous, and all other columns are nominal.
- Default: none
- Data type: VARCHAR(10)
- coldefrole
- Optional.
- The default role of the input table columns.
- Allowed values are:
- "active", "in", and "input" for role input
- "ignore" and "inactive" for role ignore
- If the parameter is not specified, all columns are input columns.
- Default: input
- Data type: VARCHAR(8)
- colPropertiesTable
- Optional.
- The input table where properties of the columns of the input table are stored.
- If this parameter is not specified, the properties of the input table columns are detected automatically.
- Default: none
- Data type: VARCHAR(128)
- weights
- Optional.
- The input table that contains optional instance weights or class weights for the columns of the input table.
- If this parameter is not specified, all weights have the value 1.
- Default: none
- Data type: VARCHAR(128)
- eval
- Optional.
- The class impurity measure that is used for split evaluation.
- Allowed values are entropy and gini.
- Default: entropy
- Data type: VARCHAR(8)
- minimprove
- Optional.
- The minimum improvement of the measure that is required for split evaluation.
- Default: 0.02
- Minimum: 0.0
- Data type: DOUBLE
- minsplit
- Optional.
- The minimum number of instances per tree node that can be split.
- Default: 50
- Minimum: 2
- Data type: INTEGER
- maxdepth
- Optional.
- The maximum number of tree levels, including leaves.
- Default: 10
- Minimum: 1
- Maximum: 62
- Data type: INTEGER
- valtable
- Optional.
- The input table that contains the validation data set.
- If this parameter is not specified, no pruning is done.
- Default: none
- Data type: VARCHAR(128)
- valweights
- Optional.
- The input table that contains the optional instance weights or class weights for the validation data set.
- Default: none
- Data type: VARCHAR(128)
- qmeasure
- Optional.
- The quality measure for pruning.
- Allowed values are Acc or wAcc.
- Default: Acc
- Data type: VARCHAR(4)
- statistics
- Optional.
- Indicates the statistics that are to be collected.
- Allowed values are none, columns, values:n, and all.
- The following conditions apply:
- If statistics=none is specified, no statistics are collected.
- If statistics=columns is specified, statistics on the columns of the input table are collected, for example, mean values.
- If statistics=values:n is specified, and if n is a positive number, statistics on the columns and the column values are collected. Up to <n> column value statistics are collected.
- If a nominal column contains more than <n> values, only the <n> most frequent column statistics are kept.
- If a numeric column contains more than <n> values, the values are discretized, and the statistics are collected on the discretized values.
- statistics=all is identical to statistics=values:100.
- Default: none
- Data type: VARCHAR(32)
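The following sketch shows how per-column properties in incolumn might be combined with statistics collection. The model name adult_dectree_cols and the column names age, education, and fnlwgt are assumptions for illustration only; substitute names that exist in your input table.

```sql
-- Sketch only: the model name and the column names age, education, and fnlwgt are hypothetical.
-- incolumn entries are separated by semicolons; each column name is followed by its properties.
--   age:cont       treat age as a continuous column
--   education:nom  treat education as a nominal column
--   fnlwgt:ignore  exclude fnlwgt from the model
-- statistics=columns collects statistics on the columns of the input table.
CALL IDAX.DECTREE('model=adult_dectree_cols, intable=adult_train, id=id, target=marital_status, incolumn=age:cont;education:nom;fnlwgt:ignore, statistics=columns');
```

Columns that are not listed in incolumn keep their default properties, so only the columns that deviate from the defaults need to be named.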
Returned information
The number of nodes in the decision tree, returned as a result set.
Example
CALL IDAX.DECTREE('model=adult_dectree, intable=adult_train, id=id, target=marital_status, valtable=adult_prune');
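As a further sketch, the optional growth and pruning parameters can be set in the same parameter string. The model name adult_dectree_gini is an assumption, and the parameter values are illustrative choices within the documented ranges, not recommended settings.

```sql
-- Sketch only: the model name is hypothetical and the values are illustrative.
--   eval=gini           use the Gini impurity measure instead of the default entropy
--   minimprove=0.01     require a smaller improvement than the default 0.02 for a split
--   minsplit=100        split only nodes that contain at least 100 instances
--   maxdepth=6          limit the tree depth (documented range: 1 to 62)
--   valtable, qmeasure  prune on the validation table, using wAcc as the quality measure
CALL IDAX.DECTREE('model=adult_dectree_gini, intable=adult_train, id=id, target=marital_status, eval=gini, minimprove=0.01, minsplit=100, maxdepth=6, valtable=adult_prune, qmeasure=wAcc');
```

If valtable is omitted from a call like this one, the tree is grown but not pruned, and qmeasure has no effect.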