Data mining — Defining mining settings

You must define mining settings for your training mining run. The mining settings contain the specific instructions for the mining function that you want to use. The definitions and roles of the fields also belong to the settings specification, so the logical data specification is part of the mining settings.

The following list shows the mining functions that provide mining-specific settings:

Clustering: You can specify the maximum number of clusters to be created or the similarity threshold for a distribution-based clustering training run.
Associations: You can reduce the number of associations that are created to your preferred level.
Classification and Regression: You must specify the name of the target field.
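
The function-specific settings listed above can be pictured as small sets of control parameters that are validated before a training run. The following Python sketch is illustrative only: the class names, the parameter names (max_clusters, similarity_threshold, max_rules, target_field), and the assumed 0 to 1 range of the similarity threshold are hypothetical and are not part of the product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BaseSettings:
    """Hypothetical base: the logical data specification (field name -> type)."""
    fields: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        if not self.fields:
            raise ValueError("define at least one field")


@dataclass
class ClusteringSettings(BaseSettings):
    """Optional limits for a clustering run (names are assumptions)."""
    max_clusters: Optional[int] = None
    similarity_threshold: Optional[float] = None

    def validate(self) -> None:
        super().validate()
        if self.max_clusters is not None and self.max_clusters < 1:
            raise ValueError("max_clusters must be at least 1")
        t = self.similarity_threshold
        # Assumption: the threshold is a fraction between 0 and 1.
        if t is not None and not 0.0 <= t <= 1.0:
            raise ValueError("similarity_threshold must be in [0, 1]")


@dataclass
class AssociationSettings(BaseSettings):
    """Optional cap on the number of associations (name is an assumption)."""
    max_rules: Optional[int] = None

    def validate(self) -> None:
        super().validate()
        if self.max_rules is not None and self.max_rules < 1:
            raise ValueError("max_rules must be at least 1")


@dataclass
class TargetSettings(BaseSettings):
    """Classification or regression: the target field is mandatory."""
    target_field: str = ""

    def validate(self) -> None:
        super().validate()
        if not self.target_field:
            raise ValueError("the target field must be specified")
        if self.target_field not in self.fields:
            raise ValueError("the target field must be one of the defined fields")


# Example: a classification-style settings object that passes validation.
settings = TargetSettings(
    fields={"age": "numeric", "risk": "categorical"},
    target_field="risk",
)
settings.validate()
```

The sketch mirrors the distinction made in the list: the clustering and associations parameters are optional limits, whereas a missing target field makes the classification or regression settings invalid.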

In the model-building process, defining mining settings means creating values of the following data types and performing operations on them:

DM_RuleSettings
DM_ClasSettings
DM_ClusSettings
DM_RegSettings
DM_TsSettings

These data types store the settings for associations, classification, clustering, regression, or sequence rules mining runs. Each settings value is a set of control parameters.

You can limit the fields that are used for mining by manually indicating the fields to be used.

You can also use automatic preprocessing and customization to identify the fields to be used or to be dropped. Automatic preprocessing offers the following advantages:

Reducing the mining skills that are necessary to deploy mining functions
Invoking all algorithms on arbitrary input tables without manual preprocessing and elaborate settings customization
Providing default settings to ensure good run-time results and models
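
The difference between manual and automatic field selection can be sketched in Python. The source does not state which rules the automatic preprocessing applies, so the two heuristics below (drop constant fields, drop text fields whose values are all distinct and therefore look like identifiers) are assumptions chosen purely for illustration. The function name split_fields and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def split_fields(
    rows: Sequence[Dict[str, object]],
    keep: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Return (used, dropped) field names for a table given as a list of rows.

    If `keep` is given, the selection is manual: only the listed fields are used.
    Otherwise two illustrative heuristics (assumptions, not the product's
    actual rules) decide which fields to drop.
    """
    if not rows:
        return [], []

    names = list(rows[0].keys())

    if keep is not None:
        # Manual selection: use exactly the fields that the caller indicated.
        used = [n for n in names if n in keep]
        dropped = [n for n in names if n not in keep]
        return used, dropped

    used, dropped = [], []
    for name in names:
        values = [row.get(name) for row in rows]
        distinct = len(set(values))
        is_constant = distinct <= 1
        # Text values that never repeat look like record identifiers.
        looks_like_id = distinct == len(rows) and all(
            isinstance(v, str) for v in values
        )
        if is_constant or looks_like_id:
            dropped.append(name)
        else:
            used.append(name)
    return used, dropped


rows = [
    {"id": "a1", "country": "DE", "age": 30, "risk": "low"},
    {"id": "a2", "country": "DE", "age": 41, "risk": "high"},
    {"id": "a3", "country": "DE", "age": 30, "risk": "low"},
]

auto_used, auto_dropped = split_fields(rows)
manual_used, manual_dropped = split_fields(rows, keep=["age"])
```

In this sample table, the automatic path drops id (all values distinct) and country (constant) and keeps age and risk, while the manual path keeps only the field that was named explicitly.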