Specific settings for the Regression mining function

This section describes specific settings for Linear, Polynomial and RBF Regression.

You can specify settings for the following Regression algorithms:
  • Linear Regression
  • Polynomial Regression
  • RBF Regression

The parameters of the Transform Regression algorithm cannot be changed.

Linear and Polynomial Regression-specific settings

You can specify the following parameters that are specific to the Linear Regression algorithm and the Polynomial Regression algorithm:
Outlier treatment
To specify how the Regression algorithms treat values that lie outside the value range of a field, you can use the method DM_setFldOutlTreat on a DM_RegSettings value. For example, you might want to use the following specification:
DM_setFldOutlTreat(fieldName, outlierTreatment)

The default value is 0. This means that outlier values are used unchanged.

It is recommended that you define an outlier treatment only if you also define lower and upper bounds for the numerical fields in the logical data specification by using the method DM_setFldOutlLim. If you specify an outlier treatment without specifying lower and upper bounds for numerical fields, the Regression algorithms calculate the limits automatically.

Maximum degree of Polynomial Regression
To specify a maximum degree of the polynomial for the Polynomial Regression algorithm, set the algorithm setting MaxExp by using the method DM_setAlgorithm.
The resulting model has a polynomial of a degree less than or equal to the specified value.
  • For the Linear Regression algorithm, the only valid value is the default value of 1.
  • For the Polynomial Regression algorithm, you can use values from 1 to 7. The default value is 3.
    For example, if you want to specify degree 2 for the polynomial, use the following statement:
    DM_setAlgorithm('Polynomial','<MaxExp>2</MaxExp>')
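The valid ranges above can be checked before a statement is issued. The following Python sketch is only an illustration: the helper max_exp_statement and the table MAX_EXP_RANGE are hypothetical names that are not part of the product, and the sketch only assembles the statement text shown in the example above.

```python
# Hypothetical helper, not part of the product. It only assembles the
# statement text; it does not talk to the database.
# Valid MaxExp ranges as stated in the text: Linear allows only 1,
# Polynomial allows 1 to 7.
MAX_EXP_RANGE = {"Linear": (1, 1), "Polynomial": (1, 7)}


def max_exp_statement(algorithm: str, degree: int) -> str:
    """Return the DM_setAlgorithm statement text for a MaxExp setting."""
    low, high = MAX_EXP_RANGE[algorithm]
    if not low <= degree <= high:
        raise ValueError(
            f"MaxExp for {algorithm} must be between {low} and {high}, got {degree}"
        )
    return f"DM_setAlgorithm('{algorithm}','<MaxExp>{degree}</MaxExp>')"


print(max_exp_statement("Polynomial", 2))
```

A call such as max_exp_statement("Linear", 2) raises a ValueError, because 1 is the only valid degree for the Linear Regression algorithm.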
Minimum significance level
If you specify a minimum significance level, Stepwise Regression is used to determine a subset of possible predictors. Only independent fields whose significance is above the specified value contribute to the calculation of the regression model.
To set the algorithm setting MinSigLevel to a value between 0 and 1, you can use the method DM_setAlgorithm. For example, if you want to set the minimum significance level to 0.5, use the following statement:
DM_setAlgorithm('Polynomial','<MinSigLevel>0.5</MinSigLevel>')

If the minimum significance level is set to 0, all fields that are provided in the logical data specification that are not marked as Supplementary are used to calculate the model.

To calculate a typical Linear Regression model, set the algorithm name to Linear and the minimum significance level to 0 by using the following statement:
DM_setAlgorithm('Linear','<MinSigLevel>0</MinSigLevel>')
If you do not specify a minimum significance level, the algorithm automatically tries to select an optimal subset of possible predictors.
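The three cases described for the minimum significance level can be summarized in a small decision function. This Python sketch is illustrative only: the function predictor_selection is a hypothetical name, the returned strings are descriptive labels rather than product values, and treating the range 0 to 1 as inclusive at the upper end is an assumption.

```python
from typing import Optional


def predictor_selection(min_sig_level: Optional[float]) -> str:
    """Describe how predictors are chosen for a given MinSigLevel.

    Hypothetical helper that mirrors the rules in the text; None means
    that no minimum significance level is specified.
    """
    if min_sig_level is None:
        # No value specified: the algorithm selects a subset automatically.
        return "automatic feature selection"
    if not 0 <= min_sig_level <= 1:
        # Assumption: both ends of the range 0 to 1 are valid.
        raise ValueError("MinSigLevel must be between 0 and 1")
    if min_sig_level == 0:
        # All fields that are not marked as Supplementary are used.
        return "all non-supplementary fields"
    # Any other value: Stepwise Regression determines the subset.
    return "stepwise regression"


print(predictor_selection(None))
print(predictor_selection(0))
print(predictor_selection(0.5))
```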
Intercept
You can force the regression curve to go through the origin. This means that the model will not contain a constant term.
To force the regression curve to go through the origin, you can use the algorithm setting Intercept in the method DM_setAlgorithm by using the following statement:
DM_setAlgorithm('Linear','<Intercept>True</Intercept>')

The default setting is false.

Automatic feature selection
The algorithm tries to determine an optimal subset of possible predictors if you do not specify a minimum significance level.
The algorithms support the following measures:
  • Normal squared Pearson correlation coefficient
  • Adjusted squared Pearson correlation coefficient
To select the adjusted squared Pearson correlation coefficient, set the algorithm setting RSquaredMode by using the method DM_setAlgorithm, as in the following statement:
DM_setAlgorithm('Polynomial','<RSquaredMode>adjusted</RSquaredMode>')

The default value is set to normal.

RBF Regression-specific settings

You can specify the following parameters that are specific to the RBF Regression algorithm:
OutSampleSize
This value represents the number of consecutive records that are selected from the input data for the verification phase. In the verification phase, the algorithm determines whether the desired accuracy and error limit objectives are met. The verification phase is also used to detect overtraining of the model early enough. When the objectives are met, training ends. During training, the specified number of records is skipped.

The default value is 1.

To specify a value of 2, use the following statement:
DM_setAlgorithm('RBF','<OutSampleSize>2</OutSampleSize>')
InSampleSize
This value represents the number of consecutive records to select from the input data to be used for training. During the verification phase, the specified number of records is skipped.

The default value is 2.

To specify a value of 1, use the following statement:
DM_setAlgorithm('RBF','<InSampleSize>1</InSampleSize>')
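The interplay of InSampleSize and OutSampleSize can be pictured as a repeating pattern over the input records. The following Python sketch is an interpretation, not the product's implementation: the function split_records is a hypothetical name, and the sketch assumes that each cycle starts with the training records and is followed by the verification records.

```python
from typing import List, Tuple


def split_records(
    n_records: int, in_sample_size: int = 2, out_sample_size: int = 1
) -> Tuple[List[int], List[int]]:
    """Assign record positions to training and verification.

    Hypothetical illustration. Assumption: the input is processed in
    cycles of in_sample_size training records followed by
    out_sample_size verification records.
    """
    training: List[int] = []
    verification: List[int] = []
    cycle = in_sample_size + out_sample_size
    for i in range(n_records):
        if i % cycle < in_sample_size:
            training.append(i)      # skipped during verification
        else:
            verification.append(i)  # skipped during training
    return training, verification


train, verify = split_records(7)  # defaults: InSampleSize 2, OutSampleSize 1
print(train)   # [0, 1, 3, 4, 6]
print(verify)  # [2, 5]
```

With the default values, two of every three consecutive records are used for training and the third is held back for verification.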
MaxNumberCenters
This value limits the number of centers that are built by the mining function at each pass through the input data. Because the number of centers can increase up to twice the initial number during a pass, the actual number of centers might be higher than the number you specified.

If no value is specified, an appropriate value is calculated by the system.

To specify a value of 20, use the following statement:
DM_setAlgorithm('RBF','<MaxNumberCenters>20</MaxNumberCenters>')
MinRegionSize
This value determines the minimum number of records that are assigned to a region. If a region contains fewer records than the specified value at the end of a pass, the region is deleted.

The default value is 15.

To specify a value of 30, use the following statement:
DM_setAlgorithm('RBF','<MinRegionSize>30</MinRegionSize>')
MinNumberPasses
This value determines the minimum number of passes through the input data. During these passes, overtraining or accuracy is not checked, which saves time. However, specify a large value for minimum passes only if you have enough training data and if you are sure that a good model exists.

The smallest value that you can specify is 2.

If no value is specified, an appropriate value is calculated by the system.

To specify a value of 5, use the following statement:
DM_setAlgorithm('RBF','<MinNumberPasses>5</MinNumberPasses>')
MaxNumberPasses
This value limits the number of times the mining function goes through the data. If the desired prediction accuracy is reached before this limit, the training stops.

If specified, the maximum number of passes must be greater than or equal to the minimum number of passes.

If no value is specified, an appropriate value is calculated by the system.

To specify a value of 5, use the following statement:
DM_setAlgorithm('RBF','<MaxNumberPasses>5</MaxNumberPasses>')
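The constraints on MinNumberPasses and MaxNumberPasses can be checked together before the settings are applied. This Python sketch is illustrative only: the function pass_fragments is a hypothetical name that is not part of the product, and it returns only the XML fragments. How several settings are combined in one settings value is not covered here.

```python
from typing import List, Optional


def pass_fragments(
    min_passes: Optional[int] = None, max_passes: Optional[int] = None
) -> List[str]:
    """Validate the pass limits and return the XML setting fragments.

    Hypothetical helper. None means that the value is not specified,
    so the system calculates an appropriate value.
    """
    if min_passes is not None and min_passes < 2:
        raise ValueError("MinNumberPasses must be at least 2")
    if (
        min_passes is not None
        and max_passes is not None
        and max_passes < min_passes
    ):
        raise ValueError("MaxNumberPasses must be >= MinNumberPasses")
    fragments: List[str] = []
    if min_passes is not None:
        fragments.append(f"<MinNumberPasses>{min_passes}</MinNumberPasses>")
    if max_passes is not None:
        fragments.append(f"<MaxNumberPasses>{max_passes}</MaxNumberPasses>")
    return fragments


for fragment in pass_fragments(min_passes=5, max_passes=5):
    print(f"DM_setAlgorithm('RBF','{fragment}')")
```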