Specific settings for the Regression mining function
This section describes the settings that are specific to Linear, Polynomial, and RBF Regression.
- Linear Regression
- Polynomial Regression
- RBF Regression
The parameters of the Transform Regression algorithm cannot be changed.
Linear and Polynomial Regression-specific settings
- Outlier treatment
- To specify how the Regression algorithms treat values that lie outside the value range of a field, use the method DM_setFldOutlTreat on a DM_RegSettings value. For example, you might want to use the following specification:
DM_setFldOutlTreat(fieldName, outlierTreatment)
The default value is 0. This means that outliers are used as they are.
It is recommended that you define an outlier treatment only if you also define lower and upper bounds for the numerical fields in the logical data specification by using the method DM_setFldOutlLim. If you specify an outlier treatment without specifying lower and upper bounds for numerical fields, the Regression algorithms calculate the limits automatically.
- Maximum degree of Polynomial Regression
- To specify a maximum degree of the polynomial for the Polynomial Regression algorithm, set the algorithm setting MaxExp by using the method DM_setAlgorithm. The resulting model has a polynomial of a degree less than or equal to the specified value.
- For the Linear Regression algorithm, the only valid value is the default value of 1.
- For the Polynomial Regression algorithm, you can use values from 1 to 7. The default value is 3.
For example, if you want to specify degree 2 for the polynomial, use the following statement:
DM_setAlgorithm('Polynomial','<MaxExp>2</MaxExp>')
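The MaxExp ranges described above can be checked before a statement is issued. The following Python sketch is only an illustration: max_exp_statement is a hypothetical helper that is not part of the product. It builds the statement text and rejects degrees outside the documented ranges.

```python
def max_exp_statement(algorithm, degree):
    """Build the DM_setAlgorithm statement text for MaxExp (hypothetical helper)."""
    # Valid degrees as documented: only 1 for Linear, 1 to 7 for Polynomial.
    allowed = {"Linear": range(1, 2), "Polynomial": range(1, 8)}
    if algorithm not in allowed:
        raise ValueError(f"MaxExp is not described for algorithm {algorithm!r}")
    if degree not in allowed[algorithm]:
        raise ValueError(f"MaxExp={degree} is out of range for {algorithm}")
    return f"DM_setAlgorithm('{algorithm}','<MaxExp>{degree}</MaxExp>')"


print(max_exp_statement("Polynomial", 2))
```

A call with ('Linear', 2) or ('Polynomial', 8) raises ValueError instead of producing a statement that the algorithm would not accept.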
- Minimum significance level
- If you specify a minimum significance level, Stepwise Regression is used to determine a subset of possible predictors. Only independent fields whose significance is above the specified value contribute to the calculation of the regression model.
To set the algorithm setting MinSigLevel to a value between 0 and 1, you can use the method DM_setAlgorithm. For example, if you want to set the minimum significance level to 0.5, use the following statement:
DM_setAlgorithm('Polynomial','<MinSigLevel>0.5</MinSigLevel>')
If the minimum significance level is set to 0, all fields that are provided in the logical data specification and that are not marked as Supplementary are used to calculate the model.
To calculate a typical Linear Regression model, set the algorithm name to Linear and the minimum significance level to 0 by using the following statement:
DM_setAlgorithm('Linear','<MinSigLevel>0</MinSigLevel>')
If you do not specify a minimum significance level, the algorithm automatically tries to select an optimal subset of possible predictors.
- Intercept
- You can force the regression curve to go through the origin. This means that the model does not contain a constant term. To force the regression curve to go through the origin, use the algorithm setting Intercept in the method DM_setAlgorithm by using the following statement:
DM_setAlgorithm('Linear','<Intercept>True</Intercept>')
The default setting is false.
- Automatic feature selection
- The algorithm tries to determine an optimal subset of possible
predictors if you do not specify a minimum significance level. The algorithms support the following measures:
- Normal squared Pearson correlation coefficient
- Adjusted squared Pearson correlation coefficient
To use the adjusted squared Pearson correlation coefficient, set the algorithm setting RSquaredMode in the method DM_setAlgorithm by using the following statement:
DM_setAlgorithm('Polynomial','<RSquaredMode>adjusted</RSquaredMode>')
The default value is normal.
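To illustrate the difference between the two measures, the following Python sketch computes the textbook adjusted R-squared from a normal R-squared value. It is an assumption that the adjusted mode corresponds to this standard formula; the source does not state the exact computation, and adjusted_r_squared is a hypothetical name.

```python
def adjusted_r_squared(r_squared, n_records, n_predictors):
    """Textbook adjusted R^2 (assumed, not confirmed, to match the 'adjusted' mode)."""
    degrees_of_freedom = n_records - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more records than predictors plus one")
    # Penalize the normal R^2 for every predictor that is added to the model.
    return 1.0 - (1.0 - r_squared) * (n_records - 1) / degrees_of_freedom


r2 = 0.90
# Same normal R^2, different numbers of predictors.
print(adjusted_r_squared(r2, n_records=50, n_predictors=2))
print(adjusted_r_squared(r2, n_records=50, n_predictors=10))
```

For the same normal value, the adjusted value drops as predictors are added. A feature selection that uses the adjusted measure therefore tends to favor smaller predictor subsets.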
RBF Regression-specific settings
- OutSampleSize
- This value represents the number of consecutive records that are selected from the input data and used when training shifts to the verification phase. The verification phase determines whether the desired accuracy and error limit objectives are met. It is also used to detect overtraining of the model in time. When these goals are met, the training mode ends. During training, the specified number of records is skipped.
The default value is 1.
To specify a value of 2, use the following statement:
DM_setAlgorithm('RBF','<OutSampleSize>2</OutSampleSize>')
- InSampleSize
- This value represents the number of consecutive records to select
from the input data to be used for training. During the verification
phase, the specified number of records is skipped.
The default value is 2.
To specify a value of 1, use the following statement:
DM_setAlgorithm('RBF','<InSampleSize>1</InSampleSize>')
- MaxNumberCenters
- This value limits the number of centers that are built by the
mining function at each pass through the input data. Because the number
of centers can increase up to twice the initial number during a pass,
the actual number of centers might be higher than the number you specified.
If no value is specified, an appropriate value is calculated by the system.
To specify a value of 20, use the following statement:
DM_setAlgorithm('RBF','<MaxNumberCenters>20</MaxNumberCenters>')
- MinRegionSize
- This value determines the minimum number of records that are assigned to a region. If a region contains fewer records than the specified value at the end of a pass, the region is deleted.
The default value is 15.
To specify a value of 30, use the following statement:
DM_setAlgorithm('RBF','<MinRegionSize>30</MinRegionSize>')
- MinNumberPasses
- This value determines the minimum number of passes through the input data. During these passes, overtraining or accuracy is not checked. This saves time. However, specify a large value for minimum passes only if you have enough training data and if you are sure that a good model exists.
The minimum value that you can specify is 2.
If no value is specified, an appropriate value is calculated by the system.
To specify a value of 5, use the following statement:
DM_setAlgorithm('RBF','<MinNumberPasses>5</MinNumberPasses>')
- MaxNumberPasses
- This value limits the number of times the mining function goes
through the data. If the desired prediction accuracy is reached before
this limit, the training stops.
If specified, the maximum number of passes must be greater than or equal to the minimum number of passes.
If no value is specified, an appropriate value is calculated by the system.
To specify a value of 5, use the following statement:
DM_setAlgorithm('RBF','<MaxNumberPasses>5</MaxNumberPasses>')
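The RBF settings above interact in two ways that a short sketch can make concrete. The Python code below is an illustration under stated assumptions and does not reproduce the product's implementation. split_records shows one reading of InSampleSize and OutSampleSize, in which runs of consecutive records alternate between training and verification. pass_statements checks the documented constraints on the pass settings before it builds the statement text. Both function names are hypothetical.

```python
def split_records(records, in_sample_size=2, out_sample_size=1):
    """Alternate runs of consecutive records between training and verification.

    Assumed interpretation: InSampleSize records go to training, the next
    OutSampleSize records go to verification, and the pattern repeats.
    """
    training, verification = [], []
    position = 0
    while position < len(records):
        training.extend(records[position:position + in_sample_size])
        position += in_sample_size
        verification.extend(records[position:position + out_sample_size])
        position += out_sample_size
    return training, verification


def pass_statements(min_passes, max_passes):
    """Build both pass statements after checking the documented constraints."""
    if min_passes < 2:
        raise ValueError("MinNumberPasses must be at least 2")
    if max_passes < min_passes:
        raise ValueError("MaxNumberPasses must be >= MinNumberPasses")
    return [
        f"DM_setAlgorithm('RBF','<MinNumberPasses>{min_passes}</MinNumberPasses>')",
        f"DM_setAlgorithm('RBF','<MaxNumberPasses>{max_passes}</MaxNumberPasses>')",
    ]


# Default sizes (InSampleSize 2, OutSampleSize 1) applied to nine records.
training, verification = split_records(list(range(1, 10)))
print(training)      # [1, 2, 4, 5, 7, 8]
print(verification)  # [3, 6, 9]
print(pass_statements(2, 5))
```

With the default sizes, two of every three records are used for training and the third is held out for verification. Increasing OutSampleSize shifts more records to the verification phase.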