Usage of linear regression

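The linear regression algorithm solves the least squares problem X * B = Y, that is, it finds the coefficient vector B that minimizes the sum of squared differences between X * B and Y. Here, X is the n * p input matrix, B is a p * 1 vector of coefficients, and Y is the n * 1 vector of target values.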
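As an illustration of the least squares problem only, and not of how the stored procedure is implemented, the following minimal Python sketch solves a small instance through the normal equations (X^T X) B = X^T Y. The function name solve_least_squares, the sample data, and the leading column of ones used to model an intercept are assumptions made for this example.

```python
def solve_least_squares(X, y):
    """Return B minimizing ||X*B - y||^2 via the normal equations (X^T X) B = X^T y."""
    n, p = len(X), len(X[0])
    # Build the augmented p x (p+1) system [X^T X | X^T y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; columns of X are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # After elimination the system is diagonal, so each coefficient is a simple ratio.
    return [A[i][p] / A[i][i] for i in range(p)]


# Four observations of y = 1 + 2*x; the first column of ones models the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
B = solve_least_squares(X, y)
```

For this data, B is approximately [1.0, 2.0], up to floating-point rounding. The normal equations are used here for brevity; they can be numerically less stable than a QR-based solver when the columns of X are nearly dependent.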

In this implementation, matrix X is represented by the specified input columns of an input table, and Y is represented by a target column in the same table. Both numerical and nominal input columns are handled.

Missing values, that is, NULL values, in the input table are handled as described in the following list. The handling depends on which column contains the NULL values.

  • NULL values are contained in the id column.
    • The id column of the input table is scanned for NULL values and duplicates. If NULL values or duplicates are found, the algorithm stops and an error message is shown.
    • This handling applies to the IDAX.LINEAR_REGRESSION stored procedure and the IDAX.PREDICT_LINEAR_REGRESSION stored procedure.
  • NULL values are contained in any of the input columns.
    • NULL values in input columns are not valid. Therefore, any row of the input table in which one or more input columns contain a NULL value is ignored. No error message is shown.
    • For the IDAX.LINEAR_REGRESSION stored procedure, the input table is considered empty if every row of the input data set contains a NULL value. In this case, an error message is shown.
    • For the IDAX.PREDICT_LINEAR_REGRESSION stored procedure, rows that contain NULL values in input columns are ignored, but the stored procedure still completes and the output table is created.
  • NULL values are contained in the target column.
    • For the IDAX.LINEAR_REGRESSION stored procedure, NULL values in the target column are handled in the same way as NULL values in the input columns, that is, the affected rows are ignored. If the target column contains only NULL values, the input table is considered empty.
    • For the IDAX.PREDICT_LINEAR_REGRESSION stored procedure, NULL values in the target column have no effect because the target is the value that is being predicted.
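
The rules in the preceding list can be summarized in a short Python sketch. It only mimics the described row filtering on an in-memory list of dictionaries, where None stands for NULL. It is not the code of the stored procedures, and the function names, column names, and error texts are hypothetical.

```python
def check_id_column(rows, id_col):
    """Raise an error if the id column contains NULL (None) values or duplicates."""
    seen = set()
    for row in rows:
        value = row[id_col]
        if value is None:
            raise ValueError("id column contains NULL values")
        if value in seen:
            raise ValueError(f"id column contains duplicate value {value!r}")
        seen.add(value)


def rows_for_training(rows, id_col, input_cols, target_col):
    """Keep rows with no NULL in any input column or in the target column."""
    check_id_column(rows, id_col)
    kept = [r for r in rows
            if all(r[c] is not None for c in input_cols + [target_col])]
    if not kept:
        # Mirrors the "input table is considered empty" case for training.
        raise ValueError("input table is considered empty")
    return kept


def rows_for_prediction(rows, id_col, input_cols):
    """Keep rows with no NULL in any input column; the target column is not inspected."""
    check_id_column(rows, id_col)
    # An empty result is allowed here: prediction still completes.
    return [r for r in rows if all(r[c] is not None for c in input_cols)]


table = [
    {"id": 1, "x1": 2.0, "x2": "a", "y": 5.0},
    {"id": 2, "x1": None, "x2": "b", "y": 6.0},   # NULL input: skipped in both cases
    {"id": 3, "x1": 4.0, "x2": "a", "y": None},   # NULL target: skipped for training only
]
train = rows_for_training(table, "id", ["x1", "x2"], "y")
score = rows_for_prediction(table, "id", ["x1", "x2"])
print([r["id"] for r in train])   # [1]
print([r["id"] for r in score])   # [1, 3]
```

In this sketch, a NULL or duplicate id stops processing with an error in both cases, whereas rows with NULL inputs are dropped silently.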