Understanding this formula’s operations requires familiarity with matrix notation. For now, all we need to understand is that the size and contents of the X matrix are determined by the independent variables chosen as the model’s predictors. Moreover, the degree of correlation between predictor variables, measured by correlation coefficients, is used in calculating the regression coefficients between X and Y.3
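To make the role of the X matrix concrete, here is a minimal sketch in Python with NumPy. It assumes the formula in question is the ordinary least squares estimator, (XᵀX)⁻¹XᵀY, which is an assumption on our part. The data, the variable names (x1, x2, beta_hat) and the true coefficients are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two hypothetical predictors; x2 is nearly a rescaled copy of x1.
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# The X matrix has one column per chosen predictor, plus an intercept column.
X = np.column_stack([np.ones(n), x1, x2])

# Correlation coefficient between the two predictors.
r = np.corrcoef(x1, x2)[0, 1]

# Least squares estimate of the regression coefficients.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# A large condition number of X^T X signals that inverting it is numerically
# delicate, which is what strongly correlated columns tend to cause.
cond = np.linalg.cond(X.T @ X)

print("X shape:", X.shape)
print("correlation between x1 and x2:", r)
print("estimated coefficients:", beta_hat)
print("condition number of X^T X:", cond)
```

Adding or dropping a predictor changes the number of columns in X, and the closer two columns are to being copies of each other, the larger the condition number tends to be.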

As independent variables are included in or excluded from the model, the estimated coefficient for any one predictor can change drastically, making coefficient estimates unreliable and imprecise. Correlation between two or more predictors makes it difficult to determine any one variable’s individual impact on the model output. Remember that a regression coefficient measures a given predictor variable’s effect on the output assuming the other predictors remain constant. But if predictors are correlated, one predictor cannot vary while the others stay fixed, so it may not be possible to isolate its effect. Thus, estimated regression coefficients for multicollinear variables do not reflect any one predictor’s effect on the output but rather the predictor’s partial effect, which depends on which covariates are in the model.4
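The following sketch illustrates how a coefficient can shift when a correlated covariate enters the model. It uses synthetic data and a hypothetical helper, ols, written only for this example; the true coefficients (1 for x1, 3 for x2) are assumptions chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)            # strongly correlated with x1
y = 1.0 * x1 + 3.0 * x2 + 0.5 * rng.normal(size=n)

def ols(columns, y):
    """Least squares fit with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

# Coefficient on x1 when x2 is left out of the model.
b_alone = ols([x1], y)[1]

# Coefficient on x1 when x2 is included.
b_joint = ols([x1, x2], y)[1]

print("x1 coefficient, x2 excluded:", b_alone)
print("x1 coefficient, x2 included:", b_joint)
```

With x2 excluded, the x1 coefficient absorbs most of the effect of x2 and should land near 4. With x2 included, it should fall back toward 1. Neither number is "the" effect of x1; each is a partial effect given the covariates in that particular model.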

Additionally, different data samples, or even small changes in the data, with the same multicollinear variables can produce widely different regression coefficients. This is perhaps the most widely known problem of multicollinearity: overfitting. Overfitting denotes models with low training error and high generalization error. As mentioned, the statistical significance of any one multicollinear variable remains unclear amid its relational noise with the others. This prevents precise calculation of any one variable’s contribution to the model’s output, which is what the coefficient estimate largely indicates. Because multicollinearity prevents calculating precise coefficient estimates, multicollinear models fail to generalize to unseen data. In this way, estimated coefficients for multicollinear variables possess a large variability, also known as a large standard error.5
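The sample-to-sample variability described above can be checked with a small simulation. This is a sketch under stated assumptions: the function coef_spread, the sample sizes and the true coefficients are hypothetical, and the spread across repeated samples stands in for the standard error of the coefficient.

```python
import numpy as np

def coef_spread(rho, n_reps=500, n=100, seed=0):
    """Std. dev. of the fitted x1 coefficient across repeated samples.

    rho is the population correlation between the two predictors.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_reps):
        x1 = rng.normal(size=n)
        # Construct x2 so that its correlation with x1 is rho.
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta_hat[1])
    return float(np.std(estimates))

spread_uncorrelated = coef_spread(rho=0.0)
spread_collinear = coef_spread(rho=0.99)

print("spread of x1 coefficient, rho = 0.00:", spread_uncorrelated)
print("spread of x1 coefficient, rho = 0.99:", spread_collinear)
```

The data-generating process is identical in both runs except for the correlation between predictors, so any difference in spread is attributable to collinearity. The collinear case should show a spread several times larger.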