Lasso regression, also known as L1 regularization, is a form of regularization for linear regression models. Regularization is a statistical method used to reduce errors caused by overfitting on training data. The lasso approach can be expressed with the following formula:
$$\hat{w} = \underset{w}{\operatorname{argmin}} \; \operatorname{MSE}(w) + \lambda \lVert w \rVert_1$$
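Here, MSE(w) is the mean squared error of the fit and λ scales the L1 penalty, as discussed below. The objective is simple to write down in code. The following is a minimal NumPy sketch; the function name lasso_objective and its arguments are illustrative, not part of any particular library:

```python
import numpy as np

def lasso_objective(w, X, y, lam):
    """Lasso objective: mean squared error plus lam (λ) times the L1 norm of w."""
    mse = np.mean((y - X @ w) ** 2)       # MSE(w)
    l1_penalty = lam * np.sum(np.abs(w))  # λ‖w‖₁
    return mse + l1_penalty
```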
The concepts behind the lasso technique can be traced to a 1986 geophysics research paper by Santosa and Symes1, which used the L1 penalty for coefficients. However, in 1996, the statistician Robert Tibshirani independently developed and popularized the term2 “lasso”, building on Breiman’s nonnegative garrote work3.
Lasso stands for Least Absolute Shrinkage and Selection Operator. It is frequently used in machine learning to handle high-dimensional data because it facilitates automatic feature selection. It does this by adding a penalty term to the residual sum of squares (RSS): the L1 norm of the coefficients multiplied by the regularization parameter (lambda or λ). This regularization parameter controls the amount of regularization applied. Larger values of lambda increase the penalty, shrinking more of the coefficients toward zero; this subsequently reduces the importance of (or altogether eliminates) some of the features in the model, resulting in automatic feature selection. Conversely, smaller values of lambda reduce the effect of the penalty, retaining more features within the model.
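To make this concrete, here is a minimal sketch assuming scikit-learn’s Lasso estimator, whose alpha parameter plays the role of λ (scikit-learn also halves the MSE term, which does not change the qualitative behavior); the synthetic data and alpha values are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic high-dimensional data: only the first 3 of 20 features are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_coef = np.zeros(20)
true_coef[:3] = [4.0, -2.0, 3.0]
y = X @ true_coef + rng.normal(scale=0.5, size=100)

# Larger alpha (scikit-learn's name for lambda) shrinks more coefficients
# exactly to zero, eliminating those features from the model.
for alpha in [0.01, 0.1, 1.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    n_selected = int(np.sum(model.coef_ != 0.0))
    print(f"alpha={alpha}: {n_selected} nonzero coefficients out of {X.shape[1]}")
```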
This penalty promotes sparsity within the model, which can help avoid issues of multicollinearity and overfitting within datasets. Multicollinearity occurs when two or more independent variables are highly correlated with one another, which can be problematic for causal modeling. Overfit models generalize poorly to new data, diminishing their value altogether. By reducing regression coefficients to zero, lasso regression can effectively eliminate independent variables from the model, sidestepping these potential issues within the modeling process. Model sparsity can also improve the interpretability of the model compared with other regularization techniques such as ridge regression (also known as L2 regularization).
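The difference in sparsity is easy to observe empirically. The following sketch, again assuming scikit-learn’s Lasso and Ridge estimators with an illustrative alpha value and synthetic data, fits both to the same inputs and counts exact-zero coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Ridge (L2) shrinks coefficients toward zero but rarely reaches exactly zero;
# lasso (L1) zeroes many coefficients outright, producing a sparser model.
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print("lasso exact-zero coefficients:", int(np.sum(lasso.coef_ == 0.0)))
print("ridge exact-zero coefficients:", int(np.sum(ridge.coef_ == 0.0)))
```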
As a note, this article focuses on regularization of linear regression models, but lasso regression may also be applied in logistic regression.
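As a brief illustration of that point, here is a sketch assuming scikit-learn’s LogisticRegression with penalty="l1" (which requires a compatible solver such as "liblinear"); the data and the C value are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# penalty="l1" applies the lasso-style penalty; C is the inverse of the
# regularization strength, so smaller C means more coefficients driven to zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("nonzero coefficients:", int(np.sum(clf.coef_ != 0.0)))
```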