Linear Lasso Regression

Linear Lasso uses the Python sklearn.linear_model.Lasso class to estimate L1-regularized linear regression models for a dependent variable on one or more independent variables. It includes optional modes to display trace plots and to select the alpha hyperparameter value by crossvalidation. When a single model is fitted or crossvalidation is used to select alpha, a partition of holdout data can be used to estimate out-of-sample performance.

In addition to fitting a model with a specified value of the alpha regularization parameter, Linear Lasso can display a trace plot of coefficient values for a range of alpha values, or help you choose the hyperparameter value via k-fold crossvalidation on specified grids of values. If a single model is fitted or alpha is selected by crossvalidation, the final model can be applied to held-out data, created by partitioning the input data, to obtain a valid estimate of the model's out-of-sample performance.
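The three modes described above can be sketched directly with scikit-learn. This is a minimal illustration on simulated data, not the procedure's own implementation: the data, the alpha grid, the number of folds, and the scoring choice are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Simulated data (assumption): 5 predictors, only two with nonzero effects.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=200)

# Mode 1: fit a single model at a specified alpha.
model = Lasso(alpha=0.1).fit(X, y)

# Mode 2: coefficient trace. Refit over a grid of alpha values and keep
# the coefficients; each row of `coefs` is one alpha, each column one predictor.
alphas = np.logspace(-3, 1, 20)
coefs = np.array([Lasso(alpha=a, max_iter=10_000).fit(X, y).coef_ for a in alphas])

# Mode 3: choose alpha by k-fold crossvalidation over the same grid.
search = GridSearchCV(
    Lasso(max_iter=10_000),
    param_grid={"alpha": alphas},
    cv=5,                               # number of folds (assumed)
    scoring="neg_mean_squared_error",   # selection criterion (assumed)
)
search.fit(X, y)
best_alpha = search.best_params_["alpha"]
```

Plotting each column of `coefs` against `alphas` on a log scale gives a trace plot: coefficients shrink toward zero as alpha grows, and at a large enough alpha all of them are exactly zero.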

Obtaining a Linear Lasso Regression analysis

  1. From the menus choose:

    Analyze > Regression > Linear OLS Alternatives > Lasso

    The dialog allows you to specify a variable that assigns each case in the active dataset to the training or holdout sample.

  2. Select a numeric dependent (target) variable. Exactly one dependent variable is required to run an analysis.
  3. Specify at least one categorical factor variable or numeric covariate variable.

Optionally, Partition provides a way to create a holdout or test subset of the input data for estimation of out-of-sample performance of the specified or chosen model. All partitioning is performed after listwise deletion of any cases with invalid data for any variable used by the procedure. Note that for crossvalidation, folds or partitions of the training data are created in Python. The holdout data created by the partition is not used in estimation, regardless of the mode in effect.
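The order of operations described above (listwise deletion, then partitioning, then crossvalidation on the training data only) can be sketched as follows. This is an illustration under stated assumptions, not the procedure's internal code: the simulated data, the injected missing values, the use of train_test_split, and the alpha grid are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

# Simulated data (assumption) with missing values injected in one predictor.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.5, size=300)
X[rng.choice(300, size=15, replace=False), 0] = np.nan

# Step 1: listwise deletion. Drop any case with invalid data in any variable.
complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
X_c, y_c = X[complete], y[complete]

# Step 2: partition the remaining cases into training and holdout samples.
X_train, X_hold, y_train, y_hold = train_test_split(
    X_c, y_c, train_size=0.7, random_state=0
)

# Step 3: crossvalidation folds are created from the training data only.
search = GridSearchCV(
    Lasso(max_iter=10_000),
    param_grid={"alpha": np.logspace(-3, 0, 10)},
    cv=5,
)
search.fit(X_train, y_train)

# Step 4: the holdout sample is used once, to score the chosen model.
holdout_r2 = search.score(X_hold, y_hold)
```

Because the holdout cases never enter estimation or alpha selection, `holdout_r2` is not biased by the search over alpha values.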

The partition can be defined either by specifying the ratio of cases randomly assigned to each sample (under Training and Holdout partitions), or by a variable that assigns each case to the training or holdout sample. You cannot specify both a training percentage and a partition variable. If the partition is not specified, a holdout sample of approximately 30% of the input data is created.

The Training % specifies the relative number of cases in the active dataset to randomly assign to the training sample. The default is 70%.
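The two ways of defining the partition can be illustrated with a small helper. The function name `partition_mask`, its parameters, the per-case random assignment, and the coding of the partition variable (1 = training, anything else = holdout) are all hypothetical choices for this sketch; the procedure's actual assignment rule and variable coding may differ.

```python
import numpy as np

def partition_mask(n_cases, training_pct=None, partition_var=None, seed=0):
    """Return a boolean array that is True for training cases (hypothetical helper)."""
    if training_pct is not None and partition_var is not None:
        # The two ways of defining the partition are mutually exclusive.
        raise ValueError("Specify a training percentage or a partition variable, not both.")
    if partition_var is not None:
        # Assumed coding: 1 marks a training case, any other value a holdout case.
        return np.asarray(partition_var) == 1
    if training_pct is None:
        training_pct = 70.0  # default training percentage
    # Assign each case independently, so the split is only approximately
    # training_pct / (100 - training_pct).
    rng = np.random.default_rng(seed)
    return rng.random(n_cases) < training_pct / 100.0

# Default random partition: about 70% training, about 30% holdout.
random_mask = partition_mask(1000)

# Partition defined by a variable instead.
variable_mask = partition_mask(5, partition_var=[1, 0, 1, 1, 0])
```

With independent per-case assignment the realized proportions vary from run to run, which is consistent with the holdout sample being described as approximately 30% of the data.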

This procedure pastes LINEAR LASSO REGRESSION command syntax.