Partial Least Squares Regression

The Partial Least Squares Regression procedure estimates partial least squares (PLS, also known as "projection to latent structure") regression models. PLS is a predictive technique that is an alternative to ordinary least squares (OLS) regression, canonical correlation, or structural equation modeling, and it is particularly useful when predictor variables are highly correlated or when the number of predictors exceeds the number of cases.

PLS combines features of principal components analysis and multiple regression. It first extracts a set of latent factors that explain as much of the covariance as possible between the independent and dependent variables. Then a regression step predicts values of the dependent variables using the decomposition of the independent variables.
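The two-stage idea described above (extract latent factors, then regress on them) can be sketched with NumPy. This is a minimal illustration for a single scale dependent variable using a NIPALS-style loop. It is not the code used by the procedure: the function name pls1_fit and its arguments are hypothetical, and the sketch only centers the variables, without the standardization described under Data considerations.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Illustrative single-response PLS fit (NIPALS-style); not the SPSS implementation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centered working copies
    n_predictors = X.shape[1]
    W = np.zeros((n_predictors, n_factors))    # latent factor weights
    P = np.zeros((n_predictors, n_factors))    # latent factor loadings
    q = np.zeros(n_factors)                    # loadings of y on each factor
    for a in range(n_factors):
        w = Xk.T @ yk                          # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                             # factor scores for this latent factor
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)               # remove the extracted factor from X
        yk = yk - q[a] * t                     # and from y
        W[:, a], P[:, a] = w, p
    # Regression step: express the factor model as coefficients on the predictors.
    B = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ B
    return B, intercept

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_demo = rng.normal(size=(25, 4))
    y_demo = X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.1, size=25)
    coefs, const = pls1_fit(X_demo, y_demo, n_factors=2)
    print("coefficients:", coefs, "intercept:", const)
```

When the number of latent factors equals the number of (linearly independent) predictors, this sketch reproduces the ordinary least squares coefficients. Using fewer factors is what makes the method usable with highly correlated predictors.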

Tables
Proportion of variance explained (by latent factor), latent factor weights, latent factor loadings, independent variable importance in projection (VIP), and regression parameter estimates (by dependent variable) are all produced by default.
Charts
Variable importance in projection (VIP), factor scores, factor weights for the first three latent factors, and distance to the model are all produced from the Options tab.

Data considerations

Measurement level
The dependent and independent (predictor) variables can be scale, nominal, or ordinal. The procedure assumes that the appropriate measurement level has been assigned to all variables, although you can temporarily change the measurement level for a variable by right-clicking the variable in the source variable list and selecting a measurement level from the pop-up menu. Categorical (nominal or ordinal) variables are treated equivalently by the procedure.
Categorical variable coding
The procedure temporarily recodes categorical dependent variables using one-of-c coding for the duration of the procedure. If there are c categories of a variable, then the variable is stored as c vectors, with the first category denoted (1,0,...,0), the next category (0,1,0,...,0), ..., and the final category (0,0,...,0,1). Categorical dependent variables are represented using dummy coding; that is, the indicator corresponding to the reference category is omitted.
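The two coding schemes named above can be illustrated with a short NumPy sketch. The helper names one_of_c and dummy_code are hypothetical and are not part of the procedure; they only show what the indicator vectors look like.

```python
import numpy as np

def one_of_c(values, categories):
    """Return an (n, c) indicator matrix with one column per category."""
    values = np.asarray(values)
    return np.column_stack([(values == cat).astype(float) for cat in categories])

def dummy_code(values, categories, reference):
    """One-of-c coding with the reference category's indicator column omitted."""
    kept = [cat for cat in categories if cat != reference]
    return one_of_c(values, kept)

if __name__ == "__main__":
    region = ["north", "south", "west", "north"]
    levels = ["north", "south", "west"]
    print(one_of_c(region, levels))                      # three indicator columns
    print(dummy_code(region, levels, reference="west"))  # two indicator columns
```

A case in the reference category is coded as all zeros under dummy coding, so the remaining indicators are interpreted relative to that category.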
Frequency weights
Weight values are rounded to the nearest whole number before use. Cases with missing weights or weights less than 0.5 are not used in the analyses.
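As a rough sketch of the weight handling described above, the following NumPy function rounds weights and assigns a weight of zero to cases that would be excluded. The function name effective_weights is hypothetical, and rounding halves upward is an assumption made here so that a weight of exactly 0.5 is kept, consistent with the 0.5 cutoff.

```python
import numpy as np

def effective_weights(weights):
    """Round frequency weights; missing weights and weights below 0.5 become 0."""
    w = np.asarray(weights, dtype=float)
    # Assumption: halves round upward. np.round would round 0.5 down to 0.
    rounded = np.floor(w + 0.5)
    usable = ~np.isnan(w) & (w >= 0.5)
    return np.where(usable, rounded, 0).astype(int)

if __name__ == "__main__":
    print(effective_weights([2.4, 2.5, 0.49, 0.5, float("nan"), -1]))
```

Cases with an effective weight of 0 would contribute nothing to the analysis.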
Missing values
User- and system-missing values are treated as invalid.
Rescaling
All model variables are centered and standardized, including indicator variables representing categorical variables.
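Centering and standardizing can be sketched as follows. The function name center_and_standardize is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption; the documentation above does not state which divisor the procedure uses.

```python
import numpy as np

def center_and_standardize(X, ddof=1):
    """Subtract each column's mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=ddof)   # assumption: sample standard deviation
    return (X - mean) / std, mean, std

if __name__ == "__main__":
    data = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]])
    Z, mean, std = center_and_standardize(data)
    print(Z)
```

The same transformation applies to 0/1 indicator columns, which therefore no longer take the values 0 and 1 after rescaling. A column with no variation has a standard deviation of zero and cannot be standardized this way.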

Obtaining Partial Least Squares Regression

From the menus choose:

Analyze > Regression > Partial Least Squares...

  1. Select at least one dependent variable.
  2. Select at least one independent variable.

Optionally, you can:

  • Specify a reference category for categorical (nominal or ordinal) dependent variables.
  • Specify a variable to be used as a unique identifier for casewise output and saved datasets.
  • Specify an upper limit on the number of latent factors to be extracted.

This procedure pastes PLS command syntax.

Prerequisites

The Partial Least Squares Regression procedure is a Python extension command and requires IBM® SPSS® Statistics - Essentials for Python, which is installed by default with your IBM SPSS Statistics product. It also requires the NumPy and SciPy Python libraries, which are freely available.
Note: For users working in distributed analysis mode (requires IBM SPSS Statistics Server), NumPy and SciPy must be installed on the server. Contact your system administrator for assistance.
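One way to confirm that a given Python installation can see the required libraries is to run a short script with that interpreter. The following is a sketch only; the function name check_libraries is hypothetical and the script is not part of IBM SPSS Statistics.

```python
import importlib
import sys

def check_libraries(names=("numpy", "scipy")):
    """Map each library name to its version string, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report

if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("Python version:", sys.version.split()[0])
    for name, version in check_libraries().items():
        print(name, "->", version if version is not None else "NOT INSTALLED")
```

The reported executable path shows which Python installation was checked, which is useful when more than one version of Python is present on the machine.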
Windows and Mac Users

For Windows and Mac, NumPy and SciPy must be installed to a version of Python 3.8 that is separate from the version installed with IBM SPSS Statistics. If you do not have a separate version of Python 3.8, you can download it from http://www.python.org. Then install NumPy and SciPy for Python 3.8. The installers are available from http://www.scipy.org/Download.

To enable use of NumPy and SciPy, you must set your Python location to the version of Python 3.8 where you installed NumPy and SciPy. The Python location is set from the File Locations tab in the Options dialog (Edit > Options).

Linux Users

We suggest that you download the source and build NumPy and SciPy yourself. The source is available from http://www.scipy.org/Download. You can install NumPy and SciPy to the version of Python 3.8 that is installed with IBM SPSS Statistics. It is in the Python directory under the location where IBM SPSS Statistics is installed.

If you choose to install NumPy and SciPy to a version of Python 3.8 other than the version that is installed with IBM SPSS Statistics, then you must set your Python location to point to that version. The Python location is set from the File Locations tab in the Options dialog (Edit > Options).

Windows and Unix Server

On the server, NumPy and SciPy must be installed to a version of Python 3.8 that is separate from the version installed with IBM SPSS Statistics. If the server does not have a separate version of Python 3.8, you can download it from http://www.python.org. NumPy and SciPy for Python 3.8 are available from http://www.scipy.org/Download. To enable use of NumPy and SciPy, you must set the Python location for the server to the version of Python 3.8 where NumPy and SciPy are installed. The Python location is set from the IBM SPSS Statistics Administration Console.