IBM Support

Spline Regression with Estimated Knots in SPSS



I want to fit a regression model where I expect the slope to shift up or down, or even change sign, at a particular value of the predictor. Also, I want that shift point to be estimated as a parameter in the model. Can this model be fitted in SPSS Statistics?

Resolving The Problem

Regression models in which the function changes at one or more points along the range of the predictor are called splines, or piecewise polynomials, and the location of these shifts are called knots. If the knots are fixed by the analyst, then splines can be fitted quite easily with the SPSS REGRESSION procedure. The properties of these splines are described in Technote 1476028, with a description of the process of fitting them with REGRESSION. If the knots are to be estimated from the data, i.e., if the knots are variable, then you can fit many such models with the SPSS NLR and CNLR procedures, as described below. These procedures perform Nonlinear Regression and Constrained Nonlinear Regression, respectively.

One property of NLR which allows us to fit variable-knot models is the facility to express multiplicative relationships between parameters. The MODEL PROGRAM commands from the first example, a 3-phase linear spline model, illustrates this point.

MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25.
COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa GE knot1)
+ bc1*(xa-knot2)*(xa GE knot2).
NLR y1 with xa / pred = predex1 /save pred resid (residex1).

The parameters ba0 and ba1 are the intercept and the linear coefficient, respectively, for the predictor, XA. The parameter bb1 is the second-phase adjustment to the slope. It is applied to (xa - knot1), where knot1 is the first knot. By multiplying this model term by the logical expression (xa GE knot1), we limit the influence of bb1 to values of XA at or above the first knot).

The parameter bc1 is a further adjustment to the slope in the third phase. Its relationship to knot2 is similar to the bb1-knot1 relationship. By naming knot1 and knot2 in the MODEL PROGRAM statement, we declare them as parameters to be estimated. The numbers assigned to each of the parameters are starting values for the iterative maximum likelihood estimation process used by NLR and CNLR.

However, one constraint must be placed on variable-knot models which is not required in fixed-knot models. The model must be continuous at the knots. There cannot be a "jump", or step, in the model at the knot. This constraint simply requires the absence of a phase 2 intercept, or step parameter, to be enforced in either NLR or CNLR. Also, the examples below are restricted to models where the number of knots is fixed by the analyst, although their location is estimated from the data. Finally, if you have more than one knot in the model, you may wish to constrain 'later' knots to have higher values than 'earlier' knots. The assignment of higher starting values for 'later' knots than 'earlier' knots may be sufficient for this purpose, but in a poorly-identified model, the additional constraint may be helpful. The CNLR procedure must be used for this purpose, with the constraints expressed in the /BOUNDS subcommand. Such a constraint is illustrated in Example 1, which has 2 knot points.

For a discussion of spline regression models with fixed or variable knots, see Chapter 9 of :.
Seber, G.A.F., & Wild, C.J. (1989). Nonlinear regression. New York: Wiley,

Three examples of analyses of variable-knot spline regression models are shown below. An input program is presented that constructs a data set that allows you to run these commands in SPSS. The predictor in each example is the variable XA which has integer values from 1 to 35. There are 3 replicates at each value of XA for a total sample size of 105. The dependent variables are Y1, Y2, and Y3 for examples 1, 2, and 3, respectively. They are generated with SPSS commands just before their respective analyses. The examples are somewhat contrived and we may have cheated slightly by using parameter starting values that equal the data-generating values for those parameters. However, these examples are merely illustrative of the process of using these commands for spline models.

The examples are described below with syntax commands. The procedures may be run from the

Graphic User Interface (GUI), or menu system, with the exception of a few nonessential options. The GUI steps are described in detail for the constrained version of Example 1 at the end of this technical note.


SET SEED = 2000000.
title 'spline regression with estimated knots, good start values'.
LOOP xa = 1 TO 35.
LOOP rep = 1 TO 3.
END case.
END file.

In the first example, the linear model has 2 knots,
with starting values at XA=15 and XA=25,
and is continuous at both knot points.
* y1 is dep var for example 1, with knots at 15 and 25, linear .
COMPUTE y1=3 + 3*xa + normal(2).
IF (xa gt 15) y1=y1 - 4*(xa-15).
IF (xa gt 25) y1=y1 + 2*(xa-25).


* Constraints not placed on knot values.
MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25.
COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa GE knot1)
+ bc1*(xa-knot2)*(xa GE knot2).
NLR y1 / PRED = predex1 /SAVE pred resid (residex1).
* Diagnostic plots of residuals, predicted values .
/SCATTERPLOT(BIVAR)= predex1 WITH residex1 .
/SCATTERPLOT(OVERLAY)=xa xa WITH y1 predex1 (PAIR) .

* Note that in SPSS versions prior to 5.0, you must include the
* predictor(s) in the NLR and CNLR commands, as in:
* NLR y1 WITH xa / PRED = predex1 /SAVE pred resid (residex1).
* Including the predictors list returns a warning in later SPSS versions,
* but the command will still run.

* Example 1 with KNOT2 constrained to be greater than KNOT1 .
MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25.
COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa ge knot1)
+ bc1*(xa-knot2)*(xa ge knot2).
CNLR y1 / PRED = predex1 /SAVE pred resid (residex1)
/ BOUNDS knot2 - knot1 > 0 .
/SCATTERPLOT(BIVAR)= predex1 WITH residex1 .
/SCATTERPLOT(OVERLAY)=xa xa WITH y1 predex1 (PAIR) .

* In the second example, there is one knot at XA=15. The spline is
* continuous at the knot, but the order of the polynomial changes.
* Before the knot, the relationship is linear; after the knot, quadratic.
* Y2 is dep var for example 2, knot at 15, linear,quadratic.
COMPUTE y2 = 4 + 2*xa + normal(.5).
IF (xa gt 15) y2=y2 + (xa - 15) - .5*(xa-15)*(xa-15).

* In this model, we compute an intermediate term, pc2, to
* stand for the expression (xa GE knot1). This term is just
* used to avoid repetitive typing of the logical expression.
* Note that knot1 is still the parameter to be estimated.

MODEL PROGRAM ba0=4 ba1=2 bb1=1 bb2=-0.5 knot1=15 .
COMPUTE pc2 = (xa GE knot1).
COMPUTE predex2 = ba0 + ba1*xa + bb1*(xa-knot1)*pc2
+ bb2*((xa-knot1)**2)*pc2.
NLR y2 / PRED = predex2 /SAVE pred resid (residex2).

* Diagnostic plots of residuals, predicted values .
/SCATTERPLOT(BIVAR)= predex2 WITH residex2 .
/SCATTERPLOT(OVERLAY)=xa xa WITH y2 predex2 (PAIR) .

* In the third example, there is one knot at XA=15. A cubic spline is
* fitted with the constraint that all second-order and lower-order
* derivatives are equal on both sides of the knot, which is equivalent to
* saying that only the cubic term of the function is adjusted after the
* knot. The continuity constraints are gradually relaxed in subsequent NLR
* runs that introduce the lower-order XB terms into the
* model. If you examine the correlations among the parameter estimates, you
* will see that the latter 2 models are poorly identified as many of these
* correlations are very high. In the final model, some of the parameter estimates
* are perfectly correlated. These models are shown here as illustrations
* of gradually loosening constraints in the model.
COMPUTE y3 = 6 + 3.5*xa - 1.5*xa*xa + .5*xa*xa*xa + normal(.5).
IF (xa gt 15) y3 = y3 - 2*(xa-15)*(xa-15)*(xa-15).

MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5 bb3=-2 knot1=15 .
COMPUTE pc2 = (xa ge knot1).
COMPUTE predex3 = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa
+ bb3*((xa-knot1)**3)*pc2.
NLR y3 / PRED = predex3 /SAVE pred resid (residex3).

* Diagnostic plots of residuals, predicted values .
/SCATTERPLOT(BIVAR)= predex3 WITH residex3 .
/SCATTERPLOT(OVERLAY)=xa xa WITH y3 predex3 (PAIR) .

* example 3 but with a quadratic term added to 2nd piece .
MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5 bb2=0 bb3=-2 knot1=15 .
COMPUTE pc2 = (xa ge knot1).
COMPUTE pred = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa
+ bb2*((xa-knot1)**2)*pc2 + bb3*((xa-knot1)**3)*pc2.
NLR y3.

* example 3 but with a quadratic & linear term added to 2nd piece .
MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5
bb1=0 bb2=0 bb3=-2 knot1=15 .
COMPUTE pc2 = (xa ge knot1).
COMPUTE pred = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa
+ bb1*(xa-knot1)*pc2 + bb2*((xa-knot1)**2)*pc2
+ bb3*((xa-knot1)**3)*pc2.
NLR y3.


Example 1 Constrained using the Graphic User Interface

From the Analyze menu, choose Regression and then Nonlinear to open the Nonlinear Regression dialog box.
From the variable list on the left side of the dialog, highlight the dependent variable (XA in this example) and click the arrow to paste it into the "Dependent:" box.

To define your parameters and provide starting values, click the Parameters button on the left side of the dialog, opening the Nonlinear Regression Parameters dialog. For each parameter in your model, including coefficients and knot points, do the following:

Type the name of the parameter into the "Name:" box. This name must not be the same as the name of any variables in the active data set.
Type the Starting Value for the parameter into the "Starting Value:" box.
Click the Add button, which will now be enabled. This will add the parameter and its starting value to the list of parameters,
When all of the parameters have been defined, click Continue to return to the main Nonlinear Regression dialog.

In the Model Expression box, you will need to write the nonlinear model with predictors, parameters, and
the relation between them. The Example 1 constrined model, as expressed in the COMPUTE PREDEX1 command
of the MODEL STATEMENT in the syntax commands, is:

COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa ge knot1)
+ bc1*(xa-knot2)*(xa ge knot2).

In the Nonlinear Regression dialog, you would type the right side of the equation into the Model Expression box:

ba0 + ba1*xa + bb1*(xa-knot1)*(xa ge knot1)+ bc1*(xa-knot2)*(xa ge knot2)

As you enter variables or parameters into the box, you can either type their names or highlight them in the
variable or parameter lists and click the arrow pointing to the "Model Expression:" box.
If you realize that you forgot to define a parameter, you reenter the Parameter dialog to
define it and then return to proceed with the model entry.

To constrain the knot parameters so that knot2 is greater than knot1, click the Constraints button to open the Nonlinear Regression Parameter Constraints dialog. In that dialog, click the "Define Parameter Constraints" radio button. The parameter name or relationship to be constrained is entered into the box just above the calculator pad. The constraining value is entered into the far right box. A relational operator is chosen with the scroll arrow. For this example we enter "knot2 - knot1" in the first box, choose ">=" (for "greater than or equal to") from the scroll bar and 0 in the far right box. Then
click the Add button to register the constraint. Click Continue when all constraints have been added.

To save the predicted values and residuals, click the Save button from the main Nonlinear Regression dialog. In the dialog box that opens, click the check box for "Predicted values" and "Residuals" and then click Continue.

You can click OK to run the procedure from the dialog box or Paste to paste the commands to a syntax window. If you wish to change the relation in the constraint to simply ">" (for "greater than", which was not available from the constraint dialog scroll bar), you will need to paste the commands and edit the /BOUNDS subcommand. In the Example 1 syntax, the bounds command was:

/BOUNDS knot2 - knot1 > 0

By default, the Nonlinear Regression procedure save the predicted values and residuals in the new
variables PRED_ and RESID. In the syntax for this example, we saved these variables as predex1 and residx1 with the /PRED and /SAVE subcommands, as follows:

CNLR y1 / PRED = predex1 /SAVE pred resid (residex1)
/ BOUNDS knot2 - knot1 > 0 .

There is no option in the Nonlinear Regression dialogs to change these variable names but you can change them in the Variable View of the Data Editor after running the procedure.

[{"Product":{"code":"SSLVMB","label":"IBM SPSS Statistics"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":"Not Applicable","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"Not Applicable","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Historical Number


Document Information

Modified date:
16 April 2020