Examples (CATREG command)

CATREG VARIABLES = TEST1 TEST3 TEST2 TEST4 TEST5 TEST6 
   TEST7 TO TEST9 STATUS01 STATUS02
  /ANALYSIS TEST4 (LEVEL=NUME) 
   WITH TEST1 TO TEST2 (LEVEL=SPORD DEGREE=1 INKNOT=3) 
   TEST5 TEST7 (LEVEL=SPNOM) TEST8 (LEVEL=ORDI) 
   STATUS01 STATUS02 (LEVEL=NOMI)
  /DISCRETIZATION = TEST1(GROUPING NCAT=5 DISTR=UNIFORM)
                    TEST5(GROUPING) TEST7(MULTIPLYING)
  /INITIAL = RANDOM
  /MAXITER = 100
  /CRITITER = .000001
  /RESAMPLE BOOTSTRAP (100)
  /MISSING = MODEIMPU
  /PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02) 
  /PLOT = TRANS (TEST2 TO TEST7 TEST4)
  /SAVE
  /OUTFILE = '/data/qdata.sav'.

VARIABLES defines variables. The keyword TO refers to the order of the variables in the working data file.
The ANALYSIS subcommand defines variables used in the analysis. It specifies that TEST4 is the dependent variable, with optimal scaling level numerical, and that the variables TEST1, TEST2, TEST3, TEST5, TEST7, TEST8, STATUS01, and STATUS02 are the independent variables to be used in the analysis. (The keyword TO refers to the order of the variables in the VARIABLES subcommand, so TEST1 TO TEST2 includes TEST3.) The optimal scaling level for TEST1, TEST2, and TEST3 is spline ordinal; for TEST5 and TEST7, spline nominal; for TEST8, ordinal; and for STATUS01 and STATUS02, nominal. The splines for TEST1, TEST2, and TEST3 have degree 1 and three interior knots, and the splines for TEST5 and TEST7 have degree 2 and two interior knots (default because unspecified).
DISCRETIZATION specifies that TEST5 and TEST7, which are fractional-value variables, are discretized: TEST5 by recoding into seven categories with a normal distribution (default because unspecified) and TEST7 by “multiplying.” TEST1, which is a categorical variable, is recoded into five categories with a close-to-uniform distribution.
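The "multiplying" method can be illustrated with a minimal Python sketch. It assumes the method standardizes the values, multiplies them by 10, rounds, and adds a constant so that the lowest category is 1. The function name discretize_multiplying is hypothetical, and the use of the population standard deviation and of Python's built-in rounding are assumptions of this sketch, not confirmed details of CATREG.

```python
from statistics import mean, pstdev

def discretize_multiplying(values):
    """Sketch of 'multiplying' discretization (assumed behavior, not CATREG code)."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    if s == 0:
        # A constant variable collapses to a single category.
        return [1] * len(values)
    # Standardize, multiply by 10, and round to the nearest integer.
    scaled = [round(10 * (v - m) / s) for v in values]
    # Shift so that the lowest category value is 1.
    shift = 1 - min(scaled)
    return [x + shift for x in scaled]

codes = discretize_multiplying([0.5, 1.5, 2.5])
print(codes)
```

The result is a set of positive integer codes that preserves the order and relative spacing of the original fractional values.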
Because there are nominal variables, a random initial solution is requested by the INITIAL subcommand.
MAXITER specifies the maximum number of iterations to be 100. This is the default, so this subcommand could be omitted here.
CRITITER sets the convergence criterion to a value smaller than the default value.
To include cases with missing values, the MISSING subcommand specifies that for each variable, missing values are replaced with the most frequent category (the mode).
RESAMPLE specifies the .632 bootstrap for estimation of the prediction error using 100 bootstrap samples (instead of the default of 50).
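As an illustration of the .632 bootstrap idea, the following Python sketch combines the apparent (training) error with the error on cases left out of each bootstrap sample, using weights 0.368 and 0.632. It is a simplified version that averages the out-of-sample error per bootstrap sample. The names bootstrap_632, fit_mean, and predict_mean are hypothetical, and the mean-only model is a stand-in for the regression; none of this is CATREG's own implementation.

```python
import random
from statistics import mean

def bootstrap_632(xs, ys, fit, predict, n_boot=100, seed=0):
    """Simplified .632 bootstrap estimate of mean squared prediction error."""
    rng = random.Random(seed)
    n = len(xs)

    def mse(model, indices):
        return mean((ys[i] - predict(model, xs[i])) ** 2 for i in indices)

    # Apparent error: fit and evaluate on the same (full) data.
    apparent = mse(fit(xs, ys), range(n))

    oob_errors = []
    for _ in range(n_boot):
        # Draw n cases with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        drawn = set(sample)
        left_out = [i for i in range(n) if i not in drawn]
        if not left_out:
            continue  # every case was drawn; no out-of-sample cases
        model = fit([xs[i] for i in sample], [ys[i] for i in sample])
        oob_errors.append(mse(model, left_out))

    if not oob_errors:
        return apparent
    return 0.368 * apparent + 0.632 * mean(oob_errors)

# Toy stand-in model: always predict the mean of the training responses.
def fit_mean(xs, ys):
    return mean(ys)

def predict_mean(model, x):
    return model

xs = list(range(10))
ys = [1.0, 2.0, 1.5, 3.0, 2.5, 3.5, 3.0, 4.0, 4.5, 5.0]
estimate = bootstrap_632(xs, ys, fit_mean, predict_mean, n_boot=100)
print(estimate)
```

Increasing n_boot, as the example does by requesting 100 samples instead of 50, reduces the random variation in the out-of-sample part of the estimate.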
PRINT specifies the correlations, the coefficients, the descriptive statistics for all variables, the ANOVA table, the category quantifications for variables TEST1, TEST2, TEST3, STATUS01, and STATUS02, and the transformed data list of all cases.
PLOT is used to request quantification plots for the variables TEST2, TEST5, TEST7, and TEST4.
The SAVE subcommand adds the transformed variables to the working data file. The names of these new variables are TRANS1_1, ..., TRANS9_1.
The OUTFILE subcommand writes the transformed data to a data file called qdata.sav in the directory /data.

Example: Multiple Systematic Starts

CATREG
 ...
 /INITIAL MULTISTART(ALL)('c:\data\startsigns.sav')
 /PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02) 
 /PLOT = TRANS (TEST2 TO TEST7 TEST4)
 /SAVE TRDATA PRED RES
 /OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').

Because the ordinal and spline ordinal scaling levels are specified for some variables, there is a chance of obtaining a suboptimal solution when applying the numerical or random initial solution. To ensure that the optimal solution is obtained, all multiple systematic starts are used. Using all systematic starts is feasible here because only 3 variables have (spline) ordinal scaling, so the number of starts is 2 to the power of 3, which is 8. With a larger number of variables with the (spline) ordinal scaling level, a reduced number of starts is recommended, which can be requested by specifying /INITIAL MULTISTART(value).
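The count of systematic starts can be checked with a short Python sketch. It assumes that each start corresponds to one pattern of positive and negative signs for the coefficients of the (spline) ordinal variables. The function name systematic_sign_patterns and the numbering of the patterns are hypothetical; the order in which CATREG numbers its starts may differ.

```python
from itertools import product

def systematic_sign_patterns(n_ordinal):
    """All sign patterns for n_ordinal variables (assumed meaning of a 'start')."""
    return list(product((+1, -1), repeat=n_ordinal))

patterns = systematic_sign_patterns(3)
for number, signs in enumerate(patterns, start=1):
    # The numbering here is illustrative only.
    print(number, signs)

print(len(patterns))
```

Because the number of patterns doubles with every additional (spline) ordinal variable, 10 such variables would already give 1024 starts, which is why a reduced number of starts is recommended for larger problems.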
The specifications at the PRINT, PLOT, SAVE, and OUTFILE subcommands will be applied to the optimal solution.

Example: Fixing Initial Signs for Regression Coefficients

CATREG 
 ...
 /INITIAL FIXSIGNS (63) ('c:\data\startsigns.sav').

The INITIAL subcommand specifies using a specific set of fixed signs for the regression coefficients. The signs are in the file startsigns.sav in the directory c:\data. This file was created by a previous run of CATREG with keyword MULTISTART at the INITIAL subcommand (see previous example). The signs of start number 63 are specified to be used.

Example: Elastic Net Regularization

CATREG
 ...
 /REGULARIZATION ENET (.5 2.5 .25) (.01 3.8 .05)('c:\data\regu_enet.sav')
 /PRINT = REGU R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2)
 /PLOT = REGU (.75 1.5) TRANS (TEST2 TO TEST7 TEST4) 
 /SAVE TRDATA PRED RES
 /OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').

REGULARIZATION specifies application of Elastic Net regularization, with start value of the Lasso penalty 0.01, stop value 3.8, and increment 0.05, resulting in 76 regularized models, with Lasso penalty values 0.01, 0.06, ..., 3.76. To each of these 76 Lasso models, 9 Ridge penalties are applied (0.5, 0.75, ..., 2.5), resulting in 76 × 9 = 684 Elastic Net models.
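The number of models implied by the two (start stop increment) triplets can be verified with a small Python sketch. It assumes that the grid contains every value start + k × increment that does not exceed the stop value, which matches the Lasso values listed above (0.01, 0.06, ..., 3.76). The function name penalty_grid is hypothetical, and Decimal is used only to avoid floating-point rounding in the counts.

```python
from decimal import Decimal

def penalty_grid(start, stop, step):
    """Penalty values start, start+step, ... not exceeding stop (assumed rule)."""
    start, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    if step == 0:
        # Stop equal to start with increment zero yields a single value.
        return [start]
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value += step
    return values

lasso = penalty_grid("0.01", "3.8", "0.05")
ridge = penalty_grid("0.5", "2.5", "0.25")

print(len(lasso), lasso[0], lasso[-1])
print(len(ridge), ridge[0], ridge[-1])
print(len(lasso) * len(ridge))
```

Under this assumption the Lasso grid has 76 values ending at 3.76, the Ridge grid 0.5, 0.75, ..., 2.5 has 9 values, and the number of Lasso and Ridge combinations is 684.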
PRINT specifies displaying a table with the penalty values, R-squared, and the regression coefficients for each regularized model. The contents of this table are written to a data file called regu_enet.sav in the directory c:\data.
The PLOT subcommand requests two Elastic Net plots: a Lasso plot with a fixed Ridge penalty of 0.75 and a Lasso plot with a fixed Ridge penalty of 1.50.
Specifications other than REGU at the PRINT and PLOT subcommands, the SAVE subcommand, and the TRDATA keyword at the OUTFILE subcommand are ignored.

Example: Elastic Net Regularization with Cross-Validation Resampling

CATREG 
 ...
 /REGULARIZATION ENET (.5 2.5 .25)(.01 3.8 .05)('c:\data\regu_enet.sav')
 /RESAMPLE CROSSVAL (5)
 /PRINT = REGU R COEFF DESCRIP ANOVA 
 /PLOT = REGU (.75 1.5) TRANS (TEST2 TO TEST7 TEST4)
 /SAVE TRDATA PRED RES
 /OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').

REGULARIZATION is the same as in the previous example.
The RESAMPLE subcommand specifies 5-fold cross-validation to estimate the prediction error for each of the Elastic Net models requested by REGULARIZATION.
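The mechanics of 5-fold cross-validation can be sketched in Python as follows. Each case is held out exactly once, and each model is fitted on the remaining cases. The function name kfold_indices is hypothetical, and the interleaved assignment of cases to folds is an assumption of this sketch; how CATREG assigns cases to folds is not stated here.

```python
def kfold_indices(n_cases, k=5):
    """Yield (train, held_out) index lists for k-fold cross-validation."""
    # Assumption: cases are assigned to folds in an interleaved fashion.
    folds = [list(range(start, n_cases, k)) for start in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_cases) if i not in held]
        yield train, held_out

splits = list(kfold_indices(12, k=5))
for train, held_out in splits:
    # A model would be fitted on `train` and its error measured on `held_out`.
    print(len(train), held_out)
```

The prediction error for one regularized model is then the error accumulated over the 5 held-out folds, and the model with the lowest such error is the one selected.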
PRINT specifies displaying a table with the penalty values, R-squared, the regression coefficients, and the estimated prediction error for each regularized model. The contents of this table are written to a data file called regu_enet.sav in the directory c:\data.
The specification at the PLOT subcommand results in the same plots as in the previous example.
The other specifications at the PRINT and PLOT subcommands, and the SAVE and OUTFILE specifications, will be applied to the model with the lowest prediction error.

Example: Obtaining a Specific Elastic Net Model

CATREG 
 ...
 /REGULARIZATION ENET (1.25 1.25 0)(.46 .46 0)
 /PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02) 
 /PLOT = TRANS (TEST2 TO TEST7 TEST4)
 /SAVE TRDATA PRED RES
 /OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').

REGULARIZATION is specified here (stop value equal to start value, increment zero) to obtain output for a specific Elastic Net model: the model with penalty values 1.25 (Ridge) and .46 (Lasso).