Example 3: Generating Random Data

You can use command syntax to generate variables that have approximately a normal distribution. Commands for generating five standard normal variables (X1 through X5) for 1000 cases are shown in the following figure. As shown in the output below, each variable has a mean of approximately 0 and a standard deviation of approximately 1.

Figure 1. Data-generating commands
INPUT PROGRAM.
-   VECTOR X(5).
-       LOOP #I = 1 TO 1000.
-           LOOP #J = 1 TO 5.
-             COMPUTE X(#J) = NORMAL(1).
-           END LOOP.
-           END CASE.
-       END LOOP.
-       END FILE.
END INPUT PROGRAM.

DESCRIPTIVES VARIABLES X1 TO X5.
Figure 2. Descriptive statistics for generated data
                                           Valid
Variable  Mean  Std Dev  Minimum  Maximum      N  Label
X1        -.01     1.02    -3.11     4.15   1000
X2         .08     1.03    -3.19     3.22   1000
X3         .02     1.00    -3.01     3.51   1000
X4         .03     1.00    -3.35     3.19   1000
X5        -.01      .96    -3.34     2.91   1000
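
Because the values are pseudo-random, rerunning the commands produces slightly different statistics each time. If repeatable results are wanted, one option is to fix the seed before the input program runs. The sketch below assumes that the default (SPSS 12 compatible) random number generator is active; the seed value is arbitrary and is not part of the original example.

```spss
* Assumption: the default random number generator is in effect.
* The seed value is arbitrary and any fixed value gives repeatable draws.
SET SEED=20240101.

INPUT PROGRAM.
-   VECTOR X(5).
-       LOOP #I = 1 TO 1000.
-           LOOP #J = 1 TO 5.
-             COMPUTE X(#J) = NORMAL(1).
-           END LOOP.
-           END CASE.
-       END LOOP.
-       END FILE.
END INPUT PROGRAM.

DESCRIPTIVES VARIABLES X1 TO X5.
```

With a fixed seed, repeated runs should report the same descriptive statistics, although they will not necessarily match the values in Figure 2.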

The !DATAGEN macro below issues the data-generating commands shown above.

Figure 3. !DATAGEN macro
DEFINE !DATAGEN ().

INPUT PROGRAM.
-    VECTOR X(5).
-        LOOP #I = 1 TO 1000.
-            LOOP #J = 1 TO 5.
-                COMPUTE X(#J) = NORMAL(1).
-            END LOOP.
-            END CASE.
-        END LOOP.
-        END FILE.
END INPUT PROGRAM.

DESCRIPTIVES VARIABLES X1 TO X5.

!ENDDEFINE.

!DATAGEN.

The data-generating commands are embedded between the macro definition commands DEFINE and !ENDDEFINE. The macro call produces the same kind of data and descriptive statistics as shown above.

You can tailor the generation of normally distributed variables if you modify the !DATAGEN macro so it will accept keyword arguments, as shown in the following figure. The macro allows you to specify the number of variables and cases to be generated and the approximate standard deviation.

Figure 4. !DATAGEN macro with keyword arguments
DEFINE !DATAGEN ( OBS =!TOKENS(1) !DEFAULT(1000)
                  /VARS =!TOKENS(1) !DEFAULT(5)
                  /SD =!CMDEND !DEFAULT(1)).
INPUT PROGRAM.
-   VECTOR X(!VARS).
-       LOOP #I = 1 TO !OBS.
-           LOOP #J = 1 TO !VARS.
-               COMPUTE X(#J) = NORMAL(!SD).
-           END LOOP.
-           END CASE.
-       END LOOP.
-       END FILE.
END INPUT PROGRAM.

!LET !LIST = !NULL.
!DO  !I = 1 !TO !VARS.
-   !LET !LIST = !CONCAT(!LIST, ' ', X, !I).
!DOEND.

DESCRIPTIVES VARIABLES !LIST.

!ENDDEFINE.

!DATAGEN OBS=500 VARS=2 SD=1.
!DATAGEN.
  • The DEFINE statement declares arguments that specify the number of cases (OBS), variables (VARS), and standard deviation (SD). By default, the macro creates 1000 cases with 5 variables that have a standard deviation of 1.
  • Commands between INPUT PROGRAM and END INPUT PROGRAM generate the new data using values of the macro arguments.
  • Commands !LET and !DO/!DOEND construct a variable list (!LIST) that is used in DESCRIPTIVES. The first !LET command initializes the list to a null (blank) string value. For each new variable, the index loop adds to the list a string of the form X1, X2, X3, and so forth. Thus, DESCRIPTIVES requests means and standard deviations for each new variable.
  • The first macro call generates 500 cases with two standard normal variables. The second call requests the default number of variables, cases, and standard deviation. Descriptive statistics (not shown) are also computed for each variable.
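
To make the substitution concrete, the first call (OBS=500 VARS=2 SD=1) can be expected to expand to roughly the commands below. This is a hand-written sketch of the expansion, not captured output. To see the commands that the macro facility actually generates, SET MPRINT ON can be issued before the macro call.

```spss
* Sketch of the commands generated by !DATAGEN OBS=500 VARS=2 SD=1.
* !VARS is replaced by 2, !OBS by 500, and !SD by 1.
INPUT PROGRAM.
-   VECTOR X(2).
-       LOOP #I = 1 TO 500.
-           LOOP #J = 1 TO 2.
-               COMPUTE X(#J) = NORMAL(1).
-           END LOOP.
-           END CASE.
-       END LOOP.
-       END FILE.
END INPUT PROGRAM.

* !LIST starts empty, becomes ' X1' on the first pass and ' X1 X2' on the second.
DESCRIPTIVES VARIABLES  X1 X2.
```

Because SD is declared with !CMDEND, it takes everything up to the end of the command, so it should be the last argument specified in a macro call.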

As shown in the following figure, you can declare additional keyword arguments that allow you to specify the distribution (normal or uniform) of the generated data and a parameter value that is used as the standard deviation (for normally distributed data) or a range (for uniformly distributed data).

Figure 5. !DATAGEN macro with additional keyword arguments
DEFINE !DATAGEN (OBS    =!TOKENS(1) !DEFAULT(1000)
                 /VARS  =!TOKENS(1) !DEFAULT(5)
                 /DIST  =!TOKENS(1) !DEFAULT(NORMAL)
                 /PARAM =!TOKENS(1) !DEFAULT(1)).
INPUT PROGRAM.
-   VECTOR X(!VARS).
-       LOOP #I = 1 TO !OBS.
-           LOOP #J = 1 TO !VARS.
-               COMPUTE X(#J) = !DIST(!PARAM).
-           END LOOP.
-           END CASE.
-       END LOOP.
-       END FILE.
END INPUT PROGRAM.

!LET !LIST = !NULL.
!DO  !I = 1 !TO !VARS.
-   !LET !LIST = !CONCAT(!LIST, ' ', X, !I).
!DOEND.

DESCRIPTIVES VARIABLES !LIST.
!ENDDEFINE.

!DATAGEN OBS=500 VARS=2 DIST=UNIFORM PARAM=2.
  • The DEFINE statement declares arguments OBS, VARS, DIST, and PARAM. OBS and VARS represent the number of cases (observations) and variables to be generated. Arguments DIST and PARAM specify the shape and parameter of the distribution of generated data. By default, the macro generates 1000 observations with 5 standard normal variables.
  • Statements between INPUT PROGRAM and END INPUT PROGRAM generate the new data using values of macro arguments.
  • Remaining commands in the body of the macro obtain descriptive statistics for generated variables.
  • The macro call creates two approximately uniformly distributed variables with a range of 2, that is, with values between 0 and 2. The output from the macro call is shown below.
Figure 6. Descriptive statistics for uniform variables
                                           Valid
Variable  Mean  Std Dev  Minimum  Maximum      N  Label
X1         .99      .57      .00     2.00    500
X2        1.00      .57      .00     2.00    500
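
As a rough check on these values, the theoretical mean and standard deviation of a uniform distribution on the interval from 0 to 2 can be worked out directly. The symbols \mu and \sigma are introduced here for illustration only.

```latex
\mu = \frac{0 + 2}{2} = 1,
\qquad
\sigma = \frac{2 - 0}{\sqrt{12}} \approx 0.577
```

The sample means of .99 and 1.00 and the sample standard deviations of .57 are close to these theoretical values, as expected for 500 cases.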