Overview (SAMPLE command)

SAMPLE permanently draws a random sample of cases for processing in all subsequent procedures. For a temporary sample, use a TEMPORARY command before SAMPLE.
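
The difference between a permanent and a temporary sample can be sketched as follows. The variable name income and the proportion .25 are assumptions chosen only for illustration. Because TEMPORARY applies only up to the next procedure, the first DESCRIPTIVES command should run on roughly a quarter of the cases and the second on all of them.

```spss
* Temporary sample: only the next procedure uses it.
TEMPORARY.
SAMPLE .25.
DESCRIPTIVES VARIABLES=income.

* The full active dataset is available again here.
DESCRIPTIVES VARIABLES=income.
```

Without the TEMPORARY command, both procedures would use the same sampled cases, and the unsampled cases would no longer be available in the active dataset.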

Basic Specification

The basic specification is either a decimal value between 0 and 1 or the sample size followed by keyword FROM and the size of the active dataset.

  • To select an approximate percentage of cases, specify a decimal value between 0 and 1.
  • To select an exact-size random sample, specify a positive integer that is less than the file size (the number of cases in the active dataset), and follow it with keyword FROM and the file size.
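
The two forms of the basic specification might look like this. The values are illustrative, and the second command assumes an active dataset that contains 3420 cases. The commands are alternatives and are not meant to be run in sequence.

```spss
* Approximate percentage: about 25 percent of the cases.
SAMPLE .25.

* Exact size: 500 cases from a file assumed to hold 3420 cases.
SAMPLE 500 FROM 3420.
```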

Operations

  • SAMPLE is a permanent transformation.
  • Sampling is based on a pseudo-random-number generator that depends on a seed value established by the program. On some implementations of the program, the seed defaults to a fixed integer, so a SAMPLE command that specifies n FROM m generates the identical sample whenever a session is rerun. To generate a different sample each time, use the SET command to reset SEED to a different value for each session. See the SET command for more information.
  • If sampling is done by using the n FROM m method, and the TEMPORARY command is specified, successive samples will not be the same because the seed value changes each time that a random-number series is needed within a session.
  • A proportional sample (a sample that is based on a decimal value) usually does not produce the exact proportion that is specified.
  • If the number that is specified for m following FROM is less than the actual file size, the sample is drawn only from the first m cases.
  • If the number that is specified for m following FROM is greater than the actual file size, the program samples an equivalent proportion (n/m) of the cases in the active dataset.
  • If SAMPLE follows SELECT IF, SAMPLE samples only cases that are selected by SELECT IF.
  • If SAMPLE precedes SELECT IF, cases are selected from the sample.
  • If more than one SAMPLE command is specified in a session, each command acts on the sample that was selected by the preceding SAMPLE command.
  • If N OF CASES is used with SAMPLE, the program reads as many records as required to build the specified n cases. It makes no difference whether N OF CASES precedes or follows SAMPLE.
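
Several of the rules above can be combined in one short sketch. The seed value, the variable name age, and the count of 1500 cases remaining after selection are all assumptions made for illustration. Setting SEED explicitly is one way to control whether an n FROM m sample repeats across sessions, and placing SELECT IF first means that SAMPLE draws only from the selected cases.

```spss
* Set the seed explicitly. Change the value to draw a different sample.
SET SEED=20240517.

* Keep adults only, then sample from the selected cases.
SELECT IF (age >= 18).
SAMPLE 200 FROM 1500.
FREQUENCIES VARIABLES=age.
```

If the two transformations were reversed, the sample would be drawn from all cases first, and SELECT IF would then keep only the adults within that sample, so the final number of cases would generally be smaller than 200.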

Limitations

SAMPLE cannot be placed in a FILE TYPE-END FILE TYPE or INPUT PROGRAM-END INPUT PROGRAM structure. SAMPLE can be placed nearly anywhere following these commands in a transformation program. See the topic Commands and Program States for more information.
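
A minimal sketch of a valid placement, assuming a small input program that generates 1000 cases of a hypothetical variable x. SAMPLE appears only after END INPUT PROGRAM, never inside the structure.

```spss
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
COMPUTE x = RV.NORMAL(0, 1).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

* SAMPLE follows the input program structure.
SAMPLE .10.
EXECUTE.
```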