Overview (SAMPLE command)
SAMPLE permanently draws a random sample of cases for processing in all subsequent procedures. For a temporary sample, use a TEMPORARY command before SAMPLE.
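As a minimal sketch of a temporary sample, the following syntax draws roughly 25% of cases for a single procedure. The variable name region is hypothetical, and FREQUENCIES stands in for whichever procedure is to be run on the sample.

```spss
TEMPORARY.
SAMPLE .25.
FREQUENCIES VARIABLES=region.
```

Because TEMPORARY precedes SAMPLE, the sample applies only to the FREQUENCIES procedure. Procedures that follow it work with all cases again.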
Basic Specification
The basic specification is either a decimal value between 0 and 1 or the sample size followed by keyword FROM and the size of the active dataset.
- To select an approximate percentage of cases, specify a decimal value between 0 and 1.
- To select an exact-size random sample, specify a positive integer that is less than the file size (the number of cases in the active dataset), and follow it with keyword FROM and the file size.
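The two forms of the basic specification might look like the following. The first form requests approximately one quarter of the cases:

```spss
SAMPLE .25.
```

The second form requests exactly 500 cases. It assumes, purely for illustration, that the active dataset contains 3,420 cases:

```spss
SAMPLE 500 FROM 3420.
```

The two commands are alternatives. Running both in the same session would sample from the sample.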
Operations
- SAMPLE is a permanent transformation.
- Sampling is based on a pseudo-random-number generator that depends on a seed value that is established by the program. On some implementations of the program, the seed defaults to a fixed integer, so a SAMPLE command that specifies n FROM m generates the identical sample whenever a session is rerun. To generate a different sample each time, use the SET command to reset SEED to a different value for each session. See the SET command for more information.
- If sampling is done by using the n FROM m method, and the TEMPORARY command is specified, successive samples will not be the same because the seed value changes each time that a random-number series is needed within a session.
- A proportional sample (a sample that is based on a decimal value) usually does not produce the exact proportion that is specified.
- If the number that is specified for m following FROM is less than the actual file size, the sample is drawn only from the first m cases.
- If the number that is specified for m following FROM is greater than the actual file size, the program samples an equivalent proportion of cases (approximately n/m) from the active dataset.
- If SAMPLE follows SELECT IF, SAMPLE samples only cases that are selected by SELECT IF.
- If SAMPLE precedes SELECT IF, cases are selected from the sample.
- If more than one SAMPLE command is specified in a session, each command acts on the sample that was selected by the preceding SAMPLE command.
- If N OF CASES is used with SAMPLE, the program reads as many records as required to build the specified n cases. It makes no difference whether N OF CASES precedes or follows SAMPLE.
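Two of the points above, setting the seed and combining SAMPLE with SELECT IF, can be sketched together as follows. The seed value and the variable name age are hypothetical and are chosen only for illustration.

```spss
* Set an explicit seed. Change this value to draw a different sample.
SET SEED=987654321.
* Keep adults only, then sample about 10% of the selected cases.
SELECT IF (age GE 18).
SAMPLE .10.
```

Because SAMPLE follows SELECT IF here, only the cases that satisfy the condition are eligible for the sample. Reversing the order of the two commands would instead apply the condition to the cases that were already sampled.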
Limitations
SAMPLE cannot be placed in a FILE TYPE-END FILE TYPE or INPUT PROGRAM-END INPUT PROGRAM structure. SAMPLE can be placed nearly anywhere following these commands in a transformation program. See the topic Commands and Program States for more information.
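The placement rule can be illustrated with a small sketch in which an input program generates 1,000 hypothetical cases and SAMPLE appears only after the structure is closed. The variable name id and the scratch variable #i are illustrative.

```spss
* Build 1,000 hypothetical cases inside an input program.
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
COMPUTE id = #i.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
* SAMPLE is placed after the input program structure is closed.
SAMPLE 100 FROM 1000.
EXECUTE.
```

Moving the SAMPLE command between INPUT PROGRAM and END INPUT PROGRAM would violate the limitation described above.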