Overview (SAMPLE command)
SAMPLE permanently draws a random sample of cases for processing in all subsequent procedures. For a temporary sample, use a TEMPORARY command before SAMPLE.
Basic Specification
The basic specification is either a decimal value between 0 and 1 or the sample size followed by keyword FROM and the size of the active dataset.
- To select an approximate percentage of cases, specify a decimal value between 0 and 1.
- To select an exact-size random sample, specify a positive integer that is less than the file size, and follow it with keyword FROM and the file size.
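The difference between the two specifications can be illustrated with a small model. The following Python sketch is not the program's own algorithm, which this topic does not describe. It assumes that a proportional sample keeps each case independently with the given probability and that an exact-size sample keeps exactly n cases in file order. The function names, the seed value, and the data are hypothetical.

```python
import random

def sample_proportion(cases, p, rng):
    """Keep each case independently with probability p (assumed model)."""
    return [c for c in cases if rng.random() < p]

def sample_exact(cases, n, rng):
    """Keep exactly n cases, preserving file order (assumed model)."""
    keep = set(rng.sample(range(len(cases)), n))
    return [c for i, c in enumerate(cases) if i in keep]

rng = random.Random(2000000)          # arbitrary fixed seed for the sketch
cases = list(range(1, 1001))          # stand-in for 1000 cases

approx = sample_proportion(cases, 0.25, rng)  # size varies around 250
exact = sample_exact(cases, 250, rng)         # size is always 250

print(len(approx), len(exact))
```

Under this model the decimal form yields only an approximate sample size, while the n FROM m form yields exactly n cases whenever m matches the file size.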
Operations
- SAMPLE is a permanent transformation.
- Sampling is based on a pseudo-random-number generator that depends on a seed value that is established by the program. On some implementations of the program, this number defaults to a fixed integer, and a SAMPLE command that specifies n FROM m will generate the identical sample whenever a session is rerun. To generate a different sample each time, use the SET command to reset SEED to a different value for each session. See the SET command for more information.
- If sampling is done by using the n FROM m method, and the TEMPORARY command is specified, successive samples will not be the same because the seed value changes each time that a random-number series is needed within a session.
- A proportional sample (a sample that is based on a decimal value) usually does not produce the exact proportion that is specified.
- If the number that is specified for m following FROM is less than the actual file size, the sample is drawn only from the first m cases.
- If the number following FROM is greater than the actual file size, the program samples an equivalent proportion of cases from the active dataset.
- If SAMPLE follows SELECT IF, SAMPLE samples only cases that are selected by SELECT IF.
- If SAMPLE precedes SELECT IF, cases are selected from the sample.
- If more than one SAMPLE command is specified in a session, each command acts on the sample that was selected by the preceding SAMPLE command.
- If N OF CASES is used with SAMPLE, the program reads as many records as required to build the specified n cases. It makes no difference whether N OF CASES precedes or follows SAMPLE.
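The rules for the seed and for the value of m following FROM can be modeled in the same way. The Python sketch below is only an illustration under stated assumptions and does not reproduce the program's generator. The helper sample_n_from_m is hypothetical, and rounding the proportional case count to the nearest whole case is an assumption of the sketch.

```python
import random

def sample_n_from_m(cases, n, m, rng):
    """Hypothetical model of 'n FROM m' as described in the list above."""
    size = len(cases)
    if m <= size:
        # m does not exceed the file size: draw only from the first m cases.
        k, pool_size = n, m
    else:
        # m exceeds the file size: keep the same proportion n/m of the
        # actual cases (rounding is an assumption of this sketch).
        k, pool_size = round(size * n / m), size
    keep = set(rng.sample(range(pool_size), k))
    return [c for i, c in enumerate(cases) if i in keep]

cases = list(range(1, 101))  # stand-in for a file of 100 cases

# Same seed, same specification: the identical sample is produced.
run1 = sample_n_from_m(cases, 10, 50, random.Random(1))
run2 = sample_n_from_m(cases, 10, 50, random.Random(1))

# m greater than the file size: 10 FROM 200 on 100 cases keeps 5 cases.
oversized = sample_n_from_m(cases, 10, 200, random.Random(1))

# A second sample acts on the result of the first.
chained = sample_n_from_m(run1, 5, 10, random.Random(2))

print(run1 == run2, max(run1) <= 50, len(oversized), len(chained))
```

In this model, supplying a different seed for each run plays the role that resetting SEED with the SET command plays in the program.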
Limitations
SAMPLE cannot be placed in a FILE TYPE-END FILE TYPE or INPUT PROGRAM-END INPUT PROGRAM structure. SAMPLE can be placed nearly anywhere following these commands in a transformation program. See the topic Commands and Program States for more information.