Overview (SAMPLE command)
SAMPLE permanently draws a random sample of cases for processing in all subsequent procedures. For a temporary sample, use a TEMPORARY command before SAMPLE.
Basic Specification
The basic specification is either a decimal value between 0 and 1 or the sample size followed by keyword FROM and the size of the active dataset.
- To select an approximate percentage of cases, specify a decimal value between 0 and 1.
- To select an exact-size random sample, specify a positive integer that is less than the file size, and follow it with keyword FROM and the file size.
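The two forms differ in what they guarantee about the sample size. The following Python sketch illustrates that difference; it is not the program's actual algorithm. It assumes that the proportional form keeps each case independently with the given probability, and that the exact-size form uses sequential selection sampling. The function names sample_proportion and sample_n_from_m are hypothetical and exist only for this illustration.

```python
import random

def sample_proportion(cases, p, rng):
    """Keep each case independently with probability p.

    Assumed model of the decimal form: the sample size is only
    approximately p * len(cases).
    """
    return [case for case in cases if rng.random() < p]

def sample_n_from_m(cases, n, m, rng):
    """Keep exactly n of the first m cases (assumes len(cases) >= m >= n).

    Assumed model of the n FROM m form, using sequential selection:
    each case is kept with probability needed / remaining.
    """
    kept = []
    needed, remaining = n, m
    for case in cases[:m]:
        if needed == 0:
            break
        if rng.random() * remaining < needed:
            kept.append(case)
            needed -= 1
        remaining -= 1
    return kept

rng = random.Random(2024)          # arbitrary seed for the illustration
cases = list(range(1, 3421))       # a made-up dataset of 3420 case numbers

approx = sample_proportion(cases, 0.25, rng)    # size varies around 855
exact = sample_n_from_m(cases, 500, 3420, rng)  # size is always 500
print(len(approx), len(exact))
```

Under these assumptions the decimal form returns a different number of cases from run to run, while the n FROM m form always returns exactly n cases when m matches the file size.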
Operations
- SAMPLE is a permanent transformation.
- Sampling is based on a pseudo-random-number generator that depends on a seed value that is established by the program. On some implementations of the program, this number defaults to a fixed integer, and a SAMPLE command that specifies n FROM m will generate the identical sample whenever a session is rerun. To generate a different sample each time, use the SET command to reset SEED to a different value for each session. See the SET command for more information.
- If sampling is done by using the n FROM m method, and the TEMPORARY command is specified, successive samples will not be the same because the seed value changes each time that a random-number series is needed within a session.
- A proportional sample (a sample that is based on a decimal value) usually does not produce the exact proportion that is specified.
- If the number that is specified for m following FROM is less than the actual file size, the sample is drawn only from the first m cases.
- If the number following FROM is greater than the actual file size, the program samples an equivalent proportion of cases from the active dataset.
- If SAMPLE follows SELECT IF, SAMPLE samples only cases that are selected by SELECT IF.
- If SAMPLE precedes SELECT IF, cases are selected from the sample.
- If more than one SAMPLE command is specified in a session, each command acts on the sample that was selected by the preceding SAMPLE command.
- If N OF CASES is used with SAMPLE, the program reads as many records as required to build the specified n cases. It makes no difference whether N OF CASES precedes or follows SAMPLE.
Limitations
SAMPLE cannot be placed in a FILE TYPE-END FILE TYPE or INPUT PROGRAM-END INPUT PROGRAM structure. SAMPLE can be placed nearly anywhere following these commands in a transformation program. See the topic Commands and Program States for more information.