Overview (CSSELECT command)
CSSELECT selects complex, probability-based samples from a population, choosing units according to a sample
design created using the CSPLAN procedure.
Options
Scope of Execution. By default, CSSELECT executes all stages defined in
the sampling plan. Optionally, you can execute specific stages of
the design. This capability is useful if a full sampling frame is
not available at the outset of the sampling process, in which case
new stages can be sampled as they become available. For example, CSSELECT might be run first to sample cities,
then to sample blocks, and finally to sample individuals, executing a different stage
of the sampling plan each time.
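The stage-by-stage workflow can be illustrated with a small Python sketch. It is not CSSELECT's algorithm. It assumes simple random sampling without replacement at every stage, and the frame, the variable names (`city`, `block`), and the sample sizes are all hypothetical. The point is that each later stage needs only the subframe left by the earlier stages.

```python
import random

# Hypothetical frame: one case per individual, with city and block coded
# on every case. Block k lies in city k % 4, so blocks nest within cities.
frame = [{"id": i, "city": f"C{i % 4}", "block": f"B{i % 8}"} for i in range(80)]

def sample_units(units, n, rng):
    """Simple random sample of n distinct units, without replacement."""
    return set(rng.sample(sorted(units), n))

rng = random.Random(2024)

# Stage 1: sample cities from the full frame.
cities = sample_units({case["city"] for case in frame}, 2, rng)

# Stage 2 (possibly a later run): only the selected cities' subframe is needed.
subframe = [case for case in frame if case["city"] in cities]
blocks = sample_units({case["block"] for case in subframe}, 2, rng)

# Stage 3: sample individuals within the selected blocks.
persons = [case for case in subframe if case["block"] in blocks]
sample = rng.sample(persons, 5)
```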
Seed. By default, a random seed value is used by the CSSELECT random number generator. You can specify a seed
to ensure that the same sample will be drawn when CSSELECT is invoked repeatedly using the
same sample plan and population frame. The CSSELECT seed value is independent of the global seed
specified via the SET command.
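The seed behavior resembles keeping a dedicated random number generator separate from a global one. In the Python analogy below, which does not use CSSELECT's generator, `random.seed()` plays the role of the global seed and a private `random.Random(seed)` instance plays the role of the procedure-specific seed. The helper `draw_sample` is hypothetical.

```python
import random

def draw_sample(frame, n, seed=None):
    # A dedicated generator: seeding it neither affects nor is affected by
    # the module-level generator that random.seed() controls.
    rng = random.Random(seed)  # seed=None: seeded from an OS/time source
    return rng.sample(frame, n)

frame = list(range(100))

random.seed(1)      # analogous to a global seed
first = draw_sample(frame, 10, seed=12345)
random.seed(999)    # changing the global seed...
second = draw_sample(frame, 10, seed=12345)
# ...leaves the separately seeded sample unchanged: first == second
```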
Missing Values. A case is excluded from the sample frame if it has a system-missing value for any input variable in the plan file. You can control whether user-missing values of stratification and cluster variables are treated as invalid. User-missing values of measure variables are always treated as invalid.
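The three missing-value rules can be expressed as a single validity check. This sketch rests on several assumptions. `None` stands in for system-missing. User-missing codes are supplied as a set per variable. The function and parameter names, including `classmissing_valid`, are hypothetical and are not CSSELECT keywords.

```python
SYSMIS = None  # stand-in for a system-missing value

def is_valid_case(case, strat_vars, cluster_vars, measure_vars,
                  user_missing, classmissing_valid=False):
    """Apply the three exclusion rules to one case (a dict of values).

    user_missing maps a variable name to its set of user-missing codes.
    """
    class_vars = strat_vars + cluster_vars
    # System-missing on any plan variable always excludes the case.
    if any(case.get(v) is SYSMIS for v in class_vars + measure_vars):
        return False
    # User-missing values of measure variables are always invalid.
    if any(case[v] in user_missing.get(v, set()) for v in measure_vars):
        return False
    # User-missing stratification/cluster values are invalid by default.
    if not classmissing_valid and any(
        case[v] in user_missing.get(v, set()) for v in class_vars
    ):
        return False
    return True
```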
Input Data. If the sampling frame is sorted in advance, you can specify that the data are presorted, which may improve performance when stratification and/or clustering is requested for a large sampling frame.
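A streaming group-by shows why presorting can help. The Python sketch below is an analogy, not CSSELECT's implementation. `itertools.groupby` handles each stratum as one contiguous run, so presorted input needs a single pass, while unsorted input must be sorted first. The helper name and the simple random sampling within each stratum are assumptions.

```python
import random
from itertools import groupby
from operator import itemgetter

def stratified_sample(cases, n_per_stratum, rng, key="stratum", presorted=False):
    """Draw up to n_per_stratum cases from each stratum."""
    by_key = itemgetter(key)
    if not presorted:
        # groupby merges only *adjacent* equal keys, so unsorted input
        # needs an O(n log n) sort first.
        cases = sorted(cases, key=by_key)
    sample = []
    # With sorted input, each stratum is one contiguous run: a single pass.
    for _, stratum in groupby(cases, key=by_key):
        members = list(stratum)
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample
```

Declaring unsorted data as presorted would silently split a stratum into several runs, so this kind of option is only safe when the sort order is guaranteed.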
Sample Data. CSSELECT writes
data to the active dataset (the default) or to an external file. Regardless
of the data destination, CSSELECT generates final sampling weights, stagewise inclusion probabilities,
and stagewise cumulative sampling weights, as well as any variables requested
in the sampling plan.
External files or
datasets produced by CSSELECT include selected cases only. By default, all variables in the active
dataset are copied to the external file or dataset. Optionally, you
can specify that only certain variables are to be copied.
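Multistage designs commonly follow a convention for these weights. Each stagewise weight is the inverse of that stage's conditional inclusion probability. The cumulative weight at stage k is the running product of the stagewise weights for stages 1 through k, and the final weight is the cumulative weight at the last stage. This convention is assumed here, not taken from the CSSELECT documentation. Below is a minimal Python sketch for a single case, using a hypothetical helper name.

```python
from math import prod

def stage_weights(stage_probs):
    """Stagewise, cumulative, and final weights for one sampled case.

    stage_probs[k] is the case's inclusion probability at stage k + 1,
    conditional on selection at the earlier stages.
    """
    if not stage_probs or not all(0 < p <= 1 for p in stage_probs):
        raise ValueError("need at least one probability, each in (0, 1]")
    weights = [1 / p for p in stage_probs]  # stagewise weights
    cumulative = [prod(weights[: k + 1]) for k in range(len(weights))]
    return weights, cumulative, cumulative[-1]  # final = last cumulative
```

Take a case selected with probabilities 0.1, 0.5, and 0.2 at three stages. Its cumulative weights are 10, 20, and 100, so that case stands for about 100 population units.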
Joint Probabilities. First-stage joint inclusion probabilities are automatically saved
to an external file when the plan file specifies a PPS without-replacement
sampling method. Joint probabilities are used by Complex Samples analysis
procedures, such as CSDESCRIPTIVES and CSTABULATE. You can control
the name and location of the joint probabilities file.
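A joint inclusion probability for units i and j is the probability that both units appear in the same sample. This text does not say which PPS without-replacement algorithm CSSELECT uses, so the Python sketch below assumes a simple one: successive sampling of n = 2 units, where each draw is proportional to size among the units not yet drawn. Under that scheme, pi_ij = p_i p_j / (1 - p_i) + p_j p_i / (1 - p_j), where p_i is unit i's first-draw probability. The helper name is hypothetical. Under this scheme, the single-unit inclusion probabilities are not exactly proportional to size.

```python
from itertools import combinations

def successive_pps_n2(sizes):
    """Exact inclusion and joint inclusion probabilities for drawing n = 2
    units without replacement, each draw proportional to size among the
    units not yet drawn. Requires at least two units with positive size."""
    total = sum(sizes)
    p = [x / total for x in sizes]  # first-draw selection probabilities
    joint = {}
    for i, j in combinations(range(len(p)), 2):
        # P(i drawn first, then j) + P(j drawn first, then i)
        joint[(i, j)] = p[i] * p[j] / (1 - p[i]) + p[j] * p[i] / (1 - p[j])
    # A unit is included exactly when it belongs to the selected pair.
    incl = [sum(v for pair, v in joint.items() if k in pair) for k in range(len(p))]
    return incl, joint
```

Any fixed-size design with n = 2 satisfies two checks: the inclusion probabilities sum to 2, and the joint probabilities over all pairs sum to 1.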
Output. By default, CSSELECT displays
the distribution of selected cases by stratum. Optionally, you can
display a case-processing summary.
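The default distribution by stratum is essentially a tally per stratum. The hypothetical Python helper below compares selected counts with frame counts. Its name and its percent column are assumptions and do not reproduce CSSELECT's output layout.

```python
from collections import Counter

def selection_by_stratum(frame, sample, key="stratum"):
    """Per stratum: (frame count, selected count, percent selected)."""
    population = Counter(case[key] for case in frame)
    selected = Counter(case[key] for case in sample)  # missing strata count as 0
    return {
        s: (population[s], selected[s], 100.0 * selected[s] / population[s])
        for s in sorted(population)
    }
```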
Basic Specification
- The basic specification is a PLAN subcommand that specifies a sample design file.
- By default, CSSELECT writes output data to the active dataset, including final sample weights, stagewise cumulative weights, and stagewise inclusion probabilities. See the CSPLAN command for a description of available output variables.
Operations
- CSSELECT selects sampling units according to specifications given in a sample plan. Typically, the plan is created using the CSPLAN procedure.
- In general, elements are selected. If cluster sampling is performed, groups of elements are selected.
- CSSELECT assumes that the active dataset represents the sampling frame. If a multistage sample design is executed, the active dataset should contain data for all stages. For example, if you want to sample individuals within cities and city blocks, then each case should be an individual, and city and block variables should be coded for each individual. When CSSELECT is used to execute particular stages of the sample design, the active dataset should represent the subframe for those stages only.
- A case is excluded from the sample frame if it has a system-missing value for any input variable in the plan.
- You can control whether user-missing values of stratification and cluster variables are treated as valid. By default, they are treated as invalid.
- User-missing values of measure variables are always treated as invalid.
- The CSSELECT procedure has its own seed specification that is independent of the global SET command.
- First-stage joint inclusion probabilities are automatically saved to an external file when the plan file specifies a PPS without-replacement sampling method. By default, the joint probabilities file is given the same name as the plan file (with a different extension) and is written to the same location.
- Output data must be written to an external data file if with-replacement sampling is specified in the plan file.
- This procedure uses the multithreaded options specified by SET THREADS.
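A Python sketch of a single-stage cluster design shows the difference between selecting elements and selecting clusters. It assumes simple random sampling of clusters, and the helper and variable names are hypothetical. Once a cluster is chosen, every element coded with that cluster value enters the sample.

```python
import random

def cluster_sample(frame, cluster_var, n_clusters, rng):
    """One-stage cluster sample: select whole clusters, keep every element."""
    clusters = sorted({case[cluster_var] for case in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [case for case in frame if case[cluster_var] in chosen]
```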
Syntax Rules
- The PLAN subcommand is required. All other subcommands are optional.
- Only a single instance of each subcommand is allowed.
- An error occurs if an attribute or keyword is specified more than once within a subcommand.
- An error occurs if the same output file is specified for more than one subcommand.
- Equals signs shown in the syntax chart are required.
- Subcommand names and keywords must be spelled in full.
- Empty subcommands are not allowed.
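Several of these rules are mechanical enough to express as a checker. The Python sketch below is hypothetical. It works on an already-parsed command, represented as a list of `(subcommand, [(keyword, value), ...])` pairs, and it assumes that output destinations appear under an `OUTFILE` keyword. It does not model the equals-sign or full-spelling rules.

```python
def check_rules(subcommands):
    """Return rule violations for a parsed command (empty list if none)."""
    errors = []
    names = [name for name, _ in subcommands]
    if "PLAN" not in names:
        errors.append("PLAN subcommand is required")
    for name in sorted(set(names)):
        if names.count(name) > 1:
            errors.append(f"{name} specified more than once")
    outfiles = {}  # output destination -> first subcommand that used it
    for name, specs in subcommands:
        if not specs:
            errors.append(f"{name} is empty")
        keywords = [kw for kw, _ in specs]
        for kw in sorted(set(keywords)):
            if keywords.count(kw) > 1:
                errors.append(f"{kw} repeated within {name}")
        for kw, value in specs:
            if kw == "OUTFILE":  # assumed keyword for output destinations
                if value in outfiles and outfiles[value] != name:
                    errors.append(f"{value} used by {outfiles[value]} and {name}")
                outfiles.setdefault(value, name)
    return errors
```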
Limitations
- WEIGHT and SPLIT FILE settings are ignored with a warning by the CSSELECT procedure.