Overview (CSSELECT command)
CSSELECT selects complex, probability-based samples from a population. CSSELECT selects units according to a sample design created using the CSPLAN procedure.
Options
Scope of Execution. By default, CSSELECT executes all stages defined in the sampling plan. Optionally, you can execute specific stages of the design. This capability is useful if a full sampling frame is not available at the outset of the sampling process, in which case new stages can be sampled as their frames become available. For example, CSSELECT might first be used to sample cities, then to sample blocks, and finally to sample individuals. Each time, a different stage of the sampling plan is executed.
Seed. By default, a random seed value is used by the CSSELECT random number generator. You can specify a seed to ensure that the same sample will be drawn when CSSELECT is invoked repeatedly using the same sample plan and population frame. The CSSELECT seed value is independent of the global seed specified via the SET command.
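The effect of a fixed seed can be illustrated outside SPSS. The following Python sketch is not the CSSELECT random number generator; the function draw_sample and the toy frame are hypothetical. It only shows that a private generator seeded with the same value returns the same simple random sample, whatever the global random state is.

```python
import random

def draw_sample(frame, n, seed=None):
    """Draw a simple random sample of n units without replacement.

    A private generator is used, so the result does not depend on the
    module-level (global) random state. This loosely mirrors a
    procedure-specific seed. seed=None requests a nondeterministic seed.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)

frame = list(range(1, 101))  # hypothetical sampling frame of 100 unit IDs

first = draw_sample(frame, 10, seed=12345)
random.seed(999)  # changing the global seed does not affect draw_sample
second = draw_sample(frame, 10, seed=12345)

print(first == second)
```

With the same frame, sample size, and seed, the two draws are identical. Omitting the seed would normally give a different sample on each call.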
Missing Values. A case is excluded from the sample frame if it has a system-missing value for any input variable in the plan file. You can control whether user-missing values of stratification and cluster variables are treated as invalid. User-missing values of measure variables are always treated as invalid.
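The missing-value rules can be read as a filter applied to the frame before selection. The Python sketch below is an illustration only, not CSSELECT code: the function eligible, the flag include_class_missing, and the sample cases are hypothetical, and None stands in for a system-missing value. User-missing stratification and cluster values are treated as invalid unless the flag is set.

```python
SYSMIS = None  # stand-in for a system-missing value

def eligible(case, class_vars, measure_vars, user_missing,
             include_class_missing=False):
    """Return True if the case stays in the sampling frame."""
    # System-missing on any input variable always excludes the case.
    for var in class_vars + measure_vars:
        if case[var] is SYSMIS:
            return False
    # User-missing measure values are always invalid.
    for var in measure_vars:
        if case[var] in user_missing.get(var, ()):
            return False
    # User-missing stratification/cluster values are invalid unless allowed.
    if not include_class_missing:
        for var in class_vars:
            if case[var] in user_missing.get(var, ()):
                return False
    return True

cases = [
    {"region": 1, "size": 250},       # valid on every variable
    {"region": 9, "size": 120},       # 9 is user-missing for region
    {"region": 2, "size": -1},        # -1 is user-missing for size
    {"region": SYSMIS, "size": 80},   # system-missing region
]
user_missing = {"region": {9}, "size": {-1}}

strict = [c for c in cases
          if eligible(c, ["region"], ["size"], user_missing)]
lenient = [c for c in cases
           if eligible(c, ["region"], ["size"], user_missing,
                       include_class_missing=True)]
print(len(strict), len(lenient))
```

Only the second case changes status between the two runs: it is dropped under the strict setting and kept when user-missing stratification values are allowed.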
Input Data. If the sampling frame is sorted in advance, you can specify that the data are presorted, which may improve performance when stratification and/or clustering is requested for a large sampling frame.
Sample Data. CSSELECT writes data to the active dataset (the default) or an external file. Regardless of the data destination, CSSELECT generates final sampling weights, stagewise inclusion probabilities, stagewise cumulative sampling weights, as well as variables requested in the sampling plan.
External files or datasets produced by CSSELECT include selected cases only. By default, all variables in the active dataset are copied to the external file or dataset. Optionally, you can specify that only certain variables are to be copied.
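The relationship among the generated variables can be sketched with a small calculation. The Python example below assumes a without-replacement design in which a weight is the reciprocal of an inclusion probability, so the cumulative weight through stage k is the reciprocal of the product of the inclusion probabilities for stages 1 to k. This is the usual design-based relation, stated here as an assumption rather than as a description of CSSELECT internals; the function name and the probabilities are hypothetical.

```python
from math import prod  # available in Python 3.8 and later

def stagewise_weights(inclusion_probs):
    """Return the cumulative sampling weight after each stage.

    inclusion_probs[k] is the inclusion probability of the unit at
    stage k + 1, conditional on selection at the earlier stages.
    """
    weights = []
    for k in range(1, len(inclusion_probs) + 1):
        weights.append(1.0 / prod(inclusion_probs[:k]))
    return weights

# Hypothetical three-stage design: cities, blocks, individuals.
probs = [0.1, 0.25, 0.5]
cumulative = stagewise_weights(probs)
final_weight = cumulative[-1]  # weight after the last stage

print(cumulative, final_weight)
```

Under these assumptions the cumulative weights are approximately 10, 40, and 80, and the final sampling weight equals the last cumulative weight.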
Joint Probabilities. First-stage joint inclusion probabilities are automatically saved to an external file when the plan file specifies a PPS without-replacement sampling method. Joint probabilities are used by Complex Samples analysis procedures, such as CSDESCRIPTIVES and CSTABULATE. You can control the name and location of the joint probabilities file.
Output. By default, CSSELECT displays the distribution of selected cases by stratum. Optionally, you can display a case-processing summary.
Basic Specification
- The basic specification is a PLAN subcommand that specifies a sample design file.
- By default, CSSELECT writes output data to the active dataset, including final sample weights, stagewise cumulative weights, and stagewise inclusion probabilities. See the CSPLAN command for a description of available output variables.
Operations
- CSSELECT selects sampling units according to specifications given in a sample plan. Typically, the plan is created using the CSPLAN procedure.
- In general, elements are selected. If cluster sampling is performed, groups of elements are selected.
- CSSELECT assumes that the active dataset represents the sampling frame. If a multistage sample design is executed, the active dataset should contain data for all stages. For example, if you want to sample individuals within cities and city blocks, then each case should be an individual, and city and block variables should be coded for each individual. When CSSELECT is used to execute particular stages of the sample design, the active dataset should represent the subframe for those stages only.
- A case is excluded from the sample frame if it has a system-missing value for any input variable in the plan.
- You can control whether user-missing values of stratification and cluster variables are treated as valid. By default, they are treated as invalid.
- User-missing values of measure variables are always treated as invalid.
- The CSSELECT procedure has its own seed specification, which is independent of the global seed set with the SET command.
- First-stage joint inclusion probabilities are automatically saved to an external file when the plan file specifies a PPS without-replacement sampling method. By default, the joint probabilities file is given the same name as the plan file (with a different extension) and is written to the same location.
- Output data must be written to an external data file if with-replacement sampling is specified in the plan file.
- This procedure uses the multithreaded options specified by SET THREADS.
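The frame layout described above for multistage designs (one case per individual, with city and block coded on every case) can be sketched as follows. The Python example is a simplified illustration, not the CSSELECT algorithm: it uses simple random sampling with fixed counts at every stage, and the helper sample_groups, the frame contents, and the sample sizes are all hypothetical.

```python
import random

def sample_groups(rows, key, n, rng):
    """Pick up to n distinct values of key(row); keep the rows that carry them."""
    groups = sorted({key(r) for r in rows})
    chosen = set(rng.sample(groups, min(n, len(groups))))
    return [r for r in rows if key(r) in chosen]

# One row per individual; city and block are coded on every row.
frame = [
    {"city": c, "block": b, "person": p}
    for c in ("A", "B", "C", "D")
    for b in (1, 2, 3)
    for p in range(1, 6)
]

rng = random.Random(2024)

# Stage 1: sample 2 of the 4 cities.
stage1 = sample_groups(frame, lambda r: r["city"], 2, rng)

# Stage 2: within each selected city, sample 2 of its 3 blocks.
stage2 = []
for city in sorted({r["city"] for r in stage1}):
    rows = [r for r in stage1 if r["city"] == city]
    stage2 += sample_groups(rows, lambda r: r["block"], 2, rng)

# Stage 3: within each selected block, sample 3 of its 5 individuals.
stage3 = []
for city, block in sorted({(r["city"], r["block"]) for r in stage2}):
    rows = [r for r in stage2 if (r["city"], r["block"]) == (city, block)]
    stage3 += rng.sample(rows, 3)

print(len(stage1), len(stage2), len(stage3))  # prints: 30 20 12
```

Each stage narrows the rows passed to the next one, which is why the data for a later stage only need to cover the units selected so far when stages are executed separately.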
Syntax Rules
- The PLAN subcommand is required. All other subcommands are optional.
- Only a single instance of each subcommand is allowed.
- An error occurs if an attribute or keyword is specified more than once within a subcommand.
- An error occurs if the same output file is specified for more than one subcommand.
- Equals signs shown in the syntax chart are required.
- Subcommand names and keywords must be spelled in full.
- Empty subcommands are not allowed.
Limitations
- The CSSELECT procedure ignores WEIGHT and SPLIT FILE settings and issues a warning.