Structure of the Joint Probabilities File
Complex Samples analysis procedures expect the following variables in the joint probabilities file, in the order listed below. Any other variables beyond the joint probability variables are silently ignored.
- Stratification variables. These are the stratification variables used in the first stage of sampling. If there is no stratification in the first stage, no stratification variables are included in the file.
- Cluster variables. These are variables used to identify each primary sampling unit (PSU) within a stratum. At least one cluster variable is always included, since it is required for all selection methods that generate the joint probabilities as well as for the estimation method using them.
- System PSU id. This variable labels PSUs within a stratum. The variable name used is Unit_No_.
- Joint probability variables. These variables store the joint inclusion probabilities for each pair of units. The default names of these variables have the form Joint_Prob_n_; for example, the joint inclusion probability of the 2nd and 3rd units is the value located at case 2 of Joint_Prob_3_ or, equivalently, at case 3 of Joint_Prob_2_. Because the analysis procedures extract joint probabilities by location, you can safely rename these variables. Within each stratum, the joint inclusion probabilities form a square symmetric matrix. Joint inclusion probabilities apply only to pairs of distinct units, that is, to the off-diagonal entries; the diagonal elements hold the first-stage inclusion probabilities of the individual units. The number of joint inclusion probability variables equals the maximum sample size across all strata.
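The layout described above can be sketched in Python. This is only an illustration under stated assumptions: the rows, the cluster labels, and every probability value are made up, and the helper names joint_matrix and is_symmetric are hypothetical rather than part of any Complex Samples procedure. Each row lists one stratification variable, one cluster variable, Unit_No_, and then the joint probability variables, and the matrix is read by column position rather than by variable name.

```python
# Hypothetical cases for ONE stratum with 3 sampled PSUs.
# Column order: stratum, cluster, Unit_No_, then the joint probability variables.
# All values are invented for illustration only.
rows = [
    [1, "A", 1, 0.30, 0.10, 0.12],
    [1, "B", 2, 0.10, 0.40, 0.15],
    [1, "C", 3, 0.12, 0.15, 0.50],
]


def joint_matrix(stratum_rows, n_strat_vars, n_cluster_vars):
    """Build the square joint inclusion probability matrix for one stratum.

    Columns are located by position, so the variable names are irrelevant.
    """
    unit_col = n_strat_vars + n_cluster_vars   # position of Unit_No_
    first_jp = unit_col + 1                    # first joint probability column
    ordered = sorted(stratum_rows, key=lambda r: r[unit_col])
    n = len(ordered)                           # number of PSUs in this stratum
    return [list(r[first_jp:first_jp + n]) for r in ordered]


def is_symmetric(m, tol=1e-12):
    """Check that entry (i, j) equals entry (j, i) for every pair."""
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))


matrix = joint_matrix(rows, n_strat_vars=1, n_cluster_vars=1)

# Diagonal entries play the role of first-stage inclusion probabilities.
diagonal = [matrix[i][i] for i in range(len(matrix))]

# The joint probability of units 2 and 3 can be read from either position.
pair_2_3 = matrix[1][2]
print(is_symmetric(matrix), diagonal, pair_2_3)
```

Reading by position is what makes renaming the variables harmless in this sketch: only the number of stratification and cluster variables determines where the matrix starts.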
Example

The file poll_jointprob.sav contains first-stage joint probabilities for selected townships within counties. County is a first-stage stratification variable, and Township is a cluster variable. Combinations of these variables identify all first-stage PSUs uniquely. Unit_No_ labels PSUs within each stratum and is used to match up with Joint_Prob_1_, Joint_Prob_2_, Joint_Prob_3_, Joint_Prob_4_, and Joint_Prob_5_. The first two strata each have 4 PSUs; therefore, the joint inclusion probability matrices are 4×4 for these strata, and the Joint_Prob_5_ column is left empty for these rows. Similarly, strata 3 and 5 have 3×3 joint inclusion probability matrices, and stratum 4 has a 5×5 joint inclusion probability matrix.
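The matrix sizes in this example follow directly from the number of PSUs per stratum. The short Python sketch below uses only the counts stated above; the variable names psus_per_stratum and empty_columns are hypothetical. It assumes that, within a stratum of n PSUs, only the first n joint probability variables are filled and the rest are left empty, which is stated above for Joint_Prob_5_ in the first two strata and inferred for the others.

```python
# PSU counts per stratum, as stated in the example text.
psus_per_stratum = {1: 4, 2: 4, 3: 3, 4: 5, 5: 3}

# The number of joint probability variables is set by the largest stratum.
n_joint_vars = max(psus_per_stratum.values())
names = [f"Joint_Prob_{k}_" for k in range(1, n_joint_vars + 1)]

# For each stratum, the columns beyond its own matrix size are assumed empty.
empty_columns = {s: names[n:] for s, n in psus_per_stratum.items()}

for stratum, n in psus_per_stratum.items():
    unused = ", ".join(empty_columns[stratum]) or "none"
    print(f"Stratum {stratum}: {n}x{n} matrix, empty columns: {unused}")
```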
The need for a joint probabilities file becomes clear when you inspect the values of the joint inclusion probability matrices. When the sampling method is not a PPS WOR method, the selection of a PSU is independent of the selection of another PSU, and their joint inclusion probability is simply the product of their inclusion probabilities. In contrast, the joint inclusion probability for Townships 9 and 10 of County 1 is approximately 0.11 (see the first case of Joint_Prob_3_ or the third case of Joint_Prob_1_), which is less than the product of their individual inclusion probabilities (the product of the first case of Joint_Prob_1_ and the third case of Joint_Prob_3_ is 0.31 × 0.44 = 0.1364).
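The comparison for Townships 9 and 10 can be checked with a few lines of Python. The three probabilities are the rounded values quoted above (the joint probability is only approximate), and the variable names are hypothetical. The last quantity, the joint probability minus the product, is the covariance of the two selection indicators; it would be zero if the selections were independent.

```python
# Rounded values quoted in the example for County 1.
pi_9 = 0.31      # first case of Joint_Prob_1_ (inclusion probability, Township 9)
pi_10 = 0.44     # third case of Joint_Prob_3_ (inclusion probability, Township 10)
pi_joint = 0.11  # approximate joint inclusion probability of the pair

# Joint probability the pair would have if the selections were independent.
product = pi_9 * pi_10

# Covariance of the two selection indicators: zero under independence.
covariance = pi_joint - product

print(round(product, 4))     # 0.1364
print(pi_joint < product)    # True
print(round(covariance, 4))  # -0.0264
```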