IBM Support

Inconsistency in the output when using the weighting procedure

Troubleshooting


Problem

Several SPSS procedures for descriptive statistics differ in which statistics they report when the sum of the case weights is less than 1.0. I have assigned noninteger weights to my data so that certain cases have greater influence than others on the statistics reported. If the sum of case weights is less than 1.0 and I request descriptive statistics, the procedures behave as follows:

Descriptives displays the range, minimum, and maximum values of the listed variable(s), but not the mean or sum.
Frequencies does not report the range, minimum, maximum, mean, or sum. It does report the table of frequencies for each observed value.
Means reports the range, minimum, maximum, mean, and sum.
Explore (EXAMINE) reports the range, minimum, maximum, and mean. The sum is not available from the Explore procedure.
Custom Tables (CTABLES) reports the range, minimum, maximum, mean, and sum.

None of the above procedures reports the variance or standard deviation when the sum of weights is less than 1.0, but this makes sense because the denominator of the variance is the sum of the case weights minus 1, and this denominator must be positive. Why do these discrepancies exist in SPSS procedures when the sum of the weights is less than 1.0?
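
The point about the variance denominator can be illustrated outside SPSS. The following is a minimal Python sketch, not SPSS code and not a description of SPSS internals: the function name weighted_stats and the sample values are hypothetical, and the only formula taken from the text above is that the variance denominator is the sum of the case weights minus 1.

```python
def weighted_stats(values, weights):
    """Return (sum of weights, weighted mean, weighted variance).

    The variance uses the denominator (sum of weights - 1) described in
    the text; it is returned as None when that denominator is not positive.
    """
    sum_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / sum_w
    sum_sq = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    denom = sum_w - 1
    variance = sum_sq / denom if denom > 0 else None
    return sum_w, mean, variance


values = [2.0, 4.0, 6.0]

# Hypothetical fractional weights whose sum (0.9) is below 1.0.
small = weighted_stats(values, [0.2, 0.3, 0.4])

# The same relative weights scaled up so that the sum (9) exceeds 1.0.
large = weighted_stats(values, [2.0, 3.0, 4.0])
```

In this sketch the weighted mean is the same for both sets of weights, because it depends only on their relative sizes, while the variance is defined only for the second set.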

Resolving The Problem

This issue was reported to SPSS Development to request that the descriptive statistic procedures be consistent in regard to which statistics (of those which are commonly available to these procedures) are reported when the sum of weights for a group, split file value, or the full data file, is less than 1.0.

However, the current behavior is by design, and changing it would likely introduce unexpected side effects in these procedures. Therefore, this design will not be changed.

Because a frequency weight variable is simply a way to specify repeated cases, fractional weights should be handled with a procedure designed for them, such as Complex Samples:



The purpose of frequency weights is to account for aggregate data where each row in the data file may represent multiple respondents, with the number of respondents represented stored in the weight variable. If the weights are intended to reflect unequal sampling probabilities due to clustering or stratification, then the Complex Samples module is the proper tool for analyses that reflect the sampling design underlying the weights and that provide proper standard errors and significance values for inferential statistics.

If the aim of your analyses is simply to provide estimates of means and proportions without reference to standard errors or inferential tests, then you might consider multiplying the weights by a constant such that the average weight equals 1.0 while the relative sizes of the weights are unaffected. With no weight variable assigned (that is, choose "Do not weight cases" in the Data->Weight Cases dialog or run the command WEIGHT OFF), run Descriptives on your current weight variable and request the sum. If WT is your current weight variable, N is the number of actual rows (or cases) in the data file, and SWT is the sum of WT, then multiplying WT by N/SWT will give you weights with an unweighted mean of 1. The Ns in your weighted descriptive statistics procedures will then reflect the actual sample size. The following commands compute and assign the new weights:

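* Turn off weighting so the aggregates below are unweighted.
WEIGHT OFF.
* Constant break variable, so that every case falls in one group.
COMPUTE brk = 1.
* Add the sum of wt and the number of cases to every row.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=brk
  /wt_sum=SUM(wt)
  /N_BREAK=N.

* Rescale so that the unweighted mean of the new weight is 1.
COMPUTE wt_av1 = wt * N_BREAK / wt_sum.
EXECUTE.
WEIGHT BY wt_av1.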
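
The arithmetic behind these commands can be checked with a few numbers. The following is a minimal Python sketch of the same N/SWT rescaling, not SPSS syntax: the function name rescale_weights and the sample weights are hypothetical, and wt and wt_av1 simply mirror the variable names used in the commands above.

```python
def rescale_weights(weights):
    """Multiply each weight by N / SWT so the unweighted mean weight is 1."""
    n = len(weights)          # N: number of rows (cases)
    total = sum(weights)      # SWT: sum of the original weights
    return [w * n / total for w in weights]


# Hypothetical weights whose sum (0.8) is below 1.0.
wt = [0.1, 0.2, 0.3, 0.2]
wt_av1 = rescale_weights(wt)

mean_weight = sum(wt_av1) / len(wt_av1)   # close to 1.0
sum_weight = sum(wt_av1)                  # close to the number of cases, 4
```

Because every weight is multiplied by the same constant, ratios between weights are preserved: in this example the third case still carries three times the weight of the first.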

If you already have a split file variable and plan to run separate analyses by split file group, you can use that variable in place of brk in the commands above. The (unweighted) average weight in each split file group will equal 1.
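
The per-group version of the rescaling can be sketched the same way. This is an illustrative Python example under stated assumptions, not SPSS syntax: the function name rescale_by_group, the group labels, and the weights are all hypothetical. Each weight is multiplied by the case count of its group divided by the weight sum of its group.

```python
from collections import defaultdict


def rescale_by_group(groups, weights):
    """Rescale weights so the unweighted mean weight is 1 within each group."""
    totals = defaultdict(float)   # sum of weights per group
    counts = defaultdict(int)     # number of cases per group
    for g, w in zip(groups, weights):
        totals[g] += w
        counts[g] += 1
    return [w * counts[g] / totals[g] for g, w in zip(groups, weights)]


# Hypothetical split file groups and fractional weights.
groups = ["a", "a", "b", "b", "b"]
weights = [0.1, 0.3, 0.2, 0.2, 0.6]
new_weights = rescale_by_group(groups, weights)

# Unweighted mean of the new weights within each group.
mean_a = sum(new_weights[:2]) / 2
mean_b = sum(new_weights[2:]) / 3
```

With per-group rescaling, the sum of the new weights in each group matches the number of cases in that group, so the Ns reported for each split file group reflect its actual size.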


Historical Number

77565

Document Information

Modified date:
16 April 2020

UID

swg21478331