Count, Valid N, and Missing Values

It is often useful to display the number of cases used to compute summary statistics, such as the mean, and you might reasonably assume that the summary statistic Count provides that information. However, Count includes cases with missing values, so it overstates the case base whenever any values are missing. To obtain an accurate case base, use Valid N.

  1. Open the table builder (Analyze menu, Tables, Custom Tables).
  2. Right-click any one of the three scale variables in the table preview on the canvas pane and select Summary Statistics from the pop-up menu.
  3. In the Summary Statistics dialog box, select Count in the Statistics list and click the arrow to add it to the Display list.
  4. Then select Valid N in the Statistics list and click the arrow to add it to the Display list.
  5. Click Apply to All to apply these changes to all three scale variables.
  6. Click OK in the table builder to create the table.
    Figure 1. Count versus Valid N

    For all three variables, Count is the same: 2,832. This is no coincidence: because the scale variables aren't nested within any categorical variables, Count simply represents the total number of cases in the data file.

    Valid N, on the other hand, is different for each variable and differs quite a lot from Count for Hours per day watching TV. This is because the variable has a large number of missing values: cases with no value recorded, or with values defined as representing missing data (such as a code of 99 to represent Not Applicable for pregnancy in males).
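
The distinction between Count and Valid N can be sketched outside the table builder. The following Python fragment is only an illustration, not how Custom Tables computes its statistics: the function name, the sample values, the use of None for a value that was never recorded, and the treatment of 99 as a user-defined missing code are all assumptions made for the example.

```python
def count_and_valid_n(values, user_missing=()):
    """Return (count, valid_n) for one scale variable.

    count   -- every case, whether or not it has a usable value
    valid_n -- only cases with a recorded, non-missing value
    """
    count = len(values)
    valid_n = sum(
        1 for v in values
        if v is not None and v not in user_missing
    )
    return count, valid_n


# Hypothetical data: None = no value recorded, 99 = assumed missing code.
tv_hours = [2, None, 4, 99, None, 1, 3, None]

count, valid_n = count_and_valid_n(tv_hours, user_missing={99})

# A mean should be based on the valid cases only, so its case base is valid_n.
valid_values = [v for v in tv_hours if v is not None and v != 99]
mean_tv_hours = sum(valid_values) / valid_n

print(f"Count={count}, Valid N={valid_n}, Mean={mean_tv_hours}")
```

In this sketch, Count reports all eight cases, while the mean is computed from only the four valid ones, which is why Valid N is the more accurate case base.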

  1. Open the table builder (Analyze menu, Tables, Custom Tables).
  2. Right-click any one of the three scale variables in the table preview on the canvas pane and select Summary Statistics from the pop-up menu.
  3. In the Summary Statistics dialog box, select Valid N in the Display list and click the arrow to move it back to the Statistics list, removing it from the Display list.
  4. Select Count in the Display list and click the arrow to move it back to the Statistics list, removing it from the Display list.
  5. Select Missing in the Statistics list and click the arrow to add it to the Display list.
  6. Click Apply to All to apply these changes to all three scale variables.
  7. Click OK in the table builder to create the table.
Figure 2. Number of missing values displayed in table of scale summary statistics

The table now displays the number of missing values for each scale variable. This makes it quite apparent that Hours per day watching TV has a large number of missing values, whereas the other two variables have very few. This may be a factor to consider before putting a great deal of faith in the summary values for that variable.
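
As a rough way to judge how much weight a summary value deserves, the number of missing values can be set against the total number of cases for each variable. The Python sketch below assumes hypothetical variable names and data, uses None for an unrecorded value, and treats 99 as a user-defined missing code for one variable; none of these come from the data file used above.

```python
def missing_summary(columns, user_missing=None):
    """Return missing count, valid N, and percent missing per variable."""
    user_missing = user_missing or {}
    summary = {}
    for name, values in columns.items():
        codes = user_missing.get(name, set())
        # A case is missing if nothing was recorded or its value is a
        # user-defined missing code for this variable.
        missing = sum(1 for v in values if v is None or v in codes)
        total = len(values)
        summary[name] = {
            "missing": missing,
            "valid_n": total - missing,
            "pct_missing": 100.0 * missing / total if total else 0.0,
        }
    return summary


# Hypothetical data for two scale variables.
data = {
    "tv_hours": [2, None, 4, 99, None, 1, 3, None],
    "age": [34, 51, None, 28, 45, 62, 39, 57],
}

summary = missing_summary(data, user_missing={"tv_hours": {99}})

for name, row in summary.items():
    print(
        f"{name}: missing={row['missing']}, "
        f"valid_n={row['valid_n']}, "
        f"pct_missing={row['pct_missing']:.1f}"
    )
```

Here half of the hypothetical tv_hours cases are missing, compared with one in eight for age, which mirrors the kind of contrast the table makes visible.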