• Latest Post - ‏2012-12-05T14:21:39Z by smithha
294 Posts

# Create Data Sample - 95% Confidence Chart Meaning

2012-12-04T23:03:02Z
All -

In the Key and Cross Domain analysis section, there is an option to create a data sample. When you view this option, a table appears listing the Prop., Ratio, and the different sample sizes.

Can anyone help me understand what the columns 'Prop' and 'Ratio' mean?

Attached is an image for your convenience.

Thanks,
Kimberly

• smithha
161 Posts

#### Re: Create Data Sample - 95% Confidence Chart Meaning

‏2012-12-05T14:19:36Z
Hi Kimberly,

Prop. is the proportion of defects in a given sample.
Ratio is the ratio of defect-free records for the sample.

The following describes more of the statistical method used -- I'm not a statistician, but hopefully can highlight what you are looking at in the table.

The fundamental approach of the statistical sampling methodology is to use the percentage of defect-free records in a relatively small sample to estimate, within a narrow range, the actual percentage of defect-free records in the full data environment. In statistics, this is referred to as making a statistical inference about the proportion (P) of an infinite population (N) from the tested proportion (p) of the sample (n).

Given the standard error of the proportion (SE) for the sample, you can estimate what percentage of records would be free of critical data defects in the full data environment. The extrapolation can be made as follows:
for a 95% confidence interval – proportion (P) = p ± 2*(SE)
for a 99% confidence interval – proportion (P) = p ± 2.5*(SE)

The standard error (SE) of the proportion is equal to the square root of p(1 – p)/n.
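
As a side note on where the multipliers come from: the 2 and 2.5 above appear to be rounded quantiles of the standard normal distribution. The short Python sketch below looks them up. The function name `z_multiplier` is my own choice for illustration and is not something the product exposes.

```python
from statistics import NormalDist


def z_multiplier(confidence: float) -> float:
    """Two-sided standard normal multiplier for a confidence level such as 0.95."""
    tail = (1.0 - confidence) / 2.0  # probability left in each tail
    return NormalDist().inv_cdf(1.0 - tail)


for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: multiplier = {z_multiplier(level):.3f}")
# 95% confidence: multiplier = 1.960
# 99% confidence: multiplier = 2.576
```

So 2 is a slightly conservative stand-in for 1.96, while 2.5 is a little below 2.576.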

From a random sample of 5,000 records, say that 4,900 (98%) are defect-free (and 100 records or 2% of the records have defects).
When the values are supplied, the formula for SE is the square root of .98(1 – .98)/5000.
This evaluates to the square root of .0196/5000 or .0020, so that is the standard error of the proportion (SE).
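
If you want to check that arithmetic yourself, here is a minimal Python sketch of the same calculation. The helper name `standard_error` is just for illustration.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p for a sample of n records."""
    return math.sqrt(p * (1.0 - p) / n)


defect_free = 4900   # defect-free records found in the sample
sample_size = 5000   # records in the random sample
p = defect_free / sample_size

se = standard_error(p, sample_size)
print(f"p = {p:.2f}, SE = {se:.4f}")
# p = 0.98, SE = 0.0020
```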

To work within the 95% confidence interval (as on the table), use the formula for the population noted above where your ratio is 98%:
P = p ± 2*(SE) or P = .98 ± 2*(.0020) or P = .98 ± .0040 (that is, 98% ± 0.40%)

Note: .0040 is the number you see in the chart for 95% confidence, 5000 record sample, at 98% defect-free

In this example, you can be 95% confident that the actual percentage of defect-free records in the full data environment is between 97.60% and 98.40%.
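
The same range can be reproduced with a few lines of Python. This is only a sketch of the formula described above, using the rounded multiplier of 2; `confidence_interval` is an illustrative name, not a product function.

```python
import math


def confidence_interval(p: float, n: int, multiplier: float = 2.0) -> tuple[float, float]:
    """Return (low, high) for p plus or minus multiplier * SE."""
    se = math.sqrt(p * (1.0 - p) / n)  # standard error of the proportion
    margin = multiplier * se
    return p - margin, p + margin


low, high = confidence_interval(0.98, 5000)
print(f"{low:.2%} to {high:.2%}")
# 97.60% to 98.40%
```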
The table that appears provides the sample size required to produce a desired plus-or-minus range at the 95% confidence interval for an assumed proportion of defect-free records (Ratio) or, conversely, for an assumed proportion of defective records (Prop.).
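
To make the table easier to picture, here is a rough Python sketch that builds a similar grid of plus-or-minus values and also solves the formula for the sample size. The ratios and sample sizes are hypothetical values I picked for illustration, and the layout is my assumption; the actual table in the product may use different rows and columns.

```python
import math

MULTIPLIER = 2.0  # rounded 95% multiplier from the post (the exact value is about 1.96)


def margin(ratio: float, n: int) -> float:
    """Plus-or-minus range at roughly 95% confidence for a sample of n records."""
    return MULTIPLIER * math.sqrt(ratio * (1.0 - ratio) / n)


def required_sample_size(ratio: float, desired_margin: float) -> int:
    """Approximate sample size for a desired margin (formula solved for n, rounded up)."""
    return math.ceil(MULTIPLIER ** 2 * ratio * (1.0 - ratio) / desired_margin ** 2)


# Hypothetical grid; the real table's rows and columns may differ.
ratios = (0.90, 0.95, 0.98)
sample_sizes = (1000, 5000, 10000)

print("Prop.  Ratio  " + "  ".join(f"n={n:<6}" for n in sample_sizes))
for ratio in ratios:
    cells = "  ".join(f"{margin(ratio, n):.4f}  " for n in sample_sizes)
    print(f"{1.0 - ratio:.2f}   {ratio:.2f}   {cells}")

print(required_sample_size(0.98, 0.01))
```

Because p(1 – p) is the same whether p is the defect-free ratio or the defect proportion, one row can serve both the Prop. and Ratio columns.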

Hope that helps.

Harald
• smithha
161 Posts

#### Re: Create Data Sample - 95% Confidence Chart Meaning

‏2012-12-05T14:21:39Z
I always forget that adding two hyphens creates a strikethrough of the text. The line with the strikethrough should read:

Note: .0040 is the number you see in the chart for 95% confidence, 5000 record sample, at 98% defect-free