Create Data Sample - 95% Confidence Chart Meaning Replies (IBM Connections - Discussion Forum)
https://www.ibm.com/developerworks/community/forums/atom/replies?topicUuid=77777777-0000-0000-0000-000014916205

Re: Create Data Sample - 95% Confidence Chart Meaning (smithha, 2012-12-05T14:21:39.887Z)
I always forget that adding two hyphens creates a strikethrough of the text. The line with the strikethrough should read:<br />
<br />
Note: .0040 is the number you see in the chart for 95% confidence, 5000 record sample, at 98% defect-free

Re: Create Data Sample - 95% Confidence Chart Meaning (smithha, 2012-12-05T14:19:36.953Z)
Hi Kimberly,<br />
<br />
Prop. is the proportion of defects in a given sample.<br />
Ratio is the ratio of defect-free records for the sample.<br />
<br />
The following describes more of the statistical method used -- I'm not a statistician, but hopefully can highlight what you are looking at in the table.<br />
<br />
The fundamental approach of the statistical sampling methodology is to use the percentage of defect-free records in a relatively small sample to determine, within a narrow range, the actual percentage of defect-free records in the full data environment. In statistics, this is referred to as making a statistical inference for the proportion (P) of an infinite population (N) from the tested proportion (p) of the sample (n).<br />
<br />
Given the standard error of the proportion (SE) for the sample, you can estimate what percentage of records would be free of critical data defects in the full data environment. The extrapolation can be made as follows:<br />
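for a 95% confidence interval – proportion (P) = p ± 2*(SE), where 2 rounds the normal-distribution multiplier of 1.96<br />
for a 99% confidence interval – proportion (P) = p ± 2.5*(SE), where 2.5 is a rough stand-in for the multiplier of 2.576<br />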
<br />
The standard error (SE) of the proportion is equal to the square root of p(1 – p)/n.<br />
<br />
From a random sample of 5,000 records, say that 4,900 (98%) are defect-free (and 100 records or 2% of the records have defects). <br />
When the values are supplied, the formula for SE is the square root of .98(1 – .98)/5000.<br />
This evaluates to the square root of .0196/5000 or .0020, so that is the standard error of the proportion (SE).<br />
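For anyone who wants to check the arithmetic, here is a minimal Python sketch of the SE calculation above. The function name standard_error is my own for illustration and is not part of the product.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)

# Values from the example: 4,900 of 5,000 sampled records are defect-free.
n = 5000
p = 4900 / n
se = standard_error(p, n)
print(round(se, 4))  # 0.002
```

The unrounded value is about .00198, which is why it shows as .0020 at four decimal places.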
<br />
To work within the 95% confidence interval (as on the table), use the formula for the population noted above where your ratio is 98%:<br />
P = p ± 2*(SE) or P = 98% ± 2*(.0020) or P = 98% ± .0040 <br />
<br />
<strike>Note: .0040 is the number you see in the chart for 95% confidence, 5000 record sample, at 98% defect-free</strike><br />
<br />
In this example, there is a 95% probability that the actual percentage of defect-free records in the full data environment is from 97.60% to 98.40%.<br />
The table that appears provides the sample size required to produce a desired plus or minus range at the 95% confidence interval for an assumed proportion of defect-free records (ratio), or conversely for an assumed proportion of defective records (prop.).<br />
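The relationship between sample size and plus or minus range can be sketched in Python as follows. This is only an illustration under my own assumptions: it uses the multiplier of 2 from the 95% formula above, the function names margin and required_sample_size are hypothetical, and I have not confirmed that the product builds its table this way.

```python
import math

def margin(p: float, n: int, multiplier: float = 2.0) -> float:
    """Plus-or-minus range: multiplier * sqrt(p * (1 - p) / n)."""
    return multiplier * math.sqrt(p * (1.0 - p) / n)

def required_sample_size(p: float, desired_margin: float,
                         multiplier: float = 2.0) -> int:
    """Approximate sample size for a desired range (margin formula solved for n)."""
    return math.ceil((multiplier / desired_margin) ** 2 * p * (1.0 - p))

# Interval for the worked example: 98% defect-free in a 5,000 record sample.
m = margin(0.98, 5000)
print(f"{0.98 - m:.2%} to {0.98 + m:.2%}")  # 97.60% to 98.40%

# Going the other way: roughly how many records give a range of about .0040?
print(required_sample_size(0.98, 0.004))
```

Because p(1 - p) is the same for p = .98 and p = .02, the result is identical whether you start from the ratio (defect-free) or the prop. (defective) value.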
<br />
Hope that helps.<br />
<br />
Harald