Murray Cantor: Rational business analytics and optimization (IBM Connections blog, entries tagged "bayesian")
Introduction to Prediction Paper (mcantor@us.ibm.com, 2013-05-31)
<p dir="ltr">
An ongoing theme of this blog is that development processes differ from other business processes in that there is a wide range of uncertainty inherent in the efforts. It follows that tracking and steering development efforts entails ongoing <strong>prediction</strong>, from the evolving project information, of when a project is likely to meet its goals. </p>
<p dir="ltr">
Late last year, Nate Silver, author of <a href="http://fivethirtyeight.blogs.nytimes.com">the FiveThirtyEight blog</a> and a well-known predictor of elections, published <a href="http://www.amazon.com/The-Signal-Noise-Many-Predictions/dp/159420411X/ref=tmm_hrd_title_0?ie=UTF8&qid=1370004316&sr=8-1">The Signal and the Noise</a>, a book for the intelligent layperson on how prediction works. I was impressed by the book: it explains the principles behind the sort of Bayesian analytics we need for development analytics without any explicit math. However, I felt that folks in our field would benefit greatly from having the mathematical blanks filled in. So I decided to write a series of papers introducing the topic to readers who had some statistics, and maybe some calculus, in college but no solid background in prediction principles. </p>
<p dir="ltr">
The first in the series is now online: <a href="http://www.ibm.com/developerworks/rational/library/filling-in-the-blanks-1/">Filling in the blanks: The math behind Nate Silver's "The Signal and the Noise" Part 1.</a> It presents the very basics of Bayesian analysis. </p>
<p dir="ltr">
I hope you all find it useful and especially hope you find it interesting.</p>
Garbage in, Garbage out (mcantor@us.ibm.com, 2011-10-12)
<style>@font-face {
font-family: "ＭＳ 明朝";
}@font-face {
font-family: "Cambria";
}p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: Cambria; }a:link, span.MsoHyperlink { color: blue; text-decoration: underline; }a:visited, span.MsoHyperlinkFollowed { color: purple; text-decoration: underline; }.MsoChpDefault { font-size: 10pt; font-family: Cambria; }div.WordSection1 { page: WordSection1; }</style>
<p class="MsoNormal"> </p>
<p class="MsoNormal">One of the common criticisms of estimation methods is that
the calculation is no better than the assumptions: <i style="">garbage in, garbage out</i> (affectionately known as <i style="">GIGO</i>). That is, if you make poor or
dishonest assumptions, you will get misleading forecasts. Especially egregious
is the occasional case where someone games the system, intentionally feeding in
assumptions that produce false forecasts in order to get a desired business decision. </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">However, estimation is an essential part of any disciplined
funding decision process (such as program portfolio management), because funding
decisions rely on estimates of costs and benefits. Yet, for the reasons just described,
those estimates are suspect.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">So, what to do? I suggest the answer is not to abandon
estimation; the answer is not to input garbage or, if you do, to detect it as
soon as possible to minimize the damage.</p>
<p class="MsoNormal"><span style=""> </span></p>
<p class="MsoNormal">First, note that future costs and benefits are uncertain,
so any serious approach to the GIGO
problem must treat the assumptions as <a href="http://www.stats.gla.ac.uk/steps/glossary/probability_distributions.html#randvar">random variables</a> with <a href="http://en.wikipedia.org/wiki/Probability_distribution">probability distributions</a> and work from there. This lets one use the limited information at hand to enter the assumptions and calculate
the forecasts. <br /></p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Douglas Hubbard, in <a href="http://www.howtomeasureanything.com/" style="font-style: italic;">How to Measure Anything</a>, gives us one way to proceed. Briefly, when an uncertain
value is needed, ask the subject matter expert (SME) to give not one but three values:
low, high, and most likely. The three values may be used to specify random
variables with triangular distributions [ref].</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">In this case, the greater the difference between the high
and low values, the wider the triangular distribution of the estimate, reflecting
the uncertainty of an SME who is honestly making the assumptions. </p>
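<p class="MsoNormal"> </p>
<p class="MsoNormal">To make the three-point idea concrete, here is a minimal Python sketch, assuming the standard library's <code>random.triangular</code> for sampling. The class name <code>ThreePointEstimate</code> is hypothetical, not something from Hubbard's book. The mean and standard-deviation formulas are the standard ones for a triangular distribution; the standard deviation shows numerically how a wider low-to-high spread yields a wider distribution.</p>

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePointEstimate:
    """Hypothetical container for an SME's low / most-likely / high judgment."""
    low: float
    most_likely: float
    high: float

    def __post_init__(self) -> None:
        if not (self.low <= self.most_likely <= self.high):
            raise ValueError("expected low <= most_likely <= high")

    def mean(self) -> float:
        # Mean of a triangular distribution: (low + mode + high) / 3.
        return (self.low + self.most_likely + self.high) / 3

    def std_dev(self) -> float:
        # Standard triangular variance (a^2 + b^2 + c^2 - ab - ac - bc) / 18;
        # it grows with the low-to-high spread, i.e. with the SME's uncertainty.
        a, c, b = self.low, self.most_likely, self.high
        return math.sqrt((a * a + b * b + c * c - a * b - a * c - b * c) / 18)

    def sample(self, rng: random.Random) -> float:
        # random.triangular takes (low, high, mode), in that order.
        return rng.triangular(self.low, self.high, self.most_likely)


if __name__ == "__main__":
    narrow = ThreePointEstimate(12, 14, 16)
    wide = ThreePointEstimate(10, 14, 30)
    print(narrow.mean(), round(narrow.std_dev(), 2))
    print(wide.mean(), round(wide.std_dev(), 2))
    print(wide.sample(random.Random(42)))
```

<p class="MsoNormal">Here the wide estimate (10, 14, 30) has a mean of 18 and a much larger standard deviation than the narrow estimate (12, 14, 16), which is the point made above about width reflecting uncertainty.</p>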
<p class="MsoNormal"> </p>
<p class="MsoNormal">One can then use these random variables in the estimation
algorithm by Monte Carlo simulation: repeatedly replace the single values with
values sampled from the triangular distributions, and assemble the results into
a distribution of the estimated value. Note that the estimate is still only as
good as the assumptions; however, we can assess our faith in the estimate by the
width of the 10%-90% range of its distribution. </p>
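<p class="MsoNormal"> </p>
<p class="MsoNormal">The Monte Carlo procedure just described can be sketched in a few lines of Python. The functions <code>monte_carlo</code> and <code>p10_p90</code>, the net-benefit model, and its numbers are all hypothetical illustrations; any estimation model that maps sampled assumption values to a single number could be plugged in instead.</p>

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Each assumption is (low, most_likely, high), as elicited from an SME.
Assumptions = Dict[str, Tuple[float, float, float]]


def monte_carlo(model: Callable[[Dict[str, float]], float],
                assumptions: Assumptions,
                trials: int = 10_000,
                seed: int = 1) -> List[float]:
    """Run the estimation model repeatedly on sampled assumption values."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        # Replace each single-point value with a draw from its triangular distribution.
        draw = {name: rng.triangular(low, high, mode)
                for name, (low, mode, high) in assumptions.items()}
        outcomes.append(model(draw))
    return outcomes


def p10_p90(outcomes: List[float]) -> Tuple[float, float]:
    """Return the 10th and 90th percentiles of the simulated estimate."""
    deciles = statistics.quantiles(outcomes, n=10)  # 9 cut points
    return deciles[0], deciles[-1]


if __name__ == "__main__":
    # Hypothetical model and assumptions: net benefit = benefit - cost.
    assumptions = {"cost": (80, 100, 150), "benefit": (90, 130, 200)}
    outcomes = monte_carlo(lambda v: v["benefit"] - v["cost"], assumptions)
    low, high = p10_p90(outcomes)
    print(f"mean {statistics.mean(outcomes):.1f}, "
          f"10%-90% range {low:.1f} to {high:.1f}")
```

<p class="MsoNormal">A fixed seed keeps runs reproducible; a narrower 10%-90% range in the output means more faith can be placed in the estimate, given the assumptions.</p>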
<p class="MsoNormal"> </p>
<p class="MsoNormal">For example, one might estimate the total time for
completing a project by entering, for each task, the least time,
the most time, and the most likely time. Then one could apply Monte Carlo
simulation or <a href="http://en.wikipedia.org/wiki/Three-point_estimation">more
elementary methods</a> to roll up the task estimates into the distribution
of the time to complete.</p>
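<p class="MsoNormal"> </p>
<p class="MsoNormal">The project example can be sketched as follows, assuming (for illustration only) three hypothetical tasks done strictly one after another, so the total time is the sum of the task times. <code>simulate_completion</code> does the Monte Carlo rollup; <code>pert_rollup</code> uses one of the elementary methods, the classic PERT three-point formulas (mean (a + 4m + b)/6, standard deviation (b - a)/6).</p>

```python
import random
import statistics
from typing import Dict, List, Tuple

# Hypothetical tasks, done one after another:
# (least time, most likely time, most time) in days.
TASKS: Dict[str, Tuple[float, float, float]] = {
    "design": (5, 8, 15),
    "build": (10, 15, 30),
    "test": (4, 6, 12),
}


def pert_rollup(tasks) -> Tuple[float, float]:
    """Elementary PERT rollup: sum the per-task means and variances."""
    mean = sum((a + 4 * m + b) / 6 for a, m, b in tasks.values())
    variance = sum(((b - a) / 6) ** 2 for a, m, b in tasks.values())
    return mean, variance ** 0.5


def simulate_completion(tasks, trials: int = 20_000, seed: int = 7) -> List[float]:
    """Monte Carlo rollup: sample every task's time and add them up."""
    rng = random.Random(seed)
    return [sum(rng.triangular(a, b, m) for a, m, b in tasks.values())
            for _ in range(trials)]


def prob_done_by(totals: List[float], deadline: float) -> float:
    """Fraction of simulated projects finishing within the deadline."""
    return sum(t <= deadline for t in totals) / len(totals)


if __name__ == "__main__":
    mean, sd = pert_rollup(TASKS)
    totals = simulate_completion(TASKS)
    print(f"PERT: {mean:.1f} +/- {sd:.1f} days")
    print(f"Monte Carlo mean: {statistics.mean(totals):.1f} days")
    print(f"P(done within 35 days): {prob_done_by(totals, 35):.2f}")
```

<p class="MsoNormal">The two rollups need not agree: PERT weights the most likely value more heavily than the triangular mean (a + m + b)/3 does, so for these numbers PERT gives 32 days while the simulated mean is close to 35.</p>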
<p class="MsoNormal"> </p>
<p class="MsoNormal">Hubbard goes further, suggesting that as actual values for the
assumptions become available, one should check whether they fall within the 10%-90% range of
the initial distributions. If they do,
fine. If they don’t, one asks questions about the underlying reasoning and
beliefs. Over time the organization becomes more capable of, and more accountable for,
making good assumptions.</p>
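<p class="MsoNormal"> </p>
<p class="MsoNormal">Hubbard's check can be sketched as follows, assuming the initial estimates were triangular. The 10% and 90% points come from the triangular distribution's inverse CDF; the function names and example numbers are hypothetical. Because the 10%-90% range is an 80% interval, a well-calibrated estimator should see roughly 80% of actuals land inside it over many assumptions.</p>

```python
import math
from typing import Dict, List, Tuple


def triangular_quantile(u: float, low: float, mode: float, high: float) -> float:
    """Inverse CDF of a triangular distribution (low <= mode <= high, low < high)."""
    fc = (mode - low) / (high - low)  # CDF value at the mode
    if u < fc:
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1 - u) * (high - low) * (high - mode))


def review_assumptions(estimates: Dict[str, Tuple[float, float, float]],
                       actuals: Dict[str, float]) -> List[str]:
    """Names of assumptions whose actual value fell outside the 10%-90% range."""
    flagged = []
    for name, actual in actuals.items():
        low, mode, high = estimates[name]
        p10 = triangular_quantile(0.10, low, mode, high)
        p90 = triangular_quantile(0.90, low, mode, high)
        if not (p10 <= actual <= p90):
            flagged.append(name)  # ask about the reasoning behind this estimate
    return flagged


def hit_rate(estimates, actuals) -> float:
    """Fraction of actuals inside their 10%-90% range; about 0.8 if well calibrated."""
    misses = len(review_assumptions(estimates, actuals))
    return 1 - misses / len(actuals)


if __name__ == "__main__":
    estimates = {"design": (5, 8, 15), "build": (10, 15, 30)}  # hypothetical
    actuals = {"design": 9, "build": 34}                       # hypothetical
    print(review_assumptions(estimates, actuals))  # → ['build']
    print(hit_rate(estimates, actuals))            # → 0.5
```

<p class="MsoNormal">A single miss is just a prompt for discussion; the hit rate only becomes a meaningful calibration signal once many assumptions have been reviewed.</p>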
<p class="MsoNormal"> </p>
<p class="MsoNormal">We can also address the garbage-in, garbage-out
problem by using actual data whenever possible. There are at least two techniques.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">In the first, as actual values for the assumptions become available,
they replace the corresponding distributions. For example, if
month-by-month sales projections are captured as triangular distributions to
forecast sales volumes, each month's distribution is replaced by the actual sales number once it is known. One should also update the remaining triangular
distributions to reflect the actual sales trends. The resulting estimate will usually
have a narrower distribution.</p>
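<p class="MsoNormal"> </p>
<p class="MsoNormal">A sketch of the first technique for the monthly sales example, with hypothetical numbers. The rule used to update the remaining months (scaling them by the ratio of actual to most-likely sales so far) is an illustrative assumption; the text does not specify how the remaining distributions should be updated. Function names are likewise hypothetical.</p>

```python
import random
import statistics
from typing import List, Sequence, Tuple, Union

Triangle = Tuple[float, float, float]  # (low, most_likely, high)
Month = Union[float, Triangle]         # a known actual, or a still-uncertain estimate


def simulate_total(months: Sequence[Month], trials: int = 10_000,
                   seed: int = 3) -> List[float]:
    """Monte Carlo distribution of total sales; actuals enter as fixed numbers."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for m in months:
            if isinstance(m, tuple):
                low, mode, high = m
                total += rng.triangular(low, high, mode)
            else:
                total += m  # actual: no uncertainty left
        totals.append(total)
    return totals


def update_with_actuals(estimates: Sequence[Triangle],
                        actuals: Sequence[float]) -> List[Month]:
    """Replace the first len(actuals) months with actuals and rescale the rest.

    The rescaling rule (multiply the remaining estimates by actual / most-likely
    so far) is a hypothetical way to reflect the observed sales trend.
    """
    k = len(actuals)
    ratio = sum(actuals) / sum(mode for _, mode, _ in estimates[:k])
    rest = [(low * ratio, mode * ratio, high * ratio)
            for low, mode, high in estimates[k:]]
    return list(actuals) + rest


def spread(totals: List[float]) -> float:
    """Width of the 10%-90% range."""
    deciles = statistics.quantiles(totals, n=10)
    return deciles[-1] - deciles[0]


if __name__ == "__main__":
    estimates = [(80.0, 100.0, 130.0)] * 12  # hypothetical monthly projections
    actuals = [90.0, 95.0, 92.0]             # hypothetical first-quarter actuals
    before = simulate_total(estimates)
    after = simulate_total(update_with_actuals(estimates, actuals))
    print(f"10%-90% spread before: {spread(before):.1f}, after: {spread(after):.1f}")
```

<p class="MsoNormal">With three months of actuals in place, the 10%-90% spread of the annual total comes out narrower than the original forecast's, because fewer uncertain months remain.</p>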
<p class="MsoNormal"> </p>
<p class="MsoNormal">A second technique is Bayesian trend analysis. In this case
we use actuals as evidence about the estimate. For example, if a project were on
track, we would expect certain measures, such as burn-down rate and
test coverage, to reflect that. If a project were to ship on time, the number of
unimplemented requirements would be trending toward zero. Similarly, the code coverage
measure would be trending toward its target. So these measures are evidence of a healthy
project. Using Bayesian trend analysis, we
can turn the reasoning around: starting from the initial (prior) estimate of the
time to completion, we use the actuals as evidence to compute an improved (posterior) estimate. The
result is an improved probability distribution of the time to complete the
project. As more actuals become available, the distribution becomes narrower,
increasing the certainty of the forecast.</p>
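<p class="MsoNormal"> </p>
<p class="MsoNormal">One way to sketch Bayesian trend analysis is a grid-based Bayes update over candidate completion weeks. The likelihood model here (remaining requirements burning down linearly to zero at the completion week, observed with Gaussian noise) is an assumption chosen for illustration, not necessarily the model the author has in mind; the function names and numbers are hypothetical.</p>

```python
import math
from typing import Dict, List, Tuple


def posterior_completion(prior: Dict[int, float],
                         initial_remaining: float,
                         observations: List[Tuple[int, float]],
                         noise_sd: float) -> Dict[int, float]:
    """Grid-based Bayesian update of the completion week.

    prior: {candidate completion week T: prior probability}
    observations: (week t, remaining requirements) actuals
    Assumed likelihood: if the project finishes at week T, remaining work
    falls linearly, so at week t we expect initial_remaining * (1 - t / T),
    observed with Gaussian noise of standard deviation noise_sd.
    """
    unnormalized = {}
    for T, p in prior.items():
        log_like = 0.0
        for t, remaining in observations:
            expected = initial_remaining * max(0.0, 1 - t / T)
            # Gaussian log-likelihood; the constant term cancels on normalizing.
            log_like += -0.5 * ((remaining - expected) / noise_sd) ** 2
        unnormalized[T] = p * math.exp(log_like)  # prior x likelihood
    z = sum(unnormalized.values())
    return {T: w / z for T, w in unnormalized.items()}  # normalize


def prob_on_time(posterior: Dict[int, float], deadline: int) -> float:
    """Posterior probability that the project completes by the deadline week."""
    return sum(p for T, p in posterior.items() if T <= deadline)


if __name__ == "__main__":
    prior = {T: 1 / 11 for T in range(10, 21)}  # uniform over weeks 10..20
    actuals = [(2, 85), (4, 70), (6, 55)]       # hypothetical burn-down data
    post = posterior_completion(prior, 100, actuals, noise_sd=3.0)
    print(max(post, key=post.get))              # → 13
    print(round(prob_on_time(post, 12), 3))
```

<p class="MsoNormal">With a uniform prior over weeks 10 to 20 and three observations implying about 7.5 requirements closed per week, the posterior concentrates around week 13 and the probability of finishing by week 12 becomes small. Adding further observations would narrow it more.</p>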
<p class="MsoNormal"> </p>
<p class="MsoNormal">In this way one can detect early whether the system is being gamed
and, at the same time, use the actuals to estimate the likelihood of an on-time
delivery. </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">So, in general, one can use actuals not only to improve the
estimation process, as Hubbard suggests, but also to apply Bayesian techniques
that improve the estimates of the program variables.<br /></p>