Murray Cantor
Probability Distributions ExplainedWhen I started this experiment in blogging, I wrote that I am not a natural blogger. I am not the affable, chatty web presence who on a daily or weekly basis shares one's thoughts. I have learned since then what kind of blogger I am. My style is to write little essays that might take weeks to prepare, given the priorities of my day job. I have also found I do enjoy writing the blog as it gives me a chance to share some issue that is top of mind. So here goes: A few weeks ago, my good colleague displayed a chart in one of his PowerPoint decks entitled something like "How to Understand Murray." The chart was an explanation of probability distributions. It was both flattering and a bit of a wakeup call. As Arthur mentioned and the readers of this blog know, much, if not all, of my writing assumes an understanding of probability and probability distributions (aka probability densities). My experience in discussions with folks from our industry is that most of them have vague memories from some stat class in college and so can generally follow the discussions, but most could use a refresher. I could simply refer readers to a good Wikipedia article, but, instead, let me given a domainspecific example. Let's go with a topic I wrote about in a previous entry: time to ship. For explanation purpose, let's take a fictional example. Suppose you are starting a project expected to ship in 110 days. That said, we cannot be 100% certain of being ready to ship on exactly that day, no sooner, no later. In fact, it is very unlikely we will exactly hit that day. Maybe we will be ready the day before, or maybe the day before that. Since being ready on or any day before day 110 is success, we can sum up the probabilities on being ready on any of those days to get the probability we really care about. All that said, the probabilities for each of the days matter because we need them in order to get the sum. The set of probabilities for each of those days is the probability distribution or density that is our topic. Let's look at a simple example. In this example of a triangular distribution there is 0% probability that the product will be ready before day 91 and we are 100% certain we will be ready before day 120. We think the days become more probable and reach a peak as we approach day 110 and fall off after that. This graph then shows the probability, day by day, of being ready on exactly that day. You might notice that the peak is less than 0.07 (actually 0.06666...). This makes sense since we are assuming that the project may be complete on any of 30 different days and so the densities would be in the neighborhood of 1/30 = 0.03333. In our case, some are above and some below. These distributions are the basis of calculating the likelihood of outcomes. The principle is very simple: The probability of being ready within some range of dates is the sum of the probabilities of being ready on exactly one of those dates, i.e. , we add up the density values for those days. As I explained above, if we want to compute the probability of being ready on or before day 110, we would add up all of the densities for days 90 to 110 to get 0.7. Using the same reasoning the probability of being ready on some day before day 120 is the sum of all the densities which comes to exactly 1.0, which was one of our going in assumptions. In fact the property that the sum of densities for all possible outcomes equals 1 is a defining property of distributions. Those who want to try this out at home could use this spreadsheet.. For example, can you find on what day, being ready on or before that day is an even bet? For most development efforts, the overall state of the program (some would say 'health') is characterized by the shape of the distribution, This shape changes every day. Every action the team takes changes the shape. So, one key goal of development analytics would be to track the shape of the distribution throughout the lifecycle, a daunting task. More on this (probably ) in future postings. 
Garbage in, Garbage out
mcantor@us.ibm.com
Tags:
distributions.
douglas
random
gigo
bayesian
forecasts
hubbard
variables
3,471 Views
One of the common criticisms of estimation methods is that the calculation is no better than the assumptions: garbage in, garbage out (affectionately known as GIGO). That is, if you make poor or dishonest assumptions then you will get misleading forecasts. It is especially egregious that occasionally someone might take advantage of the system by gaming the system and intentionally feeding assumptions that lead to false forecasts to get a desired business decision.
However, estimation is an essential part of any disciplined funding decision process (such as program portfolio management). The funding decision relies on estimates of the costs and benefits. But for reasons just described, estimation is suspect.
So, what to do? I suggest the answer is not to abandon estimation; the answer is the not input garbage, or if you do, detect it as soon as possible to minimize the damage.
First note that the future costs and benefits are uncertain,
so any serious approach to the GIGO
problem is to treat the assumptions as random variables with probability distributions and work from there. Generally, this allows one to use the limited information at hand to enter the assumptions and calculate
the forecasts.
Douglas Hubbard, in How to Measure Anything, gives us one way to proceed. Briefly, when an uncertain value is needed, ask the subject matter expert (sme) to give not one but three values: low, high, and expected. The three values may be used to specify random variables with triangular distributions [ref].
In this case, the greater the difference between the high and low values, the wider the triangular distribution of the estimate reflecting the uncertainty of the sme who is honestly making the assumptions.
One can use the random variables as values in the estimation algorithm using Monte Carlo by repeatedly replacing the single values with sampled values of the triangular distributions and assembling the distribution of the estimated value. Note the estimate is again just as good as the assumptions, however we assess our faith in the estimate by the width of the 10%90% range of its distribution.
For example, one might estimate to the total time for completing s project by a project by entering, for each task, the least time, the most time, and the most likely time. Then one could apply Monte Carlo simulation or more or more elementary methods to rollup the estimates to compute the distribution of the time to complete.
Hubbard goes further by suggesting that as actuals in the assumptions come available to review if they fall within the 10% 90% range of the initial distributions. If they do, fine. If they don’t, questions are asked about the underlying reasoning and beliefs. Over time the organization becomes more capable and accountable at making good assumptions.
Further, we can also deal with the garbage in garbage out problem by using actual data whenever possible. There are at least two techniques.
In the first, as actuals in the assumptions become available in the, they can used to replace the distributions. For example if there are monthbymonth sales projections captured as triangular distributions to forecast sales volumes, the distributions are replaced by the actual sales numbers. Also, one should update the remaining triangular distributions reflecting the actual sales trends. The resulting estimate will usually have a narrower distribution.
A second technique is Bayesian trend analysis. In this case we use actuals for evidence of the estimate. For example, if a project were on track, then we can expect that certain measures, such as burn down rate and test coverage reflect that. If a project were to ship on time, the number unimplemented requirements would be going to zero, Similarly, the code coverage measure would be trending towards the target. So these measures are evidence of a healthy project. Using Bayesian trend analysis, we can turn the reasoning around and update the initial (prior) estimate of the time for completion using the actuals as evidence for an improved estimate. The result is an improved probability distribution of the time to complete the project. As more actuals become available, the distribution becomes narrower, increasing the certainty of the forecast.
This way one can detect early if the system is being gamed and at the same time, use the actuals to estimate the likelihood of an ontime delivery.
So generally, one can use actuals to not only improve the
estimation process as Hubbard suggests, but also to apply Bayesian techniques,
to improve the estimates of the program variables. 
Managing commitments
In the previous entry, I introduced a probabilistic view of a commitment. The main idea is that when you commit to deliver something in a future, you are making a kind of bet. The odds of winning the bet is the fraction of the distribution of the time=todeliver before the target date. For example, in the following example, the project manager has a 47% likelihood of winning the bet.
The raises a couple of questions. First, how is the distribution of timetocomplete determined? There are variety of methods to estimate time to complete of an effort. I am not taking a position on what method to adopt. The important point is that the estimation method should not return a number but a distribution! The major estimation vendor have this capability even if it not always surfaced. I will expand on this point in the next blog entry. For now, the key point is that you should be working not with point estimations, but with the distributions.
Second is how the project manager affects the shape and position of the distribution and therefore affects the odds. Some of the techniques are intuitive, some not so much, There are two things one might do: move the distribution relative to the target date, and change the shape if distribution typically narrowing it so that more of it .lies within the target date.
In the first, one can either move the target date out, so that the picture looks like this
This is, of course, intuitive  moving out the date lowers the risk. Another intuitive thing a project manager might do is the descope the project  commit to deliver less functionality. This may have two effects on the distribution: It will move it to the left as there will be less work to do. Depending on the difficulty of the descoped feature, the descoping may also narrow the distribution. By removing a difficult to implement feature. one is more certain of delivery, narrowing the distribution, removing risk resulting in this diagram:
Figure 3
Now comes the unintuitive part. Suppose the target date and content are not negotiable. What is a project manager to do then? The idea is to take actions that will narrow the distribution in Figure 1 so that it looks like
Figure 4
How is this done? Many project managers, in the name of making progress choose the easiest functions to implement first, "the low hanging fruit". However, by doing this the shape of the curve in figure in minimally affected, The less intuitive approach, Following the principle of the Ration Unified Process, is to work on the most difficult, riskiest requirements first! These are the requirements of which
the team has the least information and so should tackle first in order to have time to gain the information needed to succeed. Putting off the riskier requirements and doing the easy stuff first gives the appearance of progress, but by putting off the riskier requirements, one will run out time to do the riskier requirements and fail to meet the commitment.
All this has to be while ensuring their is sufficient time to fulfill all the requirements, risky or not. So in the end, one must account for both the time to complete tasks and their uncertainty to meet commitments. Some techniques for doing that will be discussed in the next blog entry. 
What is a Commitment?
One of the things that characterizes software or systems development is that the project manager routinely commits to deliver certain functionality on a given date at an agreedupon level of quality for a given budget. It is the role of the project manager to make good on the commitment, The software and systems organization leadership may count on the commitments being met in order to meet their business commitments or there may be an explicit contract to deliver on time for a fixed budget. The measure of a good project manager is the ability to make and meet commitments.
iIn this blog entry, I will discuss the nature of that commitment and how it relates to project analytics. First of all, lets define 'commitment' in this context. Of course, I do not mean the confinement to a mental institution, I mean, as suggested above, the promise to deliver certain content with acceptable on or before a certain date.
The first thing to notice is that the future is never certain, and so we are in the realm of probability and random variables, i.e. a quantity described by probability distribution. Going forward, I will assume the reader is familiar with the concepts of random variables and their associated distributions . Soon, I will devote a blog entry just that topic.
Meanwhile, the best way to describe the likelihood of meeting a commitment is the use of a random variable. Consider the distribution of the time it will take to meet the commitment. It might look something like this: A similar distribution would apply to cost to complete. Recall, the probability then of the commitment being met is the area under the curve that falls before the target date: The manager, in making the commitment, is essentially betting (perhaps his or her career) that he or she will meet the commitment. According to this measurement, the odds are about 5050. The key measurement then is the amount area of the random variable that lies prior to the target date, which in turn relies on the the ability to calculate the probability distribution. I also will discuss some techniques to do that in a later entry. Now consider for example, "project health". What I believe what is meant is the likelihood of meet the commitment to deliver the project on time. If it highly probable the project will ship at the target date, the project is 'green' otherwise it is 'yellow' or 'red' like in the following figure.
There are three reposes to a yellow or red project. One can move the target date, move the distribution, or change the shape of the distriburion, again a topic for a later bog.

ROI of software and system programs article published
mcantor@us.ibm.com
Tags:
net
management
ppm
portfolio
roi
npv
return
investment
software
of
present
on
value
3,157 Views
In an earlier blog entry, I mentioned my article Calculation and Improving the ROI of Software and System Programs. I am pleased to announce to that it has been published in the September 2011 issue of the Communications of the ACM.

Estimating Nuclear Safetyis Really Hard
I know. It has been a long time since my last posting. Over the last few months, nothing much happened that prompted an entry. I was thinking of writing something sort of philosophical on the nature of estimates, but never got to it. Then there was the tsunami and the associated nucler reactor failures at the Fukushima power plant. Suddenly, the topic became more urgent. This is relevant to this blog, because our domain includes the engineering and economics of safety critical systems. Presumably the nuclear reactor industry uses the state of the art methods. I have been exploring what is going on there and while, I am far from an expert, I have found out some things worth sharing in a blog. 
Economics of QualityI have the honor of giving one of the keynotes at the Conseg2011 conference this February in Bangalore. I have chosen a large, perhaps overly ambitious topic: "The Economics of Quality". Here is my conference proceedings document. My goals in preparing the paper and presentation is to make the case is that quality is fundamentally an economics concern and to suggest an overall approach for reasoning about when the software has sufficient quality for shipping. For those who have read some of my earlier entries, you will see have different my thinking on the topic differs from those who take a technical debt approach. Anyhow, the brief proceedings paper is very high level and there is considerable work to filling in the details and validating the approach. This paper really is the beginning of a program that I believe, when carried out, will have great benefit to our industry. So, I would like to hear from anyone who has similar interests and perspectives. There must be some existing relevant research. Perhaps we find enough likeminded folks to build a community exploring the topic. 
Risk, Value. and BPIt should not be a surprise that I have been following the BP oil spill with much interest. In fact, as I starting typing this entry, I was watching the grilling of the BP CEO, Tony Heyward, by Congress. Rep. Stupak is focusing on the BP’s risk management. Some of you have read my earlier posting on my thoughts of the BP decision process that led to the Deepwater Horizon blowout. So far, information uncovered since that posting is remarkably consistent with my earlier suppositions. In this entry I would like to step back a bit and discuss what broader lessons might be learned from the incident. While it is all too easy to fall into BP bashing, I would rather use this moment to reflect more deeply on risk taking and creating value. (BTW, some of you might now that my signature slogan is ‘Take risks, add value’.) In our industry we create value primarily through the efficient delivery of innovation. Delivering innovation, by definition, requires investing in efforts without initial full knowledge of the effort required and the value of the delivery. This incomplete information results in uncertainties in the cost, effort, schedule of the projects and the value of the delivered software and system, i.e. cost, schedule and value risk. Deciding to drill an oil well also entails investing in an effort with uncertain costs and value. In this case, the structure of the subsystem and productivity of the well cannot be know with certainty before drilling. As I pointed out in an earlier blog, a good definition of risk is uncertainty in some quantifiable measure that matters to the business. So in both our industry and oil drilling we deliberately assume risk to deliver value. So, what can we learn from the BP incident? Briefly, one creates value by genuinely managing risk. One creates the semblance of value for a while by ignoring risk. Assuming risk, investing in uncertain projects, provides the opportunity for creating value. That value is actually realized by investing in activities that reduce the risk. The model that shows the relationship is described in this entry. So, reducing risk has economic value, but reducing risk takes investment. In the end, the quality risk management is measured with a return on investment calculation. This in turn requires a means to quantify and in fact monetize risk. I wonder what was there risk management approach was followed by BP. A recent Wall Street Journal article suggested they used a risk map approach – building a diagram with one axis a score of the ‘likelihood of the risk’ and the other a score of the ‘severity of a failure’. So with this method, they would score the risk of a blowout as very low (based on past history) with a very high consequence. So, such a risk needs to be ‘mitigated’. (Some actually multiply the scores to get to some absolute risk measure.) Their mitigation was the installation of a blowout preventer. They could then confidently report they have executed their risk management plan. Note these scores are at best notionally quantified and not monetized. Paraphrasing my good colleague, Grady Booch (speaking of certain architecture frameworks), risk maps is the semblance of risk management. As pointed out by Douglis Hubbard in The Failure of Risk Management (and in an earlier rant in this blog), this sort of risk management is not only common, but dangerous: It is a sort of business common failure mode that leads to bad outcomes. Also, Hubbard points out, useful risk management entails quantification and calculation using probability distributions and Monte Carlo analysis. I would add that since risk management in the end is about business outcomes, risks need to be monetized as well as quantified. I am willing to bet a good bottle of wine that BP did no such thing. Any takers? The business common failure mode was overreliance on the preventers, even though there are several studies showing they are far from ‘failsafe’. Further, it appears BP assumed risk by consistently taking the cheaper, if riskier. design and procedure alternative, the one with greater uncertainty in the outcome, even when the cost of an undesired, if unlikely, outcome was possibly catastrophic. The laundry list of such decisions is long; some outlined in Congressman Waxman’s letter to Tony Hayward. CEO’s of Shell and Exxon testified before congress that their companies would have used a different, more costly designs and followed more rigorous procedures. According the congressional and journalistic reports, this behavior is BP standard operating procedure. So BP assumed risk by drilling wells but did not invest in reducing the risk. For quite a while they got away with the approach of assuming but really reducing risk, and appeared to be creating value as reflected in stock and value and dividends to the investors. The BP management raised the stock price from around $40/share in 2003 to a peak of around of $74/share prior to the Deepwater Horizon incident. At this writing the stock is trading at $32/share and the current dividend has been cancelled. Investors might rightly wonder if there is another latent disaster and so discount the apparent future profitability with the likelihood of unknown liabilities. The total loss of stockholder value is over $100B, which is in the ballpark of the eventual liability of BP. So, whatever approach BP used to manage risk failed. BTW, some may recognize this same pattern in the management of financial firms that participated in the subprime mortgage market. In that case, they ‘mitigated risk’ by relying on the ratings agencies. Those who actually built monetized models of the risk realized there was a great opportunity to bet against the subprime mortgage lenders and made huge fortunes (See, e.g. The Big Short: Inside the Doomsday Machine by Michael Lewis .). Readers of the blog will notice a recurrent theme is some of the postings. It is essential that we assume and manage risk. To repeat a favorite quote, “One cannot manage what one does not measure.” The risk map, score methods, while common are insufficient to the needs of our industry; they do not measure, nor really manage risk. We as a discipline need to step up to quantifying, monetizing, and working off risk in order to be succeed as drivers of innovation. We need to step up to the mathematical approach found in the Douglas and Dan Savage’s (see this posting) texts. I came to this same realization probably a decade ago. I held off at first because I had not deep enough understanding of how to proceed, and I knew I would encounter great skepticism. I tested the waters in 2005 and posted my first paper on the subject in 2006. I indeed received a great deal of skepticism and resistance, but enough acceptance to go forward. I have learned some important lessons from all that. In my next blog, I will share my experiences of bringing more mathematical thinking to risk management for SSD. 
The Flaw of Averages in Software and Systems
mcantor@us.ibm.com
Tags:
software_estimates
flaw_of_averages
system_estimates
2 Comments
3,151 Views
Folks who have heard me present will recognize the following discussion as a variation of what I have used as an example to explain the importance of variance in software and system estimates. Imagine this time you are a development organization manager given the following artificial opportunity. You can agree to the following deal: Have the teams at your own expense develop some application, each meeting a given set of requirements. The client really wants the applications and will accept them if acceptable and perfectly will to be consulted throughout the projects. Here is the catch: if you deliver the projects on time in 12 months, you will receive $1M per application. If you are a day late, you get nothing. You have to decide whether to take the deal. Lets suppose you take the projects to your estimators and they tell you the estimated time to complete is 11 months and the estimated cost to complete is $750K for each of the projects. So you stand to make an estimated $250K per project. So you staff up as much as you take on three projects looking forward to your bonus. Was this a good deal? Those who have read The Flaw of Averages by Sam Savage and Dan Denziger already know the answer. Those who haven’t read the book should. This book nicely captures the sort of statistical reasoning that underlies IBM Rational’s approach to business analytics and optimizations (found in the RTC agile planner and the ROI calculations in Focal Point). Some key rules:
Back to the example: The time to complete is an uncertain quantity and so must be described by a distribution. Often, the estimate returned by the estimator is the mean of that distribution. The distribution may be pretty wide and so may look like Figure 1 of the attached document. (I have had bad luck trying to embed figures in the blog and I have put the figures in this this attachment.) Note that 40% of the distribution lies beyond 12 months. Assuming the $750K cost to complete estimate is dead on, lets apply some simple high school probability to get the distribution of profit (See Figure 2): · The chance of succeeding at all three projects and getting $3M is revenue.is (0.6)^{3}=0.216, · The chance of succeeding at exactly two projects and getting $2M in revenue is 3(0.6)^{2}(0.4)=0.432 · The chance of succeeding at exactly one project and getting $1M in revenue is 3(0.6)(0.4)^{2}=0.288 · ^{ }The chance you will fail at all three projects yielding no revenue is (0.4)^{3}=0.064. The weighted average of the distribution of revenues is (0.216)($3M) + (0.432)($2M) + (0.288)($1M) + (0.064)($0) = $1.8M So the likely outcome of your (3)$750K = $2.25M expense is a loss of $450K. But wait, it is worse. The distribution is probably not normal. Programs are more likely to late than early and so are skewed to the right. In this case the average (i.e. the mean) is less than the 50% point. So, as shown in Figure 3, it is possible to have the estimate of 11 months and the likelihood of failure is 50%. The revenue distribution is given in Figure 4. In this case, the weighted average of the distribution of revenues is (0.125)($3M) + (0.375)($2M) + (0.375)($1M) + (0.125)($0) = $1.5M In this the expected loss is $750K. But wait, it is still worse. The cost to complete is also uncertain. To keep things as simple as possible, lets suppose the cost to complete for each of the projects is described by three values: best case is $700K, the likely case is $750K, and the worse case is $1M To compute the expect profit in this case requires using this values as parameters for a triangular distribution (see Figure 5) and then apply Monte Carlo methods to do the calculation to get the distribution that describes the profit. The result is shown in Figure 6. Briefly in this case: · The most likely outcome is a loss of $945K · There is a 90% certainty of losing at least $805K · There is a 10% chance of losing more than $1.1M So taking this deal is at best career limiting! Notice by ignoring the rules, one is tempted to make a bad deal. Applying each of the rules with more discipline shows how bad the deal is. The moral of all this is that making business decisions based on calculations of averages can lead to disastrous outcomes. This moral needs to be taken to heart by our industry. Far too often, managers when faced with making funding projects or business commitments insist, “Just give me the number.” What they need is a distribution; the number they are given is likely to be an average. Decisions based on the number will likely go sour. No wonder the software and system business outcomes rarely delight their stakeholders. The good news is that there are robust, proven techniques to avoid the flaw of averages. 
What were they Thinking?
First, some personal disclosure: In the late 1980’s, I
worked for a while at Shell Research, developing seismic modeling and data
imaging algorithms. (See
this link.) While there I received training on oil exploration. I
remain awed by the passion, expertise, daring, and discipline of the engineers,
scientists, technicians, and skilled laborers who take responsibility for
providing the hydrocarbons we completely rely on.
Oil exploration is remarkably costly and risky. Even then in the late 1980’s, it was not uncommon to spend $1B on an exploration well, hoping to find oil based on the seismic data only to find it dry. At Shell, I was on a team that developed, for the time, a highly computeintensive algorithm for imaging seismic data captured in complex subsurfaces. They literally bought us a Cray since running the algorithm might make a marginal difference in the success rate of exploration drilling. Hence, I am not an oil industry basher, far from it. So I have been watching the BP, Deepwater Horizon gulf catastrophe with great interest. In this entry, I will share what I have gleaned from various news sources. (I have found the Wall Street Coverage very credible). So here is my net: Recall, the blowout occurred shortly after capping an exploration well (a will drilled solely to confirm the presence of a oil resevior). The depth of the well, reported 18,000 ft, is no big deal. The depth of the water, 5000 ft, is far from the record of around 8000 ft. So the well itself was routine for the industry. So what happened? BP had drilled many of these wells. In fact, ironically the blowout occurred while BP executives were celebrating their safety record. However, over the last few years have become profitable by building a very costconscious culture. Such a culture is likely to cut corners, repeatedly taking small risks in business operations. Such behavior may be rational if you believe the total liability is bounded. There is reason to believe BP’s liability is ‘capped’ at $75M. This culture seemed to be at work on the oil platform:
I bet that BP managers routinely made the same decisions for years with no adverse outcomes. They were probably rewarded for this behavior. Such a culture makes such disasters inevitable over time. A great case study of such cultures is found in Diane Vaugh’s The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA. She studies the NASA culture that led to the decision to launch the shuttle that exploded on launch killing ‘the first teacher in space’ while tens of thousands of school children watched on television. The managers overruled the engineers who advised them that the temperature was out of spec for a launch. As she explains, the managers had gotten away with taking similar risks in the past and had decided to bow to political pressures and approve the launch. So what they were thinking is something like, “These risks are no big deal and the savings matter.” A key moral is that over time the unlikely becomes inevitable. Further, experience and past data lead to exactly the wrong behavior: they reward the risk taking, not the caution. Many have made exactly the same point about behavior of the financial firms during the financial meltdown. Internalizing this moral and acting accordingly is key to our industry. Increasing, we will be building life critical, economic critical systems that are very complex, will operate over long periods, and whose failure could be catastrophic. There is no turning away from this inevitability. So, we all need to understand that cost savings must be balanced by a clear understanding of the overall risks of failure, their consequences and the real return of investment in failure avoidance. This of course takes some math. In particular, thinking about averages is not useful. That is topic of my next entry. 