This blog experiment seems to be working. The entries are
gietting around 100 visits and growing - good enough to keep at it. I have
found that writing the entries has given me the opportunity to clarify and
express my thoughts. This entry is a case in point.
We are deploying a BAO solution for the level 3 support
organizations in our IBM India Software Labs. That deployment provides a case
study in how to integrate two concepts I introduced in earlier blogs. This
entry is longer than the others. I hope you find it worth the wait and effort
In those previous entries, I discussed two frameworks for reasoning
These frameworks address different aspects of the problem of
using measures to achieve business goals by measuring the right things and
taking actions to respond to the measurements. In fact, these frameworks fit
together hand and glove.
that level 3 support teams provide fixes to defects found in delivered
code.Each of the teams deals with
an ongoing series of change requests (aka APARs, PMRs). An organization goal is to reduce the time to and cost of completion of these
requests. To achieve the goal, they are adopting some Rational-supported practices
and supporting tools. So the questions
that need to be answered are:
is the time trend of the time to complete of the change requests?
is the time trend of the cost to complete of the change requests?
each case how would I know that some improvement action resulting in
significant improvement in the trends?
comes the hard part: determining the measures
that answer the questions. The change requests come arrive somewhat
unpredictably. Each goes through the fix and release process and presumably
gets released in a patch or point release.So at any given time there is a population of currently open
and recently closed releases. The measures that answer the question are a time
trend of some statistic on the population on some population of change requests.
of the change requests requires different amount of time and effort to
complete. So to measure if the outcome is being achieved, one must reason
statistically: defining populations of requests, building the statistical
distribution of say time to complete for that population, defining the outcome
statistic for the distribution.So
we need to do things to define the measure:
the population of requests for each point on the trend line
the statistics on that population
keep it simple (as least as simple as possible), lets form the population by
choosing the set of change requests closed in some previous period, say the
previous month or quarter. To choose a statistic, one needs to look at the data
and pick the statistic that best answers the question. Most people assume the
mean of the time (or cost) to complete is the best choice.However, that choice is appropriate
when the shape of the histogram of the time to complete is centered on a mean
as is common in normally distributed data.
of the advantages of working in IBM is that we have lots of useful data. Inspection
of some APAR data of the time to complete from one of our teams in the IBM
Software Lab in India shows the distribution is not centered on a mean. and so reduction of the mean time to
complete is not the best measure of improvement.
have looked at literally tens of thousands of data points for time to complete
of change requests across all of IBM and have found the same distribution. For
you statistics savvy, it appears to be a Pareto Distribution, but statistical
analysis carried out by Sergey Zeltyn of IBM Research’s Haifa lab shows that
this distribution does not well fit any standard distribution. A possible
explanation is that is the time required to fix the defects is Pareto
distributed, but since the resources available to fix them is limited, the
actual time to complete is not pure Pareto. In any case, a practical way to
proceed is to choose a simple (non-parametric) measure: width of the head, i.e.
the time it takes to complete 80% of the distributions.
with this analysis in place, the organization decides to precisely specify the goal such as a 15% reduction in time and
cost to complete 80% of the requests closed each month.So the outcome measures are the time it took to close and costs of 80% of
the requests closed each month.
chosen this measures, we are ready to identify the data sources and instrument the measures. So far so good. But wait, we still need
to answer questions 3.
I mentioned, in order to improve the outcome measure and achieve the goals, the
lab teams have agreed to adopt appropriate Rational practices and tools to automate certain processes. The
practices were selection using the Rational MCIF Value Tractability Trees (a
development causal analysis methed). Adopting and maturing the practices and
their automations are the controls. Some
control examples are automating the regression test and build process, and the adoption
of a stricter unit test discipline to reduce time lost in broken builds. There
are control mechanisms with associated control
measures such as time-to-build, regression test time-to-complete, percent
of code unit-tested, and a self-assessment by the team of their adoption of testing
and build practices.
answer question 3, we need statistical analytics
to determine if the changes in the control measures have had a significant impact
on the outcome measures. Our Research staff has settled on those analytics, but
I will discuss that in a later entry. This entry is already too long.
case study is both reasonably straightforward and far from trivial. It does
show as promised that GQM(AD) and Outcome and Controls work together. I leave
you all with a thought problem. How would you apply the pattern to teams
developing new features to existing applications?
To now, this blog has been a series of essays on the theoretical considerations underlying the analytics of development. With this entry, I want to start changing the emphasis to the practicalities of building analytic tools. Going from theory to practice raises all kinds of issues: data content and formats, robustness of algorithms, reinforcing agile practices, .... To start that discussion, lets start with an epic on how an analytic tool for agile teams might work:
A lead of an agile team, call her Shirley. has been asked to deliver a mobile application, with a specified set of features, in time for the next world games, which is one year away. Understanding that the future is uncertain, Shirley treats the time to complete as the random variable. Before committing to the project, she needs an initial distribution of the time to complete the project. With such a distribution, she has a view of the probability of achieving the goal. It is the area under the distribution curve that lies to the left of the target date in Figure 1.
Figure 1. Probability distribution of delivering the Shirley’s Mobile app project
Fortunately she has tool called 'ARaVar' to help her build and maintain this distribution. This tool is federated with her OSLC agile project environment, Agilista (a fictional product). To use ARaVar, the team estimates the level of effort required for each feature using planning poker. In particular, for each feature’s level of effort the leadership team agrees on three values to enter in Agilista:
The low (best case) – Assumes all the stars align and the feature comes together easily to meet requirements.
The high (worse case) – Assumes Mr. Murphy S. Law and Ms. May Hem unexpectedly join the team and inject unexpected challenges and obstacles.
The nominal (most likely) – Assumes level of effort has the expected mix of good fortune and bad luck.
Behind the scenes, the ARaVar finds these inputs to in Agilista and uses them to define triangular probability distributions. In particular, AraVar interprets these effort inputs as saying
There is zero probability that the level of effort will be less than the best case.
There is zero probability that the level of effort will be greater than the worst case.
The greatest probability of the level of effort will be at the expected case.
So ARaVar sets the distributions to be zero below the low value and above the high value, with a peak at the expected case. Figure 2 show the resulting triangular distribution, setting the high and low to zero and setting the peak (expected case) so that the total area of the distribution in one.
Figure 2: Typical triangular distribution for each feature.
In the parlance of Bayesian reasoning, this technique provides the subject matter experts a means of arriving at an honest prior, based on current information and informed belief. If the difference between the low and high of the distribution of a feature is large, then the team is expressing its uncertainty of the effort required to deliver the feature. This gives Shirley’s team the opportunity to focus the team on resolving the uncertainties early, progressively de-risking the project.
With this prior estimate in place, Shirley has an idea of how likely it is she can make the commitment and she negotiates the content. What-if analysis in ARaVa provides her with capability to compute the impact of adding, changing or dropping one or more features from the program. Luckily, she does find that one of the relatively uncertain features is more of a nice-to-have than a must-have and adds considerably more risk than value. So she negotiates that feature out of scope for a firmer commitment to an earlier delivery in 11 months as illustrated in Figure 3.
Figure 3: The negotiated delivery commitment: earlier and more predictable.
So Shirley now is in a good place. She has agreement on the scope of the project between her team and her stakeholders. She feels her team has a good chance of delivering on time.
In the Agile fashion, work proceeds by establishing work items to deliver the features. These work items are scheduled for iterations/sprints, on an ongoing basis. As the team completes work items, they not only have less work to complete, but also have a track record of the actual time it takes the team to complete work (called team velocity). From a Bayesian perspective, these constitute important evidence of how well the project is actually executing. ARaVa queries Agilista for the completion status of the features, the work item burndown history, and updated effort-to-complete estimates for the remaining features. ARaVa uses modern predictive algorithms to update the time to complete distribution.
With these ongoing predictions, Shirley can discuss with her team, and external stakeholders, whether the odds of meeting the commitment are improving (as they should) or degrading. If the later is the case, she can use ARaVa to predict the impact of managing content (decommitting features) or adjusting resources. For example, the tool revealed that one feature was very much at risk. In discussion with the stakeholders, it was decided that this feature was necessary and so it was decided that for the next sprint there should be more resources focused on the this feature. Some staff were assigned to the team for just that sprint. With ARaVa, all stakeholders can have a more honest and trustworthy discussion on how best to proceed.
ARaVa does not yet exist, but it is not a dream. IBM Rational and Research are now in the process of developing such a tool for a possible delivery next year. We are calling the project AnDes (for Analytics of Development). AnDes uses state of the art learning algorithms. We do have working versions federated with Rational Team Concert (We did show a preview at last year’s Rational Innovate). In addition to consideration of automating the data collection, we are exploring how it can be applied across a wide range of projects:
Large to small
Innovative to complex
Fully or partially agile.
We are looking for design partners now! Interested? Please let me know at email@example.com.
In my last blog, I laid out a vision of how a project lead and her stakeholders might use the predictive analytics to drive to better project outcomes. As I mentioned in the entry, IBM Rational is work on such a tool. A demonstration of this tool is found here: Agile Development Analytics Demo. The video was created by Peri Tarr, the lead architect on the project.
Some of you might notice that the terminology and development process described in the demo is at odds with your understanding of Agile. We do understand that and currently we are working on making a robust tool that accommodates a wide range of processes for what might some might call 'pure Agile' to various hybrids we are discovering in the market place.
In the next blog, I will explain more on how the tool works.
I am not a natural blogger, but do like to share thoughts with a broad community of folks with shared interests.
My longterm passion is how to enable organizations develop and deliver software and systems more effectively. I am thoroughly convinced that well organized, governed organizations not only deliver value to the businesses, but also enhance the lives of the staff. In such organizations, people get to work on cool things, innovate, work well with colleagues, and build a legacy by being a part of making things of value.I also am a mathematician by training. I am especially energized by the opportunity to apply mathematical reasoning to the improvement of software and system organizations. My current assignment is to lead the Business Analytics and Optimization (BAO) strategy for Rational.
So, with this blog, from time to time, I will share my thoughts on BAO for software and system organizations. I hope this blog will be a catalyst for building a community. I especially look forward to comments and conversations.
First, I am pleased that many saw the humor in the April fools posting. That said, I wonder if there will be ever quantum project management. Also, I fear this blog lacks humor. I will do what I can, but there is only so much that can be done to spice up the topic of analytics for software and system organizations. So, back to the serious stuff.
But first a joke that I believe that dates back to vaudeville: Onstage, there is a streetlight. Under the streetlight, there is a man crawling around on hands and knees. A policeman walks up and asks what he is doing. The man says he is looking for his keys. The policeman asks if he is sure he lost them here. The man answers, "No, in fact I lost them down the street." The policeman asks why is he looking under the light. The man answers, "The light is brighter here."
OK, not so funny. So what's the point? A while back, I was discussing a client's management program with a colleague (who will remain nameless and I hope is reading the blog). I pointed it would not serve any purpose. My colleague answered "Well at least they are measuring something." I retorted, "First, you need to figure out what you need to measure, then figure out how to do the analysis and get the data." We left it at that. More generally, software and system organizations often measure what is easy, not what they need. They look where the light is brightest. We still have the question how to specify the needed measures, analytics, and data collection program.
In an earlier entry, I proposed some measurement principles. While these principles are sound for assessing a measurement and analytics program, they do not provide operation guidance for defining the set of measures, associated analytics, and data. What is also needed is the analytics version of a requirements analysis. Last Friday two colleagues (named Clay Williams and Peri Tarr who I believe do read the blog) introduced me to the Goal Question Measure (GQM) method. This method has been extended in various ways such as GQM+Strategy.
I have seen the method applied. It looks much like functional decomposition and so it is a requirements analysis technique for analytics solutions. I think it should be extended to include identification of the data sources. So we would have GQMAD (not kidding), my spin on the main idea:
Goals - what is the organization trying to achieve
Questions - how would know quantitatively that the goals are being met?
Measures that provide answers to the questions.
Analytics that realize the measures
Data that feeds the analytics.
Taking such an approach is a far cry from looking where the light is brightest. Note after building out the GQMAD requirements, one still needs to design how the data is collected and staged, how the analysis is executed, and measures displayed in order to answer the questions. So design and development of the analytics solution remains after the GQMAD process is carried out.
For my waterfallphobic friends, I share the concern. Building an analytics solution this way should be more iterative than is described above. Probably something like the Unified Process can be applied using GQMAD as a good requirements practice.
Anyone out there with GQM experience they would like to share?
This entry is a follow-on to my most recent entry. The idea is that random variables are are theway describe the uncertain quantities that arise in managing development efforts. They are a natural extension of the fixed variables we all grew up with. In fact, a fixed variable in a random variable that has probability one of taking a given value and probability zero of taking any other value. In this entry, I explain how you can (with computer assistance) calculate with random variables.
Suppose you want to add two random variables, v1 and v2. This need might arise if you have two serialized tasks in a Gantt chart, each described by a distribution as explain in the previous entry and you would like to know how long it would take to complete both of them, i.e. the sum of their durations.
How would you proceed? First note the sum would be another random variable. Therefore what you need is the probability distribution of the sum. There is no formula for that distribution, but there is an effective, commonly used numerical approach, known as Monte Carlo simulation.
The idea behind Monte Carlo simulation is to use a pseudo random number generator take a sample value of v1 and a sample value of v2 and then add them. For more detail, follow this link. Note that the values are selected according to the probability distributions of each of the variables. The more likely values are taken more often. Now save that sum and do the same thing many times, say 100,000 times, and store each of the sums. For each of the sums, you can compute its probability by looking at its frequency in the collection of saved sums (some sums are more frequent than others) and divide by the number of samples (actually you have to round the sums to get the counts). What you get is an approximation of the distribution of the sums.
Lets look at an example, if v1 has a triangular distribution with L = 3, E = 4, H=7, as shown in Figure 2 and v2 has a triangular distribution with L = 1, E = 6, H= 7 as seen figure 3.
The distribution of the sum, found using the Monte Carlo simulator in Focal Point, is given by figure 3.
First note the sum is not another triangular distribution. It is
smoother. This is to be expected from the mathematics of probability. On
the other hand, the distribution of the sum makes sense. For example,
we would expect the most likely value of the sum to be 10, the sum of
the two most likely values, but the simulation found 9.98. The
discrepancy is due to chance and would diminish with more samples. Also
note the probability is zero below 4, the sum of the lows, and above 14,
the sums of the highs.
For fun, here is the distribution of the product for the variables:
The reader can check if this looks sensible. Note also that the product does not have a triangular distribution. The peak is much smoother.
So random variables can be used in place of fixed variables in any computation. So they have all of the utility of fixed variables and enable us to express uncertainty. They may seem foreign at first, but they are worth the trouble to learn. Like anything else, they become intuitive after a while.
I have the honor of giving one of the keynotes at the Conseg2011 conference this February in Bangalore. I have chosen a large, perhaps overly ambitious topic: "The Economics of Quality". Here is my conference proceedings document. My goals in preparing the paper and presentation is to make the case is that quality is fundamentally an economics concern and to suggest an overall approach for reasoning about when the software has sufficient quality for shipping. For those who have read some of my earlier entries, you will see have different my thinking on the topic differs from those who take a technical debt approach.
Anyhow, the brief proceedings paper is very high level and there is considerable work to filling in the details and validating the approach. This paper really is the beginning of a program that I believe, when carried out, will have great benefit to our industry. So, I would like to hear from anyone who has similar interests and perspectives. There must be some existing relevant research. Perhaps we find enough like-minded folks to build a community exploring the topic.
In a previous entry, I discussed triangular distributions. I pointed out that they arose from the practices in Hubbard's How to Measure Anything. When making an estimate, one asks for the high, low and expected values of a quantity. These are used to to define a probability distribution by interpreting the results to mean that there is zero probability that the value is less than the low or greater than the high and the mode of the distribution is at the expected. You get a distribution like the figure below.
A reader, Blaine Bateman, president of EAF LLC, found the elicitation question too restraining. After all, saying there is no probability of a value being below the low may call for an unreasonable level of certainty. He prefers asking the asking the question, "Give me low and high values that in which you are 90% confidence."
This raises an interesting (at least to me) mathematical question, "Can you specify the triangle given the mode and the 90% of the distribution". That is, can you specify the triangle given points c and d rather than a and b. Blaine has a couple of solutions based a choice of addition assumptions you need to make to get a unique solution.
An IBM colleague, Jim Densmore, has the phrase 'Embrace Uncertainty' below his e-mail signature. I know Jim well enough to believe I know what he means. In fact, I used to give a presentation with the that title. The idea is that uncertainty is an aspect of the world as it is. If you constantly deny uncertainty and expect to be able to predict the future as if it were certain, all sorts of disappoints ensue. If, instead, you understand and accept the uncertainty, you are more in harmony with the world. Uncertainty is not your enemy, it just is a part of life.
What brought this mind is that one of the world's leading poker players, Annie Duke, used the phrase in the latest episode of Radiolab, one of my favorite NPR programs. She explains how embracing uncertainty is the key to being a great poker player, not gimmicks like reading 'tells'. Follow the link and check it out.
Of course, readers of the blog know that I believe this applies to our domain. Developing software is systems is fraught with uncertainty. One should embrace it and learn to manage it by applying the right analytics.
I know. It has been a long time since my last posting. Over the last few months, nothing much happened that prompted an entry. I was thinking of writing something sort of philosophical on the nature of estimates, but never got to it. Then there was the tsunami and the associated nucler reactor failures at the Fukushima power plant. Suddenly, the topic became more urgent. This is relevant to this blog, because our domain includes the engineering and economics of safety critical systems. Presumably the nuclear reactor industry uses the state of the art methods. I have been exploring what is going on there and while, I am far from an expert, I have found out some things worth sharing in a blog.
We have been told that reactor failure is a 1 in over a hundred thousand year event. Sounds reasurring.Yet, in my lifetime, there have been three that I know of: Three mile island, Chernoble, and now Fukushima. Discounting Chernobyl which apparently was greatly under-engineered, something must be wrong for the there to be two meltdowns in what has been estimated to be a one in over 100,000 year event. Apparently, there have been many near misses, e.g. the loss of coolant at the Brown's Ferry plant, something one would not expect from such safe systems. This raised some questions. What does a 'one is N year event' mean? Does it mean that we should not expect the event until N years has passed or that we can be certain one will occur within N years? More importantly, if there are K systems, each having a 1 in N year safety rating, what is the rating of the poputation? As I have pointed out in previous blogs, we do not need estimates, we need probablility distributions to get any practical understanding.
Here are some of the sources I found. This New York times article helps explain what is going on. A few points in the article caught my attention. First, they carried out 'deterministic' risk analysis (see this NRC page) because probabilistic methods are "too hard. A good summary of the difficulty of the problem and the history of how it is addressed is found in Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis by M. Granger Morgan and Max Henrion. Briefly there is little to go on to estimate the likelihood of individual events and their dependencies in event chains. So the distribution of estimated time to failure must have huge variance. This article by M. V. Ramana summarizes and criticizes the current practice including probablistic risk assessment (pra). The key idea is that failure results from a sequence of component failures. Each component is reliable and so the probability of a system failure is the joint probability of the component failures which is very low. This assumes that the component failures are independent events. However, as parts of thr system, the joint probabilities are hard to estimate. For example, one component may fail which results in a second component running out of spec which might result in a number of other component failures. Getting the joint probabilies right entails a very faithful system model and data collected from thousands of simulations with varying inputs and Monte Carlo methods to take into account the variability of the components. The output of such a simulation could be used to improve the system design.
I want to focus on another pra challenge: estimating the likelihood of the devastating single cause event, such as earthquake and the tsunami. Clearly some sort of data are needed for the estimate, but what sort of data? As pointed out in the New York Times article, there was a deep historical data search of the size and frequency of the earthquakes in the relevant geographic region. That led to the conclusion that planning for 18 feet of water was sufficient. Recall that the reactor was inundated by 40 feet of water. So past performance was not predictive. In retrospect, that is not surprising given that tectonic plate movements are hardly a stationary process. An alternate approach is to use modern geologic models and plate measurements. Then one could and should run simulations to get a distribution of the flood depths. One could argue that this approach is also suspect, since they depend on the quality of the models which introduces a subjective element. However, using historical data also is based on an assumption that earthquake generation is based on a stationary process, a very dubious model and its adoption is equally subjective. To be fair, presumably one could not run the needed simulations in the 1960's and so the frequency model may have been the best available. It can be argued that earthquake prediction is notoriously difficult, especally pinpointing when an earthquake will occur. However, using Monte Carlo methods and simulations, it seems reasonable that one can create a probility distribution of the time to an earthquake above a certain size and use this to estimate the likelihood of the event over the lifespan of the plant.
The point of this discussion is that frequency model data is no more 'objective' than data used to build and apply models. Both involve subjective assumptions of the validity of the model. Note Baysian data analysis methods can be used to validate the various models and so we can assess their usefulness in the estimation process.
Finally, these safety estimates are used to set policy and, in particular, to make economic decisions about nuclear energy. The cost of a failure is huge. For example, an estimate of the cost of Fukushima failure is $184 Billion. The proponants of nuclear energy argue that they make economic sense, assuming they are safe enough and the new designs are much safer. Maybe so. But knowing they are safe enough will take much better analytics than we have seen to date.
One of the common criticisms of estimation methods is that
the calculation is no better than the assumptions: garbage in, garbage out (affectionately known as GIGO). That is, if you make poor or
dishonest assumptions then you will get misleading forecasts. It is especially egregious
that occasionally someone might take advantage of the system by gaming the system
and intentionally feeding assumptions that lead to false forecasts to get a
desired business decision.
However, estimation is an essential part of any disciplined
funding decision process (such as program portfolio management). The funding decision
relies on estimates of the costs and benefits. But for reasons just described,
estimation is suspect.
So, what to do? I suggest the answer is not to abandon
estimation; the answer is the not input garbage, or if you do, detect it as
soon as possible to minimize the damage.
First note that the future costs and benefits are uncertain,
so any serious approach to the GIGO
problem is to treat the assumptions as random variables with probability distributions and work from there. Generally, this allows one to use the limited information at hand to enter the assumptions and calculate
Douglas Hubbard, in How to Measure Anything, gives us one way to proceed. Briefly, when an uncertain
value is needed, ask the subject matter expert (sme) to give not one but three values:
low, high, and expected. The three values may be used to specify random
variables with triangular distributions [ref].
In this case, the greater the difference between the high
and low values, the wider the triangular distribution of the estimate reflecting
the uncertainty of the sme who is honestly making the assumptions.
One can use the random variables as values in the estimation
algorithm using MonteCarlo by repeatedly
replacing the single values with sampled values of the triangular distributions
and assembling the distribution of the estimated value. Note the estimate is
again just as good as the assumptions, however we assess our faith in the
estimate by the width of the 10%-90% range of its distribution.
For example, one might estimate to the total time for
completing s project by a project by entering, for each task, the least time,
the most time, and the most likely time. Then one could apply Monte Carlo
simulation or more or
more elementary methods to rollup the estimates to compute the distribution
of the time to complete.
Hubbard goes further by suggesting that as actuals in the
assumptions come available to review if they fall within the 10% -90% range of
the initial distributions.If they do,
fine. If they don’t, questions are asked about the underlying reasoning and
beliefs. Over time the organization becomes more capable and accountable at
making good assumptions.
Further, we can also deal with the garbage in garbage out
problem by using actual data whenever possible. There are at least two techniques.
In the first, as actuals in the assumptions become available
in the, they can used to replace the distributions. For example if there are
month-by-month sales projections captured as triangular distributions to
forecast sales volumes, the distributions are replaced by the actual sales numbers.Also, one should update the remaining triangular
distributions reflecting the actual sales trends. The resulting estimate will usually
have a narrower distribution.
A second technique is Bayesian trend analysis. In this case
we use actuals for evidence of the estimate. For example, if a project were on
track, then we can expect that certain measures, such as burn down rate and
test coverage reflect that. If a project were to ship on time, the number
unimplemented requirements would be going to zero, Similarly, the code coverage
measure would be trending towards the target. So these measures are evidence of a healthy
project. Using Bayesian trend analysis, we
can turn the reasoning around and update the initial (prior) estimate of the
time for completion using the actuals as evidence for an improved estimate. The
result is an improved probability distribution of the time to complete the
project. As more actuals become available, the distribution becomes narrower,
increasing the certainty of the forecast.
This way one can detect early if the system is being gamed
and at the same time, use the actuals to estimate the likelihood of an on-time
So generally, one can use actuals to not only improve the
estimation process as Hubbard suggests, but also to apply Bayesian techniques,
to improve the estimates of the program variables.
An ongoing theme of this blog is that development processes differ from other business processes in that there is a wide range of uncertainty inherent in the efforts. It follows that tracking and steering development efforts entails ongoing predicting, from the evolving project information, when a project is likely to meet its goals.
Late last year, Nate Silver author of the Fivethrityeight blog and well know predictor of elections published The Signal and the Noise, a text for the intelligent layperson on how prediction works. I was impressed by the book as it explained the principles behind the sort of Bayesian analytics we need for development analytics without any explicit math. However, I felt for the folks in our field would greatly benefit by having the mathematical blanks filled in. So I decided to write a series of papers introducing the topics to folks who had some statistics and maybe some calculus in college, but not a solid background in prediction principles.
In the previous entry, I introduced a probabilistic view of a commitment. The main idea is that when you commit to deliver something in a future, you are making a kind of bet. The odds of winning the bet is the fraction of the distribution of the time=to-deliver before the target date. For example, in the following example, the project manager has a 47% likelihood of winning the bet.
The raises a couple of questions. First, how is the distribution of time-to-complete determined? There are variety of methods to estimate time to complete of an effort. I am not taking a position on what method to adopt. The important point is that the estimation method should not return a number but a distribution! The major estimation vendor have this capability even if it not always surfaced. I will expand on this point in the next blog entry. For now, the key point is that you should be working not with point estimations, but with the distributions.
Second is how the project manager affects the shape and position of the distribution and therefore affects the odds. Some of the techniques are intuitive, some not so much, There are two things one might do: move the distribution relative to the target date, and change the shape if distribution typically narrowing it so that more of it .lies within the target date.
In the first, one can either move the target date out, so that the picture looks like this
This is, of course, intuitive - moving out the date lowers the risk. Another intuitive thing a project manager might do is the descope the project - commit to deliver less functionality. This may have two effects on the distribution: It will move it to the left as there will be less work to do. Depending on the difficulty of the descoped feature, the descoping may also narrow the distribution. By removing a difficult to implement feature. one is more certain of delivery, narrowing the distribution, removing risk resulting in this diagram:
Now comes the unintuitive part. Suppose the target date and content are not negotiable. What is a project manager to do then? The idea is to take actions that will narrow the distribution in Figure 1 so that it looks like
How is this done? Many project managers, in the name of making progress choose the easiest functions to implement first, "the low hanging fruit". However, by doing this the shape of the curve in figure in minimally affected, The less intuitive approach, Following the principle of the Ration Unified Process, is to work on the most difficult, riskiest requirements first! These are the requirements of which
the team has the least information and so should tackle first in order to have time to gain the information needed to succeed. Putting off the riskier requirements and doing the easy stuff first gives the appearance of progress, but by putting off the riskier requirements, one will run out time to do the riskier requirements and fail to meet the commitment.
All this has to be while ensuring their is sufficient time to fulfill all the requirements, risky or not. So in the end, one must account for both the time to complete tasks and their uncertainty to meet commitments. Some techniques for doing that will be discussed in the next blog entry.
In a conversation with a development lab productivity team, I was reminded that the first challenges software and system organizations face when starting an improvement program is 'what to measure'. In particular, some organizations start with what is easy to measure with the understandable thought "we need to start somewhere". I have found this approach tends not get traction. Over the years I have settled on some principles that seem to apply:
Be careful that the measurement drives the intended behavior. Organizations respond to how they are measured. This is useful in that measurement provides a way to change behavior. On the other hand, it is dangerous in that measurements can drive undesired behavior. For example, insisting that every project exactly meet the initial estimate of schedule, cost, and content will lead to risk aversion and drive innovation out of the organization.
You can't manage what you don't measure (taken from Lord Kelvin) -This is one of those universal truths that applies to every field.
Don't measure what you do not intend to manage - Measuring without a way to respond to measurement adds overhead and lowers productivity
For development organizations, one size does not fit all -The measures need need to reflect the mixture of work the team is doing. For example measurements for a team doing level 3 maintenance work may not apply to a team building a new platform.
The Einstein test, make the measures as simple as possible, but no simpler.
I suspect there some other principles that should be added. But these are a good start.
In a later blog, I will discuss levels of measurement.
As readers of this occasional blog know, this blog has been less of a 'web log' and more a series of small essays on the topic of development analytics. I have decided to start writing less formal entries more frequently and have realized I would be comfortable doing that on my own web site, murraycantor.com.
I want to be entirely clear. IBM has in no way looked over my shoulder in the writing of the blog and has been very generous in providing me a forum. Nevertheless, I will be freer sharing my opinions when there is no opportunity of confusing my often idiosyncratic opinions of those of the company's.
In my previous couple of blog entries, I used triangular distributions for examples. For many who suffered through (or maybe enjoyed) their stat classes (what are the odds?), this might be a surprising choice. They were taught the default choice would be a Gaussian distribution. For those more attuned with modern business analytics, they are likely to be familiar with triangular distributions. In this entry, I'll briefly the reasoning beyond each of them.
First, as you hopefully recall, both are distributions associate with random variables (Those who don't recall migh benefit from the series of tutorials at The Khan Academy site). Each are non-negative functions with integral (area under the curve) one. (There are fancier mathematical definitions, but no matter.) Each describes the likelihood of each of set of possible outcomes of some random variable. The difference in shape between Gaussian (aka Normal) and triangular distributions reflects the nature and use of the random variables.
Briefly, normal distributions are often arise as the histogram of a set of measurements. They have some central value (called the mean) and some dispersion (called standard deviation) around the mean. Anyone who took a stat class studied these distributions. They show up in a many contexts:
The distribution resulting from tabulating the histogram of repeated, but imprecise measures of some quantity and then divided the entries by the sum of the measures is often assumed to be normal. The mean of the distribution is the estimator of the actual measure.
Statisticians like the normal distribution for several reasons. First, it is easy to parameterize. If you know the mean. mu (μ), and the standard deviation, sigma (σ), you have completely characterized the distribution. For example, the likelihood of a measurement occurring is often characterized as being within some number of σ's from the mean. Figure 1 shows how this works.
The likelihood of a value falling in a range is given by the area under the curve. For example, the probability of a value of the normally distributed random variable falling within one standard deviation of the mean is 68.2%.
Normal distributions have one really cool feature called the Central Limit Theorem, which states that under remarkably general conditions, the sum of a set of random variables will be close to normal. Notice, in the previous blog entry, when we added two triangular random variables, the sum appeared smooth and in fact started to look normal.
All that said, I do have have a pet peeve. Normal distributions are overused. Most things in nature and economics are not normally distributed. For example, as as documented in Wikipedia, these phenomena are nowhere near normal, but are closer to a Pareto distribution:
The sizes of human settlements (few cities, many hamlets/villages)
File size distribution of Internet traffic which uses the TCP protocol (many smaller files, few larger ones)
Hard disk drive error rates
The values of oil reserves in oil fields (a few large fields, many small fields)
The length distribution in jobs assigned supercomputers (a few large ones, many small ones)
The standardized price returns on individual stocks
Fitted cumulative Pareto distribution to maximum one-day rainfalls
Sizes of sand particles
Sizes of meteorites
Areas burnt in forest fires
Severity of large casualty losses for certain lines of business such as general liability, commercial auto, and workers compensation.
Getting back to our topic, let's turn to triangular distributions. They are not used to describe a set of measured outcomes from an experiment. They are used to describe what we know or believe about some unknown random variable. For example, the sales of a new product one year after delivery generally can not be determined by measuring the sales of a bunch of new products. As pointed out by Douglas Hubbard, treating the future sales as a single fixed variable is unreasonable (although all too common). What is more reasonable is setting the low (L), high (H) , and most likely (E) values of the future sales. As I wrote in an earlier entry, these are the values that specify a triangular distribution. I.e. triangular distributions are set to zero below a given low value, L, and above the high value, H, and peaks at the expected value E. The distribution is then a describe be a triangular curve so that the total area is 1. Here is the distribution for L = 1, E=6, and H=7.
Some would argue there is a 'real' distribution of the future sales random variable and it is unlikely to be triangular. My response is for all practical purposes, it does not matter. The triangular distribution is a good-enough approximation to whatever the real distribution might be. By 'good enough' I mean they may be used to support decision making: they are a big improvement over using single values. They are also practical as they easy to specify and there is no assumption of symmetry, No wonder they are common in business analytics.
To wrap up, normal distributions are occasionally useful to describe outcomes of measurements while triangular distributions are useful for giving rough estimates of one's belief of the liklihood of outcomes based on the evidence on hand. More generally, normal distributions are useful in frequentist statistics and triangular in Bayesian statistics. See this Wikepedia article for a discussion of the kinds of statistics. Much of what we do in development analytics is more Bayesian than frequentist. I hope to write more about that in the near future.
I would like to build on the theme of reasoning about what to measure. The goal of business analytics is to track what matters to the organization (what it is you are trying to manage) and respond to the measure in some way to gain improvement. The science of measuring outcomes and In manufacturing and some service delivery domains is statistical process control (SPC), SPC lies at the heart of the Six Sigma movement. Even so, there will be no need to have a 6-six sigma belt to participate in this discussion . While there is reason to believe that not all of the Six Sigma practices do not apply all that well to our domain, the idea of tracking outcomes, applying statistical analysis to detect change change, and applying some sort of controls to affect the change applies in all business domains, including software and system development and delivery.
Briefly then, the outcomes are the operational goals and controls are the actions you take to achieve the outcomes, So naturally we need too kinds of measures.
Outcome measures - tracking the measures of effectiveness of the business organization
Control tracking whether - tracking whether the controls are in fact enacted.
Here is a thought experiment:. Imagine there is a potato chip factory with an operational goal of achieving the right amount of salt on its chips. There is an target amount and the factory needs to stay within small limits for market acceptance. So everyday they grab a sample of chips and record the saltiness. They apply salt by running the recently deep fried chips under a salt shaker. The two controls are the frequency of the shaker and the speed of the belt. Both the shaker frequency and belt speed are measured to confirm the controls are properly responded to. In this example the saltiness is the outcome measure and the shaker frequency and belt speed are control measures.
The simplest way, for me at least, to think about SPC is to measures trends in outcome measures and control measures to determine the likelihood that the controls are in fact affecting the outcome. In our potato chip example we might find that we cannot control the outcome well enough by the shaker and belt controls. In that case, we might look for some other factor to control, say the factory humidity.
If you look at many measurement programs in software and system you often find that outcome and measures are confused. In fact even sorting the measures into the two buckets is hard. No wonder measured process improvement for our domain has been so hard.Anyone have good examples of measurement patterns or antipatterns of measuring controls and outcomes?
When I started this experiment in blogging, I wrote that I am not a natural blogger. I am not the affable, chatty web presence who on a daily or weekly basis shares one's thoughts. I have learned since then what kind of blogger I am. My style is to write little essays that might take weeks to prepare, given the priorities of my day job. I have also found I do enjoy writing the blog as it gives me a chance to share some issue that is top of mind. So here goes:
A few weeks ago, my good colleague displayed a chart in one of his PowerPoint decks entitled something like "How to Understand Murray." The chart was an explanation of probability distributions. It was both flattering and a bit of a wakeup call. As Arthur mentioned and the readers of this blog know, much, if not all, of my writing assumes an understanding of probability and probability distributions (aka probability densities). My experience in discussions with folks from our industry is that most of them have vague memories from some stat class in college and so can generally follow the discussions, but most could use a refresher. I could simply refer readers to a good Wikipedia article, but, instead, let me given a domain-specific example.
Let's go with a topic I wrote about in a previous entry: time to ship. For explanation purpose, let's take a fictional example. Suppose you are starting a project expected to ship in 110 days. That said, we cannot be 100% certain of being ready to ship on exactly that day, no sooner, no later. In fact, it is very unlikely we will exactly hit that day. Maybe we will be ready the day before, or maybe the day before that. Since being ready on or any day before day 110 is success, we can sum up the probabilities on being ready on any of those days to get the probability we really care about. All that said, the probabilities for each of the days matter because we need them in order to get the sum. The set of probabilities for each of those days is the probability distribution or density that is our topic.
Let's look at a simple example.
In this example of a triangular distribution there is 0% probability that the product will be ready before day 91 and we are 100% certain we will be ready before day 120. We think the days become more probable and reach a peak as we approach day 110 and fall off after that. This graph then shows the probability, day by day, of being ready on exactly that day. You might notice that the peak is less than 0.07 (actually 0.06666...). This makes sense since we are assuming that the project may be complete on any of 30 different days and so the densities would be in the neighborhood of 1/30 = 0.03333. In our case, some are above and some below.
These distributions are the basis of calculating the likelihood of outcomes. The principle is very simple: The probability of being ready within some range of dates is the sum of the probabilities of being ready on exactly one of those dates, i.e. , we add up the density values for those days. As I explained above, if we want to compute the probability of being ready on or before day 110, we would add up all of the densities for days 90 to 110 to get 0.7. Using the same reasoning the probability of being ready on some day before day 120 is the sum of all the densities which comes to exactly 1.0, which was one of our going in assumptions. In fact the property that the sum of densities for all possible outcomes equals 1 is a defining property of distributions. Those who want to try this out at home could use this spreadsheet.. For example, can you find on what day, being ready on or before that day is an even bet?
For most development efforts, the overall state of the program (some would say 'health') is characterized by the shape of the distribution, This shape changes every day. Every action the team takes changes the shape. So, one key goal of development analytics would be to track the shape of the distribution throughout the lifecycle, a daunting task. More on this (probably ) in future postings.
Today, April 1, seems like a good day to bring forward an important new idea. In fact, I think this may be the next big thing.
One of the well-understood problems with software development project management is that it is often impossible to completely specify the complete work breakdown with certainty. The longer the project and the more innovative the project, the more uncertain the work breakdown items. This is addressed in iterative, agile planning by identifying the summary work items and then adding detail as the project evolves. Another source of uncertainty is the dependency between the summary items. This uncertainty in turn makes critical path analysis for such programs problematic. In fact there is a whole ensemble of project critical paths, each with some likelihood. For the physics literate, this ensemble of paths is much like Feynman Path Integrals in quantum theory. The math is pretty hairy (see this elementary description). Fortunately, as Feyman also pointed out, one can simulate quantum mechanics with quantum computers. I am no expert in quantum computing, but even so I have a proposal: Quantum Informed Projects (QuIPs). The idea is to represent work items as QItems using QBits from quantum computing.. Then we can represent the project as a set of entangled QItems and using a suitablly large quantum computer to calculate the wave function for the critical path.
My understanding is that we do not yet have large enough enough quantum computers to make this practical. However, the same is true for implementing other useful quantum algorithms (see this example). So we can start by building algorthms. There is no time like the present (not accounting for the quantum uncertainty of measuing time) So on this special day, lets turn our attention to QuiPs.