One of the common criticisms of estimation methods is that
the calculation is no better than the assumptions: garbage in, garbage out (affectionately known as GIGO). That is, if you make poor or
dishonest assumptions, then you will get misleading forecasts. Worse,
occasionally someone might game the system by intentionally feeding in
assumptions that lead to false forecasts in order to get a desired business
decision.
However, estimation is an essential part of any disciplined
funding decision process (such as program portfolio management). The funding decision
relies on estimates of the costs and benefits. But for reasons just described,
estimation is suspect.
So, what to do? I suggest the answer is not to abandon
estimation; the answer is not to input garbage or, if you do, to detect it as
soon as possible to minimize the damage.
First, note that the future costs and benefits are uncertain,
so any serious approach to the GIGO
problem must treat the assumptions as random variables with probability distributions and work from there. Generally, this allows one to use the limited information at hand to enter the assumptions and calculate the distribution of the estimated value.
Douglas Hubbard, in How to Measure Anything, gives us one way to proceed. Briefly, when an uncertain
value is needed, ask the subject matter expert (SME) to give not one but three values:
low, high, and expected. The three values may be used to specify random
variables with triangular distributions [ref].
In this case, the greater the difference between the high
and low values, the wider the triangular distribution of the estimate, reflecting
the uncertainty of the SME who is honestly making the assumptions.
One can use the random variables as values in the estimation
algorithm using Monte Carlo simulation: repeatedly
replace the single values with values sampled from the triangular distributions
and assemble the distribution of the estimated value. Note that the estimate is
still only as good as the assumptions; however, we can assess our faith in the
estimate by the width of the 10%-90% range of its distribution.
For example, one might estimate the total time for
completing a project by entering, for each task, the least time,
the most time, and the most likely time. Then one could apply Monte Carlo
simulation or more
elementary methods to roll up the estimates and compute the distribution
of the time to complete.
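Here is a minimal Python sketch of both roll-up approaches applied to the same tasks. The four three-point task estimates are hypothetical. The sketch also assumes that the tasks run one after another and that their durations are independent. The Monte Carlo version samples each task from its triangular distribution and sums the samples. The "elementary" version adds the standard triangular means and variances, then approximates the total as normal using a central-limit assumption. Both versions report the 10%-90% range.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical three-point task estimates in days: (least, most likely, most).
TASKS = [(3, 5, 10), (2, 4, 6), (5, 8, 15), (1, 2, 4)]

def monte_carlo_total(tasks, trials=20_000, seed=1):
    """Sample every task from its triangular distribution and sum (tasks run in sequence)."""
    rng = random.Random(seed)
    # Note: random.triangular takes (low, high, mode), in that order.
    return [sum(rng.triangular(lo, hi, mode) for lo, mode, hi in tasks)
            for _ in range(trials)]

def range_10_90(samples):
    """10th and 90th percentiles of the simulated total."""
    deciles = statistics.quantiles(samples, n=10)  # cut points at 10%, 20%, ..., 90%
    return deciles[0], deciles[-1]

def elementary_total(tasks):
    """Add triangular means and variances, then approximate the total as normal."""
    mean = sum((lo + mode + hi) / 3 for lo, mode, hi in tasks)
    var = sum((lo * lo + mode * mode + hi * hi - lo * mode - lo * hi - mode * hi) / 18
              for lo, mode, hi in tasks)
    return NormalDist(mean, math.sqrt(var))

samples = monte_carlo_total(TASKS)
mc_p10, mc_p90 = range_10_90(samples)
approx = elementary_total(TASKS)
print(f"Monte Carlo 10%-90%:   {mc_p10:.1f} to {mc_p90:.1f} days")
print(f"normal approx 10%-90%: {approx.inv_cdf(0.1):.1f} to {approx.inv_cdf(0.9):.1f} days")
```

The two ranges should roughly agree. Monte Carlo generalizes more easily, for example to tasks that run in parallel or to estimation models that are not plain sums.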
Hubbard goes further, suggesting that as actuals for the
assumptions become available, one reviews whether they fall within the 10%-90% range of
the initial distributions. If they do,
fine. If they don’t, questions are asked about the underlying reasoning and
beliefs. Over time the organization becomes more capable of, and accountable for,
making good assumptions.
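Hubbard's review step amounts to a calibration check. If an SME's ranges are honest, roughly 80% of the actuals should land inside the stated 10%-90% ranges. The following is a minimal sketch that uses a made-up history of (10% point, 90% point, actual) records:

```python
def in_range_10_90(p10, p90, actual):
    """True if the actual fell inside the SME's stated 10%-90% range."""
    return p10 <= actual <= p90

def calibration_rate(records):
    """records: list of (p10, p90, actual). Fraction of actuals inside their ranges."""
    hits = sum(in_range_10_90(p10, p90, actual) for p10, p90, actual in records)
    return hits / len(records)

# Hypothetical history of one SME's ranges and the actuals that later came in.
history = [
    (100, 140, 120),
    (20, 30, 35),     # actual above the 90% point
    (8, 12, 9),
    (50, 70, 66),
    (200, 260, 190),  # actual below the 10% point
]

rate = calibration_rate(history)
print(f"{rate:.0%} of actuals fell inside the 10%-90% ranges (about 80% if well calibrated)")
for p10, p90, actual in history:
    if not in_range_10_90(p10, p90, actual):
        print(f"ask about the reasoning behind {p10}-{p90}; the actual was {actual}")
```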
We can also deal with the garbage-in, garbage-out
problem by using actual data whenever possible. There are at least two techniques.
In the first, as actuals for the assumptions become available,
they can be used to replace the distributions. For example, if there are
month-by-month sales projections captured as triangular distributions to
forecast sales volumes, the distributions are replaced by the actual sales numbers as they come in. Also, one should update the remaining triangular
distributions to reflect the actual sales trends. The resulting estimate will usually
have a narrower distribution.
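The following sketch illustrates the first technique under stated assumptions. It starts from six months of hypothetical sales projections. Three of those months have been replaced by actuals. The remaining projections are updated with a simple, hypothetical rule: rescale them by the ratio of actual to expected sales so far. As data replaces uncertainty, the 10%-90% width of the simulated total shrinks.

```python
import random
import statistics

def simulate_total(months, trials=20_000, seed=7):
    """months: each entry is an actual (a number) or a (low, expected, high) projection."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for m in months:
            if isinstance(m, tuple):
                low, expected, high = m
                total += rng.triangular(low, high, expected)  # still uncertain
            else:
                total += m  # an actual: no uncertainty left
        totals.append(total)
    return totals

def width_10_90(samples):
    q = statistics.quantiles(samples, n=10)
    return q[-1] - q[0]

plan = [(80, 100, 130)] * 6  # hypothetical monthly sales projections

# Three months of actuals have arrived, running about 10% above expectation.
actuals = [108, 112, 110]
ratio = sum(actuals) / sum(m[1] for m in plan[:3])
# Hypothetical update rule: rescale the remaining projections by the observed ratio.
remaining = [tuple(round(v * ratio, 1) for v in m) for m in plan[3:]]
updated = actuals + remaining

before, after = simulate_total(plan), simulate_total(updated)
print(f"10%-90% width before: {width_10_90(before):.1f}, after: {width_10_90(after):.1f}")
```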
A second technique is Bayesian trend analysis. In this case
we use actuals as evidence about the estimate. For example, if a project were on
track, then we would expect certain measures, such as burn-down rate and
test coverage, to reflect that. If a project were to ship on time, the number of
unimplemented requirements would be going to zero. Similarly, the code coverage
measure would be trending towards the target. So these measures are evidence of a healthy
project. Using Bayesian trend analysis, we
can turn the reasoning around and update the initial (prior) estimate of the
time for completion using the actuals as evidence for an improved estimate. The
result is an improved probability distribution of the time to complete the
project. As more actuals become available, the distribution becomes narrower,
increasing the certainty of the forecast.
This way one can detect early if the system is being gamed
and, at the same time, use the actuals to estimate the likelihood of an on-time delivery.
So, generally, one can use actuals not only to improve the
estimation process as Hubbard suggests, but also to apply Bayesian techniques
to improve the estimates of the program variables.
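A grid-based Bayesian update is one way to make the Bayesian trend analysis described above concrete. For illustration only, the sketch assumes that a project finishing at week T burns its backlog down linearly to zero at T. Each observed count of unimplemented requirements is taken to be that line plus Gaussian noise (sigma = 5). The prior, the backlog size, and the observations are all hypothetical. This is one simple model, not the only way to do Bayesian trend analysis.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_completion(prior, initial_reqs, observations, sigma=5.0):
    """
    prior: {completion_week: probability}; observations: [(week, unimplemented_reqs)].
    Assumed (hypothetical) model: a project that finishes at week T burns its backlog down
    linearly from initial_reqs to 0 at T, and each observation is that line plus Gaussian noise.
    """
    unnormalized = {}
    for T, p in prior.items():
        likelihood = 1.0
        for week, remaining in observations:
            expected = initial_reqs * max(0.0, 1 - week / T)
            likelihood *= normal_pdf(remaining, expected, sigma)
        unnormalized[T] = p * likelihood
    total = sum(unnormalized.values())
    return {T: w / total for T, w in unnormalized.items()}

def prob_by(dist, target_week):
    """Probability of finishing on or before target_week."""
    return sum(p for T, p in dist.items() if T <= target_week)

prior = {T: 1 / 15 for T in range(16, 31)}            # hypothetical: weeks 16-30 equally likely
observations = [(2, 110), (4, 98), (6, 89), (8, 80)]  # hypothetical burn-down actuals
posterior = posterior_completion(prior, 120, observations)

target = 20
print(f"P(done by week {target}): prior {prob_by(prior, target):.2f}, "
      f"posterior {prob_by(posterior, target):.2f}")
print("most probable completion week:", max(posterior, key=posterior.get))
```

Here the actuals show roughly five requirements retired per week. That rate pulls the posterior toward week 23 or 24 and sharply lowers the probability of finishing by week 20. As more observations arrive, the posterior concentrates further.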
In the previous entry, I introduced a probabilistic view of a commitment. The main idea is that when you commit to deliver something in the future, you are making a kind of bet. The odds of winning the bet are the fraction of the distribution of the time-to-deliver that falls before the target date. For example, in the following figure, the project manager has a 47% likelihood of winning the bet.
This raises a couple of questions. First, how is the distribution of time-to-complete determined? There are a variety of methods to estimate the time to complete an effort. I am not taking a position on what method to adopt. The important point is that the estimation method should return not a number but a distribution! The major estimation vendors have this capability even if it is not always surfaced. I will expand on this point in the next blog entry. For now, the key point is that you should be working not with point estimates, but with distributions.
Second is how the project manager affects the shape and position of the distribution and therefore the odds. Some of the techniques are intuitive, some not so much. There are two things one might do: move the distribution relative to the target date, and change the shape of the distribution, typically narrowing it so that more of it lies before the target date.
In the first, one can move the target date out, so that the picture looks like this:
This is, of course, intuitive: moving out the date lowers the risk. Another intuitive thing a project manager might do is descope the project, that is, commit to deliver less functionality. This may have two effects on the distribution. It will move it to the left, as there will be less work to do. Depending on the difficulty of the descoped feature, the descoping may also narrow the distribution: by removing a difficult-to-implement feature, one is more certain of delivery, narrowing the distribution and removing risk, resulting in this diagram:
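The two intuitive levers can be compared numerically. This sketch starts from a hypothetical triangular time-to-deliver distribution. The descoped distribution, which is shifted left and has a shorter right tail, is an assumption chosen to mimic removing a hard feature. It is not derived from anything.

```python
def triangular_cdf(x, low, mode, high):
    """P(time-to-deliver <= x) for a triangular distribution."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x <= mode:
        return (x - low) ** 2 / ((high - low) * (mode - low))
    return 1 - (high - x) ** 2 / ((high - low) * (high - mode))

baseline = (20, 26, 40)  # hypothetical (low, most likely, high) weeks
target = 28              # hypothetical committed date, in weeks from now

scenarios = {
    "baseline": (baseline, target),
    "move the target date out 4 weeks": (baseline, target + 4),
    # Hypothetical descope: less work shifts the distribution left, and dropping a hard
    # feature shortens the long right tail.
    "descope a hard feature": ((18, 23, 32), target),
}
for name, (dist, t) in scenarios.items():
    print(f"{name:34s} P(on time) = {triangular_cdf(t, *dist):.0%}")
```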
Now comes the unintuitive part. Suppose the target date and content are not negotiable. What is a project manager to do then? The idea is to take actions that will narrow the distribution in Figure 1 so that it looks like this:
How is this done? Many project managers, in the name of making progress, choose the easiest functions to implement first, "the low-hanging fruit". However, doing this has minimal effect on the shape of the curve in the figure. The less intuitive approach, following the principle of the Rational Unified Process, is to work on the most difficult, riskiest requirements first! These are the requirements about which the team has the least information, and so the team should tackle them first in order to have time to gain the information needed to succeed. Doing the easy stuff first gives the appearance of progress, but by putting off the riskier requirements, one will run out of time to do them and fail to meet the commitment.
All this has to be done while ensuring there is sufficient time to fulfill all the requirements, risky or not. So in the end, to meet commitments one must account for both the time to complete the tasks and their uncertainty. Some techniques for doing that will be discussed in the next blog entry.
One of the things that characterizes software or systems development is that the project manager routinely commits to deliver certain functionality on a given date at an agreed-upon level of quality for a given budget. It is the role of the project manager to make good on the commitment. The software and systems organization's leadership may count on the commitments being met in order to meet their business commitments, or there may be an explicit contract to deliver on time for a fixed budget. The measure of a good project manager is the ability to make and meet commitments.
In this blog entry, I will discuss the nature of that commitment and how it relates to project analytics. First of all, let's define 'commitment' in this context. Of course, I do not mean confinement to a mental institution; I mean, as suggested above, the promise to deliver certain content with acceptable quality on or before a certain date.
The first thing to notice is that the future is never certain, and so we are in the realm of probability and random variables, i.e., quantities described by probability distributions. Going forward, I will assume the reader is familiar with the concepts of random variables and their associated distributions. Soon, I will devote a blog entry to just that topic.
Meanwhile, the best way to describe the likelihood of meeting a commitment is to use a random variable. Consider the distribution of the time it will take to meet the commitment. It might look something like this:
A similar distribution would apply to cost to complete.
Recall that the probability of the commitment being met is then the area under the curve that falls before the target date:
The manager, in making the commitment, is essentially betting (perhaps his or her career) that he or she will meet the commitment. According to this measurement, the odds are about 50-50. The key measurement, then, is the amount of area under the distribution of the random variable that lies before the target date, which in turn relies on the ability to calculate the probability distribution. I will also discuss some techniques to do that in a later entry.
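The area before the target date is simply the cumulative distribution function evaluated at that date. Below is a minimal sketch that assumes a triangular time-to-deliver distribution. The (low, most likely, high) values in weeks and the target date are hypothetical, and they were chosen so that the odds come out close to 50-50.

```python
def triangular_cdf(x, low, mode, high):
    """P(time-to-deliver <= x) for a triangular distribution."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x <= mode:
        return (x - low) ** 2 / ((high - low) * (mode - low))
    return 1 - (high - x) ** 2 / ((high - low) * (high - mode))

low, mode, high = 20, 26, 40  # hypothetical time-to-deliver distribution, in weeks
target = 28                   # hypothetical committed date
p_win = triangular_cdf(target, low, mode, high)
print(f"area before the target date (odds of meeting the commitment): {p_win:.0%}")
```

Any distribution with a computable CDF works the same way. With Monte Carlo output, use the fraction of samples that fall at or before the target.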
Now consider, for example, "project health". What I believe is meant is the likelihood of meeting the commitment to deliver the project on time.
If it is highly probable the project will ship by the target date, the project is 'green'; otherwise it is 'yellow' or 'red', as in the following figure.
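Reading health as a probability suggests a simple classification rule. The 80% and 50% thresholds below are hypothetical policy choices, not standard values.

```python
def project_health(p_on_time, green_at=0.8, red_below=0.5):
    """Map the probability of meeting the commitment to a status color.
    The thresholds are hypothetical; each organization would set its own."""
    if p_on_time >= green_at:
        return "green"
    if p_on_time >= red_below:
        return "yellow"
    return "red"

for p in (0.92, 0.65, 0.47):
    print(f"P(on time) = {p:.0%}: {project_health(p)}")
```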
There are three responses to a yellow or red project. One can move the target date, move the distribution, or change the shape of the distribution, again a topic for a later blog.
I know. It has been a long time since my last posting. Over the last few months, nothing much happened that prompted an entry. I was thinking of writing something sort of philosophical on the nature of estimates, but never got to it. Then came the tsunami and the associated nuclear reactor failures at the Fukushima power plant. Suddenly, the topic became more urgent. This is relevant to this blog because our domain includes the engineering and economics of safety-critical systems. Presumably the nuclear reactor industry uses state-of-the-art methods. I have been exploring what is going on there and, while I am far from an expert, I have found out some things worth sharing in a blog.
We have been told that reactor failure is a one-in-over-100,000-year event. Sounds reassuring. Yet, in my lifetime, there have been three that I know of: Three Mile Island, Chernobyl, and now Fukushima. Discounting Chernobyl, which apparently was greatly under-engineered, something must be wrong for there to be two meltdowns in what has been estimated to be a one-in-over-100,000-year event. Apparently, there have been many near misses, e.g., the loss of coolant at the Browns Ferry plant, something one would not expect from such safe systems. This raised some questions. What does a 'one in N year event' mean? Does it mean that we should not expect the event until N years have passed, or that we can be certain one will occur within N years? More importantly, if there are K systems, each having a 1 in N year safety rating, what is the rating of the population? As I have pointed out in previous blogs, we do not need estimates; we need probability distributions to get any practical understanding.
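One common way to interpret a '1 in N year' rating is as a Poisson process with a rate of 1/N events per year. That interpretation is an assumption here. Under that model, and also assuming the systems fail independently, the probability of at least one event in T years across K systems is 1 - exp(-K*T/N). That answers both questions. A single system has about a 63% chance of an event within N years, not certainty. A population of K systems behaves like one system with rate K/N. The population of 440 reactors over 40 years used below is a hypothetical illustration.

```python
import math

def p_at_least_one(years, n_year_rating, systems=1):
    """
    P(at least one event within `years`) for `systems` independent systems, each rated
    '1 in n_year_rating years' and modeled (by assumption) as a Poisson process.
    """
    expected_events = systems * years / n_year_rating
    return 1 - math.exp(-expected_events)

N = 100_000
single = p_at_least_one(N, N)
fleet = p_at_least_one(40, N, systems=440)
print(f"one system over {N:,} years: {single:.3f}")
print(f"440 systems over 40 years: {fleet:.3f}")
```

The independence assumption is exactly what the next paragraph calls into question. Correlated failures would make the population figure worse.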
Here are some of the sources I found. This New York Times article helps explain what is going on. A few points in the article caught my attention. First, they carried out 'deterministic' risk analysis (see this NRC page) because probabilistic methods are "too hard." A good summary of the difficulty of the problem and the history of how it is addressed is found in Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis by M. Granger Morgan and Max Henrion. Briefly, there is little to go on to estimate the likelihood of individual events and their dependencies in event chains. So the distribution of estimated time to failure must have huge variance. This article by M. V. Ramana summarizes and criticizes the current practice, including probabilistic risk assessment (PRA). The key idea is that failure results from a sequence of component failures. Each component is reliable, and so the probability of a system failure, the joint probability of the component failures, is very low. This assumes that the component failures are independent events. However, since the components are parts of the same system, their failures are not independent, and the joint probabilities are hard to estimate. For example, one component may fail, which results in a second component running out of spec, which might result in a number of other component failures. Getting the joint probabilities right entails a very faithful system model, plus data collected from thousands of simulations with varying inputs and Monte Carlo methods to take into account the variability of the components. The output of such a simulation could be used to improve the system design.
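Simulation makes it easy to test the independence assumption. The following is a hypothetical two-component sketch. Component B fails with probability 0.01 while component A is healthy. Once A has failed and B is running out of spec, B fails with probability 0.3. The joint failure probability therefore comes out around 0.01 × 0.3 = 0.003. That is roughly thirty times the 0.0001 you would get by multiplying the individual probabilities. All of these probabilities are made up for illustration.

```python
import random

def simulate_joint_failure(trials, p_a, p_b_normal, p_b_given_a, seed=3):
    """Estimate P(A and B both fail) when B is more likely to fail once A has failed."""
    rng = random.Random(seed)
    both = 0
    for _ in range(trials):
        a_fails = rng.random() < p_a
        # Hypothetical coupling: a failed A pushes B out of spec.
        p_b = p_b_given_a if a_fails else p_b_normal
        b_fails = rng.random() < p_b
        both += a_fails and b_fails
    return both / trials

p_a, p_b = 0.01, 0.01
independent = p_a * p_b
coupled = simulate_joint_failure(200_000, p_a, p_b, p_b_given_a=0.3)
print(f"assuming independence: {independent:.6f}")
print(f"with coupling (simulated): {coupled:.6f}")
```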
I want to focus on another PRA challenge: estimating the likelihood of the devastating single-cause event, such as the earthquake and the tsunami. Clearly some sort of data are needed for the estimate, but what sort of data? As pointed out in the New York Times article, there was a deep historical data search of the size and frequency of the earthquakes in the relevant geographic region. That led to the conclusion that planning for 18 feet of water was sufficient. Recall that the reactor was inundated by 40 feet of water. So past performance was not predictive. In retrospect, that is not surprising, given that tectonic plate movement is hardly a stationary process. An alternate approach is to use modern geologic models and plate measurements. Then one could and should run simulations to get a distribution of the flood depths. One could argue that this approach is also suspect, since it depends on the quality of the models, which introduces a subjective element. However, using historical data is also based on an assumption, namely that earthquake generation is a stationary process, a very dubious model whose adoption is equally subjective. To be fair, presumably one could not run the needed simulations in the 1960s, and so the frequency model may have been the best available. It can be argued that earthquake prediction is notoriously difficult, especially pinpointing when an earthquake will occur. However, using Monte Carlo methods and simulations, it seems reasonable that one can create a probability distribution of the time to an earthquake above a certain size and use this to estimate the likelihood of the event over the lifespan of the plant.
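Here is a minimal sketch of the simulation-based alternative. Instead of a single historical frequency, the sketch treats the annual rate of an earthquake above the design size as itself uncertain. Each trial samples a rate, then samples the time to the next such event, and counts how often that time falls within the plant's lifespan. The rate distribution (lognormal, centered on 1/500 per year) and the 40-year lifespan are hypothetical. A real study would derive these from geologic models. Comparing the result with the fixed-rate answer shows how uncertainty in the rate raises the risk.

```python
import math
import random

def p_event_within(lifespan, trials=100_000, seed=11):
    """
    Monte Carlo estimate of P(a quake above the design size within `lifespan` years).
    Hypothetical model: the annual rate is lognormal around 1/500; given a rate,
    the time to the next such quake is exponential.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rate = rng.lognormvariate(math.log(1 / 500), 1.0)  # uncertain annual rate
        time_to_event = rng.expovariate(rate)              # years to the next such quake
        hits += time_to_event < lifespan
    return hits / trials

fixed = 1 - math.exp(-40 / 500)  # if the rate were known to be exactly 1/500 per year
simulated = p_event_within(40)
print(f"fixed rate: {fixed:.3f}; uncertain rate (simulated): {simulated:.3f}")
```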
The point of this discussion is that frequency-model data is no more 'objective' than the data used to build and apply models. Both involve subjective assumptions about the validity of the model. Note that Bayesian data analysis methods can be used to validate the various models, so we can assess their usefulness in the estimation process.
Finally, these safety estimates are used to set policy and, in particular, to make economic decisions about nuclear energy. The cost of a failure is huge. For example, one estimate of the cost of the Fukushima failure is $184 billion. The proponents of nuclear energy argue that nuclear plants make economic sense, provided they are safe enough, and that the new designs are much safer. Maybe so. But knowing they are safe enough will take much better analytics than we have seen to date.
I have the honor of giving one of the keynotes at the Conseg2011 conference this February in Bangalore. I have chosen a large, perhaps overly ambitious topic: "The Economics of Quality". Here is my conference proceedings document. My goals in preparing the paper and presentation are to make the case that quality is fundamentally an economic concern and to suggest an overall approach for reasoning about when software has sufficient quality for shipping. Those who have read some of my earlier entries will see how my thinking on the topic differs from that of those who take a technical debt approach.
Anyhow, the brief proceedings paper is very high level, and considerable work remains to fill in the details and validate the approach. This paper really is the beginning of a program that I believe, when carried out, will greatly benefit our industry. So, I would like to hear from anyone who has similar interests and perspectives. There must be some existing relevant research. Perhaps we will find enough like-minded folks to build a community exploring the topic.
It should not be a surprise that I have been following the
BP oil spill with much interest. In fact, as I started typing this entry, I
was watching the grilling of the BP CEO, Tony Hayward, by Congress. Rep. Stupak
is focusing on BP’s risk management.
Some of you have read my earlier posting on my thoughts about
the BP decision process that led to the Deepwater Horizon blowout. So far,
information uncovered since that posting is remarkably consistent with my
earlier suppositions. In this entry I would like to step back a bit and discuss
what broader lessons might be learned from the incident. While it is all too
easy to fall into BP bashing, I would rather use this moment to reflect more
deeply on risk taking and creating value. (BTW, some of you might know that my
signature slogan is ‘Take risks, add value’.)
In our industry we create value primarily through the
efficient delivery of innovation. Delivering innovation, by definition,
requires investing in efforts without initial full knowledge of the effort
required and the value of the delivery. This incomplete information results in
uncertainties in the cost, effort, and schedule of the projects and in the value of
the delivered software and systems, i.e., cost, schedule, and value risk.
Deciding to drill an oil well also entails investing in an
effort with uncertain costs and value. In this case, the structure of the
subsurface and the productivity of the well cannot be known with certainty before
drilling. As I pointed out in an earlier blog, a good definition of risk is
uncertainty in some quantifiable measure that matters to the business. So in
both our industry and oil drilling we deliberately assume risk to deliver value.
So, what can we learn from the BP incident? Briefly, one
creates value by genuinely managing risk. One creates the semblance of value
for a while by ignoring risk.
Assuming risk, that is, investing in uncertain projects, provides the
opportunity for creating value. That value is actually realized by investing in
activities that reduce the risk. The model that shows the relationship is described
in this entry. So, reducing risk has economic value, but reducing risk
takes investment. In the end, the quality of risk management is measured with a
return-on-investment calculation. This in turn requires a means to quantify and,
in fact, monetize risk.
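Once risk is monetized, the return-on-investment calculation is straightforward. Below is a minimal sketch that values risk as expected loss, meaning the probability of failure times the cost of failure. The $5B failure cost, the probabilities, and the $20M investment are hypothetical.

```python
def expected_loss(p_failure, cost_of_failure):
    """Monetized risk, valued (by assumption) as probability times cost of failure."""
    return p_failure * cost_of_failure

def risk_reduction_roi(p_before, p_after, cost_of_failure, investment):
    """ROI of an investment that lowers the probability of a costly failure."""
    value = expected_loss(p_before, cost_of_failure) - expected_loss(p_after, cost_of_failure)
    return (value - investment) / investment

# Hypothetical: a $5B failure; a $20M investment cuts its probability from 2% to 0.5%.
roi = risk_reduction_roi(0.02, 0.005, 5e9, 20e6)
print(f"ROI of reducing the risk: {roi:.0%}")
```

Expected loss is the simplest valuation. A risk-averse organization might also weigh the tail of the loss distribution, which calls for the Monte Carlo approach.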
I wonder what risk management approach was
followed by BP. A recent Wall Street Journal article suggested they used a risk
map approach: building a diagram with one axis a score of the ‘likelihood of
the risk’ and the other a score of the ‘severity of a failure’. So with this
method, they would score the likelihood of a blowout as very low (based on past
history) and its consequence as very high. So, such a risk needs to be ‘mitigated’.
(Some actually multiply the scores to get to some absolute risk measure.) Their
mitigation was the installation of a blow-out preventer. They could then
confidently report they had executed their risk management plan. Note these
scores are at best notionally quantified and not monetized.
Paraphrasing my good colleague, Grady Booch (speaking of
certain architecture frameworks), risk maps are the semblance of risk
management. As pointed out by Douglas Hubbard in The Failure of Risk Management (and in
an earlier rant in this blog), this sort of risk management is not only
common, but dangerous: it is a sort of business common-mode failure that leads
to bad outcomes. Hubbard also points out that useful risk management entails
quantification and calculation using probability distributions and Monte Carlo
analysis. I would add that since risk management in the end is about business
outcomes, risks need to be monetized as well as quantified. I am willing to bet
a good bottle of wine that BP did no such thing. Any takers? The business
common-mode failure was over-reliance on the preventers, even though there are
several studies showing they are far from ‘failsafe’.
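A risk map would record a blowout as something like "very low likelihood, very high severity." A monetized Monte Carlo model instead yields quantities a business can act on, such as expected annual loss and the chance of a loss above a threshold. The event probability and the lognormal loss model below are hypothetical assumptions chosen for illustration. They are not estimates for any real well.

```python
import math
import random
import statistics

def simulate_annual_loss(p_event, loss_median, loss_spread, trials=100_000, seed=5):
    """
    Monte Carlo of one year's loss from a single risk (hypothetical model): the event
    occurs with probability p_event; if it does, the loss is lognormally distributed.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(trials):
        if rng.random() < p_event:
            losses.append(rng.lognormvariate(math.log(loss_median), loss_spread))
        else:
            losses.append(0.0)
    return losses

losses = simulate_annual_loss(p_event=0.01, loss_median=2e9, loss_spread=1.0)
expected = statistics.fmean(losses)
p_over_1b = sum(loss > 1e9 for loss in losses) / len(losses)
print(f"expected annual loss: ${expected / 1e6:,.0f}M")
print(f"chance of losing more than $1B this year: {p_over_1b:.2%}")
```

With these numbers, the expected-loss figure can then be compared directly with the cost of a more expensive design or procedure.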
Further, it appears BP assumed risk by consistently taking
the cheaper, if riskier, design and procedure alternative, the one with greater
uncertainty in the outcome, even when the cost of an undesired, if unlikely,
outcome was possibly catastrophic. The laundry list of such decisions is long; some are outlined in Congressman
Waxman’s letter to Tony Hayward. The CEOs of Shell and Exxon testified before Congress that their companies
would have used different, more costly designs and followed more rigorous
procedures. According to the congressional and journalistic reports, this behavior
is BP standard operating procedure. So BP assumed risk by drilling wells but did not invest in reducing it.
For quite a while they got away with the approach of
assuming but not really reducing risk, and appeared to be creating value as
reflected in stock value and dividends to the investors. The BP management
raised the stock price from around $40/share in 2003 to a peak of around
$74/share prior to the Deepwater Horizon incident. At this writing the stock is
trading at $32/share and the current dividend has been cancelled. Investors
might rightly wonder if there is another latent disaster and so discount the
apparent future profitability with the likelihood of unknown liabilities. The
total loss of stockholder value is over $100B, which is in the ballpark of the
eventual liability of BP. So, whatever approach BP used to manage risk failed.
BTW, some may recognize this same pattern in the management
of financial firms that participated in the subprime mortgage market. In that
case, they ‘mitigated risk’ by relying on the ratings agencies. Those who
actually built monetized models of the risk realized there was a great
opportunity to bet against the subprime mortgage lenders and made huge fortunes
(See, e.g., The Big Short: Inside the
Doomsday Machine by Michael Lewis.)
Readers of the blog will notice a recurrent theme in some of
the postings. It is essential that we assume and manage risk. To repeat a
favorite quote, “One cannot manage what one does not measure.” The risk-map
and scoring methods, while common, are insufficient to the needs of our industry; they
neither measure nor really manage risk. We as a discipline need to step up to
quantifying, monetizing, and working off risk in order to succeed as drivers
of innovation. We need to step up to the mathematical approach found in
Douglas Hubbard’s and Sam Savage’s (see
this posting) texts.
I came to this same realization probably a decade ago. I
held off at first because I did not have a deep enough understanding of how to
proceed, and I knew I would encounter great skepticism. I tested the waters in
2005 and posted my first
paper on the subject in 2006. I indeed received a great deal of skepticism
and resistance, but enough acceptance to go forward. I have learned some
important lessons from all that. In my next blog, I will share my experiences
of bringing more mathematical thinking to risk management for SSD.
This blog experiment seems to be working. The entries are
getting around 100 visits and growing, good enough to keep at it. I have
found that writing the entries has given me the opportunity to clarify and
express my thoughts. This entry is a case in point.
We are deploying a BAO solution for the level 3 support
organizations in our IBM India Software Labs. That deployment provides a case
study in how to integrate two concepts I introduced in earlier blogs. This
entry is longer than the others. I hope you find it worth the wait and effort.
In those previous entries, I discussed two frameworks for reasoning about measurement: GQM(AD) and Outcome and Controls.
These frameworks address different aspects of the problem of
using measures to achieve business goals by measuring the right things and
taking actions to respond to the measurements. In fact, these frameworks fit
together hand in glove.
The level 3 support teams provide fixes to defects found in delivered
code. Each of the teams deals with
an ongoing series of change requests (aka APARs, PMRs). An organizational goal is to reduce the time and cost to complete these
requests. To achieve the goal, they are adopting some Rational-supported practices
and supporting tools. So the questions
that need to be answered are:
1. What is the time trend of the time to complete the change requests?
2. What is the time trend of the cost to complete the change requests?
3. In each case, how would I know that some improvement action resulted in a
significant improvement in the trends?
Now comes the hard part: determining the measures
that answer the questions. The change requests arrive somewhat
unpredictably. Each goes through the fix and release process and presumably
gets released in a patch or point release. So at any given time there is a population of currently open
and recently closed requests. The measures that answer the questions are time
trends of some statistic on some population of change requests.
Each of the change requests requires a different amount of time and effort to
complete. So to measure whether the outcome is being achieved, one must reason
statistically: defining populations of requests, building the statistical
distribution of, say, time to complete for that population, and defining the outcome
statistic for the distribution. So
we need to do two things to define the measure:
1. Define the population of requests for each point on the trend line.
2. Choose the statistic on that population.
To keep it simple (at least as simple as possible), let's form the population by
choosing the set of change requests closed in some previous period, say the
previous month or quarter. To choose a statistic, one needs to look at the data
and pick the statistic that best answers the question. Most people assume the
mean of the time (or cost) to complete is the best choice. However, that choice is appropriate only
when the histogram of the time to complete is centered on the mean,
as is common in normally distributed data.
One of the advantages of working in IBM is that we have lots of useful data. Inspection
of some APAR data on the time to complete from one of our teams in the IBM
Software Lab in India shows that the distribution is not centered on a mean, and so reduction of the mean time to
complete is not the best measure of improvement.
We have looked at literally tens of thousands of data points for time to complete
of change requests across all of IBM and have found the same distribution. For
the statistically savvy, it appears to be a Pareto distribution, but statistical
analysis carried out by Sergey Zeltyn of IBM Research’s Haifa lab shows that
this distribution does not fit any standard distribution well. A possible
explanation is that the time required to fix the defects is Pareto
distributed, but since the resources available to fix them are limited, the
actual time to complete is not pure Pareto. In any case, a practical way to
proceed is to choose a simple (non-parametric) measure: the width of the head, i.e.,
the time it takes to complete 80% of the requests in the population.
Armed with this analysis, the organization decides to specify the goal precisely, for example, a 15% reduction in the time and
cost to complete 80% of the requests closed each month. So the outcome measures are the time and cost to complete 80% of
the requests closed each month.
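Here is a minimal sketch of the outcome measure and the goal check. The sketch groups requests by the month they closed, then computes both the mean and the 80% head width for each month. It then tests each month's head width against a 15% reduction from a baseline month. The data are synthetic: heavy-tailed values from Python's `random.paretovariate`, with a shrinking scale that mimics an improving team. They stand in for real APAR records, which do not follow a pure Pareto distribution.

```python
import math
import random
import statistics
from collections import defaultdict

def head_width(values, share=0.8):
    """Non-parametric outcome statistic: the value by which `share` of the population is covered."""
    ordered = sorted(values)
    return ordered[math.ceil(share * len(ordered)) - 1]

def monthly_trend(requests):
    """requests: iterable of (close_month, days_to_complete). Returns {month: (mean, head width)}."""
    by_month = defaultdict(list)
    for month, days in requests:
        by_month[month].append(days)
    return {m: (statistics.fmean(d), head_width(d)) for m, d in sorted(by_month.items())}

def goal_met(baseline, current, reduction=0.15):
    """True if `current` is at least `reduction` (e.g. 15%) below `baseline`."""
    return current <= baseline * (1 - reduction)

# Synthetic, heavy-tailed stand-in for closed change requests (days to complete).
rng = random.Random(42)
requests = [(month, round(scale * rng.paretovariate(1.3), 1))
            for month, scale in (("2010-01", 2.0), ("2010-02", 1.8), ("2010-03", 1.5))
            for _ in range(200)]

trend = monthly_trend(requests)
baseline_p80 = trend["2010-01"][1]
for month, (mean, p80) in trend.items():
    print(f"{month}: mean {mean:5.1f} days, 80% closed within {p80:5.1f} days, "
          f"15% goal met: {goal_met(baseline_p80, p80)}")
```

The same functions apply unchanged to cost per request.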
Having chosen these measures, we are ready to identify the data sources and instrument the measures. So far so good. But wait, we still need
to answer question 3.
As I mentioned, in order to improve the outcome measures and achieve the goals, the
lab teams have agreed to adopt appropriate Rational practices and tools to automate certain processes. The
practices were selected using the Rational MCIF Value Traceability Trees (a
development causal analysis method). Adopting and maturing the practices and
their automations are the controls. Some
control examples are automating the regression test and build process, and adopting
a stricter unit test discipline to reduce time lost in broken builds. There
are control mechanisms with associated control
measures such as time-to-build, regression test time-to-complete, percentage
of code unit-tested, and a self-assessment by the team of their adoption of testing
and build practices.
To answer question 3, we need statistical analytics
to determine whether the changes in the control measures have had a significant impact
on the outcome measures. Our Research staff has settled on those analytics, but
I will discuss that in a later entry. This entry is already too long.
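The specific analytics chosen by the Research staff are deferred to a later entry, so the following is only one generic possibility. It is not necessarily what they chose. A permutation test asks whether the drop in the 80% head width after a control change is larger than what chance reshuffling of the same data would produce. The before and after samples are synthetic.

```python
import math
import random

def head_width(values, share=0.8):
    ordered = sorted(values)
    return ordered[math.ceil(share * len(ordered)) - 1]

def permutation_test(before, after, trials=2_000, seed=13):
    """One-sided p-value for 'the 80% head width dropped from before to after'."""
    rng = random.Random(seed)
    observed = head_width(before) - head_width(after)
    pooled = list(before) + list(after)
    n = len(before)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # relabel requests at random, ignoring the control change
        if head_width(pooled[:n]) - head_width(pooled[n:]) >= observed:
            extreme += 1
    return (extreme + 1) / (trials + 1)

data_rng = random.Random(2)
before = [round(3 * data_rng.paretovariate(1.2), 1) for _ in range(150)]  # synthetic days
after = [round(2 * data_rng.paretovariate(1.2), 1) for _ in range(150)]   # synthetic days
print(f"p-value for the drop in head width: {permutation_test(before, after):.3f}")
```

A small p-value suggests that the improvement is unlikely to be noise. Attributing the improvement to a particular control still requires the causal reasoning behind the choice of controls.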
This case study is both reasonably straightforward and far from trivial. It does
show, as promised, that GQM(AD) and Outcome and Controls work together. I leave
you all with a thought problem. How would you apply the pattern to teams
developing new features for existing applications?
Today, April 1, seems like a good day to bring forward an important new idea. In fact, I think this may be the next big thing.
One of the well-understood problems with software development project management is that it is often impossible to specify the complete work breakdown with certainty. The longer the project and the more innovative the project, the more uncertain the work breakdown items. This is addressed in iterative, agile planning by identifying the summary work items and then adding detail as the project evolves. Another source of uncertainty is the dependency between the summary items. This uncertainty in turn makes critical path analysis for such programs problematic. In fact, there is a whole ensemble of project critical paths, each with some likelihood. For the physics literate, this ensemble of paths is much like Feynman path integrals in quantum theory. The math is pretty hairy (see this elementary description). Fortunately, as Feynman also pointed out, one can simulate quantum mechanics with quantum computers. I am no expert in quantum computing, but even so I have a proposal: Quantum Informed Projects (QuIPs). The idea is to represent work items as QItems using QBits from quantum computing. Then we can represent the project as a set of entangled QItems and use a suitably large quantum computer to calculate the wave function for the critical path.
My understanding is that we do not yet have large enough quantum computers to make this practical. However, the same is true for implementing other useful quantum algorithms (see this example). So we can start by building algorithms. There is no time like the present (not accounting for the quantum uncertainty of measuring time). So on this special day, let's turn our attention to QuIPs.
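QuIPs aside, the "ensemble of critical paths" can be explored today with ordinary Monte Carlo simulation on a classical computer. Each trial samples every task's duration, finds the critical path for that sample, and tallies how often each path turns out to be critical. The small task network and the three-point durations are hypothetical. This is a conventional sampling technique, not a quantum algorithm.

```python
import random
from collections import Counter

# Hypothetical project network, listed in dependency order:
# task -> (predecessors, (least, most likely, most) duration in days).
TASKS = {
    "design":   ((), (3, 5, 8)),
    "backend":  (("design",), (5, 8, 15)),
    "frontend": (("design",), (6, 8, 11)),
    "docs":     (("design",), (2, 3, 5)),
    "test":     (("backend", "frontend"), (3, 4, 6)),
    "release":  (("test", "docs"), (1, 1, 2)),
}

def critical_path(durations):
    """Return (finish time, critical path) for one set of task durations."""
    finish, via = {}, {}
    for task, (preds, _) in TASKS.items():
        start, prev = 0.0, None
        for p in preds:  # a task starts when its latest predecessor finishes
            if finish[p] > start:
                start, prev = finish[p], p
        finish[task], via[task] = start + durations[task], prev
    path, node = [], "release"
    while node is not None:  # walk back along the latest-finishing predecessors
        path.append(node)
        node = via[node]
    return finish["release"], tuple(reversed(path))

def path_ensemble(trials=20_000, seed=17):
    """Fraction of sampled projects in which each path was the critical one."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        durations = {t: rng.triangular(lo, hi, mode)
                     for t, (_, (lo, mode, hi)) in TASKS.items()}
        counts[critical_path(durations)[1]] += 1
    return {path: n / trials for path, n in counts.most_common()}

for path, share in path_ensemble().items():
    print(f"{share:6.1%}  {' -> '.join(path)}")
```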
Last week I briefed an IBM customer on some of our recent thoughts on the role of estimation in business analytics. I feel the briefing was not entirely successful. The customer asked about a use of estimation I had not considered previously. My first reaction was that the approach desired by the customer was 'not possible'. I then realized it might work in some cases, but I was emotionally opposed to the idea. Then I realized I should not let my emotions interfere and should think through the question and its implications. Hence this blog entry.
In Agile projects or in maintenance organizations, workers are assigned 'work items'. Often workers are asked to estimate the time it will take to complete the work item. Asking an employee to commit to a time-to-complete is both reasonable and unreasonable. It is reasonable in that team leads and managers need to have some idea when the current work will be done to plan resource assignments, manage content, make commitments and the like. Management also wants to identify the more reliable, productive workers. After all, development teams are meritocracies. It is right that the more productive employees are identified and rewarded. So we need a way for employees to make reasonable estimates while providing a way for (cliché alert!) the cream to rise. It is unreasonable in that the worker is asked to guess and, in fact, commit to a time to complete. In some cases, the worker may be confident in the estimate. In other cases, there will be less confidence for a variety of good reasons: the task may have dependencies, the fix for a bug report may not be apparent, and so on. So asking the worker to commit to a fixed time is unreasonable, and measuring the worker against these commitments is oppressive. Under these circumstances, the intelligent worker will pad the estimate to ensure that the commitment is met. The unintended consequence of asking for a single duration is longer-than-needed estimates and, since people work to their commitments, lower productivity.
In the Agile Planning feature shipped in Rational Team Concert (RTC), we provided a means to somewhat mitigate this phenomenon. RTC lets the worker enter the best case, likely, and worst case for the time to complete the task. This way the worker can enter numbers that reflect her or his uncertainty. This supports more reasonable commitments and less adversarial conversations. In the tool, the numbers are rolled up using a Monte Carlo algorithm that accounts for task dependencies and shows the likelihood of completing the iteration or Scrum sprint. A benefit of this approach is that the worker can be held accountable not to a single value, but to staying within the range of the estimate, so there is no need for padding. There remains the problem of knowing whether the estimate is reasonable and how to find the meritorious, which finally brings us to the client request.
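The roll-up idea can be sketched in a few lines. This is not RTC's actual algorithm; it is a minimal illustration assuming triangular distributions for the three-point estimates, hypothetical owners and hours, and that each owner works their own items one after another (cross-owner dependencies are ignored for brevity).

```python
import random

# Hypothetical iteration plan: (owner, best, likely, worst) in hours.
PLAN = [
    ("ana", 4, 6, 12),
    ("ana", 8, 10, 20),
    ("raj", 2, 3, 5),
    ("raj", 10, 16, 30),
    ("lee", 6, 8, 16),
]


def prob_iteration_done(plan, capacity_hours, trials=20_000, seed=7):
    """Monte Carlo estimate of P(every owner finishes within capacity_hours)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        load = {}
        for owner, best, likely, worst in plan:
            # random.triangular takes (low, high, mode).
            load[owner] = load.get(owner, 0.0) + rng.triangular(best, worst, likely)
        # The iteration completes only if the most heavily loaded owner finishes.
        if max(load.values()) <= capacity_hours:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    for capacity in (25, 30, 35):
        print(capacity, round(prob_iteration_done(PLAN, capacity), 3))
```

The output is a likelihood rather than a promise, which is what lets a worker be held to a range instead of a padded single number.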
The client asked if we could turn this around. Could we use some sort of algorithm to compute the expected time to complete for the task? In other words, the system tells the worker the amount of time it should take to complete the task, and the worker is then measured against this expectation. As I said at the beginning of this post, my first reaction was 'probably not', and that the idea is undesirable. Let's dive deeper. First, like the RTC agile planner, this computation can and should include a best, likely, and worst case so as not to be overly oppressive, and should roll up to show iteration and/or project schedule risk. Further, building out this approach raises the following statistical question: "Can we sort work items into equivalence classes of similar enough tasks, so that we can use these classes as populations to build time-to-complete statistics?" If we could do this, then we could properly set expectations on the worker, detect the superior and inferior workers, reward the former and better train the latter. Further, we could measure improvements over time in the execution of the tasks due to team or process improvements. All good things. However, this approach needs to be implemented very carefully and not over-applied, or it could lead to more oppression and unintended consequences.
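If such equivalence classes could be defined, the statistics themselves are straightforward. A minimal sketch, assuming each completed work item is already tagged with a hypothetical class label, and (an assumption, not a prescription from this post) using the 10th percentile, median, and 90th percentile of each class's history as its best, likely, and worst case:

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical history: (task_class, hours_to_complete).
HISTORY = [
    ("bugfix-ui", 3.0), ("bugfix-ui", 5.0), ("bugfix-ui", 4.0),
    ("bugfix-ui", 8.0), ("bugfix-ui", 4.5), ("bugfix-ui", 6.0),
    ("db-migration", 12.0), ("db-migration", 20.0),
]


def class_estimates(history, min_samples=5):
    """Map each task class to (best, likely, worst) hours, or None if too few samples."""
    by_class = defaultdict(list)
    for task_class, hours in history:
        by_class[task_class].append(hours)
    estimates = {}
    for task_class, hours in by_class.items():
        if len(hours) < min_samples:
            # Too little history to be a population: fall back to expert judgment.
            estimates[task_class] = None
            continue
        deciles = quantiles(hours, n=10, method="inclusive")
        estimates[task_class] = (deciles[0], median(hours), deciles[-1])
    return estimates


if __name__ == "__main__":
    print(class_estimates(HISTORY))
```

The `min_samples` threshold is an arbitrary illustrative choice; deciding when a class is homogeneous and large enough to serve as a population is the hard statistical question raised above.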
I suspect the more creative architecture and design tasks simply do not lend themselves to this sort of analysis. So teams that create new platforms and build new applications will rely more on expert opinion for their estimates, not on predictions based solely on historical data. Not everyone would agree with this. For example, some estimation tools provided by various vendors do in fact try to estimate the effort and duration of design and architecture tasks by using parametric models or classifications. However, there is so much variation in the novelty of the efforts and in team skill and experience that the uncertainties in the estimates are large; such estimates should be applied to projects with great care and to individuals not at all.
On the other hand, most of what development organizations do is more routine, and for those tasks something along the lines of what the customer asked for might be possible. One would need a way of characterizing the different task classes, tracking the times-to-complete, and computing the statistical measures. With this in place, one could explore not only automated task estimates, but also process optimization by what I believe is a novel application of statistical process control.
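A minimal sketch of what that might look like, assuming one already has times-to-complete for a single task class: an individuals (XmR) control chart, a standard SPC tool, sets natural process limits from a baseline period and flags new tasks that fall outside them. The data and function names are hypothetical.

```python
from statistics import mean


def xmr_limits(baseline):
    """Individuals (XmR) chart limits from a baseline of times-to-complete."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    centre = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points.
    spread = 2.66 * mr_bar
    # Times cannot be negative, so clamp the lower limit at zero (a practical tweak).
    return max(centre - spread, 0.0), centre, centre + spread


def signals(new_times, limits):
    """Indices of new observations falling outside the natural process limits."""
    lower, _, upper = limits
    return [i for i, t in enumerate(new_times) if t < lower or t > upper]


if __name__ == "__main__":
    baseline = [4, 5, 4.5, 6, 5, 4, 5.5, 5]  # hypothetical hours for one task class
    limits = xmr_limits(baseline)
    print(signals([5, 14, 6], limits))  # -> [1]
```

A flagged task is a prompt for a conversation (an unusual bug, a missing skill, a process change), not an automatic judgment of the worker.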
In summary, I believe we need to pursue task analytics and estimation, but I have serious misgivings. Automated, analytics-based business processes can go seriously wrong. We need to ensure that some judgment and subjectivity are part of the process. The misuse of analytics in the subprime mortgage business is a case in point.
I realize something along the lines I am describing may already be available. Has anyone heard of a tool that supports this method?
Yesterday, I was at the Conference on System Engineering Research (CSER), held this year at Stevens Institute. I sat through a talk that stimulated my curmudgeonly tendencies. In the spirit of hopefully generating some controversy, I will not hold back.
The talk was about an expert-system-based engineering risk management system. Essentially, the authors got a set of experts together to identify categories of risks (people, delivery, product ...), the risks in each category, and a method for assessing the level of each risk and its consequence, and then summed the products of the levels and the consequences. The result is the total amount of risk in each category. Looking at the output is supposed to give you insight into the overall program risk and the contributing risks.
My problem is that I cannot parse the last sentence. In fact, I do not understand terms like "program risk" and, say, "people risk". There may be a clash of cultures here; to many, those terms seem reasonable.
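For concreteness, the scoring scheme the talk described can be sketched as follows, assuming 1-to-5 scales for level and consequence; the register entries are invented for illustration.

```python
# Hypothetical expert-elicited register: (category, risk, level, consequence), 1-5 scales.
REGISTER = [
    ("people", "key architect leaves", 2, 5),
    ("people", "cannot hire testers", 3, 3),
    ("delivery", "vendor API slips", 4, 4),
]


def category_scores(register):
    """Sum level * consequence within each category."""
    scores = {}
    for category, _risk, level, consequence in register:
        scores[category] = scores.get(category, 0) + level * consequence
    return scores


if __name__ == "__main__":
    print(category_scores(REGISTER))
```

What comes out is a unitless score per category: a 'people' score of 19 is neither a probability nor an amount of time or money.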
My argument starts here: one can ask 'What is my risk of going over budget?' or 'What is my risk of missing the delivery date?' These sorts of questions are answered using standard business analytics. See, for example, Mun's text on risk analysis, which defines risk as the statistical uncertainty of a quantity that matters. For example, 'time to complete' is a quantity that does matter to a project. The uncertainty in making the date can be measured as the variance (or standard deviation) of the estimate of the time-to-complete. (Note, for the math aware, time-to-complete is what statisticians call a continuous random variable.) So the question 'What is my schedule risk?' has an unambiguous, quantified answer. The question 'What is my people risk?' has no such answer. In fact, 'people risk' is not a concept defined in business analytics.
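As a worked illustration of risk as a quantified uncertainty, suppose (assumptions for this sketch) that a project's tasks run one after another, that each task's duration has a triangular distribution from a best/likely/worst estimate, and that the durations are independent. A triangular distribution with low a, mode c, and high b has mean (a + b + c) / 3 and variance (a^2 + b^2 + c^2 - ab - ac - bc) / 18. For independent tasks in sequence, means add and variances add, so the schedule risk is the square root of the summed variances.

```python
from math import sqrt


def triangular_mean_var(low, mode, high):
    """Mean and variance of a triangular distribution."""
    mean = (low + mode + high) / 3
    var = (low**2 + mode**2 + high**2 - low * mode - low * high - mode * high) / 18
    return mean, var


def schedule_risk(tasks):
    """Expected time-to-complete and its standard deviation for sequential,
    independent tasks given as (best, likely, worst) triples."""
    total_mean = total_var = 0.0
    for best, likely, worst in tasks:
        m, v = triangular_mean_var(best, likely, worst)
        total_mean += m
        total_var += v  # variances add for independent tasks
    return total_mean, sqrt(total_var)


if __name__ == "__main__":
    # Hypothetical three-task plan, in days.
    print(schedule_risk([(3, 5, 10), (5, 8, 15), (2, 3, 6)]))
```

The returned standard deviation is a schedule risk in exactly the sense described above: a number with units (days) attached to a quantity that matters.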
Of course, it does make sense to ask what contributes to the schedule risk. One might fear that the inability to staff the project contributes to the schedule risk. Fair enough. In my mind, that does not make staffing a 'risk', but rather, say, a schedule risk factor.
I am not sure why I am so adamant about this, but I am. It could be that I believe that the less precise use and measurement of risk is holding our industry back.
Anyone want to comment on or defend the so-called risk management practice underlying the talk I found so annoying?
I would like to build on the theme of reasoning about what to measure. The goal of business analytics is to track what matters to the organization (what it is you are trying to manage) and respond to the measure in some way to gain improvement. In manufacturing and some service delivery domains, the science of measuring outcomes and responding to them is statistical process control (SPC). SPC lies at the heart of the Six Sigma movement. Even so, there is no need to have a Six Sigma belt to participate in this discussion. While there is reason to believe that not all Six Sigma practices apply well to our domain, the ideas of tracking outcomes, applying statistical analysis to detect change, and applying some sort of controls to effect change apply in all business domains, including software and system development and delivery.
Briefly then, the outcomes are the operational goals, and the controls are the actions you take to achieve the outcomes. So, naturally, we need two kinds of measures:
Outcome measures - tracking the measures of effectiveness of the business organization.
Control measures - tracking whether the controls are in fact enacted.
Here is a thought experiment. Imagine there is a potato chip factory with an operational goal of achieving the right amount of salt on its chips. There is a target amount, and the factory needs to stay within small limits for market acceptance. So every day they grab a sample of chips and record the saltiness. They apply salt by running the freshly deep-fried chips under a salt shaker. The two controls are the frequency of the shaker and the speed of the belt. Both the shaker frequency and the belt speed are measured to confirm the controls are being applied as intended. In this example, the saltiness is the outcome measure, and the shaker frequency and belt speed are the control measures.
The simplest way, for me at least, to think about SPC is to measure trends in outcome measures and control measures to determine the likelihood that the controls are in fact affecting the outcome. In our potato chip example, we might find that we cannot control the outcome well enough with the shaker and belt controls. In that case, we might look for some other factor to control, say the factory humidity.
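One simple way to start on that question is to check whether movement in a control measure tracks movement in the outcome measure. The sketch below computes the Pearson correlation between hypothetical shaker-frequency and saltiness records; the numbers and units are invented. A strong correlation is consistent with the control affecting the outcome (though correlation alone does not prove it), while a weak one suggests looking for another factor, such as humidity.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between a control and an outcome series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # undefined if either series is constant


# Hypothetical daily records for the potato chip line.
shaker_hz = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]        # control measure
saltiness = [10.1, 10.9, 12.2, 12.8, 14.1, 14.9]  # outcome measure (arbitrary units)

if __name__ == "__main__":
    print(round(pearson_r(shaker_hz, saltiness), 3))
```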
If you look at many measurement programs in software and systems, you often find that outcome and control measures are confused. In fact, even sorting the measures into the two buckets is hard. No wonder measured process improvement in our domain has been so hard. Anyone have good examples of measurement patterns or antipatterns for measuring controls and outcomes?
In a conversation with a development lab productivity team, I was reminded that one of the first challenges software and system organizations face when starting an improvement program is 'what to measure'. In particular, some organizations start with what is easy to measure, with the understandable thought "we need to start somewhere". I have found this approach tends not to get traction. Over the years I have settled on some principles that seem to apply:
Be careful that the measurement drives the intended behavior - Organizations respond to how they are measured. This is useful in that measurement provides a way to change behavior. On the other hand, it is dangerous in that measurements can drive undesired behavior. For example, insisting that every project exactly meet the initial estimate of schedule, cost, and content will lead to risk aversion and drive innovation out of the organization.
You can't manage what you don't measure (adapted from Lord Kelvin) - This is one of those universal truths that applies to every field.
Don't measure what you do not intend to manage - Measuring without a way to respond to the measurement adds overhead and lowers productivity.
For development organizations, one size does not fit all - The measures need to reflect the mixture of work the team is doing. For example, measurements for a team doing level 3 maintenance work may not apply to a team building a new platform.
The Einstein test - Make the measures as simple as possible, but no simpler.
I suspect there are some other principles that should be added. But these are a good start.
In a later blog, I will discuss levels of measurement.
I am not a natural blogger, but I do like to share thoughts with a broad community of folks with shared interests.
My long-term passion is how to enable organizations to develop and deliver software and systems more effectively. I am thoroughly convinced that well-organized, well-governed organizations not only deliver value to their businesses, but also enhance the lives of their staff. In such organizations, people get to work on cool things, innovate, work well with colleagues, and build a legacy by being part of making things of value. I am also a mathematician by training. I am especially energized by the opportunity to apply mathematical reasoning to the improvement of software and system organizations. My current assignment is to lead the Business Analytics and Optimization (BAO) strategy for Rational.
So, with this blog, from time to time, I will share my thoughts on BAO for software and system organizations. I hope this blog will be a catalyst for building a community. I especially look forward to comments and conversations.