This blog experiment seems to be working. The entries are getting around 100 visits each and growing - good enough to keep at it. I have found that writing the entries has given me the opportunity to clarify and express my thoughts. This entry is a case in point.

We are deploying a BAO solution for the level 3 support organizations in our IBM India Software Labs. That deployment provides a case study in how to integrate two concepts I introduced in earlier blogs. This entry is longer than the others. I hope you find it worth the wait and effort to read.

These frameworks address different aspects of the problem of using measures to achieve business goals: measuring the right things and taking actions in response to the measurements. In fact, these frameworks fit together hand in glove.

Recall that level 3 support teams provide fixes to defects found in delivered code. Each team deals with an ongoing series of change requests (aka APARs, PMRs). An organizational **goal** is to reduce the time and cost to complete these requests. To achieve the goal, the teams are adopting some Rational-supported practices and supporting tools. So the **questions** that need to be answered are:

1. What is the trend over time of the time to complete the change requests?

2. What is the trend over time of the cost to complete the change requests?

3. In each case, how would I know that some improvement action resulted in a significant improvement in the trend?

Now comes the hard part: determining the **measures** that answer the questions. The change requests arrive somewhat unpredictably. Each goes through the fix and release process and presumably gets released in a patch or point release. So at any given time there is a population of currently open and recently closed requests. The measures that answer the questions are a time trend of some statistic on some population of change requests.

Each change request requires a different amount of time and effort to complete. So to measure whether the outcome is being achieved, one must reason statistically: defining populations of requests, building the statistical distribution of, say, time to complete for each population, and defining the outcome statistic on that distribution. So we need to do two things to define the measure:

1. Specify the population of requests for each point on the trend line

2. Specify the statistics on that population

To keep it simple (or at least as simple as possible), let's form the population by choosing the set of change requests closed in some previous period, say the previous month or quarter. To choose a statistic, one needs to look at the data and pick the statistic that best answers the question. Most people assume the mean of the time (or cost) to complete is the best choice. However, that choice is appropriate only when the histogram of the time to complete is centered on the mean, as is common in normally distributed data.
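In code, forming those monthly populations might look like the following sketch. The request records, field layout, and numbers here are invented for illustration - real APAR data would come from the support tooling:

```python
from collections import defaultdict
from datetime import date

# Hypothetical change-request records: (request id, close date, days to complete).
# The ids and values are illustrative, not real APAR data.
closed_requests = [
    ("CR-101", date(2011, 3, 4), 12),
    ("CR-102", date(2011, 3, 18), 45),
    ("CR-103", date(2011, 3, 29), 7),
    ("CR-104", date(2011, 4, 2), 130),
    ("CR-105", date(2011, 4, 20), 9),
]

# Form one population per period: all requests closed in that month.
populations = defaultdict(list)
for req_id, closed_on, days in closed_requests:
    populations[(closed_on.year, closed_on.month)].append(days)

# Each month's list of times-to-complete is one point's worth of data
# on the trend line; a statistic on it still has to be chosen.
for month, days_list in sorted(populations.items()):
    print(month, days_list)
```

Each key of `populations` is one point on the trend line; what statistic to compute on each list is the next decision.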

One of the advantages of working at IBM is that we have lots of useful data. Inspection of time-to-complete data for some APARs from one of our teams in the IBM Software Lab in India shows the distribution is not centered on a mean, and so reduction of the mean time to complete is not the best measure of improvement.

We have looked at literally tens of thousands of data points for time to complete of change requests across all of IBM and have found the same shape. For the statistics-savvy among you, it appears to be a Pareto distribution, but statistical analysis carried out by Sergey Zeltyn of IBM Research's Haifa lab shows that it does not fit any standard distribution well. A possible explanation is that the time required to fix the defects is Pareto distributed, but since the resources available to fix them are limited, the actual time to complete is not pure Pareto. In any case, a practical way to proceed is to choose a simple (non-parametric) measure: the width of the head, i.e. the time it takes to complete 80% of the requests.

So with this analysis in place, the organization decides to specify the **goal** precisely, such as a 15% reduction in the time and cost to complete 80% of the requests closed each month. So the **outcome measures** are the time and cost within which 80% of the requests closed each month were completed.
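A minimal sketch of that outcome measure, assuming a simple nearest-rank quantile (a real analysis might interpolate differently), with invented days-to-close figures for two months:

```python
import math

def width_of_head(times, fraction=0.8):
    """Time within which `fraction` of the requests completed:
    the empirical quantile of the times, by the nearest-rank method."""
    ordered = sorted(times)
    # Smallest rank k such that k/len >= fraction.
    k = math.ceil(fraction * len(ordered))
    return ordered[k - 1]

# Illustrative days-to-close for requests closed in two successive months;
# note the long tail, which would dominate a mean.
march = [3, 5, 7, 9, 12, 15, 21, 30, 60, 180]
april = [2, 4, 6, 8, 10, 13, 18, 25, 50, 150]

print(width_of_head(march))  # -> 30: 80% of March's requests closed within 30 days
print(width_of_head(april))  # -> 25
```

With these invented numbers the measure drops from 30 to 25 days, about a 16.7% reduction - ahead of a 15% goal - while the tail values (180, 150) barely influence the statistic at all.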

Having chosen these measures, we are ready to identify the **data** sources and instrument the measures. So far so good. But wait, we still need to answer question 3.

As I mentioned, in order to improve the outcome measures and achieve the goals, the lab teams have agreed to adopt appropriate Rational practices and tools to automate certain processes. The practices were selected using the Rational MCIF Value Traceability Trees (a development causal analysis method). Adopting and maturing the practices and their automations are the **controls**. Some control examples are automating the regression test and build process, and adopting a stricter unit test discipline to reduce time lost in broken builds. These control mechanisms have associated **control measures** such as time-to-build, regression test time-to-complete, percent of code unit-tested, and a self-assessment by the team of their adoption of testing and build practices.

To answer question 3, we need statistical **analytics** to determine whether changes in the control measures have had a significant impact on the outcome measures. Our Research staff has settled on those analytics, but I will discuss them in a later entry. This entry is already too long.

This case study is both reasonably straightforward and far from trivial. It does show, as promised, that GQM(AD) and Outcomes and Controls work together. I leave you all with a thought problem: how would you apply the pattern to teams developing new features for existing applications?