firstname.lastname@example.org 110000CH6X Tags:  procees analytics delivery and optimization development bao business 3,967 Views
I am not a natural blogger, but do like to share thoughts with a broad community of folks with shared interests.
My long-term passion is how to enable organizations to develop and deliver software and systems more effectively. I am thoroughly convinced that well-organized, well-governed organizations not only deliver value to the business, but also enhance the lives of the staff. In such organizations, people get to work on cool things, innovate, work well with colleagues, and build a legacy by being a part of making things of value. I am also a mathematician by training. I am especially energized by the opportunity to apply mathematical reasoning to the improvement of software and system organizations. My current assignment is to lead the Business Analytics and Optimization (BAO) strategy for Rational.
So, with this blog, from time to time, I will share my thoughts on BAO for software and system organizations. I hope this blog will be a catalyst for building a community. I especially look forward to comments and conversations.
So stay tuned.
In a conversation with a development lab productivity team, I was reminded that one of the first challenges software and system organizations face when starting an improvement program is 'what to measure'. In particular, some organizations start with what is easy to measure, with the understandable thought "we need to start somewhere". I have found that this approach tends not to get traction. Over the years I have settled on some principles that seem to apply:
I suspect there are some other principles that should be added. But these are a good start.
In a later blog, I will discuss levels of measurement.
I would like to build on the theme of reasoning about what to measure. The goal of business analytics is to track what matters to the organization (what it is you are trying to manage) and respond to the measure in some way to gain improvement. In manufacturing and some service delivery domains, the science of measuring outcomes and controls is statistical process control (SPC). SPC lies at the heart of the Six Sigma movement. Even so, there will be no need to have a Six Sigma belt to participate in this discussion. While there is reason to believe that not all of the Six Sigma practices apply all that well to our domain, the idea of tracking outcomes, applying statistical analysis to detect change, and applying some sort of controls to affect the change applies in all business domains, including software and system development and delivery.
Briefly then, the outcomes are the operational goals and the controls are the actions you take to achieve the outcomes. So naturally we need two kinds of measures.
The simplest way, for me at least, to think about SPC is to measure trends in outcome measures and control measures to determine the likelihood that the controls are in fact affecting the outcome. In our potato chip example we might find that we cannot control the outcome well enough by the shaker and belt controls. In that case, we might look for some other factor to control, say the factory humidity.
If you look at many measurement programs in software and systems, you often find that outcome and control measures are confused. In fact, even sorting the measures into the two buckets is hard. No wonder measured process improvement for our domain has been so hard. Anyone have good examples of measurement patterns or antipatterns of measuring controls and outcomes?
Again stay tuned for more....
Some business analytics for development organizations entail measuring return on investment for software and system programs. I just finished a long version of our RoI approach. This is a link to a more technical version of the paper.
As I mentioned in the first posting, I am still getting the hang of blogging. I guess one use of blogs is to share what is on my mind while staying in the neighborhood of the topic of analytics. So, I have been putting a lot of thought into Toyota's dilemma about how to deal with the reports of dangerous acceleration in their cars. The recent reports of Prius incidents (see this article in the New York Times) confirmed some of my earlier suspicions and hence this blog.
First, I need to come clean; all I know is from news accounts. I have had no contact with Toyota or any IBMers working with Toyota. Further, I need to say that the opinions here, which I believe are not controversial, are my own and do not reflect any IBM position.
So what do we know:
Now, say there is one chance in a million miles of driving of the latent defects manifesting. They may be impossible to find with standard testing and will inevitably happen every so often to drivers. This is the standard insight that with large volumes unlikely events become inevitable. So with Toyota's large sales, they may be the victim of their success.
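The arithmetic behind "unlikely events become inevitable" can be sketched in a few lines. This is a minimal illustration, assuming the one-in-a-million figure is a constant, independent probability per mile driven; the mileage totals are hypothetical and are not Toyota data.

```python
import math

def prob_at_least_one(rate_per_mile: float, miles: float) -> float:
    """Probability of at least one event over `miles` miles.

    Assumes each mile is an independent trial with the same probability
    (an illustrative simplification, not a model of any real defect).
    Computes 1 - (1 - p)^N in a numerically stable way.
    """
    return -math.expm1(miles * math.log1p(-rate_per_mile))

RATE = 1e-6  # one chance in a million miles, as in the text

# Hypothetical mileage totals: one car's year, one car's lifetime, a fleet.
for miles in (1e4, 1e6, 1e8):
    print(f"{miles:>12,.0f} miles: {prob_at_least_one(RATE, miles):.4f}")
```

Under these assumptions a single driver is very unlikely to see the defect in a year, while a large fleet is nearly certain to see it somewhere.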
The avionics community has developed a discipline around safety-critical software. There are design and model testing methods to validate that the embedded software is good enough to stake people's lives on the code running correctly. (There is a good article in the latest Communications of the ACM on model checking for avionics.) It seems Toyota and the entire auto industry need to adopt these safety-critical disciplines going forward. The cost of these practices is overshadowed by the costs of the highly publicized incidents, the suits, and other liability.
Yesterday, I was at the Conference on System Engineering Research (CSER) held this year at Stevens Institute. I sat through a talk which stimulated my curmudgeon tendencies. In the spirit of hopefully generating some controversy, I will not hold back.
The talk was about an expert-system based engineering risk management system. Essentially, the authors got a set of experts together to identify categories of risks (people, delivery, product ...), risks in the categories, and a method for identifying the level of each risk and its consequence and then summing the products of the levels and the consequences. The end result is the total amount of category risk. Looking at the output is supposed to give you insight into the overall program risk and the contributing risks.
My problem is that I cannot parse the last sentence. In fact I do not understand terms like "program risk" and say "people risk". There may be a clash of cultures here; to many those terms seem reasonable.
My argument starts here: One can ask 'What is my risk of going over budget?' or 'What is my risk of missing the delivery date?' These sorts of questions are answered using standard business analytics. See, for example, Mun's text on risk analysis that defines risk as statistical uncertainty of a quantity that matters. For example, 'time to complete' is a quantity that does matter to a project. The uncertainty in making the date can be measured as the variance (or standard deviation) of the estimate of the time-to-complete. (Note, for the math aware, time-to-complete is what the statisticians call a continuous random variable.) So the question 'What is my schedule risk?' has an unambiguous, quantified answer. 'What is my people risk?' has no such answer. In fact, 'people risk' is not a concept defined in business analytics.
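To make the definition concrete, here is a minimal sketch of schedule risk as the spread of a time-to-complete estimate. The sample values, the target date, and the function name `schedule_risk` are hypothetical; the samples stand in for whatever estimation process produces a distribution for the project.

```python
import statistics

def schedule_risk(samples_days, target_days):
    """Summarize time-to-complete uncertainty from a set of estimates.

    Returns (mean, standard deviation, fraction of samples past the target).
    """
    mean = statistics.mean(samples_days)
    spread = statistics.stdev(samples_days)  # sample standard deviation
    late = sum(d > target_days for d in samples_days) / len(samples_days)
    return mean, spread, late

# Hypothetical time-to-complete estimates, in working days.
estimates = [38, 42, 45, 47, 50, 52, 55, 61, 66, 74]
mean, spread, late = schedule_risk(estimates, target_days=60)
print(f"expected {mean:.1f} days, std dev {spread:.1f}, P(miss date) ~ {late:.0%}")
```

The standard deviation quantifies the schedule risk, and the fraction of samples beyond the target answers 'what is my risk of missing the delivery date?' directly.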
Of course, it does make sense to ask what contributes to the schedule risk. One might fear that the inability to staff the project contributes to the schedule risk. Fair enough. In my mind, that does not make staffing a 'risk', but say a schedule risk factor.
I am not sure why I am so adamant about this, but I am. It could be that I believe that the less precise use and measurement of risk is holding our industry back.
Anyone want to comment on or defend the so-called risk management practice underlying the talk I found so annoying?
Last week I briefed an IBM customer on some of our recent thoughts on the role of estimation in business analytics. I feel the briefing was not entirely successful. The customer asked about a use of estimation I had not considered previously. My first reaction was that the approach desired by the customer was 'not possible'. I then realized it might work in some cases, but I was emotionally opposed to the idea. Then I realized I should not let my emotions interfere and should think through the question and its implications. Hence this blog:
In Agile projects or in maintenance organizations, workers are assigned 'work items'. Often workers are asked to estimate the time it will take to complete the work item. Asking an employee to commit to a time-to-complete is both reasonable and unreasonable. It is reasonable in that team leads and managers need to have some idea when the current work will be done in order to plan resource assignments, manage content, make commitments and the like. Management also wants to identify the more reliable, productive workers. After all, development teams are meritocracies. It is right that the more productive employees are identified and rewarded. So we need a way for employees to make reasonable estimates while providing a way for (cliche alert!) the cream to rise. It is unreasonable in that the worker is asked to guess and, in fact, commit to a time to complete. In some cases, the worker may be confident in the estimate. In other cases, there will be less confidence for a variety of good reasons: the task may have dependencies, the solution to a bug report may not be apparent, and so on. So asking the worker to commit to a fixed time is unreasonable, and measuring the worker against these commitments is oppressive. Under these circumstances, the intelligent worker will pad the estimate to ensure that the commitment is met. The unintended consequence of asking for the duration is longer-than-needed estimates and, since people work to their commitments, lower productivity.
In the Agile Planning feature shipped in Rational Team Concert (RTC), we provided a means to somewhat mitigate this phenomenon. RTC provides a mechanism for letting the worker enter the best case, likely case, and worst case for the time to complete the task. This way the worker can enter numbers that reflect her or his uncertainty. This supports more reasonable commitments and less adversarial conversations. In the tool, the numbers are rolled up using a Monte Carlo algorithm that accounts for task dependencies and shows the likelihood of completing the iteration or scrum. A benefit of this approach is that the worker can be held accountable not to a single value, but to staying within the range of the estimate, and so there is no need for padding. There remains the problem of knowing if the estimate is reasonable and how to find the meritorious, which finally brings us to the client request.
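The roll-up idea can be sketched as follows. This is not the RTC algorithm: it is a minimal illustration that assumes a triangular distribution for each three-point estimate and a single worker doing the tasks one after another, which is the simplest possible dependency structure. The task numbers and the function name `completion_likelihood` are hypothetical.

```python
import random

def completion_likelihood(tasks, deadline_days, trials=10_000, seed=0):
    """Estimate P(all tasks finish within deadline_days).

    tasks: list of (best, likely, worst) durations in days.
    Assumes tasks run strictly in sequence and that each duration is
    triangular-distributed between best and worst with mode `likely`.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        total = sum(rng.triangular(best, worst, likely)
                    for best, likely, worst in tasks)
        if total <= deadline_days:
            hits += 1
    return hits / trials

# Hypothetical work items for one iteration: (best, likely, worst) days.
iteration = [(1, 2, 4), (2, 3, 6), (1, 1.5, 3)]
for deadline in (7, 8, 10):
    p = completion_likelihood(iteration, deadline)
    print(f"{deadline} days: {p:.0%} likely to complete")
```

A real planner would replace the simple sum with a walk over the task dependency graph, but the principle of sampling each range and counting how often the iteration fits is the same.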
The client asked if we could turn this around. Could we use some sort of algorithm to compute the expected time to complete for the task? In other words, the system tells the worker the amount of time it should take to complete the task and the worker then is measured against this expectation. As I said at the beginning of the blog, my first reaction was 'probably not' and that this is undesirable. Let's dive deeper. First, like the RTC agile planner, this computation can and should include a best, likely, and worst case in order not to be overly oppressive, and should roll up to show iteration and/or project schedule risk. Further, building out this approach raises the following statistical question: "Can we sort work items into equivalence classes of similar enough tasks, so that we can use these classes as populations to build time-to-complete statistics?" If we could do this, then we could properly set expectations for the worker, detect the superior and inferior workers, reward the former and better train the latter. Further, we could measure improvements over time in the execution of the tasks due to team or process improvements. All good things. However, this approach needs to be implemented very carefully and not over-applied, or it could lead to more oppression and unintended consequences.
I suspect the more creative architecture and design tasks simply do not lend themselves to this sort of analysis. So teams that create new platforms and build new applications will rely more on expert opinion for the estimates and not on predictions based solely on historical data. Not everyone would agree with this. For example, there are some estimation tools provided by various vendors that in fact do try to estimate the effort and duration of design and architecture tasks by using parametric models or classifications. However, there is so much variation in the novelty of the efforts and in team skill and experience that the uncertainties in the estimates are large. They should be applied to projects with great care and to individuals not at all.
On the other hand, most of what development organizations do is more routine, and for those tasks something along the lines of what the customer asked for might be possible. One would need a way of characterizing the different task classes, tracking the times-to-complete, and computing the statistical measures. With this in place, one could explore not only automated task estimates, but also process optimization by what I believe is a novel application of statistical process control.
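As a rough sketch of what tracking times-to-complete by task class might look like, the following groups historical work items by a class label and derives a best, likely, and worst case from each population. The class labels, the history, and the choice of minimum, median, and maximum as the three cases are all assumptions for illustration; how to define the classes is exactly the open question.

```python
from collections import defaultdict
from statistics import median

def class_statistics(work_items):
    """Build time-to-complete statistics per task class.

    work_items: iterable of (task_class, days_to_complete) pairs.
    Returns {task_class: {"count", "best", "likely", "worst"}} where
    best/likely/worst are the min/median/max of that class's history.
    """
    by_class = defaultdict(list)
    for task_class, days in work_items:
        by_class[task_class].append(days)
    return {
        task_class: {
            "count": len(days),
            "best": min(days),
            "likely": median(days),
            "worst": max(days),
        }
        for task_class, days in by_class.items()
    }

# Hypothetical history of closed work items.
history = [
    ("ui-defect", 1.0), ("ui-defect", 2.0), ("ui-defect", 1.5),
    ("build-script", 0.5), ("build-script", 4.0), ("ui-defect", 6.0),
]
for task_class, stats in class_statistics(history).items():
    print(task_class, stats)
```

With only a handful of items per class the ranges are wide, which is a reminder that such expectations should be set with care.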
In summary, I believe we need to pursue task analytics and estimation, but I have serious misgivings. Automated analytics-based business processes can go seriously wrong. We need to ensure that some judgment and subjectivity is part of the process. The misuse of analytics in the subprime mortgage business is a case in point.
I realize something along the lines I am describing may already be available. Has anyone heard of a tool that supports this method?
Today, April 1, seems like a good day to bring forward an important new idea. In fact, I think this may be the next big thing.
One of the well-understood problems with software development project management is that it is often impossible to specify the complete work breakdown with certainty. The longer the project and the more innovative the project, the more uncertain the work breakdown items. This is addressed in iterative, agile planning by identifying the summary work items and then adding detail as the project evolves. Another source of uncertainty is the dependency between the summary items. This uncertainty in turn makes critical path analysis for such programs problematic. In fact there is a whole ensemble of project critical paths, each with some likelihood. For the physics literate, this ensemble of paths is much like Feynman Path Integrals in quantum theory. The math is pretty hairy (see this elementary description). Fortunately, as Feynman also pointed out, one can simulate quantum mechanics with quantum computers. I am no expert in quantum computing, but even so I have a proposal: Quantum Informed Projects (QuIPs). The idea is to represent work items as QItems using QBits from quantum computing. Then we can represent the project as a set of entangled QItems and use a suitably large quantum computer to calculate the wave function for the critical path.
My understanding is that we do not yet have large enough quantum computers to make this practical. However, the same is true for implementing other useful quantum algorithms (see this example). So we can start by building algorithms. There is no time like the present (not accounting for the quantum uncertainty of measuring time). So on this special day, let's turn our attention to QuIPs.
First, I am pleased that many saw the humor in the April Fools posting. That said, I wonder if there will ever be quantum project management. Also, I fear this blog lacks humor. I will do what I can, but there is only so much that can be done to spice up the topic of analytics for software and system organizations. So, back to the serious stuff.
But first a joke that I believe dates back to vaudeville: Onstage, there is a streetlight. Under the streetlight, there is a man crawling around on hands and knees. A policeman walks up and asks what he is doing. The man says he is looking for his keys. The policeman asks if he is sure he lost them here. The man answers, "No, in fact I lost them down the street." The policeman asks why he is looking under the light. The man answers, "The light is brighter here."
OK, not so funny. So what's the point? A while back, I was discussing a client's management program with a colleague (who will remain nameless and I hope is reading the blog). I pointed out that it would not serve any purpose. My colleague answered, "Well, at least they are measuring something." I retorted, "First, you need to figure out what you need to measure, then figure out how to do the analysis and get the data." We left it at that. More generally, software and system organizations often measure what is easy, not what they need. They look where the light is brightest. We still have the question of how to specify the needed measures, analytics, and data collection program.
In an earlier entry, I proposed some measurement principles. While these principles are sound for assessing a measurement and analytics program, they do not provide operational guidance for defining the set of measures, associated analytics, and data. What is also needed is the analytics version of a requirements analysis. Last Friday two colleagues (named Clay Williams and Peri Tarr, who I believe do read the blog) introduced me to the Goal Question Metric (GQM) method. This method has been extended in various ways such as GQM+Strategies.
I have seen the method applied. It looks much like functional decomposition and so it is a requirements analysis technique for analytics solutions. I think it should be extended to include identification of the data sources. So we would have GQMAD (not kidding), my spin on the main idea:
For my waterfallphobic friends, I share the concern. Building an analytics solution this way should be more iterative than is described above. Probably something like the Unified Process can be applied using GQMAD as a good requirements practice.
Anyone out there with GQM experience they would like to share?
This blog experiment seems to be working. The entries are getting around 100 visits and growing, which is good enough to keep at it. I have found that writing the entries has given me the opportunity to clarify and express my thoughts. This entry is a case in point.
We are deploying a BAO solution for the level 3 support organizations in our IBM India Software Labs. That deployment provides a case study in how to integrate two concepts I introduced in earlier blogs. This entry is longer than the others. I hope you find it worth the wait and effort to read.
In those previous entries, I discussed two frameworks for reasoning
These frameworks address different aspects of the problem of using measures to achieve business goals by measuring the right things and taking actions to respond to the measurements. In fact, these frameworks fit together hand in glove.
Recall that level 3 support teams provide fixes to defects found in delivered code. Each of the teams deals with an ongoing series of change requests (aka APARs, PMRs). An organization goal is to reduce the time to and cost of completion of these requests. To achieve the goal, they are adopting some Rational-supported practices and supporting tools. So the questions that need to be answered are:
1. What is the time trend of the time to complete of the change requests?
2. What is the time trend of the cost to complete of the change requests?
3. In each case, how would I know that some improvement action resulted in significant improvement in the trends?
Now comes the hard part: determining the measures that answer the questions. The change requests arrive somewhat unpredictably. Each goes through the fix and release process and presumably gets released in a patch or point release. So at any given time there is a population of currently open and recently closed change requests. The measures that answer the questions are time trends of some statistic on some population of change requests.
Each of the change requests requires a different amount of time and effort to complete. So to measure if the outcome is being achieved, one must reason statistically: defining populations of requests, building the statistical distribution of, say, time to complete for that population, and defining the outcome statistic for the distribution. So we need to do two things to define the measure:
1. Specify the population of requests for each point on the trend line
2. Specify the statistics on that population
To keep it simple (at least as simple as possible), let's form the population by choosing the set of change requests closed in some previous period, say the previous month or quarter. To choose a statistic, one needs to look at the data and pick the statistic that best answers the question. Most people assume the mean of the time (or cost) to complete is the best choice. However, that choice is appropriate only when the histogram of the time to complete is centered on a mean, as is common in normally distributed data.
One of the advantages of working in IBM is that we have lots of useful data. Inspection of some APAR time-to-complete data from one of our teams in the IBM Software Lab in India shows the distribution is not centered on a mean, and so reduction of the mean time to complete is not the best measure of improvement.
We have looked at literally tens of thousands of data points for time to complete of change requests across all of IBM and have found the same distribution. For the statistics savvy, it appears to be a Pareto distribution, but statistical analysis carried out by Sergey Zeltyn of IBM Research's Haifa lab shows that this distribution does not fit any standard distribution well. A possible explanation is that the time required to fix the defects is Pareto distributed, but since the resources available to fix them are limited, the actual time to complete is not pure Pareto. In any case, a practical way to proceed is to choose a simple (non-parametric) measure: the width of the head, i.e. the time it takes to complete 80% of the requests.
So with this analysis in place, the organization decides to precisely specify the goal such as a 15% reduction in time and cost to complete 80% of the requests closed each month. So the outcome measures are the time it took to close and costs of 80% of the requests closed each month.
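The width-of-the-head measure and the 15% goal can be sketched as below. This is a minimal illustration using a nearest-rank percentile; the monthly data and the function names `head_width` and `meets_goal` are hypothetical, and the actual analytics used in the deployment may differ.

```python
def head_width(times_to_complete, percent=80):
    """Time within which `percent`% of the requests were completed.

    Uses the nearest-rank percentile, which makes no assumption about
    the shape of the distribution.
    """
    ordered = sorted(times_to_complete)
    rank = (percent * len(ordered) + 99) // 100  # ceiling of percent% of n
    return ordered[rank - 1]

def meets_goal(baseline, current, reduction=0.15):
    """True if `current` is at least `reduction` below `baseline`."""
    return current <= baseline * (1 - reduction)

# Hypothetical days-to-complete for requests closed in one month:
# most close quickly, a few take very long (a heavy tail).
closed_this_month = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]

mean = sum(closed_this_month) / len(closed_this_month)
print(f"mean: {mean:.1f} days")                          # dominated by the tail
print(f"80% head: {head_width(closed_this_month)} days")
```

In this made-up sample the mean is pulled well above the time within which 80% of requests close, which is why the head width is the more useful outcome measure for data of this shape.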
Having chosen these measures, we are ready to identify the data sources and instrument the measures. So far so good. But wait, we still need to answer question 3.
As I mentioned, in order to improve the outcome measure and achieve the goals, the lab teams have agreed to adopt appropriate Rational practices and tools to automate certain processes. The practices were selected using the Rational MCIF Value Traceability Trees (a development causal analysis method). Adopting and maturing the practices and their automations are the controls. Some control examples are automating the regression test and build process, and the adoption of a stricter unit test discipline to reduce time lost in broken builds. These are control mechanisms with associated control measures such as time-to-build, regression test time-to-complete, percent of code unit-tested, and a self-assessment by the team of their adoption of testing and build practices.
To answer question 3, we need statistical analytics to determine if the changes in the control measures have had a significant impact on the outcome measures. Our Research staff has settled on those analytics, but I will discuss that in a later entry. This entry is already too long.
This case study is both reasonably straightforward and far from trivial. It does show as promised that GQM(AD) and Outcome and Controls work together. I leave you all with a thought problem. How would you apply the pattern to teams developing new features to existing applications?