A couple of years ago a well-known franchise experienced a significant computer outage that affected hundreds of their stores throughout the U.S. The impact of the outage lasted for nearly a day, and the problem made the news headlines all over the place. Obviously these are the kinds of problems that businesses don’t want to have happen to them. Loss of sales, unhappy customers, and bad press don’t make for a good day…
With that in mind, let’s assume that it had been years since this franchise had experienced any kind of outage. Do you think customers (as well as shareholders) would have been happy to hear the company’s executives say something like, “But it’s been years since we’ve had any kind of outage!”? Probably not… Was anyone focused on how long it had been since the last outage, or were folks more interested in the “here and now”? Given that the outage made the headlines, it’s obvious that the lengthy delay in getting back to normal was the primary concern.
Now, let’s look at a hypothetical situation where a company experiences several computer outages every day, but the length of each outage is only one second (or less). Would anyone even notice? Probably not… Because the recovery time was so fast, the only folks who would likely even be aware of any of the outages would be the operations folks, and only then because they were analyzing the logs from the monitoring tools in place.
Hopefully you can see where I’m going with this – a company that experiences a failure only once every couple of years (i.e., having a very good Mean Time to Failure, or MTTF), but that takes a day or more to recover from an outage (i.e., having a very poor Mean Time to Recovery, or MTTR), is likely to have more negative results than a company that has a very poor MTTF but an extremely good MTTR.
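To make the comparison concrete, here’s a quick back-of-the-envelope calculation. The specific numbers are just illustrative assumptions drawn from the two scenarios above – one day-long outage every two years versus, say, five one-second outages every day:

```python
# Illustrative comparison of the two hypothetical companies above.
# The figures are assumptions for the sake of the example, not measurements.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Company 1: one outage every two years, roughly a day to recover
rare_but_long = (24 * 60 * 60) / 2        # ~43,200 seconds of downtime per year

# Company 2: assume five outages per day, each lasting about one second
frequent_but_tiny = 5 * 1 * 365           # 1,825 seconds of downtime per year

for label, downtime in [("Great MTTF, poor MTTR", rare_but_long),
                        ("Poor MTTF, great MTTR", frequent_but_tiny)]:
    availability = 1 - downtime / SECONDS_PER_YEAR
    print(f"{label}: {downtime:,.0f} s down/year, availability ≈ {availability:.4%}")
```

Even with several outages every day, the second company ends up with less than a twentieth of the first company’s annual downtime – which is exactly the point.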
So, just to be clear, I’m not suggesting that MTTF should be ignored – referring to my second example, I’d really like to know why outages are occurring so frequently, and then work to reduce the number of outages (even if they do only last a second or so). What I am suggesting is that MTTR should be one of your “front and center” metrics. The faster you can recover from an outage, the less noticeable the outage will be and, therefore, the more negligible the impact.
If you haven’t started measuring MTTR for your offering, please allow me to suggest that you need to begin doing so ASAP. And, once you have MTTR measurements in place, then begin working to improve them (no matter how good they may be right now).
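If it helps to get started, here’s a minimal sketch of how MTTR could be computed from whatever incident records you already keep – the record format and the sample incidents below are made up purely for illustration:

```python
from datetime import datetime

# Hypothetical incident records -- substitute whatever your monitoring or
# ticketing system actually captures for outage start and restore times.
incidents = [
    {"start": "2024-03-01T09:15:00", "restored": "2024-03-01T09:16:30"},
    {"start": "2024-03-07T22:02:00", "restored": "2024-03-07T22:41:00"},
    {"start": "2024-03-19T04:30:00", "restored": "2024-03-19T04:33:00"},
]

def recovery_seconds(incident):
    """Time from outage start to service restored, in seconds."""
    start = datetime.fromisoformat(incident["start"])
    restored = datetime.fromisoformat(incident["restored"])
    return (restored - start).total_seconds()

durations = [recovery_seconds(i) for i in incidents]
mttr_minutes = sum(durations) / len(durations) / 60
print(f"MTTR over {len(incidents)} incidents: {mttr_minutes:.1f} minutes")
```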
I have been a strong advocate over the years of using Story Points to track team velocity. In spite of many of the self-inflicted problems teams face when using Story Points (e.g., trying to initially assign some measure of time to a Story Point, or trying to normalize Story Points across teams, or confusing Story Points and task-hours, etc.), when used correctly, Story Points (and the velocity calculations they enable) make project tracking and projection incredibly easy and relatively accurate as well. (As a side note, check out the following article by Jeff Sutherland: link.)
What I’d like to cover in this posting is that using Story Points is not the only way to track a team’s velocity. For teams that have been writing User Stories for a while, and that have gotten good at breaking their work down into small enough chunks that one or more stories can be completed (“Done!”) in an iteration, I recommend forgoing the effort of sizing User Stories with Story Points. They’ve already shown that they know how to start with an Epic and break it down into User Stories that can be completed in an iteration. In other words, they’ve demonstrated a reasonable amount of uniformity in the way they build their backlog – their stories all fall in the same range in terms of the time and effort required to complete each one.
With that in mind, such teams could simply measure velocity based on completed User Stories – the principle is the same as with Story Points, but mature teams can forgo the effort required to size each User Story (e.g., through Planning Poker).
Let me give a couple of examples – the first one using Story Points. Team A has 50 Story Points’ worth of User Stories on its backlog, and they’ve demonstrated over time that they can generally complete about 10 Story Points’ worth of stories in an iteration (velocity = 10). To determine how long it will take to complete the remaining stories, divide 50 by 10 to get an estimate of around 5 iterations. If a customer asks how long before the team can get to a feature the customer is interested in, and the User Story associated with that feature sits about 30 points down the rank-ordered backlog, then the team can tell the customer that it will be about 3 iterations (30 divided by 10).
The second example is Team B, which has 10 stories on its backlog and has demonstrated that it typically completes about 2 User Stories every iteration (velocity = 2). To determine how long it will take to complete the remaining stories, divide 10 by 2 to get an estimate of 5 iterations. If a customer asks how long before the team can get to a feature the customer is interested in, and the User Story associated with that feature sits about 6 stories down the rank-ordered backlog, then the team can tell the customer that it will be about 3 iterations (6 divided by 2).
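Either way, the projection arithmetic is identical. Here’s a tiny sketch using the numbers from the two examples above (Team A measuring velocity in Story Points, Team B in completed stories):

```python
def iterations_needed(remaining_work, velocity):
    """Project the number of iterations left, rounding up to whole iterations."""
    return -(-remaining_work // velocity)  # ceiling division

# Team A: velocity measured in Story Points per iteration
print(iterations_needed(remaining_work=50, velocity=10))  # whole backlog -> 5
print(iterations_needed(remaining_work=30, velocity=10))  # customer's feature -> 3

# Team B: velocity measured in completed User Stories per iteration
print(iterations_needed(remaining_work=10, velocity=2))   # whole backlog -> 5
print(iterations_needed(remaining_work=6, velocity=2))    # customer's feature -> 3
```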
If you’re part of a fairly experienced team, and your team already has an established velocity, consider trying this approach in lieu of assigning Story Points – it can save some time by skipping the sizing exercises. However, if your team is fairly new to writing and sizing User Stories, or doesn’t yet have an established velocity, I recommend sticking with Story Points for now: the process of assigning them (during a Planning Poker exercise, for example) is *very* helpful in aligning a team’s thinking and building team synergy, thanks to the discussions that naturally arise when sizing stories.
As always, please feel free to comment, provide suggestions and recommendations, or even tell us about your experiences using Story Points and/or other ways of tracking velocity. Thanks!
Here is an update to my Part 1 blog posting on whether or not teams are really DONE when they release software. Just to recap, I worry that agile teams do not have a reliable mechanism to ensure that new features or capabilities released into production succeed as planned. Without a verification tool in place, teams may not be able to confirm success. Worse, once teams get focused on new work, they may not track what was released – “out of sight, out of mind.” Of course, most teams respond to issues that get reported, but if nothing is reported, does that mean success, or not?
A team experimenting with this decided to add a verification safeguard to their DONE criteria: each story has to have a “plan to monitor feature success” in production. For this particular team, monitoring typically requires that the software log success and failure data and that the team have a way to track what is logged over time. Their DONE criterion ensures that they add logging capability when they design the feature and that they test that the logged data delivers the information required. When the new feature is in production, the team uses a monitoring tool to report on usage and failures. Once an average logging pattern is established, they add alerts to notify them if there are problems.
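As a rough illustration (and not that particular team’s actual tooling), here’s what the bones of such a logging-plus-alerting approach might look like – the feature name, log format, and baseline failure rate are all hypothetical:

```python
import logging

logger = logging.getLogger("features.one_click_checkout")  # hypothetical feature

def record_outcome(succeeded: bool) -> None:
    # Log a structured success/failure event that a monitoring tool can count.
    # The exact format will depend on the logging/monitoring stack you use.
    logger.info("feature=one_click_checkout outcome=%s",
                "success" if succeeded else "failure")

def should_alert(successes: int, failures: int,
                 baseline_failure_rate: float = 0.02) -> bool:
    """Alert when the observed failure rate drifts well above the usual pattern."""
    total = successes + failures
    if total == 0:
        return False
    return failures / total > 2 * baseline_failure_rate

# Example monitoring window: 940 successes, 60 failures -> 6% failure rate, alert
print(should_alert(successes=940, failures=60))  # True
```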
Not every team will have this type of logging and monitoring capability in place, but adding this DONE criterion can serve as a driver to make that happen. It took the team mentioned above multiple attempts to get the right tools in place, but the effort was very beneficial. They have already used the new methodology to find an issue (and fix it) before their users did. And by looking at success trends, they have been able to verify that uptake of their features was as planned.
If your team does not have a strategy to validate the success of new features or capabilities, I recommend adding this initiative to your backlog. You may have to start from the beginning – get the right design strategy into your code and then add the tools to monitor it. Use this new DONE criterion as a mechanism to get started. A story is DONE when there is a defined plan for monitoring the solution in production. Check!
One of the reasons I found agile compelling when I first heard about it was the notion that you are “DONE” with a feature only when your customers are successfully using it in production and are pleased with the capability. I have coached teams to establish success criteria for user stories but have not required that those criteria include validation of success in production. That feels like an oversight now. I am discovering scenarios where the desired feature does not deliver the success promised, and there is no automatic mechanism to address the shortfall.
Now that I work in eCommerce, the business pays keen attention to data that validates that the software is successful. Even with this attention, our technology teams often lose the connection between getting capabilities into production and verifying the value they do or do not produce.
I now recommend that teams take a new look at success criteria and include a measure of post-release success. Teams might add a “post-release validation” criterion to their DONE list. The first challenge is for teams to identify useful data that enables them to track and validate success. The second challenge is to figure out when to assess the data and how to respond to it.
Solving the first problem requires that the software or a customer feedback mechanism provide a way to measure success. Data collection is usually hardwired into eCommerce platforms, so measurement should not be hard. The software could track how often the capability is used, or how well the capability works by tracking errors as a percentage of uses.
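For instance, here’s a trivial sketch of that second idea – counting uses of a capability and computing its error rate from a stream of logged outcomes (the event data is made up for illustration):

```python
from collections import Counter

# Hypothetical outcome events for a new capability; in practice these would
# come from your platform's analytics or application logs.
events = ["success", "success", "error", "success", "success", "error", "success"]

counts = Counter(events)
uses = sum(counts.values())
error_rate = counts["error"] / uses if uses else 0.0

print(f"capability used {uses} times, error rate {error_rate:.1%}")
```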
Success may be harder to track when there are no metrics accessible to the team. In these scenarios, teams can measure the speed of uptake of a new release, leverage customer surveys, or hold post-release customer feedback sessions. None of these options provides the validation that data offers, but they would give the team a tool to ensure post-release success.
It is important to re-establish our focus on post-release success validation. I recommend that each story include success criteria not only for release but also for post-release.
Leslie and I will be hosting a webcast entitled “How Do Agile Teams Keep From ‘Waterfalling Backward’?” as part of the Global Rational User Community (GRUC) webcast series.
"Waterfalling backward" describes the situation where a team that has started down the Agile path reverts to waterfall ways of thinking when difficult situations arise. We'll cover some examples of what "waterfalling backward" looks like as well as several easy-to-adopt techniques that help teams re-align with Agile if they've started "waterfalling backward."
Note that three attendees will win a copy of our book, Being Agile!
Register here: https://attendee.gotowebinar.com/register/113940558500710914
We are looking forward to the session and hope you can join us!
One of the things that has been extremely popular in software organizations for a long time is counting the number of defects found via testing. I propose that teams stop doing this, for two reasons.
First, one line of thinking behind counting defects is that it is somehow a measure of quality. If we were running an assembly line in a manufacturing plant, and the assembly line didn’t change from one week to the next, but the number of defects increased, we would know there was a quality problem somewhere – something on the assembly line was likely broken. Despite some similarities, software development is not an assembly line – unlike an assembly line, every line of code that is written and tested has never been written or tested before, so defect counts simply don’t correlate with quality the way they do in manufacturing. If a team found 25 defects last week and 50 this week, what does that tell us? Is the code quality really getting worse, or did the team just do a lot more testing? If so, did the tests executed reflect real-world usage, or were they all edge cases? Did the team deploy the code into a new environment that had never been used before? I’m sure you can come up with many more scenarios… The possible reasons for the higher number of defects this week are almost boundless, and the time spent trying to determine why the variation occurred would be better used working more with customers to understand their usage patterns and particular needs, creating more needed functionality, improving processes further, cross-training team members, adopting a new tool, and so on. All of which leads to my second point…
Agile software development has its foundations in Lean Thinking, and one of the Lean principles is to eliminate waste. Does counting defects, and spending time trying to figure out what causes variations in the counts from one time period to the next, contribute directly to the success of the project? If not, then the time spent doing so should be viewed as waste – time that could be better spent doing more important work.
In conclusion, I’d like to leave you with two thoughts regarding defects. First, when defects are found, they should be fixed immediately – period. Don’t allow a backlog of defects to accrue. If it does make sense to do some root-cause analysis to determine why a particular defect occurred, then that’s fine – but it likely does NOT make sense for the vast majority of the defects found. Second, a mature Agile team should be able to have “difficult” conversations when needed: “Hey Scott, I’ve noticed that I’m finding a lot of defects in your code this week – is there anything that’s distracting you, or anything I can help you with?” Instead of taking umbrage at such comments, I should be thankful that my team is willing to raise the issue and have the discussion.
So, the next time you’re asked to track the number of defects found, ask “Why…?” The answer will likely shed light on ways to eliminate waste and help overcome the “That’s the way we’ve always done it!” mentality, as well as foster a relentless, continuous improvement mindset.
As always, thoughts, comments, and questions are most welcome! Thank you!
I recently read a fascinating article in a farm journal that made me think about software engineering. You are likely thinking that there’s not much that farming and software engineering have in common and, for the most part, you’re right. However, in this particular instance, the story relates directly to typical practices in the software industry today.
The article told the story of a group of research scientists who had spent years working with bell peppers. The scientists embarked on a program to cultivate peppers that were more disease resistant, could flourish under sub-optimal conditions such as poor soil, limited water, and lack of full sun, and would also produce greater numbers of peppers than the original variety.
After quite a bit of time, effort, and funding they managed to achieve some moderate success. Then someone suggested that they take some of their latest crop of peppers to a local restaurant and get some input from one of the chefs. While the article didn’t go into specifics about the chef’s reaction, it was clear that the reaction was akin to throwing the pepper in the garbage and suggesting that the scientists could have spent their time better elsewhere. The one thing that mattered most to the chef – the taste of the pepper itself – had not even been considered as one of the criteria to be taken into account by the research scientists. The scientists got caught up in their own little world of science and totally forgot that peppers were food and not specimens.
This immediately reminded me of the software industry and how so many teams have forgotten why they’re creating software. It’s not to simply write code, or execute test cases, or try out some new tool, or deploy a capability to a website – the software should ultimately meet customers’ needs. And the best way to do that is to work directly with customers as the capabilities are being created to ensure what’s being developed meets their needs.
Fortunately, the move to Agile and DevOps is making it much easier to work with customers due to the practices of continuous development and continuous deployment, as well as an intense focus on customer engagement and the monitoring of customer usage of the offerings.
The moral of the story…? Don’t forget the customer!
Just like the drip, drip, drip of a leaky faucet, we lose productive time by being regularly distracted. Some distractions can't be eliminated, but many can... I recently wrote an article for InformIT that was adapted from one of the chapters in the book that Leslie and I co-authored, Being Agile. The article discusses a pervasive form of regular distraction that many people have come to take for granted nowadays. I thought I'd pass along a link here:
I hope you enjoy the article and that it provides some food for thought... As always, comments and questions are welcome. Thanks!
Short feedback loops are one of the keys to the success of agile because they enable teams to learn fast and adjust. This technique is leveraged in numerous practices used by agile teams, including the daily standup, sprint demos, code check-in, build and validation cycles, pair programming, and continuous delivery. One of the key benefits that convinced me to try agile was its claim to deliver greater value with higher quality. After I had experimented with agile for a while, I started to understand just how critical short feedback loops are to achieving these goals.
Short feedback loops enable you to learn fast so that you can adjust while the costs are low. It has long been understood in software development that the cost to fix a defect increases – in many cases dramatically – the longer you wait to fix it. Some of the reasons: more code may have been added, making the defect more complex to fix; the new code requires additional testing and fixing; and by the time you get to the fix, you may have lost familiarity with the code in question. To combat this, it is critical to find defects soon after the code is checked in. One short feedback loop that helps here is automatically testing the code through a series of progressive test gates. Additionally, if the code is checked in frequently (another short feedback loop), developers get quick feedback on quality.
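One simple way to realize “progressive test gates” is a small script that runs the fastest checks first and stops at the first failing gate, so a bad check-in is flagged within minutes rather than days. The specific gates and commands below are placeholders – substitute whatever your project’s build actually runs:

```python
import subprocess
import sys

# Placeholder gates, ordered from fastest to slowest feedback.
GATES = [
    ("lint",              ["ruff", "check", "."]),
    ("unit tests",        ["pytest", "tests/unit", "-q"]),
    ("integration tests", ["pytest", "tests/integration", "-q"]),
]

def run_gates() -> int:
    for name, command in GATES:
        print(f"--- gate: {name} ---")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"Gate '{name}' failed; stopping here for fast feedback.")
            return result.returncode
    print("All gates passed.")
    return 0

if __name__ == "__main__":
    sys.exit(run_gates())
```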
Another manifestation of short feedback loops is the daily standup, which enables teams to identify critical blockers and get help fixing them before the next meeting. Someone has a problem; someone jumps in to help fix it. Shorten the time to fix a problem; reduce the cost.
To ensure that agile teams are meeting the needs of their customers, they demo their new work at the end of each sprint to get timely feedback. This version of a short feedback loop is fairly well known to agile teams, but it may be surprising how few teams actually do these demos with their end users. These demos give the product’s users ongoing updates on how the new code is progressing and the opportunity to provide feedback. This kind of feedback is like bug fixing: the sooner it is applied, the more cost-effective it is. Going a step further, releasing functionality frequently enables users to get traction and provide feedback before the functionality is complete. Customers have to be willing to experiment and respond, but the win for them is that they get an active voice in getting the value they require.
As teams continue their agile adoption, it is useful to remember why we adopt various techniques to make sure that we get the value promised. To deliver higher value with better quality we use short feedback loops so that we learn fast and adjust.
Teams sometimes struggle to write down the first story in a user story writing session, even if it is not the first time the team has written stories together. Brainstorming often works to get the team headed in the right direction. Some of the following questions might help:
What problem are you trying to solve?
What benefit will solving the problem provide the business?
What are the consequences if you do not solve the problem?
What are the critical edge cases to consider? (establish limits)
What capabilities of the solution need to be considered (e.g., security, performance, error recovery)?
What is the minimal problem set to solve? What is the best way to quickly validate that a solution will solve the problem? Think minimum viable product.
I have run many brainstorming sessions before starting to write stories. In the process I have come up with a few rules to help make these sessions successful.
Rule 1: Brainstorm. What I mean by this is to allow the session to be a “storm”: just write down everything you think of without filtering for relevance. By working this way you may discover what does and what does NOT matter. It will help you discover the limits of the stories that you need to write, and it will give you the opportunity to think “out of the box”.
Rule 2: Brainstorm with your whole team and your stakeholders. Writing stories without your stakeholders is doable, but I have seen far better results when stakeholders help with the process. In fact, I was once in a brainstorming/story-writing session where the team started by writing the stories assuming they knew what was required. When I asked the stakeholder what problem he was trying to solve, we discovered, through the conversation that followed, that the software already provided a solution. Frankly, it was the best user story session I have ever attended!
Rule 3: Take notes. I know we said this in our book, and we may have said it in our blog, but it is worth repeating: take notes while you write stories and take notes during the brainstorming session. In fact, it does not hurt to have someone take notes verbatim. How many times does someone come up with the perfect phrase but then cannot repeat what they just said!?
Rule 4: Avoid getting into the “how” you are going to do it. Avoiding the “how” in user stories may sound like a broken record, so I probably do not need to elaborate further. Sometimes however it may be necessary – but proceed with caution.
Rule 5: Limit your time. Brainstorming can be very useful, but don't overdo it. I suggest 30–90 minutes, 2 hours tops if the scope is large. Once you have an idea of what your minimum viable product – or, in this case, minimum viable solution – is, start by writing a story that takes the first step toward achieving it.
Try brainstorming if you are off to a slow start writing stories or are just plain stuck. If it works, use it. If it does not work, skip it.
Extra Credit: Make your first few stories vertical stories. Vertical stories are harder to write because they force teams to think across the architecture and to let go of their concern that the story is not shippable. But vertical stories provide so much value that it is worth the effort to think vertically.