According to the 2017 “State of DevOps Report,” the key metrics used to indicate high performance in IT organizations are deployment frequency, lead time and mean time to recover. As a continuous delivery guy, I’ve spent the last decade helping clients in the first two areas: rapidly determining if a change is good or bad, and if it’s good, getting it shipped.
However, simply testing and releasing software isn’t enough. Every change carries some risk. Rapidly identifying and resolving issues is critical to gaining the permission to change, and, quite frankly, critical to everyday operations, whether or not something is changing.
The report states that high-performing teams deploy 46 times as often and are one-fifth as likely to have any given change cause a failure. That sounds great, but simple arithmetic (46 × 1/5 ≈ 9) shows that high performers still end up with about nine times as many failures. This is why better response to failure is critical: high performers resolve issues 90 times faster than low performers. So even with more failures, total outage time is about one-tenth as much.
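For readers who want to check that arithmetic, here is a minimal sketch in Python. It uses only the three ratios quoted above. The function and constant names are my own, and it rests on simplifying assumptions: every failed change causes an outage, and outage length scales directly with time to resolve.

```python
# Ratios quoted from the report, high performers relative to low performers.
DEPLOY_FREQUENCY_RATIO = 46    # deploy 46 times as often
CHANGE_FAILURE_RATIO = 1 / 5   # one-fifth as likely to fail per change
RECOVERY_SPEED_RATIO = 90      # resolve issues 90 times faster


def relative_failure_count(deploy_ratio: float, failure_ratio: float) -> float:
    """Failures experienced, relative to a low performer (hypothetical helper)."""
    return deploy_ratio * failure_ratio


def relative_outage_time(failure_count_ratio: float, recovery_ratio: float) -> float:
    """Total outage time relative to a low performer.

    Assumes each failure causes one outage whose length is proportional
    to the time taken to resolve it.
    """
    return failure_count_ratio / recovery_ratio


failures = relative_failure_count(DEPLOY_FREQUENCY_RATIO, CHANGE_FAILURE_RATIO)
outage = relative_outage_time(failures, RECOVERY_SPEED_RATIO)

print(f"Failures vs. low performers: {failures:.1f}x")     # 9.2x
print(f"Outage time vs. low performers: {outage:.2f}x")    # 0.10x
```

The point of the sketch is that the recovery ratio dominates: shipping more often raises the raw failure count, but fast resolution more than pays for it.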
Software developers have instrumented more and more systems to rapidly identify when things start going wrong. Unfortunately, that can produce a flood of alerts that aren’t serious, which distracts delivery and site reliability engineering (SRE) teams from the real, incident-causing problems.
This is an area my friend James Moore has spent a great deal of time working on as a product manager for IBM Cloud Event Manager. He works to answer questions such as the following:
- How do you manage the noise?
- How do teams work in a more “closed-loop” way?
- How do we make the key operational trends observable and understandable?
These challenges are growing as developers deliver change into environments more rapidly and as the environments themselves become increasingly dynamic with the advent of cloud infrastructure and containers.
James will dive into these topics in an upcoming webinar, “Building Ops Automation in DevOps,” which I am quite excited about. Register here.
I hope you join me at this event. Too many teams are leaving the “Ops” out of DevOps and their transformation is hurting as a result. As we speed up the injection of change into our environments, we need to make sure we have strong immune systems keeping the environments healthy and strong.