12 Steps to Better DevOps
mdelder
If you follow Joel on Software, you may be familiar with the blog entry "The Joel Test: 12 Steps to Better Code". Although it was written back in 2000, its content is still relevant today. As people tried to understand Agile, the 12-step test provided a simple litmus test of concrete actions to help improve overall software delivery.
Roll forward nearly a decade, and DevOps is seeking to drive similar improvements in how the operational aspects of software are delivered, managed, and maintained over time. Many people have explained their view of DevOps and what it takes to "live DevOps". I'd like to propose a similar set of concrete actions, as Joel did for software, to help explain my view of this topic.
Ultimately, DevOps is about trust. Trust between the Business & Development. Trust between Development and Operations. Trust among these groups enables the organization as a whole to be more agile and to better manage risk.
So how do we build trust? Consider the following 12 steps:
These steps are inspired by Joel's approach to explaining a new discipline, but follow the mantras that we've already recommended. Let's talk a bit more about these steps and I'll flag the association with the mantras in each case. You can read more about the mantras in our first blog entry on the Invisible Thread.
1. Do your developers and operators communicate the production realities and the application's requirements?
[Track and Plan everything]
Almost every customer we've ever talked to cites the lack of effective communication as a barrier to accelerated delivery. Often development and operations teams work in silos because that's how the executive hierarchy is configured. Each group is measured against a different set of metrics: change vs. stability. However, to ensure smooth delivery, it is absolutely imperative to establish early in the process how the needs of the application will be supported by the final production environment. The communication might be as lightweight as a face-to-face meeting where all concerns are discussed, a template to understand what capabilities are needed and provided, or something more detailed like a topology diagram. Regardless of the format, the information must be communicated early.
2. Do you version deployment configuration and scripts along with your source code?
Often, specialists are already creating deployment scripts today; bash, perl, python, wsadmin, ruby, chef, and puppet are all examples of assets which help stand up environments. But for whatever reason, these assets generally aren't handled the same way as the .java, .jsp, .html or .ddl assets which make up the business logic of the application. Step 2 says that deployment configuration and script assets should be treated as valuable content, just like the application source code.
The idea is that the automation scripts and configuration files which have been left by the wayside as middleware evolved are brought back into the fold and considered part of the overall solution. Incorporating these artifacts into the same version control systems that have proven valuable for application source code means that you can more easily view and compare changes from prior deployments. If you're deploying the system completely through automation, it may also mean that rolling back a new deployment means simply re-applying the last approved version of the automation.
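As a minimal sketch of what versioning buys you, the Python below compares two revisions of a deployment configuration using the standard library's difflib and picks a revision to re-apply. The file name, the property names, and the in-memory history list are hypothetical stand-ins for whatever your version control system stores, and the sketch assumes the revision before the latest one was the last approved version.

```python
import difflib

# Hypothetical revisions of one deployment config, oldest first.
# In practice these would come from your version control system.
history = [
    ("r1", "server.port=9080\npool.size=10\n"),
    ("r2", "server.port=9080\npool.size=25\n"),
]

def compare(old, new, name="deploy.properties"):
    """Return a unified diff between two (revision, text) pairs."""
    (old_rev, old_text), (new_rev, new_text) = old, new
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=f"{name}@{old_rev}", tofile=f"{name}@{new_rev}",
        lineterm="",
    ))

def rollback_target(history):
    """Pick the revision to re-apply: here, the one before the latest."""
    if len(history) < 2:
        raise ValueError("no earlier revision to roll back to")
    return history[-2]

for line in compare(history[0], history[1]):
    print(line)
```

Because the deployment is driven entirely by these files, rolling back amounts to feeding the text returned by rollback_target back through the same automation.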
3. Do you have patterns for platforms and applications, designed jointly by development and operations?
4. Can your developers launch and destroy production-like environments from those patterns?
[Track and Plan everything]
Steps 3 and 4 are focused on enabling production-like systems to be used earlier in the development process. Standard application and platform patterns are decided in collaboration between development and operations. Developers then use those patterns on demand when they need to iterate quickly on changes to the application in their own isolated environment. When changes are ready for delivery, the build and deploy process should use the same patterns to deploy and verify each new build. The net result is that both development and the continuous delivery process are using production-like patterns to verify their approach far earlier in the delivery process than you may be used to today.
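To make "launch and destroy" concrete, here is a small Python sketch in which an environment is created from a shared pattern and always torn down afterwards. The pattern contents, the running inventory, and the environment helper are all hypothetical; a real implementation would call your provisioning tooling instead of updating a dictionary.

```python
import copy
from contextlib import contextmanager

# Hypothetical pattern agreed on by development and operations.
WEB_PATTERN = {"os": "linux", "middleware": "appserver-8.5", "nodes": 2}

running = {}  # name -> environment; stands in for a real inventory

@contextmanager
def environment(name, pattern):
    """Launch an isolated environment from a pattern, destroy it on exit."""
    env = copy.deepcopy(pattern)   # each environment gets its own copy
    running[name] = env
    try:
        yield env
    finally:
        del running[name]          # torn down even if the work inside fails

with environment("dev-alice", WEB_PATTERN) as env:
    env["nodes"] = 1               # local tweak; the shared pattern is untouched
    print(sorted(running))         # the environment exists inside the block
print(running)                     # and is gone afterwards
```

The point of the design is that developers and the delivery pipeline both start from the same pattern, so a local experiment never changes what everyone else gets.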
5. Are your patterns based on reusable deployment configuration scripts?
6. Can you deploy an environment (platform and application) in one step?
7. Do you deploy your applications daily into production-like environments and verify them?
Steps 5, 6, and 7 each deal with automation -- but unlike traditional automation approaches, DevOps generally uses automation technologies which have file-based persistence, which are declarative rather than imperative, and which are applied, applied, and applied over and over again. Many organizations have already bought into the value of automation, but often focus on automation above the platform layer -- to deploy the application into an existing platform. DevOps takes the end goal a step further to stress automation to create the entire solution -- from the platforms all the way to the end user.
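The difference between declarative and imperative automation is easier to see in code. The following Python sketch describes a desired state and converges an environment toward it, so applying it a second time changes nothing. The Environment class and package names are hypothetical, and real tools such as chef or puppet do far more than this.

```python
from dataclasses import dataclass, field

@dataclass
class Environment:
    """In-memory stand-in for a real platform (hypothetical)."""
    packages: dict = field(default_factory=dict)  # name -> version

def apply(desired, env):
    """Converge env toward the desired state; return the changes made."""
    changes = []
    # Install or upgrade anything that differs from the desired state.
    for name, version in sorted(desired.items()):
        if env.packages.get(name) != version:
            env.packages[name] = version
            changes.append(f"install {name}={version}")
    # Remove anything the desired state does not mention.
    for name in sorted(set(env.packages) - set(desired)):
        del env.packages[name]
        changes.append(f"remove {name}")
    return changes

desired = {"appserver": "8.5", "jdk": "7"}
env = Environment({"oldlib": "1.0"})
print(apply(desired, env))  # first run reports the changes it made
print(apply(desired, env))  # second run finds nothing left to do
```

Because the desired state lives in a file-like description rather than in a sequence of commands, it can be versioned, compared, and re-applied as often as you like.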
DevOps asks the same kind of questions about deployments that agile software asked about builds. The ability to do a build in one step enabled us to perform builds more frequently ensuring that everything compiles. Similarly, being able to do a deployment in one step enables more frequent deployments. If a problem is introduced, you want to find it early and fail fast.
There's a subtlety here that may be overlooked and I want to call your attention to it. If you're providing production-like environments to validate the application, you're also able to validate the platforms that are supporting the applications. Hence any new platform updates that you plan to roll out feed into the pipeline just like application builds. If your middleware platform needs to be updated, you would roll that change out first in an early development stage of your pipeline. Just as application artifacts are graduated through the pipeline as they pass their verifications, so are changes to infrastructure components like your middleware. Treating application and infrastructure components as equal citizens means that there isn't a special exception process for one kind of change or another -- all changes go through the same journey together in harmony.
Ultimately, steps 5-7 hint at a role transition that occurs in DevOps where operations engineers no longer act as administrators but more as content creators -- as infrastructure developers. Like software developers, infrastructure developers create automation to provision and maintain the infrastructure, platforms, and applications in each stage of your pipeline.
8. Do you link bugs and work items to changes in the application and configuration?
9. Do you associate tickets for production issues with relevant bugs opened for development to fix?
[Track and Plan everything]
Steps 8 and 9 talk about how changes are tracked. Joel asked you if you had a bug database. We're asking if you link bugs or work items directly to the affected changes in your version control. It's important to have this level of traceability so you can identify why a particular change was introduced and ideally what related files were touched. If you want to see what this looks like, take a look at jazz.net. And since you're tracking changes to your deployment configuration and scripts (as in Step 2), that means that changes to those artifacts also go through the bug database and are then deployed and tested just like application code.
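One lightweight way to get this traceability is a naming convention in commit messages. The Python sketch below assumes a hypothetical convention of "BUG-<n>" or "WI-<n>" identifiers and made-up commit records; it builds an index from each work item to the files changed on its behalf and flags commits that reference nothing.

```python
import re
from collections import defaultdict

# Hypothetical convention: commit messages mention BUG-<n> or WI-<n>.
WORK_ITEM = re.compile(r"\b((?:BUG|WI)-\d+)\b")

def index_changes(commits):
    """Map each work item to the files touched by commits that mention it."""
    index = defaultdict(set)
    unlinked = []
    for commit in commits:
        items = WORK_ITEM.findall(commit["message"])
        if not items:
            unlinked.append(commit["id"])   # no traceability for this change
        for item in items:
            index[item].update(commit["files"])
    return dict(index), unlinked

commits = [  # made-up data for illustration
    {"id": "a1", "message": "BUG-42: fix datasource timeout",
     "files": ["deploy/datasource.py", "src/Dao.java"]},
    {"id": "b2", "message": "tidy whitespace", "files": ["README"]},
    {"id": "c3", "message": "WI-7 add pool settings, see BUG-42",
     "files": ["deploy/pool.py"]},
]

index, unlinked = index_changes(commits)
print(sorted(index["BUG-42"]))
print(unlinked)
```

Note that the index covers deployment scripts and application source alike, which is exactly the equal treatment Step 2 asks for.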
No matter how much some of us developers would like to think the real work is done once an application reaches production, it's not! When problems happen in production, there's generally an existing process for tracking incidents. We're suggesting that whenever incidents are encountered, an incident ticket should be opened and linked directly to defects in either the software or configuration layer, ensuring traceability from end to end. The bug or work item enables you to track when a fix is released to the application or configuration and know when the ticket can be closed. Knowing what the incident looked like in production means you can also beef up test cases so that a regression isn't introduced in the future. Whether the root cause is in the business logic or in an invalid datasource configuration, the changes should be treated and managed in the same way. Since bugs are linked to changes (Step 8), you'll know what had to change to fix the incident.
10. Do you have automated tests to validate your application and platform function and characteristics?
The goal here is to ensure that your provisioning process is meeting the expectations of stakeholders including development, quality assurance, and operations. Other stakeholders like performance and security testing may also play a role. Software development did this first with compilation. It ensured that all of your source code could be translated into a machine-readable format. Errors were quickly reported as part of the build process. Then agile promoted the notion of incorporating unit tests, static analysis, and other security compliance scans. DevOps expands that notion to also testing the deployed software and its supporting platforms. Steps 7 and 10 ensure that you actually use it regularly!
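A rough Python sketch of such post-deployment verification follows. Each stakeholder contributes a named check that inspects the deployed environment, and the runner reports every failure instead of stopping at the first. The environment dictionary, its keys, and the checks themselves are hypothetical; real checks would query the running platform.

```python
def verify(environment, checks):
    """Run every check against the environment; return (name, reason) failures."""
    failures = []
    for name, check in checks.items():
        try:
            ok = check(environment)
        except Exception as exc:      # a crashing check counts as a failure
            failures.append((name, f"error: {exc!r}"))
            continue
        if not ok:
            failures.append((name, "expectation not met"))
    return failures

# Hypothetical description of what was actually provisioned.
deployed = {"jdk": "7.0.4", "heap_mb": 512, "app_status": "started"}

checks = {
    "platform: JDK 7 installed": lambda env: env["jdk"].startswith("7."),
    "platform: heap at least 1 GB": lambda env: env["heap_mb"] >= 1024,
    "application: started": lambda env: env["app_status"] == "started",
    "platform: datasource defined": lambda env: env["datasource"] is not None,
}

for name, reason in verify(deployed, checks):
    print(f"FAILED {name} ({reason})")
```

Running the same checks after every deployment, for platform and application together, is what turns them into the fast feedback described above.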
11. Do you monitor software against expectations after deploying your application?
[Audit and Monitor everything]
Most production systems already have some level of monitoring in use today. Exposing production-like patterns and environments earlier should mean that monitoring capabilities are available to developers. Hence, developers and operators have the opportunity to verify that application code is being written to be easy to monitor. Just as knowing what an incident looks like can help you beef up test cases, knowing what kind of information is (or isn't) coming from an application's monitoring capabilities will help you understand how to provide better feedback in production. Just as Steps 7 and 10 are about providing continuous feedback about the function of the application, Step 11 is about validating expectations for performance and reliability through monitoring early in the process.
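As an illustration of monitoring against expectations, the Python sketch below compares observed figures with agreed limits and reports only the violations. The metric names, the limits, and the sample latencies are made up for the example, and the nearest-rank percentile is just one reasonable choice.

```python
def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, for p between 1 and 100."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100   # integer ceiling of p * n / 100
    return ordered[rank - 1]

def violations(latencies_ms, errors, requests, expectations):
    """Return {metric: (observed, limit)} for every expectation exceeded."""
    observed = {
        "p95_latency_ms": percentile(latencies_ms, 95),
        "error_rate": errors / requests,
    }
    return {
        metric: (observed[metric], limit)
        for metric, limit in expectations.items()
        if observed[metric] > limit
    }

# Made-up sample data and limits.
latencies = [120, 80, 95, 300, 110, 105, 90, 100, 115, 85]
expectations = {"p95_latency_ms": 250, "error_rate": 0.01}

print(violations(latencies, errors=1, requests=200, expectations=expectations))
```

If the same expectations are evaluated in production-like environments during development, a slow request path shows up long before an operator is paged about it.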
12. Do you have a delivery pipeline exposed through a summary dashboard to assess delivery velocity?
Almost all organizations have a loosely defined, loosely coupled notion of a delivery pipeline. Establishing a common understanding of what stages an application must pass through on the way to production ensures that handoffs are predictable and tracked for accountability. Beyond that, providing progress reports about applications and their location in the pipeline -- from inception to happy end user -- is critical to ensure that you're constantly improving the overall delivery response time.
A lot of this process also means establishing the right metrics for your organization at a higher level and ensuring that those metrics can be tracked. An example metric might include something like "how long does it take to release a no-change version of my application?" That is, if you don't change a single line of source code, how quickly could you get the current version rolled out into production (starting in development)? Often (surprisingly) this time is non-zero. A lot of it may include manual configuration and intervention which limits how quickly it can be done. Again, focusing on automation from barebones to happy user means that you can increase your delivery velocity with managed risk.
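The "no-change release" metric can be computed from pipeline timestamps. The Python sketch below uses made-up stage names and times to work out the total lead time and the slowest stage; in practice the events would come from your own pipeline tooling.

```python
from datetime import datetime

# Made-up timestamps for one no-change release: (stage, start, end).
events = [
    ("build",           "2024-01-08T09:00:00", "2024-01-08T09:20:00"),
    ("deploy to test",  "2024-01-08T09:20:00", "2024-01-08T09:50:00"),
    ("manual approval", "2024-01-08T09:50:00", "2024-01-09T14:50:00"),
    ("deploy to prod",  "2024-01-09T14:50:00", "2024-01-09T15:30:00"),
]

def stage_durations(events):
    """Return {stage: timedelta} for each stage of the pipeline."""
    return {
        stage: datetime.fromisoformat(end) - datetime.fromisoformat(start)
        for stage, start, end in events
    }

def lead_time(events):
    """Elapsed time from the first stage starting to the last stage ending."""
    starts = [datetime.fromisoformat(start) for _, start, _ in events]
    ends = [datetime.fromisoformat(end) for _, _, end in events]
    return max(ends) - min(starts)

durations = stage_durations(events)
slowest = max(durations, key=durations.get)
print("lead time:", lead_time(events))
print("slowest stage:", slowest, durations[slowest])
```

In this invented example the manual step dominates the total, which is the kind of finding a summary dashboard is meant to surface.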
In closing, while this article is by no means fully prescriptive of how to transform your organization, I hope it has at least inspired some thoughts about how you might begin incremental adoption of the techniques encouraged by DevOps.
I would like to thank Bill Mitlehner and Jeff Imholz for their contribution to the ideas expressed in this article.