This article describes the challenges inherent in transforming from a traditional deployment approach to one using the emerging "DevOps" concepts and techniques and the changes in approach required to address them. The scope of this article includes the changes required in the definition of the deployed artifact, as well as the organizational and cultural transformations required to take advantage of the DevOps approach.
The general tenet of agile development is that an organization should be continuously delivering working software at a sustainable pace. At the same time, a number of new operations tools and infrastructure components have become available through the related movements of virtualization and cloud computing. Although agile methods have been readily adopted by many development teams, the software that they produce is often kept from users by deployment processes that are fraught with gaps, mistake-prone manual tasks, or other delays related to operational work.
Meanwhile, the pressures of an ever more competitive marketplace are causing business leaders to require faster cycle time for getting new software and features into the hands of users and customers. These three factors -- agile development, new operations technologies, and a market imperative for business responsiveness -- are causing a fundamental rethinking of how organizations deliver software.
The common denominator among the three trends is that the speed of change has increased. It is no longer acceptable for new software capabilities that are needed by business users to sit on a shelf for months waiting for a large deployment event. Software has value only when it is in use, and getting to that value quickly is more critical than ever in today's marketplace. This shift is causing people to challenge assumptions about how organizations deliver software, and the answers to those challenges have taken on the moniker of the "DevOps" movement.
In order to understand software delivery in general, and deployment in particular, under a DevOps approach, it is important to understand two other items:
- First, it is essential to have a clear and consistent understanding of DevOps and why it is interesting to various stakeholders.
- Second, it is important to understand how assumptions about deployment change when attempting to apply DevOps approaches.
When those items are understood, it is possible to look at a framework for executing deployments in support of a DevOps approach.
The term DevOps, derived by combining development and IT operations, generally refers to an approach of unifying work on a software system across all disciplines in order to deliver changes to that system at the pace that the business needs them. It embraces the agile approach of making rapid, small changes in order to keep the focus on top-value work, to minimize the risk of defects associated with large changes, and to minimize the value drift often associated with long gaps between a software business case and project completion. The operations worldview, on the other hand, is to treat change as a cause of instability. Instability can, of course, lead to downtime, which is the worst outcome in the operations disciplines. Therefore, operations organizations generally resist change.
DevOps is usually viewed as a change movement, however, because in most organizations, development, operations, and business have a series of conflicting behaviors and rewards that lead to chronic counterproductive behavior. Business rewards development for new features and changes but punishes operations for downtime. So development pushes operations for more deployments, and operations pushes development for more structure and rigor in the delivered artifacts. These conflicts have become entrenched in many organizations and are made worse by intrinsic differences in the maturity of the respective organizations. The DevOps approach seeks to eliminate these barriers, level the playing field among the competing groups, and focus them all on a common goal. To do this, it is important to understand a bit about the mindset of each group.
Development is generally perceived to be more advanced in its efforts to become more responsive to business needs. The Agile Manifesto is more than 10 years old, and the manifesto itself is, in part, the result of earlier extreme programming and pair programming approaches and experiments. To be fair, the software piece of the puzzle was viewed as the low-hanging fruit -- easily changed and theoretically isolated "above" the infrastructure and platform concerns. And infrastructure has traditionally been treated as a very expensive capital expenditure, with long amortization cycles, that is much harder to change. Unfortunately, sophisticated software is very infrastructure intensive and requires infrastructure to evolve at the same pace as the software itself evolves. This connection is why DevOps was sometimes called "Agile Operations" in the early days.
Whatever it is called, the notion of keeping software and infrastructure separate is not sustainable if technology is to keep pace with the business needs. This is pushing development to become more engaged with the realities of sustaining complex software infrastructures in production.
Fortunately, a number of new technologies and techniques have come to the forefront in operations to help it become more responsive. The primary disruptive technology in the operations realm is the widespread availability of inexpensive virtualization on commodity hardware. This has given rise to new systems management approaches and, of course, cloud computing. The technology gained popularity by delivering immediate value: it allowed organizations to quickly and easily consolidate their underutilized computing resources. Gartner analysts estimate that 50% of all applications are now running in virtual environments.
Simple consolidation, however, is just a cost-cutting measure with a finite rate of value capture. Virtualization technology also enables a level of changeability to infrastructure that allows operations to evolve the infrastructure without affecting stability at rates that were previously unattainable. The techniques used to exploit these capabilities of virtualization often appear within the cloud computing-related disciplines. These emerging techniques have given operations a way to be as agile as development and answer the responsiveness requirements of the business.
Businesses, for their part, have figured out that understanding and exploiting technology is more critical than ever to achieving results. According to IBM's 2012 CEO Survey, which was based on interviews with more than 1,700 chief executives in 64 countries, "technology factors" are the number one factor impacting organizations. That ranking has risen steadily since 2004, when technology factors placed only sixth of the nine factors listed.
So business leaders have noticed that the ability to respond to their customers' needs is a competitive advantage. The corollary, of course, is that poor technological execution represents an intrinsic threat to the business. This is not an overnight realization, but it has reached a tipping point. The same survey goes on to discuss how the respondents gave an equal weight (seven out of ten) to understanding what individual customers wanted and to being able to respond to them with shorter time-to-market cycles. Ultimately, this is the business pressure that is colliding with the procedural realities of how the technical disciplines of development and operations have been operating in the past and stimulating the discussion of how to do it better.
Achieving the ideals of DevOps represents several fundamental changes of approach and behavior for all three constituencies.
One of the most obvious aspects of achieving these ideals is the assumption that getting new software capabilities into the hands of users is what makes software changes valuable. This assumption implies that delivering the new capabilities sooner is therefore better, because it means that the value from the new capabilities is achieved sooner. Most organizations, however, have historically been oriented toward delivering very large deployments at very infrequent intervals.
It is the assumption of a short interval, or high frequency of delivery, that is at the root of the transformation in the definition of deployment. DevOps does away with the notion that deployment processes are things that are pulled out a couple of times a year to move very large releases and replaces it with the notion of a delivery system that is always on and is very good at deploying a large number of small things very consistently. The key word is "replaces." Traditional systems oriented to large deployments are usually far too heavy to move fast enough to support a DevOps environment. Attempts to "do it the old way, just faster" will usually fail, because the assumptions that were used to create the old way were never intended to support a high frequency of activity. This is neither "good" nor "bad"; it is simply an engineering reality that must be addressed.
Given that this new delivery process becomes an intrinsic part of deriving value from the software change investment, it becomes an extended part of the overall system. It becomes the primary route to value for development investment. This adds the assumption, then, that managing the efficiency and effectiveness of the delivery process has direct and quantifiable business value associated with the system as a whole. Implicit in this assumption is that it is not acceptable to ignore the cost of sustaining the delivery capability itself.
Another hallmark of the DevOps deployment is the focus on the complete system rather than simply the code changes that deliver the functionality. DevOps deliberately acknowledges that the application code depends on an infrastructure of servers, networks, and databases to deliver value. Therefore, a DevOps deployment treats all changes to all components of the system as equal and tracks them as such. Some infrastructure changes, such as a deliberate upgrade of a switch or the addition of storage, can be considered enhancements -- new functionality for the system -- even though they might be less visible. Similarly, patches to a web server or SAN firmware might be regarded as fixing bugs or defects. However an individual organization classifies things, the crucial point is that the other pieces are treated with equal thoroughness to ensure the consistent stability of the system.
Applying the whole system perspective to a top-level model yields four core categories of changes to be delivered in a DevOps environment:
- Application
- The application is the actual feature-producing code in the system. This is the high-visibility functionality that drives the core business processes that provide the reason for the system. Its visibility has garnered it a lot of focus over time, yet that focus has rarely led to proactive consideration of the value of the environment that supports it.
- OS services
- The OS services category is a catch-all for the machines (virtual or otherwise), operating system services and libraries, and middleware that allow applications to run. The consistency of configuration across all environments, from test through production, represents a key driver of application stability and quality.
- Network services
- Network services embody all of the network devices and configurations around the application. These configurations obviously affect performance and availability, but they can also affect application behavior and architecture as the application scales. For example, the application and the load balancer have to agree on session handling, among other things.
- Database
- The database underlying the application contains the critical data that the application uses and produces. The database and the application must remain perfectly synchronized with regard to the structure of the data schema at all times throughout the lifecycle.
Figure 1. Visualizing the whole system
DevOps-style deployments require a highly disciplined approach to delivering changes to the software systems that they support. The more exceptions, variances, or special cases that the system has to tolerate, the more expensive the system will become to create and, more importantly, to maintain. Controlling complexity of the delivery process means understanding and controlling what is being delivered as much as how it is being delivered. That means that the organization must agree on a set of defined packages that make up each delivery unit and a system for consistently getting them to the desired environment.
An effective delivery system requires the items being delivered to have a reasonable level of standardization. The physical world has a good example in the form of shipping containers. A standard container can be moved by standard equipment through an amazingly complex logistics network consisting of many different means of transport, including trains, ships, and trucks. With supporting cranes and tracking, it is possible to ship nearly anything anywhere as long as it can fit into a container. The act of standardizing those containers revolutionized how the entire world moved goods and conducted trade.
Figure 2. Shipping container
The good news is that there is no need to define an international standard to ship changes into a software system. This can often be accomplished through a pragmatic extension of activities that many teams already perform. Many organizations are already producing builds in a steady rhythm, either through continuous integration or through some other scheduled event. These builds have, or should have, some sort of unique identifier that enables those who use the build, such as testers, to understand what to expect from that build.
This approach can provide a baseline for organizations to begin understanding change groupings in the other three components. If each of the four components has a standard way of identifying its changes, it becomes relatively straightforward to track combinations of the four components through their unique change designators. The combination of build number for the application, configuration version for the servers, schema version for the database, and so forth can be tracked and deployed to any environment for any type of testing or production use.
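As an illustration, the composite change designator described above can be modeled as a small value object that names one version per delivery component. This is a minimal sketch in Python; the field names and version formats are invented for illustration and are not drawn from any particular tool:

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class SystemVersion:
    """Composite change designator across the four delivery components.
    All field names here are hypothetical examples."""
    application_build: str   # e.g., a CI build number
    os_services_config: str  # server/middleware configuration version
    network_config: str      # network device configuration version
    db_schema: str           # database schema migration version

    def label(self) -> str:
        # A single string that uniquely identifies the whole-system state
        return "app:{}+os:{}+net:{}+db:{}".format(
            self.application_build, self.os_services_config,
            self.network_config, self.db_schema)

v = SystemVersion("build-512", "cfg-34", "net-7", "schema-89")
print(v.label())
print(json.dumps(asdict(v)))  # serializable record for deployment tracking
```

Any environment can then be described, compared, and redeployed by a single label rather than four loosely correlated version numbers.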
Unlike shipping containers, however, the ecosystem of deployable packages does not necessarily involve hydraulics and diesel fuel. Instead, it involves a group of tools and processes that allow the organization to manipulate their dev, test, and production environments in substantively the same way and put the package into any of those environments at any time. These tools and processes deal with a variety of functions within an environment. These functions cross domains of expertise and are applied differently, depending on which of the four delivery components is in question. Given the complexity, it is helpful to apply a framework to visualize and categorize the functions into a set of capabilities that must exist at one level or another to be successful.
For DevOps, one taxonomy of capabilities involves six categories, each with a set of subcategories. This framework does not prescribe tools for the categories, nor does it mandate that all components must have all capabilities. It does, however, provide a useful tool for understanding and prioritizing gaps in achieving a functioning DevOps delivery system.
Figure 3. Delivery system capabilities map
These definitions summarize the six categories:
- Change management
- The set of activities for ensuring that improvements to the system are properly tracked
- Orchestration
- Deals with the need to coordinate activity across a distributed system in a synchronized manner
- Deployment
- Covers the activities related to managing the lifecycle of artifacts of all kinds that are running on the infrastructure
- Monitoring
- Provides the instrumentation for keeping the environment healthy and providing system behavior feedback to all stakeholders
- System registry
- Provides a centralized archive of the shared infrastructure information that the entire system requires to operate
- Provisioning
- Makes sure that the infrastructure environment provides sufficient numbers of the correct components to run the system
The value of a taxonomy is that it is a tool for understanding an environment, identifying priority needs within it, and evaluating solutions for fit. It enables an organization to quickly create a structured understanding of both its own needs and an offering's strengths. Alternatively, it can be used to assess how well an organization or solution performs in a particular area over time. This kind of structured taxonomy also allows an evaluation to be more or less detailed, depending on the situation.
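As a sketch of how such an evaluation might work, suppose an organization rates a candidate solution in each of the six capability categories and then sorts by weakest coverage to surface priority gaps. The 0-to-3 scoring scale and the scores themselves are hypothetical:

```python
# The six capability categories from the taxonomy.
CATEGORIES = ["change management", "orchestration", "deployment",
              "monitoring", "system registry", "provisioning"]

def gap_report(scores: dict) -> list:
    """Return categories ordered weakest-first; scores are assumed to
    run from 0 (no coverage) to 3 (deep coverage)."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"unscored categories: {missing}")
    return sorted(CATEGORIES, key=lambda c: scores[c])

# Hypothetical assessment of a candidate offering.
scores = {"change management": 2, "orchestration": 2, "deployment": 3,
          "monitoring": 1, "system registry": 2, "provisioning": 2}
print(gap_report(scores)[0])  # the weakest area to prioritize
```

The same structure works for tracking an organization's own maturity over time: re-score periodically and watch the ordering change.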
For example, in August 2012 IBM released a beta version of a continuous deployment framework called SmartCloud™ Continuous Delivery. The purpose of this framework is to deliver tools and integrations that help organizations adopt DevOps delivery methods.
By applying only the top levels of the taxonomy to that offering, it is possible to see how it is laying a very broad framework for covering all six major capability areas:
- First, the SmartCloud Continuous Delivery beta leverages IBM® Rational Team Concert™ and Rational® Asset Manager to provide some change management, orchestration, and deployment capabilities.
- Second, there are also integrations that leverage IBM's line of cloud-style system infrastructure tools, such as the IBM Workload Deployer, IBM® SmartCloud™ Provisioning, and IBM® PureSystems™, to provide system registry and provisioning capabilities.
- Third, there is some monitoring capability in the form of instrumentation of the process through reporting.
- Finally, there are integrations with tools such as the Rational Automation Framework and Rational Build Forge® that fill in the rest of the Deployment and Orchestration capabilities.
Even a shallow look using a structured approach shows that SmartCloud Continuous Delivery has some coverage in each of the six capability areas, but there are different levels of depth in each. Varying levels of depth among the capability areas are to be expected as tool providers emphasize different approaches, integrate with different products or product portfolios, and are otherwise optimized for the needs of their users. Having a consistent way to assess such solutions is the key for customer organizations to ensure that they are getting what they really need.
Whatever structure is used to understand it, a DevOps delivery system becomes the central broker of all changes to an application system. This centrality applies to all environments in which the application runs, including both production and preproduction environments. This ensures that the application is always running in a known state and running in an environment with a known configuration. Getting to this state requires more than a clear understanding of the principles and a structure within which to build capabilities. It requires actual application of the principles, and organizations must respect a number of factors to do so effectively.
There are almost always multiple environments in which a given software system runs. There is production, of course, but there are also any number of quality assurance, or test, environments. Those quality assurance environments are where the organization verifies that a given change has the intended impact. These environments are often treated as second-class to production, but the truth is that false information from a test environment could lead to a production outage. This dependency means that the organization is well-served to take them seriously. Successfully executing DevOps-style delivery depends on it.
The first step in taking the environments seriously is ensuring that all environments are truly representative. This is an engineering exercise to validate that an assumed configuration in a qualification environment is a good proxy for the production environment. After this baseline is established, it becomes a more straightforward exercise to maintain that state.
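One way to make that validation concrete is a simple configuration diff between the production baseline and a candidate environment. This sketch assumes each environment's relevant settings can be flattened into a key-value map; the keys and values are purely illustrative:

```python
def parity_gaps(reference: dict, candidate: dict) -> dict:
    """Report every configuration key where a candidate environment
    diverges from the production baseline."""
    gaps = {}
    for key in set(reference) | set(candidate):
        ref, cand = reference.get(key), candidate.get(key)
        if ref != cand:
            gaps[key] = {"production": ref, "candidate": cand}
    return gaps

# Hypothetical flattened configurations for two environments.
prod = {"jvm_heap": "4g", "db_pool": 50, "tls": "1.2"}
qa   = {"jvm_heap": "2g", "db_pool": 50, "tls": "1.2"}
print(parity_gaps(prod, qa))
```

An empty report is the working definition of "representative"; any entry is an engineering decision to document or eliminate.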
The second step is to ensure that all changes are promoted through the environments to production. Because the application system is treated as a whole, there should never be any sort of configuration change to any aspect of the production environment that is not run through the quality assurance process. This requires abandoning the perspective that some changes are easy or somehow different from others. Beyond the obvious benefits of reducing risk and improving reliability, this also reduces the cost of environment maintenance and synchronization. This approach means that the environments are always in a known state and, because all changes are applied using the same infrastructure, it reduces duplication of effort to manage the configuration of multiple environments.
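The promotion rule can be expressed as a simple gate: a change may enter an environment only after it has passed through every earlier stage. A minimal sketch, assuming a fixed pipeline of environment names (the names themselves are hypothetical):

```python
# Hypothetical promotion order; real pipelines vary by organization.
PIPELINE = ["dev", "qa", "staging", "production"]

def can_promote(history: list, target: str) -> bool:
    """A change may enter `target` only if it has already been deployed
    to every earlier environment in the pipeline."""
    required = PIPELINE[:PIPELINE.index(target)]
    return all(env in history for env in required)

print(can_promote(["dev", "qa", "staging"], "production"))  # allowed
print(can_promote(["dev"], "production"))                   # blocked
```

Enforcing this check in the delivery system, rather than by policy document, is what makes "no untested change reaches production" an engineering guarantee rather than an aspiration.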
Even with a unified environment management approach, emergencies happen. An exception, however, should be a very rare event, usually prompted by something critical and externally driven, such as a vendor bug in a platform or an emergency security patch. Exceptions should not be internally driven feature changes.
The process for handling an exception should treat the exception as a truly important event and should include corrective and post-mortem activities. The corrective activity must focus on the resynchronization of the environments. For example, if the standard change delivery system was not used to apply the change to the affected environment, the change must be added to the standard change delivery system. Regardless of how the change was applied, there must also be a post-mortem process to answer the question of why the emergency happened and, more importantly, how to reduce the likelihood of that emergency ever happening again.
Using DevOps methods to deliver software changes provides ample opportunity to measure and improve the process. In addition to the consistency of a standard system, the high frequency of a DevOps approach provides more data points. Various metrics for things such as cycle time, packaging gaps, or process failures are obvious things to measure. These metrics are not typical operations metrics, such as availability or uptime. Instead, they are geared toward the process that ensures things like uptime and availability as an outcome. Other good things to track are metrics about the effectiveness and efficiency of the process. Examples here are the number of person hours required to sustain the application system, the number of person hours required to maintain the DevOps infrastructure, or the wait time for delivery to a given environment.
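For instance, cycle time per change can be computed directly from deployment records. This sketch assumes each record carries a commit timestamp and a production-deployment timestamp; the record field names are invented for illustration:

```python
from datetime import datetime, timedelta
from statistics import mean

def cycle_times(deployments):
    """Hours from change commit to production deployment, per change."""
    return [(d["deployed_at"] - d["committed_at"]) / timedelta(hours=1)
            for d in deployments]

# Hypothetical deployment log entries.
log = [
    {"committed_at": datetime(2012, 9, 3, 9, 0),
     "deployed_at":  datetime(2012, 9, 3, 15, 0)},
    {"committed_at": datetime(2012, 9, 4, 10, 0),
     "deployed_at":  datetime(2012, 9, 4, 14, 0)},
]
print(mean(cycle_times(log)))  # average hours from commit to production
```

Because a DevOps pipeline produces many small deployments, even a simple average like this becomes a statistically meaningful trend line rather than an anecdote.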
A DevOps delivery method also enables great instrumentation for the application system. These metrics are very application-specific, but they reveal performance, user-experienced responsiveness, and so on. Thus, these metrics inform decisions when comparing operating environments, identifying bottlenecks in an application system, or managing capacity requirements proactively.
Tools in DevOps delivery must be "loosely coupled" to the environment in which they are deployed. This means that teams must be able to replace individual tools without a major disruption to the overall system.
Architecturally speaking, there are several models for achieving this. For example, focusing on tools that use web services APIs as their prime integration mechanism is a popular doctrine for organizing DevOps tools. Regardless of the technical approach, the teams must realize that changing one of the tools has some level of impact on the organization's overall ability to deploy changes. When a team wants to replace one of their tools for a particular capability category, they must deliberately analyze that impact and weigh it against the timing and benefit that replacing the tool provides.
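One common way to achieve that loose coupling is to code the delivery pipeline against a narrow interface and wrap each concrete tool in an adapter. The sketch below uses an entirely hypothetical stand-in for a tool driven through a web services API; swapping tools then means writing a new adapter, not rewriting the pipeline:

```python
from abc import ABC, abstractmethod

class DeploymentTool(ABC):
    """Narrow interface that the delivery system codes against, so the
    tool behind it can be replaced without disrupting the pipeline."""
    @abstractmethod
    def deploy(self, package_id: str, environment: str) -> bool: ...

class FakeHttpDeployer(DeploymentTool):
    # Illustrative stand-in for a tool integrated via a web services API.
    def deploy(self, package_id: str, environment: str) -> bool:
        print(f"POST /deployments pkg={package_id} env={environment}")
        return True

def release(tool: DeploymentTool, package_id: str, environment: str) -> bool:
    # The pipeline depends only on the interface, not on any concrete tool.
    return tool.deploy(package_id, environment)

print(release(FakeHttpDeployer(), "app:build-512", "qa"))
```

The cost of a tool swap is then bounded by the size of the adapter, which makes the impact analysis described above far easier to scope.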
DevOps requires a fundamental change in how organizations approach delivering software to operating environments. The payoff for doing so, however, is better-quality software delivered more frequently. A structured approach to the problem that treats an application system as a unified whole can help organizations collaborate on the effort. That collaboration can be supported by a systematic approach to delivering and improving capabilities that deliver the whole application system progressively more efficiently. The DevOps approach is one that favors the agile approach of steady, incremental improvement over many cycles and provides organizations the structure and insight to consistently deliver valuable software to users as it is needed. That is really the point of deploying software in the first place.