DevOps is an application of lean
DevOps practices set out to better integrate IT operations and software development so that, together, they respond faster to change requests from the business. These changes might include reacting to a problem ticket or providing a new service that the business requires. DevOps has the following goals:
- Reduce the time required for a change request to become an accepted response.
- Reduce the wasted time and effort to service the change.
- Streamline the pipeline of the request from operations through development to deployment.
- Balance the capacities of the organizations to address the change.
These are also the goals of adopting lean computing principles, known simply as lean. Lean can be described as a mindset and a set of techniques to manage and improve end-to-end cycle time (or lead time) and to take full advantage of an organization's capacity to optimize the flow of material (the inventory) through the pipeline. The familiar phrase "to measure is to know," often attributed to Lord Kelvin, influences lean visualization techniques such as value stream mapping, which includes product flow measures. Flow measures include end-to-end (black box) and step-by-step (white box) measures of the inventory, the times of the processes, and the times work products spend in backlogs, waiting to be processed.
More recently, the IBM point of view of DevOps extends the original scope to the following principles:
- DevOps is an enterprise capability for continuous software delivery that enables organizations to seize market opportunities and reduce time to customer feedback.
- It extends lean and agile principles across the lifecycle and across the enterprise with richer feedback cycles everywhere.
- Lean transformation across the enterprise enables more efficient delivery and continuous feedback enables more effective steering.
- DevOps adoption helps organizations balance delivery speed with trusted outcomes.
From this perspective, the following principles emerge:
- DevOps is about streamlining the extended lifecycles that include software, such as:
  - IT: Problem identification to successful resolution.
  - Continuous engineering: From embedded software specification to successful integration.
  - Product development: From initial idea to success in the market.
  - Service delivery: From request from the business to customer enthusiasm.
- DevOps is the application of lean principles and practices to these disciplines.
Traditionally, lean has been applied to manufacturing, back-office processes, and logistics. These processes are more predictable, with less variation, than those entailed by DevOps. Applying lean to software (Poppendieck and Poppendieck, 2013) and to knowledge work in general (Odegard, 2013) is not new. This article extends the existing practices by treating DevOps as an artifact-centric business process. That perspective leads to the ready application of lean practices and measures to the DevOps processes.
When adopting lean practices and principles, it is essential to instrument the flow measures to detect bottlenecks and inefficiencies so that you can find opportunities for improvement and take action. One of the key advantages of the artifact-centric perspective is that specifying and instrumenting the flow measures is straightforward.
The following sections describe this artifact-centric perspective and why it applies to DevOps. You will learn how to apply it to specify the flow measures. The article finishes with an overview of the two key usage models: program steering and continuous improvement.
DevOps is an artifact-centric process
You can visualize a process in two ways:
- Activity-centric: The process is described by the set of operations and their order needed to carry out the process to create the work products. They are well described by flow charts or IDEF diagrams.
- Artifact-centric: The work products and their lifecycles describe the process. The work products are treated as state machines that undergo state transitions. Taking a work product through a state transition specifies each process step.
It can be argued that all processes move work products through their lifecycles. The artifact-centric view describes the work to be done, and the activity-centric view describes the steps to carry out the work. For example, when defining a software development process, the artifact-centric view describes what it means for a code module to be done: the code must be dropped and built, and it must successfully pass a system integration test. The programmer is told to take a code module from "designed" to "done." The programmer is not told what steps to take to accomplish the transition (open an IDE, create a file, write code, hold a code review, run a unit test, document bugs, and so on). For good reasons, programmers (in fact, all knowledge workers) resent being told how to do their work. The actual steps they go through and the level of effort for any step cannot be predicted in advance. Knowledge work involves too much variation to accurately predict the exact steps.
Typically, lean techniques are applied to processes such as manufacturing that deal with creating identical or very similar work products. In fact, the goal of these processes is to minimize the variation of the output. Activity-centric views of such processes work fine. If you are making a kind of light bulb, every light bulb is the same, and so each process step can be specified in advance. If you are addressing problem tickets or business requests, every one of those is different, and the actual steps can vary accordingly. Sometimes, the worker might scour logs, pore over kernel dumps, struggle to reproduce the problem, add a new trace and run a new test script, or none of the above. So to apply lean to DevOps, you must account for the variation of work products (artifacts) and processes. Taking an artifact-centric view is required.
Value stream mapping for artifact-centric processes
One of the central lean transformation techniques is value stream mapping, as shown in Figure 1. You can find the use of value stream maps (VSMs) and their metrics for carrying out the transformation in the book Learning to See: Value Stream Mapping to Add Value and Eliminate MUDA (Rother, Shook, Womack, & Jones, 1999). You can use VSMs in many ways:
- Use VSMs to help everyone involved to visualize the as-is flow of work products.
- Use the flow measures to identify where time and effort are wasted and to set baselines and set goals.
- Use the in-process measures to determine if the bottleneck can be broken by improving efficiency, for example with automation or elimination of non-value added activity, or with increased capacity.
- With this information, identify a potential VSM with a more streamlined flow.
- Plan and enact the changes required to move from the current VSM to the target VSM.
- Adopt continuous improvement. Use the measures to monitor progress and make adjustments in the plan.
Figure 1. A simple value stream map
Figure 1 is an example of a simple value stream map. It shows the process steps and the time in each step. Generally, each process step is associated with a detailed description of the activities needed to carry out the process. Such an activity description for manufacturing might be the following steps:
- Take part from inventory.
- Inspect part for rough edges.
- Position part in drill press jig.
- With 15 mm bit, drill holes 1 and 4.
- Change to 22 mm bit.
- Drill holes 2 and 3.
- Remove part.
- With steel wool, polish holes.
Figure 1 uses the following symbols:
- The shelf-like icon at the far left shows the inventory of the parts that flow through the process.
- The I icon is the count of items in inventory.
- The dotted arrows are pull arrows that represent the fact that Step 1 starts by the worker pulling the parts from inventory.
- The other arrows are push arrows signifying that the workers who complete one step are to push the work into the next step.
To apply value stream maps to an artifact-centric process, start by identifying the process artifacts. Each artifact has its own lifecycle, which is a set of states that it goes through from being initiated to being completed.
For example, a simple business request state model includes the following stages:
- Customer evaluated
The key idea is that for artifact-centric work, the process steps consist of the state transitions.
In particular, the process blocks in value stream maps represent the work to move the artifact from one state to the next. In this case, the process step is associated not with a description of the activities, but with a specification of the criteria for accomplishing the state transition.
Taking this approach has two advantages:
- It fulfills the intent of conveying the work to be done, not how to do the work.
- It truly captures value. Presumably, value accrues when artifacts progress through their lifecycles.
With this mapping, the inventory is the number of artifacts in a given state and the times are measured by the trends of the statistics of the durations of the population of artifacts in a given state. These measures are described in a following section.
To create value stream maps with more fidelity to how DevOps teams work, a more elaborate state model is required. For value stream mapping, there are two kinds of artifact states:
- In process: Undergoing a state transition with one or more assigned resources.
- Wait: In a backlog waiting to be picked up by a team and to be assigned a resource to effect the state transition.
Consider this DevOps example. Imagine that a business analyst decides there is a need for a new web service. The analyst submits that feature to the IT organization, which assigns it a priority and puts it in the backlog for development. At some point, the development manager picks up the work. Now the entire software process comes into play. In this example, this is treated as a fully encapsulated subprocess. Eventually, the updated application with the feature emerges from development and awaits deployment by IT. It is now in the IT backlog. After IT picks it up and deploys it, it is once again in a wait state, this time awaiting customer feedback from the analytics team. The end state for this example is a delivered, customer-evaluated feature.
This example can have the following flow state model for a new feature for a software application:
- Submitted for prioritization (wait)
- Under consideration (in process)
- Approved for development (wait)
- In work (in process)
- Completed (wait)
- In deployment (in process)
- Deployed (wait)
- Customer evaluation (in process)
- Customer evaluated (end state)
In this case, the value stream map might look like Figure 2.
Figure 2. A value stream map for DevOps example
Notice in this value stream map:
- The process blocks do correspond to the state transitions.
- The inventories or the backlog are the artifacts in wait states.
- The teams pull artifacts off their backlog and push them onto the next team's backlog.
- The push can, and often does, go upstream. For example, a new feature that fails testing during deployment gets pushed back into the development backlog.
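The flow state model and the push-back behavior noted above can be sketched as a small state machine. This is only an illustration under stated assumptions, not part of the original process definition: the names `Kind`, `FlowState`, `ALLOWED`, and `Feature` are hypothetical, and the only rework transition modeled is the one example the text gives (a feature that fails during deployment returns to the development backlog).

```python
from enum import Enum


class Kind(Enum):
    WAIT = "wait"
    IN_PROCESS = "in process"
    END = "end state"


class FlowState(Enum):
    # (label, kind) pairs copied from the feature flow state model in the text.
    SUBMITTED = ("Submitted for prioritization", Kind.WAIT)
    UNDER_CONSIDERATION = ("Under consideration", Kind.IN_PROCESS)
    APPROVED = ("Approved for development", Kind.WAIT)
    IN_WORK = ("In work", Kind.IN_PROCESS)
    COMPLETED = ("Completed", Kind.WAIT)
    IN_DEPLOYMENT = ("In deployment", Kind.IN_PROCESS)
    DEPLOYED = ("Deployed", Kind.WAIT)
    CUSTOMER_EVALUATION = ("Customer evaluation", Kind.IN_PROCESS)
    CUSTOMER_EVALUATED = ("Customer evaluated", Kind.END)

    @property
    def label(self):
        return self.value[0]

    @property
    def kind(self):
        return self.value[1]


# Forward transitions follow the order in which the states are listed.
_ORDER = list(FlowState)
ALLOWED = {current: {nxt} for current, nxt in zip(_ORDER, _ORDER[1:])}
ALLOWED[FlowState.CUSTOMER_EVALUATED] = set()  # end state: no way out

# Assumed rework path: a failed deployment goes back to the development backlog.
ALLOWED[FlowState.IN_DEPLOYMENT].add(FlowState.APPROVED)


class Feature:
    """A work product that is moved through its lifecycle one transition at a time."""

    def __init__(self, name):
        self.name = name
        self.state = FlowState.SUBMITTED
        self.history = [FlowState.SUBMITTED]

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from '{self.state.label}' to '{new_state.label}'"
            )
        self.state = new_state
        self.history.append(new_state)


if __name__ == "__main__":
    feature = Feature("new web service")
    for step in (FlowState.UNDER_CONSIDERATION, FlowState.APPROVED,
                 FlowState.IN_WORK, FlowState.COMPLETED, FlowState.IN_DEPLOYMENT):
        feature.move_to(step)
    feature.move_to(FlowState.APPROVED)  # failed deployment test: pushed upstream
    print([s.label for s in feature.history])
```

Note that the sketch records what the transition is, not how the team carries it out, which is the point of the artifact-centric view.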
Note that the use of value stream maps does not in any way imply a waterfall process. However, you can use value stream maps to show the inherent inefficiencies of various processes. For example, in a classical waterfall process, the artifacts are stacked up in a set of backlogs waiting for the next phases: all of the code is collected, waiting for system test. Value streams are used to understand the flow of work through whatever process the team is using.
As shown in Figure 3, two sorts of measures can be found in the value map:
- Stream measures: These address how the artifacts flow through the state transitions. They include the end-to-end (lead) times, the times for each of the states in the process (the lines at the bottom), and the counts of artifacts in each of the states (the symbols with the letter I).
- In-process measures: These address how efficient the team is at carrying out the transitions. These measures detect whether the bottlenecks are caused by lack of capacity or inefficient processes. Yellow boxes in the process blocks provide measures of process inefficiency.
Figure 3. Measures on the value map
The stream and process measures are used in tandem. The stream measures are used to identify what processes need attention. The process block measures identify how one might improve the processes. For example, if you discover a bottleneck at a process step, consider whether there is adequate team capacity. Explore the possibility of addressing the bottleneck by removing wasted effort either through process streamlining, automation, or both.
Lean organizations focus on minimizing inventory, explicitly managing backlogs, and matching batch sizes to team capacity. The benefits of these practices are described in the book The Principles of Product Development Flow: Second Generation Lean Product Development (Reinertsen, 2012). These practices are well established in manufacturing, back-office systems of record, or logistics. The stream measures provide the information needed to adopt the practices to achieve the lean goals.
There are two kinds of stream measures:
- Volume: The trend of the number of items in various states. These measures reveal bottlenecks in the flow.
- Time: The trend of the statistics of the duration of time the artifacts spend in the various flow states or combination of flow states.
The volume measures are relatively straightforward. They are the trends of the count of items in each of the wait and in-process states. Figure 4 is an example of such a graph.
Figure 4. Trend of number of work items in a given state
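As a rough sketch of how such a volume trend might be derived, the following Python counts, for each day, the artifacts that are in each state. The record layout (artifact ID, state, day entered, day left or `None` if still in the state) and the function name `volume_trend` are assumptions made for illustration; a real change-management tool would expose its own history format.

```python
from collections import Counter


def volume_trend(visits, days):
    """Return {day: Counter({state: number of artifacts in that state})}.

    visits: iterable of (artifact_id, state, entered_day, left_day_or_None).
    An artifact counts as being in a state on `day` if it entered on or
    before that day and has not yet left.
    """
    visits = list(visits)
    trend = {}
    for day in days:
        counts = Counter()
        for _artifact_id, state, entered, left in visits:
            if entered <= day and (left is None or day < left):
                counts[state] += 1
        trend[day] = counts
    return trend


if __name__ == "__main__":
    # Hypothetical history for three features, with days as plain integers.
    history = [
        ("F1", "Approved for development", 1, 3),
        ("F1", "In work", 3, None),
        ("F2", "Approved for development", 2, None),
        ("F3", "Approved for development", 2, 4),
        ("F3", "In work", 4, None),
    ]
    for day, counts in volume_trend(history, range(1, 5)).items():
        print(day, dict(counts))
```

A steadily growing count in a wait state is the signal that the downstream transition is a bottleneck.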
The time measures are more interesting. Each artifact is likely to spend a different duration in each state or combination of states. This variation is intrinsic to software; it's not the result of lack of control of the process. Unlike manufacturing, where it is expected that every work product is the same and variation is an indication of an area that needs to be controlled, work products for software are different, and variation in times and effort is to be expected.
For this reason, a key consideration in applying the flow principles to software development is the ongoing variation in the arrival of work products in the queues. There is also variation in the level of effort needed to enact the state transitions.
Because you are dealing with a measure of how long artifacts from a particular population have spent in a given state, the populations of times need to be measured statistically. In manufacturing, because work products are similar, the standard choice of statistic is the mean time. Teams might track the average age of items in the backlog or in the work-in-progress state.
In manufacturing, the artifacts are all the same; therefore, the distribution of times is often assumed to be close to a common value with narrow variation. More precisely, the distribution of times is modeled as a normal (also known as Gaussian) distribution with a narrow standard deviation. In this case, the mean of the distribution works well. When creating software, however, every application is different from the others. For software, the distribution of times is typically not normal or even narrow. Our research at IBM finds the distribution looks more like Figure 5.
Figure 5. Typical cycle time statistics. The arrow points to the T80 point.
Although this distribution might seem surprising, it's reasonable to expect a peak near the left axis. Most of the items get addressed in a short amount of time, but some stay in a state for a long time. A good choice of measure is the 80 percent point, T80. That is, 80% of the artifacts in the current state spent a time less than or equal to T80. The others spent longer and are in the tail of the distribution. By using the T80 point, you can measure the trend over time of the time artifacts spent in backlog or in transition.
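A minimal sketch of the T80 calculation follows. It assumes the nearest-rank definition of a percentile (the smallest observed duration such that at least 80% of the observations are less than or equal to it); the text does not say which percentile convention is used, and the function name `t_point` and the sample durations are made up for illustration.

```python
import math


def t_point(durations, fraction=0.8):
    """Nearest-rank percentile: the smallest observed duration d such that
    at least `fraction` of the observations are <= d."""
    if not durations:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical days spent in a backlog by ten work items: most are
    # handled quickly, and a few sit in the long tail.
    days_in_backlog = [1, 1, 2, 2, 3, 3, 5, 8, 21, 40]
    print("T80 =", t_point(days_in_backlog))  # prints: T80 = 8
```

In this sample, eight of the ten items spent 8 days or less in the backlog, so the two items in the tail (21 and 40 days) do not move the T80 point.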
Consider the more subtle issues at play. Because each work product is different, the time required to address each one varies. The work items get pushed to the next team's backlog sporadically, so the time between arrivals of work products varies. A good mathematical model for the arrivals is a Poisson distribution. This situation is different from a manufacturing assembly line that has a steady flow of products between stations. In the case of software, with a fixed capacity at a given station, you can expect the size of the backlog to show variation. The variation propagates through the lifecycle and results in the histogram in Figure 5.
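To see why a fixed-capacity station with sporadic arrivals produces a fluctuating backlog, consider the following toy simulation. It is a sketch under simplifying assumptions that go beyond the text: arrivals are generated with exponentially distributed gaps (which yields Poisson-distributed daily counts), the team completes at most a fixed number of items per day, and the function name and parameters are hypothetical.

```python
import random


def simulate_backlog(days, arrival_rate, capacity_per_day, seed=0):
    """Return the end-of-day backlog size for each simulated day.

    arrival_rate: average number of work items arriving per day.
    capacity_per_day: maximum number of items the team can complete per day.
    """
    rng = random.Random(seed)
    backlog = 0
    history = []
    # Exponential gaps between arrivals give a Poisson arrival process.
    next_arrival = rng.expovariate(arrival_rate)
    for day in range(days):
        arrivals = 0
        while next_arrival < day + 1:
            arrivals += 1
            next_arrival += rng.expovariate(arrival_rate)
        backlog = max(0, backlog + arrivals - capacity_per_day)
        history.append(backlog)
    return history


if __name__ == "__main__":
    sizes = simulate_backlog(days=30, arrival_rate=3.0, capacity_per_day=3)
    print(sizes)
```

Even when the average arrival rate equals the capacity, as in this usage example, the backlog does not settle at a constant size, which is why trends and percentiles are more useful here than a single average.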
To capture the trend of the time statistics, a good choice is the history of the T80 point, as shown in Figure 6.
Figure 6. Time trend chart
A significant property of software development is that work products often move upstream in the product flow. For example, a code module might fail a particular level of testing and be pushed back into the backlog of the development transition process. This action does not affect how to measure the volumes or the end-to-end times. The end-to-end times include the aggregate of the times the artifacts spend in the states.
However, the question remains how to treat the duration statistics for the individual states. Which time in state do you assign to the artifact: the duration since it most recently re-entered the state, or the total time it has been in that state over its lifecycle? The answer is that both are important, and each is used for a different purpose. The first is used for ongoing process tuning; the second measures rework and is used for overall process improvement.
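Both per-state durations, along with the end-to-end time, can be derived from the same transition history. The sketch below assumes a hypothetical per-artifact history of (state, time entered, time left or `None` if still there) in chronological order; the function names are illustrative only.

```python
def time_in_state(visits, state, now):
    """Summarize one artifact's time in `state`.

    visits: chronological list of (state, entered, left_or_None).
    Returns None if the artifact never visited the state.
    """
    spans = [(entered, now if left is None else left)
             for s, entered, left in visits if s == state]
    if not spans:
        return None
    last_start, last_end = spans[-1]
    return {
        "latest_visit": last_end - last_start,  # for ongoing process tuning
        "total": sum(end - start for start, end in spans),  # includes rework
        "visits": len(spans),  # more than 1 means the artifact came back
    }


def end_to_end_time(visits, now):
    """Aggregate time since the artifact entered its first state."""
    first_entered = visits[0][1]
    last_left = visits[-1][2]
    return (now if last_left is None else last_left) - first_entered


if __name__ == "__main__":
    # A code module that failed testing and was pushed back into work.
    module = [("In work", 0, 5), ("In test", 5, 7), ("In work", 7, None)]
    print(time_in_state(module, "In work", now=9))
    print(end_to_end_time(module, now=9))
```

For this module, the latest visit to "In work" has lasted 2 days, while the total over its lifecycle is 7 days across two visits; the end-to-end time of 9 days is unaffected by the push-back.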
Process block measures
Lean practitioners measure flow between the process blocks and they measure the efficiency and capacity of the team.
The following measurements are appropriate for artifact-centric work:
- Availability of personnel (AOP): Percentage of time the assigned employees actually work on the state transitions
- Availability of equipment (AOE): Percentage of time the assigned equipment (for example, the servers) are devoted to state-transition-related processing
- Work content time (WCT): Percentage of effort spent actually doing the state transition
- Non-value added time (NVA): Percentage of effort doing other things that do not advance the work products
The first two give a view of the capacity of the team addressing the state transitions. The second two give a view of the efficiency of the team.
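One possible way to compute the four percentages from time records is sketched below. The definitions above do not specify denominators, so the choices here are assumptions: AOP and AOE are taken relative to assigned hours, and WCT and NVA are taken relative to the hours actually spent on the transition (so that they sum to 100%). The record type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockTimeRecord:
    """Hypothetical hours for one process block over one reporting period."""
    person_hours_assigned: float
    person_hours_on_transition: float
    equipment_hours_assigned: float
    equipment_hours_on_transition: float
    work_content_hours: float  # portion of person hours that advanced the artifact


def pct(part, whole):
    """Percentage of `whole` represented by `part` (0.0 if `whole` is zero)."""
    return 100.0 * part / whole if whole else 0.0


def block_measures(rec):
    wct = pct(rec.work_content_hours, rec.person_hours_on_transition)
    # Assumption: any effort on the transition that is not work content is NVA.
    nva = 100.0 - wct if rec.person_hours_on_transition else 0.0
    return {
        "AOP": pct(rec.person_hours_on_transition, rec.person_hours_assigned),
        "AOE": pct(rec.equipment_hours_on_transition, rec.equipment_hours_assigned),
        "WCT": wct,
        "NVA": nva,
    }


if __name__ == "__main__":
    record = BlockTimeRecord(
        person_hours_assigned=160,
        person_hours_on_transition=120,
        equipment_hours_assigned=200,
        equipment_hours_on_transition=50,
        work_content_hours=90,
    )
    print(block_measures(record))
```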
Information architecture and drill down
In software and software development, the status of artifacts often depends on sets of subordinate artifacts. For example, a request from the business is analyzed by the system architects, who generate a set of plan items that might be assigned to one or more development teams. These plan items are further elaborated into user stories that result in code modules. The code modules are dropped into builds for system testing and release. To determine how closely the delivered code matches the original business request, you need to know the status, that is, the state of the child artifacts (user stories, code modules, and test cases) and their integration into applications and releases. Similarly, a bottleneck in delivering the business request or one of the plan items can be caused by a bottleneck in the flow of the child artifacts. For example, the state of delivering a new feature depends on the states of the code modules. You might say a feature is in development only when all of the required modules have been identified and are in development. The feature is not complete unless all of the modules are coded and integrated into a change set.
For that reason, the queries to support the stream measures of the artifact require linkages that capture the artifacts' information architecture. The process blocks annotation includes references to the parent and child artifacts.
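The parent-child dependency can be illustrated with a small roll-up. In this sketch, a feature's status is derived from the states of its code modules, following the rule of thumb in the text; the module state names, the `Artifact` type, and the helper functions are assumptions made for the example and do not come from any particular tool.

```python
from dataclasses import dataclass, field

# Assumed lifecycle for a code module, in order of progress.
MODULE_STATES = ["identified", "in development", "coded", "integrated"]


@dataclass
class Artifact:
    name: str
    state: str = "identified"
    children: list = field(default_factory=list)


def _rank(module):
    return MODULE_STATES.index(module.state)


def feature_status(feature):
    """Derive the parent's status from the states of its child modules."""
    modules = feature.children
    if not modules:
        return "not started"
    if all(_rank(m) == len(MODULE_STATES) - 1 for m in modules):
        return "complete"
    if all(_rank(m) >= MODULE_STATES.index("in development") for m in modules):
        return "in development"
    return "not started"


def lagging_children(feature):
    """Drill down: names of the child modules that are furthest behind."""
    if not feature.children:
        return []
    lowest = min(_rank(m) for m in feature.children)
    return [m.name for m in feature.children if _rank(m) == lowest]


if __name__ == "__main__":
    feature = Artifact("new web service", children=[
        Artifact("auth", "integrated"),
        Artifact("api", "in development"),
        Artifact("ui", "identified"),
    ])
    print(feature_status(feature), lagging_children(feature))
```

The same linkage lets a report on a slow-moving feature point directly at the child artifacts that are holding it up.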
Using the measures
In business, measures are used in sense-and-respond loops to steer an organization toward achieving a goal. These measures can be used to support both aspects of lean computing:
- Lean transformation
- Lean project or program monitoring
To support these purposes, you need to understand the subtle differences between them so that you know what to measure.
Lean transformation consists of incrementally applying a set of operational practices, such as the following actions:
- Explicitly managing queues
- Managing batch size
- Balancing capacity against demand
- Reducing non-value added effort
To apply the principles successfully, an organization's culture must shift. Addressing the cultural issues is central to applying lean practices (Poppendieck and Poppendieck, 2013). As mentioned previously, value stream maps and their metrics are a common tool for carrying out the transformation.
The measures provide the truth so that all can see the opportunities for improvements, such as bottlenecks in the process, points at which time is wasted, and areas in which there might be excess capacity. With that information, the team can determine what steps to take. In fact, the measures can be used to empower the team.
In this case, you are interested in the trend of how the organization has performed over a period of recent history for a targeted set of transitions. Essentially, the question is "How long does it take for artifact X to go from state A through state B?" This concept is illustrated in Figure 7. The measures are used to get a vertical view of the organization. For example, if there is more than one team managing the same state transition for a kind of artifact, consider using an aggregate measure that includes the measures for both teams' backlogs and transition performance to get an overall view of the organization and to compare teams.
Figure 7. Stream measures for lean adoption
With these sorts of flow measures, you can incrementally adopt lean, focusing on where the need is greatest.
In the second case, imagine the lean transformation is already underway and the measurement framework has been put in place. Because of the variation intrinsic in software, the flow needs constant monitoring so that bottlenecks can be addressed. By monitoring the size and time statistics of the flow, the manager and the team collectively can make small tuning changes, such as cutting off submissions, or temporarily shifting staff roles such as moving developers to test. In this case, for each of the state pairs, you need to see the numbers and age of the artifacts by day, as shown in Figure 8.
Figure 8. Stream measures for monitoring
Another use of the flow measures is to understand the state of completion of the effort. The best way to determine the completeness of a program is to assess the state of the artifacts. Standard burn down charts are important; they give a view of which artifacts are complete and which are in progress. However, they lack the detailed status of the artifacts that are working their way through their lifecycle. These measures can be used to see which artifacts are taking the most time. A key feature of these reports is the ability to drill down on those items that are taking too long in a given state, to identify the progress of the subordinate artifacts using the information architecture.
It is said that for implementing lean, you focus on the work, not the worker. What better way to focus on the work than to focus on the work products and their state transitions? Doing so enables you to apply lean not only to highly repetitive processes working on a consistent set of work products, but also to processes that deal with significant variability.
Instrumenting the state transitions provides the basis for defining the measures to achieve the following objectives:
- Planning and tracking the results from a lean transformation to meet DevOps goals and to measure the results from adopting DevOps practices
- Monitoring and steering DevOps organizations
I would like to thank Frode Odegard of the Lean Systems Institute and Cindy Vanepps and Walker Royce from IBM for ongoing conversations that helped inform this article.
- Explore The Lean System Framework: Extending Lean for Knowledge Work.
- Read The Lean Mindset: Ask the Right Questions.
- Learn about The Principles of Product Development Flow: Second Generation Lean Product Development.
- Gather details about value streams maps in the book Learning to See: Value Stream Mapping to Add Value and Eliminate MUDA.