Automation in IT, especially in the age of Cloud and complex, highly distributed computing systems, is an absolute no-brainer. Whenever I think about the speed required (and pushed) by most businesses and the ways to make IT organizations deliver consistent and predictable high-quality services, automation comes up as one of the cornerstones of a sustainable execution model. Therefore, to automate or not to automate is not a question we should ask ourselves. Rather, we should ask what to automate and how to enable effective and reproducible automation at scale.
On the Pro side of things, automation brings a number of advantages. Although the following list is far from exhaustive, the important items in the context of IT are:
- Automation can eliminate the need to hire additional administrative staff, which yields significant cost savings.
- Automation reduces human error and fosters standardization, which leads to better overall quality at scale.
- Automation enables faster action and faster repair, which is mandatory because systems can quickly degrade or become unusable through secondary, cascading failures if immediate action is not taken upon first failure data capture.
- Automation can also remove most of the tedious busywork, freeing up employees for tasks that humans do better than computers.
On the Cons side of things, automation should not be seen as a panacea. The expression “Automate Yourself Out of the Job: Automate ALL the things” (Site Reliability Engineering - How Google Runs Production Systems) is a great motto to drive people to look for opportunities to improve. In real-life complex organizations, however, some concerns should be taken into careful consideration:
- Implementing automation is usually a technical endeavor that requires skilled people and specialized tools. Before automating, we had better ask whether the task is worth automating and how much value will be captured by doing so.
- Automation is change, both in human behavior and in IT operations. Change creates entropy, so it needs to be managed before chaos takes over.
- Automation can hide systemic deficiencies or, even worse, amplify bad processes, which can result in a catastrophe if deployed on a large scale.
- Automation decouples the operation from the operator. If managed with great care, it can preserve enterprise knowledge over time as people retire or leave the company. If not, organizations might become so dependent on tools and automation that they lose control of their environment and also lose the ability to rethink their own processes.
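The first concern above, whether a task is worth automating, can be approximated with a simple break-even estimate. The sketch below is only an illustration: the function name, its parameters, and the sample numbers are hypothetical, and it counts time only, ignoring quality gains, risk, and tooling costs.

```python
def automation_payback_months(minutes_per_run, runs_per_month,
                              build_hours, upkeep_hours_per_month=0.0):
    """Estimate how many months it takes to recover the time spent automating.

    All inputs are rough estimates supplied by the reader (hypothetical names).
    Returns None when upkeep eats all of the time saved, i.e. no payback.
    """
    # Hours of manual work avoided each month once the task is automated.
    saved_hours_per_month = minutes_per_run * runs_per_month / 60.0
    # Automation is not free to run: subtract the time spent maintaining it.
    net_hours_per_month = saved_hours_per_month - upkeep_hours_per_month
    if net_hours_per_month <= 0:
        return None
    return build_hours / net_hours_per_month


# A 30-minute task done 40 times a month, 40 hours to automate, 2 hours of upkeep.
frequent_task = automation_payback_months(30, 40, build_hours=40,
                                          upkeep_hours_per_month=2)
# A 5-minute task done twice a month, with the same build and 1 hour of upkeep.
rare_task = automation_payback_months(5, 2, build_hours=40,
                                      upkeep_hours_per_month=1)
```

With these made-up numbers the frequent task pays back in a little over two months, while the rare task never does, which is the kind of signal that helps decide where to invest first.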
All things said, and knowing that automation is not a matter of IF but a matter of WHAT and HOW, where should I start? What approaches would allow enterprises to place their bets and enable sustainable and fruitful automation?
Although this should be intuitive enough, the temptation to find low-hanging fruit and start automating as many tasks and processes as one could possibly identify might create a bigger problem down the road. I’m absolutely not saying enterprises should not experiment and “get their feet wet” before they establish a structured program to automate at scale. In fact, I really think they should do that to understand the challenges this endeavor entails on a large scale. When it’s time to really take things seriously, I believe the first and possibly the most important step is to understand the existing processes and create the “Treasure Map for Automation”.
TREASURE MAP FOR AUTOMATION
Figure 1 - Simple Value Stream Map
A valid approach to understanding and documenting current processes is Value Stream Mapping. This approach lends itself to understanding “the current state and designing a future state for the series of events that take a product or service from its beginning through to the customer with reduced lean wastes as compared to the current map.” (Value Stream Mapping, Wikipedia). The purpose of value stream mapping is to identify and remove the wastes in the processes, increasing efficiency and productivity. When evaluating the value stream, the following types of waste can be analyzed:
Types of waste (from https://en.wikipedia.org/wiki/Value_stream_mapping)
Daniel T. Jones (1995) identifies seven commonly accepted types of waste. These terms are updated from the Toyota production system (TPS)'s original nomenclature:
- Faster-than-necessary pace: creating too much of a good or service that damages production flow, quality, and productivity. Previously referred to as overproduction, and leads to storage and lead time waste.
- Waiting: any time goods are not being transported or worked on.
- Conveyance: the process by which goods are moved around. Previously referred to as transport, and includes double-handling and excessive movement.
- Processing: an overly complex solution for a simple procedure. Previously referred to as inappropriate processing, and includes unsafe production. This typically leads to poor layout and communication, and unnecessary motion.
- Excess Stock: an overabundance of inventory which results in greater lead times, increased difficulty identifying problems, and significant storage costs. Previously referred to as unnecessary inventory.
- Unnecessary motion: ergonomic waste that requires employees to use excess energy such as picking up objects, bending, or stretching. Previously referred to as unnecessary movements, and usually avoidable.
- Correction of mistakes: any cost associated with defects or the resources required to correct them.
For each and every one of them, at first glance, there seems to be room to leverage automation to take waste out of the picture. But try not to be too optimistic and start automating everything. For each step of your value-stream-mapped processes, with special attention to the hand-offs (where most of the waste resides), apply the traditional 5-Whys technique to determine whether the step could be removed from the process entirely (the utmost automation) or whether it should be partially or totally automated.
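To make the hand-off analysis concrete, a value stream can be written down as a list of steps, each with its working time and the waiting time before the next step picks it up. The sketch below is a minimal illustration under stated assumptions: the `Step` class, the helper functions, and the sample provisioning stream are all hypothetical, and flow efficiency is taken here as process time divided by total lead time.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    process_minutes: float  # time someone (or something) actively works on the item
    wait_minutes: float     # time the item sits idle before the next step starts


def flow_efficiency(steps):
    """Share of the total lead time that is spent on actual work."""
    total_process = sum(s.process_minutes for s in steps)
    total_lead = total_process + sum(s.wait_minutes for s in steps)
    return total_process / total_lead if total_lead else 0.0


def biggest_waits(steps, top=3):
    """Steps followed by the longest idle time: first candidates for the 5 Whys."""
    return sorted(steps, key=lambda s: s.wait_minutes, reverse=True)[:top]


# Hypothetical server provisioning request, with made-up timings.
stream = [
    Step("Request submitted", process_minutes=5, wait_minutes=240),
    Step("Approval", process_minutes=10, wait_minutes=1440),
    Step("Provisioning", process_minutes=60, wait_minutes=120),
    Step("Validation", process_minutes=25, wait_minutes=0),
]

efficiency = flow_efficiency(stream)
candidates = [s.name for s in biggest_waits(stream, top=2)]
```

In this invented example the item is being worked on for only a small fraction of its lead time, and the longest waits follow the approval and request steps. Those hand-offs are where asking "why" repeatedly is likely to pay off before any script is written.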
In my humble opinion and experience, the most common wastes in IT are associated with Waiting, Processing, and Correction of Mistakes. The benefits go way above and beyond the savings in labor hours. Delivery excellence through standardized and “always available” products and services also brings tangible value to the business and needs to be tracked with new indicators, like Net Promoter Score (NPS). What do you think?
With the treasure map at hand, organizations have all the information they need to determine WHAT to automate. HOW to actually execute it is something for the next article. Any suggestions?
Want to learn about Value Stream Mapping? Identify bottlenecks with value-stream mapping
Keywords: #automation #sre #devops #ioperationsmanagement #itoperations #cloud