Examining IT as a process
How to automate IT operations in the enterprise
What irritates both developers and operations teams the most in almost any software development project is rework.
Rework is the bane of our existence. You start the installation of a software package and come back 30 minutes later to find that you forgot to install a prerequisite and have to restart the entire process. Or, you install a WAR file into an application server and then realize when you’re 90% through your testing cycle that you installed the previous version of the WAR and not the current version that you’re supposed to be testing.
In the book The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford, the discovery that this kind of thing (called "unplanned work") is the "kryptonite" of the super-IT worker is a central plot point. The book even describes the solution to this problem – and it turns out that the solution actually has two parts: standardization and automation.
But the book also conveys a different and possibly even more important message, one that sometimes gets lost in the static behind the particular method the book presents as its solution. One of the book's central themes is that work moves through an IT organization much as work moves through a factory. The solution the team uses is one that many factories have adopted: a kanban board. Many IT teams have since adopted this approach, and it is a reasonable and useful way to manage the movement of work.
However, if you take a step back from that particular solution, you realize that the book is also trying to get across a separate, more basic message: IT is a describable, manageable process for getting work done. Once the team makes that realization (near the end of the book), they take a critical part of their process and automate it, leading to success and promotions all around.
In a sense, this is a case of the shoemaker’s children having no shoes. For years, IBM has been telling companies how to identify, improve, and automate their key business processes. One of our IBM software product families, the Smarter Process family, is dedicated to just that approach. But until very recently, no one has really looked at IT processes as being just another business process that needs to be identified, improved, and automated. That perception is rapidly changing.
Types of automation
What we've found is that if you look at the process of implementing systems automation as a set of capabilities – capabilities that can be automated, thus making your operations procedures more reliable and repeatable – then you have a more concrete understanding of how various products and technologies can be positioned. In other words, think less about how a product does what it does, and more about what problem it is intended to solve. Figure 1 shows what those problems are. We will look at each of them in turn.
Figure 1. Types of automation
Looking at things through this lens, the first capability we need to explain is the notion of release automation.
The problem that this set of automation tools is intended to solve is simply: how do I push my code out to my servers in the most error-free and reliable way possible? Like so many problems, this is more complicated than it first appears. When you think about a release of a particular application, you quickly learn that the definition of an application is a pretty wide one. A single application, as defined by a line of business, might include multiple components deployed on many different servers. Take a simple example: an insurance claim processing application that, at its heart, consists of a set of Java™ EAR files deployed onto a set of Java application servers. That's just the start of what it means to deploy the application. The next issue is that the Java code in those EAR files relies on a set of settings and configurations made to the application servers – installing JDBC drivers, and creating data sources and JMS objects, for example. In addition to the code, then, you have a set of configuration options that you have to track as well.
The final piece of this puzzle is that both the code AND the configuration are dependent upon the particular environment to which you are deploying your application. If you were deploying your insurance claim processing application into a system test environment, you’d want your data sources to point to your test databases and not your production databases. What’s more, code is promoted from environment to environment, so the version of your EAR files that are being tested in system test are most certainly not the same as those that are currently running in production.
That’s the role of release automation: to tie seamlessly into your build process (whether it’s continuous build or a more traditional build process) and then enable deployment of particular sets of code and configuration into particular environments.
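The core idea can be sketched in a few lines of Python. This is an illustration only: the component names, property keys, and hostnames below are invented for the example, and a real release automation tool tracks far more than this – but it shows how one application definition (code plus configuration) resolves to different concrete values per target environment.

```python
# Hypothetical sketch of release automation: the same application
# definition is resolved differently for each target environment.
# All names and hostnames here are invented.

APPLICATION = {
    "components": ["claims-web.ear", "claims-batch.ear"],
    # Configuration whose values vary by environment.
    "config_template": {
        "jdbc.url": "jdbc:db2://{db_host}:50000/CLAIMS",
        "jms.queue": "CLAIMS.{env}.INBOUND",
    },
}

ENVIRONMENTS = {
    "SYSTEM_TEST": {"db_host": "testdb01.example.com"},
    "PROD": {"db_host": "proddb01.example.com"},
}

def resolve_release(env_name):
    """Produce the concrete code + configuration set for one environment."""
    env = ENVIRONMENTS[env_name]
    config = {
        key: value.format(env=env_name, **env)
        for key, value in APPLICATION["config_template"].items()
    }
    return {"components": APPLICATION["components"], "config": config}

release = resolve_release("SYSTEM_TEST")
print(release["config"]["jdbc.url"])
# → jdbc:db2://testdb01.example.com:50000/CLAIMS
```

The same EAR files deploy everywhere; only the resolved configuration – and, as discussed below, the code versions – differ per environment.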
But where do those environments come from? That’s the role of the second capability, workload automation.
The process of building out environments is often one of the most time-consuming and error-prone processes that an IT organization can take on. Building an even moderately complex environment like the one described here (Figure 2) can consist of hundreds of independent steps to be carried out on several different physical or virtual machines. An error or missed step anywhere in that process can result in difficult-to-debug problems, or in subtle errors that aren’t caught until months later. What is needed is a way of defining an environment and then automating the series of steps necessary to build that environment so that it can be done not just once, but many times, in exactly the same way (which, by the way, was the critical piece of automation that was built in The Phoenix Project).
Figure 2. Deployment architecture
If you can define a way to automate your environment build process, then you’ve gained repeatability as a primary benefit. However, that’s not the only benefit you can gain. A common problem that we see in many enterprises is that each application considers its environments to be different from those of any other application. In fact, we’ve found that the differences between application environments are generally minor, and can be defined with different configuration values through release automation tools. Once you make that conceptual leap, then you start to realize that it’s possible to standardize on a common, usually quite small set of environment definitions, all of which can be automated with reusable code. That makes it easier to, for example, introduce new versions of vendor code into your system – instead of updating a hundred individual servers, none of which are alike, you instead only have to change a single environment definition and then re-generate the new environments from the new definition.
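A toy sketch of that conceptual leap, with all names invented: one standard environment definition, many generated instances, and a single change to the definition that propagates everywhere on regeneration.

```python
# Illustrative sketch (component names and versions are invented): a
# single standard environment definition is instantiated many times.
# Upgrading vendor software means changing the definition once and
# regenerating, not patching each server individually.

STANDARD_ENV = {
    "os": "RHEL 7",
    "app_server": "WebSphere 8.5.5",  # hypothetical vendor component
    "nodes": 3,
}

def build_environment(name, definition, overrides=None):
    """Generate a concrete environment from the shared definition."""
    env = dict(definition)
    env.update(overrides or {})      # only small per-application deltas
    env["name"] = name
    return env

# Each application keeps only its small configuration differences.
INSTANCES = [("claims-systest", {}), ("claims-prod", {"nodes": 6})]
envs = [build_environment(n, STANDARD_ENV, o) for n, o in INSTANCES]

# One change to the shared definition, then regenerate all instances.
STANDARD_ENV["app_server"] = "WebSphere 9.0"
envs = [build_environment(n, STANDARD_ENV, o) for n, o in INSTANCES]
print(sorted({e["app_server"] for e in envs}))
# → ['WebSphere 9.0']
```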
But what exactly is the definition of the workload we're describing? A good boundary to put around it is this: a workload is the set of local resources that you need to run your application. Often that boundary is defined by organizational lines more than by technical considerations. But when you're building applications for cloud computing, this becomes a central issue, because you must carefully call out the things that can (and should) be deployed onto the cloud versus the services that the application depends upon but that lie outside the definition of your application – and, in fact, might lie entirely outside your cloud environment.
Figure 3. System boundaries
So if an environment (like a System Test environment for an Insurance Claim Processing application) is something that you create inside of a cloud (in other words, it’s a type of cloud system), then there are other systems that lie outside the cloud. In any typical enterprise, there are a number of systems that match this definition. For example, you might have existing legacy systems running on “Big Iron” that your new systems might need to query or update. You might have corporate databases that are housed in a corporate database farm that lies outside of the boundary of your cloud. Or, you might have third-party applications that are installed on their own hardware and managed by a vendor and are thus also outside the cloud.
Another class of things that are outside of the cloud systems are those systems that you might not think of because they’re not part of your business applications, but instead are part of the “plumbing” that is necessary to keep your business applications functioning. Simple examples of these are things like firewalls and routers; part of your networking infrastructure that you have to configure in order for traffic to flow to and through your business applications. But there are also things like source code control systems, problem ticketing and service desk systems, and the like to consider.
Because these systems exist, it is inevitable that applications running in your cloud environments will have to use the services they provide. That means setup and configuration must be done on those systems, or information must flow from them into the systems that run in the cloud. That leads to the realization that a third type of automation is needed – I call this IT process automation.
A key aspect of IT process automation is that it is almost always a mix of automation against systems with predefined APIs and human tasks. That's because the process of building a useful system that can meet a business need happens within the context of the rest of your IT systems and processes. So while you can automate the deployment of an instance of a standardized workload, and you can even automate the process of deploying application code into that standardized workload, it's still usually a human who needs to approve or deny whether that deployment should take place at all. Likewise, even if you could automate the process of deploying an operating system patch into a running system, a set of humans needs to decide if and when that patch should be deployed. In projects we have evaluated that include all three types of automation, we've found that building the IT process automation often consumes half or more of the entire project timeline – simply because things get complicated whenever humans get involved.
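That mix of automated steps and human gates can be modeled very simply. The sketch below is purely illustrative – the step names and the approval rule are invented – but it shows the shape of the flow: automated work pauses at a human decision point before more automated work proceeds.

```python
# Toy model of an IT process: automated steps wrapped around a human
# approval gate. All step names and the approval policy are invented.

def deploy_workflow(change, approver):
    """Run automated steps around a human approval decision."""
    steps = ["validate request"]        # automated pre-check
    if not approver(change):            # human task: approve or deny
        steps.append("rejected")
        return steps
    steps += [
        "provision environment",        # automated: workload automation
        "deploy application",           # automated: release automation
    ]
    return steps

# Here the human decision is simulated by a simple risk rule.
approve_low_risk = lambda change: change["risk"] == "low"
print(deploy_workflow({"risk": "low"}, approve_low_risk)[-1])
# → deploy application
```

In a real BPM-driven process the `approver` step would be a task routed to a person's work queue, not a function call, which is exactly where much of the complexity (and the project time) goes.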
There’s one more issue we need to address: How do we manage the ongoing maintenance and management of systems once they have been created? The cloud world has a strange split personality where this is concerned. In many online best practices, we hear the need repeated over and over again to treat systems as "cattle and not pets." The upshot of that dictum is that you are supposed to completely dispose of any VMs rather than making live changes to them. This is sometimes called making your VMs immutable.
That's a great goal. But the reality in many enterprises is that while this might be the best approach for born-on-the-cloud systems built on a fully fledged PaaS like IBM Bluemix™ and designed with a microservices architecture, for systems built with more traditional middleware solutions and approaches (that is, cloud-ready systems rather than cloud-centric systems), it is in most cases a goal to strive toward rather than one that can be achieved today.
In fact, most complex multi-VM cloud-ready systems represented as HEAT templates, IBM PureApplication® patterns, or complex sets of Chef and Puppet scripts may take many minutes, or even more than an hour to provision and start. What’s more, even though the benefits of the provision-and-replace red/black approach are clear, many enterprises, in fact, are loath to replace entire mission-critical systems on schedules of less than a week at a time (allowing for planned outages only on weekends). So due to the exigencies of the situations we find ourselves in, we still find the need to patch systems occasionally, or even nightly, and to maintain VMs for a time after they are created.
That results in the last set of capabilities we need – the processes that surround the management and maintenance of systems while they are running. We call that set of capabilities operations automation.
Mapping against IBM’s offerings
By defining each of these different capabilities, I’ve hopefully shown how each of the different types of automation is needed, and how each of them serves a different purpose. That is the key to understanding some of the different IBM product offerings, and more importantly, understanding how they interrelate to each other.
Figure 4. Mapping automation capabilities to IBM offerings
Mapping release automation
Let's start again at the bottom of the stack and consider the services that these products and technologies provide to fulfill each of these capabilities.
When it comes to implementing the notion of release automation, the primary product that IBM provides in this space is IBM UrbanCode Deploy. UrbanCode Deploy enables you to define an application as a specific set of configuration and code that can be applied or installed into different environments. So let's consider our previous deployment architecture from Figure 2. That configuration might need to have a WAR file and one or more JAR files deployed onto the application server; perhaps that consists of the application WAR file and a number of JAR files for services like Spring and the Apache Commons utilities. Likewise, the Java batch processing program has its own JAR files that have to be installed to enable that processing. But that's not all. The JMS queue needs to be defined – both inside the two Java programs as a JMS object, and perhaps also inside the queuing software, such as IBM WebSphere® MQ or RabbitMQ. Finally, there's the database: it comes with a schema that may need to be defined on the database server, while the batch program needs a set of JDBC resources defined for it to enable access to the database.
In UrbanCode Deploy, you can define all of these different bits of code and configuration, and tell to which parts of the environment they would apply. What’s more, not only can you define it as applying to one environment, but in the tool you can define multiple environments – and show how some or all of these configurations can have different values in each environment.
For example, let’s assume that the hostnames of the databases differ in each environment, which is not an unusual assumption. The specific parameter for the JDBC URL might then have the values shown in Table 1.
Table 1. JDBC URL for different environments
This is a trivial example, but when you extend this concept to the dozens of parameters that make up a real application of the type described above, you can see the utility of having these parameters managed and tracked so that you’re absolutely certain what value is applied in which environment.
What’s more, not only do the values change for each environment just due to environmental differences, but the code versions also change as part of the process of code promotion. So in our simple three-environment example, we might see the following versions of our Java code files for our web application deployed into each of the environments.
Table 2. JAR file versions for different environments
| Environment | JAR files | Version tags |
| --- | --- | --- |
| DEV | claims-processing.jar, claims-batch.jar | CLAIMS-LATEST-FIXES-2.5, 2.5 |
| QA | claims-processing.jar, claims-batch.jar | 2.5, 2.5 |
| PROD | claims-processing.jar, claims-batch.jar | 2.4, 2.4 |
In this example, you can see how while the set of properties and files remains the same, the actual values for those properties and files will differ from environment to environment based on the needs of the development team in each stage.
The remaining piece of automation that is important in this layer is the continuous integration automation that initially triggers the deployment (or redeployment) of each of these bits of code or configuration into each environment. In a continuous integration process, as code is released into a source code repository, it is automatically built into the appropriate JAR and WAR files; you would then tie that continuous integration tooling (be it Hudson, Jenkins, Maven, or IBM Rational Team Concert™) into UrbanCode Deploy to make the build and deployment process completely automated and tied into a continuous testing process. Luckily, UrbanCode Deploy integrates smoothly with all of those tools, making it straightforward to set up a DevOps pipeline like this.
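Conceptually, the CI hand-off is just a call from the build job to the deployment tool's API at the end of a successful build. The following Python sketch is a hedged illustration of that hand-off: the endpoint path, payload fields, and server URL are invented for the example and are not the documented UrbanCode Deploy REST API.

```python
# Sketch of a CI job triggering a deployment after a build succeeds.
# The URL path and payload shape below are illustrative assumptions,
# not a real product API.
import json
from urllib import request

def build_deploy_request(server_url, application, process, environment, version):
    """Construct (but do not send) the deployment trigger request."""
    payload = {
        "application": application,
        "applicationProcess": process,
        "environment": environment,
        "versions": [{"component": application, "version": version}],
    }
    req = request.Request(
        url=server_url + "/deploy/request",   # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return req, payload

# e.g. the last step of a Jenkins build for version 2.5 targeting QA:
req, payload = build_deploy_request(
    "https://ucd.example.com", "claims-processing", "Deploy", "QA", "2.5"
)
print(payload["environment"])
# → QA
```

A real pipeline would send this request with `urllib.request.urlopen(req)` (plus authentication) and then kick off the continuous testing stage once the deployment reports success.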
Mapping workload automation
Moving up the stack, the next set of capabilities to address is the notion of workload automation. There are a number of well-established technologies that occupy this space. In the open source world, tools like Chef, Puppet, and Salt each have large user communities. Each of these scripting tools enables you to define, in a domain-specific language, how to build a server: starting from a bare OS installation, extending and patching the operating system, installing middleware such as application servers and databases, and then configuring all the pieces of a solution together by opening firewall ports, and so on. There are strong and vibrant communities around both Chef and Puppet, making it possible to find example installation scripts for most types of software (although the configuration changes needed to make them work in an enterprise environment with high QoS attributes could be extensive).
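The core property these tools share is idempotency: each step checks the machine's current state before acting, so the same build can be re-run safely on a half-built or already-built machine. Here is a minimal Python sketch of that idea – the step names and state keys are invented, and real tools express the same thing declaratively in their own DSLs.

```python
# Sketch of idempotent provisioning, the idea behind Chef/Puppet/Salt
# resources. All step names and state keys are invented.

def ensure(state, key, value, apply_step):
    """Apply a build step only if the machine is not already in that state."""
    if state.get(key) == value:
        return False          # already converged; nothing to do
    apply_step(state)
    state[key] = value
    return True

def build_server(state):
    """Run the full environment build; safe to re-run after a failure."""
    actions = []
    steps = [
        ("os_patch_level", "7.4", lambda s: actions.append("patch OS")),
        ("app_server", "installed", lambda s: actions.append("install app server")),
        ("firewall_port_9080", "open", lambda s: actions.append("open port 9080")),
    ]
    for key, value, step in steps:
        ensure(state, key, value, step)
    return actions            # the steps that actually ran this time

machine = {}
first_run = build_server(machine)
second_run = build_server(machine)   # re-run converges with no actions
print(len(first_run), len(second_run))
# → 3 0
```

Because every run converges to the same end state, hundreds of these steps can be replayed identically across dozens of machines – which is exactly the repeatability benefit described above.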
The OpenStack HEAT engine took a slightly different tack from the DSL approach. In that OpenStack project, a templating engine (HEAT) consumes files defined in a templating language called HOT. The template files define the same process of building a system from a base operating system image through installing extensions, middleware, and so on. They also define how the different VMs in a multi-VM configuration connect together. IBM has a visual web tool for defining HOT templates that is part of the IBM UrbanCode Deploy with Patterns product. The community has provided dozens of HOT templates for common open source and vendor products that can be adapted and edited to meet a project's specific needs.
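For a flavor of what a HOT template looks like, here is a deliberately minimal fragment. The parameter name, flavor, and user_data step are placeholders; a real multi-VM template would also define networks, volumes, and software configuration.

```yaml
heat_template_version: 2014-10-16

description: Minimal illustrative HOT fragment (names are examples only)

parameters:
  image_id:
    type: string    # base operating system image to build from

resources:
  app_server:
    type: OS::Nova::Server
    properties:
      image: { get_param: image_id }
      flavor: m1.medium
      user_data: |
        #!/bin/bash
        # post-boot step: install and configure middleware here
```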
Finally, the oldest and most mature of these technologies is the IBM Patterns engine that is used in IBM PureApplication® System, PureApplication Services (on SoftLayer®), and PureApplication Software. It also includes a visual tool for defining assemblies of multiple VMs (called software patterns) and defining the installation of the operating system, middleware, and other software, such as agents for monitoring. Pre-built PureApplication patterns are available for most IBM middleware products and will often form the fastest, easiest route to productivity for teams building systems with IBM middleware.
So which one do you choose for a specific project? It is often a matter of personal preference or the availability of example or "starter" templates for your particular middleware needs. As mentioned above, for IBM middleware products, the head start provided by the PureApplication patterns is often a deciding factor. For open source products, the assets provided by the communities around Chef and Salt often make those more attractive options. We can hope that, over time, the industry will converge on a de facto or accepted standard format (perhaps HEAT, which is where IBM is putting its standardization effort), and that the standard will enable both interoperability with, and automated translation to, the other options.
Mapping IT process automation
We now come to the top of our stack and need to look at the mappings for the capabilities we defined in that layer. Here you need a tool that allows for the definition of both automated and human tasks to manage the activities in the layers we’ve defined below, while at the same time integrating into existing processes in the IT organization.
As mentioned earlier, this is exactly what the business process management (BPM) tools in the IBM Smarter Process product family do. This is the solution that IBM took when building the IBM Cloud Orchestrator product. IBM Cloud Orchestrator includes IBM BPM tooling as an integral part of the product. What’s more, the BPM tools have been extended with a rich set of user-interface coaches for common situations and integrations with common tools, such as the PureApplication System Patterns engine, routers from vendors like Cisco and F5, and dozens of other common integration points.
It's important to define the difference between IT process automation and the workload automation that occurs in tools like Chef, HEAT/HOT, and the PureApplication Patterns engine. The easiest way to think of the difference is as micro-level versus macro-level automation. In a micro-level automation like a HOT template, you make assumptions about the cloud environment you are working within, and also about the types of integration points you are making outside the cloud. In the HEAT engine, you assume that you can request compute services from Nova, networking services from Neutron, and object storage from Swift. However, there is an endless list of potential services and connections that you might need to make your system useful that aren't covered by OpenStack services – and while those OpenStack projects acquire new features almost every day, that list keeps growing. You also have to deal with the fact that clouds (public and private) have boundaries and interfaces. Until the day we have a single, seamless, and common set of APIs that lets you deal with all systems regardless of type, geography, position, or restriction, you'll always need the ability to reach outside your cloud and enable or configure things to make your cloud systems useful. That's why we fundamentally need systems-level integration from an orchestrator, and that's why we need both types: micro-level automation sets up the workloads within the boundaries of your cloud, while macro-level automation interfaces those workloads with external systems, and with the people who are also outside the boundaries of the cloud.
Operations automation - the final piece of the puzzle
As mentioned in the introduction, not all enterprise systems can reach the PaaS-level goal of treating all VMs like cattle and replacing them at will, because of restrictions in the design of the applications and middleware that run in those VMs. So we still need to manage and maintain systems for some period of time while they are running.
That’s the final missing link. We also need to automate the often manual tasks that go with system maintenance in traditional systems. This includes things like upgrading middleware environments with emergency fixes, restarting application servers in sequence when you install new application versions, and any of the myriad of other tasks, such as programming security proxies with new URLs.
Often these tasks are already partially automated; most good administrators will write little scripts for this kind of work, resulting in an administrator possibly having dozens of scripts for different situations.
Operations automation comes from taking this approach to its logical conclusion and systematizing this practice. It complements the other automation approaches we've examined, but doesn't replace them, just as they cannot completely replace it in a traditional middleware environment.
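As an illustration, here is a sketch of what one of those administrator scripts looks like once systematized: a sequenced ("rolling") restart that never proceeds past an unhealthy node. The host names and the restart and health-check stubs are placeholders for your middleware's actual commands.

```python
# Operations automation sketch: restart a cluster of application
# servers one at a time. Host names, the restart command, and the
# health probe are placeholders, not real tooling.

SERVERS = ["appsrv01", "appsrv02", "appsrv03"]

def restart(server):
    print(f"restarting {server}")   # e.g. invoke stopServer/startServer

def healthy(server):
    return True                     # e.g. probe an HTTP health URL

def rolling_restart(servers, restart_fn=restart, health_fn=healthy):
    """Restart servers in sequence, halting if any node fails its check."""
    completed = []
    for server in servers:
        restart_fn(server)
        if not health_fn(server):
            raise RuntimeError(f"{server} failed its health check; halting")
        completed.append(server)
    return completed

done = rolling_restart(SERVERS)
```

Captured in one reviewed, repeatable script (or, better, a BPM-invoked automation), this replaces the dozens of slightly different ad-hoc versions that tend to accumulate across an operations team.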
PureApplication System stands out as being an example of a system that was designed from the start to automate many of these types of tasks. It includes built-in automation for middleware emergency patch installation and a host of other similar tasks. Just as we saw that the key to integrating workload automation into a larger set of IT processes was in tying patterns into larger-scale automated BPM processes, the same holds true for operations processes. Automating them through BPM flows is just as useful and important for closing the circle.
This article examined several different types of IT automation and showed that you really need all of them in order to achieve the best results in an enterprise IT environment.