Cattle not pets: Achieving lightweight integration with IBM Integration Bus
In a time when servers took weeks to provision and minutes to start, it was fashionable to boast about how long you could keep your servers running without failure. Hardware was expensive, and the more applications you could pack onto a server, the lower your running costs were. High availability (HA) was handled by using pairs of servers, and scaling was vertical by adding more cores to a machine. Each server was unique, precious, and treated, well, like a pet.
Times have changed. Hardware is virtualized. Also, with container technologies, such as Docker, you can reduce the surrounding operating system to a minimum so that you can start an isolated process in seconds at most. In this new, typically cloud-based infrastructure, scaling can be horizontal, adding and removing servers or containers at will, and paying for only what you use. With that freedom, you can now deploy thin slivers of application logic in minimalist runtimes into lightweight independent containers. Running significantly more than just a pair of containers is common and limits the effects of one container going down. By using container orchestration frameworks, such as Kubernetes, you can introduce or dispose of containers rapidly to scale workloads up and down. These containers are treated more like a herd of cattle.
The analogy of "cattle not pets" is now well established in the application space, especially in relation to the microservice application architecture. However, as explained in the "Lightweight integration: Using microservices principles in integration" blog post, it makes just as much sense to also consider this approach in the integration space.
This article focuses on the best practices for deploying IBM® Integration Bus by using a "cattle" approach in containers. But first, you need to better understand the potential challenges that deployment as a "pet" poses.
Integration pets: The traditional approach
Let's examine what pets currently look like and whether anything looks familiar from a traditional integration viewpoint. In the analogy, if you view a server (or a pair of servers that attempt to appear as a single unit) as indispensable, it is a pet. In the context of integration, this concept is similar to the centralized integration topologies that the traditional approach has used to solve enterprise application integration (EAI) and service-oriented architecture (SOA) use cases. The analogy lists additional characteristics that also map directly to centralized integration topologies as shown in Table 1.
Table 1. Characteristics of pets
| General characteristics of pets | How they are applied to a centralized or traditional integration context |
| --- | --- |
| Manually built | Integration hubs are often built only once in the initial infrastructure stage. Scripts help with consistency across environments but are mostly run manually. |
| Managed | The hub and its components are directly and individually monitored during operation, with role-based access control to allow administrative access to different groups of users. |
| Hand fed | The hub is nurtured over time, for example, by introducing new integration applications. As part of this process, new options and parameters are applied, changing the overall configuration of the hub. Gradually, the running instance becomes more bespoke with each change. |
| Server pairs | Typically, pairs of nodes provide HA. Great care is taken to keep these pairs up and running and to back up the evolving configuration. Scalability is coarse-grained and achieved by creating more pairs or adding resources so that existing pairs can support more workloads. |
While this article focuses on moving to a more cattle-like approach, the more traditional pet-like approach also has benefits that might be more challenging to achieve with cattle. For a quick comparison, see Figure 1, which shows some of the characteristics that vary between cattle and pets.
Integration scenarios vary in the characteristics that they need. With modern approaches to more lightweight runtimes and containers, you have the opportunity to stand up each integration in the way that is most suited to it. You do not need to assume that, just because a cattle approach suits many integrations, it will suit all of them. You can use both approaches and even add hybrid options as required.
Integration cattle: An alternative lightweight approach
A strong desire exists to move away from centralized deployment of integration hub or enterprise service bus (ESB) patterns where all integrations are deployed to a heavily nurtured (HA) pair of integration servers. The aim is to move toward more modern and lightweight topologies and draw on the benefits of a microservices architecture:
- Agility: Different teams can work on integrations independently without deferring to a centralized group or infrastructure that can quickly become a bottleneck. Individual integration flows can be changed, rebuilt, and deployed independently of other flows, enabling safer application of changes and maximizing speed to production.
- Scalability: Individual flows can be scaled on their own, allowing you to take advantage of efficient elastic scaling of cloud infrastructures.
- Resilience: Isolated integration flows that are deployed in separate containers cannot affect one another by stealing shared resources, such as memory, connections, or CPU.
Simplistically, as shown in Figure 2, this shift means breaking up the more centralized ESB runtime into multiple separate and highly decoupled runtimes.
However, you need to recognize that the change involves more than just breaking out the integrations into containers. A more cattle-like approach must exhibit many, if not all, of the characteristics in Table 2.
Table 2. Characteristics of cattle
| Characteristics of cattle | How they are applied to a lightweight integration context |
| --- | --- |
| Individual policy-based management | Resilience and scalability are managed at the integration level, not at the infrastructure level. |
| Elastic scalability | Integrations are scaled horizontally and allocated on demand in a cloud-like infrastructure. |
| Disposable and recreatable | Using lightweight container technology encourages changes to be made by redeploying amended images rather than by nurturing a running server. |
| Starts and stops in seconds | Integrations are run and deployed as more fine-grained entities and, therefore, take less time to start. |
| Minimal interdependencies | Unrelated integrations are not grouped together; colocation and grouping are driven by functional and operational characteristics. |
| Infrastructure as code | Resources and code are declared and deployed together. |
The "12-factor integration" blog post in the IBM Integration Developer Center explains how you can use IBM Integration Bus to implement integration requirements by following the principles of 12-factor applications. It also explains how you can achieve many of the characteristics in Table 2.
To gain the benefits of this more fine-grained approach, you must consider how best to enable integration runtimes to be more lightweight and independently packaged. Thanks to the progressive enhancements in IBM Integration Bus to remove dependencies and simplify installation, IBM Integration Bus is now well suited to deployment in a containerized manner. It supports technologies such as Docker, which has become an industry standard, and container orchestration tools, such as Kubernetes.
The question then remains: how granular should the decomposition of the integration flows be? Although you can separate each integration into a separate container, you do not need to take such a purist approach. The goal is to ensure that unrelated integrations are not housed together. That is, a middle ground with containers that group related integrations together (as shown in Figure 3) can be sufficient to gain many of the benefits that were described previously. You target the integrations that need the most independence and break them out on their own. Alternatively, keep together flows that, for example, share a common data model for cross-compatibility. Changes to one integration can result in changes to all integrations, and any change to the shared data model might require regression testing across the complete set.
However, you must also make the other changes that are listed in Table 2 for the characteristics of cattle. Just breaking up a large pet into smaller pets does not introduce any benefit. You must also ensure, for example, that you make them disposable and treat the infrastructure as code.
The contents inside a container
A Docker container is what you see at run time. Something running in a container acts as though it is running in its own operating system, but it is much more lightweight. In reality, many containers can run side by side on one operating system, each in their own namespace. Consider the smallest amount of software that you can have in a container to run an integration.
Figure 4 shows a summary of a 12-factor integration that is running in a Docker container with IBM Integration Bus. The container is based on the core Ubuntu operating system and contains the IBM Integration Bus binary files. The code realizes the integration functions.
On the surface, this container might not look different from what you might expect to see within a traditional server. However, it has one fundamental difference: the container houses few integrations, perhaps only one. In IBM Integration Bus terms, you might have only a single message flow or a suite of message flows that all work together to perform one integration. For example, they might implement a single API or handle the processing of a certain type of message. Because this integration is isolated, it has its own dedicated set of IBM Integration Bus binary files and operating system, which gives you significant flexibility. For example, you can choose to use the latest version of IBM Integration Bus just for one integration to take advantage of new features. You don't need to regression test other integrations that are running on older versions as you might have to if they all ran on the same centralized server.
The illustration in Figure 4 also shows configuration for each of the previously mentioned components. Let's look at some examples to better understand this type of configuration:
- OS level configuration: This configuration can be as simple as applying kernel settings or creating the user to later run the processes inside the container. Good practice suggests running processes as a non-privileged user if possible.
- IIB level configuration: This configuration can include many tasks. For example, it might include creating the integration node and server that hosts the integration application code, installing an IBM MQ client, or configuring any credentials that are needed for integration with the systems that the flows connect to.
- Integration level configuration: These configuration settings are applied directly to the message flow and the integration code within it. You can often change these values by overriding placeholders in the bar file with environment-specific values from a configuration file. IBM Integration Bus has a long tradition of separating configuration from code, which aligns well with the config principle from the 12-factor guidelines.
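To make the three levels more concrete, the following hypothetical Dockerfile sketches how each one might be applied at build time. All names, paths, and command options here are illustrative assumptions rather than a prescribed layout; consult the product documentation for the exact mqsi command syntax for your version.

```dockerfile
# Illustrative sketch only: image names, paths, and options are assumptions.
FROM ubuntu:16.04

# OS level configuration: create a non-privileged user to run the IIB processes.
RUN useradd --create-home iibuser
# (The IBM Integration Bus binary files are assumed to be installed
#  under /opt/ibm by earlier instructions, omitted here for brevity.)
USER iibuser

# IIB level configuration: create the integration node that hosts the code.
RUN . /opt/ibm/iib/server/bin/mqsiprofile && mqsicreatebroker MYNODE

# Integration level configuration: copy the bar file and replace its
# placeholder values with settings from a properties file.
COPY OrdersAPI.bar overrides.properties /home/iibuser/
RUN . /opt/ibm/iib/server/bin/mqsiprofile && \
    mqsiapplybaroverride -b /home/iibuser/OrdersAPI.bar \
      -p /home/iibuser/overrides.properties \
      -o /home/iibuser/OrdersAPI.configured.bar
```

Notice how each level builds on the one before it: the integration-level step at the bottom only makes sense once the node exists and the non-privileged user is in place.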
Although you can apply various settings at different levels, this configuration is necessary only for the specific integration use case that the container addresses. Keep in mind the "Each container should have only one concern" guideline, which is a key differentiator between integration pets and cattle. In contrast, integration pets are taught various tricks (integrations) over time with the accumulated interwoven and complex configuration to match. Containers that represent cattle are much lighter and more specialized.
Raising cattle: Building images
Now that you know what is inside the container, let's focus on how to build containers. Docker containers are created based on images. Images are created at build time by using the Docker build command and instructions in the form of a Dockerfile. Each instruction in the Dockerfile results in changes to a filesystem layer, and all layers together make up the image. You can then deploy and run an image as a container.
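As a minimal sketch of this layering, consider the following hypothetical Dockerfile; the archive name and paths are assumptions for illustration only:

```dockerfile
# Hypothetical example; the archive name and paths are assumptions.
# Layer: the base operating system.
FROM ubuntu:16.04
# Layer: a non-privileged user added to the filesystem.
RUN useradd --create-home iibuser
# Layer: the product archive copied into the image.
COPY iib-binaries.tar.gz /tmp/
# Layer: binaries unpacked and the archive removed. Both actions happen in
# one RUN instruction, so they produce a single layer; the deleted archive
# still occupies space in the earlier COPY layer.
RUN tar -xzf /tmp/iib-binaries.tar.gz -C /opt/ibm && rm /tmp/iib-binaries.tar.gz
```

You would then build and tag the image with a command along the lines of `docker build -t my-iib-base:1.0 .`, after which it can be run as a container.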
Tips and techniques are available on how to decide what goes into these layers. Some are described in the context of IBM Integration Bus in the "IBM Integration Bus and Docker: Tips and Tricks" blog post in the IBM Integration Developer Center. This section uses and expands on those concepts.
After you build an image, everything inside the container is ready and available when the container starts. The running container has read access to all the files that result from the image layers. It also has an extra read-write layer that is added when the container is created and therefore, is not part of the image. The article "Visualizing Docker Containers and Images" explains this scenario well.
When you map this image to the building blocks of the 12-factor integration container, you will see a few obvious candidates to include in the image. All images must begin with a base operating system layer. Ubuntu is common and is supported by IBM Integration Bus. You must also add the IBM Integration Bus binary files. You must also perform certain activities, such as creating the non-privileged user. Because these files don't change, you can easily include them in the instructions in the Dockerfile. In fact, a Dockerfile for IBM Integration Bus that includes these instructions exists on GitHub.
What about the integration code itself and the configuration? Should you include them in the image too? This question is one of the first big departures from a traditional server toward treating servers as cattle. If you include the integration code in the image, you change the meaning of deploying to an environment. Traditionally, you might deploy the integration code to an existing, heavily nurtured, already running integration server. Here, a deployment package can provide the integration code, its own instance of the IBM Integration Bus binary files, the operating system, and all the configuration that goes with it. A truly self-contained component is ready to be started, scaled, or destroyed completely in isolation from any other integrations.
Build time versus run time
How do you decide between what to include or configure at build time (in the image) versus at run time after the container is created and started? You must first consider the answers to these questions:
- Startup speed: How long can we afford to wait for a container to start?
- Image proliferation: How well equipped are we to manage a greater number of more specific images?
Consider the following example of a simple integration use case that exposes data from a database table as an HTTP REST interface. Starting from nothing, you need to perform the activities that are shown in the following figure.
Lightweight IBM Integration Bus: The IBM Integration Bus runtime is evolving to become more lightweight. Some of the activities in the diagram are likely no longer necessary in the newer version of the product.
Each activity that is started at run time rather than at build time adds to the time it takes before the integration is up and running. A fast startup time is important for the disposability aspect of 12-factor applications and is a key benefit of containers over virtual machines (VMs). Fast startup enables you to scale elastically with the workload, rebalance if infrastructure failures occur, and facilitate continuous delivery. You want to do everything you can to reduce the container startup time by building whatever you can into the image.
However, loading everything into the image is not as simple as it might appear at first. The credentials for the database access or the Transport Layer Security (TLS) certificates from the previous example can differ between the development environment and production environment. This situation can lead you to have to choose between running environment-specific configuration steps at startup with a slower uptime or creating multiple images, one for each of the different environments with a faster startup time.
Clearly, the complexity of having many images to manage has some cost. However, if startup time is critical, it might be necessary. Regardless of the option that you choose, the resulting container is still much more disposable than a traditional pet style server.
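One way to picture this trade-off is as two hypothetical image variants. Everything below (registry names, file names, the entrypoint script) is an illustrative assumption, not a prescribed pattern:

```dockerfile
# Illustrative sketch; registry and file names are assumptions.

# --- Variant A: one generic image, configured at run time ----------------
# Environment-specific values are injected when the container starts, e.g.:
#   docker run -e DB_URL=... -e DB_PASSWORD=... myregistry/orders-api:1.0
# Startup is slower because overrides are applied on start, but a single
# image serves every environment.
FROM myregistry/iib-base:1.0
COPY OrdersAPI.bar /home/iibuser/
# entrypoint.sh (not shown) applies overrides from environment variables,
# then starts the integration node.
COPY entrypoint.sh /home/iibuser/
ENTRYPOINT ["/home/iibuser/entrypoint.sh"]

# --- Variant B: one image per environment, configured at build time ------
# Built as a separate Dockerfile in each environment's registry:
#   FROM myregistry/orders-api:1.0
#   COPY prod-overrides.properties /home/iibuser/
#   (a RUN instruction applies the overrides into the bar file here)
# Startup is faster because nothing remains to configure, but you now
# build and manage an image for every environment.
```

Variant A keeps the image count low at the cost of startup time; Variant B inverts that trade-off.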
Now that you understand the basics of the filesystem layering approach of building Docker images, let's delve a little deeper into its practical implications. The previous section briefly mentioned the use of Dockerfile instructions to build images. It is technically possible to realize the installation and configuration of the components from Figure 4 in a single Dockerfile. The file might start with a line that references the base operating system (FROM ubuntu:14.04) followed by a long list of extra instructions. You basically have a single Docker image that implements the integration functions as shown in Figure 6.
However, a much more realistic approach might spread the instructions across multiple images that reference each other, as illustrated by Figure 7.
This approach can lead to more images overall, but some of them are reusable. For example, you can use "intapp base" as the basis for all applications with similar prerequisites. When it comes to deciding how to break down the full list of instructions and spread them across multiple images, you can apply the following guidelines:
- Use the images to separate concerns. Keep the product installation in a separate image, and keep the configuration separate from the product installation. This approach has the following benefits:
- It makes the images more reusable.
- It helps with debugging if a problem occurs during the build.
- Expose versioning information of the included images and components for later inspection in the running container. You can use LABEL instructions in the Dockerfile or other means, such as creating environment variables or a version.txt file.
- Include tests (which can be executed programmatically) with each image. If the build instructions (Dockerfile) change, you can use the tests to verify that the resulting image is still performing as intended. Specifying tests for images is akin to providing unit tests for code. They are the foundation for realizing such benefits as accelerated patch deployments.
- Automate the build of images. Each image change means that you must also rebuild any derived images. Therefore, a change further up the hierarchy of images can trigger a substantial number of builds. Combined with programmatic tests (see guideline 3), automation helps to avoid inconsistencies and manual errors.
- Consider the characteristics of the file system layers especially regarding caching and size on disk. For practical tips and tricks to create Docker containers for IBM Integration Bus, see "IBM Integration Bus and Docker: Tips and Tricks" in the IBM Integration Developer Center blog.
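Applying the first two guidelines, a chain of images might reference each other as sketched below. All image names, tags, and labels are illustrative assumptions; each image would live in its own directory with its own Dockerfile:

```dockerfile
# Image 1: the product installation only.
FROM ubuntu:16.04
LABEL product="IBM Integration Bus" product.version="10.0.0.8"
# ...instructions that install the IIB binary files go here...

# Image 2: "intapp base" -- shared prerequisites for similar applications,
# built in its own Dockerfile FROM the product image:
#   FROM myregistry/iib:10.0.0.8
#   LABEL intapp.base.version="1.2"
#   ...common configuration, MQ client, shared error-handling resources...

# Image 3: one specific integration application, built FROM intapp base:
#   FROM myregistry/intapp-base:1.2
#   LABEL app="orders-api" app.version="3.1"
#   COPY OrdersAPI.bar /home/iibuser/
```

Because each image records its version in a LABEL, a running container can later be inspected (for example, with `docker inspect`) to confirm exactly which product and application versions it contains.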
These guidelines help get you closer to building Docker images with IBM Integration Bus in a predictable and reliable process. While the process is always the same, several important characteristics shape the content of each image. Requirements such as whether you need a local MQ queue manager, shared caching, logging, or an error-handling framework can also impact the image. Addressing all of those requirements in any significant depth is too much to cover in this article. Some topics, such as incorporating IBM MQ, might result in multiple images in different branches of a hierarchy, such as the one shown in Figure 8. You can read more about building such an image in the "IBM Integration Bus and Docker: Tips and Tricks" blog post. Watch for future complementary posts that address some of the others.
Now that you have established a process for building cattle-like images, let's look at the bigger picture. Figure 8 shows an example of a hierarchy of images for five integration applications. You can see that the number of nodes increases, particularly toward the last two levels on the right.
The figure shows two important inflection points that drive the overall number of images and therefore, the effort that is required to manage or maintain them:
- Whether you decide to build the application code and configuration into the image. This point multiplies the number of images that are created by the number of integration applications that you plan to deploy.
- Whether any environment-specific configuration changes are also built into images. If you choose environment-specific images, they are only built and exist in their respective environment. Therefore, the hierarchy that is shown in Figure 8 does not have images for different environments in the same Docker registry. They are spread across several Docker registries, one for each environment.
If you choose to include the integration applications in the image and have separate images for each environment, the total number of images jumps substantially. Imagine that you have 100 integrations to deploy to five environments. You now have 500 images rather than one image to which you deploy different integrations at run time. However, those images can be stopped, started, or scaled almost instantaneously and require no further nurturing at run time. More specific images move you closer to the cattle direction. Fewer, more generic images move you back toward the pet direction. Yet, all options are valid. It all depends on your requirements around resiliency and the ability to scale quickly, balanced with the management implications of many images. Also, some implications might exist on storage requirements for many images. However, because of the layering system that is employed by Docker, the implications might not be as large as you might think.
The example of certificates and credentials in this article also raises the question: how do I get potentially sensitive information, such as passwords, into the container? Orchestration tools, such as Kubernetes or Docker Swarm, provide functions to make this process easier. Alternatively, you can use specialized tools, such as HashiCorp Vault, to manage secrets and make them available to containers in a secure and controlled manner. This topic is beyond the scope of this article, but is worth exploring in more detail in a separate post.
As is often the case, taking a radically new approach to a problem can lead to more questions than answers. Although breaking down a large monolithic integration topology can result in significant benefits in terms of agility, scalability, and resilience, it also raises inevitable questions that are familiar from the world of microservices. Let's visit each of these concerns briefly.
How will I manage the much larger number of running servers and their associated containers?
Container orchestration refers to managing the number of containers to deploy, when to deploy more containers based on workload, and enabling auto-recovery. The current favorite for this capability is Kubernetes, and an introduction to its use with IBM Integration Bus is in this post. The IBM Container Service was recently enhanced to provide management based on Kubernetes as described in this post.
IBM Integration Bus on Cloud uses Docker in the background. IBM Integration Bus on Cloud is an excellent option if you prefer to let IBM manage how to stand up containers and orchestrate them. It abstracts you from day-to-day infrastructure concerns so that you can focus on your integration design and implementation.
How can I securely pass confidential information, such as user credentials, to servers that are transitory and cloud based?
Again, Kubernetes provides mechanisms, known as secrets, to enable safe storage and retrieval of this sensitive data. For more information, see also the "Zero to Kubernetes on the IBM Bluemix Container Service" blog post.
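As a sketch of how this might look, the following hypothetical Kubernetes manifests define a secret and surface it to a container as environment variables instead of baking credentials into the image. All names and values here are illustrative assumptions:

```yaml
# Illustrative example; all names and values are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: orders-db-credentials
type: Opaque
stringData:
  username: dbuser
  password: not-a-real-password
---
# The pod references the secret, so the image itself stays environment-neutral.
apiVersion: v1
kind: Pod
metadata:
  name: orders-api
spec:
  containers:
  - name: iib
    image: myregistry/orders-api:1.0
    env:
    - name: DB_USERNAME
      valueFrom:
        secretKeyRef:
          name: orders-db-credentials
          key: username
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: orders-db-credentials
          key: password
```

Because the credentials arrive at run time, the same image can be promoted unchanged from one environment to the next.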
How do I monitor and diagnose problems in a distributed set of components without interrogating each one individually?
Indeed, how do you even find out which components a request passed through? A common solution is centralized logging, where all containers use a standardized mechanism to push logs to an event stream that represents the audit trail of the activity from all containers. Common tooling can then be used to search and interpret the consolidated logs. As an example, IBM Integration Bus recently introduced the capability to push logs to a consolidating service in Bluemix.
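For example, Docker's logging drivers can route a container's stdout and stderr to a central collector rather than leaving logs inside each container. The following docker-compose fragment is a hypothetical sketch; the service name and endpoint address are assumptions:

```yaml
# Illustrative docker-compose fragment; names and addresses are assumptions.
services:
  orders-api:
    image: myregistry/orders-api:1.0
    logging:
      driver: gelf          # push logs in GELF format to a central collector
      options:
        gelf-address: "udp://logs.example.com:12201"
```

With every container pushing to the same endpoint, a request can be traced across components by searching the consolidated stream instead of interrogating containers one by one.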
Moving to a more lightweight, cattle-style integration clearly offers tempting benefits in terms of agility, elastic scalability, and more individual resilience models. This article explored key concepts and provided recommendations around the construction of images. Containerized integration marks a significant turning point in integration topologies, and there's plenty more to say. Keep an eye out for more posts on this topic by following the IBM Integration Developer Center blog.
Thank you to the following people for their input and review of the material in this article: David Arnold, Peter Broadhurst, Rob Convery, David Currie, Geza Geleji, David Hardcastle, Peter Jessup, Rob Nicholson, Ben Thompson, and Jörg Wende.
- Lightweight integration with IBM Integration Bus
- 12-factor integration
- Overview of IBM Integration Bus and Kubernetes
- IBM Integration Bus, Kubernetes and the Bluemix Container Service
- IBM Integration Bus and Docker: Tips and Tricks
- IBM Integration Developer Center
- Container Orchestration Code Journeys