Mission:Messaging: Of mice and elephants

Multiple message types sharing resources is like all zoo animals in the same cage

If the messages in your system are wreaking havoc on your queues, clusters, and servers, then a trip to the zoo might help you understand why all your messages aren't getting along. Like different animals, different types of messages require unique conditions that enable them to live in harmony in your environment. This article will help you identify and accommodate the animals living in your messaging kingdom. This content is part of the IBM WebSphere Developer Technical Journal.

T.Rob Wyatt, Senior Managing Consultant, EMC

T.Rob Wyatt is a Senior Managing Consultant with IBM Software Services for WebSphere who assists customers with the administration, architecture, and security of WebSphere MQ. Recently he has focused on WebSphere MQ security, publishing in the IBM WebSphere Developer Technical Journal and presenting at the IMPACT and European Transaction and Messaging conferences. T.Rob also hosts The Deep Queue, a monthly WebSphere MQ security podcast.



02 April 2008


In each column, Mission: Messaging discusses topics designed to encourage you to re-examine your thinking about IBM® WebSphere® MQ, its role in your environment, and why you should pay attention to it on a regular basis.

The messaging animal kingdom

There comes a time in every project when, in hindsight, certain decisions seem painfully obvious. For many an SOA project, this is the sudden realization that not all messages are created equal. To illustrate, imagine a Web application in which a high volume of lightweight user inquiry interactions are converted to WebSphere MQ messages. These small and non-persistent messages are then sent to a backend system which generates replies that are sent back to the user, typically within a second or two. Think of these messages as the mice. Now, imagine a second application where users upload large, multi-megabyte files composed of many update transactions. These are converted to very large, persistent WebSphere MQ messages which are transmitted to the backend system for processing. These messages are the elephants.

Adding elephants to a network where mice are already running free predictably results in lots of trampled mice. What often catches people by surprise, though, is that the opposite is also true: adding the mice to a system tuned for elephants results in lots of frightened elephants. This can seem counter-intuitive when the problem is perceived as one of simple linear scalability. Just add CPU, disk, and memory until the system can handle the elephants and all the problems go away, right? After all, the reasoning goes, if we have the horsepower to move elephants around, mere mice should pose no challenge. The reality of course is that tuning the network to simultaneously handle these drastically different classes of data is extremely difficult, if not impossible.

I like to describe this as the "mice and elephants problem" because the issues are instantly recognizable in these terms. But there are many other species of message out there, too:

  • Depending on the business requirements, messages may or may not require encryption (porcupines and opossums).
  • The business value of a message might be long-lived or it may vanish in seconds (tortoises and hares).
  • Some messages have value only in the context of a coordinated transaction while others stand alone (dolphins and sharks).

Nobody expects to go to the zoo and find all of the animals housed in the same enclosure, or even the same type of enclosure. The notion that mice and elephants and dolphins and sharks should be housed separately is intuitive. But when building a messaging network, there is a tendency to reduce the requirements down to some volume of messages where “message” is an abstract atomic chunk of data and “volume” is simply an integer quantity of messages during some time period. While this is certainly useful information, it does not come close to describing all of the actual business requirements that a messaging network needs to support.


Designing habitats for your messages

Several of my clients have run into trouble because they designed their messaging network to scale up using identical components. The only answer they considered for a performance issue was to add more horsepower or more nodes and distribute traffic equally across them. As new applications were implemented, each with unique and often conflicting requirements, it became increasingly difficult to tune the network to handle all the different types of traffic, and this eventually led to outages. At some point, I was brought in to assist with the system tuning. It is always a big red flag for me when I show up on site and find mice, elephants, opossums, porcupines, dolphins, and sharks all running around in the same space. Tuning is not the answer to this problem.

The zoo provides classes of service by tailoring habitats to the species of animal they are to contain. Often, different animals with similar needs can share an area. The gazelles and wildebeests get along quite nicely. Other animals are fundamentally incompatible and do not share any resources. Aquatic animals get tanks, land animals get enclosures of some kind. Providing classes of service on the messaging network is not much different. You cannot build a monolithic messaging network and expect it to handle every type of message you throw at it, but you can build dedicated routes or nodes within the network that are tuned to accommodate a variety of differing needs.

The key then is to gain an understanding of which resources are shared between message flows, where the conflicts arise, and then segregate the traffic at the appropriate level. There are many different criteria with which you might distinguish classes of service on a messaging network. Some of the more common ones include:

  • Business value time to live: Some messages lose their business value if they are not processed within a short period of time. Others may be delayed hours or days with no loss of value. In general, messages with a short-lived business value are segregated to higher throughput hardware, while messages with a long-lived business value can be routed to slower, and probably cheaper, hardware.
  • Volume: Some interfaces pass a very high volume of messages per second, per hour, per day, and so on. Beyond a certain threshold, the sheer volume of messages could push an interface into a separate class of service.
  • Size: The key here is not the absolute size but rather the difference in relative sizes between message types. Tuning for consistently large or consistently small messages is not difficult, but tuning for extreme variability is.
  • Privacy and integrity: When the messaging network or volume of message data is large, encrypting all traffic can be prohibitively expensive. If the portion of traffic that requires encryption is relatively small, carving out a separate class of service for it might be appropriate.
  • Persistence: Throughput differs dramatically in both queues and channels depending on whether messages are persistent.
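
As a rough illustration of how these criteria might be applied, the following Python sketch compares two message flows and lists the criteria on which they differ enough to suggest separate classes of service. The FlowProfile record, the ratio thresholds, and the sample numbers are all assumptions made for this example; they are not WebSphere MQ settings and should be replaced with values that reflect your own traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowProfile:
    """Hypothetical description of one message flow."""
    name: str
    value_ttl_seconds: float   # how long a message keeps its business value
    msgs_per_second: float     # typical volume
    typical_size_bytes: int    # typical message size
    needs_encryption: bool
    persistent: bool


# Assumed thresholds, chosen only for illustration.
TTL_RATIO_LIMIT = 1000
VOLUME_RATIO_LIMIT = 100
SIZE_RATIO_LIMIT = 100


def _ratio(a, b):
    """Return larger/smaller, so the order of the two flows does not matter."""
    low, high = sorted((a, b))
    if high == 0:
        return 1.0
    return float("inf") if low == 0 else high / low


def conflicts(a, b):
    """List the criteria on which two flows differ enough to conflict."""
    found = []
    if _ratio(a.value_ttl_seconds, b.value_ttl_seconds) > TTL_RATIO_LIMIT:
        found.append("business value time to live")
    if _ratio(a.msgs_per_second, b.msgs_per_second) > VOLUME_RATIO_LIMIT:
        found.append("volume")
    # Relative size matters here, not absolute size.
    if _ratio(a.typical_size_bytes, b.typical_size_bytes) > SIZE_RATIO_LIMIT:
        found.append("size")
    if a.needs_encryption != b.needs_encryption:
        found.append("privacy and integrity")
    if a.persistent != b.persistent:
        found.append("persistence")
    return found


# Sample flows with made-up numbers: small inquiries versus bulk uploads.
mice = FlowProfile("inquiry", 2, 500, 2_000, False, False)
elephants = FlowProfile("bulk-upload", 86_400, 0.01, 5_000_000, False, True)

print(conflicts(mice, elephants))
```

A non-empty result does not by itself dictate a topology. It only flags pairs of flows whose shared resources deserve a closer look.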

Let’s look at a specific example.

Solving the mice and elephants problem requires, at a minimum, that the different types of messages pass through different channels and transmit queues. Depending on the volume of messages and the capacity of the server, it might be necessary to take additional steps and either create a separate queue manager or separate the message streams onto different servers. The optimum solution will be different in each shop. The trade-off here is that as you reduce the number of shared resources, the complexity, license cost, and administrative overhead can increase. The business case is that these costs are usually much less than the cost of throwing money at the problem until you have enough horsepower to meet the service level requirements.

Table 1. Shared resources by topology

Queueing topology                      Channel  XMitQ  Disk  CPU/Memory  NIC
Clustered queue manager                X        X      X     X           X
Overlapping clusters                   -        X      X     X           X
Point-to-point channels                -        -      X     X           X
Multiple queue managers on one node    -        -      -     X           X
Queue managers on multiple nodes       -        -      -     -           -

In Table 1, the rows are different topologies for WebSphere MQ and the columns are different resources that WebSphere MQ consumes. An X marks a resource that applications share in that topology. Let’s consider each topology:

  • Clustered queue manager: Applications on a clustered queue manager share many resources. This is the classic example of mixing all types of animals into a single enclosure.
  • Overlapping clusters: You can split up traffic onto different channels by adding an overlapping cluster. This provides the opportunity to differentiate based on channel tuning. For example, you can send string data in one channel with conversion enabled and send binary data down another. Perhaps one path uses encryption and another uses plain text. When the multiple paths are cluster channels, you can tune them differently, but they all still share the same transmit queue. If this is a problem, as it is with mice and elephants, you need to provide more isolation.
  • Point-to-point channels: Creating dedicated point-to-point channels is another way to establish a separate route through the network, and one which provides dedicated transmission queues. Now you have the ability to tune sets of channels and associated transmission queues, tailoring each to the types of message you expect. Creating an exclusively point-to-point network loses the benefits of clustering, so the usual implementation is a hybrid where the cluster handles most traffic and a limited number of point-to-point channels are created to provide a differentiated class of service.
  • Multiple queue managers on one node: In some cases, completely separate queue managers are required. This gives you the ability to tune almost everything at the queue manager level to a specific type of traffic, which can be very helpful when, for example, there are specific requirements for maximum handles, maximum outstanding units of work, log file sizes, broker parameters, or any other queue manager global settings. If your mice and elephants both employ persistent messages, they will have conflicting requirements as to log file tuning, which might lead you to a solution where separate queue managers are required. I usually advise not putting more than two or three queue managers on a server, though, because after that the operating system tunables start to be impacted.
  • Queue managers on multiple nodes: The ultimate option is to split the traffic between queue managers on different physical servers, where each is tuned for a different type of message. It is common to use this pattern to segregate bulk, long-lived traffic from high-throughput traffic with short lifetimes. In this case, more expensive high capacity hardware is used for the fast traffic while the bulk traffic is handled by commodity hardware with large disk partitions.
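
The reasoning behind Table 1 can be sketched in a few lines of Python: encode each row as the set of resources that topology shares, then walk the rows in order and stop at the first one that no longer shares any resource on which two message streams conflict. The resource keys and the function name are invented for this example, and the sketch assumes that the rows are ordered from least to most complexity and cost, which is one reading of the trade-off described above rather than a rule.

```python
RESOURCES = ("channel", "xmitq", "disk", "cpu_memory", "nic")

# Rows of Table 1: the resources that applications share in each topology,
# listed in the same order as the table.
SHARED_BY_TOPOLOGY = [
    ("Clustered queue manager",
     {"channel", "xmitq", "disk", "cpu_memory", "nic"}),
    ("Overlapping clusters", {"xmitq", "disk", "cpu_memory", "nic"}),
    ("Point-to-point channels", {"disk", "cpu_memory", "nic"}),
    ("Multiple queue managers on one node", {"cpu_memory", "nic"}),
    ("Queue managers on multiple nodes", set()),
]


def least_isolating_topology(contended):
    """Return the first topology that shares none of the contended resources."""
    contended = set(contended)
    unknown = contended - set(RESOURCES)
    if unknown:
        raise ValueError(f"unknown resources: {sorted(unknown)}")
    for name, shared in SHARED_BY_TOPOLOGY:
        if not (shared & contended):
            return name
    # Not reachable: the last row shares nothing, so it always matches.
    raise RuntimeError("no topology found")


# Mice and elephants conflict on channels and transmit queues.
print(least_isolating_topology({"channel", "xmitq"}))
```

For the mice and elephants case this selects point-to-point channels, which matches the minimum described earlier: separate channels and separate transmit queues.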

It is worth mentioning that classes of service add a third dimension to what was a two-dimensional network. This adds complexity, which means it also adds cost. If the solution requires additional queue managers, it might result in additional instances of other software components in the stack up to and including additional servers. All of these new components should be considered in the contingency plan as well. The business case here is to build out classes of service only when doing so is cheaper than simply throwing horsepower at the problem.


Living with your implementation

If your messaging network is small, you might be wondering how to anticipate which classes of service you will need a year from now. Or, if you have an established network, you might already be struggling with these issues and wondering how to retrofit your network to accommodate classes of service. In either case, the key is to design in the flexibility to easily implement classes of service. It might seem counter-intuitive, but adherence to strict standards improves flexibility by improving interoperability and reducing dependencies in the network.

Here are some common best practices that apply to this discussion and are worth repeating as recommendations:

  • Keep it simple. When there are two configurations that provide the same functionality, the simpler one is preferred. Add complexity only when it addresses some business requirement that is not solved in a simpler configuration. For example, overlapping clusters to solve namespace issues is almost never a good idea, but doing so to provide an encrypted network path may be the right solution.
  • Standardize on nodes and node separators in queue names. Since the dot character is significant to the Object Authority Manager, it is usually the preferred separator. Nodes in object names are generally constructed left-to-right and generic-to-specific.
  • Name queues for the services they represent. In SOA terms, this is an "intention revealing" name. In accordance with SOA principles, the name of the object reveals what it does, not how the function is implemented. Do not include physical attributes, such as the queue manager name, the sending application name (this is an old point-to-point practice), or the type of queue. Do include the version of the service in the queue name so that new service versions can be deployed independent of service consumers.
  • Reserve space in object names for a class-of-service qualifier. You do not necessarily need to use it right now, but as long as you set aside a couple of characters for it you can add it later.
  • Include the cluster name in the CLUSRCVR channel name. The well-known example is <cluster>.<qmgr>, which results in a dedicated CLUSRCVR for each cluster. This makes administration of the clusters much easier and more reliable when clusters overlap, but it does restrict cluster and queue manager name lengths.
  • Know your data. Have instrumentation on the messaging network to capture run time traffic statistics and reconcile the actual numbers back to the estimates the development team provided prior to implementation. You cannot adequately manage the network based only on message counts. You need to know sizes, persistence, priority, and frequency distribution, plus have the ability to correlate message affiliations within the data stream, such as matching requests to replies. Unless the actual traffic is reconciled back to the estimates, the quality of the estimates never improves. Worse, if you do not know message latencies, then you have no warning when you are approaching a threshold and are about to exceed your SLA.
  • Continuing education is important! It would be impossible to condense a description of all the interactions and settings in the queues and queue manager down to an article or two. There is no shortcut to knowing the messaging products at a deep level and applying that skill to your specific environment. As much as I’d like to sell you services, your enterprise will benefit by growing the expertise in-house.
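
To make the naming recommendations concrete, here is a minimal Python sketch of helpers that build names and reject ones that break WebSphere MQ naming rules. The length limits (48 characters for queue names, 20 for channel names) and the permitted character set are product rules. The queue name layout of <domain>.<service>.<version>.<class of service>, the default qualifier "00", and the function names are assumptions invented for this example; your own standard may order or size the nodes differently.

```python
import re

MAX_QUEUE_NAME = 48    # WebSphere MQ limit for queue names
MAX_CHANNEL_NAME = 20  # WebSphere MQ limit for channel names

# Characters permitted in WebSphere MQ object names.
_VALID_NAME = re.compile(r"^[A-Za-z0-9._/%]+$")


def _check(name, limit):
    """Raise ValueError if the name is too long or uses invalid characters."""
    if len(name) > limit:
        raise ValueError(f"{name!r} is longer than {limit} characters")
    if not _VALID_NAME.match(name):
        raise ValueError(f"{name!r} contains characters MQ does not allow")
    return name


def queue_name(domain, service, version, class_of_service="00"):
    """Build a dot-separated queue name, generic to specific.

    The layout is hypothetical. The class-of-service qualifier is always
    present, so the space for it is reserved before it is needed.
    """
    return _check(".".join([domain, service, version, class_of_service]),
                  MAX_QUEUE_NAME)


def clusrcvr_name(cluster, qmgr):
    """Build a CLUSRCVR channel name in the <cluster>.<qmgr> form."""
    return _check(f"{cluster}.{qmgr}", MAX_CHANNEL_NAME)


print(queue_name("PAYMENTS", "INQUIRY", "V1"))
print(clusrcvr_name("CLUSA", "QM01"))
```

The 20-character channel limit is why the <cluster>.<qmgr> convention restricts cluster and queue manager name lengths: the two names plus the separator must fit within it.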

You will be able to recognize this problem when you see that two different message streams impose conflicting requirements on some system tuning value. For example, the occasional elephant interrupts your flow of mice for several seconds and you cannot tune channel batch sizes to accommodate both message types. At that point, the naming and other standards mentioned above will position you to easily add MQ channels, queues, or clusters into your existing network. The instrumentation mentioned will help you decide when and where to move message streams onto dedicated resources. And finally, education will enable you to perform the necessary analysis and design, and ultimately achieve a successful implementation.
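
The reconciliation of actual traffic against estimates can be sketched as follows. This Python example assumes that your instrumentation already yields one record per message with its size, persistence, and a measured latency; the Observation record, the dictionary keys, and the 80 percent warning level are all hypothetical choices for illustration, and how the data is captured is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One message as seen by the instrumentation (hypothetical layout)."""
    size_bytes: int
    persistent: bool
    latency_seconds: float  # for example, time from request to matching reply


def p95(values):
    """95th percentile by the nearest-rank method, using integer arithmetic."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def summarize(observations):
    """Reduce raw observations to the figures used for reconciliation."""
    n = len(observations)
    if n == 0:
        raise ValueError("no observations to summarize")
    return {
        "count": n,
        "mean_size_bytes": sum(o.size_bytes for o in observations) / n,
        "max_size_bytes": max(o.size_bytes for o in observations),
        "persistent_fraction": sum(o.persistent for o in observations) / n,
        "p95_latency_seconds": p95([o.latency_seconds for o in observations]),
    }


def reconcile(actual, estimate, sla_latency_seconds, warn_fraction=0.8):
    """Compare actual figures with estimates and warn before the SLA is hit."""
    findings = []
    for key in ("mean_size_bytes", "persistent_fraction"):
        if actual[key] > estimate[key]:
            findings.append(f"{key} exceeds estimate")
    if actual["p95_latency_seconds"] >= warn_fraction * sla_latency_seconds:
        findings.append("latency approaching SLA")
    return findings


# Made-up sample data.
observed = [
    Observation(1000, False, 0.2),
    Observation(2000, False, 0.4),
    Observation(3000, True, 0.9),
    Observation(2000, False, 0.3),
]
estimate = {"mean_size_bytes": 1500, "persistent_fraction": 0.0}

print(reconcile(summarize(observed), estimate, sla_latency_seconds=1.0))
```

Feeding the findings back to the team that produced the estimates is what lets the estimates improve over time, and the latency check supplies the early warning that message counts alone cannot.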
