Mission:Messaging: If your queue manager could talk, would you hear it?

Experience working on client engagements for IBM® WebSphere® MQ troubleshooting has revealed that many problems were not only preventable, but -- more importantly -- they would have been predictable too, if the early warning signs had been noticed and heeded. A minimal up-front investment in WebSphere MQ training and tools can prevent costly outages down the road. This content is part of the IBM WebSphere Developer Technical Journal.

T.Rob Wyatt, Senior Managing Consultant, EMC

T.Rob Wyatt is a Senior Managing Consultant with IBM Software Services for WebSphere who assists customers with the administration, architecture, and security of WebSphere MQ. Recently he has focused on WebSphere MQ security, publishing in the IBM WebSphere Developer Technical Journal and presenting at the IMPACT and European Transaction and Messaging conferences. T.Rob also hosts The Deep Queue, a monthly WebSphere MQ security podcast.



23 January 2008

In each column, Mission:Messaging discusses topics designed to encourage you to re-examine your thinking about IBM® WebSphere® MQ, its role in your environment, and why you should pay attention to it on a regular basis.

"Hello, it's your middleware calling... anybody out there?"

If your queue manager could talk, what would it say? Would you recognize it if you heard it? In too many cases, there are queue managers out there calling for help but getting no answer.

I first joined IBM as a WebSphere MQ specialist. With plenty of MQ experience, I expected to be a natural fit for a consulting team that specializes in WebSphere-branded products. I soon discovered, however, that the folks in IBM Software Services for WebSphere, the group I joined, are expected to work on the bleeding edge with the newest WebSphere products, and that my deep specialist skills were sometimes viewed as more of a constraint than an advantage.

WebSphere MQ is considered a mature technology and, in my manager's words, I was a "one-trick pony," so I embarked on a plan to take as many WebSphere MQ assignments as I could find while training up on other WebSphere products. I remember wondering whether there would be enough MQ work to keep me employed until my training was done.

Now, eighteen months later, I am still a WebSphere MQ specialist. Oh sure, I've learned about some of the other WebSphere products, but I'm not getting assignments for any of those. So, if I am still primarily focused on WebSphere MQ, the question then becomes: How is it that I am still employed?

Despite WebSphere MQ being a mature product -- or perhaps because of that -- it turns out there was (and is) no shortage of work. Some of my assignments from last year included:

  • A messaging network of more than 50 overlapping clusters that was taking daily outages.
  • A chronic QUEUE_FULL condition that had been "cured" by intentionally deleting the underlying file out from under the queue.
  • A case where the event queues filled within minutes of enabling events.
  • Several organizations where the only backup of the queue managers was file-based -- with the backups taken while the queue managers were running.
  • MQ administration teams that had been trained in the '90s and had not been to a class or conference since.
  • Several messaging networks that had grown extensively, but had flat or reduced support teams.
  • Environments with hundreds of queue managers and no tooling whatsoever.

The common theme for these situations can be summed up in a single word: neglect. Because WebSphere MQ is so familiar now -- and essentially invisible -- to managers, administrators, developers, operators, and to many other disciplines in the enterprise, there is a tendency to steer resources to the care and feeding of newer and less familiar systems.

In the examples above, the company with all the overlapping clusters had grown their messaging network organically over the years. When the network was small the architecture worked well, but they never stopped to plan how it would scale, or try to redesign it when problems surfaced.

This same issue often applies to administrative tooling and the support staffing model. When the messaging network is small, you can get away with a team of three people (the minimum for 24x7 support) or with using the free desktop-based tools. But these approaches don't scale well to hundreds of queue managers. At that point, you want to staff the team up a little, install enterprise-class monitoring tools, and probably switch to a central administration tool.

The other common aspect of all these assignments was that the failures were not sudden catastrophes; the problems had developed gradually over the years. The outages were neither unpredictable nor without warning. They were simply cases where the investment in infrastructure and support fell short of the need, and the accumulated shortfall eventually reached critical mass.

Unfortunately, the outage in every case was far more costly than the investment that could have prevented it. Worse, the signs of impending trouble were there, but few on the ground recognized them, and those who did lacked the resources to respond adequately.


"Greetings, middleware. This is your administrator. How can I help you today?"

On one hand, this is all good news, because it means job security for yours truly. Over the last year I completely forgot about trying to stay employed and getting trained on another product. In fact, I have the opposite problem: I'd really like to train on some of the other WebSphere products, but there's so much MQ work I can't find the time.

On the other hand, the big picture reality is that MQ shops are taking outages every day for things that are detectable and preventable. Therefore, I've made it my mission to address both hands here.

This article kicks off a recurring column that encourages you to re-examine -- and re-ignite -- your thinking about IBM WebSphere MQ and maybe, just maybe, get you back on speaking terms with your queue manager. Besides, if MQ administrators and managers out there can better recognize signs of trouble and successfully intervene in time, then I'll be able to free up some time for training on something new -- or at least go back to worrying about staying employed.


The right tools: Desktop vs. enterprise

One of the most common problems that I see in troubled situations is that the administrators lack the tools to do their job. In order to be an "MQ Whisperer," an administrator needs visibility to the entire network and detailed information from each queue manager and its host. Enterprise-class tools provide these things, as well as a way to filter out the interesting information from the background noise.

When the messaging network is small, the free desktop-based user tools are usually sufficient for this purpose. New functionality, such as the WebSphere MQ Health Check Plug-In (SupportPac MH01), is extending WebSphere MQ Explorer with real monitoring capabilities. Desktop tools like WebSphere MQ Explorer and SupportPac MO71 are almost invariably the first MQ administration and monitoring tools used in the enterprise.

Unfortunately, these are often also the last MQ administration and monitoring tools used in the enterprise because, as the network grows beyond the capability of the desktop tools, the idea of switching to an enterprise-class tool frequently manages to get lost in the shuffle. The result is that some shops never get the tools they need, despite drowning in work and taking frequent production outages.

What do I mean by "enterprise-class" tools? Specific functionality I am looking for includes a real-time network-wide view, security, accountability, configuration management, and scalability. Although the current crop of enterprise tools uses the central hub-and-spoke model, this is not a strict requirement. It would be possible to build these functions into desktop tools. Even so, I'm going to argue that the central hub-and-spoke model provides enough advantages to be considered a requirement for an enterprise tool.

I will also lump administration and monitoring tools together in this discussion because, at least for the administrators, the two functions have a lot of overlap.

What's wrong with desktop tools?

Good question, I'm glad you asked. Nothing is wrong with desktop tools -- as long as you use a limited number of queue managers and are mainly concerned with queues and messages. But if you work in a large environment with desktop tools, you may have run into some of these issues:

  • Scope: Desktop tools get bogged down when the number of queue managers gets beyond a few dozen. To provide a dashboard view of the whole network, the tools need to maintain many open connections and poll the queue managers constantly. They are simply not designed to scale up to the size of network that is increasingly common today. Once you get to the point where clicking inside your desktop tool locks the workstation for 10-20 seconds, you are wasting money through lost productivity.

  • Point-of-view: Desktop tools require a user session to run and, by implication, a user to keep an eye on them. By contrast, centralized tools are designed to run independently of any user session so they know the entire network and are always on.

  • Scalability: Assuming that your desktop tool had the ability to scale up to hundreds of queue managers without degrading performance of the workstation, is it a good model? The queue managers would each require an open connection from each administrator, and each connection would be constantly polling the queue manager state. Scaling this model up to a large administration team, lots of queue managers, or large numbers of queues and channels, exacts an increasingly large toll on each queue manager. Extrapolate this far enough and eventually there are no cycles left to do business transactions.

  • Accountability: I've been on a number of engagements where the queue manager configurations we built one day were changed the next and nobody knew who did it or why. For any company where WebSphere MQ is used for mission critical applications, and especially in industries with regulatory compliance requirements, this is probably not acceptable. With a large team and lots of queue managers, tracking down a specific adverse event is a practical impossibility when there are no logs, or when the logs are distributed across many workstations. With a central tool and consolidated access log, you can have meaningful accountability.

  • Security: Administrators require full access for their tools. Central tools terminate that administrative access path onto specific hosts in a secured data center, which then mediate access through a lightweight client, such as a browser. Desktop tools require administrative access to terminate on dynamically allocated workstations in unsecured environments.

  • Operational events: Event messages consumed by one desktop client are not visible to other desktop clients. Several approaches have been tried, the best of which I've seen has been to distribute events as pub/sub messages. While this has its advantages, the events still need to be parsed, and there is no provision to recover events generated while the user is offline. Central tools consume the event message and a presentation layer displays it to as many users as needed.

  • Configuration management: Desktop tools provide some configuration management but it is not tightly integrated into WebSphere MQ. On the other hand, the current crop of enterprise tools can schedule automated changes, move configurations from one queue manager to another, include template functionality and roll back to prior points in time.

  • Administration: Securing interactive access to WebSphere MQ with desktop tools requires management of a public key infrastructure at the desktop and maintenance of access control lists (setmqaut commands) on all the queue managers. Central tools store user identity and access control information for the entire network in a single location and provide screens to manage it all. This results in fewer errors and the reduction in administrative overhead offsets a large chunk of the cost of the central tool.

These differences in the functionality and architecture of the tools accumulate as the network and staff scale up. Beyond a certain point, the limitations of the desktop tools begin to contribute directly to outages and, ultimately, to the financial bottom line.
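To put rough numbers on the scalability point above, here is a back-of-the-envelope sketch in Python. It is only an illustration: the team size, queue manager count, and polling interval are made-up figures, and the function names are hypothetical rather than part of any WebSphere MQ tool. It assumes that every desktop tool holds one connection to every queue manager, while a central tool holds one connection per queue manager and serves its users from the hub.

```python
def desktop_connections(admins: int, queue_managers: int) -> int:
    """Desktop model: each administrator's tool connects to every queue manager."""
    return admins * queue_managers


def central_connections(queue_managers: int, hubs: int = 1) -> int:
    """Central model: each hub connects once to every queue manager;
    administrators attach to the hub, not to the queue managers."""
    return hubs * queue_managers


def polls_per_minute_per_qmgr(pollers: int, poll_interval_seconds: float) -> float:
    """Status polls that a single queue manager must answer each minute."""
    return pollers * 60.0 / poll_interval_seconds


if __name__ == "__main__":
    # Illustrative figures only (assumptions, not measurements).
    admins, qmgrs, interval = 10, 300, 30

    print("desktop connections:", desktop_connections(admins, qmgrs))
    print("central connections:", central_connections(qmgrs))
    print("polls/min per qmgr (desktop):", polls_per_minute_per_qmgr(admins, interval))
    print("polls/min per qmgr (central):", polls_per_minute_per_qmgr(1, interval))
```

With these sample figures, the desktop model needs ten times the connections and puts ten times the polling load on each queue manager, and the gap widens with every administrator added to the team.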


A bit of prevention = a gigabyte of cure

Toward the end of any troubleshooting engagement, I am inevitably asked: What could we have done to prevent this?

In the case of tools, the answer is to start when the network is still relatively small and bundle the cost of the tool into the overhead of each queue manager. Once the central components are in place, the incremental cost of adding a new queue manager is minimal. This prevents the "sticker shock" of migrating to a central tool when the network is large. But if your network is already large and you need tools, remember that the situation will only get worse. Better to bite the bullet now than to defer the implementation while the messaging network continues to grow.

While it may seem obvious, my additional advice once tools are available is to actually listen to what the network is trying to tell you! At the beginning of this article, I listed several assignments in which I had been involved. The account with the cluster problems was monitoring some MQ processes, but not amqrrmfa, which is the cluster repository process local to each queue manager. On another account, we found out about a queue file being deleted only when I suggested they might want to take backups of their object definitions. More recently, I worked at an account where we enabled events to see if any of the WebSphere Message Broker problems we were having were showing up at the queue manager. The event queues immediately filled up and the volume of event messages was so high that it was impossible to pick out events related just to the problem we were trying to diagnose.

In each of these cases, monitoring and administration tools were available that would have detected the problems, but they were not used.
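As a rough illustration of the process check that was missing for amqrrmfa, the Python sketch below scans a process listing for the processes a queue manager is expected to have. Treat it as a sketch under stated assumptions, not a substitute for a real monitoring tool: it assumes the listing shows each process with a "-m <queue manager name>" argument, it adds amqzxma0 to the required list as my own example of a second process to watch, and the function names, sample listing, and queue manager names are hypothetical.

```python
import subprocess

# amqrrmfa is the cluster repository process named in this article.
# amqzxma0 is included as an assumed example of another process to watch;
# adjust the list for your own platform and MQ version.
REQUIRED_PROCESSES = ("amqzxma0", "amqrrmfa")


def ps_snapshot() -> str:
    """Return the command lines of all running processes.
    Assumes a ps that accepts '-eo args' (true on Linux; check your platform)."""
    result = subprocess.run(["ps", "-eo", "args"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def process_running(ps_output: str, process_name: str, qmgr_name: str) -> bool:
    """True if some line runs process_name with the arguments '-m qmgr_name'."""
    for line in ps_output.splitlines():
        tokens = line.split()
        for i, token in enumerate(tokens):
            # Compare on the executable's base name so full paths also match.
            if token.rsplit("/", 1)[-1] != process_name:
                continue
            args = tokens[i + 1:]
            for j in range(len(args) - 1):
                if args[j] == "-m" and args[j + 1] == qmgr_name:
                    return True
    return False


def missing_processes(ps_output: str, qmgr_name: str,
                      required=REQUIRED_PROCESSES) -> list:
    """List the required processes that are not running for this queue manager."""
    return [name for name in required
            if not process_running(ps_output, name, qmgr_name)]


if __name__ == "__main__":
    # Made-up listing for demonstration; use ps_snapshot() on a real host.
    sample = ("/opt/mqm/bin/amqzxma0 -m QM1\n"
              "/opt/mqm/bin/amqzxma0 -m QM2\n"
              "/opt/mqm/bin/amqrrmfa -m QM2\n")
    for qmgr in ("QM1", "QM2"):
        print(qmgr, "missing:", missing_processes(sample, qmgr))
```

A check like this is only useful if someone, or some tool, acts on the result. In practice it would be scheduled and wired into whatever alerting the shop already uses.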


Listen -- but learn, too!

In addition to lack of tools, lack of education is a significant contributing factor to many of these problems. There is an IBM conference each year with deep WebSphere MQ content: In the USA, it is called IMPACT, and worldwide it is known as the Transaction and Messaging Conference. There are also formal instructor-led classes, plus an increasing number of Web-based and other self-paced training options available.

When it comes to formal education, don't limit the administrators to just the administration classes. Application development training gives administrators a comprehensive view and helps them understand the business requirements of project teams and optimize the MQ configuration accordingly. Some administrators might tell you not to send developers to administration classes because it makes them "dangerous." My opinion is that developers design more robust and scalable systems if they understand how WebSphere MQ actually works.

I have attended almost all of these yearly conferences, going back to the first one, and I see a lot of the same people there from year to year. My (completely unscientific) observation is that I have not been to engagements at companies that send their folks to these conferences. Conference attendance is definitely a lot cheaper than an extended outage.

Many conference sessions include deep technical content that is not available anywhere else. Plus, information about new products, features, or versions often shows up at conferences before it appears in classroom curriculum, which can make conference attendance an even more valuable experience. Visit the IMPACT Web page for more information. The worldwide Transaction and Messaging Conferences for the year have not been announced, but I will post links to them in this column when they are. Also, visit the WebSphere Education page for information on classroom, distance, and self-paced training options.

Finally, I can't mention continuing education without talking about two of the best user communities on the Web. My favorite is the MQSeries List, an e-mail-based discussion group. The other is MQSeries.net, which is an online forum. Both are excellent sources of support for everything from very simple questions to deeper topics, like architecture and advanced configuration. Both communities also are quite active, friendly to newcomers, and free.


Future topics

In upcoming installments of Mission:Messaging, which will appear in every other issue of the IBM WebSphere Developer Technical Journal, I will be writing about security, which continues to be a challenge across the board, topics from the sessions I will be presenting at IMPACT, and much more.

The next Mission:Messaging column will be published in March, a week before IMPACT 2008. I look forward to seeing you there. If it's your first time at the conference, come find me and say hello. Then, I will cross your company off my potential problem-client list and maybe make room on my calendar for taking a class or two.
