Migration: To Be or Not To Be (WebSphere MQ for z/OS)
Mark Womack
As on any platform, specialized skills are simply part of the territory when it comes to successfully migrating (upgrading) software. Anyone familiar with mainframes knows this is especially true on z/OS, where installation and migration tasks can require extensive planning. With so many details to account for, it is easy to miss a simple step. However small, one misstep can mean missed targets, or delays in rolling out to other environments until the test systems are up to snuff. While analyzing migration-related issues, I realized that although everyone can make similar mistakes, most people don't make exactly the same one. It made sense, then, to summarize the kinds of problems we see at Level 2 and to point to some resources that can help fix them, taking a broad-stroke approach. This topic almost lends itself to a Top 10 list, but migration is a bit more complicated than that, so the points below follow no particular order.
Migration, coexistence, fallback, and toleration PTFs are all related to each other. They belong to a group of fixes that allow queue managers at different code levels to function together in a queue-sharing group. This is the code that lets you migrate to WebSphere MQ 7.1 and then, if you notice a problem, fall back to the older 7.0.1 release until you can figure out why an application isn't working exactly as expected. It allows newer code to recognize structures that were built by earlier levels of the product, so that things behave seamlessly whichever level is running. Given that, it is very important to make sure every one of these fixes is applied not only to the new level of MQ you're migrating to, but also to the old release. We maintain a list of these fixes, which is linked from these documents:
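Once you have that list, checking it against both releases is a simple set comparison. The Python sketch below assumes you have already collected the required PTF numbers and the PTFs applied to each release (for example, transcribed from your own SMP/E reports). The PTF numbers and release labels are hypothetical placeholders, not real MQ fixes.

```python
"""Sketch: confirm that required migration/toleration PTFs are on BOTH releases.

All PTF numbers and release labels are hypothetical placeholders; real input
would come from IBM's published list and your own SMP/E reports.
"""


def missing_ptfs(required: set[str],
                 applied_by_release: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each release to the sorted list of required PTFs it is missing."""
    return {release: sorted(required - applied)
            for release, applied in applied_by_release.items()}


if __name__ == "__main__":
    required = {"UK00001", "UK00002", "UK00003"}      # hypothetical IDs
    applied = {
        "7.0.1": {"UK00001", "UK00003"},             # old release: one missing
        "7.1.0": {"UK00001", "UK00002", "UK00003"},  # new release: complete
    }
    for release, missing in missing_ptfs(required, applied).items():
        print(release, "OK" if not missing else "MISSING " + ", ".join(missing))
```

With this sample data the script prints `7.0.1 MISSING UK00002` followed by `7.1.0 OK`: the new release looks complete, but the old release is missing a fix, which is exactly the gap that undermines fallback.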
Beyond that, some of the most inexplicable errors occur when library concatenations aren't quite right. After a migration to a higher code level, some of the library DD statements can be out of step; for instance, a version 7.1.0 STEPLIB may still point at version 7.0.1 libraries. This kind of error often produces some very unusual abends for which no matches can be found in IBMLink, and such abends are a clue to check the library statements. It's also a good idea to make sure the proper APF authorizations are in place for the libraries. The WebSphere MQ Configuring manual discusses this as one of the standard tasks in setting up a queue manager. That section of the book currently describes 22 tasks, all of which should be reviewed in case they apply to your environment.
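A quick way to spot a mixed concatenation is to scan the procedure's DD statements for data set names from the wrong release. This Python sketch assumes a site naming convention in which MQ data set names carry a version qualifier such as V701 or V710; the names shown are hypothetical. It only looks at `DSN=` parameters and does not check APF authorization.

```python
import re

# Assumption: MQ data set names include a qualifier like V701 or V710
# (e.g. MQM.V710.SCSQAUTH). Adapt this pattern to your own naming standard.
VERSION_QUALIFIER = re.compile(r"(?:^|\.)V(\d{3})(?=\.|$)")
DSN_PARAM = re.compile(r"DSN=([A-Z0-9@#$.]+)")


def find_version_mismatches(jcl_lines: list[str], expected: str) -> list[str]:
    """Return DSN= values whose version qualifier differs from `expected`."""
    mismatches = []
    for line in jcl_lines:
        if line.startswith("//*"):  # skip JCL comment statements
            continue
        dsn = DSN_PARAM.search(line)
        if not dsn:
            continue
        version = VERSION_QUALIFIER.search(dsn.group(1))
        if version and version.group(1) != expected:
            mismatches.append(dsn.group(1))
    return mismatches


if __name__ == "__main__":
    steplib = [
        "//STEPLIB  DD DSN=MQM.V710.SCSQANLE,DISP=SHR",
        "//         DD DSN=MQM.V701.SCSQAUTH,DISP=SHR",
    ]
    print(find_version_mismatches(steplib, "710"))  # prints ['MQM.V701.SCSQAUTH']
```

Data set names without a version qualifier are ignored, so libraries from other products (DB2, for example) don't trigger false alarms under this convention.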
Misapplied PTFs are not exactly the same problem, but they can cause similar results. Perhaps the PTF application JCL wasn't set up correctly, or it references the wrong library. Quite often a fix is applied to the wrong SMP/E target zone, so when Level 2 checks a dump for a PTF we can see that it may have been applied, just not to "this" instance of MQ. If you apply a PTF and the same problem comes back, it's worth investigating the PTF application process.
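When a fix seems not to have worked, one thing to confirm is that it landed in the target zone whose libraries the queue manager actually runs from. The Python sketch below assumes you already know which zone that is and have a per-zone list of applied PTFs (for instance, gathered from SMP/E LIST output); the zone names and PTF number are hypothetical.

```python
def zones_with_ptf(ptf: str, zone_inventory: dict[str, set[str]]) -> list[str]:
    """Return the (sorted) target zones in which `ptf` is recorded as applied."""
    return sorted(zone for zone, ptfs in zone_inventory.items() if ptf in ptfs)


def diagnose(ptf: str, running_zone: str,
             zone_inventory: dict[str, set[str]]) -> str:
    """Say whether `ptf` is applied to the zone this queue manager runs from."""
    found = zones_with_ptf(ptf, zone_inventory)
    if running_zone in found:
        return f"{ptf}: applied in {running_zone}"
    if found:
        return f"{ptf}: applied only in {', '.join(found)}, not in {running_zone}"
    return f"{ptf}: not applied in any zone"


if __name__ == "__main__":
    # Hypothetical zones: the fix went into the test zone, not production.
    inventory = {"MQTEST": {"UK00042"}, "MQPROD": set()}
    print(diagnose("UK00042", "MQPROD", inventory))
    # prints: UK00042: applied only in MQTEST, not in MQPROD
```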
So what if migration completes without any obvious negative effects, but then a performance impact is suddenly noticed? This could have happened to those migrating to version 7.0.1, where, depending on message size, page sets could fill more rapidly. To get an idea of what to expect from the various levels of the product, some of the best content is in the SupportPacs centered on performance and tuning. I often review MP16 and MP1G, which provide invaluable assistance in this area. These SupportPacs tie RMF and SMF reports together with MQ, so you'll know what to expect and which settings might lead to unusual results (and thus can be tweaked).
Performance tuning falls a little outside the window of migration, but it's often right after a migration that we hear performance just isn't what it used to be, so these documents are worth a visit from your local performance guru.
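To see why message size can change how quickly a page set fills, it helps to compare two simple storage models. Page-set pages are 4 KB; everything else in this Python sketch is an assumption for illustration. The per-message overhead figure is made up, and neither model is a description of how MQ actually lays out messages; the performance SupportPacs mentioned above are the place for real numbers.

```python
PAGE_SIZE = 4096  # bytes per page-set page


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def pages_packed(count: int, msg_bytes: int, overhead: int) -> int:
    """Model A (assumed): messages are packed back to back across pages."""
    return _ceil_div(count * (msg_bytes + overhead), PAGE_SIZE)


def pages_one_per_page(count: int, msg_bytes: int, overhead: int) -> int:
    """Model B (assumed): every message starts on a page of its own."""
    return count * _ceil_div(msg_bytes + overhead, PAGE_SIZE)


if __name__ == "__main__":
    count, size, overhead = 100_000, 500, 300  # hypothetical workload
    a = pages_packed(count, size, overhead)
    b = pages_one_per_page(count, size, overhead)
    print(f"packed: {a} pages, one per page: {b} pages ({b / a:.1f}x)")
```

With these made-up numbers the script prints `packed: 19532 pages, one per page: 100000 pages (5.1x)`: small messages stored one per page use roughly five times as many pages. For large messages that already span several pages, the two models converge.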
In queue-sharing group environments, the implementation of the DB2 plans is often at the center of errors around migration time. Note that since spring 2012, per PM60589, MQ users need only BIND packages and no longer have to BIND plans when a new DBRM ships. With DB2 plan errors, the plan in use often was not at the latest level at BIND time, leading to messages such as CSQ5007E.
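One way to catch a stale bind before it turns into an error at startup is to compare when each DBRM last changed with when its package was last bound. The Python sketch below assumes you have gathered both sets of timestamps yourself; the package names and dates are hypothetical, and the check is only a heuristic, not the mechanism DB2 itself uses to validate a package.

```python
from datetime import datetime


def stale_packages(dbrm_updated: dict[str, datetime],
                   last_bind: dict[str, datetime]) -> list[str]:
    """Return names whose DBRM changed after the last BIND, or that were never bound."""
    stale = []
    for name, updated in sorted(dbrm_updated.items()):
        bound = last_bind.get(name)
        if bound is None or bound < updated:
            stale.append(name)
    return stale


if __name__ == "__main__":
    # Hypothetical names and dates for illustration only.
    dbrm_updated = {"PKGA": datetime(2012, 5, 1), "PKGB": datetime(2012, 5, 1)}
    last_bind = {"PKGA": datetime(2012, 6, 1), "PKGB": datetime(2012, 4, 1)}
    print(stale_packages(dbrm_updated, last_bind))  # prints ['PKGB']
```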
Then there's the problem I call "The Chicken or the Egg". It involves the order in which the initialization data sets (for example, CSQINP2) are concatenated. If later members require certain objects to be defined before those members are read in, then the members that define those objects must come first in the concatenation; otherwise, some interesting errors will result. We've seen cases where CSQ4INYG ought to be concatenated early because of the important queue objects it defines. Similarly, storage classes must be defined before anything else references them.
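The ordering rule can be checked mechanically if each concatenated member is modeled as the set of objects it defines and the set it references. This Python sketch does that under stated assumptions: the member and object names are hypothetical, definitions inside a single member are assumed to be in the right order already, and objects that already exist on the page sets will show up as false positives. It flags forward references and suggests an order using Kahn's topological sort, keeping the original order wherever the dependencies allow.

```python
import heapq

# (member name, objects it defines, objects it references), in concatenation order
Member = tuple[str, set[str], set[str]]


def forward_references(members: list[Member]) -> list[tuple[str, str]]:
    """Return (member, object) pairs referenced before any member defines them."""
    defined: set[str] = set()
    problems = []
    for name, defines, refs in members:
        defined |= defines  # assume in-member ordering is already correct
        for obj in sorted(refs):
            if obj not in defined:
                problems.append((name, obj))
    return problems


def suggested_order(members: list[Member]) -> list[str]:
    """Kahn's algorithm: definers before users, ties broken by original position."""
    definer: dict[str, str] = {}
    for name, defines, _ in members:
        for obj in defines:
            definer.setdefault(obj, name)
    position = {name: i for i, (name, _, _) in enumerate(members)}
    # Members this member depends on (references with no definer are ignored).
    deps = {name: {definer[o] for o in refs if o in definer and definer[o] != name}
            for name, _, refs in members}
    dependents: dict[str, list[str]] = {name: [] for name in deps}
    for name, needed in deps.items():
        for d in needed:
            dependents[d].append(name)
    remaining = {name: len(needed) for name, needed in deps.items()}
    ready = [(position[n], n) for n, k in remaining.items() if k == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (position[child], child))
    if len(order) != len(members):
        raise ValueError("circular references between members")
    return order


if __name__ == "__main__":
    members = [  # hypothetical members and objects
        ("APPQUEUE", {"QLOCAL(APP.IN)"}, {"STGCLASS(APPSC)", "QLOCAL(BASE.Q)"}),
        ("APPSTGCL", {"STGCLASS(APPSC)"}, set()),
        ("BASEOBJS", {"QLOCAL(BASE.Q)"}, set()),
    ]
    print(forward_references(members))
    print(suggested_order(members))  # prints ['APPSTGCL', 'BASEOBJS', 'APPQUEUE']
```

A heap keyed on original position is used instead of a plain queue so the suggested concatenation moves as few members as possible, which keeps the change easy to review.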
I've tried to cover a few of the more commonly seen migration issues. I'm sure there are many more that we've never seen, but they probably fall along similar lines. Comments are certainly welcome to shed more light on the challenges of migrating.