If you are familiar with IBM MQ Clustering, you will already know that keeping your Full Repositories (FRs) running smoothly is key to maintaining a happy and healthy cluster. You will probably also have come across the best practice advice to dedicate specific hosts to running these queue managers, and then to leave them to get on with the business of running the cluster without also running application workloads.
However, sooner or later you may need to relocate your cluster repositories, perhaps because:
- You are migrating to a new release of the product
- Your choice of environment (OS, hardware) has changed
- Workload has grown and you want to separate out systems which were previously co-located
…and so on, or some combination of the above.
The usual recommendation when ‘replacing’ a queue manager in a cluster is not to rely on the new system being an exact replica of the old one. Because clusters are asynchronous, and to some extent keep a historical record of members and the objects they have advertised, it is usually better to add new systems that are completely distinct (with new, unique names), even if they are to serve the same purpose as an older system being decommissioned. This is the approach taken here:
- Create a new queue manager, with a unique name, to host the new full repository for the cluster.
- Introduce this queue manager into the cluster as normal: define a cluster-sender (CLUSSDR) channel on it pointing to an existing FR, and a cluster-receiver (CLUSRCVR) channel by which other queue managers will contact it.
- Now promote this queue manager to hold a full repository for the cluster. At this point, also define CLUSSDR channels from it to every other full repository, and from every other full repository to it. All full repositories must be fully interconnected so that a complete picture of the cluster state reaches the new repository.
Note 1: Assuming you normally run with the recommended two FRs, you now briefly have three full repositories for this cluster. This is perfectly acceptable as long as all three remain fully interconnected.
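As an illustration of the steps so far, the Python sketch below only generates MQSC command text; it does not connect to MQ. It assumes a cluster called DEMO, a new queue manager NEWFR, existing full repositories FR1 and FR2, TCP connection names, and the common CLUSTER.QMGR channel-naming convention; all of these names and hosts are hypothetical. Each list of commands could be fed to runmqsc on the named queue manager, after creating NEWFR itself with crtmqm.

```python
"""Sketch: generate MQSC for adding a new full repository (FR) to a cluster.

All names (cluster DEMO, queue managers NEWFR/FR1/FR2, host names) are
hypothetical examples, not values from any real configuration.
"""


def chl(cluster: str, qmgr: str) -> str:
    # Assumed convention: the CLUSRCVR on QMGR is named CLUSTER.QMGR, and every
    # manually defined CLUSSDR pointing at QMGR uses the same name.
    return f"{cluster}.{qmgr}"


def define_cluster_channel(kind: str, cluster: str, qmgr: str, conname: str) -> str:
    return (
        f"DEFINE CHANNEL('{chl(cluster, qmgr)}') CHLTYPE({kind}) "
        f"TRPTYPE(TCP) CONNAME('{conname}') CLUSTER('{cluster}')"
    )


def add_full_repository(cluster: str, new_qm: str, new_conname: str,
                        existing_frs: dict[str, str]) -> dict[str, list[str]]:
    """Return MQSC commands to run on each queue manager, keyed by its name.

    existing_frs maps each current FR's name to its connection name.
    """
    first_fr = next(iter(existing_frs))
    new_qm_cmds = [
        # Join the cluster as normal: our own receiver, plus a sender to one FR.
        define_cluster_channel("CLUSRCVR", cluster, new_qm, new_conname),
        define_cluster_channel("CLUSSDR", cluster, first_fr, existing_frs[first_fr]),
        # Promote to full repository (in practice, perhaps only after checking
        # that the join above has worked).
        f"ALTER QMGR REPOS('{cluster}')",
    ]
    # Full interconnection: senders from the new FR to every other FR...
    for fr, conname in existing_frs.items():
        if fr != first_fr:
            new_qm_cmds.append(define_cluster_channel("CLUSSDR", cluster, fr, conname))
    plan = {new_qm: new_qm_cmds}
    # ...and from every existing FR back to the new one.
    for fr in existing_frs:
        plan[fr] = [define_cluster_channel("CLUSSDR", cluster, new_qm, new_conname)]
    return plan


if __name__ == "__main__":
    plan = add_full_repository(
        "DEMO", "NEWFR", "newfr.example.com(1414)",
        {"FR1": "fr1.example.com(1414)", "FR2": "fr2.example.com(1414)"},
    )
    for qmgr, commands in plan.items():
        print(f"* Run on {qmgr}:")
        for command in commands:
            print(command)
```

Generating every direction of the CLUSSDR mesh from one function is a deliberate choice: it makes it hard to forget the channels from the existing FRs back to the new one, which is exactly what full interconnection depends on.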
- At this point it is a good idea to take a checkpoint: display the cluster objects (queues, topics, queue managers) on the new full repository to confirm that all knowledge of the cluster has reached its cache. If entries that you believe should be present are missing:
  - Check that all the CLUSSDR channels between FRs are correctly defined and able to start.
  - Check that there is no build-up of messages still waiting to be processed on the SYSTEM.CLUSTER.COMMAND.QUEUE of each queue manager (or on any transmission queues in the cluster).
  - Check for any errors in the queue manager error logs.
- Now you are ready to decommission an existing full repository. This is as simple as altering the REPOS or REPOSNL attribute of that queue manager so that it no longer names this cluster. Migration complete!
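The checkpoint and the decommissioning step above can be sketched in the same spirit. The MQSC DISPLAY and ALTER QMGR commands are standard; the helper function, the idea of comparing sets of names collected from an existing FR and from the new one, and the example queue manager names are assumptions for illustration only.

```python
"""Sketch: checkpoint the new FR before decommissioning an old one.

The MQSC commands are standard; how their output is captured and turned
into sets of names is left open, and all example names are hypothetical.
"""

# Run on the new FR to see what it has learned about the cluster.
CHECKPOINT_COMMANDS = [
    "DISPLAY CLUSQMGR(*) QMTYPE STATUS",  # queue managers, FR (REPOS) or partial
    "DISPLAY QCLUSTER(*) CLUSQMGR",       # cluster queues and their hosts
    "DISPLAY TCLUSTER(*)",                # cluster topics
    # A persistent backlog here suggests updates are still being processed.
    "DISPLAY QLOCAL(SYSTEM.CLUSTER.COMMAND.QUEUE) CURDEPTH",
]

# Run on the FR being retired once the checkpoint is clean. If that queue
# manager names its clusters via REPOSNL, remove this cluster from the
# namelist rather than blanking REPOS.
DECOMMISSION_COMMAND = "ALTER QMGR REPOS(' ')"


def missing_entries(expected: set[str], seen_on_new_fr: set[str]) -> set[str]:
    """Names known elsewhere (e.g. on an existing FR) but absent on the new FR."""
    return expected - seen_on_new_fr


if __name__ == "__main__":
    existing_fr_view = {"FR1", "FR2", "NEWFR", "APPQM1", "APPQM2"}
    new_fr_view = {"FR1", "FR2", "NEWFR", "APPQM1"}
    print(missing_entries(existing_fr_view, new_fr_view))  # {'APPQM2'}
```

An empty result from `missing_entries` for queue managers, queues and topics is a reasonable signal that it is safe to issue the decommission command.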
Note 2: At some point before or after decommissioning the old full repository, you will need to visit all partial repositories and the other FRs, removing any CLUSSDR definitions that point to the ex-full repository and replacing them with definitions that point to one that is still active. This should be done as soon as is convenient after these changes, but it is not critical to do it instantly. However, if you attempt to ‘bootstrap’ the cluster in any way while in this state (for example, by issuing REFRESH CLUSTER), you will experience problems, so it is best to complete this process promptly.
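The re-pointing described in Note 2 might look like the following sketch, again generating MQSC text only. The names are hypothetical and the channel names assume the CLUSTER.QMGR convention. Stopping the old channel before deleting it is a cautious choice, since deleting a channel that is still running may be refused.

```python
"""Sketch: re-point a manual CLUSSDR away from an ex-full repository.

Cluster, queue manager and host names are hypothetical; channel names
assume the CLUSTER.QMGR naming convention.
"""


def repoint_clussdr(cluster: str, old_fr: str, new_fr: str,
                    new_conname: str) -> list[str]:
    """MQSC to run on each partial repository (and each remaining FR)."""
    old_chl = f"{cluster}.{old_fr}"
    new_chl = f"{cluster}.{new_fr}"
    return [
        # Add the replacement first, so there is always a manual CLUSSDR
        # to an active full repository.
        f"DEFINE CHANNEL('{new_chl}') CHLTYPE(CLUSSDR) TRPTYPE(TCP) "
        f"CONNAME('{new_conname}') CLUSTER('{cluster}')",
        # Stop the old channel before deleting it. If it is already inactive,
        # runmqsc typically reports an error for this command and continues.
        f"STOP CHANNEL('{old_chl}')",
        f"DELETE CHANNEL('{old_chl}')",
    ]


if __name__ == "__main__":
    for command in repoint_clussdr("DEMO", "FR1", "NEWFR", "newfr.example.com(1414)"):
        print(command)
```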
If necessary, you can now repeat these steps to migrate the other full repository.
Varying the process
In the real world, there are a number of factors which can mean the above process has to be varied slightly:
- If you must keep the same name for the new full repository, it may be preferable to temporarily drop down to one FR, and to ensure the old queue manager is completely forgotten (using RESET CLUSTER) before adding the new system with the same name to the cluster. Running with one FR for a short maintenance window is not a concern: even if that FR fails, you will not see immediate problems, as long as no application attempts to access a queue it has not used before during this period. Again, this is NOT the recommended route, as in practice using the same name for different physical queue managers has been seen to cause confusion and hard-to-spot errors, which can surface many days or weeks after the configuration change.
- Related to the previous point, one of the most common reasons for needing to keep the same queue manager name is that application queues are also hosted on the full repository. Consider taking this opportunity to split application and repository tasks onto separate queue managers (even if they are still hosted on the same system). The new full repository (with a new name) can then be introduced as a separate step from draining and replacing the queue manager whose name is being kept.
- In general, if you prefer temporarily dropping to one FR over running with three (for example, simply to avoid the brief period with an extra queue manager active), a window with only one full repository is acceptable (see the first point above).
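For the same-name variant, the step of making the cluster forget the old queue manager could be sketched as below. RESET CLUSTER with ACTION(FORCEREMOVE) is issued on a full repository; the helper, the cluster name DEMO and the queue manager name FR1 are hypothetical. Accepting a QMID as an alternative to a name is a hedge for the case where the cluster may already know more than one queue manager by the same name.

```python
"""Sketch: build the RESET CLUSTER command for the same-name variant.

Names are hypothetical; the command must be issued on a full repository.
"""
from typing import Optional


def forget_queue_manager(cluster: str, qmname: Optional[str] = None,
                         qmid: Optional[str] = None) -> str:
    """Force-remove one queue manager, identified by name or by QMID."""
    if (qmname is None) == (qmid is None):
        raise ValueError("specify exactly one of qmname or qmid")
    target = f"QMNAME('{qmname}')" if qmname is not None else f"QMID('{qmid}')"
    # QUEUES(YES) also removes the cluster queues it advertised.
    return f"RESET CLUSTER('{cluster}') {target} ACTION(FORCEREMOVE) QUEUES(YES)"


if __name__ == "__main__":
    print(forget_queue_manager("DEMO", qmname="FR1"))
```

Afterwards, a `DISPLAY CLUSQMGR(FR1)` on the remaining full repository should no longer list the old queue manager before the replacement is introduced.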
I hope this post has been helpful in outlining both the best practice and the reasoning behind it. As ever, the very first step is to try out your planned process on a test system, and to make sure the precisely tailored version you intend to use is fully tested and documented for your own reference. If you found this useful, and in case you missed them, you can always check out my profile page for links to previous blog entries about MQ Clustering.