This is the blog for zAIM (Application Integration & Middleware) software on z/OS, such as MQ, IIB, WAS, and Java. We want to share technical information about zAIM in this blog and community. This first entry is about MQ Clustering and assumes the reader is already familiar with it.
We have received many questions about MQ Clustering from customers. The CLUSDATE and CLUSTIME fields in the output of the DISPLAY CLUSQMGR command are one such topic. The background is that customers worry about cluster expiration and want to know when cluster information expires. One of the reasons is that they want to start auto-defined channels from the full repository QMGR on z/OS as a regular operation. I explain the details below.
Let's take a banking system as an example. It has several MQ cluster channels connecting to and from distributed QMGRs for various business channels, e.g. ATM transactions or Internet banking. Transactions come from front-end client applications on distributed partial repository QMGRs. The z/OS full repository QMGRs receive them over the channels, query or update the database through a transaction manager such as IMS, CICS, or WAS, and then return the results to the client side. Starting and stopping channels is part of the 'open for business' and 'close for business' procedures. The customer does not want unexpected transactions from the client side after closing, so they want to manage starting and stopping the channels on the z/OS side. The full repository QMGRs are on z/OS and the partial repository QMGRs are on distributed platforms. In this setup, 'open for business' includes starting auto-defined channels on z/OS, which is why they want to start auto-defined channels there.
Let's also assume the customer's system is a multi-cluster environment. There are two clusters, CLUS-A and CLUS-B, and the partial repository QMGRs join both using a NAMELIST. Normally only CLUS-A is used; CLUS-B is a cold standby for disaster recovery or planned outages. Each cluster has two full repository QMGRs. The partial repository QMGRs on distributed normally connect to the QMGRs in CLUS-A and switch to the QMGRs in CLUS-B during a planned outage. Because the QMGRs in CLUS-B are a cold standby, they are usually stopped for long periods. This is why the customer worries about cluster expiration.
Cluster information expires after 30 days. When information expires, it is not immediately removed from the repository. Instead it is held for a grace period of 60 days. If no update is received within the grace period, the information is removed. Therefore, if the QMGRs are not started for 90 days, the auto-defined channels are removed from the cluster. The customer knows this behavior: if the information has expired, they start the CLUSSDR channel from a partial repository QMGR, which causes the CLUSSDR channels on the full repository QMGRs to be auto-defined again. But basically they want to manage starting and stopping the channels from z/OS, as described above.
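The nominal 30 + 60 day timeline can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the constants are the nominal values quoted above, the function name cluster_timeline and the sample date are made up for this example, and nothing here is read from MQ.

```python
from datetime import date, timedelta

# Nominal periods quoted in the text (assumed here, not queried from MQ).
EXPIRY_DAYS = 30
GRACE_DAYS = 60

def cluster_timeline(last_update):
    """Return (expiry date, removal date) for information last updated on last_update."""
    expiry = last_update + timedelta(days=EXPIRY_DAYS)
    removal = expiry + timedelta(days=GRACE_DAYS)
    return expiry, removal

# Hypothetical last update on 2018-01-01.
expiry, removal = cluster_timeline(date(2018, 1, 1))
print("expires:", expiry)
print("removed if no update arrives by:", removal)
```

The point is that both dates are counted from the last update of the cluster information, not from the date the QMGR was last started.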
This is why they want to check whether the information has expired, and they actually rely on CLUSDATE and CLUSTIME for this purpose. The Knowledge Center describes CLUSDATE and CLUSTIME as follows.
CLUSDATE: The date on which the definition became available to the local queue manager, in the form yyyy-mm-dd.
CLUSTIME: The time at which the definition became available to the local queue manager, in the form hh.mm.ss.
Nowadays there is also a useful message, CSQX469E, for monitoring expiration.
csect-name Update not received for CLUSRCVR channel channel-name hosted on queue manager qmid in cluster cluster-name, expected n days ago, m days remaining
This message is of course not issued while the QMGR is stopped, but consider the following example, which is based on a real customer case.
The QMGR and CHINIT in CLUS-B had been stopped for a long time as a cold standby. When they were started on May 13, the following message was issued.
CSQX469E Update not received for CLUSRCVR channel TO.CLUS_B1 hosted on queue manager CLUS_B1_2018-01-01_01:00:00 in cluster CLUS-B, expected 59 days ago, 3 days remaining
At that time, CLUSDATE shows 2018-05-13.
The QMGRs and CHINIT were stopped after the planned maintenance and again left down for a long time, then restarted on June 10 for the next planned maintenance. This time the CSQX469E message read as follows.
CSQX469E Update not received for CLUSRCVR channel TO.CLUS_B1 hosted on queue manager CLUS_B1_2018-01-01_01:00:00 in cluster CLUS-B, expected 9 days ago, 53 days remaining
At that time, CLUSDATE shows 2018-06-10.
The QMGRs and CHINIT were stopped again after the planned maintenance, left down for a long time, and restarted on Sep 7. This time no CSQX469E message was issued, and an attempt to start the auto-defined channel failed because the channel was "not found".
The customer was upset: CLUSDATE had shown 2018-06-10, and Sep 7 is 89 days after June 10, so they expected the channel to still exist because it was within 90 days. They thought this was a defect and raised a PMR.
There are some misunderstandings here.
CLUSDATE is just the date when the definition became available on the local QMGR. It does not mean that the cluster object is removed 90 days after that date. In this case, the information was due to expire on June 1 (the message on June 10 says the update was expected 9 days ago). The QMGR was not running at that time. It was started on June 10, the cluster object was then updated, and CLUSDATE became 2018-06-10. However, the next expiration date is July 1 (30 days after June 1), not July 10. Likewise, the 90th day is not Sep 10 but Sep 1. This is why the channel was not found on Sep 7: more than 90 days had already passed since June 1.
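The difference between the two ways of counting is easy to check with a short Python sketch. The dates are the ones from this story; the variable names are only for illustration.

```python
from datetime import date

restart = date(2018, 9, 7)     # restart on which the channel was "not found"
clusdate = date(2018, 6, 10)   # CLUSDATE shown after the June restart
expected = date(2018, 6, 1)    # June 10 minus "expected 9 days ago"

# Days elapsed when counting from CLUSDATE (the customer's assumption)
days_since_clusdate = (restart - clusdate).days
# Days elapsed when counting from the date the update was expected
days_since_expected = (restart - expected).days

print(days_since_clusdate)   # 89
print(days_since_expected)   # 98
```

Counted from CLUSDATE, the restart looks like it is still inside the 90-day window. Counted from June 1, it is not.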
Strictly speaking, on z/OS the total period is not 90 days but 94 days, because the calculation is based on the TOD value. Likewise, the grace period is not 60 days but 62 days. This is why the CSQX469E messages above imply a grace period of 62 days (59 + 3, or 9 + 53).
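If you collect CSQX469E messages from the job log, the two day counts can be pulled out with a small script. The following Python sketch is only an illustration: the regular expression is derived from the message layout quoted in this post, the function name parse_csqx469e is made up, and it assumes the whole message is on a single line (a real job log may wrap it).

```python
import re

# Pattern follows the CSQX469E layout quoted in this post.
# Assumption: the message is on one line; wrapped log lines are not handled.
CSQX469E = re.compile(
    r"CSQX469E\b.*?CLUSRCVR channel (?P<channel>\S+) "
    r"hosted on queue manager (?P<qmid>\S+) "
    r"in cluster (?P<cluster>[^,\s]+), "
    r"expected (?P<ago>\d+) days ago, "
    r"(?P<remaining>\d+) days remaining"
)

def parse_csqx469e(line):
    """Return the message fields as a dict, or None if the line does not match."""
    m = CSQX469E.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["ago"] = int(fields["ago"])
    fields["remaining"] = int(fields["remaining"])
    return fields

# The two messages from the story above.
PREFIX = ("CSQX469E Update not received for CLUSRCVR channel TO.CLUS_B1 "
          "hosted on queue manager CLUS_B1_2018-01-01_01:00:00 in cluster CLUS-B, ")
may = parse_csqx469e(PREFIX + "expected 59 days ago, 3 days remaining")
june = parse_csqx469e(PREFIX + "expected 9 days ago, 53 days remaining")

for msg in (may, june):
    # "days ago" + "days remaining" is the grace period implied by the message
    print(msg["channel"], msg["ago"] + msg["remaining"])
```

For both messages the sum is 62, which matches the grace period described above.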
This is complicated, but the recommendation is simple: don't rely on CLUSDATE and CLUSTIME as the "start" of the expiration period. They may or may not match it. Check for CSQX469E messages to monitor cluster expiration. Alternatively, connect the channels more frequently, e.g. once a month.