The admin structure is used for:
- Opening and closing shared queues.
- Committing or backing out units of recovery where a shared queue is used as part of a unit of work (UOW).
- Recovering in-flight units of work following a failure of one queue manager in the queue-sharing group (QSG).
- Serialized applications connecting to or disconnecting from the queue manager.
What problems can I have with my structures?
- Loss of the Coupling Facility (CF) - for example, the CF is shut down.
- Loss of connectivity. Either all systems lose connectivity to the CF, or only some systems lose connectivity while the others remain connected.
The section "Shared queue recovery" in the IBM Knowledge Center has a good discussion about connectivity to Coupling Facilities and what happens if all or some systems lose connectivity to the CF.
Do I need to duplex my admin structure?
To tolerate a failure of a coupling facility, you can either duplex the CF structure, or use MQ's loss of connectivity toleration feature.
If you duplex the structure, a copy of the structure is kept in each of two CFs. If there is a problem with one copy, the other can be used. Typically you duplex a structure between two sites, so the response time to the structure includes the time to reach the remote site.
If you do not duplex the structure, you should specify an alternate CF in case the primary CF has a problem. If there is a failure, the structure is reallocated in the secondary CF and the queue managers rebuild their information.
While the structure is being rebuilt the above facilities are not available - this typically lasts a few seconds for the admin structure.
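As a rough illustration of naming an alternate CF, the CFRM policy entry for the admin structure might look like the sketch below. The QSG name QSG1, the CF names CFA and CFZ, and the sizes are assumptions made up for this example. The admin structure name is the QSG name followed by CSQ_ADMIN. PREFLIST lists the CFs in order of preference, and DUPLEX controls whether the structure may be duplexed.

```
/* Hypothetical names and sizes - adjust for your own sysplex */
STRUCTURE NAME(QSG1CSQ_ADMIN)
          INITSIZE(10240)
          SIZE(20480)
          PREFLIST(CFA,CFZ)
          DUPLEX(DISABLED)
```

With DUPLEX(DISABLED) and two CFs in the preference list, this corresponds to the "do not duplex, but have an alternate CF" choice. Changing it to DUPLEX(ENABLED) or DUPLEX(ALLOWED) would correspond to the duplexed choice.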
For the queue managers to stay available and rebuild the Admin structure, you need CFCONLOS(TOLERATE) to be specified. All queue managers in the queue-sharing group must be at command level 710 or greater and OPMODE set to NEWFUNC for TOLERATE to be selected. You will also need to define application structures at CFLEVEL(5) and set CFCONLOS(TOLERATE) for the queue managers in the QSG to stay available if there is a loss of connectivity to the application structures.
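A minimal sketch of the MQSC commands involved is shown below, assuming an application structure called APP1 (a made-up name). The first command is issued on each queue manager in the QSG. The second applies to the application structure.

```
* Let this queue manager tolerate loss of connectivity to the admin structure
ALTER QMGR CFCONLOS(TOLERATE)
* APP1 is a hypothetical application structure name
ALTER CFSTRUCT(APP1) CFLEVEL(5) CFCONLOS(TOLERATE)
```

You can check the current settings with DISPLAY QMGR CFCONLOS and DISPLAY CFSTRUCT(APP1) CFLEVEL CFCONLOS.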
So you need to weigh up:
- Duplexing: higher availability, but a longer response time to the structure (10s of microseconds).
- No duplexing: a possible interruption of service for some seconds in the event of a problem, but a possibly faster response time.
You may want to cause the admin structure to be rebuilt.
Consider the scenario where you have two sites. LPARA, LPARB and CFA are on one machine at one site, and LPARZ and CFZ are on a machine at the remote site. Your admin structure is usually in CFA. LPARA and LPARB get good response time from the CF because it is co-located. Most of the MQ work is done by queue managers on LPARA and LPARB. If LPARA and LPARB are shut down, the queue manager on LPARZ will do all of the work. The response time to the structure will be relatively longer because CFA is a remote CF. If you fail over the admin structure to CFZ, then once the structure has been rebuilt, the queue manager on LPARZ will have good response time because it is co-located with the CF.
You can request z/OS to rebuild a structure in another CF, if one is available, with the SETXCF START,REBUILD,STRNAME=structureName command.
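For example, assuming a QSG called QSG1 (a made-up name, which gives an admin structure called QSG1CSQ_ADMIN), the sequence might look like the following. The display command before and after shows which CF the structure is in. LOCATION=OTHER asks for the rebuild to go to a different CF from the current one.

```
D XCF,STR,STRNAME=QSG1CSQ_ADMIN
SETXCF START,REBUILD,STRNAME=QSG1CSQ_ADMIN,LOCATION=OTHER
D XCF,STR,STRNAME=QSG1CSQ_ADMIN
```

The rebuild can only move the structure to a CF that is in the structure's preference list in the CFRM policy.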
Thanks to Gwydion Tudor who helped me with this.