APAR status
Closed as program error.
Error description
Shortly after two instances of a highly available agent start up and connect to the agent queue manager, the agent queue manager is stopped. When the queue manager is restarted, one instance of the agent reconnects to the queue manager and successfully starts as the active instance. The other instance reconnects to the queue manager, and writes the following message to its event log (output0.log) at regular intervals:

BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.COMMAND.<agent_name>' is configured as either NOSHARE or DEFSOPT(EXCL).
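As a rough aid for spotting this symptom, the sketch below counts BFGMQ1045I messages per queue in event log text. It is an illustration only: the function name `blocked_queues`, the agent name `AGENT1`, and the sample text are hypothetical, and real output0.log lines may carry extra fields (such as timestamps) that this pattern simply ignores.

```python
import re

# Matches the queue name quoted inside a BFGMQ1045I message.
PATTERN = re.compile(r"BFGMQ1045I: Agent's system queue '([^']+)'")


def blocked_queues(log_text):
    """Return a dict mapping each queue named in a BFGMQ1045I message to its count."""
    counts = {}
    for match in PATTERN.finditer(log_text):
        queue = match.group(1)
        counts[queue] = counts.get(queue, 0) + 1
    return counts


# Hypothetical sample text; AGENT1 is an invented agent name.
sample = (
    "BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.COMMAND.AGENT1' "
    "is configured as either NOSHARE or DEFSOPT(EXCL).\n"
    "BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.EVENT.AGENT1' "
    "is configured as either NOSHARE or DEFSOPT(EXCL).\n"
    "BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.COMMAND.AGENT1' "
    "is configured as either NOSHARE or DEFSOPT(EXCL).\n"
)

print(blocked_queues(sample))
```

A count that keeps growing for the same queues on one instance, while another instance of the same agent is running normally, is consistent with the behaviour described in this APAR.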
Local fix
Problem summary
****************************************************************
USERS AFFECTED:
This issue affects all users of MQ Managed File Transfer highly available (HA) agents.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
A highly available Managed File Transfer agent consists of:
- One active instance.
- One or more standby instances.

The first instance of the agent that starts up locks a shared resource (the SYSTEM.FTE.HA.agent_name queue on the agent queue manager). When the other instances start, they fail to obtain the lock and become a standby instance. The standby instances will then attempt to take the lock at regular intervals, as specified by the agent property standbyPollInterval - once a standby instance obtains the lock, then it becomes the active instance.

After the active instance has locked the shared resource, it performs its normal startup operations and then starts processing managed transfers.

Now, if the active instance became disconnected from the agent queue manager after it had obtained the lock on the shared resource and before it had completed its initialization, then it would attempt to reconnect to the agent queue manager at regular intervals. However, after it had successfully reconnected, it did not attempt to relock the shared resource - instead, it would just continue with its initialization processing. This meant that a standby instance could lock the shared resource on the agent queue manager and so become an active instance too.

If this happened, there would now be two active instances of the agent that were trying to initialize at the same time. The two instances would attempt to access various system queues on the agent queue manager for exclusive access. One of the instances would be able to access the system queues and successfully complete its initialization. The other would fail to do so, and would write messages similar to the ones shown below to its event log (output0.log):

BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.COMMAND.agent_name' is configured as either NOSHARE or DEFSOPT(EXCL).
BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.EVENT.agent_name' is configured as either NOSHARE or DEFSOPT(EXCL).

These messages would be written to the event log at regular intervals until the first instance was stopped, at which point the other instance would be able to open the system queues and initialize successfully.
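The sequence described above can be modelled with a small simulation. This is a sketch under stated assumptions, not the agent's actual code: `SharedLock` stands in for exclusive access to the SYSTEM.FTE.HA.agent_name queue, the class and method names are hypothetical, and the model assumes that stopping the queue manager releases the lock.

```python
class SharedLock:
    """Stand-in for exclusive access to the SYSTEM.FTE.HA.agent_name queue."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        # Succeeds only when nobody else holds the lock.
        if self.holder in (None, name):
            self.holder = name
            return True
        return False

    def drop_all(self):
        # Assumption: stopping the queue manager releases the lock.
        self.holder = None


class Instance:
    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.role = "standby"

    def poll(self):
        # Runs at startup, then every standbyPollInterval on a standby.
        if self.lock.try_acquire(self.name):
            self.role = "active"

    def reconnect_without_relock(self):
        # Pre-fix behaviour: carry on initializing, never re-check the lock.
        pass


lock = SharedLock()
first, second = Instance("first", lock), Instance("second", lock)
first.poll()      # first takes the lock and begins initializing
second.poll()     # second fails to take the lock and stays standby
lock.drop_all()   # queue manager stops before first finishes initializing
second.poll()     # after the restart, the standby's poll wins the lock
first.reconnect_without_relock()

active = [i.name for i in (first, second) if i.role == "active"]
print(active)
```

In this model both instances end up believing they are active even though only `second` holds the lock, which is the state that leads to the repeated BFGMQ1045I messages.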
Problem conclusion
To resolve this issue, IBM MQ Managed File Transfer highly available agents have been updated so that if an active instance becomes disconnected from its agent queue manager:
- After it has obtained the lock on the shared resource.
- And before it has completed its initialization.
then it will attempt to relock the shared resource after it has reconnected.

If the instance is able to relock the shared resource, then it will remain as the active instance of the agent, and will continue with its initialization. However, if the instance fails to lock the shared resource after it has reconnected, then it will become a standby instance. This prevents two instances of the agent from being the active instance.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v9.2 LTS   9.2.0.3
v9.x CD    9.2.3

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
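The relock decision described in this problem conclusion can be sketched as follows. This is a simplified model rather than the product's implementation: `SharedLock` represents exclusive access to the SYSTEM.FTE.HA.agent_name queue, and `role_after_reconnect` and the instance names are hypothetical.

```python
class SharedLock:
    """Stand-in for exclusive access to the SYSTEM.FTE.HA.agent_name queue."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        # Succeeds only when nobody else holds the lock.
        if self.holder in (None, name):
            self.holder = name
            return True
        return False


def role_after_reconnect(lock, name):
    """Post-fix behaviour: relock before continuing with initialization."""
    return "active" if lock.try_acquire(name) else "standby"


# Case 1: a standby instance took the lock while "first" was disconnected.
contested = SharedLock()
contested.try_acquire("second")
role_when_contested = role_after_reconnect(contested, "first")

# Case 2: nobody took the lock during the outage.
free = SharedLock()
role_when_free = role_after_reconnect(free, "first")

print(role_when_contested, role_when_free)
```

In the first case the reconnecting instance steps down to standby, and in the second it keeps the active role, so at most one instance holds the lock and proceeds with initialization.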
Temporary fix
Comments
APAR Information
APAR number
IT35878
Reported component name
MQ BASE V9.2
Reported component ID
5724H7281
Reported release
920
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-02-11
Closed date
2021-03-25
Last modified date
2021-03-25
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
MQ BASE V9.2
Fixed component ID
5724H7281
Applicable component levels
[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"920"}]
Document Information
Modified date:
26 March 2021