APAR status
Closed as program error.
Error description
An instance of an MQ Managed File Transfer highly available (MFT HA) agent is running on Linux. While the agent instance is running, it becomes disconnected from its agent queue manager. When this happens, the CPU usage of the agent instance process remains at 100% until the instance reconnects.
Local fix
Problem summary
****************************************************************
USERS AFFECTED:
This issue affects all users of MQ Managed File Transfer highly available (MFT HA) agents.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
A highly available Managed File Transfer agent consists of:
- One active instance.
- One or more standby instances.

The first instance of the agent to start locks a shared resource (the SYSTEM.FTE.HA.agent_name queue on the agent queue manager). When the other instances start, they fail to obtain the lock and become standby instances. The standby instances then attempt to take the lock at regular intervals, as specified by the agent property standbyPollInterval. Once a standby instance obtains the lock, it becomes the active instance.

After the active instance has locked the shared resource, it performs its normal startup operations and then starts processing managed transfers.

If the agent queue manager was stopped while the highly available agent was running, the following sequence of events occurred:
- The agent instances became disconnected and immediately tried to reconnect.
- These reconnection attempts failed, because the queue manager was not running.
- The agent instances immediately tried to reconnect a second time.
- These attempts also failed, because the queue manager was still unavailable, so the instances immediately tried to reconnect a third time.
- This pattern repeated until the queue manager became available again.

Because the instances were trying to reconnect to the agent queue manager in a tight loop, with no wait between attempts, they consumed excessive CPU.
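The active/standby locking described above can be illustrated with a minimal Python sketch. This is a hypothetical model, not the product's code: a threading.Lock stands in for the lock on the SYSTEM.FTE.HA.agent_name queue, and the class name HaAgentInstance and the max_polls and sleep parameters are assumptions made for illustration. Only the 5 second default for standbyPollInterval comes from this APAR.

```python
import threading
import time


class HaAgentInstance:
    """Hypothetical model of one MFT HA agent instance (illustration only).

    A threading.Lock stands in for the exclusive lock that the real agent
    takes on the SYSTEM.FTE.HA.agent_name queue.
    """

    def __init__(self, shared_lock, standby_poll_interval=5.0, sleep=time.sleep):
        self.shared_lock = shared_lock
        self.standby_poll_interval = standby_poll_interval  # seconds
        self.sleep = sleep  # injectable so the sketch can be tested without waiting
        self.role = "starting"

    def try_become_active(self):
        # Non-blocking attempt: the first instance to start wins the lock,
        # every other instance becomes a standby.
        if self.shared_lock.acquire(blocking=False):
            self.role = "active"
        else:
            self.role = "standby"
        return self.role == "active"

    def run_standby(self, max_polls):
        # A standby instance retries the lock once per standby_poll_interval,
        # giving up after max_polls waits (a limit added only for the sketch).
        polls = 0
        while not self.try_become_active() and polls < max_polls:
            self.sleep(self.standby_poll_interval)
            polls += 1
        return self.role
```

For example, if instance A calls try_become_active() first, instance B stays in standby and waits standby_poll_interval seconds between polls; once A releases the lock, B's next poll makes it the active instance.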
Problem conclusion
To resolve this issue, MQ Managed File Transfer highly available agents have been updated to wait for the period of time specified by the agent property standbyPollInterval before trying to reconnect to the agent queue manager. The default value of this property is 5 seconds, which means that agents wait 5 seconds between reconnection attempts. This ensures that the agent does not attempt to reconnect in a tight loop, and so reduces the CPU usage of the agent instance processes.
---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version      Maintenance Level
v9.2 LTS     9.2.0.5
v9.x CD      9.2.5

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
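The difference between the old and the fixed reconnection behaviour can be sketched in Python. This is a hypothetical illustration, not IBM's implementation: the function name reconnect_with_wait, the connect callable (assumed to raise ConnectionError while the queue manager is down), and the max_attempts and sleep parameters are all assumptions. The only detail taken from this APAR is the wait of standbyPollInterval seconds (default 5) between failed attempts.

```python
import time


def reconnect_with_wait(connect, standby_poll_interval=5.0, max_attempts=None,
                        sleep=time.sleep):
    """Hypothetical sketch: wait standby_poll_interval seconds between
    failed reconnection attempts instead of retrying immediately.

    `connect` is an assumed callable that returns a connection object, or
    raises ConnectionError while the queue manager is unavailable.
    Returns a (connection, attempts) tuple.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except ConnectionError:
            if max_attempts is not None and attempts >= max_attempts:
                raise  # give up (a limit added only for the sketch)
            # Before the fix there was effectively no wait at this point,
            # so the loop spun continuously while the queue manager was down.
            sleep(standby_poll_interval)
```

With a queue manager that refuses the first three attempts, this sketch sleeps three times for 5 seconds each and succeeds on the fourth attempt, rather than spinning through failed attempts as fast as the CPU allows.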
Temporary fix
Comments
APAR Information
APAR number
IT38745
Reported component name
MQ BASE V9.2
Reported component ID
5724H7281
Reported release
920
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-10-20
Closed date
2021-11-24
Last modified date
2021-11-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
MQ BASE V9.2
Fixed component ID
5724H7281
Applicable component levels
Product: IBM MQ (SSYHRD); Version: 920; Platform: Platform Independent
Document Information
Modified date:
25 November 2021