APAR status
Closed as program error.
Error description
An MQ Managed File Transfer agent that hosts one or more resource monitors hangs after the agent queue manager is stopped. The agent's event log (output0.log) reports the following:

BFGAG0183I: The agent received MQI reason code 2161. Agent recovery will be initiated.
BFGAG0183I: The agent received MQI reason code 2538. Agent recovery will be initiated.
BFGAG0183I: The agent received MQI reason code 2009. Agent recovery will be initiated.

After the last BFGAG0183I message, the agent stops attempting recovery, but the agent JVM is still running. A Javacore (thread dump) taken from the agent JVM shows the following threads of interest:

"TriggerRecoveryThread" J9VMThread:0x0000000002460200, state:CW, prio=5
  Waiting on: com/ibm/wmqfte/monitor/management/MonitorImpl@0x00000000E0A1C3A8 Owned by: <unowned>
  Heap bytes allocated since last GC cycle=0 (0x0)
  Java callstack:
    at java/lang/Object.wait
    at java/lang/Object.wait
    at com/ibm/wmqfte/monitor/management/MonitorImpl.waitCompletion
    at com/ibm/wmqfte/monitor/management/MonitorCoordinator.stopMonitor
    at com/ibm/wmqfte/monitor/management/MonitorManager.stopMonitor
    at com/ibm/wmqfte/monitor/management/MonitorManager.immediateStopMonitors
    at com/ibm/wmqfte/agent/Agent$4.run
    at java/lang/Thread.run
    at com/ibm/wmqfte/thread/FTEThread.run

"Timer-3" J9VMThread:0x000000000246A800, state:P
  Parked on: java/util/concurrent/locks/ReentrantLock$NonfairSync@0x00000000E05EC360 Owned by: "TriggerRecoveryThread" (J9VMThread:0x0000000002460200)
  Java callstack:
    at sun/misc/Unsafe.park
    at java/util/concurrent/locks/LockSupport.park
    at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt
    at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireQueued
    at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquire
    at java/util/concurrent/locks/ReentrantLock$NonfairSync.lock
    at java/util/concurrent/locks/ReentrantLock.lock
    at com/ibm/wmqfte/monitor/management/MonitorManager.stopMonitor
    at com/ibm/wmqfte/monitor/management/MonitorWork.stopMonitor
    at com/ibm/wmqfte/monitor/management/MonitorWork.execute
    at com/ibm/wmqfte/monitor/management/MonitorImpl.run
    at com/ibm/wmqfte/monitor/management/MonitorTimerTask.run
      (entered lock: com/ibm/wmqfte/monitor/management/MonitorTimerTask@0x00000000E0A27038, entry count: 1)
    at java/util/TimerThread.mainLoop
    at java/util/TimerThread.run
Local fix
Terminate the agent's JVM process and restart the agent using fteStartAgent.
Problem summary
****************************************************************
USERS AFFECTED:
This issue affects IBM MQ Managed File Transfer agents that host resource monitors.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
APAR IT25113 introduced additional locking within the IBM MQ Managed File Transfer agent code to serialize start and stop operations on resource monitors.

https://www-01.ibm.com/support/docview.wss?uid=swg1IT25113

This change inadvertently caused a deadlock between a recovery thread, created when a connection error was detected, and a thread that was executing a resource monitor.

The recovery thread took the lock introduced in IT25113 and then waited for all resource monitor threads to end before continuing with recovery processing. However, the resource monitor thread had detected its own connection error to the agent queue manager and was waiting for the same lock in order to stop the monitor.

As a result, the resource monitor thread waited indefinitely for the new lock, and the recovery thread waited indefinitely for the resource monitor thread to end.
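The shape of the deadlock described above can be illustrated with a minimal Java sketch. This is not the Managed File Transfer agent code: the class, method, and variable names are hypothetical, a CountDownLatch stands in for the recovery thread's wait on monitor completion, and the monitor thread uses a timed tryLock (where the stack trace above shows an untimed ReentrantLock.lock) so that the example terminates rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names come from the MFT agent.
public class MonitorDeadlockSketch {

    // Returns true if the "monitor" thread obtained the lock within waitMillis.
    static boolean monitorAcquiredLock(long waitMillis) {
        final ReentrantLock startStopLock = new ReentrantLock();      // stands in for the IT25113 lock
        final CountDownLatch recoveryHoldsLock = new CountDownLatch(1);
        final CountDownLatch monitorFinished = new CountDownLatch(1); // stands in for waitCompletion
        final AtomicBoolean acquired = new AtomicBoolean(false);

        // Recovery thread: takes the lock, then waits for the monitor thread to end.
        Thread recovery = new Thread(() -> {
            startStopLock.lock();
            try {
                recoveryHoldsLock.countDown();
                monitorFinished.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                startStopLock.unlock();
            }
        }, "recovery-sketch");

        // Monitor thread: needs the same lock before it can stop and end.
        Thread monitor = new Thread(() -> {
            try {
                recoveryHoldsLock.await();
                // Timed attempt so the sketch terminates; an untimed lock() would block forever.
                if (startStopLock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                    acquired.set(true);
                    startStopLock.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                monitorFinished.countDown();
            }
        }, "monitor-sketch");

        recovery.start();
        monitor.start();
        try {
            monitor.join();
            recovery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        System.out.println("monitor acquired lock: " + monitorAcquiredLock(200));
    }
}
```

In this sketch the monitor thread never obtains the lock, because the recovery thread releases it only after the monitor thread has ended. With an untimed lock call, neither thread could make progress, which matches the CW and P thread states shown in the Javacore.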
Problem conclusion
The additional locking that was added under APAR IT25113 has been removed. The java.lang.NullPointerException documented in APAR IT25113 has now been resolved by ensuring that shutdown threads do not remove resource monitor definitions from an internal registry on shutdown. The registry is built on initial agent startup and following agent recovery.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version     Maintenance Level
v9.0 LTS    9.0.0.8
v9.1 CD     9.1.4
v9.1 LTS    9.1.0.4

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
Temporary fix
Comments
APAR Information
APAR number
IT30148
Reported component name
MQ APPLIANCE M2
Reported component ID
5737H4700
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-08-29
Closed date
2019-09-24
Last modified date
2019-09-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
MQ APPLIANCE M2
Fixed component ID
5737H4700
Applicable component levels
R910 PSY
UP
Document Information
Modified date:
24 September 2019