IBM Support

IT30148: An MQ MFT agent running a resource monitor hangs when in recovery

APAR status

  • Closed as program error.

Error description

  • An MQ Managed File Transfer agent that is hosting one or more
    resource monitors is observed to hang after the agent queue
    manager is stopped.
    
    The agent's event log (output0.log) reports the following:
    
    BFGAG0183I: The agent received MQI reason code 2161. Agent
    recovery will be initiated.
    BFGAG0183I: The agent received MQI reason code 2538. Agent
    recovery will be initiated.
    BFGAG0183I: The agent received MQI reason code 2009. Agent
    recovery will be initiated.
    
    After the last BFGAG0183I message, the agent stops attempting
    recovery, but the agent JVM is still running.
    
    A Javacore (thread dump) taken of the agent JVM shows the
    following threads of interest to this issue:
    
    "TriggerRecoveryThread" J9VMThread:0x0000000002460200, state:CW, prio=5
    Waiting on:
    com/ibm/wmqfte/monitor/management/MonitorImpl@0x00000000E0A1C3A8
     Owned by: <unowned>
          Heap bytes allocated since last GC cycle=0 (0x0)
          Java callstack:
              at java/lang/Object.wait
              at java/lang/Object.wait
              at com/ibm/wmqfte/monitor/management/MonitorImpl.waitCompletion
              at com/ibm/wmqfte/monitor/management/MonitorCoordinator.stopMonitor
              at com/ibm/wmqfte/monitor/management/MonitorManager.stopMonitor
              at com/ibm/wmqfte/monitor/management/MonitorManager.immediateStopMonitors
              at com/ibm/wmqfte/agent/Agent$4.run
              at java/lang/Thread.run
              at com/ibm/wmqfte/thread/FTEThread.run
    
    "Timer-3" J9VMThread:0x000000000246A800, state:P
    Parked on:
    java/util/concurrent/locks/ReentrantLock$NonfairSync@0x00000000E05EC360
    Owned by: "TriggerRecoveryThread" (J9VMThread:0x0000000002460200)
          Java callstack:
              at sun/misc/Unsafe.park
              at java/util/concurrent/locks/LockSupport.park
              at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt
              at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireQueued
              at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquire
              at java/util/concurrent/locks/ReentrantLock$NonfairSync.lock
              at java/util/concurrent/locks/ReentrantLock.lock
              at com/ibm/wmqfte/monitor/management/MonitorManager.stopMonitor
              at com/ibm/wmqfte/monitor/management/MonitorWork.stopMonitor
              at com/ibm/wmqfte/monitor/management/MonitorWork.execute
              at com/ibm/wmqfte/monitor/management/MonitorImpl.run
              at com/ibm/wmqfte/monitor/management/MonitorTimerTask.run
                 (entered lock: com/ibm/wmqfte/monitor/management/MonitorTimerTask@0x00000000E0A27038, entry count: 1)
              at java/util/TimerThread.mainLoop
              at java/util/TimerThread.run
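The BFGAG0183I messages quoted above can be checked mechanically when triaging a suspected hang. The sketch below is a hypothetical helper, not part of MFT: the class and method names are illustrative, and it assumes the messages appear in output0.log with the text quoted in this APAR. The symbolic names are the standard MQ constants for these reason codes (2161 MQRC_Q_MGR_QUIESCING, 2538 MQRC_HOST_NOT_AVAILABLE, 2009 MQRC_CONNECTION_BROKEN), a sequence consistent with the agent queue manager being stopped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper: lists the reason codes reported by BFGAG0183I messages. */
public class RecoveryLogScanner {

    // Matches e.g. "BFGAG0183I: The agent received MQI reason code 2161."
    private static final Pattern RECOVERY_MSG =
        Pattern.compile("BFGAG0183I: The agent received MQI reason code (\\d+)\\.");

    // Symbolic names for the three reason codes seen in this APAR only.
    private static final Map<Integer, String> REASON_NAMES = Map.of(
        2009, "MQRC_CONNECTION_BROKEN",
        2161, "MQRC_Q_MGR_QUIESCING",
        2538, "MQRC_HOST_NOT_AVAILABLE");

    /** Returns the reason codes in the order they appear in the log text. */
    public static List<Integer> reasonCodes(String logText) {
        List<Integer> codes = new ArrayList<>();
        Matcher m = RECOVERY_MSG.matcher(logText);
        while (m.find()) {
            codes.add(Integer.parseInt(m.group(1)));
        }
        return codes;
    }

    /** Maps a reason code to its MQRC_* name, or "unknown" if not in the table above. */
    public static String describe(int reasonCode) {
        return REASON_NAMES.getOrDefault(reasonCode, "unknown");
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "BFGAG0183I: The agent received MQI reason code 2161. Agent recovery will be initiated.",
            "BFGAG0183I: The agent received MQI reason code 2538. Agent recovery will be initiated.",
            "BFGAG0183I: The agent received MQI reason code 2009. Agent recovery will be initiated.");
        for (int rc : reasonCodes(log)) {
            System.out.println(rc + " " + describe(rc));
        }
    }
}
```

If the last code printed is followed by no further recovery activity while the JVM is still up, taking a Javacore is the next step.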
    

Local fix

  • Terminate the agent's JVM process and restart the agent using
    fteStartAgent.
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects IBM MQ Managed File Transfer agents that host
    resource monitors.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    APAR IT25113 introduced additional locking within the IBM MQ
    Managed File Transfer agent code to serialize start and stop
    operations on resource monitors.
    
    https://www-01.ibm.com/support/docview.wss?uid=swg1IT25113
    
    This change inadvertently caused a deadlock between a recovery
    thread, created when a connection error was detected, and a
    thread that was running a resource monitor. The recovery thread
    took the lock introduced in IT25113 and then waited for all
    resource monitor threads to end before continuing with the
    recovery processing. However, the resource monitor thread, having
    detected its own connection error to the agent queue manager,
    was waiting for that same lock.
    
    As a result, the resource monitor thread waited indefinitely for
    the new lock, while the recovery thread waited indefinitely for
    the resource monitor thread to end.
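The hang described above can be modelled in a few lines. This is a hypothetical sketch, not the MFT source: startStopLock stands in for the lock added by IT25113, the completion object for the MonitorImpl instance waited on in waitCompletion, and the two threads are named after those in the Javacore. It also checks, as an assumption based on the java.lang.management contract, that ThreadMXBean.findDeadlockedThreads() does not report this hang, because the recovery thread is inside Object.wait() rather than waiting to acquire a lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the IT30148 hang (names are illustrative, not MFT classes). */
public class RecoveryDeadlockSketch {

    private final ReentrantLock startStopLock = new ReentrantLock();
    private final Object completion = new Object();
    private boolean monitorEnded = false; // guarded by 'completion'

    /** Recovery thread: take the start/stop lock, then wait for the monitor to end. */
    void recover(CountDownLatch lockTaken) throws InterruptedException {
        startStopLock.lock();
        try {
            lockTaken.countDown();
            synchronized (completion) {
                while (!monitorEnded) {
                    completion.wait(); // releases 'completion' but NOT startStopLock
                }
            }
        } finally {
            startStopLock.unlock();
        }
    }

    /** Monitor thread: after its own connection error, stop itself under the same lock. */
    void stopMonitorAfterError(CountDownLatch lockTaken) throws InterruptedException {
        lockTaken.await();      // recovery already holds startStopLock
        startStopLock.lock();   // parks forever: the owner is waiting for this thread to end
        try {
            synchronized (completion) {
                monitorEnded = true;
                completion.notifyAll();
            }
        } finally {
            startStopLock.unlock();
        }
    }

    private static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // the threads never finish; do not block JVM exit
        return t;
    }

    /** Returns true if both threads end up stuck in the pattern seen in the Javacore. */
    public static boolean reproduce() throws InterruptedException {
        RecoveryDeadlockSketch s = new RecoveryDeadlockSketch();
        CountDownLatch lockTaken = new CountDownLatch(1);
        Thread recovery = daemon("TriggerRecoveryThread", () -> {
            try { s.recover(lockTaken); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread timer = daemon("Timer-3", () -> {
            try { s.stopMonitorAfterError(lockTaken); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        recovery.start();
        timer.start();
        timer.join(1000); // give both threads time to block; neither will finish

        boolean timerQueuedOnLock = s.startStopLock.hasQueuedThread(timer);
        boolean recoveryWaiting = recovery.getState() == Thread.State.WAITING;
        // Not a lock-acquisition cycle, so the JVM's detector is expected to return null.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean reportedAsDeadlock = mx.findDeadlockedThreads() != null;
        return timerQueuedOnLock && recoveryWaiting && !reportedAsDeadlock;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Hang reproduced: " + reproduce());
    }
}
```

The model makes the root cause visible: Object.wait() releases only the monitor it is called on, so any other lock held by the waiting thread stays held for as long as it waits.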
    

Problem conclusion

  • The additional locking that was added under APAR IT25113 has
    been removed. The java.lang.NullPointerException documented in
    APAR IT25113 has now been resolved by ensuring that shutdown
    threads do not remove resource monitor definitions from an
    internal registry. The registry is built on initial agent
    startup and again following agent recovery.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.0 LTS   9.0.0.8
    v9.1 CD    9.1.4
    v9.1 LTS   9.1.0.4
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
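The approach in the problem conclusion can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the shipped fix: the class, record, and method names are invented, and it assumes (as the conclusion implies) that the IT25113 NullPointerException came from a thread looking up a definition that a shutdown thread had already removed. Stopping a monitor therefore takes no extra start/stop lock and leaves the definition in place; the registry is only rebuilt at startup and after recovery.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry illustrating "do not remove definitions on shutdown". */
public class MonitorRegistrySketch {

    /** Hypothetical stand-in for a resource monitor definition. */
    public record MonitorDefinition(String name) { }

    private final Map<String, MonitorDefinition> registry = new ConcurrentHashMap<>();
    private final Map<String, Boolean> running = new ConcurrentHashMap<>();

    /** Called on initial agent startup and again after agent recovery. */
    public void rebuild(List<MonitorDefinition> definitions) {
        registry.clear();
        running.clear();
        for (MonitorDefinition d : definitions) {
            registry.put(d.name(), d);
            running.put(d.name(), Boolean.TRUE);
        }
    }

    /** Shutdown path: no extra start/stop lock, and the definition stays registered. */
    public void stopMonitor(String name) {
        running.put(name, Boolean.FALSE);
        // Deliberately no registry.remove(name): another thread that looks the
        // monitor up during shutdown still gets its definition rather than null.
    }

    public MonitorDefinition lookup(String name) {
        return registry.get(name);
    }

    public boolean isRunning(String name) {
        return running.getOrDefault(name, Boolean.FALSE);
    }

    public static void main(String[] args) {
        MonitorRegistrySketch agent = new MonitorRegistrySketch();
        agent.rebuild(List.of(new MonitorDefinition("MON1"))); // startup
        agent.stopMonitor("MON1");                             // shutdown
        System.out.println(agent.lookup("MON1") != null);      // true
        System.out.println(agent.isRunning("MON1"));           // false
    }
}
```

Because no lock is shared between the stop path and recovery, the recovery thread can wait for monitor threads to end without blocking them.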
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT30148

  • Reported component name

    MQ APPLIANCE M2

  • Reported component ID

    5737H4700

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-08-29

  • Closed date

    2019-09-24

  • Last modified date

    2019-09-24

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ APPLIANCE M2

  • Fixed component ID

    5737H4700

Applicable component levels

  • R910 PSY

       UP

Document Information

Modified date:
24 September 2019