IBM Support

IT39640: Agents running queue resource monitors hang after being disconnected from their agent queue manager

APAR status

  • Closed as program error.

Error description

  • An IBM Managed File Transfer agent contains a queue resource
    monitor. When the agent's queue manager restarts, the agent
    writes the following message to its event log (output0.log):
    
    BFGAG0183I: The agent received MQI reason code 2009. Agent
    recovery will be initiated.
    
    However, the agent never reconnects to the queue manager, and no
    further messages are written to the event log.
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects users of IBM MQ Managed File Transfer who
    have a queue resource monitor that is configured to trigger on
    complete groups.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    When a queue resource monitor is configured to trigger on
    complete groups, the main resource monitor thread creates an
    internal worker thread to scan the queue during each poll. The
    worker thread is responsible for finding complete groups (and
    individual messages that are not in a group). When it has
    completed its scan, the worker thread passes details of those
    messages back to the main resource monitor thread for
    processing.
    
    However, if the worker thread encountered an error while
    scanning the queue, it ended without notifying the main resource
    monitor thread. The main thread then waited indefinitely for
    results that never arrived, and no error was written to either
    the agent's event log (output0.log) or the resource monitor event
    log (resmonevent0.log).
    
    If the agent subsequently entered recovery for an unrelated
    reason (such as the agent queue manager becoming unavailable),
    then the agent would write a message similar to the one shown
    below to its event log:
    
    BFGAG0183I: The agent received MQI reason code 2009. Agent
    recovery will be initiated.
    
    and then become blocked waiting for the main resource monitor
    thread to complete before it could finish its recovery
    processing. A Javacore taken at the time the agent had become
    stuck trying to complete its recovery processing would show two
    threads with the following stacks:
    
    "TriggerRecoveryThread" J9VMThread:0x00000000001E5500, omrthread_t:0x00007F23E941B118, java/lang/Thread:0x00000000FFF311E0, state:P, prio=5
    ...
    Java callstack:
      at sun/misc/Unsafe.park(Native Method)
      at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java)
      at java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java)
      at java/util/concurrent/ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java)
      at java/util/concurrent/Executors$DelegatedExecutorService.awaitTermination(Executors.java)
      at com/ibm/wmqfte/monitor/management/MonitorCoordinator.stopMonitor(MonitorCoordinator.java)
      at com/ibm/wmqfte/monitor/management/MonitorManager.stopMonitor(MonitorManager.java)
      at com/ibm/wmqfte/monitor/management/MonitorManager.immediateStopMonitors(MonitorManager.java)
      at com/ibm/wmqfte/agent/Agent$4.run(Agent.java)
    ...
    
    "pool-3-thread-1" J9VMThread:0x00000000006C4700, omrthread_t:0x00007F2358105B00, java/lang/Thread:0x00000000C07B6928, state:CW, prio=5
    ...
    Waiting on: java/util/concurrent/LinkedBlockingQueue@0x00000000C085B580 Owned by: <unowned>
    ...
    Java callstack:
      at java/lang/Object.wait(Native Method)
      at java/lang/Object.wait(Object.java:189)
      at com/ibm/wmqfte/monitor/resource/impl/MonitorResourceItemWorker.getResourceItems(MonitorResourceItemWorker.java)
      at com/ibm/wmqfte/monitor/matcher/MonitorMatcher.executeWorker(MonitorMatcher.java)
      at com/ibm/wmqfte/monitor/management/MonitorWork.buildListOfResourcesUsingWorkerThread(MonitorWork.java)
      at com/ibm/wmqfte/monitor/management/MonitorWork.buildListOfResources(MonitorWork.java)
      at com/ibm/wmqfte/monitor/management/MonitorWork.execute(MonitorWork.java)
      at com/ibm/wmqfte/monitor/management/MonitorImpl.run(MonitorImpl.java)
      at com/ibm/wmqfte/monitor/management/MonitorTimerTask.run(MonitorTimerTask.java)
    ...
    
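The following self-contained Java sketch illustrates the general failure mode described above. It is not the Managed File Transfer source code. The names WorkerHangDemo, scanQueue and resultDelivered are hypothetical, and the queue scan is simulated by a method that always throws. The choice of LinkedBlockingQueue mirrors the object named in the second javacore stack, but the agent's actual internals are assumed. In this sketch, the worker posts its results only on success, so when the scan fails nothing reaches the main thread. An untimed wait would block forever. The sketch uses a timed poll purely so that it terminates.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class WorkerHangDemo {
    // Hypothetical stand-in for the queue scan. It always fails, as if the
    // queue manager connection was lost part-way through (reason code 2009).
    static List<String> scanQueue() {
        throw new IllegalStateException("MQRC 2009 during scan");
    }

    // Returns true if the main thread received scan results within the timeout.
    static boolean resultDelivered(long timeoutMillis) {
        BlockingQueue<List<String>> results = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                // Flawed pattern: results are only posted on success. If scanQueue()
                // throws, the exception is captured in a Future nobody inspects,
                // and the main thread is never notified.
                results.put(scanQueue());
                return null;
            });
            // An untimed wait here would never return. A timeout is used only so
            // that this sketch terminates.
            return results.poll(timeoutMillis, TimeUnit.MILLISECONDS) != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("result received: " + resultDelivered(500));
    }
}
```

Running main prints `result received: false`. The scan error is recorded only in a discarded Future and is never reported. This is analogous to the absence of any error in output0.log or resmonevent0.log.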

Problem conclusion

  • To resolve this issue, the IBM MQ Managed File Transfer queue
    resource monitors have been updated so that the internal worker
    thread used to scan the queue will now notify the main resource
    monitor thread if an error occurs. This prevents the main
    resource monitor thread from becoming blocked, and allows it to
    report details of the error to the agent's event log
    (output0.log).
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.6
    v9.x CD    9.3.0.0
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    
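The kind of change described in the problem conclusion can be sketched as follows. As before, this is an illustrative pattern under stated assumptions, not the actual fix. WorkerNotifyDemo, ScanOutcome and poll are hypothetical names. The worker catches any failure from the scan and posts an outcome that carries the error. The main thread therefore always receives an outcome and can report the error instead of waiting indefinitely.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

class WorkerNotifyDemo {
    // Hypothetical message from the worker to the main thread. The error
    // field is non-null if the scan failed.
    static final class ScanOutcome {
        final List<String> items;
        final Throwable error;

        ScanOutcome(List<String> items, Throwable error) {
            this.items = items;
            this.error = error;
        }
    }

    // Runs one scan on a worker thread and waits for its outcome.
    static ScanOutcome poll(Supplier<List<String>> scan) {
        BlockingQueue<ScanOutcome> outcomes = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.execute(() -> {
                ScanOutcome outcome;
                try {
                    outcome = new ScanOutcome(scan.get(), null);
                } catch (Throwable t) {
                    // Fixed pattern: the failure is passed back to the main thread
                    // instead of the worker ending silently.
                    outcome = new ScanOutcome(null, t);
                }
                outcomes.add(outcome); // unbounded queue, so add() does not block
            });
            // Safe to wait without a timeout: every path through the worker
            // posts exactly one outcome.
            return outcomes.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new ScanOutcome(null, e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ScanOutcome ok = poll(() -> Collections.singletonList("msg-1"));
        System.out.println("items: " + ok.items);

        ScanOutcome failed = poll(() -> {
            throw new IllegalStateException("MQRC 2009");
        });
        // The main thread can now report the error, e.g. to an event log.
        System.out.println("error reported: " + failed.error.getMessage());
    }
}
```

Running main prints `items: [msg-1]` followed by `error reported: MQRC 2009`. The worker passes the error back as data rather than ending silently, which keeps the waiting side simple: it can use an untimed take() because every path through the worker produces an outcome.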

Temporary fix

Comments

APAR Information

  • APAR number

    IT39640

  • Reported component name

    MQ BASE V9.2

  • Reported component ID

    5724H7281

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-01-13

  • Closed date

    2022-02-02

  • Last modified date

    2022-04-13

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ BASE V9.2

  • Fixed component ID

    5724H7281

Applicable component levels

  • Product: IBM MQ (SSYHRD)
  • Version: 920
  • Platform: Platform Independent (PF025)
  • Line of Business: Automation (LOB45)
  • Business Unit: IBM Software w/o TPS (BU059)

Document Information

Modified date:
14 April 2022