IBM Support

IT34867: Managed File Transfer agents hang when a managed transfer goes into recovery and subsequently times out.

Subscribe to this APAR

By subscribing, you receive periodic emails alerting you to the status of the APAR, along with a link to the fix after it becomes available. You can track this item individually or track all items by product.

Notify me when this APAR changes.

Notify me when an APAR for this component changes.

 

APAR status

  • Closed as program error.

Error description

  • When using the "transfer recovery timeout" functionality, an MQ
    Managed File Transfer agent that is acting as the source agent
    for managed transfers writes the following messages to its event
    log when a managed transfer enters recovery:
    
    BFGTR0068I: Transfer ID: <transfer_identifier>, to destination
    agent <destination_agent>, is entering recovery, immediately,
    due to recoverable error: BFGTR0064I: The transfer sender timed
    out waiting for a
    response from the transfer receiver
    BFGTR0081I: Transfer ID: <transfer_identifier> is now in a
    recovering state. The agent will wait for <number> seconds for
    the transfer to recover.
    
    After the timeout period specified in the BFGTR0081I message,
    the managed transfer has not restarted and has not been
    cancelled. Shortly afterwards, the agent stops responding to
    commands, and no longer processes any new managed transfer
    requests.
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This affects users of IBM MQ 9.1 Managed File Transfer who have
    enabled the "transfer recovery timeout" functionality.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    The MQ Managed File Transfer "transfer recovery timeout"
    functionality allows users to specify the behaviour of a source
    agent if a managed transfer goes into recovery. The
    functionality provides three possible options:
    
    1. The agent will continue to try and recover the managed
    transfer indefinitely. This the default behaviour if the
    "transfer timeout recovery" functionality has not been enabled.
    2. The agent will fail the managed transfer as soon as it goes
    into recovery.
    3. The agent will keep trying to recover the managed transfer
    for the specified amount of time. If the managed transfer is
    still in recovery when the timeout period elapses, the agent
    will mark it as "Failed".
    
    More information on this can be found in the "Setting a timeout
    for recovery of stalled transfers" topic in the MQ sections of
    IBM Knowledge Center. The URI for this topic in the MQ 9.1.x
    section of IBM Knowledge Center is
    https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.1.0/com.
    ibm.mq.adm.doc/recovery_timeout.html.
    
    Now, if the "transfer recovery timeout" functionality has been
    enabled to try and recover a managed transfer for a specified
    period of time, then when a managed transfer enters recovery the
    source agent for that managed transfer will write the message:
    
    BFGTR0081I: Transfer ID: '<transfer_identifier>' is now in
    recovering state. The agent will wait for <timeout> seconds for
    the transfer to recover.
    
    to its event log (output0.log) and start a new internal thread,
    called "TransferRecovery[<transfer_identifier>]". If the managed
    transfer resumes during the specified timeout period, the source
    agent writes the message:
    
    BFGTR0082I: Transfer ID: '<transfer_identifier>' that was in a
    recovering state, has now resumed.
    
    to its event log and stops the internal
    "TransferRecovery[<transfer_identifier>]". However, if the
    managed transfer has not restarted by the time the timeout
    period elapses, then the
    "TransferRecovery[<transfer_identifier>]" thread will stop the
    managed transfer and mark it as "Failed".
    
    Now, if:
    
    - A managed transfer went into recovery, and the "transfer
    recovery timeout" functionality was enabled to try and recover
    the managed transfer for a specified period of time.
    - And the managed transfer came out of recovery at the same time
    as the transfer recovery timeout elapsed.
    
    then it was possible for the following sequence of events to
    occur:
    
    - An internal Transfer Sender thread (called
    "TransferSender[<transfer_identifier>]") for the managed
    transfer started up and began the process of restarting the
    managed transfer.
    - The internal "TransferRecovery[<transfer_identifier>]" thread
    locked the entry for the managed transfer in the agent's state
    store, updated the entry to show that the managed transfer had
    now entered a "FailedTransferEnding" state and then waited for
    the "TransferSender[<transfer_identifier>]" thread to stop.
    - Shortly afterwards, the
    "TransferSender[<transfer_identifier>]" thread tried to lock the
    entry for the managed transfer in the agent's state store, and
    became blocked as the entry was locked by the
    "TransferRecovery<transfer_identifier>" thread.
    
    This resulted in the managed transfer getting stuck and never
    completing.
    
    A Javacore collected from the agent after the issue had occurred
    would show entries similar to the ones shown below:
    
    ------------------------------------------------------------
    3XMTHREADINFO
    "TransferRecovery[414d51204d46545037202020202020200cfc985f5ca83d
    2b]" J9VMThread:0x00000000022FC600,
    omrthread_t:0x00007FE7584F3940,
    java/lang/Thread:0x0000000087A895F8, state:CW, prio=5
    3XMJAVALTHREAD            (java/lang/Thread getId:0x754C1,
    isDaemon:false)
    3XMTHREADINFO1            (native thread ID:0x102B, native
    priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread
    flags:0x00000181)
    3XMTHREADINFO2            (native stack address range
    from:0x00007FE73C529000, to:0x00007FE73C569000, size:0x40000)
    3XMCPUTIME               CPU usage total: 0.018578606 secs,
    current category="Application"
    3XMTHREADBLOCK     Waiting on:
    com/ibm/wmqfte/thread/FTEThread@0x0000000087A89578 Owned by:
    <unowned>
    3XMHEAPALLOC             Heap bytes allocated since last GC
    cycle=0 (0x0)
    3XMTHREADINFO3           Java callstack:
    4XESTACKTRACE                at java/lang/Object.wait(Native
    Method)
    4XESTACKTRACE                at
    java/lang/Object.wait(Object.java:218(Compiled Code))
    4XESTACKTRACE                at
    java/lang/Thread.join(Thread.java:700(Compiled Code))
    5XESTACKTRACE                   (entered lock:
    com/ibm/wmqfte/thread/FTEThread@0x0000000087A89578, entry count:
    1)
    4XESTACKTRACE                at
    com/ibm/wmqfte/transfer/impl/TransferSenderImpl.stop(TransferSen
    derImpl.java:269(Compiled Code))
    4XESTACKTRACE                at
    com/ibm/wmqfte/statestore/impl/FTEStateStoreImpl.failTransferonR
    ecoveryTimeout(FTEStateStoreImpl.java:6186)
    4XESTACKTRACE                at
    com/ibm/wmqfte/statestore/impl/FTEStateStoreImpl.handleRecoveryT
    imeout(FTEStateStoreImpl.java:6024)
    4XESTACKTRACE                at
    com/ibm/wmqfte/transfer/impl/TransferRecoveryRunnable.run(Transf
    erRecoveryRunnable.java:119(Compiled Code))
    5XESTACKTRACE                   (entered lock:
    com/ibm/wmqfte/transfer/impl/TransferRecoveryRunnable@0x00000000
    87B57800, entry count: 1)
    4XESTACKTRACE                at
    java/lang/Thread.run(Thread.java:820(Compiled Code))
    4XESTACKTRACE                at
    com/ibm/wmqfte/thread/FTEThread.run(FTEThread.java:70(Compiled
    Code))
    
    ------------------------------------------------------------
    3XMTHREADINFO
    "TransferSender[414d51204d46545037202020202020200cfc985f5ca83d2b
    ]" J9VMThread:0x0000000002319600,
    omrthread_t:0x00007FE76C16DE98,
    java/lang/Thread:0x0000000087A89578, state:P, prio=5
    3XMJAVALTHREAD            (java/lang/Thread getId:0x754C0,
    isDaemon:false)
    3XMTHREADINFO1            (native thread ID:0x102A, native
    priority:0x5, native policy:UNKNOWN, vmstate:P, vm thread
    flags:0x00020081)
    3XMTHREADINFO2            (native stack address range
    from:0x00007FE7763BA000, to:0x00007FE7763FA000, size:0x40000)
    3XMCPUTIME               CPU usage total: 0.011055033 secs,
    current category="Application"
    3XMTHREADBLOCK     Parked on:
    java/util/concurrent/locks/ReentrantLock$FairSync@0x0000000086C9
    0588 Owned by:
    "TransferRecovery[414d51204d46545037202020202020200cfc985f5ca83d
    2b]" (J9VMThread:0x00000000022FC600,
    java/lang/Thread:0x0000000087A895F8)
    3XMHEAPALLOC             Heap bytes allocated since last GC
    cycle=0 (0x0)
    3XMTHREADINFO3           Java callstack:
    4XESTACKTRACE                at sun/misc/Unsafe.park(Native
    Method)
    4XESTACKTRACE                at
    java/util/concurrent/locks/LockSupport.park(LockSupport.java:186
    (Compiled Code))
    4XESTACKTRACE                at
    java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndChe
    ckInterrupt(AbstractQueuedSynchronizer.java:847(Compiled Code))
    4XESTACKTRACE                at
    java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireQue
    ued(AbstractQueuedSynchronizer.java:881(Compiled Code))
    4XESTACKTRACE                at
    java/util/concurrent/locks/AbstractQueuedSynchronizer.acquire(Ab
    stractQueuedSynchronizer.java:1210(Compiled Code))
    4XESTACKTRACE                at
    java/util/concurrent/locks/ReentrantLock$FairSync.lock(Reentrant
    Lock.java:235(Compiled Code))
    4XESTACKTRACE                at
    java/util/concurrent/locks/ReentrantLock.lock(ReentrantLock.java
    :296(Compiled Code))
    4XESTACKTRACE                at
    com/ibm/wmqfte/statestore/impl/FTEMutableCommonState.lock(FTEMut
    ableCommonState.java:735(Compiled Code))
    4XESTACKTRACE                at
    com/ibm/wmqfte/statestore/impl/FTEStateStorePersistence.getMutab
    leSenderState(FTEStateStorePersistence.java:1763(Compiled Code))
    4XESTACKTRACE                at
    com/ibm/wmqfte/statestore/impl/FTEStateStoreImpl.senderTransferR
    ecovered(FTEStateStoreImpl.java:2815)
    4XESTACKTRACE                at
    com/ibm/wmqfte/transfer/impl/TransferSenderRunnable.transferReco
    vered(TransferSenderRunnable.java:1568)
    4XESTACKTRACE                at
    com/ibm/wmqfte/transfer/impl/TransferSenderRunnable.run(Transfer
    SenderRunnable.java:568(Compiled Code))
    4XESTACKTRACE                at
    java/lang/Thread.run(Thread.java:820(Compiled Code))
    4XESTACKTRACE                at
    com/ibm/wmqfte/thread/FTEThread.run(FTEThread.java:70(Compiled
    Code))
    ------------------------------------------------------------
    
    If, after the issue had occurred, the agent's internal
    CommandHandler thread received another message for this managed
    transfer (such as a Cancel Transfer request), then this thread
    would also become blocked as it would try to access the entry
    for the managed transfer in the agent's state store, which was
    locked by the TransferRecovery[<transfer_identifier>] thread. In
    this situation, the agent would be unable to process any new
    managed transfer requests, or complete any in-flight managed
    transfers.
    

Problem conclusion

  • To resolve this issue, the transfer recovery processing has been
    updated so that the internal TransferRecovery thread for a
    managed transfer releases any locks it holds on the entry for
    that managed transfer in the agent's state store before waiting
    for the associated TransferSender thread to stop. This allows
    the TransferSender thread to proceed and then stop in a timely
    manner, which in turn means that the TransferRecovery thread can
    fail the managed transfer as expected.
    
    Note: This issue only affects MQ 9.1 Managed File Transfer. It
    does not occur in MQ 9.2, as the issue has been fixed in MQ
    9.2.0.0 GA.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.1 LTS   9.1.0.9
    
    The latest available MQ maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT34867

  • Reported component name

    IBM MQ MFT V9.1

  • Reported component ID

    5724H7272

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-11-10

  • Closed date

    2021-06-24

  • Last modified date

    2021-11-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM MQ MFT V9.1

  • Fixed component ID

    5724H7272

Applicable component levels

[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"910"}]

Document Information

Modified date:
09 November 2021