APAR status
Closed as program error.
Error description
When using the "transfer recovery timeout" functionality, an MQ Managed File Transfer agent that is acting as the source agent for managed transfers writes the following messages to its event log when a managed transfer enters recovery: BFGTR0068I: Transfer ID: <transfer_identifier>, to destination agent <destination_agent>, is entering recovery, immediately, due to recoverable error: BFGTR0064I: The transfer sender timed out waiting for a response from the transfer receiver BFGTR0081I: Transfer ID: <transfer_identifier> is now in a recovering state. The agent will wait for <number> seconds for the transfer to recover. After the timeout period specified in the BFGTR0081I message, the managed transfer has not restarted and has not been cancelled. Shortly afterwards, the agent stops responding to commands, and no longer processes any new managed transfer requests.
Local fix
Problem summary
**************************************************************** USERS AFFECTED: This affects users of IBM MQ 9.1 Managed File Transfer who have enabled the "transfer recovery timeout" functionality. Platforms affected: MultiPlatform **************************************************************** PROBLEM DESCRIPTION: The MQ Managed File Transfer "transfer recovery timeout" functionality allows users to specify the behaviour of a source agent if a managed transfer goes into recovery. The functionality provides three possible options: 1. The agent will continue to try and recover the managed transfer indefinitely. This the default behaviour if the "transfer timeout recovery" functionality has not been enabled. 2. The agent will fail the managed transfer as soon as it goes into recovery. 3. The agent will keep trying to recover the managed transfer for the specified amount of time. If the managed transfer is still in recovery when the timeout period elapses, the agent will mark it as "Failed". More information on this can be found in the "Setting a timeout for recovery of stalled transfers" topic in the MQ sections of IBM Knowledge Center. The URI for this topic in the MQ 9.1.x section of IBM Knowledge Center is https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.1.0/com. ibm.mq.adm.doc/recovery_timeout.html. Now, if the "transfer recovery timeout" functionality has been enabled to try and recover a managed transfer for a specified period of time, then when a managed transfer enters recovery the source agent for that managed transfer will write the message: BFGTR0081I: Transfer ID: '<transfer_identifier>' is now in recovering state. The agent will wait for <timeout> seconds for the transfer to recover. to its event log (output0.log) and start a new internal thread, called "TransferRecovery[<transfer_identifier>]". If the managed transfer resumes during the specified timeout period, the source agent writes the message: BFGTR0082I: Transfer ID: '<transfer_identifier>' that was in a recovering state, has now resumed. to its event log and stops the internal "TransferRecovery[<transfer_identifier>]". However, if the managed transfer has not restarted by the time the timeout period elapses, then the "TransferRecovery[<transfer_identifier>]" thread will stop the managed transfer and mark it as "Failed". Now, if: - A managed transfer went into recovery, and the "transfer recovery timeout" functionality was enabled to try and recover the managed transfer for a specified period of time. - And the managed transfer came out of recovery at the same time as the transfer recovery timeout elapsed. then it was possible for the following sequence of events to occur: - An internal Transfer Sender thread (called "TransferSender[<transfer_identifier>]") for the managed transfer started up and began the process of restarting the managed transfer. - The internal "TransferRecovery[<transfer_identifier>]" thread locked the entry for the managed transfer in the agent's state store, updated the entry to show that the managed transfer had now entered a "FailedTransferEnding" state and then waited for the "TransferSender[<transfer_identifier>]" thread to stop. - Shortly afterwards, the "TransferSender[<transfer_identifier>]" thread tried to lock the entry for the managed transfer in the agent's state store, and became blocked as the entry was locked by the "TransferRecovery<transfer_identifier>" thread. This resulted in the managed transfer getting stuck and never completing. A Javacore collected from the agent after the issue had occurred would show entries similar to the ones shown below: ------------------------------------------------------------ 3XMTHREADINFO "TransferRecovery[414d51204d46545037202020202020200cfc985f5ca83d 2b]" J9VMThread:0x00000000022FC600, omrthread_t:0x00007FE7584F3940, java/lang/Thread:0x0000000087A895F8, state:CW, prio=5 3XMJAVALTHREAD (java/lang/Thread getId:0x754C1, isDaemon:false) 3XMTHREADINFO1 (native thread ID:0x102B, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000181) 3XMTHREADINFO2 (native stack address range from:0x00007FE73C529000, to:0x00007FE73C569000, size:0x40000) 3XMCPUTIME CPU usage total: 0.018578606 secs, current category="Application" 3XMTHREADBLOCK Waiting on: com/ibm/wmqfte/thread/FTEThread@0x0000000087A89578 Owned by: <unowned> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0) 3XMTHREADINFO3 Java callstack: 4XESTACKTRACE at java/lang/Object.wait(Native Method) 4XESTACKTRACE at java/lang/Object.wait(Object.java:218(Compiled Code)) 4XESTACKTRACE at java/lang/Thread.join(Thread.java:700(Compiled Code)) 5XESTACKTRACE (entered lock: com/ibm/wmqfte/thread/FTEThread@0x0000000087A89578, entry count: 1) 4XESTACKTRACE at com/ibm/wmqfte/transfer/impl/TransferSenderImpl.stop(TransferSen derImpl.java:269(Compiled Code)) 4XESTACKTRACE at com/ibm/wmqfte/statestore/impl/FTEStateStoreImpl.failTransferonR ecoveryTimeout(FTEStateStoreImpl.java:6186) 4XESTACKTRACE at com/ibm/wmqfte/statestore/impl/FTEStateStoreImpl.handleRecoveryT imeout(FTEStateStoreImpl.java:6024) 4XESTACKTRACE at com/ibm/wmqfte/transfer/impl/TransferRecoveryRunnable.run(Transf erRecoveryRunnable.java:119(Compiled Code)) 5XESTACKTRACE (entered lock: com/ibm/wmqfte/transfer/impl/TransferRecoveryRunnable@0x00000000 87B57800, entry count: 1) 4XESTACKTRACE at java/lang/Thread.run(Thread.java:820(Compiled Code)) 4XESTACKTRACE at com/ibm/wmqfte/thread/FTEThread.run(FTEThread.java:70(Compiled Code)) ------------------------------------------------------------ 3XMTHREADINFO "TransferSender[414d51204d46545037202020202020200cfc985f5ca83d2b ]" J9VMThread:0x0000000002319600, omrthread_t:0x00007FE76C16DE98, java/lang/Thread:0x0000000087A89578, state:P, prio=5 3XMJAVALTHREAD (java/lang/Thread getId:0x754C0, isDaemon:false) 3XMTHREADINFO1 (native thread ID:0x102A, native priority:0x5, native policy:UNKNOWN, vmstate:P, vm thread flags:0x00020081) 3XMTHREADINFO2 (native stack address range from:0x00007FE7763BA000, to:0x00007FE7763FA000, size:0x40000) 3XMCPUTIME CPU usage total: 0.011055033 secs, current category="Application" 3XMTHREADBLOCK Parked on: java/util/concurrent/locks/ReentrantLock$FairSync@0x0000000086C9 0588 Owned by: "TransferRecovery[414d51204d46545037202020202020200cfc985f5ca83d 2b]" (J9VMThread:0x00000000022FC600, java/lang/Thread:0x0000000087A895F8) 3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0) 3XMTHREADINFO3 Java callstack: 4XESTACKTRACE at sun/misc/Unsafe.park(Native Method) 4XESTACKTRACE at java/util/concurrent/locks/LockSupport.park(LockSupport.java:186 (Compiled Code)) 4XESTACKTRACE at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndChe ckInterrupt(AbstractQueuedSynchronizer.java:847(Compiled Code)) 4XESTACKTRACE at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireQue ued(AbstractQueuedSynchronizer.java:881(Compiled Code)) 4XESTACKTRACE at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquire(Ab stractQueuedSynchronizer.java:1210(Compiled Code)) 4XESTACKTRACE at java/util/concurrent/locks/ReentrantLock$FairSync.lock(Reentrant Lock.java:235(Compiled Code)) 4XESTACKTRACE at java/util/concurrent/locks/ReentrantLock.lock(ReentrantLock.java :296(Compiled Code)) 4XESTACKTRACE at com/ibm/wmqfte/statestore/impl/FTEMutableCommonState.lock(FTEMut ableCommonState.java:735(Compiled Code)) 4XESTACKTRACE at com/ibm/wmqfte/statestore/impl/FTEStateStorePersistence.getMutab leSenderState(FTEStateStorePersistence.java:1763(Compiled Code)) 4XESTACKTRACE at com/ibm/wmqfte/statestore/impl/FTEStateStoreImpl.senderTransferR ecovered(FTEStateStoreImpl.java:2815) 4XESTACKTRACE at com/ibm/wmqfte/transfer/impl/TransferSenderRunnable.transferReco vered(TransferSenderRunnable.java:1568) 4XESTACKTRACE at com/ibm/wmqfte/transfer/impl/TransferSenderRunnable.run(Transfer SenderRunnable.java:568(Compiled Code)) 4XESTACKTRACE at java/lang/Thread.run(Thread.java:820(Compiled Code)) 4XESTACKTRACE at com/ibm/wmqfte/thread/FTEThread.run(FTEThread.java:70(Compiled Code)) ------------------------------------------------------------ If, after the issue had occurred, the agent's internal CommandHandler thread received another message for this managed transfer (such as a Cancel Transfer request), then this thread would also become blocked as it would try to access the entry for the managed transfer in the agent's state store, which was locked by the TransferRecovery[<transfer_identifier>] thread. In this situation, the agent would be unable to process any new managed transfer requests, or complete any in-flight managed transfers.
Problem conclusion
To resolve this issue, the transfer recovery processing has been updated so that the internal TransferRecovery thread for a managed transfer releases any locks it holds on the entry for that managed transfer in the agent's state store before waiting for the associated TransferSender thread to stop. This allows the TransferSender thread to proceed and then stop in a timely manner, which in turn means that the TransferRecovery thread can fail the managed transfer as expected. Note: This issue only affects MQ 9.1 Managed File Transfer. It does not occur in MQ 9.2, as the issue has been fixed in MQ 9.2.0.0 GA. --------------------------------------------------------------- The fix is targeted for delivery in the following PTFs: Version Maintenance Level v9.1 LTS 9.1.0.9 The latest available MQ maintenance can be obtained from 'WebSphere MQ Recommended Fixes' http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037 If the maintenance level is not yet available information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates' http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309 ---------------------------------------------------------------
Temporary fix
Comments
APAR Information
APAR number
IT34867
Reported component name
IBM MQ MFT V9.1
Reported component ID
5724H7272
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-11-10
Closed date
2021-06-24
Last modified date
2021-11-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM MQ MFT V9.1
Fixed component ID
5724H7272
Applicable component levels
[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"910"}]
Document Information
Modified date:
09 November 2021