APAR status
Closed as program error.
Error description
An MQ 9.2 Managed File Transfer agent is acting as the source agent for managed transfers, where the managed transfers have a transfer recovery timeout of 30 seconds. Intermittently, the managed transfers processed by the agent go into recovery during the negotiation phase because the destination agent is not running. This causes the source agent to write the following messages to its event log (output0.log): BFGTR0068I: Transfer ID: <transfer identifier>, to destination agent <destination agent>, is entering recovery, immediately, due to recoverable error: BFGTR0064I: The transfer sender timed out waiting for a response from the transfer receiver. BFGTR0081I: Transfer ID: '<transfer identifier>' is now in recovering state. The agent will wait for 30 seconds for the transfer to recover. 30 seconds later, the destination agent still isn't running, and so the managed transfer times out. When this happens, the source agent writes the message: BFGSS0081E: Recovery of transfer ID: '<transfer_identifier>' timed out after 30 seconds. The managed transfer has been terminated. to its event log, and moves the managed transfer into a RecoveryTimedOut state. Shortly after the destination agent restarts, the source agent generates an FDC containing the information shown below: ... Thread: 38 (TransferCompletedWorker) Class: com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread Method: processCommand Probe: FFDC_001 Cause: com.ibm.wmqfte.cmdhandler.FTECommandHandlerErrorException: BFGCH0066E: An internal error has occurred. An exception has been reported by the State Store. The exception is java.lang.NullPointerException com.ibm.wmqfte.cmdhandler.FTECommandHandlerErrorException: BFGCH0066E: An internal error has occurred. An exception has been reported by the State Store. The exception is java.lang.NullPointerException at com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerStateStoreImpl.p rocessInternalCommand(FTECommandHandlerStateStoreImpl.java:...) at com.ibm.wmqfte.cmdhandler.impl.MessageHandlerImpl.internalMessag eHandler(MessageHandlerImpl.java:...) at com.ibm.wmqfte.cmdhandler.impl.MessageHandlerImpl.processCommand (MessageHandlerImpl.java:...) at com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread.pro cessCommand(FTECommandHandlerWorkerThread.java:...) at com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread.run (FTECommandHandlerWorkerThread.java:...) at java.lang.Thread.run(Thread.java:...) at com.ibm.wmqfte.thread.FTEThread.run(FTEThread.java:...) Caused by: java.lang.NullPointerException at com.ibm.wmqfte.transfer.impl.TransferSenderRunnable.auditComplet e(TransferSenderRunnable.java:...) at com.ibm.wmqfte.transfer.impl.TransferSenderImpl.onComplete(Trans ferSenderImpl.java:...) at com.ibm.wmqfte.statestore.impl.FTEStateStoreImpl.sendForgetToFar End(FTEStateStoreImpl.java:...) at com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerStateStoreImpl.p rocessInternalCommand(FTECommandHandlerStateStoreImpl.java:...) ... 6 more and the managed transfer gets stuck in the RecoveryTimedOut state.
Local fix
n/a
Problem summary
**************************************************************** USERS AFFECTED: This issue affects users of MQ 9.2 Managed File Transfer, who have enabled the transfer recovery timeout functionality for managed transfers. Platforms affected: MultiPlatform **************************************************************** PROBLEM DESCRIPTION: The MQ Managed File Transfer recovery timeout functionality provides a mechanism to automatically fail a managed transfer if it enters recovery, and stays in recovery for longer than the specified timeout period. Some information on this can be found in the topic: Transfer recovery timeout concepts in the MQ 9.2 section of IBM Documentation. The URI for the topic is: https://www.ibm.com/docs/en/ibm-mq/9.2?topic=transfers-transfer- recovery-timeout-concepts Now, if an agent was acting as the source agent for a managed transfer that had a transfer recovery timeout specified, and the managed transfer entered recovery during the transfer negotiation phase because the destination agent did not respond, then the source agent would write the following messages to its event log (output0.log): BFGTR0068I: Transfer ID: <transfer identifier>, to destination agent <destination agent>, is entering recovery, immediately, due to recoverable error: BFGTR0064I: The transfer sender timed out waiting for a response from the transfer receiver. BFGTR0081I: Transfer ID: '<transfer identifier>' is now in recovering state. The agent will wait for <timeout> seconds for the transfer to recover. If the managed transfer subsequently timed out due to the transfer recovery timeout, the source agent moved the managed transfer into a "RecoveryTimedOut" state, and wrote the message: BFGSS0081E: Recovery of transfer ID: '<transfer_identifier>' timed out after <timeout> seconds. The managed transfer has been terminated. to its event log. When the destination agent eventually responded to the transfer negotiation message, the source agent started to perform the required cleanup for the managed transfer before cancelling it. As part of this processing, it tried to generate some internal audit information for the managed transfer based on the items that had been transferred. However, because the managed transfer hadn't actually started, the attempt to do this failed and the source agent generated an FDC containing the following information: ... Thread: 38 (TransferCompletedWorker) Class: com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread Method: processCommand Probe: FFDC_001 Cause: com.ibm.wmqfte.cmdhandler.FTECommandHandlerErrorException: BFGCH0066E: An internal error has occurred. An exception has been reported by the State Store. The exception is java.lang.NullPointerException com.ibm.wmqfte.cmdhandler.FTECommandHandlerErrorException: BFGCH0066E: An internal error has occurred. An exception has been reported by the State Store. The exception is java.lang.NullPointerException 	at com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerStateStoreImpl.p rocessInternalCommand(FTECommandHandlerStateStoreImpl.java:...) 	at com.ibm.wmqfte.cmdhandler.impl.MessageHandlerImpl.internalMessag eHandler(MessageHandlerImpl.java:...) 	at com.ibm.wmqfte.cmdhandler.impl.MessageHandlerImpl.processCommand (MessageHandlerImpl.java:...) 	at com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread.pro cessCommand(FTECommandHandlerWorkerThread.java:...) 	at com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread.run (FTECommandHandlerWorkerThread.java:...) 	at java.lang.Thread.run(Thread.java:...) 	at com.ibm.wmqfte.thread.FTEThread.run(FTEThread.java:...) Caused by: java.lang.NullPointerException 	at com.ibm.wmqfte.transfer.impl.TransferSenderRunnable.auditComplet e(TransferSenderRunnable.java:...) 	at com.ibm.wmqfte.transfer.impl.TransferSenderImpl.onComplete(Trans ferSenderImpl.java:...) 	at com.ibm.wmqfte.statestore.impl.FTEStateStoreImpl.sendForgetToFar End(FTEStateStoreImpl.java:...) 	at com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerStateStoreImpl.p rocessInternalCommand(FTECommandHandlerStateStoreImpl.java:...) 	... 6 more This caused the managed transfer to remain registered in the agent's internal state store, where it occupied a slot in the "source transfer list". This meant that the number of source transfers that the agent could participate in, as specified by the agent property: maxSourceTransfers was reduced by 1. If the issue occurred multiple times during the lifetime of an agent, it was possible for all of the source transfer slots to be occupied by managed transfers that had encountered the issue. This would result in the agent being unable to act as the source agent for any new managed transfers.
Problem conclusion
To resolve this issue, two changes have been made: Change 1: ----------- The first change modifies the logic used to handle managed transfers that have timed out due to a transfer recovery timeout, so that the source agent only collects audit information for the items in the managed transfer if the managed transfer actually started before it entered recovery and timed out. This prevents the FDC containing the NullPointerException reported in the APAR from occurring. Change 2: ----------- In addition to this, the design of the transfer recovery timeout functionality has been changed, so that managed transfers which have entered a RecoveryTimedOut state are moved into an "inactive transfers list". This allows the source agent to participate in other managed transfer requests while it is waiting to perform the processing required to clean up state for managed transfers that have entered recovery and timed out. --------------------------------------------------------------- The fix is targeted for delivery in the following PTFs: Version Maintenance Level v9.2 LTS 9.2.0.5 v9.x CD 9.2.5 The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes' http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037 If the maintenance level is not yet available information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates' http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309 ---------------------------------------------------------------
Temporary fix
Comments
APAR Information
APAR number
IT37869
Reported component name
MQ BASE V9.2
Reported component ID
5724H7281
Reported release
920
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-08-03
Closed date
2021-11-29
Last modified date
2021-11-29
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
MQ BASE V9.2
Fixed component ID
5724H7281
Applicable component levels
[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"920"}]
Document Information
Modified date:
30 November 2021