APAR status
Closed as program error.
Error description
An MQ Managed File Transfer agent is acting as the source agent for a message-to-file transfer, where:
- The message-to-file transfer contains multiple items.
- And each item transfers an individual message to a unique file.
- And the -de (destination file behaviour) parameter for each item is set to "Error".
- And the -sd (source disposition) parameter for each item is set to "Delete".

Partway through the managed transfer, the source agent abends due to an OutOfMemoryError and shuts down. When the source agent is restarted, the message-to-file transfer resumes and completes. However, even though all of the destination files have been written successfully, one of the transfer items is marked as "Failed" with the supplementary message:

BFGIO0006E: File "<destination filename>" already exists.

As a result, the message for this item is not deleted from the source queue.
Local fix
1. Set enableMemoryAllocationChecking=true to enable memory allocation checking before new transfers are submitted.
2. Increase the Java heap size available to the agent on startup.
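As a quick way to confirm that the first workaround is in place, the following minimal sketch scans the text of a properties file for the enableMemoryAllocationChecking setting. It assumes the property is set in the agent's agent.properties file in plain key=value form. The function name and the sample file content are hypothetical and are used for illustration only.

```python
def memory_check_enabled(properties_text: str) -> bool:
    """Return True if enableMemoryAllocationChecking is set to true.

    Assumes simple key=value lines; if the key appears more than once,
    the last occurrence wins.
    """
    enabled = False
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, separator, value = line.partition("=")
        if separator and key.strip() == "enableMemoryAllocationChecking":
            enabled = value.strip().lower() == "true"
    return enabled


# Hypothetical file content, for illustration only.
sample = """\
# sample agent.properties content
agentName=AGENT1
enableMemoryAllocationChecking=true
"""

print(memory_check_enabled(sample))
```

The check only reports whether the setting is present. It does not verify that the agent has been restarted to pick up the change.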
Problem summary
****************************************************************
USERS AFFECTED:
This issue affects users of:
- MQ 9.1 Managed File Transfer
- MQ 9.2 Managed File Transfer
who have agents that:
- Process managed transfer requests containing multiple transfer items.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
During a managed transfer, the source agent reads data for a transfer item and sends it to the destination agent. The destination agent receives the data, and writes it to the destination item. As part of this processing, the two agents store checkpoint records - these contain information about the amount of data that has been transferred. If the managed transfer goes into recovery before it has finished, the source and destination agents use their respective checkpoint records to determine where the managed transfer should be restarted from.

Checkpoints are taken at regular intervals, based on the value of the following agent properties:
- agentCheckpointInterval
- agentChunkSize
- agentFrameSize
- agentWindowSize

At the start of a managed transfer, the source and destination agents perform some negotiation to determine what values to use for these properties, and when to take checkpoints. In addition to this:
- The source agent will take a checkpoint after it has sent the last piece of data for a transfer item to the destination agent.
- And the destination agent will take a checkpoint after it has received the last piece of data for a transfer item, and written it out successfully.

The checkpoint records are stored in messages on the SYSTEM.FTE.STATE.agent_name queue. Both the source and destination agents will have a single message for a managed transfer on their respective queues, and each message will contain a maximum of three checkpoint records.
For more information on the agent properties mentioned above, see the "The MFT agent.properties file" topic in the MQ sections of IBM Documentation. For reference, the URI of this topic in the MQ 9.2 section of IBM Documentation is

https://www.ibm.com/docs/en/ibm-mq/9.2?topic=reference-mft-agentproperties-file

Now, when the issue reported in this APAR occurred, the source agent stopped unexpectedly due to an abend:
- After it had sent the last piece of data for a transfer item to the destination agent.
- And before it had written the checkpoint record indicating that the data had been sent.

When the destination agent received the data, it wrote it to a temporary file, renamed that file to be the specified destination filename and stored a checkpoint record indicating that the data had been processed successfully. At this point, the destination agent had one more checkpoint than the source agent, as the source agent had stopped due to an abend before it could write its checkpoint.

When the source agent restarted, it contacted the destination agent to determine where the managed transfer should be restarted from. Because the destination agent had one more checkpoint than the source agent, the source agent decided to restart the managed transfer from its checkpoint and resend the data for the transfer item.

The destination agent received the data and wrote it to a temporary file. However, when it tried to rename that file to be the specified destination filename, it was unable to do so as the destination file already existed. As the managed transfer request was submitted with the -de (destination file behaviour) parameter set to "Error" for this transfer item, the destination agent marked the item as "Failed" with the supplementary information:

BFGIO0006E: File "<destination filename>" already exists.
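The sequence of events described above can be illustrated with a small toy model. This is a sketch for illustration only and is not the agent's actual implementation: the DestinationAgent class, the run_scenario function and the filename out1.txt are all hypothetical, and the temporary file and rename steps are reduced to a dictionary update.

```python
class DestinationAgent:
    """Toy model of a destination agent with -de set to "Error"."""

    def __init__(self):
        self.files = {}       # destination filename -> data written
        self.checkpoints = 0  # checkpoint records written for completed items

    def receive_item(self, filename, data):
        # In the real flow the data is written to a temporary file, which is
        # then renamed to the destination filename. With -de set to "Error",
        # the rename fails if the destination file already exists.
        if filename in self.files:
            return f'BFGIO0006E: File "{filename}" already exists.'
        self.files[filename] = data
        self.checkpoints += 1
        return "Success"


def run_scenario():
    destination = DestinationAgent()
    source_checkpoints = 0

    # 1. The source sends the last piece of data for the item.
    first_result = destination.receive_item("out1.txt", b"message 1")

    # 2. The source abends here, before writing its checkpoint record,
    #    so source_checkpoints stays at 0 while the destination has 1.

    # 3. After the restart (behaviour before the fix), the source restarts
    #    from its own checkpoint and resends the same item.
    second_result = destination.receive_item("out1.txt", b"message 1")

    return first_result, second_result, source_checkpoints, destination.checkpoints


print(run_scenario())
```

In this model the first delivery succeeds and the resend is rejected, even though the destination file already holds the expected data. That matches the reported symptom of an item marked as "Failed" with BFGIO0006E.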
Problem conclusion
To resolve this issue, MQ Managed File Transfer has been updated so that:
- If a managed transfer goes into, and then comes out of, recovery due to the source agent stopping unexpectedly,
- And, during the recovery processing for the managed transfer, the source agent determines that it has written one fewer checkpoint record than the destination agent,
then the source agent will create a dummy checkpoint record and restart from the position requested by the destination agent.

This ensures that the source agent doesn't resend any data that has already been processed by the destination agent, which prevents the issue reported in this APAR from occurring.

Because of the dummy checkpoint, the source agent is unable to confirm that the data that it sent before it abended is exactly the same as the data that was successfully written by the destination agent (for example, the source file might have changed while the source agent was down). Because of this, the source agent will mark any items that are affected in this way as "Failed", with the new supplementary message:

BFGSS0090E: Failed to generate audit information for the source transfer item.

If a managed transfer contains one or more transfer items that have been marked as "Failed" with this supplementary message, systems administrators will have to perform manual checks to verify that the destination item contains all of the expected data.
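The updated recovery decision can be sketched as follows. This is a simplified illustration under stated assumptions, not the product code: the RecoveryDecision type, the decide_restart function and the restart_from labels are hypothetical, and the behaviour shown when the checkpoint counts do not differ by exactly one is a placeholder for the normal checkpoint-based restart, which this APAR does not describe in detail.

```python
from dataclasses import dataclass
from typing import Optional

BFGSS0090E = (
    "BFGSS0090E: Failed to generate audit information "
    "for the source transfer item."
)


@dataclass
class RecoveryDecision:
    dummy_checkpoint: bool      # True if the source creates a dummy checkpoint
    restart_from: str           # hypothetical label for the restart position
    item_status: Optional[str]  # status forced on the affected item, if any
    supplement: Optional[str]   # supplementary message, if any


def decide_restart(source_checkpoints: int, destination_checkpoints: int) -> RecoveryDecision:
    """Model the source agent's decision when a transfer comes out of recovery."""
    if source_checkpoints == destination_checkpoints - 1:
        # The source is one checkpoint record behind: create a dummy record
        # and restart from the position requested by the destination, so the
        # already-processed data is not resent.
        return RecoveryDecision(
            dummy_checkpoint=True,
            restart_from="position requested by destination",
            item_status="Failed",
            supplement=BFGSS0090E,
        )
    # Placeholder for the normal checkpoint-based restart (an assumption).
    return RecoveryDecision(
        dummy_checkpoint=False,
        restart_from="agreed checkpoint",
        item_status=None,
        supplement=None,
    )


decision = decide_restart(source_checkpoints=0, destination_checkpoints=1)
print(decision.item_status, decision.supplement)
```

The affected item is still reported as "Failed", but with BFGSS0090E rather than BFGIO0006E, which signals that the destination data needs a manual check and not that a write was rejected.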
---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v9.1 LTS   9.1.0.10
v9.2 LTS   9.2.0.4
v9.x CD    9.2.4

The latest available MQ maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
Temporary fix
Comments
APAR Information
APAR number
IT36500
Reported component name
IBM MQ MFT V9.1
Reported component ID
5724H7272
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-04-08
Closed date
2021-07-23
Last modified date
2021-07-23
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM MQ MFT V9.1
Fixed component ID
5724H7272
Applicable component levels
[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"910"}]
Document Information
Modified date:
24 July 2021