IBM Support

IT35599: Standby instance of a Managed File Transfer highly available agent shuts down after being disconnected from its queue manager

APAR status

  • Closed as program error.

Error description

  • An instance of an MQ Managed File Transfer highly available
    agent starts up, connects to its agent queue manager and becomes
    the standby instance. Shortly after the instance starts, the
    agent queue manager is stopped. As a result, the standby
    instance writes the following messages to its event log and
    shuts down:
    
    BFGAG0200E: An error occurred while determining the version of
    active instance. The error is 'cc=2 rc=2009 op=_get - MQGET'.
    BFGAG0071I: The agent has suspended its current transfers and is
    now stopping.
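When scanning an agent's event log for this failure, it can help to pull the completion code, reason code and operation out of the BFGAG0200E text. The following is a minimal sketch, assuming the message appears on a single line in the format quoted above; the function name `parse_bfgag0200e` and the small reason-code table are illustrative and not part of the product.

```python
import re

# Matches the quoted detail at the end of the message, e.g.
# 'cc=2 rc=2009 op=_get - MQGET'
ERROR_DETAIL = re.compile(r"'cc=(\d+) rc=(\d+) op=([^']*)'")

# Only the reason code seen in this APAR is listed; extend as needed.
KNOWN_REASONS = {2009: "MQRC_CONNECTION_BROKEN"}


def parse_bfgag0200e(line):
    """Return the cc/rc/op details of a BFGAG0200E line, or None."""
    if "BFGAG0200E" not in line:
        return None
    match = ERROR_DETAIL.search(line)
    if match is None:
        return None
    rc = int(match.group(2))
    return {
        "cc": int(match.group(1)),   # 2 is MQCC_FAILED
        "rc": rc,
        "op": match.group(3),
        "reason": KNOWN_REASONS.get(rc, "unknown"),
    }
```

For the message shown above, the sketch reports reason code 2009, which is MQRC_CONNECTION_BROKEN: the connection to the agent queue manager was lost during the MQGET.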
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects all users of MQ Managed File Transfer highly
    available (HA) agents.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    When using highly available Managed File Transfer agents, one of
    the requirements is that the active instance and all of the
    standby instances are running the same version of the product.
    The following mechanism is used to perform this version check:
    
    ------------------------------
    Step 1:
    --------
    After a standby instance of an agent has started up and
    connected to its agent queue manager, it will:
    - Create a temporary queue on the agent queue manager.
    - Construct a message containing various pieces of information,
    including the version of the product that it is using.
    - Put the message to the SYSTEM.FTE.COMMAND.<agent_name> queue
    on the agent queue manager.
    - Wait for a reply to arrive on the temporary queue.
    
    Step 2:
    --------
    The active instance of the agent picks up the message from the
    SYSTEM.FTE.COMMAND.<agent_name> queue, extracts the version
    information and compares it to the version of the product that
    it is running.
    
    If it determines that it is using the same version as the
    standby instance, then it constructs a message indicating that
    the standby instance can remain active and puts it onto the
    temporary queue.
    
    However, if the active instance finds that it is running a
    different version to the standby instance, then it constructs a
    message indicating that the standby instance should shut down.
    This message is then put to the temporary reply queue.
    
    Step 3:
    --------
    When the standby instance gets the message from the reply queue,
    it checks the contents.
    
    If the message indicates that the standby instance can remain
    active, then the instance will start trying to open the
    SYSTEM.FTE.HA.<agent_name> queue on the agent queue manager at
    regular intervals, so that it can take over if the active
    instance stops for some reason.
    
    If the reply message indicates that it needs to stop, then the
    standby instance writes the message:
    
    BFGAG0199E: The version of IBM MQ Managed File Transfer
    installed on standby machine does not match with the version on
    active instance running elsewhere. This instance of agent will
    stop.
    
    to its event log and shuts itself down.
    ------------------------------
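The three steps above can be sketched as a small request/reply exchange. This is only an illustration under stated assumptions: in-memory `queue.Queue` objects stand in for the SYSTEM.FTE.COMMAND.<agent_name> queue and the temporary reply queue, and the message layout, the `REMAIN` and `SHUT_DOWN` reply values and both function names are hypothetical. The APAR does not describe the real message format.

```python
import queue

# Hypothetical reply values; the real reply format is not documented here.
REMAIN = "remain"
SHUT_DOWN = "shut_down"


def active_handle_request(command_queue, active_version):
    """Step 2: the active instance compares versions and replies."""
    request = command_queue.get_nowait()
    same = request["version"] == active_version
    request["reply_to"].put(REMAIN if same else SHUT_DOWN)


def standby_version_check(command_queue, standby_version, serve_active):
    """Steps 1 and 3: send the request, then act on the reply."""
    reply_queue = queue.Queue()  # stands in for the temporary queue
    command_queue.put({"version": standby_version, "reply_to": reply_queue})
    serve_active()  # let the active instance process the request
    reply = reply_queue.get(timeout=1)
    if reply == REMAIN:
        # The real agent would now poll SYSTEM.FTE.HA.<agent_name>.
        return "standby"
    # The real agent would log BFGAG0199E and shut down.
    return "stop"
```

Calling `standby_version_check` with matching versions on both sides returns `"standby"`; with mismatched versions it returns `"stop"`.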
    Before this fix, if the standby instance encountered an issue
    while performing the version check, it wrote the messages:
    
    BFGAG0200E: An error occurred while determining the version of
    active instance. The error is 'cc=<completion code> rc=<reason
    code> op=<operation>'.
    BFGAG0071I: The agent has suspended its current transfers and is
    now stopping.
    
    to its event log (output0.log) and shut itself down, regardless
    of whether the issue was a recoverable error (such as the
    instance becoming disconnected from its agent queue manager due
    to a network error) or a more permanent failure. When this
    happened, the instance had to be manually restarted.
    

Problem conclusion

  • To resolve this issue, MQ Managed File Transfer highly available
    agents have been updated so that:
    
    - If a standby instance encounters an issue while performing the
    version check,
    - and that issue represents a recoverable error (such as
    MQRC_CONNECTION_BROKEN),
    
    then the instance will write the message:
    
    BFGAG0183I: The agent received MQI reason code <reason_code>.
    Agent recovery will be initiated.
    
    to its event log and then attempt to reconnect to the agent
    queue manager. This ensures that the instance remains active,
    and does not require a manual restart.
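The changed behaviour can be sketched as a retry loop around the version check. This is a rough illustration, not the agent's actual implementation: the `VersionCheckError` class, the `run_standby` function, its callback parameters and the `max_attempts` limit are all hypothetical, and the set of recoverable reason codes contains only MQRC_CONNECTION_BROKEN (2009) because that is the only one the APAR names.

```python
MQRC_CONNECTION_BROKEN = 2009

# Assumption: the APAR names only this code as an example of a
# recoverable error; the product's full list is not given.
RECOVERABLE = {MQRC_CONNECTION_BROKEN}


class VersionCheckError(Exception):
    """Hypothetical error carrying the MQI reason code of a failed check."""

    def __init__(self, reason_code):
        super().__init__(f"rc={reason_code}")
        self.reason_code = reason_code


def run_standby(version_check, reconnect, log, max_attempts=3):
    """Retry the version check after recoverable errors, else stop."""
    for _ in range(max_attempts):
        try:
            version_check()
            return "standby"
        except VersionCheckError as err:
            if err.reason_code not in RECOVERABLE:
                # Unrecoverable: same outcome as before the fix.
                log(f"BFGAG0200E: version check failed, rc={err.reason_code}")
                return "stopped"
            log(
                "BFGAG0183I: The agent received MQI reason code "
                f"{err.reason_code}. Agent recovery will be initiated."
            )
            reconnect()
    # max_attempts exists only to keep this sketch finite.
    return "retries_exhausted"
```

With a check that fails once with reason code 2009 and then succeeds, the sketch logs one BFGAG0183I line, reconnects once and stays a standby. A failure with a code outside the recoverable set still stops the instance.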
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.3
    v9.x CD    9.2.3
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
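To check whether a given installation is at or beyond the listed maintenance levels, a version comparison along these lines may be useful. It is a sketch under assumptions: the function name `includes_it35599` is hypothetical, version strings are taken to be dotted V.R.M.F numbers where a modification level of 0 denotes LTS, releases later than 9.2 are assumed to carry the fix forward, and releases earlier than 9.2 are reported as not fixed because the APAR lists no delivery for them.

```python
def includes_it35599(version):
    """Return True if a dotted V.R.M.F version is at a listed fix level."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (4 - len(parts))  # pad "9.2.3" to 9.2.3.0
    v, r, m, f = parts[:4]
    if (v, r) != (9, 2):
        # Assumption: later releases include the fix, earlier ones do not.
        return (v, r) > (9, 2)
    if m == 0:
        return f >= 3  # LTS stream: fixed in 9.2.0.3
    return m >= 3      # CD stream: fixed in 9.2.3
```

For example, `includes_it35599("9.2.0.2")` and `includes_it35599("9.2.2")` are false, while `includes_it35599("9.2.0.3")` and `includes_it35599("9.2.3")` are true.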
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT35599

  • Reported component name

    MQ BASE V9.2

  • Reported component ID

    5724H7281

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-01-20

  • Closed date

    2021-03-03

  • Last modified date

    2021-03-03

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ BASE V9.2

  • Fixed component ID

    5724H7281

Applicable component levels

Document Information

Modified date:
25 March 2021