IBM Support

IT35878: Multiple instances of a highly available MFT agent start as the active instance following a queue manager restart

APAR status

  • Closed as program error.

Error description

  • Shortly after two instances of a highly available agent start up
    and connect to the agent queue manager, the agent queue manager
    is stopped. When the queue manager is restarted, one instance of
    the agent reconnects to the queue manager and successfully starts
    as the active instance. The other instance reconnects to the
    queue manager, and writes the following message to its event log
    (output0.log) at regular intervals:
    
    BFGMQ1045I: Agent's system queue
    'SYSTEM.FTE.COMMAND.<agent_name>' is configured as either
    NOSHARE or DEFSOPT(EXCL).
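One way to confirm this symptom is to count how often the message appears per queue in the standby instance's event log. The following is a minimal sketch only: the function name, the sample text, and the agent name AGENT1 are illustrative assumptions, and the pattern relies solely on the message wording quoted above (real log lines may carry additional prefixes such as timestamps).

```python
import re
from collections import Counter

# Matches the message text quoted in this APAR. \s+ tolerates line wrapping.
BFGMQ1045I = re.compile(r"BFGMQ1045I:\s+Agent's\s+system\s+queue\s+'([^']+)'")


def count_bfgmq1045i(log_text):
    """Return a Counter of queue name -> number of BFGMQ1045I occurrences."""
    return Counter(BFGMQ1045I.findall(log_text))


# Hypothetical sample; AGENT1 is a made-up agent name.
sample = (
    "BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.COMMAND.AGENT1' "
    "is configured as either NOSHARE or DEFSOPT(EXCL).\n"
    "BFGMQ1045I: Agent's system queue\n"
    "'SYSTEM.FTE.COMMAND.AGENT1' is configured as either\n"
    "NOSHARE or DEFSOPT(EXCL).\n"
)
counts = count_bfgmq1045i(sample)
```

A count that keeps growing for the same queue while another instance is already active is consistent with the behavior described in this APAR.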
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects all users of MQ Managed File Transfer highly
    available (HA) agents.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    A highly available Managed File Transfer agent consists of:
    
    - One active instance.
    - One or more standby instances.
    
    The first instance of the agent that starts up locks a shared
    resource (the SYSTEM.FTE.HA.agent_name queue on the agent queue
    manager). When the other instances start, they fail to obtain
    the lock and become standby instances. The standby instances
    then attempt to take the lock at regular intervals, as
    specified by the agent property standbyPollInterval. Once a
    standby instance obtains the lock, it becomes the active
    instance.
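The election described above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the product's implementation: the classes SharedQueue and AgentInstance, the instance names, and the agent name AGENT1 are hypothetical, and the real agent takes the lock on a queue manager rather than on a Python object.

```python
class SharedQueue:
    """In-memory stand-in for a queue that only one holder can lock."""

    def __init__(self, name):
        self.name = name
        self.holder = None

    def try_lock(self, instance_id):
        """Return True if instance_id holds the lock after this call."""
        if self.holder in (None, instance_id):
            self.holder = instance_id
            return True
        return False

    def release(self, instance_id):
        """Drop the lock if instance_id currently holds it."""
        if self.holder == instance_id:
            self.holder = None


class AgentInstance:
    def __init__(self, instance_id, ha_queue):
        self.instance_id = instance_id
        self.ha_queue = ha_queue
        self.role = "stopped"

    def poll(self):
        """Run at startup, then once per standbyPollInterval while on standby."""
        locked = self.ha_queue.try_lock(self.instance_id)
        self.role = "active" if locked else "standby"
        return self.role

    def stop(self):
        self.ha_queue.release(self.instance_id)
        self.role = "stopped"


ha_queue = SharedQueue("SYSTEM.FTE.HA.AGENT1")  # AGENT1 is a made-up name
first = AgentInstance("instance-1", ha_queue)
second = AgentInstance("instance-2", ha_queue)

first.poll()    # first to start takes the lock
second.poll()   # fails to lock, so it becomes a standby
roles_after_start = (first.role, second.role)

first.stop()    # active instance goes away and releases the lock
second.poll()   # next poll succeeds
roles_after_failover = (first.role, second.role)
```

In this model the second instance is promoted only on the poll that follows the release of the lock, which mirrors the role of the polling interval.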
    
    After the active instance has locked the shared resource, it
    performs its normal startup operations and then starts
    processing managed transfers.
    
    Now, if the active instance became disconnected from the agent
    queue manager after it had obtained the lock on the shared
    resource and before it had completed its initialization, then it
    would attempt to reconnect to the agent queue manager at regular
    intervals. However, after it had successfully reconnected, it
    did not attempt to relock the shared resource - instead, it
    would just continue with its initialization processing.
    
    This meant that a standby instance could lock the shared
    resource on the agent queue manager and so become an active
    instance too. If this happened, there would now be two active
    instances of the agent that were trying to initialize at the
    same time. The two instances would attempt to open various
    system queues on the agent queue manager for exclusive access.
    One of the instances would be able to access the system queues
    and successfully complete its initialization. The other would
    fail to do so, and would write messages similar to the ones
    shown below to its event log (output0.log):
    
    BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.COMMAND.agent_name'
    is configured as either NOSHARE or DEFSOPT(EXCL).
    BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.EVENT.agent_name'
    is configured as either NOSHARE or DEFSOPT(EXCL).
    
    These messages would be written to the event log at regular
    intervals until the first instance was stopped, at which point
    the other instance would be able to open the system queues and
    initialize successfully.
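The sequence of events that led to two active instances can be replayed in a toy model. This sketch is an assumption-laden illustration: QueueManager and Instance are hypothetical classes, AGENT1 is a made-up agent name, and the order in which the two instances reach the system queues is chosen arbitrarily here.

```python
class QueueManager:
    """Toy model that records which instance holds each exclusive queue."""

    def __init__(self):
        self.holders = {}

    def open_exclusive(self, queue, instance_id):
        holder = self.holders.setdefault(queue, instance_id)
        return holder == instance_id

    def restart(self):
        # A restart drops every connection, so all exclusive opens are lost.
        self.holders.clear()


class Instance:
    def __init__(self, name, qmgr, agent):
        self.name, self.qmgr, self.agent = name, qmgr, agent
        self.role = "standby"
        self.initialized = False
        self.log = []

    def try_lock(self):
        ok = self.qmgr.open_exclusive(f"SYSTEM.FTE.HA.{self.agent}", self.name)
        self.role = "active" if ok else "standby"
        return self.role

    def initialize(self):
        """Open the system queues exclusively and log each one that is busy."""
        failures = 0
        for prefix in ("SYSTEM.FTE.COMMAND", "SYSTEM.FTE.EVENT"):
            queue = f"{prefix}.{self.agent}"
            if not self.qmgr.open_exclusive(queue, self.name):
                failures += 1
                self.log.append(
                    f"BFGMQ1045I: Agent's system queue '{queue}' "
                    "is configured as either NOSHARE or DEFSOPT(EXCL)."
                )
        self.initialized = failures == 0
        return self.initialized


qmgr = QueueManager()
a = Instance("instance-1", qmgr, "AGENT1")
b = Instance("instance-2", qmgr, "AGENT1")

a.try_lock()     # instance-1 becomes active and starts initializing
b.try_lock()     # instance-2 becomes a standby
qmgr.restart()   # queue manager stops before instance-1 has initialized

# Behavior before the fix: instance-1 reconnects but does NOT relock.
b.try_lock()     # the standby now obtains the free lock
a.initialize()   # instance-1 carries on and wins the system queues
b.initialize()   # instance-2 cannot open them and logs BFGMQ1045I

active = [i.name for i in (a, b) if i.role == "active"]
```

After the run, both instances report the active role, but only one of them has completed initialization.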
    

Problem conclusion

  • To resolve this issue, IBM MQ Managed File Transfer highly
    available agents have been updated so that if an active instance
    becomes disconnected from its agent queue manager:
    
    - After it has obtained the lock on the shared resource.
    - And before it has completed its initialization.
    
    then it will attempt to relock the shared resource after it has
    reconnected. If the instance is able to relock the shared
    resource, then it will remain as the active instance of the
    agent, and will continue with its initialization. However, if
    the instance fails to lock the shared resource after it has
    reconnected, then it will become a standby instance. This
    prevents two instances of the agent from being the active
    instance.
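The corrected behavior can be sketched in the same style. This is a simplified model, assuming hypothetical names (HaLock, reconnect_during_initialization, standby_poll, simulate); it covers only the case described here, where the disconnection happens after the lock was obtained and before initialization completed.

```python
class HaLock:
    """Toy stand-in for the lock on the SYSTEM.FTE.HA.agent_name queue."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, instance_id):
        if self.holder in (None, instance_id):
            self.holder = instance_id
            return True
        return False

    def drop(self):
        # Models the queue manager stopping: the holder's connection is lost.
        self.holder = None


def reconnect_during_initialization(instance_id, lock):
    """Fixed behavior: relock after reconnecting, or fall back to standby."""
    return "active" if lock.try_acquire(instance_id) else "standby"


def standby_poll(instance_id, lock):
    """A standby instance retries the lock once per polling interval."""
    return "active" if lock.try_acquire(instance_id) else "standby"


def simulate(first_to_reach_lock):
    """Replay a queue manager restart; return the final role of each instance."""
    lock = HaLock()
    roles = {"instance-1": "standby", "instance-2": "standby"}
    lock.try_acquire("instance-1")
    roles["instance-1"] = "active"   # active, but still initializing
    lock.drop()                      # queue manager restart
    steps = {
        "instance-1": reconnect_during_initialization,
        "instance-2": standby_poll,
    }
    order = [first_to_reach_lock]
    order += [name for name in steps if name != first_to_reach_lock]
    for instance_id in order:
        roles[instance_id] = steps[instance_id](instance_id, lock)
    return roles


outcome_a = simulate("instance-1")   # original active instance relocks first
outcome_b = simulate("instance-2")   # standby wins the lock first
```

Whichever instance reaches the lock first after the restart, the model ends with exactly one active instance.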
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.3
    v9.x CD    9.2.3
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT35878

  • Reported component name

    MQ BASE V9.2

  • Reported component ID

    5724H7281

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-02-11

  • Closed date

    2021-03-25

  • Last modified date

    2021-03-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ BASE V9.2

  • Fixed component ID

    5724H7281

Applicable component levels

  • IBM MQ, version 920, Platform Independent

Document Information

Modified date:
26 March 2021