[V9.1.4 Dec 2019]

Discarding instances, failover, and maintenance in highly available agents

Highly available Managed File Transfer instances can be discarded, can fail over in several ways, and might need maintenance.

Discarding the standby instance status

In some situations the active instance is too busy with transfers to process standby instance status messages, or the standby instance has failed or has stopped publishing status messages for some other reason.

In such scenarios, the active instance, which was aware of the standby instance, waits for the time specified by the standbyStatusDiscardTime property in the agent.properties file before removing the standby instance from its list. The default value for this property is 600 seconds, which is twice the default value of the standbyStatusPublishInterval property.
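The discard rule can be pictured as a timeout measured from the last status message seen from each standby instance. The following Python sketch is illustrative only: the StandbyRegistry class, its method names, and the choice of "greater than or equal" for the comparison are assumptions, not part of Managed File Transfer. Only the 600-second default, and the publish interval of half that value, come from the text above.

```python
from dataclasses import dataclass, field

# Default from the text above; the publish interval is half of it.
STANDBY_STATUS_DISCARD_TIME = 600  # seconds
STANDBY_STATUS_PUBLISH_INTERVAL = STANDBY_STATUS_DISCARD_TIME // 2


@dataclass
class StandbyRegistry:
    """Hypothetical model of the active instance's list of standby instances."""
    discard_time: int = STANDBY_STATUS_DISCARD_TIME
    last_seen: dict = field(default_factory=dict)  # standby name -> timestamp

    def record_status(self, standby: str, now: float) -> None:
        # Called whenever a status message from a standby is processed.
        self.last_seen[standby] = now

    def discard_stale(self, now: float) -> list:
        # Remove every standby whose last status is older than discard_time.
        stale = [name for name, seen in self.last_seen.items()
                 if now - seen >= self.discard_time]
        for name in stale:
            del self.last_seen[name]
        return stale


registry = StandbyRegistry()
registry.record_status("standby1", now=0)
still_known = registry.discard_stale(now=599)   # nothing is discarded yet
discarded = registry.discard_stale(now=600)     # standby1 is discarded
```

With the default values, a standby instance is removed only after two consecutive publish intervals pass without a processed status message.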

Failing over an instance normally

You must use the fteStopAgent command with the -i option to carry out a normal failover.

This ensures that the active instance stops immediately. If you stop an agent without the -i option, the agent continues to run until the active instance completes all ongoing transfers, so the failover might take a long time.

Any in-flight transfers resume from the last known checkpoint.
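As a small illustration of the difference between the two forms of the command, the following Python sketch builds the argument list for fteStopAgent with and without the -i option. The helper name stop_agent_command and the agent name AGENT1 are hypothetical. The sketch only constructs and prints the command line; it does not run it.

```python
import shlex


def stop_agent_command(agent_name: str, immediate: bool = True) -> list:
    """Build the argument list for fteStopAgent.

    With immediate=True the -i option is included, which is the form
    described above for a normal failover.
    """
    args = ["fteStopAgent"]
    if immediate:
        args.append("-i")
    args.append(agent_name)
    return args


# AGENT1 is a placeholder agent name.
failover_cmd = shlex.join(stop_agent_command("AGENT1"))
controlled_cmd = shlex.join(stop_agent_command("AGENT1", immediate=False))
print(failover_cmd)    # fteStopAgent -i AGENT1
print(controlled_cmd)  # fteStopAgent AGENT1
```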

Failing over an instance in other situations

If an active instance ends abnormally, or the whole machine fails, the connection to the agent queue is broken, and the queue manager closes all open queues and connections, including the SYSTEM.FTE.HA.<agent name> queue.

As a result, the standby instance acquires the exclusive GET and completes the rest of the agent initialization.

Again, any in-flight transfers resume from the last known checkpoint.
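The handover described in this section can be modeled as two instances competing for a resource that only one of them can hold at a time. The following Python sketch is a simplified, in-memory model, not the product's implementation: the ExclusiveQueue class, its methods, and the instance and agent names are all hypothetical.

```python
class ExclusiveQueue:
    """Hypothetical stand-in for a queue that one instance opens for exclusive GET."""

    def __init__(self, name: str):
        self.name = name
        self.holder = None  # instance that currently holds the exclusive GET

    def try_open_exclusive(self, instance: str) -> bool:
        # Succeeds only when no other instance holds the queue.
        if self.holder is None:
            self.holder = instance
            return True
        return False

    def connection_closed(self, instance: str) -> None:
        # Models the queue manager closing the queue when the holder's
        # connection breaks.
        if self.holder == instance:
            self.holder = None


ha_queue = ExclusiveQueue("SYSTEM.FTE.HA.AGENT1")  # AGENT1 is a placeholder
roles = {}
for instance in ("instance_a", "instance_b"):
    acquired = ha_queue.try_open_exclusive(instance)
    roles[instance] = "active" if acquired else "standby"

# The active instance ends abnormally; the queue manager releases the queue.
ha_queue.connection_closed("instance_a")
del roles["instance_a"]

# The standby's next attempt now succeeds, so it becomes the active instance.
if ha_queue.try_open_exclusive("instance_b"):
    roles["instance_b"] = "active"
```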

If a connection to the queue manager breaks

Client mode

An agent process consists of several threads. In addition to the default threads, such as the thread that publishes agent status at regular intervals, every transfer request is handled by a set of threads that end after the transfer completes.

Many of these threads connect to the agent queue manager to put and get messages. Any of these connections can break because of a network issue or a queue manager failure. When a thread detects a broken connection, it informs the main thread to initiate recovery, and then ends.

The main thread then launches another thread that waits for the connection to the queue manager to be re-established. Once reconnected, the instance attempts to acquire the exclusive GET for the agent. If that succeeds, the instance completes the recovery and becomes the active instance. If the attempt fails, the instance becomes a standby instance.
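The client mode recovery sequence amounts to "reconnect first, then decide the role". The following Python sketch shows that control flow under simplifying assumptions: the recover function, its connect and try_exclusive_get callables, and the max_attempts limit are hypothetical, and the waiting between attempts that a real agent would do is left out.

```python
from typing import Callable


def recover(connect: Callable[[], bool],
            try_exclusive_get: Callable[[], bool],
            max_attempts: int = 5) -> str:
    """Retry the connection, then decide the role of this instance.

    connect() returns True once the queue manager connection is back.
    try_exclusive_get() returns True if the exclusive GET was acquired.
    """
    for _ in range(max_attempts):
        if connect():
            # Reconnected: the outcome of the exclusive GET decides the role.
            return "active" if try_exclusive_get() else "standby"
    # Assumption for this sketch only: give up after max_attempts.
    return "disconnected"


# Two failed connection attempts, then success; another instance already
# holds the exclusive GET, so this instance becomes a standby.
attempts = iter([False, False, True])
role = recover(lambda: next(attempts), lambda: False)
```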

Bindings mode

When connecting in bindings mode, if an agent loses its connection, the agent process ends. The process controller restarts the agent. When the agent restarts, it attempts to acquire the exclusive GET for itself.

If the agent succeeds, it becomes an active instance; otherwise the agent becomes a standby instance.

Applying maintenance level upgrades

The steps for applying maintenance to highly available agents are similar to those documented for multi-instance queue managers. For more information, see Applying maintenance level updates to multi-instance queue managers on Windows, Applying maintenance level updates to multi-instance queue managers on AIX®, Applying maintenance level updates to multi-instance queue managers on Solaris, or Applying maintenance level updates to multi-instance queue managers on Linux®.

You must stop the agent running on the machine where the maintenance level is to be applied before you apply the maintenance. If you are updating an active instance, you must fail over the active instance to a standby instance to maintain continuity of transfers.

Once the upgrade is complete, you must start the agent instance, fail over the current active instance to the upgraded instance, and then upgrade the standby instance.
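For the case where maintenance starts with the active instance, the order of operations can be written out as a plan. The following Python sketch only generates a list of step descriptions; the function name rolling_maintenance_plan and the instance names are hypothetical, and the final step (restarting the second instance after its upgrade) is an assumption that the text above implies but does not state.

```python
def rolling_maintenance_plan(active: str, standby: str) -> list:
    """Return the ordered steps for upgrading both instances, active first."""
    return [
        f"fail over {active} to {standby} (stop {active} with fteStopAgent -i)",
        f"apply maintenance on the machine hosting {active}",
        f"start {active}; it joins as a standby instance",
        f"fail over {standby} to {active} (stop {standby} with fteStopAgent -i)",
        f"apply maintenance on the machine hosting {standby}",
        f"start {standby}; it joins as a standby instance (assumed final step)",
    ]


plan = rolling_maintenance_plan("instance_a", "instance_b")
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```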

Migrating agents from an earlier version of the product

Agents migrated from versions of IBM® MQ earlier than IBM MQ 9.1.4 do not run in high availability mode. You can make them highly available by following the procedure in Migrating Managed File Transfer agents from an earlier version.