Failover

Failover occurs when an IBM® i system in a cluster automatically switches over to one or more backup nodes in the event of a system failure.

Contrast this with a switchover, which happens when you manually switch access from one server to another. A switchover and a failover function identically once they have been triggered. The only difference is how the event is triggered.

When a failover occurs, access is switched from the cluster node currently acting as the primary node in the recovery domain of the cluster resource group to the cluster node designated as the first backup.
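The node selection described above can be modeled with a short sketch. This is illustrative Python, not IBM i cluster code: the `RecoveryDomainNode` class, the role labels, the node names, and the `next_primary` function are all hypothetical. It also assumes every backup node is eligible to take over.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryDomainNode:
    name: str
    role: str              # "primary" or "backup" (hypothetical labels)
    backup_order: int = 0  # 1 = first backup; unused for the primary


def next_primary(recovery_domain: List[RecoveryDomainNode]) -> str:
    """Return the name of the node that takes over on failover."""
    backups = [n for n in recovery_domain if n.role == "backup"]
    if not backups:
        raise ValueError("recovery domain has no backup node")
    # The first backup is the backup with the lowest order number.
    return min(backups, key=lambda n: n.backup_order).name


# Hypothetical three-node recovery domain.
domain = [
    RecoveryDomainNode("NODE_A", "primary"),
    RecoveryDomainNode("NODE_C", "backup", 2),
    RecoveryDomainNode("NODE_B", "backup", 1),
]
print(next_primary(domain))
```

In this model, access moves from NODE_A to NODE_B because NODE_B is designated as the first backup.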

When multiple cluster resource groups (CRGs) are involved in a failover action, the system processes the device CRGs first, the data CRGs second, and the application CRGs last.
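The processing order can be illustrated with a minimal sketch. The CRG names and the `failover_sequence` function are hypothetical, and the order among CRGs of the same type is an assumption of the sketch (input order is kept), not something stated here.

```python
# Processing order stated in the text: device, then data, then application.
FAILOVER_ORDER = {"device": 0, "data": 1, "application": 2}


def failover_sequence(crgs):
    """Sort (name, type) pairs into the order in which failover handles them."""
    # sorted() is stable, so CRGs of the same type keep their input order.
    return sorted(crgs, key=lambda crg: FAILOVER_ORDER[crg[1]])


# Hypothetical CRGs involved in one failover action.
crgs = [
    ("APP_CRG", "application"),
    ("DEV_CRG", "device"),
    ("DATA_CRG", "data"),
]
for name, crg_type in failover_sequence(crgs):
    print(crg_type, name)
```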

With device CRGs, failover processing varies off the devices associated with the CRG. The devices are varied off even if the failover is canceled through the cluster message queue or failover message queue. Some system actions that cause a failover, such as ending TCP/IP, do not affect the entire system, so users and jobs may still need access to the device. You may want to end the CRG before taking those system actions and keep the devices varied on for the following reasons:
  • When you are performing a save with Option 21 after ending all subsystems (ENDSBS *ALL).
  • When you are performing routine fixes by ending subsystems or ending TCP/IP and do not want to spend extra time varying devices off and on.
  • When the entire system is not ending and other jobs might still need access to the device.

The failover message queue receives messages regarding failover activity for each CRG defined in the cluster. You can also use the cluster message queue to receive a single message for all the CRGs failing over to the same node. Both queues allow you to control the failover processing of cluster resource groups and nodes. If you have both the cluster message queue and the failover message queue configured, the cluster message queue takes priority. If you prefer a failover message for each CRG within a cluster, do not configure the cluster message queue. For either message queue, you can use IBM i watch support to monitor the queue for activity.
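The priority rule between the two queues can be sketched as follows. This is a simplified model in Python, not an IBM i API: the function name, queue names, and CRG names are hypothetical. It assumes one message per target node on the cluster message queue and one message per CRG on the failover message queues.

```python
from typing import Dict, List, Optional, Tuple


def plan_failover_messages(
    crg_targets: Dict[str, str],     # CRG name -> node it fails over to
    cluster_msgq: Optional[str],     # cluster message queue, if configured
    failover_msgqs: Dict[str, str],  # CRG name -> its failover message queue
) -> List[Tuple[str, str]]:
    """Return (queue, subject) pairs for the messages that would be sent."""
    if cluster_msgq is not None:
        # The cluster message queue takes priority: one message covers
        # all CRGs failing over to the same node.
        nodes = sorted(set(crg_targets.values()))
        return [(cluster_msgq, f"node {node}") for node in nodes]
    # Otherwise each CRG with a failover message queue gets its own message.
    return [
        (failover_msgqs[crg], f"CRG {crg}")
        for crg in sorted(crg_targets)
        if crg in failover_msgqs
    ]


# Hypothetical configuration: two CRGs failing over to the same node.
targets = {"CRG1": "NODE_B", "CRG2": "NODE_B"}
queues = {"CRG1": "FAILQ1", "CRG2": "FAILQ2"}
print(plan_failover_messages(targets, "CLUMSGQ", queues))
print(plan_failover_messages(targets, None, queues))
```

With the cluster message queue configured, the model produces a single message for NODE_B. Without it, each CRG produces its own message on its failover message queue.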