Engine tier failover recovery

If the engine tier, possibly together with other tier software, is set up in an active-passive configuration, a hardware or network error on the active server causes a failover to the passive server. You can also force a failover to free the active server for maintenance or upgrade tasks.

The high availability (HA) software that is installed on the servers manages the fault detection and failover process.

During a failover, the sequence of events differs depending on whether the failover is due to a failure or is forced.

Failover due to a failure

When the active server hardware or network fails, the heartbeat mechanism between the nodes signals the passive server that the active server has failed. The HA software restores service on the passive server by taking the following actions:
  • Ensures that the failed active server is no longer running.
  • Assigns the IP address that is associated with the resource group to the new server.
  • Mounts the floating mount point for the software on the new server.
  • Starts the engine tier software on the new server by calling the InfoSvrEngine script with the start option.
  • If other tier software is installed on the server, the HA software starts it by calling the InfoSvrServices script with the start option. This script starts the services tier. It also starts the metadata repository tier if the tier is installed with the engine tier.
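The order of these actions can be sketched in Python as follows. This is only an illustrative model, not how the HA software is configured: real cluster software defines the sequence through its own resource groups and dependencies. The function name and the step descriptions are assumptions made for the example; only the InfoSvrEngine and InfoSvrServices script names and their start option come from the description above.

```python
from typing import List

# Script names as described in this topic. Their installation paths are
# deliberately omitted because they depend on your environment.
ENGINE_SCRIPT = "InfoSvrEngine"
SERVICES_SCRIPT = "InfoSvrServices"


def failure_failover_steps(other_tiers_installed: bool) -> List[str]:
    """Return the ordered actions taken on the passive server after a failure.

    Hypothetical helper for illustration; the HA software performs these
    actions itself.
    """
    steps = [
        "ensure that the failed active server is no longer running",
        "assign the resource group IP address to the new server",
        "mount the floating mount point on the new server",
        f"{ENGINE_SCRIPT} start",
    ]
    # The services script is called only when other tier software shares
    # the server with the engine tier.
    if other_tiers_installed:
        steps.append(f"{SERVICES_SCRIPT} start")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(failure_failover_steps(True), start=1):
        print(f"{number}. {step}")
```

The sketch makes one point explicit: the engine tier is started only after the IP address and the floating mount point are available on the new server, and the services script runs last.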

Forced failover

When you force a failover, the HA software shuts down the software on the active node before starting it on the other node. The HA software performs the following steps:
  • If software is installed for tiers other than the engine tier, the HA software stops it by calling the InfoSvrServices script with the stop option. This script stops the services tier. It also stops the metadata repository tier if the tier is installed with the engine tier.
  • Stops the engine tier software on the server by calling the InfoSvrEngine script with the stop option.
  • Unmounts the floating mount point for the software.
  • Unmounts the data files mount point.
  • Unassigns the IP address associated with the resource group from the old server.
  • Reassigns the resource group IP address to the new server and mounts the floating mount point on the new server.
  • Starts the engine tier software on the new server by calling the InfoSvrEngine script with the start option.
  • If other tier software is installed on the server, the HA software starts it by calling the InfoSvrServices script with the start option. This script starts the services tier. It also starts the metadata repository tier if the tier is installed with the engine tier.
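A forced failover is an orderly shutdown on the old server followed by a startup on the new one. The following Python sketch models that as a list of (node, action) pairs. It is a simplified illustration under stated assumptions: the function name, the node labels, and the action descriptions are hypothetical, and the sketch assumes that the IP address reassignment and the mount of the floating mount point take place on the new server.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (node on which the action runs, action)


def forced_failover_steps(old: str, new: str,
                          other_tiers_installed: bool) -> List[Step]:
    """Return the ordered (node, action) pairs for a forced failover.

    Hypothetical helper for illustration; the HA software performs these
    steps itself.
    """
    steps: List[Step] = []
    # Shutdown on the old server: other tiers first, then the engine tier.
    if other_tiers_installed:
        steps.append((old, "InfoSvrServices stop"))
    steps += [
        (old, "InfoSvrEngine stop"),
        (old, "unmount the floating mount point"),
        (old, "unmount the data files mount point"),
        (old, "unassign the resource group IP address"),
        # Startup on the new server (assumed location of these two steps).
        (new, "assign the resource group IP address"),
        (new, "mount the floating mount point"),
        (new, "InfoSvrEngine start"),
    ]
    if other_tiers_installed:
        steps.append((new, "InfoSvrServices start"))
    return steps


if __name__ == "__main__":
    for node, action in forced_failover_steps("node1", "node2", True):
        print(f"{node}: {action}")
```

Note that the software is stopped in the reverse of the order in which it is started: the services script stops before the engine tier on the old server, and starts after the engine tier on the new server.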

Recovery process

In a production system, if the server engine services did not shut down normally, the DSHARestart tool starts automatically on the passive server. The tool checks and repairs dynamic files that are associated with any jobs that were running on the active server when the failover occurred. The state of these jobs is set to crashed so that you can identify them easily.

In a development system where users were creating, editing, or compiling projects when the failover occurred, the restart might leave projects in an inconsistent state. You can use the SyncProject tool to resolve any inconsistencies in these projects.

Note: After the recovery process is complete, you must restart any clients that have an active connection to the engine tier. For web clients, log out and log in again for the changes to take effect.
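
The recovery rules above can be summarized as a small decision helper. The Python sketch below is a checklist aid only: the enumeration, the function, and its parameters are hypothetical names introduced for this example, and the sketch does not call DSHARestart or SyncProject or assume anything about their options.

```python
from enum import Enum
from typing import List


class SystemType(Enum):
    PRODUCTION = "production"
    DEVELOPMENT = "development"


def recovery_actions(system: SystemType,
                     clean_shutdown: bool,
                     projects_being_edited: bool = False) -> List[str]:
    """Return the follow-up actions to expect or perform after a failover.

    Hypothetical helper for illustration; it mirrors only the cases that
    are described in this topic.
    """
    actions: List[str] = []
    if system is SystemType.PRODUCTION and not clean_shutdown:
        # Automatic: no administrator action is needed to start the tool.
        actions.append("DSHARestart runs automatically; "
                       "review jobs whose state is crashed")
    if system is SystemType.DEVELOPMENT and projects_being_edited:
        # Manual: the administrator runs the tool where needed.
        actions.append("use SyncProject on projects that are inconsistent")
    # Applies in every case after recovery is complete.
    actions.append("restart connected clients; "
                   "web clients log out and log in again")
    return actions


if __name__ == "__main__":
    for action in recovery_actions(SystemType.PRODUCTION, clean_shutdown=False):
        print("-", action)
```

The client restart is listed unconditionally because it applies after every recovery, whereas the two tools apply only in the specific situations described above.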