Automated remote restart

Automated remote restart monitors hosts for failure by using the PRS (Platform Resource Scheduler) HA service. If a host fails, PowerVC automatically remote restarts the virtual machines from the failed host to another host within a host group.

When automated remote restart is not enabled and a host goes into Error or Down state, you must trigger the remote restart operation manually. You can also manually remote restart virtual machines from a host at any time, regardless of its automated remote restart setting. For details, see Remotely restart virtual machines from a failed host.

Note: By default, the automated remote restart feature is disabled so that the administrator can select the hosts that should be considered for automated remote restart.

Overview: Managing automated remote restart on a host group, host, or virtual machine

Automated remote restart can be enabled or disabled for each host group, host, and virtual machine. By default, automated remote restart is disabled on host groups and enabled on hosts and virtual machines. However, no automated remote restart occurs unless it is enabled on the host group. You can disable automated remote restart on individual hosts or virtual machines at any time. The following scenarios describe what happens when automated remote restart is set up in different ways for this environment:

This figure shows how you can include or exclude automated remote restart:
  1. Automated remote restart is disabled on the host group: No automated remote restart is performed on Host 1, Host 2, or Host 3.
  2. Automated remote restart is enabled on the host group but Host 1 is excluded from automated remote restart actions: If necessary, virtual machines on Host 2 and Host 3 are automatically remote restarted. Virtual machines on Host 1 are not automatically remote restarted. However, virtual machines on Host 2 and Host 3 could be restarted on Host 1. That is, Host 1 cannot be a source for automated remote restart, but it can be a destination.
  3. Automated remote restart is enabled on the host group and enabled on all hosts, but Virtual Machine A is excluded from automated remote restart actions: Virtual Machine A is never automatically remote restarted. All other virtual machines can be automatically remote restarted to any host in the host group.
  4. Automated remote restart is enabled on the host group and no host or virtual machine is excluded from automated remote restart: Automated remote restart is performed on all of the hosts, and all of the virtual machines can be automatically remote restarted to any other host within the host group.
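The four scenarios above follow from one rule: a virtual machine is automatically remote restarted only if the setting is enabled on its host group, on its source host, and on the virtual machine itself. The following Python sketch models that rule for illustration only. The class and function names are hypothetical and are not part of any PowerVC API.

```python
from dataclasses import dataclass


@dataclass
class RestartSettings:
    """Hypothetical model of the three automated remote restart settings."""
    group_enabled: bool = False  # documented default: disabled on host groups
    host_enabled: bool = True    # documented default: enabled on hosts
    vm_enabled: bool = True      # documented default: enabled on virtual machines


def can_auto_restart(settings: RestartSettings) -> bool:
    """Return True only if the group, the source host, and the VM all allow it."""
    return settings.group_enabled and settings.host_enabled and settings.vm_enabled


# Scenario 1: disabled on the host group, so nothing is restarted.
print(can_auto_restart(RestartSettings()))
# Scenario 2: enabled on the group, but the source host (Host 1) is excluded.
print(can_auto_restart(RestartSettings(group_enabled=True, host_enabled=False)))
# Scenario 3: enabled on the group and host, but Virtual Machine A is excluded.
print(can_auto_restart(RestartSettings(group_enabled=True, vm_enabled=False)))
# Scenario 4: enabled on the group with no exclusions.
print(can_auto_restart(RestartSettings(group_enabled=True)))
```

The sketch covers only source eligibility. As scenario 2 states, a host that is excluded as a source can still be chosen as a destination.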

Failure detection algorithms

To verify that a host is down, PowerVC performs the following failure detection checks.
  • For HMC managed hosts
    • The host is in Power off, Error, Error - dump in progress, or FSP unreachable state.
    • If the host is in FSP unreachable state, there are no active Fibre Channel ports on the host's Virtual I/O Servers.
      Note: This check only gives extra assurance that the host is down if the fabrics associated with the host's Virtual I/O Server are managed by PowerVC. Because automated remote restart performs checks on the fabric switch to confirm host availability, it is recommended that you register fabric switches in PowerVC.
    • There must be at least one fabric registered on PowerVC.
  • For PowerVM NovaLink managed hosts
    • The compute service is down on the host.
    • If the compute service is down, the NovaLink partition is unreachable via SSH.
    • If the SSH connection is unreachable, there are no active Fibre Channel ports on the Fibre Channel switch for the host's Virtual I/O Servers.
    • There must be at least one fabric registered on PowerVC.
    • For virtual machines that use shared storage pool-backed volumes, the host state is down on the shared storage pool cluster.
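The checks above can be read as a chain in which each later check runs only if the earlier one indicates a failure. The following Python sketch shows one possible reading of that chain. It is not the PowerVC implementation: the function names and parameters are hypothetical, and treating the "at least one fabric registered" item as a prerequisite that blocks a "down" verdict is an assumption.

```python
from typing import Optional

# Host states listed in the HMC checks above.
HMC_DOWN_STATES = {"Power off", "Error", "Error - dump in progress", "FSP unreachable"}


def hmc_host_is_down(state: str, active_fc_ports: int, fabrics_registered: int) -> bool:
    """Hypothetical reading of the checks for an HMC managed host."""
    if fabrics_registered < 1:       # assumption: treated as a prerequisite
        return False
    if state not in HMC_DOWN_STATES:
        return False
    if state == "FSP unreachable":   # extra check applies only in this state
        return active_fc_ports == 0
    return True


def novalink_host_is_down(compute_service_up: bool, ssh_reachable: bool,
                          active_fc_ports: int, fabrics_registered: int,
                          ssp_host_down: Optional[bool] = None) -> bool:
    """Hypothetical reading of the checks for a NovaLink managed host.

    ssp_host_down is None when no virtual machine on the host uses
    shared storage pool-backed volumes.
    """
    if fabrics_registered < 1:       # assumption: treated as a prerequisite
        return False
    if compute_service_up or ssh_reachable or active_fc_ports > 0:
        return False
    if ssp_host_down is not None and not ssp_host_down:
        return False
    return True


print(hmc_host_is_down("FSP unreachable", active_fc_ports=0, fabrics_registered=1))
print(novalink_host_is_down(False, False, 0, 1, ssp_host_down=True))
```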

Enable automated remote restart on a host group

You can enable or disable automated remote restart at any time. To change the automated remote restart setting on a host group, follow these steps:

  1. Navigate to Hosts > Host Groups and select the host group on which you want to enable automated remote restart, then click Edit.
  2. Select Enable automated remote restart, fill out the settings, and click Save Host Group.
    Automated remote restart relies on these values that are specified on a host group:
    Run interval
    How often the state of the host is checked.
    Stabilization
    The number of consecutive run intervals that the host must be down before an automated remote restart operation is initiated.
    Before initiating the remote restart of virtual machines from the source host, PowerVC verifies that the host is down x times in a row, where x is the stabilization value.
    Note: Ensure that Run interval x Stabilization is at least 5 minutes. For example, 5 minutes x 2 times = 10 minutes.
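The relationship between Run interval and Stabilization can be made concrete with a little arithmetic. The following Python sketch is illustrative only. The function names are hypothetical, and the list of check results stands in for whatever state PowerVC tracks internally.

```python
from typing import Sequence

MIN_WINDOW_MINUTES = 5  # minimum from the note above


def detection_window(run_interval_minutes: float, stabilization: int) -> float:
    """Minutes a host must stay down before a restart is initiated."""
    return run_interval_minutes * stabilization


def settings_are_valid(run_interval_minutes: float, stabilization: int) -> bool:
    """Check that Run interval x Stabilization is at least 5 minutes."""
    return detection_window(run_interval_minutes, stabilization) >= MIN_WINDOW_MINUTES


def should_trigger_restart(checks: Sequence[bool], stabilization: int) -> bool:
    """checks holds one entry per run interval, oldest first; True means 'down'.

    A restart is due only when the most recent `stabilization` checks
    all found the host down.
    """
    if stabilization < 1 or len(checks) < stabilization:
        return False
    return all(checks[-stabilization:])


print(detection_window(5, 2))                          # 5 minutes x 2 times
print(settings_are_valid(1, 3))                        # 3 minutes is too short
print(should_trigger_restart([False, True, True], 2))  # down twice in a row
```

With a run interval of 5 minutes and a stabilization value of 2, a single healthy check between two failed ones resets the count, so a host that recovers briefly does not trigger a restart.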

Enable or disable automated remote restart on a host or virtual machine

By default, if automated remote restart is enabled on a host group, the action is performed on any host in the host group that is down and on every virtual machine on that host. However, you can change this setting at any time. To change the automated remote restart setting on a host or virtual machine, follow these steps:

  1. Open the Hosts or the Virtual Machines page.
  2. Double-click the host or virtual machine that you want to work with, find the automated remote restart option, and click Edit.
  3. In the dialog that appears, select or deselect the check box as appropriate.
Note: PowerVC notifies you when a virtual machine is being automatically remote restarted to a host on which automated remote restart is not enabled.

Considerations

You need to understand these considerations about the automated remote restart process:
  • To be eligible for automated remote restart, the hosts and the virtual machines must meet all remote restart requirements. For details, see Remote restart considerations.
  • If the remote restart operation fails on a virtual machine, PowerVC does not retry the operation. Such a virtual machine is added to the failed-to-rebuild list and the host is set to Error state.
  • For NovaLink managed hosts, if the shared storage pool service is running on the host that is down, you must manually switch the service to run on another NovaLink or HMC host. This can be done from the Storage Provider page of the user interface.
  • To prevent virtual machines from being automatically remote restarted when a host is powered off normally, disable the automated remote restart option on the host or host group.
  • If the source host recovers during the automated remote restart process, PowerVC stops the remote restart operation. Virtual machines that have not been migrated remain on the source host.
  • Virtual machines are automatically remote restarted based on the Availability Priority settings as specified in the compute template, where the virtual machine with the highest value is restarted first. For example, a virtual machine that has the priority set to 100 is remote restarted before a virtual machine with the priority value set to 20.
  • After automated remote restart has run on a host and the host comes back up, it is put into maintenance mode. The administrator can check the log file before bringing the host up and performing any new deploy or migration operations.
    Notes:
    • If the host is in Maintenance error state, then click Exit Maintenance Mode to safely bring the host out of maintenance mode.

After the host exits maintenance mode (after 5 minutes) or when the host becomes active, PowerVC automatically recalls virtual machines to the host in either of these cases: the virtual machines are pinned and were moved from the host, or the host has the 'Recall enabled' option set to Yes.
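The Availability Priority consideration above amounts to sorting the virtual machines on the failed host by priority, highest value first. The following Python sketch illustrates that ordering. The virtual machine names and the tuple layout are made up for the example, and the order among virtual machines with equal priority is not specified by this topic.

```python
from typing import List, Tuple

# (virtual machine name, Availability Priority) pairs; values are illustrative.
VirtualMachine = Tuple[str, int]


def restart_order(vms: List[VirtualMachine]) -> List[VirtualMachine]:
    """Return the virtual machines with the highest priority value first."""
    return sorted(vms, key=lambda vm: vm[1], reverse=True)


vms = [("vm-a", 20), ("vm-b", 100), ("vm-c", 50)]
for name, priority in restart_order(vms):
    print(name, priority)
```

In this example, the virtual machine with priority 100 is restarted before the ones with priority 50 and 20.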