Automated remote restart
Automated remote restart monitors hosts for failure by using the PRS (Platform Resource Scheduler) HA service. If a host fails, PowerVC automatically remote restarts the virtual machines from the failed host to another host within a host group.
Without automated remote restart enabled, when a host goes into Error or Down state, you must manually trigger the remote restart operation. You can also manually remote restart virtual machines from a host at any time, regardless of its automated remote restart setting. For details, see Remotely restart virtual machines from a failed host.
Overview: Managing automated remote restart on a host group, host, or virtual machine
Automated remote restart can be enabled or disabled for each host group, host, and virtual machine. By default, automated remote restart is disabled on host groups and enabled on hosts and virtual machines. However, no automated remote restart occurs unless automated remote restart is enabled on the host group. You can disable automated remote restart on individual hosts or virtual machines at any time. The following scenarios describe what happens when automated remote restart is set up in different ways for an environment with one host group that contains Host 1, Host 2, and Host 3:

- Automated remote restart is disabled on the host group: No automated remote restart is performed on Host 1, Host 2, or Host 3.
- Automated remote restart is enabled on the host group but Host 1 is excluded from automated remote restart actions: If necessary, virtual machines on Host 2 and Host 3 are automatically remote restarted. Virtual machines on Host 1 are not automatically remote restarted. However, virtual machines on Host 2 and Host 3 could be restarted on Host 1. That is, Host 1 cannot be a source for automated remote restart, but it can be a destination.
- Automated remote restart is enabled on the host group and enabled on all hosts, but Virtual Machine A is excluded from automated remote restart actions: Virtual Machine A is never automatically remote restarted. All other virtual machines can be automatically remote restarted to any host in the host group.
- Automated remote restart is enabled on the host group and no host or virtual machine is excluded from automated remote restart: Automated remote restart is performed on all of the hosts, and all of the virtual machines can be automatically remote restarted to any other host within the host group.
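The scenarios above can be read as a simple rule: a virtual machine is automatically remote restarted only when the setting is enabled at all three levels, while any other host in the host group can still act as a destination. The following Python sketch illustrates that reading. The `RestartSettings` class and both function names are hypothetical, introduced for illustration only; they are not part of any PowerVC API.

```python
from dataclasses import dataclass


@dataclass
class RestartSettings:
    """Hypothetical container for the three settings (not a PowerVC type)."""
    host_group_enabled: bool = False  # default: disabled on host groups
    host_enabled: bool = True         # default: enabled on hosts
    vm_enabled: bool = True           # default: enabled on virtual machines


def is_auto_restarted(settings: RestartSettings) -> bool:
    """A VM is a candidate only if every level allows automated remote restart."""
    return (
        settings.host_group_enabled
        and settings.host_enabled
        and settings.vm_enabled
    )


def destination_hosts(hosts, failed_host):
    """An excluded host cannot be a source, but it can still be a destination,
    so the only host removed from the candidate list is the failed one."""
    return [h for h in hosts if h != failed_host]


if __name__ == "__main__":
    # Host group enabled, Host 1 excluded: VMs on Host 1 are not restarted.
    print(is_auto_restarted(RestartSettings(host_group_enabled=True, host_enabled=False)))
    # Host 2 fails: Host 1 remains a valid destination.
    print(destination_hosts(["Host 1", "Host 2", "Host 3"], "Host 2"))
```

With the defaults, `is_auto_restarted` returns `False` because the host group setting starts out disabled, which matches the first scenario.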
Failure detection algorithms
- For HMC managed hosts
- The host is in Power off, Error, Error - dump in progress, or FSP unreachable state.
- If the host is in FSP unreachable state, there are no active Fibre Channel ports on the host's Virtual I/O Servers.
Note: This check only gives extra assurance that the host is down if the fabrics associated with the host's Virtual I/O Server are managed by PowerVC. As automated remote restart performs checks on the fabric switch to confirm host availability, it is recommended you register fabric switches in PowerVM® NovaLink.
- There must be at least one fabric registered on PowerVC.
- For PowerVM NovaLink managed hosts
- The compute service is down on the host.
- If the compute service is down, the NovaLink partition is unreachable via SSH.
- If the SSH connection is unreachable, there are no active Fibre Channel ports on the Fibre Channel switch for the host's Virtual I/O Servers.
- There must be at least one fabric registered on PowerVC.
- For virtual machines that use shared storage pool-backed volumes, the host state is down on the shared storage pool cluster.
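One possible reading of the checks above, expressed as a decision function, is sketched below. This is an illustration only: the `HostProbe` fields and the `looks_failed` function are hypothetical names, the way the checks are combined (all conditions must hold, with a registered fabric treated as a prerequisite) is an assumption, and the shared storage pool check is left out. It does not describe how PowerVC implements detection internally.

```python
from dataclasses import dataclass

# HMC states listed above as indicating a possible host failure.
HMC_FAILURE_STATES = {
    "Power off",
    "Error",
    "Error - dump in progress",
    "FSP unreachable",
}


@dataclass
class HostProbe:
    """Hypothetical snapshot of the inputs to the checks (not a PowerVC type)."""
    management: str                  # "hmc" or "novalink"
    hmc_state: str = ""              # state reported by the HMC, if HMC managed
    compute_service_up: bool = True  # NovaLink managed hosts only
    ssh_reachable: bool = True       # NovaLink partition reachable via SSH
    active_fc_ports: int = 0         # active Fibre Channel ports for the host's Virtual I/O Servers
    registered_fabrics: int = 0      # fabrics registered on PowerVC


def looks_failed(p: HostProbe) -> bool:
    """Return True if every check in this sketch points to a failed host."""
    # Assumption: at least one registered fabric is a prerequisite.
    if p.registered_fabrics < 1:
        return False
    if p.management == "hmc":
        if p.hmc_state not in HMC_FAILURE_STATES:
            return False
        # The Fibre Channel port check applies only to FSP unreachable.
        if p.hmc_state == "FSP unreachable":
            return p.active_fc_ports == 0
        return True
    if p.management == "novalink":
        # Each check is consulted only if the previous one indicates failure.
        return (
            not p.compute_service_up
            and not p.ssh_reachable
            and p.active_fc_ports == 0
        )
    raise ValueError(f"unknown management type: {p.management!r}")
```

For example, an HMC managed host in FSP unreachable state that still shows active Fibre Channel ports is not treated as failed in this sketch, because the fabric check contradicts the HMC state.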
Enable automated remote restart on a host group
You can enable or disable automated remote restart at any time. To change the automated remote restart setting on a host group, follow these steps:
- Navigate to Hosts > Host Groups and select the host group on which you want to enable automated remote restart, then click Edit.
- Select Enable automated remote restart, fill out the settings, and click Save Host Group.
Automated restart relies on these values that are specified on a host group:
- Run interval: The frequency at which the state of the host is checked.
- Stabilization: The number of consecutive run intervals that the host must be down before an automated remote restart operation is initiated.
Before initiating the remote restart of virtual machines from the source host, PowerVC verifies that the host is down x times in a row, where x is the stabilization value.
Note: Ensure that Run interval x Stabilization is at least 5 minutes. For example, 5 minutes x 2 times = 10 minutes.
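The relationship between the two settings can be checked with a little arithmetic. The sketch below computes the detection window (Run interval x Stabilization), tests it against the 5-minute minimum, and shows what "down x times in a row" means for a series of checks. All function names are hypothetical and serve only to illustrate the text above.

```python
from typing import Sequence

MINIMUM_WINDOW_MINUTES = 5  # minimum for Run interval x Stabilization


def detection_window_minutes(run_interval_minutes: float, stabilization: int) -> float:
    """Time the host must stay down before a restart is initiated."""
    return run_interval_minutes * stabilization


def meets_minimum(run_interval_minutes: float, stabilization: int) -> bool:
    """True if Run interval x Stabilization is at least 5 minutes."""
    window = detection_window_minutes(run_interval_minutes, stabilization)
    return window >= MINIMUM_WINDOW_MINUTES


def should_trigger(host_down_checks: Sequence[bool], stabilization: int) -> bool:
    """True if the most recent `stabilization` checks all found the host down."""
    if stabilization < 1:
        raise ValueError("stabilization must be at least 1")
    recent = list(host_down_checks)[-stabilization:]
    return len(recent) == stabilization and all(recent)


if __name__ == "__main__":
    print(detection_window_minutes(5, 2))         # 5 minutes x 2 checks
    print(meets_minimum(1, 3))                    # 3-minute window is too short
    print(should_trigger([False, True, True], 2))
```

A single check that finds the host up resets the count in this sketch, because only an unbroken run of failed checks at the end of the history qualifies.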
Enable or disable automated remote restart on a host or virtual machine
By default, if automated remote restart is enabled on a host group, the action is performed on any host in the host group that is down and on every virtual machine on that host. However, you can change this setting at any time. To change the automated remote restart setting on a host or virtual machine, follow these steps:
- Open the Hosts or the Virtual Machines page.
- Double-click the host or virtual machine that you want to work with, find the automated remote restart option and click Edit.
- In the dialog that appears, select or deselect the check box as appropriate.
Considerations
- To be eligible for automated remote restart, the hosts and the virtual machines must meet all remote restart requirements. For details, see Remote restart considerations.
- If the remote restart operation fails on a virtual machine, PowerVC does not retry the operation. Such a virtual machine is added to the failed-to-rebuild list and the host is set to Error state.
- For NovaLink managed hosts, if the shared storage pool service is running on the host that is down, you must manually switch the service to run on another NovaLink or HMC host. This can be done from the Storage Provider page of the user interface.
- To prevent virtual machines from being automatically remote restarted when a host is powered off normally, disable the automated remote restart option on the host or host group.
- If the source host recovers during the automated remote restart process, PowerVC stops the remote restart operation. Virtual machines that have not been migrated remain on the source host.
- Virtual machines are automatically remote restarted based on the Availability Priority settings as specified in the compute template, where the virtual machine with the highest value is restarted first. For example, a virtual machine that has the priority set to 100 is remote restarted before a virtual machine with the priority value set to 20.
- After automated remote restart has run on a host and the host comes back up, it is put into maintenance mode. The administrator can check the log file before bringing the host up and performing any new deploy or migration operations.
Notes:
- If the host is in Maintenance error state, then click Exit Maintenance Mode to safely bring the host out of maintenance mode.
- A VM recall operation from a higher version host to a lower version host fails (for example, from a P10 host to a P9 or P8 host). This happens when a VM is deployed with Default compatibility mode on the lower version host and is remote restarted to a higher version destination host (for example, from a P8 host to a P9 or P10 host). The virtual machine's compatibility mode changes according to the destination host after remote restart.
After the host exits maintenance mode (after 5 minutes) or when the host becomes active, PowerVC automatically recalls virtual machines to the host if they are pinned virtual machines that were moved from the host or if the host has the 'Recall enabled' option set to Yes.
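The Availability Priority ordering described in the considerations above can be sketched as a descending sort. The function name and the (name, priority) tuples are hypothetical. How PowerVC orders virtual machines with equal priority is not stated here, so the tie behavior below (original order is kept) is an assumption of this sketch.

```python
def restart_order(vms):
    """Order (name, availability_priority) pairs so the highest priority is first.

    sorted() is stable, so VMs with equal priority keep their input order.
    """
    return sorted(vms, key=lambda vm: vm[1], reverse=True)


if __name__ == "__main__":
    vms = [("vm-a", 20), ("vm-b", 100), ("vm-c", 50)]
    for name, priority in restart_order(vms):
        print(name, priority)
```

In this example the virtual machine with priority 100 comes first and the one with priority 20 comes last, matching the example in the text.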