IBM Support

Simplified Remote Restart via HMC or PowerVC

How To


Summary

Let PowerVC restart your failed LPAR (VM) on another server for you, while you sleep at night!


Steps

The previous Remote Restart feature was rather complicated and involved an Active Memory Sharing setup. Now, in 2016, it is Simplified, hence the new name.


What is it for?

  • You have a running virtual machine, but the server it was running on suddenly halts because of some machine-level issue.
  • The details of the virtual machine are still available, and so are its disks, because they are on a SAN device external to the machine.
  • You need to rebuild the virtual machine on a different server with enough CPU and RAM resources, then connect the disks and network,
  • so that you can quickly restart the service.
  • Simplified Remote Restart does these tasks for you in a minute.

STOP PRESS - UPDATED INFORMATION: November 2016

HMC 860 (November 2016) - Allows setting the Simplified Remote Restart flag without stopping the virtual machine, so there is no service outage.

PowerVC 1.3.2 (December 2016) - Now has automatic Simplified Remote Restart. It notices that the virtual machine has stopped and automatically starts recovery to other servers in the PowerVC Host Group. You can also prioritize the virtual machines, so the production ones get recovered first. I tested the beta version and it worked well the first time.


Briefly, the prerequisites for Simplified Remote Restart (SRR) are (a fuller list appears later in the article):

  • POWER8 machines or above
  • Recommend the latest HMC and firmware
  • You need to switch on the Simplified Remote Restart feature for the VM on the HMC, then reboot the VM
    Sorry about that - we know that most people want to avoid a reboot!

As simple as 1-2-3-4-5-6

  1. Get Live Partition Mobility (LPM) working
    If LPM does not work, then SRR never will!
  2. Set the Simplified Remote Restart flag for your Logical Partition (LPAR) or Virtual Machine (VM)
  3. LPM Validate + Simplified Remote Restart Validate
  4. BANG!  Don’t Panic – do it for real!
  5. Test that it works before the event with the HMC command line interface
  6. Test it with the PowerVC GUI
  7. Simplified Remote Restart is not PowerHA (HACMP)
  8. Further Information and my Simplified Remote Restart video on YouTube covering this material

1 Live Partition Mobility - A Reminder

  1. Live Partition Mobility = moving the virtual machine while it is running. Many years of experience now.
  2. Static Partition Mobility = moving it while it is shut down. Quick because there is no memory to move.
  3. "Zombie" Partition Mobility = rescuing the virtual machine from beyond the grave = Simplified Remote Restart. We used to call this dead partition migration!
    • LPM started in 2007 = 10 years next year
    • In my opinion, every POWER customer should be heavily into LPM by now because it is so useful.
  4. Requires PowerVM Enterprise Edition
  5. Requires “spare” capacity - somewhere to go with CPU, RAM, and I/O bandwidth
  6. Keep Hardware Management Consoles (HMC) and Virtual I/O Servers (VIOS) up to date
  7. Pure virtual network (SEA) & disks (vSCSI or vFC) - zero physical adapters in the virtual machine
  8. Source and target need access to the same subnet and disks (or disk LUNs)

Gotchas that hit me most often (but then, I don't run production servers)

  • Virtual optical media (a DVD .iso image virtualised from the VIOS) - remove it from the virtual machine before the move
  • Logical Memory Block size - best to have one size on every machine. The LMB size is set by using the ASMI interface and needs a server reboot
  • Processor compatibility mode when moving to an older server (you can’t move a VM in POWER8 mode to a POWER7 server)
  • Linux on POWER VMs missing the IBM RPM packages that allow the connection to the HMC
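Two of these gotchas, processor mode and LMB size, are simple comparisons you could check before attempting a move. The sketch below is a hypothetical Python helper, not an IBM tool. It assumes you have already collected the VM's curr_lpar_proc_compat_mode value (it appears in the lssyscfg output), the target server's POWER generation, and each server's LMB size in MB (read from ASMI), and pass them in yourself.

```python
import re


def proc_generation(mode):
    """POWER generation in a compatibility mode string, e.g. "POWER8" -> 8.

    Returns None for modes without a number (such as "default"),
    which this sketch does not try to judge.
    """
    match = re.match(r"POWER(\d+)", mode.strip().upper())
    return int(match.group(1)) if match else None


def lpm_gotchas(vm_mode, target_generation, lmb_sizes_mb):
    """Return a list of problems; an empty list means both checks passed.

    vm_mode: the VM's processor compatibility mode, e.g. "POWER8".
    target_generation: POWER generation of the target server, e.g. 7.
    lmb_sizes_mb: hypothetical mapping of server name -> LMB size in MB.
    """
    problems = []
    generation = proc_generation(vm_mode)
    # A VM in POWER8 mode cannot move to an older POWER7 server.
    if generation is not None and generation > target_generation:
        problems.append(
            f"VM is in {vm_mode} mode but the target is POWER{target_generation}"
        )
    # Best practice: one LMB size on every machine.
    if len(set(lmb_sizes_mb.values())) > 1:
        problems.append(f"LMB sizes differ between servers: {lmb_sizes_mb}")
    return problems
```

For example, `lpm_gotchas("POWER8", 7, {"P8-lime": 256, "P7-box": 256})` reports the processor mode problem. The other gotchas (virtual optical media, missing Linux RPMs) are easier to spot on the HMC itself.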

Live Partition Mobility - Best Practice

  1. Keep up to date by using the Fix Level Recommendation Tool (FLRT)
  2. Use the LPM preparation checklist if it has been a while since you last used LPM on a virtual machine
  3. Follow VIOS performance guidelines for LPM
  4. VIOS Advisor
    • The "part" command to tune up your VIOS and avoid "silly" mistakes

2 Set Simplified Remote Restart flag

Check whether your machine can do Simplified Remote Restart at all:

  • Check the Server Properties for the PowerVM Partition Simplified Remote Restart Capable setting
  • Ignore PowerVM Partition Remote Restart Capable - that capability is the older and complicated POWER7 feature


 Set the Simplified Remote Restart flag for a particular virtual machine.

At the time of writing, we can't set the Simplified Remote Restart flag from the:

  • Virtual machine Operating System
  • VIOS
  • HMC GUI
  • HMC Enhanced+ GUI

Currently (HMC 840 in 2016), it can only be set on the HMC command line, as follows. This assumes you know how to get to the HMC command line; if you don't, stop now!

List the current setting by using the HMC lssyscfg command, following the example here:

  • You have to use your own Server and virtual machine names, which are found on the HMC GUI.
  • If you foolishly (IMHO) have space characters in the names, I recommend removing them. Otherwise you are in a world of pain, double-quoting the names again and again.
Syntax:

lssyscfg -r lpar -m machine --filter lpar_names="LPARname"

Example:

hmc> lssyscfg -r lpar -m P8-lime --filter lpar_names="vm61"
name=vm61,lpar_id=7,lpar_env=aixlinux,state=Running,
resource_config=1,os_version=AIX 7.1 7100-03-05-1524,
logical_serial_num=215296V7,default_profile=default_profile,
curr_profile=default_profile,work_group_id=none,
shared_proc_pool_util_auth=1,allow_perf_collection=1,
power_ctrl_lpar_ids=none,boot_mode=norm,lpar_keylock=norm,
auto_start=0,redundant_err_path_reporting=0,rmc_state=active,
rmc_ipaddr=9.137.62.61,time_ref=0,lpar_avail_priority=127,
desired_lpar_proc_compat_mode=POWER8,
curr_lpar_proc_compat_mode=POWER8,suspend_capable=0,
remote_restart_capable=0,                  <-- NOT this option
simplified_remote_restart_capable=1,       <-- This option, and 1 means it is set on
remote_restart_status=Remote Restartable,  <-- Shows the LPAR was rebooted after the setting above
sync_curr_profile=0,affinity_group_id=none,vtpm_enabled=0

Above, the flag is set to 1 (active) and the virtual machine has been rebooted, so the virtual machine is ready for SRR.
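If you check many virtual machines, reading this long line by eye gets tedious. Below is a minimal Python sketch (a hypothetical helper, not part of the HMC) that splits one line of lssyscfg output into key=value pairs and applies the two checks described above: the flag is 1 and the status reads Remote Restartable. It assumes you have copied the raw, unwrapped output line from the HMC, and it uses Python's csv module so that any double-quoted field containing commas stays in one piece.

```python
import csv


def parse_lssyscfg(line):
    """Split one line of HMC lssyscfg output into a dict of attributes.

    Fields are comma-separated "key=value" pairs; csv.reader honours
    double quotes, so a quoted field with embedded commas stays whole.
    """
    fields = next(csv.reader([line.strip()]), [])
    attrs = {}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:  # ignore anything that is not key=value
            attrs[key] = value
    return attrs


def is_srr_ready(attrs):
    """True when the SRR flag is 1 and the LPAR reports Remote Restartable."""
    flag_on = attrs.get("simplified_remote_restart_capable") == "1"
    # Compared case-insensitively, as the capitalisation varies between sources.
    status = attrs.get("remote_restart_status", "").strip().lower()
    return flag_on and status == "remote restartable"
```

Feeding it the vm61 line above returns True from `is_srr_ready`. Note that it looks at simplified_remote_restart_capable, not remote_restart_capable (the older POWER7 feature).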

 Alternatively, take a whole machine view:

Example: 

hmc> lssyscfg -r lpar -m P8-lime  -F simplified_remote_restart_capable,name
0,limevios1
0,limevios2
0,vm36_Ubuntu1504
0,vm26-ubuntu1504
0,vm35_SLES12
0,vm20-SLES-11.3
0,vm22-RHEL7-GA
0,vm112
1,vm61
hmc>
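With many virtual machines, you can pick out the ones still at 0 from this whole-machine listing. The Python sketch below is illustrative only. It assumes you paste the output of the `-F simplified_remote_restart_capable,name` command in as text, and it takes a hypothetical `skip` list so you can leave out partitions such as the VIOSs (limevios1 and limevios2 here), which you would not remote restart.

```python
def lpars_without_srr(output, skip=()):
    """Return the LPAR names whose simplified_remote_restart_capable is 0.

    Expects the text printed by:
      lssyscfg -r lpar -m <server> -F simplified_remote_restart_capable,name
    Prompt lines (starting with "hmc>") and blank lines are ignored.
    """
    names = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("hmc>"):
            continue
        flag, _, name = line.partition(",")
        if flag == "0" and name not in skip:
            names.append(name)
    return names
```

With the P8-lime listing above and `skip={"limevios1", "limevios2"}`, it returns the six client VMs that are not yet enabled; vm61 is left out because it is already at 1.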

If your virtual machines have simplified_remote_restart_capable=0, you need to change it.

To switch on SRR, set the simplified_remote_restart_capable flag to 1.

Syntax: 

chsyscfg -r lpar -m server -i "name=partition-name,simplified_remote_restart_capable=1"

Note carefully those double quotes!

Example: 

chsyscfg -r lpar -m P8-lime -i "name=vm61,simplified_remote_restart_capable=1"


It takes a couple of seconds.

 To disable SRR, change the “=1” to “=0”.
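Those double quotes are easy to get wrong when typing the command for many virtual machines. This Python sketch (a hypothetical helper, not an HMC feature) builds the chsyscfg command text for one VM with straight double quotes around the `-i` attributes, and refuses names containing spaces, in line with the advice above. It only produces text for you to paste into the HMC command line; it does not connect to the HMC.

```python
def chsyscfg_srr_command(server, lpar, enable=True):
    """Build the chsyscfg command that switches SRR on (or off) for one LPAR."""
    if " " in server or " " in lpar:
        # Spaces would need extra quoting: better to rename the server or VM.
        raise ValueError("names with spaces need extra quoting; rename them")
    value = 1 if enable else 0
    attrs = f"name={lpar},simplified_remote_restart_capable={value}"
    # The -i attribute string must be wrapped in straight double quotes.
    return f'chsyscfg -r lpar -m {server} -i "{attrs}"'


if __name__ == "__main__":
    for vm in ["vm61", "vm112"]:
        print(chsyscfg_srr_command("P8-lime", vm))
```

Passing `enable=False` gives the "=0" variant for disabling SRR.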

Warning: I am told SRR adds a little CPU workload to the HMC. If you have old HMC hardware or software, an upgrade would be a good idea.

Once SRR is enabled, you can find and check it on the HMC GUIs:

[Screenshots: HMC Classic view, an alternative panel, and HMC Enhanced+ view (view only)]

Once set, do not forget to reboot your Logical Partition / virtual machine to ensure it is activated.

UPDATE: Thanks to Elvis in Technical Support for this information. Recent hardware and HMC levels allow SRR to be changed dynamically (no reboot needed). To check, run the following command on the HMC:

> lssyscfg -r sys -F name,powervm_lpar_simplified_remote_restart_capable,dynamic_simplified_remote_restart_toggle_capable
P9-S922-amber,1,1
S1024-gold,1,1
P8-E850-ruby,1,1
P8-S824-emerald,1,1
P8-S822-lime,unavailable,unavailable
P9-S924-red,1,1
>

# The P8-S822-lime server is powered off.
# The second "1" in the output shows that SRR can be switched on and off dynamically.
# That is the "dynamic_simplified_remote_restart_toggle_capable" value.
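For a site with many servers, the same three columns can be turned into a short per-server summary. The Python sketch below is a hypothetical helper that assumes the exact three-field `-F` output shown above. It labels each server "dynamic" (flag can be changed without a reboot), "reboot" (SRR capable, but the LPAR must be rebooted after the change), "not-capable", or "unknown" (the HMC reports unavailable, as for the powered-off lime server).

```python
def srr_toggle_report(output):
    """Classify servers from the three-column lssyscfg -r sys output.

    Columns: name, powervm_lpar_simplified_remote_restart_capable,
             dynamic_simplified_remote_restart_toggle_capable
    """
    report = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith((">", "hmc>")):
            continue  # skip blank and prompt lines
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue  # not a data line in the expected format
        name, srr_capable, dynamic = parts
        if "unavailable" in (srr_capable, dynamic):
            report[name] = "unknown"  # e.g. the server is powered off
        elif srr_capable != "1":
            report[name] = "not-capable"
        elif dynamic == "1":
            report[name] = "dynamic"
        else:
            report[name] = "reboot"
    return report
```

With the output above, every server except P8-S822-lime comes back as "dynamic"; lime is "unknown" until it is powered on again.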

  • It is worth checking afterwards that remote_restart_status=Remote Restartable

 3 Validate for LPM

On the HMC, select your server, then select your virtual machine, then Operations -> Mobility -> Validate


Next, select your target server (something similar, so it can support the same POWER mode) and click Validate.


The validation results are reported on the screen:


Read the warnings (if any) carefully. The example target server does not have hardware-accelerated AME. That is OK: even if AME is in use, the machine uses the POWER8 CPU to do AME without the hardware accelerator in the chip. The other warnings are about slot numbers, and I am not paranoid about slot numbers or pointless rules for them.

LPM Warnings do NOT STOP LPM or SRR. Warnings are information letting you know there are slight differences between your source and target servers that it works around.

If you get an error, stop immediately and fix it.

If LPM does not work, Simplified Remote Restart is not going to work

Next, let us do a Simplified Remote Restart Validate

Back on the HMC command line interface (CLI):

Syntax

hmc>  rrstartlpar -o validate    -m Source-box   -p LPAR-name    -t Target-box

Example

hmc> rrstartlpar -o validate -m P8-lime -p vm61 -t P8-emerald

Warnings:
HSCLB504 The migrating partition cannot use hardware-accelerated Active Memory Expansion on the destination managed system because the destination managed system does not support hardware-accelerated Active Memory Expansion.
HSCLA4CC The management console cannot maintain the source Virtual I/O Server (VIOS) slot number 14 for virtual SCSI adapter 5 on the destination VIOS partition 2*8286-42A*100EC7V.
HSCLA4CC The management console cannot maintain the source Virtual I/O Server (VIOS) slot number 3 for virtual SCSI adapter 4 on the destination VIOS partition 3*8286-42A*100EC7V.

Note: The warnings are the same as the LPM ones, because the Simplified Remote Restart Validate does an LPM Validate, but it also checks other things, like the remote restart state.
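Because warnings do not stop LPM or SRR, what matters is spotting any message you have not already judged harmless. The Python sketch below is a hypothetical helper. It pulls HSCL message IDs out of the validate output and returns those not on an "accepted" list. The default list holds the two IDs judged harmless in this example (HSCLB504 for AME, HSCLA4CC for slot numbers); your own list will differ. It assumes every message line starts with an ID of the form HSCL plus four letters or digits, as in all the examples here. An empty result does not replace reading the output and checking for errors.

```python
import re

# Message IDs judged harmless in the example above; adjust for your site.
ACCEPTED_IDS = frozenset({"HSCLB504", "HSCLA4CC"})

# Assumed format: "HSCL" followed by four letters or digits at line start.
MSG_ID = re.compile(r"^\s*(HSCL[0-9A-Z]{4})\b")


def unexpected_messages(output, accepted=ACCEPTED_IDS):
    """Return the HSCL message IDs in the output that are not accepted."""
    found = []
    for line in output.splitlines():
        match = MSG_ID.match(line)
        if match and match.group(1) not in accepted:
            found.append(match.group(1))
    return found
```

For the vm61 validate output above, the result is an empty list.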

The complete rrstartlpar syntax (for Simplified Remote Restart, the examples in this article need only -o, -m, -p and -t):

rrstartlpar -o { restart | validate | cancel | cleanup | recover }
  -m managed-system
  [ -t target-managed-system]
  { -p partition-name | --id partition-ID}
  [--redundantvios {0 | 1 | 2}]
  [--mpio {1 | 2}]
  [--vlanbridge {1 | 2}]
  [--retaindev]
  [--usecurrdata]
  [-w wait-time]     # default 3 minutes
  [-d detail-level]  # amount of output
  [--force]          # cleanup/recover
  [-v]
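If you script these commands, building them from one place avoids typos in the operation name. The Python sketch below is a hypothetical helper that assembles an rrstartlpar argument list from the options used in this article. It assumes, as in every example here, that validate and restart name a target server with -t while cleanup does not, and it deliberately leaves out the optional flags. How you send the command to the HMC is up to you.

```python
OPERATIONS = {"restart", "validate", "cancel", "cleanup", "recover"}


def rrstartlpar_args(operation, source, lpar, target=None):
    """Build an rrstartlpar argument list from the options in this article."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    if operation in {"restart", "validate"} and target is None:
        raise ValueError(f"-o {operation} needs a target server (-t)")
    args = ["rrstartlpar", "-o", operation, "-m", source, "-p", lpar]
    if target is not None:
        args += ["-t", target]
    return args
```

For example, `" ".join(rrstartlpar_args("validate", "P8-lime", "vm61", "P8-emerald"))` gives the same validate command as the example above.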

Simplified Remote Restart Official Prerequisites Briefly

Machine Level

  1. LPM prerequisites = access to the same external storage & subnet
  2. HMC 820 SP1 or later (with the latest PTF) + 820 firmware
  3. Machines are Simplified Remote Restart capable
  4. Both hosts must be managed by the same HMC
  5. HMC to FSP connection (The HMC needs to definitely confirm the box is off)
  6. The source host must be in Error, Power Off, or Error - dump in progress state on the HMC. (NOTE: Power off from the HMC is OK)

Virtual machine Level

  1. The VM must have the Simplified Remote Restart capability enabled
  2. The remote restart state of the VM must be "Remote Restartable"

Note:

  • UPDATE: Shared Storage Pools (SSP) have been officially supported since Q4 2015.
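The state-based prerequisites can be checked together just before a restart. This Python sketch is a hypothetical pre-check, not an IBM tool. It takes the source host state as text plus the VM attributes (for example parsed from lssyscfg output) and lists whatever blocks SRR. The allowed states are the three listed above, compared case-insensitively since the exact capitalisation the HMC prints is an assumption here. The HMC, FSP, and shared-storage prerequisites are not covered.

```python
# Source-host states in which the prerequisites above allow SRR.
SRR_SOURCE_STATES = {"error", "power off", "error - dump in progress"}


def srr_precheck(host_state, vm_attrs):
    """Return a list of blocking problems; empty means these checks passed."""
    problems = []
    if host_state.strip().lower() not in SRR_SOURCE_STATES:
        problems.append(f"source host state is {host_state!r}")
    if vm_attrs.get("simplified_remote_restart_capable") != "1":
        problems.append("SRR flag is not enabled on the VM")
    status = vm_attrs.get("remote_restart_status", "").strip().lower()
    if status != "remote restartable":
        problems.append("VM is not in the Remote Restartable state")
    return problems
```

A running source host (or one whose state the HMC cannot see) is reported as a problem, matching the point that SRR is only offered once the HMC has confirmed the box is down.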

4 Let us pretend the machine crashed


Example:

hmc> rrstartlpar -o restart -m P8-lime -p vm61 -t P8-emerald

 Then, the MAGIC happens

  • Like LPM, you can watch it on the HMC go through various phases of creating the VM
  • Much quicker than LPM, as no memory needs to be moved: it creates the virtual machine, attaches the disks and network, then powers up the virtual machine
  • It restarts the VM automatically at the end for you, which is nice.

Then, you need to clean up the debris!

The virtual machine definition on the source machine is still there, unlike with LPM.

Why?

  • The source machine was powered off, so it is not possible to remove the virtual machine
  • The VIOS was shut down, so SRR can't unconfigure the virtual I/O disks and network

Example:

After the machine is powered up and the Virtual I/O Servers have started and settled down (meaning "give it 5 minutes"), run:

hmc> rrstartlpar -o cleanup  -m $SOURCE  -p $VM
    ## For my small VM, this took about 10 seconds
hmc>

If the machine is still down, or the VIOS is not yet fully running, you get error messages like these:

  • HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
  • HSCLA928 The Virtual I/O Server (VIOS) partition limevios1 is not in the Running state. This operation is only allowed when the VIOS partition is running.
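Rather than guessing when the VIOS has settled, you can retry the cleanup while it still reports one of these two messages. The Python sketch below shows that retry loop under stated assumptions: `run_cleanup` is a hypothetical callable that you supply, which runs `rrstartlpar -o cleanup -m <source> -p <vm>` on the HMC and returns its output as text; the default 2-minute wait simply echoes the interval PowerVC uses for its own clean-up. Finishing only means the not-ready messages have gone, so still read the final output.

```python
import time

# Message IDs from the text above meaning "not ready yet": the source
# machine is still down (HSCLA9CE) or a VIOS is not Running (HSCLA928).
NOT_READY_IDS = ("HSCLA9CE", "HSCLA928")


def cleanup_when_ready(run_cleanup, wait_seconds=120, max_attempts=10,
                       sleep=time.sleep):
    """Call run_cleanup() until its output has no not-ready message.

    Returns (finished, last_output). finished is False if the machine
    or VIOS was still not ready after max_attempts tries.
    """
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = run_cleanup()
        if not any(msg_id in output for msg_id in NOT_READY_IDS):
            return True, output
        if attempt < max_attempts:
            sleep(wait_seconds)  # give the machine and VIOS more time
    return False, output
```

The `sleep` parameter is there so the loop can be tried out without waiting, for example by passing a function that records the requested delays.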

 

5 How to test SRR

Get the machine to the Power Off state:

CLEAN Method A

  1. Shut down the virtual machines
  2. Shut down the virtual I/O server
  3. Power Off the machine

Power Off state = ready for SRR

Note: The shutdown method is a little cheat! In a real server crash, the VM and VIOS don’t stop cleanly and flush their disks.

Get the machine to the Power Off state: UGLY

Ugly Method Alternatives:

  • B) Yank all electrical power cords
    • Count to ten and reinsert the cords
    • But make sure that there are no autostart VIOS or virtual machine settings
    • If left unplugged, the HMC sees the machine state as unknown and Simplified Remote Restart is NOT available!
  • C) HMC Power Off the server while the VIOS & VM are running

Note:

  • Now the VM and VIOS have to do crash recovery on their file systems, so the test is more realistic

Manually request the remote restart:

HMC CLI: rrstartlpar -o restart …
The command will return fairly quickly (after validate)

 Watch the HMC status to see it working on the target machine
The VM starts automatically.

 If it fails: rrstartlpar -o recover …

Clean up the source machine

  • Restart machine
  • Restart VIOS
  • Wait 5 minutes
  • rrstartlpar -o cleanup …

 

6 Use PowerVC to do the Simplified Remote Restart instead of the HMC rrstartlpar command

"Remotely Restart Virtual Machines" button appears when you select a Powered-off Host machine that is SRR-ready

Warning: PowerVC has a few more prerequisites

  • HMC 830 + Firmware 830 - IMHO, be on the latest HMC version and system firmware
  • VIOS 2.2.3.4+ - IMHO VIOS 2.2.4
  • At least PowerVC version 1.2.3.2 - IMHO 1.3

I am not going to cover how to run PowerVC. Watch my YouTube Videos on that topic.

If you have a machine in the "Power Off" state, select it and click the "Remotely Restart Virtual Machines" button


You get a list of virtual machines on that machine that are Simplified Remote Restartable - in this example, there is only 1.

Select a virtual machine and click "Remote Restart":


Next, select the target machine (from a list) and click the "Remote Restart" button.


And it gets on with it:


Later it shows the VM running on the target machine

Rather an anti-climax!

PowerVC cleans up the original VM once the machine and VIOS have started

  • It checks every 2 minutes and does it as soon as the machine and VIOS are ready

7 Simplified Remote Restart compared to PowerHA (HACMP)

I have these two charts that cover the highlights and hopefully make it clear that there are large differences

Hopefully, these points are self-explanatory, but note they are my opinions and NOT official IBM statements.

[Charts: "Not HACMP/PowerHA" and "SRR v PowerHA Compared"]

8 Further Information

In my YouTube Video on this topic, I also discuss scaling up Simplified Remote Restart to many machines and hundreds of virtual machines, automated recovery and cover some of the challenges you might need to address.  

Watch it here https://www.youtube.com/watch?v=aoe1fyT5l0A


    Additional Information


    Other places to find content from Nigel Griffiths IBM (retired)

    Document Location

    Worldwide


    Document Information

    Modified date:
    31 August 2023

    UID

    ibm11115721