IBM Support

Simplified Remote Restart via HMC or PowerVC

How To


Summary

Let PowerVC restart your failed LPAR (VM) on another server for you, while you sleep at night!

Objective


Steps

The previous Remote Restart was rather complicated and involved Active Memory Sharing setup. Now, in 2016, it is Simplified - hence the new name.


What is it for?

  • You have a running Virtual Machine but the Server it was running on suddenly halted with some machine-level issue.
  • The details of the Virtual Machine are available and so are the disks on a SAN device - external to the machine.
  • You just need to rebuild the Virtual Machine on a different server with enough CPU and RAM resources, connect the disks and network.
  • So you can quickly restart the service.
  • This is what Simplified Remote Restart will do for you in around a minute.

STOP PRESS - UPDATED INFORMATION:   Nov 2016

HMC 860 (November 2016) - Allows setting the Simplified Remote Restart flag without stopping the LPAR so no service outage.

PowerVC 1.3.2 (December 2016) - Now has automatic Simplified Remote Restart.  It notices the machine going down and automatically starts recovery to other servers in the PowerVC Host Group plus you can prioritise the VMs, so the Production ones get recovered first.   I tested the beta version and it worked very well 1st time. 


Briefly, the Pre-requisites (fuller list below) for Simplified Remote Restart (SRR) are:

  • POWER8 machines
  • Recommend latest HMC & Firmware
  • Need to switch on SRR on the HMC to do this & reboot the VM
    Sorry about that - we know most people want to avoid a reboot!

As simple as 1-2-3  4-5-6

  1. Get Live Partition Mobility (LPM) working
    If LPM does not work then SRR never will!
  2. Set the Simplified Remote Restart flag for your Logical Partitions (LPAR) / Virtual Machine (VM)
  3. LPM Validate + Simplified Remote Restart Validate
  4. BANG!  Don’t Panic – do it for real! (unlikely)
  5. Testing it works before the event via the HMC CLI
  6. Testing it via the PowerVC GUI
  7. Simplified Remote Restart is NOT PowerHA (HACMP)
  8. Further Information and my Simplified Remote Restart video on YouTube covering the material below

1 Live Partition Mobility - A Reminder

  1. Live Partition Mobility = while running. Many years' experience now.
  2. Static Partition Mobility = when shutdown. Quick as no memory to move.
  3. "Zombie" Partition Mobility = rescue the LPAR/VM from beyond the grave = Simplified Remote Restart. We used to call this dead partition migration!!
    • LPM started 2007 = 10 years next year
    • In my opinion: every POWER customer should be heavily into LPM by now because it is so useful.
  4. Requires PowerVM Enterprise Edition
  5. Requires “spare” capacity - somewhere to go with CPU, RAM and I/O bandwidth
  6. Keep HMC & VIOSs up to date
  7. Pure virtual network (SEA) & disks (vSCSI or vFC) - zero physical adapters in the LPAR/VM
  8. Source + Target need same subnet & disks/LUN access
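
The "spare capacity" requirement can be sketched as a simple filter over your own inventory. This is a minimal Python sketch and not something the HMC or PowerVC provides: the Host and VM records, their field names, the host P8-ruby and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpus: float    # unallocated processor units (hypothetical field)
    free_mem_gb: float  # unallocated memory in GB (hypothetical field)


@dataclass
class VM:
    name: str
    cpus: float
    mem_gb: float


def candidate_targets(vm, hosts):
    """Return the names of hosts with enough spare CPU and RAM for the VM."""
    return [h.name for h in hosts
            if h.free_cpus >= vm.cpus and h.free_mem_gb >= vm.mem_gb]


# Made-up inventory: only P8-emerald has room for a 1 CPU / 32 GB VM.
hosts = [Host("P8-emerald", 4.0, 64), Host("P8-ruby", 0.5, 16)]
print(candidate_targets(VM("vm61", 1.0, 32), hosts))
```

An empty list means there is nowhere for the VM to go, which is worth knowing before the crash rather than after it.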

Gotchas that hit me most often (but then I don't run production machines)

  • Virtual optical media (a DVD .iso image virtualised from the VIOS)  - can just delete it from the LPAR/VM
  • Logical Memory Block size - Best to have one size on every machine. This is set via ASMI + Server reboot
  • Processor Mode to older box (can’t move POWER8 VM in POWER8 mode to a POWER7 box)
  • Linux on POWER but missing the IBM RPMs to allow the connection to the HMC

Live Partition Mobility - Best Practice

  1. Keep up to date using Fix Level Recommendation Tool (FLRT)
  2. LPM setup checklist for first time
  3. LPM prep checklist if it's been a while since LPM for an LPAR
  4. Follow VIOS performance guidelines for LPM
  5. VIOS Advisor
    • The “part” command to tune up your VIOS and avoid "silly" mistakes

2 Set Simplified Remote Restart flag

Check if your machine can do Simplified Remote Restart at all:

  • Check the Server Properties for the PowerVM Partition Simplified Remote Restart Capable setting
  • Note this is NOT the PowerVM Partition Remote Restart Capable - this one is the older more complicated POWER7 feature

 Capability list

Set the Simplified Remote Restart flag for a particular LPAR  / VM.

Can’t set Remote Restart flag at the

  • VM / LPAR Operating System
  • VIOS
  • HMC GUI
  • HMC Enhanced+ GUI

Currently (HMC 840 at Feb 2016) this is only settable via the HMC command line as follows and assuming you know how to get to the HMC command line - if you don't please stop now!  I guess this might change in the future - this is not an announcement.

List the current setting using the HMC  lssyscfg command by following the example here

  • You will have to use your own Server and LPAR names (of course) these are found on the HMC GUI.
  • If you foolishly (IMHO) have spaces in the names then I recommend removing them (spaces are strongly discouraged), otherwise you are in a world of pain, trying to double quote the names again and again.
Syntax

lssyscfg -r lpar -m machine --filter lpar_names="LPARname"

Example

hmc> lssyscfg -r lpar -m P8-lime  --filter lpar_names="vm61"
name=vm61,lpar_id=7,lpar_env=aixlinux,state=Running,
resource_config=1,os_version=AIX 7.1 7100-03-05-1524,
logical_serial_num=215296V7,default_profile=default_profile,
curr_profile=default_profile,work_group_id=none,
shared_proc_pool_util_auth=1,allow_perf_collection=1,
power_ctrl_lpar_ids=none,boot_mode=norm,lpar_keylock=norm,
auto_start=0,redundant_err_path_reporting=0,rmc_state=active,
rmc_ipaddr=9.137.62.61,time_ref=0,lpar_avail_priority=127,
desired_lpar_proc_compat_mode=POWER8,
curr_lpar_proc_compat_mode=POWER8,suspend_capable=0,
remote_restart_capable=0,                    <-- NOT this option
simplified_remote_restart_capable=1,         <-- This option and 1 is set on
remote_restart_status=Remote Restartable,    <-- This shows the LPAR was Rebooted after the setting above
sync_curr_profile=0,affinity_group_id=none,vtpm_enabled=0

Above it is set to 1 = active and the LPAR / VM rebooted - this is ready.
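
If you capture that lssyscfg output in a script, the two attributes that matter can be checked mechanically. The following is a minimal Python sketch, assuming the record arrives as one comma-separated line of name=value pairs, and assuming that any pair containing a comma is wrapped in double quotes (which is why the csv module is used). The function names are my own and the sample is a shortened copy of the record above.

```python
import csv


def parse_lssyscfg_line(line):
    """Split one line of lssyscfg output into a dict of attribute -> value."""
    fields = next(csv.reader([line.strip()]))
    attrs = {}
    for field in fields:
        if not field:
            continue  # ignore empty fields, e.g. from a trailing comma
        name, _, value = field.partition("=")
        attrs[name.strip()] = value.strip()
    return attrs


def srr_ready(attrs):
    """True when the SRR flag is on and the LPAR reports Remote Restartable."""
    return (attrs.get("simplified_remote_restart_capable") == "1"
            and attrs.get("remote_restart_status") == "Remote Restartable")


# Shortened version of the vm61 record shown above.
sample = ("name=vm61,lpar_id=7,state=Running,remote_restart_capable=0,"
          "simplified_remote_restart_capable=1,"
          "remote_restart_status=Remote Restartable,sync_curr_profile=0")
print(srr_ready(parse_lssyscfg_line(sample)))
```

Note the check deliberately ignores remote_restart_capable, as that is the older feature.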

 

Alternatively, take a whole machine view:

Example: hmc> lssyscfg -r lpar -m P8-lime  -F simplified_remote_restart_capable,name
0,limevios1
0,limevios2
0,vm36_Ubuntu1504
0,vm26-ubuntu1504
0,vm35_SLES12
0,vm20-SLES-11.3
0,vm22-RHEL7-GA
0,vm112
1,vm61
hmc>
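
With many LPARs, the whole machine listing is easier to digest with a few lines of script. This is a minimal Python sketch, assuming the output has been captured as text in exactly the "flag,name" form shown above; the function name is my own.

```python
def lpars_without_srr(output):
    """Return LPAR names whose simplified_remote_restart_capable flag is 0."""
    missing = []
    for line in output.splitlines():
        flag, sep, name = line.strip().partition(",")
        if sep and flag == "0":  # lines without a comma (e.g. the prompt) are skipped
            missing.append(name)
    return missing


# A few lines copied from the example output above.
listing = """0,limevios1
0,limevios2
0,vm36_Ubuntu1504
1,vm61
"""
print(lpars_without_srr(listing))
```

The listing covers every partition on the machine, VIOS included, so treat the result as a list to review rather than a to-do list.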

If your LPARs have simplified_remote_restart_capable=0 then you need to change it as below.

Set the flag simplified_remote_restart_capable=1 i.e. to enable the Simplified Remote Restart feature:

Syntax

chsyscfg -r lpar -m server -i "name=partition-name,simplified_remote_restart_capable=1"

Note carefully those double quotes!

Example:

 chsyscfg -r lpar -m P8-lime  -i "name=vm61,simplified_remote_restart_capable=1"
This takes a couple of seconds.

 

To disable SRR, change the “=1” to “=0”. Pretty obvious but you may not have thought of that!
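
Because the double quotes are easy to get wrong, it can help to generate the command line rather than type it. This is a minimal Python sketch that only builds the text of the chsyscfg command shown above; how you then run it on the HMC is up to you. The function name is my own, and it refuses names containing spaces, quotes or commas instead of trying to quote them.

```python
def chsyscfg_srr_command(server, lpar, enable=True):
    """Build the chsyscfg command line that sets or clears the SRR flag."""
    for label, value in (("server", server), ("lpar", lpar)):
        if not value or any(ch in value for ch in ' ",'):
            raise ValueError(f"{label} name {value!r} needs manual quoting")
    flag = 1 if enable else 0
    return (f'chsyscfg -r lpar -m {server} '
            f'-i "name={lpar},simplified_remote_restart_capable={flag}"')


print(chsyscfg_srr_command("P8-lime", "vm61"))
print(chsyscfg_srr_command("P8-lime", "vm61", enable=False))
```

Rejecting awkward names outright is a design choice: it keeps the sketch short and avoids guessing at the quoting rules.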

 

Warning:

  • I am told this adds some CPU overhead on the HMC
  • It regularly collects VIOS configuration details for Simplified Remote Restart VMs, and also during LPAR / VM configuration changes.
  • I have not seen CPU% figures, but it's not large and only occasional
  • If you set 100's of LPARs to Simplified Remote Restart then start monitoring your HMC CPU load before and afterward.

Once you have enabled SRR - you can find / check it on the HMC GUIs.

HMC Classic View

Activate panel

HMC Enhanced+ View

View Only

Once set - DO NOT FORGET TO REBOOT YOUR LOGICAL PARTITION / VIRTUAL MACHINE

  • And worth checking this is true afterwards:  remote_restart_status=Remote Restartable

 3 Validate for LPM

You should know this already but for completeness here is a reminder.

On the HMC, select your Server, then select your LPAR, then Operations -> Mobility -> Validate

Select the right menus

Then Select your target Server - something similar so it can support the same POWER mode then click  Validate

Run it

Then you should get something like this:

 with only warnings

Read the Warnings (if any) carefully - here the target does not have hardware-accelerated AME (that is OK as, even if it is in use, the machine will use the POWER8 CPU to do AME without the Hardware accelerator in the chip) and I am not paranoid about slot numbers.

If you get an Error - stop now and fix it.

If LPM does not work then you will never get Simplified Remote Restart working.

Next, let's do a Simplified Remote Restart Validate

Back on the HMC Command Line Interface CLI

Syntax

hmc>  rrstartlpar -o validate
   -m Source-box   -p LPAR-name
   -t Target-box

Example

hmc> rrstartlpar -o validate -m P8-lime -p vm61 -t P8-emerald

Warnings:
HSCLB504 The migrating partition cannot use hardware-accelerated Active Memory Expansion on the destination managed system because the destination managed system does not support hardware-accelerated Active Memory Expansion.
HSCLA4CC The management console cannot maintain the source Virtual I/O Server (VIOS) slot number 14 for virtual SCSI adapter 5 on the destination VIOS partition 2*8286-42A*100EC7V.
HSCLA4CC The management console cannot maintain the source Virtual I/O Server (VIOS) slot number 3 for virtual SCSI adapter 4 on the destination VIOS partition 3*8286-42A*100EC7V.
hmc>

Note: The warnings are exactly the same as the LPM ones because the Simplified Remote Restart Validate does an LPM Validate, but it also checks other things like the Remote Restart State.
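
If you validate many LPARs, sorting the messages by code saves a lot of reading. This is a minimal Python sketch, assuming the output looks like the example above: a header line such as "Warnings:" followed by lines that start with an HSCL message code. Any other header (for example an "Errors:" section, which I am assuming looks the same) is picked up the same way. The function name is my own and the sample messages are shortened.

```python
import re

# An HSCL message code followed by its text, e.g. "HSCLB504 The migrating ..."
MESSAGE = re.compile(r"^(HSCL[0-9A-Z]{4})\s+(.*)$")


def parse_validate_output(output):
    """Group HSCL messages by the section header ("Warnings:" etc.) above them."""
    sections = {}
    current = "Unlabelled"
    for line in output.splitlines():
        line = line.strip()
        match = MESSAGE.match(line)
        if match:
            sections.setdefault(current, []).append(match.groups())
        elif line.endswith(":"):
            current = line[:-1]  # a new section header
    return sections


# Shortened versions of the warnings shown above.
sample = """Warnings:
HSCLB504 The migrating partition cannot use hardware-accelerated Active Memory Expansion.
HSCLA4CC The management console cannot maintain the source VIOS slot number 14.
hmc>
"""
result = parse_validate_output(sample)
print([code for code, _ in result["Warnings"]])
```

Which codes you can safely ignore is your decision; the sketch only sorts them.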

 The complete rrstartlpar syntax - note that only the parts used in the examples above (-o, -m, -p and -t) are needed for Simplified - I have not needed any of the others and I think some are for Complicated Remote Restart

 rrstartlpar

-o { restart | validate | cancel | cleanup | recover }

-m managed-system

[ -t target-managed-system]

{ -p partition-name | --id partition-ID }

[--redundantvios {0 | 1 | 2}]

[--mpio {1 | 2}]

[--vlanbridge {1 | 2}]

[--retaindev]

[--usecurrdata]

[-w wait-time]      default 3 minutes

[-d detail-level]      amount of output

[--force]      cleanup/recover

[-v]
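
The handful of options needed for Simplified can be wrapped in a small helper so every call is built the same way. This is a minimal Python sketch that only assembles the command text from the syntax above. The function name is my own, and the rule that restart and validate need a target while the other operations do not is my assumption, based purely on the examples in this article.

```python
OPERATIONS = {"restart", "validate", "cancel", "cleanup", "recover"}
NEEDS_TARGET = {"restart", "validate"}  # assumption taken from the examples here


def rrstartlpar_command(operation, source, lpar, target=None):
    """Build an rrstartlpar command line using only -o, -m, -p and -t."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation {operation!r}")
    if operation in NEEDS_TARGET and not target:
        raise ValueError(f"{operation} needs a target machine")
    parts = ["rrstartlpar", "-o", operation, "-m", source, "-p", lpar]
    if target:
        parts += ["-t", target]
    return " ".join(parts)


print(rrstartlpar_command("validate", "P8-lime", "vm61", "P8-emerald"))
print(rrstartlpar_command("cleanup", "P8-lime", "vm61"))
```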

SRR Official Pre-Requisites Briefly

Machine Level

  1. LPM pre-reqs = access to same external storage & sub-net
  2. The HMC 820 SP1 or later (with latest PTF) + 820 firmware
  3. Machines are simplified remote restart capable
  4. Both hosts must be managed by the same HMC
  5. HMC to FSP connection (The HMC needs to definitely confirm the box is off)
  6. The source host must be in Error, Power Off, or Error - dump in progress state on the HMC. (NOTE: Power off from the HMC is OK)

LPAR / VM Level

  1. VM must have the Simplified Remote Restart capability enabled
  2. Remote restart state of the VM must be "Remote Restartable"

Note:

  • UPDATE: SSP (Shared Storage Pools) has been officially supported since Q4 2015.
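
The host state and VM level pre-requisites above lend themselves to a quick go / no-go check before attempting a restart. This is a minimal Python sketch under stated assumptions: the host state strings are spelled as in the list above (the exact wording on your HMC may differ), "Operating" is assumed to be the state of a healthy running host, and the LPAR attributes are assumed to be in a dict keyed by the lssyscfg attribute names. The function name is my own.

```python
# Host states in which SRR is allowed, per the pre-requisite list (lower-cased).
RESTARTABLE_HOST_STATES = {"error", "power off", "error - dump in progress"}


def can_remote_restart(host_state, lpar_attrs):
    """Return (ok, reasons): ok is True only if every check passes."""
    reasons = []
    if host_state.strip().lower() not in RESTARTABLE_HOST_STATES:
        reasons.append(f"source host state is {host_state!r}")
    if lpar_attrs.get("simplified_remote_restart_capable") != "1":
        reasons.append("simplified_remote_restart_capable is not 1")
    if lpar_attrs.get("remote_restart_status") != "Remote Restartable":
        reasons.append("remote_restart_status is not Remote Restartable")
    return (not reasons, reasons)


vm61 = {"simplified_remote_restart_capable": "1",
        "remote_restart_status": "Remote Restartable"}
print(can_remote_restart("Power Off", vm61))
print(can_remote_restart("Operating", vm61))
```

Returning the reasons along with the verdict makes it obvious which pre-requisite to chase first.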

4 Let's pretend the machine crashed


Example:

hmc> rrstartlpar -o restart -m P8-lime -p vm61 -t P8-emerald

 Then the MAGIC happens

  • Like LPM you can watch it on the HMC go through various phases of creating the VM
  • Much quicker than LPM as no memory needs to be moved - create the LPAR / VM, attach disks and network and power up the LPAR / VM
  • Restarts the VM automatically at the end for you which is nice.

Then you need to clean up the debris!

VM / LPAR Definition on the source machine is still there - unlike LPM

Why?

  • Source Machine was powered-off so not possible to remove the LPAR
  • VIOS was shutdown -  so can't unconfigure the virtual I/O disks and network

Example:

After machine power up,  VIOS(s) started and settled down … meaning “give it 5 minutes”

hmc> rrstartlpar -o cleanup  -m $SOURCE  -p $VM
    [[for my small VM about 10 seconds]]
hmc>

If the machine is still down or the VIOS not running fully yet you get error messages like this:

  • HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
  • HSCLA928 The Virtual I/O Server (VIOS) partition limevios1 is not in the Running state. This operation is only allowed when the VIOS partition is running.
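
Rather than retrying the cleanup by hand, a script can keep trying until those two messages stop appearing. This is a minimal Python sketch, assuming you supply run_cleanup, a hypothetical function of your own that runs the rrstartlpar cleanup command on the HMC and returns its output as text. It treats any output without HSCLA9CE or HSCLA928 as "done", which is a simplification: other errors are not detected. The two minute interval is only a starting point.

```python
import time

# Message codes that mean the machine or VIOS is not ready for cleanup yet.
NOT_READY_CODES = ("HSCLA9CE", "HSCLA928")


def cleanup_when_ready(run_cleanup, interval_s=120, max_attempts=30,
                       sleep=time.sleep):
    """Call run_cleanup() until its output has no 'not ready' code.

    Returns the number of attempts used; raises TimeoutError if it gives up.
    """
    for attempt in range(1, max_attempts + 1):
        output = run_cleanup()
        if not any(code in output for code in NOT_READY_CODES):
            return attempt
        if attempt < max_attempts:
            sleep(interval_s)
    raise TimeoutError("source machine or VIOS still not ready for cleanup")


# Demonstration with canned replies instead of a real HMC, and no real sleeping.
replies = iter([
    "HSCLA9CE The managed system is not in a valid state ...",
    "HSCLA928 The Virtual I/O Server (VIOS) partition limevios1 is not ...",
    "",
])
print(cleanup_when_ready(lambda: next(replies), sleep=lambda s: None))
```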

 

5 So how to test SRR?

Get the machine to the Power Off state:

CLEAN Method A

  1. Shutdown Virtual Machine(s)
  2. Shutdown VIO Server(s)
  3. Power Off the machine

Power Off state = ready for SRR

Note: This is somewhat cheating!!  In a real crash the VM + VIOS don’t cleanly stop & flush disks

 Get the machine to the Power Off state the UGLY way

Ugly Method Alternatives:

  • B) Yank all Electrical power cords
    • Then reinsert
    • But make sure there are no autostart settings for the VIOS or LPARs
    • If left unplugged the HMC sees the machine state as unknown and Simplified Remote Restart is NOT available!
  • C) HMC Power-off Server - VIOS & VM running

Note:

  • Now VM + VIOS have to crash recover filesystems so the test is more realistic

As above, manually request the remote restart:

HMC CLI: rrstartlpar -o restart …
Should return fairly quickly (after validate)

 Watch the HMC status to see it working on the target machine
The VM will start automatically

 If it fails: rrstartlpar -o recover …

 Clean up source machine

  • Restart machine
  • Restart VIOS
  • Wait 5 minutes
  • rrstartlpar -o cleanup …

 

6 Using PowerVC to do the Simplified Remote Restart instead of the HMC rrstartlpar command

"Remotely Restart Virtual Machines" button appears when you select a Powered-off Host machine that is SRR ready

Warning: PowerVC has a few more pre-reqs

  • HMC 830 + Firmware 830  - IMHO recommend be on the latest HMC and Systems Firmware
  • VIOS 2.2.3.4+     - IMHO VIOS 2.2.4
  • At least PowerVC  version 1.2.3.2  - IMHO 1.3

I am not going to cover how to run PowerVC - go watch my YouTube Videos on that.

If you have a machine in "Power Off" State then Select it and click on the "Remotely Restart Virtual Machines" button

PowerVC Hosts

Then you get a list of LPARs / VMs on that machine that are Simplified Remote Restartable - in this example there is only 1.

Select a LPAR /VM and click on "Remote Restart":

Select the VM

Then below select the target machine (from a list) and  click on the "Remote Restart " button

Select the host

And it just gets on with it:

Rebuilding

Later it just shows the VM running on the target machine

Rather an anti-climax !!

PowerVC will clean up the original VM once the machine + VIOS have started

  • It checks every 2 minutes and does it as soon as the machine and VIOS are ready

7 PowerVC compared to PowerHA (HACMP)

I have these two charts that cover the highlights and hopefully make it very clear that there are very large differences.

Hopefully these are self-explanatory but note they are my opinions and NOT IBM official statements.

not HACMP/PowerHA


Compared

8 Further Information

In my YouTube Video on this topic I also discuss scaling up Simplified Remote Restart to many machines and 100's of LPARs / VMs, automated recovery, and cover some of the challenges you might need to address.

Watch it here https://www.youtube.com/watch?v=aoe1fyT5l0A


HMC RR – Knowledge Centre

HMC Community Files – DeveloperWorks

Whitepaper & rrMonitor script

PowerVC Overview of Remote Restart

PowerVC Deep Dive by Christine Wang

Mr chmod666 Articles Hints and Using SRR

rrMonitor original

rrMonitor  improved

 The End

    Additional Information


    If you find errors or have question, email me: 

    • Subject: SRR
    • E-mail: n a g @ u k . i b m . c o m  

    Also find me on

    • Twitter @mr_nmon
    • LinkedIn www.linkedin.com/in/nigelargriffiths
    • YouTube https://www.youtube.com/nigelargriffiths
       

    Document Location

    Worldwide


    Document Information

    Modified date:
    26 November 2019

    UID

    ibm11115721