IBM Support

LKU: Gathering data from the Surrogate LPAR after a failed Live Update.

How To


Summary

When Live Update fails because of a problem on the Surrogate LPAR, the Surrogate's logs are sometimes not copied back to the remaining Original LPAR, and they must be extracted manually.

Objective

This document describes:
  • Overview of the procedure.
  • Finding the disk of the Surrogate's rootvg in an HMC-managed setup.
  • Finding the disk of the Surrogate's rootvg in a PowerVC-managed setup. 
  • Extracting the logs. 
  • Extracting a dump from the Surrogate (when needed). 
  • Cleaning up the imported volume group.

Steps

Overview of the procedure

Live Update has procedures to copy the logs out of the Surrogate LPAR, but if the Surrogate crashes, loses connectivity, or hits a similar problem, the automated procedure does not work. In those cases, the data must be extracted manually.

To extract the LKU data, find which hdisk was used for the Surrogate boot, import its volume group, mount its file systems, and copy the log files.

 

Finding the disk of the Surrogate's rootvg in an HMC-managed setup

For HMC-managed systems, the disk can be found quickly in the Live Update configuration file /var/adm/ras/liveupdate/lvupdate.data.
The Surrogate boot disk is either "nhdisk" or "alt_nhdisk" in the "disks:" stanza:
disks:
       nhdisk = hdisk1  <<<<<<<
       alt_nhdisk = hdisk2  <<<<<<<
       mhdisk = hdisk3 
Whichever of "nhdisk" and "alt_nhdisk" holds the "lvup_rootvg" volume group, the Surrogate boot disk is the other one.
In this example, hdisk2 holds lvup_rootvg, so hdisk1 is the Surrogate boot disk:
# lspv
hdisk0          00fabfb25091f4c6                    rootvg          active      
hdisk1          00fabfb25091f3bb                    None       <<<<<<<
hdisk2          00fabfb2cce8c279                    lvup_rootvg       
hdisk3          00fabfb25092h596                    None                 
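The rule above can also be applied with a short script. This is a minimal sketch, not an IBM-provided tool: the function name surr_disk_hmc is hypothetical, and it assumes the "key = hdiskN" line format shown in the lvupdate.data excerpt and the lspv column layout shown above (disk name, PVID, volume group).

```shell
# Hypothetical helper (POSIX sh/ksh): print the Surrogate boot disk for an
# HMC-managed LKU.
# $1 = lvupdate.data, $2 = file holding saved "lspv" output.
# Assumes lines such as "nhdisk = hdisk1" as in the excerpt above.
surr_disk_hmc() {
    data=$1 pvs=$2
    # Value of nhdisk and alt_nhdisk: first token after "=".
    nh=$(awk -F= '{k = $1; gsub(/[ \t]/, "", k)}
                  k == "nhdisk" {split($2, v, " "); print v[1]}' "$data")
    alt=$(awk -F= '{k = $1; gsub(/[ \t]/, "", k)}
                   k == "alt_nhdisk" {split($2, v, " "); print v[1]}' "$data")
    # Whichever of the two holds lvup_rootvg, the Surrogate boot disk is the other.
    if awk -v d="$nh" '$1 == d && $3 == "lvup_rootvg" {f = 1} END {exit !f}' "$pvs"; then
        echo "$alt"
    elif awk -v d="$alt" '$1 == d && $3 == "lvup_rootvg" {f = 1} END {exit !f}' "$pvs"; then
        echo "$nh"
    else
        echo "neither $nh nor $alt holds lvup_rootvg" >&2
        return 1
    fi
}
```

For example, save the disk list with "lspv > /tmp/lspv.out" and run "surr_disk_hmc /var/adm/ras/liveupdate/lvupdate.data /tmp/lspv.out". With the data shown above, it prints hdisk1.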
Another way to find the Surrogate boot disk is to check "lvupdlog", which is located in "/var/adm/ras/liveupdate/logs". Search it for "alt_disk_copy", or grep for it like this:
*Note: The alt_disk_copy check does not apply to PowerVC-managed hosts.
 
# grep alt_disk_copy /var/adm/ras/liveupdate/logs/lvupdlog
OLVUPD 09/13/2022-23:18:11.542 DEBUG lvupdate_utils32.c - 2228 - lvup_createNrvg: About to execute: ulimit -f unlimited;/usr/sbin/alt_disk_copy -B -g -d "hdisk1" -i /var/adm/ras/liveupdate/image.data -e /var/adm/ras/liveupdate/lvup_exclude.rootvg -s /usr/sbin/lvup_newrootvg -L
The target disk of the alt_disk_copy command (the "-d" argument) is the one used for the Surrogate boot, in this case "hdisk1".
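The disk name can be pulled out of that log line with grep and sed. This is a minimal sketch: surr_disk_from_altcopy is a hypothetical name, and it assumes the target disk is logged as -d "hdiskN" exactly as in the excerpt above. If alt_disk_copy was logged more than once, it reports the last entry.

```shell
# Hypothetical helper (POSIX sh/ksh): print the -d target of the last
# alt_disk_copy command recorded in lvupdlog.
# $1 = path to lvupdlog
surr_disk_from_altcopy() {
    grep 'alt_disk_copy' "$1" |
        sed -n 's/.*alt_disk_copy.* -d "\([^"]*\)".*/\1/p' |
        tail -n 1
}
```

For example: surr_disk_from_altcopy /var/adm/ras/liveupdate/logs/lvupdlog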

Finding the disk of the Surrogate's rootvg in a PowerVC-managed setup

For PowerVC-managed systems, getting the Surrogate logs is trickier. Because PowerVC automatically assigns new disks for LKU and removes them after LKU, finding the Surrogate boot disk takes more work than on HMC-managed hosts.
After a failure, Live Update generally leaves the Surrogate boot disk in place, but its hdisk number may differ from the one assigned at the start of LKU.
To find the Surrogate boot disk, open /var/adm/ras/liveupdate/logs/lvupdlog and look for the "importvg -y" entry for the "NewRootVG" or "SurrRootVG" volume group (the name carries a leading underscore prefix). The disk named in that command is the Surrogate boot disk, hdisk1 in this example:
importvg -y ___NewRootVG hdisk1
Replaying log for /dev/fslv00.
mount: 0506-324 Cannot mount /dev/fslv00 on /surr_root: A system call received a parameter that is not valid.
mount: 0506-324 Cannot mount /dev/fslv02 on /surr_var: A system call received a parameter that is not valid.
In this example, the automated mounts of /surr_root and /surr_var failed. Manual collection is needed only if the automated extraction failed, as here, or if the Surrogate crashed.
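The importvg entry can be located and the disk name extracted in one step. This is a minimal sketch: surr_disk_from_importvg is a hypothetical name, and it assumes the entry contains "importvg -y", the underscore-prefixed NewRootVG or SurrRootVG name, and the disk name, in that order, as in the excerpt above. If several matching entries exist, it reports the last one.

```shell
# Hypothetical helper (POSIX sh/ksh): print the disk named in the last
# "importvg -y <underscores>NewRootVG|SurrRootVG hdiskN" entry of lvupdlog.
# $1 = path to lvupdlog
surr_disk_from_importvg() {
    grep -E 'importvg -y _*(NewRootVG|SurrRootVG) hdisk[0-9]+' "$1" |
        sed 's/.*importvg -y _*[A-Za-z]* \(hdisk[0-9][0-9]*\).*/\1/' |
        tail -n 1
}
```

For example: surr_disk_from_importvg /var/adm/ras/liveupdate/logs/lvupdlog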
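If lvupdlog has no importvg entry for SurrRootVG or NewRootVG, compare the current "lspv" output with the list of disks recorded before LKU in /var/adm/ras/liveupdate/lvuppaths.cf. The disk that appears only in the "lspv" output is the Surrogate boot disk: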
# lspv | awk '{print $1}'
hdisk0
hdisk1 
hdisk3

# awk '{print $3}' /var/adm/ras/liveupdate/lvuppaths.cf
hdisk0
hdisk1
Here, hdisk3 is the Surrogate boot disk. 
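The comparison can be done with comm, which prints lines unique to each sorted list. This is a minimal sketch: the function name new_disks and its temporary file names are hypothetical, and it assumes, as in the awk command above, that the disk name is the third column of lvuppaths.cf.

```shell
# Hypothetical helper (POSIX sh/ksh): list disks present now but absent
# from lvuppaths.cf.
# $1 = saved "lspv" output, $2 = lvuppaths.cf (disk name in column 3).
new_disks() {
    tmp=${TMPDIR:-/tmp}/newdisks.$$
    awk '{print $1}' "$1" | sort -u > "$tmp.now"
    awk '{print $3}' "$2" | sort -u > "$tmp.before"
    # comm -23 keeps only the lines found in the first (current) list.
    comm -23 "$tmp.now" "$tmp.before"
    rm -f "$tmp.now" "$tmp.before"
}
```

For example, run "lspv > /tmp/lspv.out" and then "new_disks /tmp/lspv.out /var/adm/ras/liveupdate/lvuppaths.cf". With the lists above, it prints hdisk3.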

Extracting the logs

Now that the Surrogate boot disk is identified, we can proceed with importing the volume group and mounting the file systems we need. 
  • Importing the Volume Group
While importing, pay close attention to the console output, and preferably save it, because the logical volumes are renamed (for example, to "fslv0*").
Run "importvg", where "hdiskX" is the Surrogate boot disk identified in the previous steps:
# importvg -y SurrVG hdiskX
Example: 
# importvg -y SurrVG hdisk1
0516-530 synclvodm: Logical volume name hd5 changed to bootlv00.
0516-530 synclvodm: Logical volume name hd6 changed to pagelv00.
0516-530 synclvodm: Logical volume name hd8 changed to loglv00.
0516-712 synclvodm: The chlv succeeded, however chfs must now be 
        run on every filesystem which references the old log name hd8.
0516-530 synclvodm: Logical volume name hd4 changed to fslv00.
0516-530 synclvodm: Logical volume name hd2 changed to fslv01.
0516-530 synclvodm: Logical volume name hd9var changed to fslv02.
0516-530 synclvodm: Logical volume name hd3 changed to fslv03.
0516-530 synclvodm: Logical volume name hd1 changed to fslv04.
0516-530 synclvodm: Logical volume name hd10opt changed to fslv05.
0516-530 synclvodm: Logical volume name hd11admin changed to fslv06.
0516-530 synclvodm: Logical volume name lg_dumplv changed to lv00.
0516-530 synclvodm: Logical volume name livedump changed to fslv07.
imfs: Warning: mount point / already exists in /etc/filesystems.
imfs: Warning: mount point /usr already exists in /etc/filesystems.
imfs: Warning: mount point /var already exists in /etc/filesystems.
imfs: Warning: mount point /tmp already exists in /etc/filesystems.
imfs: Warning: mount point /home already exists in /etc/filesystems.
imfs: Warning: mount point /opt already exists in /etc/filesystems.
imfs: Warning: mount point /admin already exists in /etc/filesystems.
imfs: Warning: mount point /var/adm/ras/livedump already exists in /etc/filesystems.
SurrVG
The output shows the default renaming of the LVs. The one generally needed is hd9var (/var), which becomes fslv02.
*NOTE*: If LVs named "fslv0X" already exist on the system, the new names will be different, so check carefully how each LV was renamed during the import.
The renamed LV list is shown only the first time you import the volume group. 
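Because the rename list appears only once, it helps to save it and look names up from the saved copy. This is a minimal sketch: renamed_lv is a hypothetical name, and it assumes the synclvodm message format shown above. Capture the output during the import, for example with "importvg -y SurrVG hdisk1 2>&1 | tee /tmp/importvg.out" (2>&1 captures the messages whichever stream they are written to).

```shell
# Hypothetical helper (POSIX sh/ksh): print the new name of an LV from saved
# importvg output. Prints nothing if the LV kept its original name.
# $1 = saved importvg output, $2 = original LV name (for example hd9var).
renamed_lv() {
    awk -v old="$2" '
        /Logical volume name/ && $6 == old {
            nm = $9
            sub(/\.$/, "", nm)   # drop the trailing period
            print nm
        }' "$1"
}
```

For example, "renamed_lv /tmp/importvg.out hd9var" prints fslv02 for the output above, and "renamed_lv /tmp/importvg.out hd8" prints loglv00.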
  • Preparing the Logical Volume 
*This example is for "/var", where "hd9var" was renamed to "fslv02".
First, clear the file system stanza from the LVCB (Logical Volume Control Block) so that the file system mounts cleanly:
# putlvcb -f '' fslv02 
Create a directory for the mount point: 
 
# mkdir /surr_var 
  • Mounting the file system 
Check the log device name in the "importvg" output. By default hd8 is renamed to "loglv00"; if it was renamed differently on your system, use that name instead:
# mount -o log=/dev/loglv00 /dev/fslv02 /surr_var 
  • Copy the logs 
The only thing left is to copy the logs. They are in "/surr_var/adm/ras/", and it is recommended to collect everything in that directory:
# tar -cvf /tmp/surr_logs.tar /surr_var/adm/ras/
Finally, upload the tar file through: https://www.ibm.com/support/pages/enhanced-customer-data-repository-ecurep-send-data-https

Extracting a dump from the Surrogate (when needed)

In cases where the Surrogate crashed and IBM Support needs the system dump, you can extract the dump manually from the Surrogate's rootvg.
First, verify in the "importvg" output that the dump LV (lg_dumplv) was renamed to "lv00"; if it was not, use the new name in the commands below.
  • Query the dump device to find out whether there is a dump and its size 
# sysdumpdev -LS /dev/lv00
0453-039

Device name:         /dev/lg_dumplv
Major device number: 10
Minor device number: 11
Size:                68537856 bytes
Uncompressed Size:   1040972068 bytes
Date/Time:           Fri Sep 23 08:14:07 CDT 2022
Dump status:         0
Type of dump:        fw-assisted
dump completed successfully

Scanning device /dev/lg_dumplv for existing dump.
A valid dump header was detected. This may take a while.
A complete .BZ compressed dump was detected in device /dev/lg_dumplv.
0453-039

Device name:         /dev/lg_dumplv
Major device number: 10
Minor device number: 11
Size:                68537096 bytes
Uncompressed Size:   1040972092 bytes
Date/Time:           Fri Sep 23 08:14:41 CDT 2022
Dump status:         0
Type of dump:        fw-assisted
  • Collect "snap -a"
# snap -r      (removes any previous snap content)
# snap -a
  • Calculate the dump size in 512-byte blocks
Take the dump size from the last entry:
Size:                68537096 bytes
and divide it by 512:

68537096 / 512 = 133861.515625

Round this up to the next whole number, 133862. This is the dump size in 512-byte blocks.
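The same rounding can be done in the shell from saved sysdumpdev output. This is a minimal sketch: dump_blocks is a hypothetical name, and it assumes the "Size: N bytes" line format shown above, taking the last Size entry as described. Adding 511 before the integer division rounds up to the next whole 512-byte block.

```shell
# Hypothetical helper (POSIX sh/ksh): read saved "sysdumpdev -LS" output and
# print the size of the last reported dump in 512-byte blocks, rounded up.
# Lines starting with "Uncompressed Size:" are ignored because $1 differs.
# $1 = saved sysdumpdev output
dump_blocks() {
    awk '$1 == "Size:" {size = $2}
         END { if (size == "") exit 1; print int((size + 511) / 512) }' "$1"
}
```

For example, "sysdumpdev -LS /dev/lv00 > /tmp/sysdumpdev.out 2>&1" followed by "count=$(dump_blocks /tmp/sysdumpdev.out)" sets count to 133862 for the output above.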

  • Extract the dump to file 
Create the directory and use "dd" to extract the dump into a file:
# mkdir -p /tmp/ibmsupt/dump
# dd if=/dev/lv00 of=/tmp/ibmsupt/dump/surr.dump bs=512 skip=1 count=133862
133862+0 records in.
133862+0 records out.
Note that the "count" value is the dump size divided by 512 and rounded up to the next whole number.
  • Mount the "/usr" file system from the Surrogate to get the "unix_64" and "kdb_64" files.
IBM Support needs the Surrogate's unix and kdb binaries to open the dump.
Normally /usr (hd2) is renamed to "fslv01", but check the "importvg" output and replace the LV name if it is different on your system:
# mkdir /surr_usr 
# putlvcb -f '' fslv01
# mount -o log=/dev/loglv00 /dev/fslv01 /surr_usr 

  • Copy the files from Surrogate's /usr 
# cp /surr_usr/lib/boot/unix_64 /tmp/ibmsupt/dump/surr_unix_64
# cp /surr_usr/sbin/kdb_64 /tmp/ibmsupt/dump/surr_kdb_64
  • Compress the snap for upload to IBM Software Support:
# snap -c 
Then upload the compressed snap file through: https://www.ibm.com/support/pages/enhanced-customer-data-repository-ecurep-send-data-https

Cleaning up the imported volume group

  • Unmount any previously mounted file systems. 
# cd /
# umount /surr_var 
# umount /surr_usr 
  • Varyoff and export the volume group. 
# varyoffvg SurrVG
# exportvg SurrVG

Document Location

Worldwide

Document Information

Modified date:
23 September 2022

UID

ibm16787335