How To
Summary
When Live Update fails because of a problem on the Surrogate LPAR, the Surrogate's logs are sometimes not copied back to the remaining Original LPAR, and they must be extracted manually.
Objective
This document describes:
- Overview of the procedure.
- Finding the disk of the Surrogate's rootvg in an HMC-managed setup.
- Finding the disk of the Surrogate's rootvg in a PowerVC-managed setup.
- Extracting the logs.
- Extracting a dump from the Surrogate (when needed).
- Cleaning up the imported volume group.
Steps
Overview of the procedure
Live Update has procedures to copy the logs out of the Surrogate LPAR, but when the Surrogate crashes, connectivity is lost, or a similar issue occurs, the automated procedure does not work. In those cases, we need to extract the data manually.
To extract the LKU data, we need to find which hdisk was used as the Surrogate boot disk, import its volume group, mount its file systems, and copy the log files.
For HMC-managed systems, the disk can be found quickly by checking the Live Update configuration file /var/adm/ras/liveupdate/lvupdate.data.
The Surrogate boot disk is either the "nhdisk" or the "alt_nhdisk":
disks:
nhdisk = hdisk1 <<<<<<<
alt_nhdisk = hdisk2 <<<<<<<
mhdisk = hdisk3
If either the "nhdisk" or the "alt_nhdisk" holds the "lvup_rootvg" volume group, the Surrogate boot disk is the other one.
In this example, hdisk1 is the Surrogate boot disk:
# lspv
hdisk0 00fabfb25091f4c6 rootvg active
hdisk1 00fabfb25091f3bb None <<<<<<<
hdisk2 00fabfb2cce8c279 lvup_rootvg
hdisk3 00fabfb25092h596 None
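The check above can be scripted. The following is a minimal POSIX shell sketch, not an IBM-provided tool: the function name find_surr_disk is hypothetical, and it assumes that lvupdate.data uses the "key = value" layout shown above and that the second argument is a file holding saved "lspv" output. It prints whichever of nhdisk/alt_nhdisk does not hold lvup_rootvg.

```shell
#!/bin/sh
# find_surr_disk: hypothetical helper, not part of AIX or Live Update.
# Usage: find_surr_disk LVUPDATE_DATA LSPV_OUTPUT_FILE
# Assumes "nhdisk = hdiskN" and "alt_nhdisk = hdiskN" lines in lvupdate.data.
find_surr_disk() {
    data_file=$1
    lspv_file=$2

    # The third field of the nhdisk and alt_nhdisk lines is the disk name.
    candidates=$(awk '$1 == "nhdisk" || $1 == "alt_nhdisk" { print $3 }' "$data_file")

    for disk in $candidates; do
        # awk exits 0 only when lspv shows this disk in lvup_rootvg.
        if awk -v d="$disk" '$1 == d && $3 == "lvup_rootvg" { found = 1 }
                END { exit !found }' "$lspv_file"; then
            continue    # holds lvup_rootvg, so it is not the Surrogate boot disk
        fi
        echo "$disk"
    done
}
```

For example, on the Original LPAR:
# lspv > /tmp/lspv.out
# find_surr_disk /var/adm/ras/liveupdate/lvupdate.data /tmp/lspv.out
If neither candidate holds lvup_rootvg, both are printed, and the other checks in this document should be used to decide.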
Another way to find the Surrogate boot disk is to check the "lvupdlog" file, located in "/var/adm/ras/liveupdate/logs". Search it for "alt_disk_copy", for example with grep:
*Note: the alt_disk_copy check does not apply to PowerVC-managed hosts.
$ grep alt_disk_copy lvupdlog
OLVUPD 09/13/2022-23:18:11.542 DEBUG lvupdate_utils32.c - 2228 - lvup_createNrvg: About to execute: ulimit -f unlimited;/usr/sbin/alt_disk_copy -B -g -d "hdisk1" -i /var/adm/ras/liveupdate/image.data -e /var/adm/ras/liveupdate/lvup_exclude.rootvg -s /usr/sbin/lvup_newrootvg -L
The target disk of the alt_disk_copy command (the "-d" argument) is the Surrogate boot disk, in this case "hdisk1".
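The grep above can be narrowed to print only the target disk. This is a sketch under the assumption that the log line has the same shape as the excerpt shown (the disk is quoted after "-d"); the function name surr_disk_from_lvupdlog is hypothetical. If the log holds several LKU attempts, it reports the disk from the last matching line.

```shell
#!/bin/sh
# surr_disk_from_lvupdlog: hypothetical helper, not part of AIX.
# Prints the -d "hdiskN" target of the last alt_disk_copy call in lvupdlog.
# Assumes the disk name is enclosed in double quotes, as in the log excerpt.
surr_disk_from_lvupdlog() {
    grep alt_disk_copy "$1" |
        sed -n 's/.*alt_disk_copy.* -d "\([^"]*\)".*/\1/p' |
        tail -n 1
}
```

For example:
# surr_disk_from_lvupdlog /var/adm/ras/liveupdate/logs/lvupdlog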
For PowerVC-managed systems, finding the Surrogate boot disk is trickier. Because PowerVC (PVC) automatically assigns new disks for LKU and removes them after LKU, getting the needed information takes a little more work than on HMC-managed hosts.
After a failure, Live Update generally leaves the Surrogate boot disk in place, but its hdisk number may differ from the one assigned at the start of LKU.
To find the Surrogate boot disk, open /var/adm/ras/liveupdate/logs/lvupdlog and look for "importvg -y __NewRootVG" or "importvg -y __SurrRootVG". The hdisk passed to importvg is the Surrogate boot disk, hdisk1 in this excerpt:
importvg -y ___NewRootVG hdisk1
Replaying log for /dev/fslv00.
mount: 0506-324 Cannot mount /dev/fslv00 on /surr_root: A system call received a parameter that is not valid.
mount: 0506-324 Cannot mount /dev/fslv02 on /surr_var: A system call received a parameter that is not valid.
Collect the Surrogate logs manually only if the automated extraction failed or the Surrogate crashed.
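The importvg search can also be scripted. This is a minimal sketch, assuming the log contains the literal sequence "importvg -y <VG name> <hdisk>" as in the excerpt; the function name surr_disk_from_importvg is hypothetical. Because the text and the log excerpt in this document show different numbers of leading underscores (__NewRootVG and ___NewRootVG), the pattern accepts one or more.

```shell
#!/bin/sh
# surr_disk_from_importvg: hypothetical helper, not part of AIX.
# Prints the disk passed to the last "importvg -y <_NewRootVG|_SurrRootVG> <disk>"
# found in lvupdlog, accepting any number of leading underscores on the VG name.
surr_disk_from_importvg() {
    awk '
        /importvg/ {
            for (i = 3; i < NF; i++)
                if ($(i-2) ~ /importvg$/ && $(i-1) == "-y" && $i ~ /^_+(NewRootVG|SurrRootVG)$/)
                    last = $(i + 1)
        }
        END {
            gsub(/"/, "", last)    # drop quotes, in case the disk name is quoted
            if (last != "") print last
        }' "$1"
}
```

For example:
# surr_disk_from_importvg /var/adm/ras/liveupdate/logs/lvupdlog
No output means there is no such importvg entry, and the disk comparison that follows applies.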
If there is no importvg entry for __SurrRootVG or __NewRootVG, compare the disks recorded before LKU (in /var/adm/ras/liveupdate/lvuppaths.cf) with the current "lspv" output:
# lspv | awk '{print $1}'
hdisk0
hdisk1
hdisk3
# awk '{print $3}' /var/adm/ras/liveupdate/lvuppaths.cf
hdisk0
hdisk1
Here, hdisk3 is the Surrogate boot disk, because it is the only disk that does not appear in the pre-LKU list.
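The same comparison can be done in one step. This sketch assumes, as the awk command above does, that the third field of each lvuppaths.cf line is the hdisk name; the function name new_disks is hypothetical. It prints every disk in the saved lspv output that is missing from lvuppaths.cf.

```shell
#!/bin/sh
# new_disks: hypothetical helper, not part of AIX.
# Usage: new_disks LVUPPATHS_CF LSPV_OUTPUT_FILE
# Prints disks present in the lspv output but absent from lvuppaths.cf.
new_disks() {
    awk 'FILENAME == ARGV[1] { before[$3] = 1; next }   # pre-LKU disk names (field 3)
         NF && !($1 in before) { print $1 }              # disks not seen before LKU' "$1" "$2"
}
```

For example:
# lspv > /tmp/lspv.out
# new_disks /var/adm/ras/liveupdate/lvuppaths.cf /tmp/lspv.out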
Extracting the logs
Now that the Surrogate boot disk is identified, we can proceed with importing the volume group and mounting the file systems we need.
- Importing the Volume Group
Use the "recreatevg" command, where "hdiskX" is the Surrogate boot disk identified in the previous steps:
# recreatevg -y SurrVG hdiskX
Example:
# recreatevg -y SurrVG hdisk2
SurrVG
# lsvg -l SurrVG
SurrVG:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
fshd5 boot 1 1 1 closed/syncd N/A
fshd6 paging 8 8 1 closed/syncd N/A
fshd8 jfs2log 1 1 1 closed/syncd N/A
fshd4 jfs2 19 19 1 closed/syncd /fs/
fshd2 jfs2 53 53 1 closed/syncd /fs/usr
fshd9var jfs2 19 19 1 closed/syncd /fs/var
fshd3 jfs2 18 18 1 closed/syncd /fs/tmp
fshd1 jfs2 17 17 1 closed/syncd /fs/home
fshd10opt jfs2 21 21 1 closed/syncd /fs/opt
fshd11admin jfs2 18 18 1 closed/syncd /fs/admin
fslg_dumplv sysdump 16 16 1 closed/syncd N/A
fslivedump jfs2 4 4 1 closed/syncd /fs/var/adm/ras/livedump
- Mounting the file system
After recreatevg, all LV names have the "fs" prefix and all mount points have the "/fs" prefix. Mount the Surrogate's /var file system:
# mount /fs/var
- Copy the logs
The only step left is to copy the logs. They are in "/fs/var/adm/ras/", and it is recommended to collect everything in that directory:
# tar -cvf /tmp/surr_logs.tar /fs/var/adm/ras/
Finally, upload the tar file to: https://www.ibm.com/support/pages/enhanced-customer-data-repository-ecurep-send-data-https
Extracting a dump from the Surrogate (when needed)
In cases where the Surrogate crashed and support needs the system dump, you can extract the dump file manually from the Surrogate's rootvg.
First, verify the name of the dump LV after "recreatevg"; normally it is "fslg_dumplv".
- Query the dump device to find out whether it contains a dump and how large the dump is:
# sysdumpdev -LS /dev/fslg_dumplv
0453-039
Device name: /dev/lg_dumplv
Major device number: 10
Minor device number: 11
Size: 68537856 bytes
Uncompressed Size: 1040972068 bytes
Date/Time: Fri Sep 23 08:14:07 CDT 2022
Dump status: 0
Type of dump: fw-assisted
dump completed successfully
Scanning device /dev/lg_dumplv for existing dump.
A valid dump header was detected. This may take a while.
A complete .BZ compressed dump was detected in device /dev/lg_dumplv.
0453-039
Device name: /dev/lg_dumplv
Major device number: 10
Minor device number: 11
Size: 68537096 bytes
Uncompressed Size: 1040972092 bytes
Date/Time: Fri Sep 23 08:14:41 CDT 2022
Dump status: 0
Type of dump: fw-assisted
- Collect "snap -a"
# snap -r    (removes old snap content)
# snap -a
- Calculate the dump size in 512-byte blocks
Take the dump size from the last entry:
Size: 68537096 bytes
and divide it by 512:
68537096 / 512 = 133861.515625
Round that up to the next whole number, 133862. This is the dump size in 512-byte blocks.
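The rounding above is an integer ceiling division, (bytes + 511) / 512. The sketch below reads saved "sysdumpdev -LS" output and applies it to the last "Size:" entry, assuming the output has the "Size: N bytes" lines shown above; the function name dump_blocks is hypothetical.

```shell
#!/bin/sh
# dump_blocks: hypothetical helper, not part of AIX.
# Reads saved "sysdumpdev -LS" output and prints the LAST "Size:" value
# rounded up to whole 512-byte blocks. "Uncompressed Size:" lines are ignored.
dump_blocks() {
    size=$(awk '$1 == "Size:" { s = $2 } END { print s }' "$1")
    [ -n "$size" ] || return 1
    # Integer ceiling division: (bytes + 511) / 512.
    echo $(( (size + 511) / 512 ))
}
```

For example:
# sysdumpdev -LS /dev/fslg_dumplv > /tmp/sysdumpdev.out 2>&1
# dump_blocks /tmp/sysdumpdev.out
For the output shown above, this gives 133862.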
- Extract the dump to file
Create a directory for the dump and use "dd" to extract the dump into a file:
# mkdir -p /tmp/ibmsupt/dump
# dd if=/dev/fslg_dumplv of=/tmp/ibmsupt/dump/surr.dump bs=512 skip=1 count=133862
133862+0 records in.
133862+0 records out.
Note: the "count" value is the dump size divided by 512 and rounded up to the next whole number.
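The extraction can be wrapped so that the block count is computed from the byte size instead of by hand. This is a sketch; the function name extract_dump is hypothetical, and it simply reuses the bs=512 and skip=1 values from the documented dd command.

```shell
#!/bin/sh
# extract_dump: hypothetical wrapper around the documented dd command.
# Usage: extract_dump DUMP_DEVICE OUTPUT_FILE DUMP_SIZE_BYTES
extract_dump() {
    dev=$1 out=$2 bytes=$3
    count=$(( (bytes + 511) / 512 ))   # dump size rounded up to 512-byte blocks
    # skip=1 mirrors the documented command: the first 512-byte block is not copied.
    dd if="$dev" of="$out" bs=512 skip=1 count="$count"
}
```

For example:
# extract_dump /dev/fslg_dumplv /tmp/ibmsupt/dump/surr.dump 68537096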
- Mount the Surrogate's "/usr" file system to get the "unix_64" and "kdb_64" files.
At Support, we need the unix and kdb binaries to open the dump.
Normally the /usr LV is renamed to "fshd2" and mounted at "/fs/usr", but double-check the "lsvg -l SurrVG" output to confirm this, and adjust the LV name or mount point if yours differs:
# mount /fs/usr
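To confirm which LV backs a given mount point, the saved "lsvg -l SurrVG" output can be searched. This sketch assumes the seven-column layout shown earlier (the LV STATE value is a single token such as closed/syncd, and the mount point is the last column); the function name lv_for_mount is hypothetical.

```shell
#!/bin/sh
# lv_for_mount: hypothetical helper, not part of AIX.
# Usage: lv_for_mount LSVG_L_OUTPUT_FILE MOUNT_POINT
# Prints the LV whose MOUNT POINT column matches, e.g. fshd2 for /fs/usr.
lv_for_mount() {
    # Data rows have 7 fields; header and VG-name lines do not.
    awk -v mp="$2" 'NF == 7 && $NF == mp { print $1 }' "$1"
}
```

For example:
# lsvg -l SurrVG > /tmp/surrvg.lsvg
# lv_for_mount /tmp/surrvg.lsvg /fs/usr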
- Copy the files from Surrogate's /usr
# cp /fs/usr/lib/boot/unix_64 /tmp/ibmsupt/dump/surr_unix_64
# cp /fs/usr/sbin/kdb_64 /tmp/ibmsupt/dump/surr_kdb_64
- Compress the snap for upload to IBM Software Support.
# snap -c
Then upload the snap file to: https://www.ibm.com/support/pages/enhanced-customer-data-repository-ecurep-send-data-https
Cleaning up the imported volume group
- Unmount any previously mounted file systems.
# cd /
# umount /fs/var
# umount /fs/usr
- Vary off and export the volume group.
# varyoffvg SurrVG
# exportvg SurrVG
Document Location
Worldwide
Product: AIX (Cognitive Systems, IBM Infrastructure w/TPS)
Category: Install->LKU
Platform: AIX
Version: 7.2.0; 7.3.0
Document Information
Modified date: 29 January 2025
UID: ibm16787335