IBM Support

Mksysb and/or alt_disk_copy hang - must gather

How To


Summary

We all use mksysb and alt_disk_copy often and for various reasons, mainly for system backup and recovery.
However, in certain situations mksysb and alt_disk_copy can hang, which prevents us from creating our backup or clone.
In this document I will explain the most common causes of mksysb and alt_disk_copy hangs and what data you need to upload to IBM support for quicker resolution of the problem.

Steps


1. Mksysb explained in 3 steps.
When creating an mksysb, the command does the following:
  • It creates/updates the image.data file located in the root directory, which contains all the LVM information of the system.
  • Next, it generates a list of files that it will back up.
  • Finally, it begins the backup process.
When mksysb backs up the files, it is important to note that it backs up each file individually, in exactly the same order as in the generated list of files.
It is also important to note that mksysb does not back up NFS-mounted filesystems; however, it still tries to access them.

The most common hang is due to an NFS mount. Although mksysb does not back up NFS-mounted filesystems, it still attempts to access them. If an NFS mount is stale, mksysb will hang attempting to access it. In this case, fixing or simply unmounting the filesystem will resolve the issue. Sometimes a bad file, or files being modified by an application, can also cause mksysb to hang.
In that case, excluding those files from the backup should resolve the issue. If you are uncertain how to proceed, follow the instructions below to obtain and upload the data needed for IBM support to help investigate and advise on a possible resolution.
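
As an illustration of the exclusion approach, here is a minimal sketch of adding a pattern to the exclude file. It assumes the standard /etc/exclude.rootvg file, which mksysb reads when run with the -e flag and whose lines are grep-style patterns. The helper name add_exclude and the example path in the usage note are hypothetical.

```shell
# Hypothetical helper: append a grep-style exclusion pattern to an exclude
# file (default /etc/exclude.rootvg) unless the same line is already there.
add_exclude() {
    pattern=$1
    file=${2:-/etc/exclude.rootvg}

    # Make sure the file exists so grep has something to read.
    touch "$file" || return 1

    # -F fixed string, -x whole-line match, -q quiet: skip duplicates.
    if grep -F -x -q -e "$pattern" "$file"; then
        return 0
    fi

    printf '%s\n' "$pattern" >> "$file"
}
```

For example, to exclude everything under a (hypothetical) /data/app/cache directory and then run the backup with exclusions enabled:
# add_exclude '^./data/app/cache/'
# mksysb -e -iv <dir/name_of_mksysb>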

If mksysb starts hanging, gather and upload the following data:
Set mksysb to debug mode:
# export MKSYSB_DEBUG=yes

Start a script session to capture the log:
# script /tmp/mksysb_debug.out

Run your mksysb command in verbose (adding the -v flag)
# mksysb -iv <dir/name_of_mksysb>

Next, open a new session and make a copy of the archive list. This is the file mksysb generates with the list of files it backs up, in that exact order.
# cd /tmp
# ls -lotr
Look for the directory that looks like this:
mksysb.####### (the numbers are different each time)
# cd mksysb.#######
# cp .archive.list.#######  /tmp/mksysb_archive_list

Once the mksysb command hangs, stop the process with Ctrl+C, then stop the script:
# exit
Take mksysb out of debug mode:
# export MKSYSB_DEBUG=no
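
Because mksysb backs up files in the order of the archive list, the entry that follows the last file named in the verbose output is a reasonable first suspect. The sketch below looks that entry up in the copied list. It assumes the list holds one path per line and that the path you pass in matches a line exactly; the helper name next_in_list is hypothetical, and this is only an aid for narrowing things down, not a diagnosis.

```shell
# Hypothetical helper: print the entry that follows a given path in a list file.
# $1 = last file seen in the verbose output, $2 = copy of the archive list
next_in_list() {
    # The path is passed through the environment so awk does not
    # interpret any backslashes in it.
    TARGET=$1 awk '
        found { print; exit }                 # line right after the match
        $0 == ENVIRON["TARGET"] { found = 1 } # exact whole-line match
    ' "$2"
}
```

Example, with <last file in the log> taken from /tmp/mksysb_debug.out:
# next_in_list '<last file in the log>' /tmp/mksysb_archive_list
Nothing is printed if the path is not found or is the last entry in the list.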

Next, collect and upload a snap:
# snap -r   <--- clears old snap data
# snap -aZ
# cp /tmp/mksysb_debug.out /tmp/ibmsupt/testcase
# cp /tmp/mksysb_archive_list /tmp/ibmsupt/testcase
# cp /etc/exclude.rootvg /tmp/ibmsupt/testcase
# snap -c
# mv  /tmp/ibmsupt/snap.pax.Z   <Case Number>.snap.pax.Z
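
Some of the files copied into the testcase directory may not exist on every system (for example, /etc/exclude.rootvg is only present if exclusions were configured). The following is a minimal sketch of a copy step that skips missing files instead of failing. The helper name stage_testcase_files is hypothetical; the destination is whatever directory you pass in.

```shell
# Hypothetical helper: copy each existing file into a destination directory
# and report how many of the requested files were not found.
# $1 = destination directory, remaining arguments = files to copy
stage_testcase_files() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1

    missing=0
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp "$f" "$dest"/ || return 1
        else
            echo "skipping $f (not found)" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) skipped"
}
```

Example, run after snap -aZ and before snap -c:
# stage_testcase_files /tmp/ibmsupt/testcase /tmp/mksysb_debug.out /tmp/mksysb_archive_list /etc/exclude.rootvg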

Upload the snap to your IBM support case.

2. Alt_disk_copy hangs.
Similar to mksysb, alt_disk_copy hangs are usually caused by a dead NFS mount, corrupt files, or applications modifying files during the alt_disk_copy process.
Just like mksysb, alt_disk_copy creates a list of files and backs them up in the same order as in that list; it then restores the files on the target disk.
Most hangs occur during the backup process.
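
Since a dead NFS mount is a common cause of hangs for both commands, it can help to check whether a mount point still responds before starting. The following is a minimal sketch that lists a directory in the background and gives up after a time limit. The helper name probe_dir and the 10-second default are assumptions, and a process stuck on a dead mount may not actually go away when killed.

```shell
# Hypothetical helper: return 0 if a directory can be listed within a time
# limit, non-zero if the listing fails or does not finish in time.
# $1 = directory (e.g. an NFS mount point), $2 = limit in seconds (default 10)
probe_dir() {
    dir=$1
    limit=${2:-10}

    # Run the listing in the background so a stale mount cannot block us.
    ls -d "$dir" >/dev/null 2>&1 &
    pid=$!

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null   # best effort only
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # The listing finished in time; hand back its exit status.
    wait "$pid"
}
```

Example, with <nfs mount point> replaced by each NFS-mounted directory on the system:
# probe_dir <nfs mount point> || echo "mount did not respond"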
If you are uncertain how to proceed when a hang occurs, follow the instructions below and upload the data needed for IBM support to analyze and advise on a possible resolution.
Start a script to capture the command output in debug and verbose mode:
# script /tmp/alt_disk_copy_debug.out
# alt_disk_copy <your flags> -D -V
As soon as you initiate the command, quickly open a new session to the system:
# cd /tmp

Keep running the 'ls' command below until you see a file with a name similar to .include.list.10813894 appear. The name begins with a dot, so the -a flag is used to make sure hidden files are listed.
# ls -a | grep include.list

This file (.include.list.10813894) appears 20 - 30 seconds after alt_disk_copy is initiated. It disappears quickly, so keep running the 'ls' command until the file is created, then make a copy of it before it disappears:
# cp <.include.list.10813894> include.list_original
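
Catching the file by hand can be difficult, so the manual 'ls' and 'cp' steps can also be sketched as a polling loop. This is a minimal sketch under a few assumptions: the file appears as .include.list.<number> in the directory being watched, a one-second polling interval is fast enough, and the copy may be incomplete if the file is still being written. The helper name catch_include_list is hypothetical.

```shell
# Hypothetical helper: poll a directory until a .include.list.* file shows up,
# then copy it. Returns 0 on a successful copy, 1 if nothing appeared in time.
# $1 = directory to watch (default /tmp)
# $2 = destination file (default /tmp/include.list_original)
# $3 = number of one-second attempts (default 600)
catch_include_list() {
    src_dir=${1:-/tmp}
    dest=${2:-/tmp/include.list_original}
    tries=${3:-600}

    while [ "$tries" -gt 0 ]; do
        # The pattern starts with a literal dot, so hidden files are matched.
        for f in "$src_dir"/.include.list.*; do
            if [ -f "$f" ] && cp "$f" "$dest"; then
                echo "copied $f to $dest"
                return 0
            fi
        done
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

Start it in the second session right after launching alt_disk_copy:
# catch_include_list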

Next, go back to the first session, wait for alt_disk_copy to complete, fail, or hang, and then stop the script as follows:
# exit

Next, collect and upload a snap:
# snap -r   <--- clears old snap data
# snap -aZ
# cp /tmp/alt_disk_copy_debug.out /tmp/ibmsupt/testcase
# cp /tmp/include.list_original /tmp/ibmsupt/testcase
# snap -c
# mv  /tmp/ibmsupt/snap.pax.Z   <Case Number>.snap.pax.Z

Upload the snap to your IBM support case.

Document Location

Worldwide

Document Information

Modified date:
26 April 2022

UID

ibm16572753