zFS Online Salvage and Disabled Aggregate Recovery
Yumei
Currently, the zFS kernel has the capability to perform a salvage verification and repair of an aggregate. If an aggregate has a problem, zFS will disable the aggregate for all access. The user must manually unmount the file system and run an offline repair.
If zFS finds a problem that might lead to corruption of an aggregate, or if the aggregate is already corrupted, zFS disables access to the aggregate. About one minute later, zFS initiates an internal re-mount (NORWSHARE) or a chgowner operation (RWSHARE) to re-initialize all in-memory information about that aggregate, so that zFS starts again from a clean in-memory state. This frequently prevents corruption from reaching disk, but sometimes the data on disk is already corrupted. In that case the corruption will likely be encountered again and zFS will disable access again. If zFS disables the same aggregate 3 times, it remains disabled.
In V2R3, zFS lets the customer initiate an online salvage. With online salvage, zFS can perform salvage verification and repair of a R/W file system without unmounting it. If an aggregate has been disabled 3 times, zFS can now repair it online without a manual unmount, and the file system becomes usable again without any change to the mount tree.
Online salvage allows concurrent reading of the aggregate while salvage verification is taking place; user activity is stopped only if a repair is needed. Because the verification phase also performs all of the repair computations, it accounts for over 99% of the time required for a salvage operation. Thus, zFS at least allows file and directory reads while verification is being performed.
zFS also provides a new ZFSCALL_AGGR PFSCTL command (opcode 0x40000005) that allows an application to request a salvage of a zFS file system. Note that this command can be long running, so the calling task might wait a long time.
In order to invoke the command, the issuer must be logged in as a root user (UID=0) or have READ authority to the SUPE
zFS provides a salvage command (and a corresponding API) for salvaging a file system. The syntax of that command is shown here.
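As a rough illustration, the invocation forms used later in this article can be collected in a small shell helper. This is only a sketch: the function name salvage_cmd and the mode labels verify, repair and cancel are made up for this example, and it covers only the -aggregate, -verifyonly and -cancel options that appear in this article. It assumes that omitting -verifyonly requests verification plus repair. The helper prints the command line rather than running it, so it is safe to try anywhere.

```shell
# Build (but do not run) a zfsadm salvage command line.
# Only options that appear in this article are covered; see the
# zfsadm salvage reference for the complete syntax.
salvage_cmd() {
    mode=$1   # verify | repair | cancel (labels invented for this sketch)
    aggr=$2   # aggregate name, for example OMVSSPT.DEMI.ZFS
    case "$mode" in
        verify) echo "zfsadm salvage -aggregate $aggr -verifyonly" ;;
        cancel) echo "zfsadm salvage -aggregate $aggr -cancel" ;;
        # Assumption: without -verifyonly, a repair is done if one is needed.
        repair) echo "zfsadm salvage -aggregate $aggr" ;;
        *)      echo "usage: salvage_cmd verify|repair|cancel aggregate" >&2
                return 2 ;;
    esac
}
```

For example, salvage_cmd verify OMVSSPT.DEMI.ZFS prints the verify-only command used in the cases below.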
Salvage processing is driven by the zFS owner. The zfsadm salvage command does not provide detailed status information; that information is available in the system log of the zFS owner. The “zfsadm fsinfo” or “F ZFS,FSINFO” command can also be used to display minimal point-in-time information about the progress of a salvage operation.
The commands “zfsadm fsinfo” and “F ZFS,FSINFO” provide a new status indicator, SL, which means salvaging. The Legend in the command output may include this new SL indicator and its meaning. The user can also specify SL in the selection criteria to select aggregates that are salvaging. Additionally, the commands show the progress of an aggregate that is salvaging if the -owner statistics are selected.
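The two fsinfo uses described above could be wrapped as follows. This is a sketch under assumptions: the function names are invented for this example, and it assumes that the selection criteria are passed with the -select option and that fsinfo reports on all aggregates when neither an aggregate nor a path is given. Check the zfsadm fsinfo reference for the exact syntax at your release level.

```shell
# List aggregates whose status includes SL (salvaging).
# Assumption: selection criteria are passed with -select.
list_salvaging() {
    zfsadm fsinfo -select SL
}

# Show the owner statistics, which include salvage progress,
# for a single aggregate.
salvage_progress() {
    zfsadm fsinfo -aggregate "$1" -owner
}
```

While a long verification is running in one session, salvage_progress OMVSSPT.DEMI.ZFS can be issued from another session to get a point-in-time view of its progress.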
An example is shown here:
The following are some cases I ran in our environment.
On the test system, zFS is running in a colony address space. The test file systems are OMVSSPT.DEMM.ZFS and OMVSSPT.DEMI.ZFS.
Case 1: File system not mounted
Case 2: No salvage running, issue -cancel
Case 3: Successfully verified
Case 4: Salvage + cancel
Session 1: issued zfsadm salvage -aggregate OMVSSPT.DEMI.ZFS -verifyonly
Session 2: issued zfsadm salvage -aggregate OMVSSPT.DEMI.ZFS -cancel
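The two sessions in Case 4 can also be combined in one script. The sketch below starts a verify-only salvage in the background and issues -cancel if it is still running after a time limit. The function name verify_with_timeout and the one-second polling interval are choices made for this example, not part of zFS, and the sketch assumes that the salvage command does not return until the salvage ends.

```shell
# Run a verify-only salvage, cancelling it if it exceeds a time limit.
#   $1 = aggregate name, $2 = limit in seconds
verify_with_timeout() {
    aggr=$1
    limit=$2
    # Session 1 equivalent: start the verification in the background.
    zfsadm salvage -aggregate "$aggr" -verifyonly &
    pid=$!
    waited=0
    # Poll once per second while the background command still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # Session 2 equivalent: ask zFS to cancel the salvage.
            # If the salvage ended a moment ago, this cancel finds
            # nothing to cancel, as in Case 2.
            zfsadm salvage -aggregate "$aggr" -cancel
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Return the exit status of the background salvage command.
    wait "$pid"
}
```

For example, verify_with_timeout OMVSSPT.DEMI.ZFS 600 would give the verification ten minutes before requesting a cancel.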
Case 5: Salvage + shell unmount
Session 1: issued zfsadm salvage -aggregate OMVSSPT.DEMM.ZFS -verifyonly
Session 2: issued
unmount -o immediate -f OMVSSPT.DEMM.ZFS
unmount -o force -f OMVSSPT.DEMM.ZFS
Case 6: Salvage + TSO UNMOUNT
In shell: issued zfsadm salvage -aggregate OMVSSPT.DEMM.ZFS -verifyonly
Case 7: Privilege
The issuer must be logged in as a root user (UID=0) or have READ authority to
For more information about the zfsadm salvage command, refer to: z/OS Distributed File Service zFS Administration -> Chapter 11. zFS commands -> zfsadm salvage