Varyonvg failure and filesystem mount errors due to disk reservation conflicts

Troubleshooting

Problem

When attempting to varyon a Volume Group (VG), the operation may fail with error "0516-013 varyonvg: The volume group cannot be varied on because there are no good copies of the descriptor area".

Additionally, you might encounter I/O ERROR DETECTED BY LVM in the error report (errpt) for the Physical Volume (PV) associated with the same VG.

errpt|more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
E86653C3   0410084025 P H LVDD           I/O ERROR DETECTED BY LVM
65DE6DE3   0410084025 P S hdiskX         REQUESTED OPERATION CANNOT BE PERFORMED

The error might be logged with the label SC_DISK_ERR10, SC_DISK_ERR4, SC_DISK_ERR2, or SC_DISK_ERR9. Regardless of the label, always check the sense data associated with the error: sense code 0118 indicates a disk reservation conflict.

If no disk errors with sense code 0118 are present but the issue persists, open a case with IBM Support for further assistance.

errpt -a| more
LABEL:          SC_DISK_ERR10
IDENTIFIER:     65DE6DE3
Date/Time:       Thu Apr 10 12:17:39 2025
Sequence Number: 185004
Machine Id:      00CBB7004B00
Node Id:         bender
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   hdiskX
Description
REQUESTED OPERATION CANNOT BE PERFORMED
Probable Causes
DASD DEVICE
User Causes
RESOURCE NOT AVAILABLE
UNAUTHORIZED ACCESS ATTEMPTED
        Recommended Actions
        FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
        PERFORM PROBLEM DETERMINATION PROCEDURES
Failure Causes
MEDIA
DISK DRIVE
        Recommended Actions
        FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
        PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
PATH ID
           3
SENSE DATA
0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0118 0004 0000 0000
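
The sense code check described above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: the function name sense_0118_resources is hypothetical, and the parsing assumes the errpt -a layout shown in the sample above (a "Resource Name:" line followed later by a "SENSE DATA" line and the sense words). It matches 0118 as a whole word anywhere in the sense data, so a hit should still be confirmed by reading the full entry.

```shell
# Read "errpt -a" text on stdin and print each resource name whose
# SENSE DATA contains the word 0118 (layout assumed from the sample above).
sense_0118_resources() {
    awk '
        /^LABEL:/         { in_sense = 0; next }           # a new entry starts
        /^---/            { in_sense = 0; next }           # separator between entries
        /^Resource Name:/ { res = $3; in_sense = 0; next }
        /^SENSE DATA/     { in_sense = 1; next }
        in_sense {
            for (i = 1; i <= NF; i++)
                if ($i == "0118" && !(res in seen)) { seen[res] = 1; print res }
        }
    '
}
```

Possible usage on the affected system: errpt -a | sense_0118_resources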

The "summ" diagnostics tool can be used to decode the errors. In this case, it explains the SC_DISK_ERR10 error as a path failure due to RESERVATION CONFLICT:

errpt -a > errpt.out
summ errpt.out|more
Apr 10 12:17:00 LVDD       P LVM_IO_FAIL         LV 8000002800000002 loglv01            PV 8000000E00000003 hdisk4      Block 0x44222408 errno EBUSY   pvid 00CBB70050316A88
Apr 10 12:17:00 hdisk4     P SC_DISK_ERR10       path  3 cmd  failure     TEST UNIT READY  RESERVATION CONFLICT [state ONLINE|PENDING_ERROR|
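
When the decoded report is long, the affected disks can be extracted from it. This is a rough sketch that assumes the summ output format matches the two sample lines above (resource name in the fourth whitespace-separated column); the function name reservation_conflict_disks is hypothetical.

```shell
# Read decoded "summ" output on stdin and print each resource that logged
# a RESERVATION CONFLICT, once per resource.
reservation_conflict_disks() {
    awk '/RESERVATION CONFLICT/ && !($4 in seen) { seen[$4] = 1; print $4 }'
}
```

Possible usage: summ errpt.out | reservation_conflict_disks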

The Volume Group (VG) might also be in a varied-on state while the file system still fails to mount with errors 0506-324 and 0506-342, and fsck also fails with an I/O error:

mount /DB2Data
Replaying log for /dev/fslv00.
mount: 0506-324 Cannot mount /dev/fslv00 on /DB2Data: The media is not formatted or the format is not correct.
0506-342 The superblock on /dev/fslv00 is dirty.  Run a full fsck to fix.

fsck -yvv /dev/fslv00
The current volume is: /dev/fslv00
Primary superblock is valid.
J2_LOGREDO:log redo processing for /dev/fslv00
Primary superblock is valid.
        Superblock s_state = 0x1 mode = 0x3
*** Phase 1 - Initial inode scan
Fatal: I/O error

Symptom

The Volume Group fails to vary on, and an attempt to read the VGDA with lqueryvg returns the following error:

lqueryvg -Atp hdiskX
0516-024 lqueryvg: Unable to open physical volume.
        Either PV was not configured or could not be opened. Run
        diagnostics.

You might also encounter the file system entering a read-only or degraded state, even if the Volume Group (VG) is varied on.

Example:
/dev/fslv00      /backup          jfs2   Feb 03 21:38 ro,degraded
/dev/nim1        /nim             jfs2   Jul 12 14:56 ro,degraded
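
To spot this state quickly, the mount listing can be filtered for the degraded option. This is a minimal sketch that assumes the column layout of the example above, with the mount point in the second column and the options in the last column; the function name degraded_filesystems is hypothetical.

```shell
# Read a mount listing on stdin and print the mount point of every
# file system whose options (last column) include "degraded".
degraded_filesystems() {
    awk '$NF ~ /(^|,)degraded(,|$)/ { print $2 }'
}
```

Possible usage: mount | degraded_filesystems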

When there is a disk reservation conflict, the disk size might even appear as zero.

Cause

A SCSI reservation conflict on a disk may result in a 'DISK Busy' state, causing read or write operations to fail when attempting to varyon a Volume Group.

Environment

AIX and VIOS

Diagnosing The Problem

Use the lsattr command to verify the current reserve policy.

lsattr -H -El hdisk# -a reserve_policy

Note: If the disk is shared across multiple hosts or LPARs, verify the reserve_policy disk attribute on all systems and confirm that it is set to no_reserve.

If the reserve_policy is set to no_reserve, use the devrsrv command to identify a stale reservation.

devrsrv -c query -l hdisk#
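
The reserve policy check can be repeated for every disk on a host. The following is a sketch only: the function name disks_not_no_reserve is hypothetical, it assumes lspv lists the disk names in its first column, and it uses the lsattr -F value format option to print just the attribute value. It has to be run separately on each host or LPAR that shares the disk.

```shell
# Print "<disk> <reserve_policy>" for every disk reported by lspv whose
# reserve_policy is set to something other than no_reserve.
disks_not_no_reserve() {
    lspv | awk '{ print $1 }' | while read -r disk; do
        # Disks without a reserve_policy attribute yield an empty value and are skipped.
        policy=$(lsattr -El "$disk" -a reserve_policy -F value 2>/dev/null)
        if [ -n "$policy" ] && [ "$policy" != "no_reserve" ]; then
            echo "$disk $policy"
        fi
    done
}
```

An empty result suggests that all listed disks already use no_reserve on that host, in which case a stale reservation is the next thing to check with devrsrv.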

Resolving The Problem

Step 1: If the reserve_policy is set to single_path, change the attribute value to no_reserve on all hosts with access to the disk.

If the user_settable field in the output of lsattr -H -El hdisk# -a reserve_policy is True+, use the -U option to modify the reserve policy:

lsattr -H -El hdiskX -a reserve_policy
attribute       value                                               description                      user_settable
reserve_policy  single_path                                         Reserve Policy                   True+


chdev -l hdisk# -a reserve_policy=no_reserve -U

If the user_settable field in the output of lsattr -H -El hdisk# -a reserve_policy is True, use the -P option to modify the reserve policy:

chdev -l hdisk# -a reserve_policy=no_reserve -P

When the -P flag is used, either a reboot of the LPAR or a reconfiguration of the device is required for the value to take effect in the kernel, as described in the chdev man page.
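
The choice between -U and -P in Step 1 can be sketched as a small helper. This is illustrative only: the function name suggest_chdev is hypothetical, and it assumes the lsattr output layout shown above (attribute name first, value second, user_settable last). It only prints the chdev command for review and does not run it. It prints nothing when the policy is not single_path or when user_settable is neither True+ nor True.

```shell
# Read "lsattr -H -El <disk> -a reserve_policy" output on stdin and print
# the matching chdev command for the disk name given as the first argument.
suggest_chdev() {
    awk -v disk="$1" '
        $1 == "reserve_policy" && $2 == "single_path" {
            if ($NF == "True+")     flag = "-U"   # change can be applied with -U
            else if ($NF == "True") flag = "-P"   # needs reboot or device reconfiguration
        }
        END {
            if (flag == "") exit 1
            print "chdev -l " disk " -a reserve_policy=no_reserve " flag
        }
    '
}
```

Possible usage: lsattr -H -El hdisk4 -a reserve_policy | suggest_chdev hdisk4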

Note: If the reserve_policy is set to a value other than single_path or no_reserve - such as PR_exclusive or PR_shared - please open a case with IBM Support and request further guidance.

Step 2: If the reserve_policy is set to no_reserve and the problem persists, use the devrsrv command to query and then clear the stale reservation.

devrsrv -c query -l hdisk#

devrsrv -f -l hdisk#

Step 3: If the problem persists after running the devrsrv command, contact your storage support team for help clearing the reservation.

Step 4: After the reservation is cleared, vary on the Volume Group (VG) and mount the file system using the following commands:

varyonvg vg_name
mount /filesystem

If the only issue was a failed mount or a file system in a read-only and degraded state, remount the file system after the disk reservation is cleared, using the following commands.

umount /filesystem
fsck -yvv /filesystem
mount /filesystem

Need Further Assistance?
If the above steps do not resolve the issue, open a case with IBM Support.

Please be sure to collect a snap from the affected AIX server and upload it to your case as instructed here.

Related Information

Multiple errors with identifier B0EE9AF5 logged on AIX or VIO server

Document Location

Worldwide
