Recovering failed SCSI devices

Red Hat Enterprise Linux 9.2: LPAR mode, z/VM guest

The zfcp device driver automatically recovers failed SCSI devices. To check whether recovery is under way, read the zfcp_in_recovery attribute.

Before you begin

The FCP device must be online.

Procedure

Perform the following steps to check the recovery status of a failed SCSI device:

  1. Check the value of the zfcp_in_recovery attribute. Issue the lszfcp command:
    # lszfcp -l <LUN> -a

    where <LUN> is the LUN of the associated SCSI device.

    Alternatively, you can issue a command of this form:
    # cat /sys/class/scsi_device/<device_name>/device/zfcp_in_recovery

    The value is 1 if recovery is under way and 0 otherwise. If the value is 0 for a non-operational SCSI device, either recovery has failed or the device driver has not detected that the SCSI device is malfunctioning.

  2. To find out whether recovery failed, read the zfcp_failed attribute. Either use the lszfcp command again, or issue a command of this form:
    # cat /sys/class/scsi_device/<device_name>/device/zfcp_failed

    The value is 1 if recovery has failed, and 0 otherwise.

  3. You can start or restart the recovery process for the SCSI device by writing 0 to the zfcp_failed attribute. Issue a command of this form:
    # echo 0 > /sys/class/scsi_device/<device_name>/device/zfcp_failed
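
The three steps above can be combined into a small shell function. The following is a minimal sketch, not part of the zfcp tooling: the function name check_and_recover is hypothetical, and it assumes that you already know the SCSI device is malfunctioning, because it writes 0 to zfcp_failed whenever recovery is not under way.

```shell
#!/bin/sh
# Sketch only: check_and_recover is a hypothetical helper name.
# Argument: the sysfs device directory, for example
#   /sys/class/scsi_device/<device_name>/device

check_and_recover() {
    dev_dir=$1

    # Steps 1 and 2: read both recovery attributes.
    in_recovery=$(cat "$dev_dir/zfcp_in_recovery") || return 2
    failed=$(cat "$dev_dir/zfcp_failed") || return 2

    # Recovery is already running; nothing to trigger.
    if [ "$in_recovery" = "1" ]; then
        echo "recovery under way"
        return 0
    fi

    if [ "$failed" = "1" ]; then
        echo "recovery failed, restarting"
    else
        echo "no recovery under way, starting"
    fi

    # Step 3: start or restart recovery.
    echo 0 > "$dev_dir/zfcp_failed"
}
```

With the function defined in a root shell, a call of this form runs the check for one device:

    # check_and_recover /sys/class/scsi_device/<device_name>/device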

Example

In the following example, SCSI device 0:0:0:0 is malfunctioning. The first command shows that recovery is not under way. The second command manually starts recovery for the SCSI device:

# cat /sys/class/scsi_device/0:0:0:0/device/zfcp_in_recovery
0
# echo 0 > /sys/class/scsi_device/0:0:0:0/device/zfcp_failed

What to do next

If you manually configured an FCP LUN but no corresponding SCSI device was created, you can check on recovery with the sysfs attributes of the FCP LUN itself, in_recovery and failed.
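
Reading the two FCP LUN attributes can be sketched as follows. The function name lun_recovery_status is hypothetical, and the directory layout shown in the comment, /sys/bus/ccw/drivers/zfcp/<device_bus_id>/<wwpn>/<lun>, is an assumption that this section does not state; verify the location of the FCP LUN directory on your system before relying on it.

```shell
#!/bin/sh
# Sketch only: lun_recovery_status is a hypothetical helper name.
# Argument: the sysfs directory of the FCP LUN. The layout
#   /sys/bus/ccw/drivers/zfcp/<device_bus_id>/<wwpn>/<lun>
# is an assumption, not taken from this section.

lun_recovery_status() {
    unit_dir=$1

    # Print each attribute as name=value, one per line.
    for attr in in_recovery failed; do
        value=$(cat "$unit_dir/$attr") || return 2
        printf '%s=%s\n' "$attr" "$value"
    done
}
```

The function only reads the attributes, so it is safe to run repeatedly while you wait for recovery to finish.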