Recovering a PCIe device

6.10 | LPAR mode | z/VM guest | KVM guest

Use the zpcictl command or the recover sysfs attribute to handle a malfunctioning PCIe device if automatic recovery fails.

Before you begin

A kernel message is displayed when a PCIe device enters the error state. Automatic recovery is in place for PCIe devices. Do not take action unless the automatic recovery fails.
The following sample sequence of kernel messages indicates a successful recovery for an NVMe device:
zpci: 000e:00:00.0: Event 0x3a reports an error for PCI function 0x1004
nvme nvme0: frozen state error detected, reset controller
zpci: 000e:00:00.0: Initiating reset
nvme nvme0: restart after slot reset
zpci: 000e:00:00.0: The device is ready to resume operations
nvme nvme0: Shutdown timeout set to 10 seconds
nvme nvme0: 63/0/0 default/read/poll queues
Failed automatic recoveries end with error messages that call for operator intervention, as shown in the following example:
zpci: 000d:00:00.0: Automatic recovery failed after slot reset
zpci: 000d:00:00.0: Automatic recovery failed; operator intervention is required
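
To check the kernel log for such messages, a small filter can help. The following is a minimal sketch, not part of the documented tooling. It assumes that the message text matches the samples above, and the function name failed_zpci_recoveries is hypothetical.

```shell
# Print the function addresses for which the kernel reported that
# automatic recovery failed and operator intervention is required.
# Reads kernel log text on standard input.
# Assumption: the message wording matches the sample messages above.
failed_zpci_recoveries() {
    grep 'operator intervention is required' |
        sed -n 's/.*zpci: \([0-9a-f:.]*\): Automatic recovery failed.*/\1/p' |
        sort -u
}

# Typical use (reading the kernel log might require root privileges):
#   dmesg | failed_zpci_recoveries
```

If the filter prints nothing, the log contains no matching failure message, and no manual action is called for on that basis.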

Procedure

  1. Optional: Find out which PCIe device is in an error state by issuing the lspci command.
    In the following example, the device in error state can be identified by the trailing (rev ff) in the output line.
    # lspci
    0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev ff)
    0001:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
    0002:00:00.0 Non-VGA unclassified device: IBM Internal Shared Memory (ISM) virtual PCI device
    
  2. Recover the device with the appropriate method for your virtualization environment.
    The preferred method is the zpcictl command. On KVM guests, you cannot use this command to recover PCIe devices; use the sysfs interface instead.
    • Use the zpcictl command to handle defective PCIe devices. The recovery commands are of this form:
      # zpcictl <option> <function_address>
      where <option> specifies an action that depends on the status of automatic recovery, and <function_address> specifies the malfunctioning PCIe device. The examples that follow assume function address 0000:00:00.0.

      For more information about the zpcictl command, see zpcictl - Manage defective PCIe devices.

      Automatic recovery runs but fails
      If automatic recovery runs but fails, force a disruptive reset by using the --reset option. For example:
      # zpcictl --reset 0000:00:00.0

      This reset method includes a controlled shutdown and a subsequent re-enabling of the device. As a result, higher-level interfaces such as network interfaces and block devices are destroyed and re-created. Manual configuration steps might be required to re-integrate the device, for example, in bonded interfaces or software RAIDs.

      Recovery does not start automatically
      If the initial device error message is not followed by automatic device recovery, trigger the recovery by using the --reset-fw option. For example:
      # zpcictl --reset-fw 0000:00:00.0

      Recovery unsuccessful
      If all attempts at recovery fail, use the --deconfigure option to prepare for manual repair actions or replacement of the physical device. For example:
      # zpcictl --deconfigure 0000:00:00.0

      This command performs a crude, unplug-style removal of the PCI function. Do not use it for operational PCI functions.
    • Alternatively, you can use the sysfs interface to trigger the recovery. Use this method on KVM guests.
      1. Find the PCIe device directory in sysfs.

        PCIe device directories are of the form /sys/bus/pci/devices/<function_address>, where <function_address> identifies the PCIe device, for example: /sys/bus/pci/devices/0000:00:00.0.

      2. Write 1 to the recover attribute of the PCIe device, for example:
        # echo 1 > /sys/bus/pci/devices/0000:00:00.0/recover
        After a successful recovery, the PCIe device is de-registered and reprobed.
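
The lspci check in step 1 and the sysfs method in step 2 can be combined in a short script. The following is a minimal sketch under stated assumptions, not a supported tool: the function names devices_in_error and recover_via_sysfs and the SYSFS_PCI variable are hypothetical, and the sketch assumes that a device in the error state is listed with a trailing (rev ff), as in the example for step 1.

```shell
# Print the function addresses of the devices that lspci lists with a
# trailing "(rev ff)". Reads lspci output on standard input.
devices_in_error() {
    awk '/\(rev ff\)[[:space:]]*$/ { print $1 }'
}

# Write 1 to the recover attribute of one PCIe device.
# SYSFS_PCI is a hypothetical override of the sysfs device directory,
# intended for dry runs against a scratch directory.
recover_via_sysfs() {
    attr="${SYSFS_PCI:-/sys/bus/pci/devices}/$1/recover"
    if [ ! -w "$attr" ]; then
        echo "no writable recover attribute for $1" >&2
        return 1
    fi
    echo 1 > "$attr"
}

# Typical use, as root. The -D option makes lspci always print the
# PCI domain, so the first column is a full function address:
#   lspci -D | devices_in_error | while read -r addr; do
#       recover_via_sysfs "$addr"
#   done
```

Run such a script only after automatic recovery has failed, and review the list of addresses before writing to any recover attribute.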