Recovering from a failed ibv_devinfo command

The ibv_devinfo command can fail when modules or hardware drivers fail to load or when libraries are missing.

About this task

The ibv_devinfo command generally fails with one of two common errors. The recovery steps for each of those two errors, and one less common error, are given below.

Procedure

  1. Error: Failed to get IB devices list: Function not implemented.

    A common cause of this failure is that the ib_uverbs module is not loaded or is not enabled at the correct run levels. To recover from this error, complete the following steps:

    1. To verify that the ib_uverbs module is loaded, run the following command and look for similar output:
      lsmod | grep ib_uverbs
      
      ib_uverbs              44238  0
    2. To verify that the RDMA run level is set to on for levels 3 and 5, run the following command and look for similar output:
      chkconfig --list | grep rdma
      
      0:off 1:off 2:off 3:on 4:off 5:on 6:off 
      If RDMA is off, run the following commands to activate RDMA on levels 3 and 5:
      chkconfig --level 3 rdma on
      chkconfig --level 5 rdma on
      Run one of the following commands to restart RDMA, depending on whether the service is installed as openibd or rdma:
      openibd restart
      rdma restart
    3. If there is a missing library, you will see an error similar to the following:
      libibverbs: Warning: couldn't load driver 'mlx4': libmlx4-rdmav2.so: cannot open shared object file: No such file or directory
      libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
      No IB devices found.

      If you receive this error, install the libmlx4 user-level library.

  2. Error: No IB devices found.

    If no IB devices are found, complete the following steps:

    1. Check whether the relevant hardware driver is loaded. If the driver is missing, run the following command to load it:
      modprobe <hardware driver>
    2. Edit the configuration file so that the hardware driver is loaded by default.
    3. Run one of the following commands to restart RDMA, depending on whether the service is installed as openibd or rdma:
      openibd restart
      rdma restart
  3. Error: On Red Hat Enterprise Linux 5.x on ppc64, the wrong libraries are installed.

    Red Hat Enterprise Linux 5.x on ppc64 requires 32-bit user-level libraries such as libmlx4. However, the 64-bit libraries are installed by default. Make sure that the correct 32-bit libraries are installed.
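
The checks in this procedure lend themselves to scripting. The following is a minimal sketch, not part of the documented procedure. The function names classify_ibv_output, module_in_list, and rdma_levels_on are hypothetical, and the matching relies only on the message text and sample output shown above, so it might need adjusting for other driver or library versions.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any package).

# Map captured ibv_devinfo output ($1) to a recovery case from the procedure.
# The missing-library case is tested before the no-devices case because
# that output also ends with "No IB devices found."
classify_ibv_output() {
    case "$1" in
        *"Function not implemented"*)
            echo "uverbs-not-loaded" ;;
        *"couldn't load driver"*|*"no userspace device-specific driver"*)
            echo "missing-userspace-library" ;;
        *"No IB devices found"*)
            echo "no-devices" ;;
        *)
            echo "unknown" ;;
    esac
}

# Return 0 if module $1 appears in the first column of lsmod-style
# output read from standard input, 1 otherwise.
module_in_list() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Return 0 if chkconfig-style text on standard input shows both
# run level 3 and run level 5 set to on, 1 otherwise.
rdma_levels_on() {
    line=$(cat)
    case "$line" in *3:on*) ;; *) return 1 ;; esac
    case "$line" in *5:on*) ;; *) return 1 ;; esac
    return 0
}
```

As a usage example, the combined standard output and standard error of ibv_devinfo can be captured in a variable and passed to classify_ibv_output as a single argument. The output of lsmod can be piped to module_in_list ib_uverbs, and the output of chkconfig --list | grep rdma can be piped to rdma_levels_on. Each function only reports a condition and changes nothing on the system, so the recovery commands themselves remain a manual step.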