POWER6 information

Checking InfiniBand configuration in Linux

Use the Linux® operating system to verify that the host channel adapters (HCAs) are available and configured correctly.

Verifying that HCAs are visible to the logical partitions

To verify that host channel adapters (HCAs) are visible to the logical partitions, perform the following steps:
  1. From the Cluster Systems Management/Management Server (CSM/MS), run the following command:
    dsh -av "ibv_devices | grep ehca" | wc -l
  2. Select from the following options:
  3. Run the following command:
    dsh -av "ibv_devices | grep ehca" > hca_list

    A list of the HCAs that are visible to the logical partitions, with their respective GUIDs, is generated.

  4. Open the generated file, hca_list, and compare it with the list of all expected HCAs by GUID.
  5. For each logical partition with HCAs that are not visible, check whether the HCA was assigned to that logical partition by performing the following steps:
    1. From the HMC that manages the server, verify that the HCA has been assigned to the logical partition. If the HCA has not been assigned to the logical partition, see Installing or replacing an InfiniBand GX host channel adapter.

      If the device was not assigned to the logical partition, see Installing the operating system and configuring the cluster servers. After you assign the HCA to the logical partition, return to this step.

    2. After you assign the HCA to the correct logical partition, run the following command:
      dsh -av "find /sys/bus/ibmebus/devices -name 'lhca*' | wc -l"
    3. Select from the following options:
  6. If you have an HCA that is assigned to a logical partition, but the HCA is not visible to the system, perform the following steps:
    1. Open the Manage Serviceable Events task on the HMC that manages each server and review the error logs.
    2. Fix any events that are reported against each server or the HCAs in that server.
    3. Perform one of the following recovery procedures:
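
The GUID comparison in step 4 of this procedure can be scripted. The following is a minimal sketch, not part of the documented procedure. It assumes that each line of hca_list ends with the GUID (for example, "node1: ehca0 0002550010e57000") and that the expected GUIDs are kept one per line in a file. The function name missing_guids and the file name expected_guids.txt are hypothetical.

```shell
# missing_guids HCA_LIST EXPECTED
# Print every GUID in EXPECTED (one GUID per line) that does not appear in
# HCA_LIST. Assumption: the GUID is the last whitespace-separated field on
# each line of HCA_LIST. The comparison ignores letter case.
missing_guids() {
    awk 'FILENAME == ARGV[1] { seen[tolower($NF)] = 1; next }
         NF > 0 && !(tolower($1) in seen) { print $1 }' "$1" "$2"
}

# Example call (hypothetical file names):
#   missing_guids hca_list expected_guids.txt
```

Any GUID that the function prints belongs to an HCA that is expected but not visible, which is the case that step 5 addresses.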

Verifying that all HCAs are available to the logical partitions

To verify that all HCAs are available to the logical partitions, perform the following steps:
  1. Run the following command:
    dsh -av "ibv_devinfo | grep PORT_ACTIVE" | wc -l

    The total number of active ports is shown. Note that each HCA has two ports.

  2. Select from the following options:
    • If the number returned by the system, divided by two, matches the number of HCAs in the cluster, continue with the procedure to verify that all HCAs are available to the logical partitions.
    • If the number returned by the system, divided by two, does not match the number of HCAs, determine the inactive ports and check their cabling state by following step 3.
    • If the number returned by the system, divided by two, does not match the number of HCAs and the ports are correctly connected, continue with step 5.
  3. Verify that all ports are active by running the command:
    dsh -av "ibv_devinfo | egrep 'hca_id|node_guid|port:|PORT_DOWN'"
  4. For each port listed by the system, ensure that the respective cable is connected firmly to both the adapter and the switch.
    • Consider enabling the auto-port-detection feature of eHCA, especially if some ports are intentionally unused. To enable that feature, add the following line to the file
      /etc/modprobe.conf.local:
      options ib_ehca nr_ports=-1
    • To get a full list of supported options, run the command: modinfo ib_ehca
  5. Verify that all servers are powered on.
  6. Run the command:
    dsh -av "lsdev -Cc adapter | grep sn | grep -v Available"

    A list of HCAs that are visible to the system but not available is shown.

  7. Reboot any logical partition linked to an HCA that is listed as not available.
  8. Check SFP and HPSNM for errors related to the links associated with any HCA listed as not available.
  9. When all HCAs are listed as available to the operating system, continue with the procedure to verify HCA numbering and the netid for logical partition.
  10. Check HCA allocation across logical partitions. For an HPC cluster, there should be only one active logical partition, and the HCA should be dedicated to it.
  11. Ensure that the fabric is balanced across the subnets. The following command gathers the GID prefixes for the ib interfaces. The GID prefixes should be consistent across all logical partitions.
    dsh -av 'netstat -i | grep "ib.*link" | awk '\''{split($4,a,"."); for (i=5;i<=12;i++){printf a[i]}; printf "\n"}'\'''
  12. Verify that the tcp_sendspace and tcp_recvspace attributes are set correctly. These attributes correspond to the send_queue_size and recv_queue_size parameters of IPoIB.
    dsh -av "ibstat -v | grep 'tcp_send.*tcp_recv'"

    Because superpackets should be on, the expected attribute values are tcp_sendspace=524288 and tcp_recvspace=524288.
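
The port count check in steps 1 and 2 of this procedure is simple arithmetic: with two ports per HCA, the number of active ports should equal twice the number of HCAs. The following is a minimal sketch of that check. The function name check_port_count is hypothetical, and the expected number of HCAs must come from your own cluster inventory. Comparing against twice the HCA count, instead of halving the port count, avoids rounding when the number of active ports is odd.

```shell
# check_port_count ACTIVE_PORTS EXPECTED_HCAS
# Succeeds when ACTIVE_PORTS equals twice EXPECTED_HCAS (two ports per HCA).
# Otherwise prints the mismatch and returns 1.
check_port_count() {
    active=$1
    expected_ports=$(( $2 * 2 ))
    if [ "$active" -eq "$expected_ports" ]; then
        echo "OK: $active active ports for $2 HCAs"
    else
        echo "MISMATCH: $active active ports, expected $expected_ports"
        return 1
    fi
}

# Example call, where EXPECTED_HCAS is a value that you supply:
#   check_port_count "$(dsh -av "ibv_devinfo | grep PORT_ACTIVE" | wc -l)" "$EXPECTED_HCAS"
```

A mismatch corresponds to the second and third options in step 2, so the next action is to identify the inactive ports as described in step 3.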

Verifying that the IP maximum transfer unit (MTU) is configured correctly

To verify that the Internet Protocol (IP) MTU is configured correctly, perform the following steps:
  1. Run the following command:
    dsh -av "find /sys/class/net -name 'ib*' | xargs -I dn cat dn/mtu"
  2. Select from the following options:
  3. For each interface ibX that has the wrong MTU, run the following command on the respective logical partition:
    echo <correct value> > /sys/class/net/ibX/mtu
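
The find command in step 1 prints only the MTU values, without the interface names. As a minimal sketch, the following function lists each ib interface whose MTU differs from an expected value, so that you know which ibX to correct in step 3. The function name wrong_mtu and the optional directory argument are hypothetical. The expected MTU is a value that you supply, because this procedure does not state one.

```shell
# wrong_mtu EXPECTED [NET_DIR]
# Print "interface mtu" for every ib* interface under NET_DIR whose MTU
# differs from EXPECTED. NET_DIR defaults to /sys/class/net.
wrong_mtu() {
    expected=$1
    net_dir=${2:-/sys/class/net}
    for dev in "$net_dir"/ib*; do
        # Skip entries without a readable mtu file, including the case
        # where no ib* interface exists and the pattern stays unexpanded.
        [ -r "$dev/mtu" ] || continue
        mtu=$(cat "$dev/mtu")
        if [ "$mtu" -ne "$expected" ]; then
            echo "${dev##*/} $mtu"
        fi
    done
}

# Example call on one logical partition, where EXPECTED_MTU is your value:
#   wrong_mtu "$EXPECTED_MTU"
```

The function produces no output when every ib interface already has the expected MTU.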

Verifying that the network interfaces are recognized as up and available

To verify that the network interfaces are recognized as up and available, run the following command:
dsh -av '/usr/bin/lsrsrc IBM.NetworkInterface Name OpState | grep -p"resource" -v "OpState = 1" | grep ib'

The preceding command should return no interfaces. If an interface is marked down, the command returns the logical partition and the ibX interface.



Last updated: Tue, February 08, 2011