Verifying RDMA configurations for connectivity issues on Linux®

RDMA connectivity issues are most commonly caused by misconfiguration. You can verify RDMA configurations to ensure that the members can communicate with the CF.

Before you begin

You must ensure that all of the required OFED packages are installed on all the hosts, and that all required software meets the minimum supported levels. You can check the installed packages and versions by issuing the rpm -qa | grep ofed command.
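To compare package levels across hosts, the same check can be looped over every host in the cluster. The following is a minimal sketch, not part of the product: the function name `list_ofed_packages` is hypothetical, and it assumes passwordless ssh from the current host to each host that you name.

```shell
# Print "<host> <package>" for every installed package whose name contains "ofed".
# ASSUMPTION: passwordless ssh to each host passed as an argument.
list_ofed_packages() {
    for host in "$@"; do
        # Run the same rpm query the procedure describes, on the remote host.
        ssh "$host" 'rpm -qa | grep ofed' | LC_ALL=C sort | sed "s/^/$host /"
    done
}
```

For example, `list_ofed_packages hostA hostB` (placeholder host names) prints one line per package per host, so missing packages or differing versions can be spotted side by side.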

Procedure

Use the following steps to verify your RDMA configurations:

  1. Examine the physical port states by running the ibstat -v command.
    Ensure that the State is Active and the Physical state is LinkUp, as shown for Port 1 in the following example:
    CA 'mthca0'
            CA type: MT25208 (MT23108 compat mode)
            Number of ports: 2
            Firmware version: 4.7.400
            Hardware version: a0
            Node GUID: 0x0005ad00000c03d0
            System image GUID: 0x0005ad00000c03d3
            Port 1:
                    State: Active
                    Physical state: LinkUp
                    Rate: 10
                    Base lid: 16
                    LMC: 0
                    SM lid: 2
                    Capability mask: 0x02510a68
                    Port GUID: 0x0005ad00000c03d1
            Port 2:
                    State: Down
                    Physical state: Polling
                    Rate: 10
                    Base lid: 0
                    LMC: 0
                    SM lid: 0
                    Capability mask: 0x02510a68
                    Port GUID: 0x0005ad00000c03d2
    If the port State is not Active, check the cable for connectivity.
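On hosts with several adapters or ports, scanning the ibstat output by eye is error-prone. The following sketch summarizes each port on one line. The function name `check_ib_ports` is hypothetical, and the parsing assumes the output layout shown in the example above (a `CA '<name>'` line, then `Port <n>:` blocks with `State:` and `Physical state:` lines).

```shell
# Read `ibstat -v` output on stdin and print one line per port:
#   <CA name> <port number> <State> <Physical state> <OK|CHECK>
# The exit status is nonzero if any port is not Active and LinkUp.
# ASSUMPTION: the output layout matches the example in this topic.
check_ib_ports() {
    awk '
        # Adapter header line; "\047" is the single-quote character.
        $1 == "CA" && $2 != "type:"        { split($0, q, "\047"); ca = q[2] }
        # Port header line such as "Port 1:" (not "Port GUID:").
        $1 == "Port" && $2 ~ /^[0-9]+:$/   { port = $2; sub(/:/, "", port) }
        $1 == "State:"                     { state = $2 }
        $1 == "Physical" && $2 == "state:" {
            phys = $3
            ok = (state == "Active" && phys == "LinkUp")
            if (!ok) bad = 1
            printf "%s %s %s %s %s\n", ca, port, state, phys, (ok ? "OK" : "CHECK")
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Run it as `ibstat -v | check_ib_ports`. For the example output above, it reports `mthca0 1 Active LinkUp OK` and `mthca0 2 Down Polling CHECK`, and returns a nonzero exit status because Port 2 is down.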
  2. On the CF hosts, verify that the IP addresses associated with the RDMA ports match the IP addresses used for the net names in the CF entry in the db2nodes.cfg file.
    1. View the IP address that is associated with the RDMA ports on the CF host.
      To view the IP address that is associated with the RDMA port, run the ifconfig -a command. The IP address can be found by looking at the address that is associated with the inet addr field as shown:
      coralxib20:/home/svtdbm3 >ifconfig -a
      ib0       Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
                inet addr:10.1.1.120  Bcast:10.1.1.255  Mask:255.255.255.0
                inet6 addr: fe80::205:ad00:c:3d1/64 Scope:Link
                UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
                RX packets:18672 errors:0 dropped:0 overruns:0 frame:0
                TX packets:544 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:256
                RX bytes:2198980 (2.0 Mb)  TX bytes:76566 (74.7 Kb)
      In the output, ib0 is the interface name. The status is UP, and the IP address is 10.1.1.120. It is important to ensure that the interface status is up.
    2. Ensure that the network names for the CF in the db2nodes.cfg file resolve to the IP addresses of the RDMA ports that are intended for use by the CF.
      You must also ensure that the name can be pinged, and is reachable from all hosts on the cluster.

      From each member host, run a ping command against the network names that are associated with the CF entry in the db2nodes.cfg file. Observe the IP address returned. The IP address must match the IP address that is associated with the RDMA port configuration at the CF host, as in the ifconfig -a output.

      Note: Pings to an IP address on a different subnet are unsuccessful. This can occur when multiple interfaces are defined for the CF and each interface has a different subnet mask. In this case, from the member, ping the target IP address on the CF host that has the same subnet mask as the interface on the member host.
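The ping check from each member host can be scripted. The following sketch rests on two assumptions that you should confirm on your system: that each db2nodes.cfg line is whitespace-separated with the net name in the fourth field (comma-separated when there are several) and the word CF as the last field of CF entries, and that ping is the Linux iputils version, whose first output line has the form `PING name (address) ...`. The function names are hypothetical.

```shell
# Print the net names of every CF entry in a db2nodes.cfg file, one per line.
# ASSUMPTION: whitespace-separated fields, net name(s) in field 4, "CF" as the last field.
cf_netnames() {
    awk '$NF == "CF" {
        n = split($4, names, ",")
        for (i = 1; i <= n; i++) print names[i]
    }' "$1"
}

# Ping each CF net name once and print "<net name> <resolved address> <status>".
# ASSUMPTION: iputils ping, first output line "PING name (address) ...".
ping_cf_netnames() {
    cf_netnames "$1" | while read -r name; do
        if out=$(ping -c 1 -W 2 "$name" 2>&1); then
            status=reachable
        else
            status=UNREACHABLE
        fi
        # Extract the address that ping resolved the name to.
        addr=$(printf '%s\n' "$out" | sed -n '1s/^PING [^(]*(\([^)]*\)).*/\1/p')
        printf '%s %s %s\n' "$name" "${addr:-unresolved}" "$status"
    done
}
```

Run `ping_cf_netnames /path/to/db2nodes.cfg` on each member host, substituting the location of the file for your instance, and compare the reported addresses with the inet addr values from ifconfig -a on the CF host.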
  3. Ensure that the port value that is specified on the client connect request matches the port value that the CF listens on.
    You must ensure that the CF port values are the same in the /etc/services files for all hosts in the cluster.
    1. To determine the port value that is used for the CF, look in the CF diagnostic log file.
      In the cfdiag_<timestamp>.<id>.log file, look for the value that is associated with the CA Port[0] field as part of the prolog information at the beginning of the log file.
    2. To determine the port value that is used by the member on the connect request, look for the PsOpen event in the Db2® member diagnostic log (db2diag.log) file.
      Look for the value of the caport field.
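To check that the CF port values in the /etc/services files agree across hosts, you can extract the relevant entries on each host and compare the results. The following sketch takes the service-name prefix as a parameter because the exact entry names depend on your instance. The `DB2CF_` prefix used in the usage note is an assumption to verify against your own /etc/services file, and the function name `service_ports` is hypothetical.

```shell
# Print "<service name> <port/protocol>" for every entry in an
# /etc/services-style file whose service name starts with the given prefix.
# Usage: service_ports <file> <prefix>
service_ports() {
    awk -v prefix="$2" '
        # Skip comment lines and lines without a port field.
        /^[[:space:]]*#/ || NF < 2 { next }
        index($1, prefix) == 1     { print $1, $2 }
    ' "$1" | LC_ALL=C sort
}
```

For example, run `service_ports /etc/services DB2CF_` on every host and confirm that the output is identical everywhere. Then compare those values with the CA Port[0] value in the CF diagnostic log and the caport value in the db2diag.log file.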
  4. Perform an RDMA ping across the cluster by running the following command:
    db2cm -verify -req -rdma_ping