Suboptimal performance due to VERBS RDMA being inactive

IBM Storage Scale for Linux® supports InfiniBand Remote Direct Memory Access (RDMA) by using the Verbs API for data transfer between an NSD client and the NSD server. If InfiniBand VERBS RDMA is enabled on the IBM Storage Scale cluster and there is a drop in file system performance, verify whether the NSD client nodes are using VERBS RDMA to communicate with the NSD server nodes. If the nodes are not using RDMA, the communication falls back to the GPFS node’s TCP/IP interface, which can cause performance degradation.

Note:
  • For information about tuning RDMA parameters, see RDMA tuning.
  • In IBM Storage Scale 5.0.4 and later, the GPFS daemon startup service waits for a specified time period for the RDMA ports on a node to become active. You can adjust the length of the timeout period and choose the action that the startup service takes if the timeout expires. For more information, see the descriptions of the verbsPortsWaitTimeout attribute and the verbsRdmaFailBackTCPIfNotAvailable attribute in the help topic mmchconfig command.
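The two attributes from the note can be set with the mmchconfig command. A minimal sketch follows; the 60-second value is illustrative only, not a recommendation, so check the mmchconfig command documentation for the supported range and the default:

```shell
# Wait up to 60 seconds at GPFS daemon startup for the RDMA ports on the
# node to become active (the value 60 is illustrative).
mmchconfig verbsPortsWaitTimeout=60

# If the RDMA ports are still not active when the timeout expires, let
# the daemon start anyway and fall back to TCP/IP communication.
mmchconfig verbsRdmaFailBackTCPIfNotAvailable=yes
```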

Problem identification

Issue the mmlsconfig | grep verbsRdma command to verify whether VERBS RDMA is enabled on the IBM Storage Scale cluster.

# mmlsconfig | grep verbsRdma

verbsRdma enable
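As a quick sanity check, the value can also be extracted and tested in a script. The following sketch parses a captured copy of the line above through a here-document; on a live cluster, replace the here-document with the output of mmlsconfig | grep verbsRdma:

```shell
# Extract the verbsRdma value from (sample) mmlsconfig output.
verbs_setting=$(awk '/^verbsRdma/ {print $2}' <<'EOF'
verbsRdma enable
EOF
)

if [ "$verbs_setting" = "enable" ]; then
    echo "VERBS RDMA is enabled on this cluster"
else
    echo "VERBS RDMA is not enabled (value: ${verbs_setting:-unset})"
fi
```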

If VERBS RDMA is enabled, check whether the status of VERBS RDMA on a node is Started by running the mmfsadm test verbs status command.

# mmfsadm test verbs status
VERBS RDMA status: started

The following sample output shows the disks in the gpfs1b file system and the NSD servers that are configured as the primary and secondary servers for these disks.

# mmlsnsd
File system   Disk name    NSD servers
---------------------------------------------------------------------------
 gpfs1b        DMD_NSD01    c25m3n07-ib,c25m3n08-ib
 gpfs1b        DMD_NSD02    c25m3n08-ib,c25m3n07-ib
 gpfs1b        DMD_NSD03    c25m3n07-ib,c25m3n08-ib
 gpfs1b        DMD_NSD04    c25m3n08-ib,c25m3n07-ib
 gpfs1b        DMD_NSD05    c25m3n07-ib,c25m3n08-ib
 gpfs1b        DMD_NSD06    c25m3n08-ib,c25m3n07-ib
 gpfs1b        DMD_NSD07    c25m3n07-ib,c25m3n08-ib
 gpfs1b        DMD_NSD08    c25m3n08-ib,c25m3n07-ib
 gpfs1b        DMD_NSD09    c25m3n07-ib,c25m3n08-ib
 gpfs1b        DMD_NSD10    c25m3n08-ib,c25m3n07-ib

Issue the mmfsadm test verbs conn command to verify whether the NSD client node is communicating with all the NSD servers by using VERBS RDMA. In the following sample output, the NSD client node has an active VERBS RDMA connection to only one of the two NSD servers.

# mmfsadm test verbs conn

RDMA Connections between nodes: 
  destination idx cook sta cli peak cli RD cli WR  cli RD KB   cli WR KB  srv wait serv RD  serv WR  serv RD KB  serv WR KB   vrecv  vsend  vrecv KB  vsend KB
  ----------- --- ---  --- --- --- ------ -------- ---------   ---------  --- --- -------- -------- ----------- ----------- ------- ----- ---------  -------- 
  c25m3n07-ib  1    2  RTS   0  24   198   16395     12369     34360606    0   0     0        0           0           0          0      0       0        0    
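The missing server can be spotted by comparing the destinations that are in RTS (ready to send) state against the NSD server list from mmlsnsd. The following sketch hard-codes the sample data above in here-documents; on a live node, feed in the real output of the two commands instead:

```shell
# NSD servers reported by mmlsnsd for the file system (sample data).
expected="c25m3n07-ib
c25m3n08-ib"

# Destinations in RTS state from `mmfsadm test verbs conn` (sample data;
# the fourth column, sta, is the connection state).
active=$(awk '$4 == "RTS" {print $1}' <<'EOF'
  c25m3n07-ib  1    2  RTS   0  24   198   16395     12369     34360606
EOF
)

# Report every expected NSD server that has no RTS connection.
missing=""
for server in $expected; do
    if ! printf '%s\n' "$active" | grep -qx "$server"; then
        missing="$missing$server "
        echo "no RDMA connection to $server"
    fi
done
```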

Problem resolution

Resolve any low-level InfiniBand RDMA issues, such as loose InfiniBand cables or InfiniBand fabric problems. When the low-level RDMA issues are resolved, issue system commands such as ibstat or ibv_devinfo to verify whether the InfiniBand port state is active. The following sample shows the output of the ibstat command. In this output, the state of Port 1 is Active, while the state of Port 2 is Down.

# ibstat
CA 'mlx5_0'
        CA type: MT4113
        Number of ports: 2
        Firmware version: 10.100.6440
        Hardware version: 0
        Node GUID: 0xe41d2d03001fa210
        System image GUID: 0xe41d2d03001fa210
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 29
                LMC: 0
                SM lid: 1
                Capability mask: 0x26516848
                Port GUID: 0xe41d2d03001fa210
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x26516848
                Port GUID: 0xe41d2d03001fa218
                Link layer: InfiniBand
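Checking port states by eye does not scale to many nodes, so the states can also be extracted programmatically. The following sketch parses a trimmed copy of the ibstat output above through a here-document; on a real node, pipe ibstat directly into the awk program:

```shell
# Report every InfiniBand port whose state is not Active.
down_ports=$(awk '
    /^[[:space:]]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
    /^[[:space:]]*State:/ && $2 != "Active" { printf "Port %s is %s\n", port, $2 }
' <<'EOF'
        Port 1:
                State: Active
        Port 2:
                State: Down
EOF
)
echo "${down_ports:-all ports Active}"
```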

Restart GPFS on the node and verify that the status of VERBS RDMA on the node is started by running the mmfsadm test verbs status command.
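A minimal restart sequence for a single node might look like the following sketch. The node name c25m3n03-ib is the NSD client from the sample output; the command path is the default GPFS installation location.

```shell
# Restart GPFS on one node after the low-level InfiniBand issue is fixed.
/usr/lpp/mmfs/bin/mmshutdown -N c25m3n03-ib
/usr/lpp/mmfs/bin/mmstartup -N c25m3n03-ib
```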

In the following sample output, the NSD client (c25m3n03-ib) and the two NSD servers all show VERBS RDMA status as started.

# mmdsh -N nsdnodes,c25m3n03-ib '/usr/lpp/mmfs/bin/mmfsadm test verbs status'
c25m3n03-ib:  VERBS RDMA status: started
c25m3n07-ib:  VERBS RDMA status: started
c25m3n08-ib:  VERBS RDMA status: started

Perform a large I/O activity on the NSD client, and issue the mmfsadm test verbs conn command to verify whether the NSD client node is communicating with all the NSD servers by using VERBS RDMA.
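A simple way to drive large sequential I/O is the dd command. The sketch below writes a throwaway 16 MiB file so that it is self-contained; on a real cluster, point the target at a file inside the GPFS mount (for example, a path in the gpfs1b file system) and use a much larger count:

```shell
# Write a test file with direct I/O where supported; fall back to
# buffered I/O on file systems that reject O_DIRECT.
target=$(mktemp)   # on a real cluster, use a file inside the GPFS mount
dd if=/dev/zero of="$target" bs=1M count=16 oflag=direct 2>/dev/null ||
    dd if=/dev/zero of="$target" bs=1M count=16 2>/dev/null

size=$(wc -c < "$target")
echo "wrote $size bytes to $target"
rm -f "$target"
```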

In the following sample output, the NSD client node has VERBS RDMA communication active on all the active NSD servers.

# mmfsadm test verbs conn

RDMA Connections between nodes:
  destination  idx cook sta cli peak cli RD  cli WR cli RD KB cli WR KB srv wait serv RD serv WR  serv RD KB  serv WR KB vrecv vsend  vrecv KB  vsend KB
  ------------ --- ---  --- --- --- ------- ------- --------- --------- --- ---  ------- ------- ----------- ----------- ----- ------ --------- ---------
  c25m3n08-ib    0   3  RTS   0  13   8193   8205    17179930  17181212   0   0     0       0           0           0     0       0        0        0
  c25m3n07-ib    1   2  RTS   0  14    8192  8206    17179869  17182162   0   0     0       0           0           0     0       0        0        0