RDMA tuning
Read about tuning RDMA attributes to avoid problems in configurations with InfiniBand.
See the following sections of this help topic:
Settings for IBM Storage Scale 5.0.x and later
- Registering the page pool to InfiniBand
- If the GPFS daemon cannot register the page pool to InfiniBand, it fails with the following mmfs log messages:
VERBS RDMA Shutdown because pagepool could not be registered to Infiniband. VERBS RDMA Try increasing Infiniband device MTTs or reducing pagepool size.
To resolve this problem, try adjusting the following mlx4_core module parameters for the Mellanox Translation Tables (MTTs). This adjustment does not apply to mlx5_core parameters.
- Set log_mtts_per_seg to 0. This is the recommended value.
- Increase the value of log_num_mtt.
For more information, see How to increase MTT size in Mellanox HCA in the Mellanox Documentation.
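As a sketch, on a distribution where module parameters are set through modprobe configuration, the adjustment might look like the following. The file name and the log_num_mtt value are illustrative assumptions; check your distribution's conventions and your HCA documentation for appropriate values.

```shell
# Illustrative example: set mlx4_core MTT parameters through modprobe
# configuration (file name and log_num_mtt value are assumptions).
cat <<'EOF' > /etc/modprobe.d/mlx4_core.conf
options mlx4_core log_mtts_per_seg=0 log_num_mtt=20
EOF

# Reload the module (or reboot) for the parameters to take effect, then
# verify the values that the driver actually picked up:
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
cat /sys/module/mlx4_core/parameters/log_num_mtt
```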
- Enabling verbsRdmaSend
- The verbsRdmaSend attribute of the mmchconfig command enables or disables the use of InfiniBand RDMA rather than TCP for most GPFS daemon-to-daemon communications. When the attribute is disabled, only data transfers between an NSD server and an NSD client are eligible for RDMA. When the attribute is enabled, the GPFS daemon uses InfiniBand RDMA connections for daemon-to-daemon communications only with nodes that are at IBM Storage Scale 5.0.0 or later. For more information, see mmchconfig command.
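For example, enabling the attribute across the cluster might look like the following sketch; verify the exact value syntax against the mmchconfig documentation for your release.

```shell
# Enable RDMA for most daemon-to-daemon communication. The change
# typically takes effect when the GPFS daemon is restarted on each node.
mmchconfig verbsRdmaSend=yes

# Confirm the current setting:
mmlsconfig verbsRdmaSend
```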
Settings for IBM Storage Scale 4.2.3.x
- Registering the page pool to InfiniBand
- Follow the instructions for registering the page pool to InfiniBand in Settings for IBM Storage Scale 5.0.x and later earlier in this topic.
- Enabling verbsRdmaSend
- Read the discussion of setting verbsRdmaSend in Settings for IBM Storage Scale 5.0.x and later earlier in this topic. For 4.2.3.x, be aware of the following points:
- Do not enable verbsRdmaSend in clusters with more than 500 nodes.
- Disable verbsRdmaSend if either of the following types of error appears in the mmfs log:
- Out of memory errors
- InfiniBand error IBV_WC_RNR_RETRY_EXC_ERR
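If either type of error appears on a 4.2.3.x cluster, turning the attribute back off might look like this sketch; verify the exact value syntax against the mmchconfig documentation for your release.

```shell
# Fall back to TCP for daemon-to-daemon communication; only NSD
# server-to-client data transfers remain eligible for RDMA.
mmchconfig verbsRdmaSend=no
```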
- Setting scatterBufferSize in very large clusters (> 2100 nodes)
- The scatterBufferSize attribute of the mmchconfig command has a default value of 32768, which provides good performance under most conditions. However, if CPU use on the NSD I/O servers is high and client I/O is lower than expected, increasing the value of scatterBufferSize might improve performance. Try the following settings:
- For Mellanox FDR 10 InfiniBand: 131072.
- For Mellanox FDR 14 InfiniBand: 262144.
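For example, applying the FDR 14 setting from the list above might look like the following sketch; a restart of the GPFS daemon may be required for the change to take effect.

```shell
# Raise scatterBufferSize for a Mellanox FDR 14 InfiniBand fabric
# (value taken from the guidance above).
mmchconfig scatterBufferSize=262144

# Confirm the current setting:
mmlsconfig scatterBufferSize
```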
- Setting verbsRdmasPerNode in large clusters (> 100 nodes)
- The verbsRdmasPerNode attribute of the mmchconfig command sets the maximum number of RDMA data transfer requests that can be active at the same time on a single node. The default value is 1000. If the cluster is large (more than 100 nodes), the suggested value is the same value that is set for the nsdMaxWorkerThreads attribute.
This attribute is supported only in IBM Storage Scale version 4.2.x. For more information, see mmchconfig command in Command reference for IBM Storage Scale 4.2.3.
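As a sketch, aligning verbsRdmasPerNode with nsdMaxWorkerThreads on a 4.2.x cluster might look like the following; the value 1024 is illustrative and should be replaced with the value that mmlsconfig reports for your cluster.

```shell
# Check the current worker-thread setting:
mmlsconfig nsdMaxWorkerThreads

# Set verbsRdmasPerNode to match it (1024 is an illustrative value):
mmchconfig verbsRdmasPerNode=1024
```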