Flashes (Alerts)
Abstract
IBM has identified an issue with all supported IBM Spectrum Scale code levels when RDMA is used (Spectrum Scale 5.1.0.0 to 5.1.2.4 and 5.1.3.0 to 5.1.3.1). Applications running on an IBM Spectrum Scale client may read incorrect data from files stored on GPFS resulting in undetected data corruption. A fix requiring atomic operations supported by the Host Channel Adapters (HCA, RDMA adapter) is available.
Content
Due to a race condition between the RDMA software layer and IBM Spectrum Scale, an application running on an IBM Spectrum Scale client may, under certain conditions, read incorrect data from files stored on GPFS. The fix requires atomic operations to be supported by the HCA. Some older HCAs may not support atomic operations, in which case RDMA must be disabled on those HCAs to avoid the exposure to reading incorrect data.
Users Affected:
Any system running IBM Spectrum Scale levels 5.1.0.0 to 5.1.2.4 or 5.1.3.0 to 5.1.3.1 with non-GNR NSDs and RDMA enabled is affected by this issue. If RDMA is not enabled, the system is not affected. Both RDMA over InfiniBand and RDMA over Converged Ethernet (RoCE) are impacted.
The observed cases have been in environments with nodes with multiple HCAs and a very high utilization of the PCI bus.
ESS or ECE systems use checksums to detect when data is not correctly transferred over the network and force the data to be retransmitted. If checksum errors are detected, messages like the following are written to the logs.
2021-01-01_11:12:34.000+0200: [W] Encountered first checksum error on network I/O to NSD Server disk<disk-name>
Recommendations:
Check your system's state
To check whether an ESS or ECE cluster has experienced this data integrity issue, search the mmfslog on the Spectrum Scale clients for the checksum error message as follows:
grep "checksum error on" /var/adm/ras/mmfs.log*
If any instances are shown, checksum errors have been encountered.
For non-GNR NSD servers this can only be done if checksums are enabled. This is only the case if the nsdCksumTraditional configuration parameter is set to “yes” which is not the default as it can result in significant I/O performance degradation and a considerable increase in CPU usage.
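The log search above can be extended to a per-file count, which shows at a glance which log files recorded checksum errors. The following is a minimal sketch: the function name count_checksum_errors is illustrative and not part of IBM Spectrum Scale, and it relies only on the message text and log location given in this flash.

```shell
#!/bin/bash
# Illustrative helper (not part of IBM Spectrum Scale): for each given log
# file, print the number of lines containing the checksum error message.
count_checksum_errors() {
    local f
    for f in "$@"; do
        # grep -c prints the number of matching lines; it exits non-zero
        # when that number is 0, so "|| true" keeps scripts using "set -e"
        # from aborting on clean logs.
        printf '%s: %s\n' "$f" "$(grep -c 'checksum error on' "$f" || true)"
    done
}

# Example invocation on a client node (log location as given in this flash):
#   count_checksum_errors /var/adm/ras/mmfs.log*
```

On non-GNR NSD servers a count of 0 is only meaningful when checksums are enabled; the current value should be visible with mmlsconfig nsdCksumTraditional.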
All HCAs in the cluster support atomic operations
To fix the race condition, IBM Spectrum Scale submits an additional RDMA ATOMIC_FETCH_AND_ADD operation to flush file data from the RDMA HCA on the IBM Spectrum Scale client to the host memory. This requires that the RDMA HCA supports atomic operations, which can be verified with the ibv_devinfo command shown below. Once it is confirmed that the HCA supports atomic operations, IBM Spectrum Scale should be upgraded to a version containing the fix (see "Fix Version" below). After the upgrade, HCAs that do not support atomic operations can, by default, no longer be used for RDMA.
Some or none of the HCAs support atomic operations
By default, all HCAs not supporting atomic operations will be disabled for RDMA operations, i.e.
• if only some of the HCAs support atomic operations, the RDMA traffic will be spread across those that do
• if none of the HCAs support atomic operations the cluster will fall back to TCP/IP communication only.
At Spectrum Scale startup the mmfslog contains messages for each HCA that does not support atomic operations:
[W] VERBS RDMA open error verbsPort <port> due to missing support for atomic operations for device <device>
If there are HCAs that do not support atomic operations it is highly recommended to either upgrade the firmware to a version that does support atomic operations (if possible) or replace the HCA.
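To get a quick list of the devices flagged at startup, the warning above can be extracted from the log. This is a sketch under assumptions: the function name list_non_atomic_devices is hypothetical, and it assumes that the <device> placeholder in the message is replaced by a single token without whitespace, which this flash does not specify.

```shell
#!/bin/bash
# Illustrative helper (not part of IBM Spectrum Scale): print the unique
# device names reported in "missing support for atomic operations" warnings.
# ASSUMPTION: the device name is one whitespace-free token that directly
# follows the words "for device ".
list_non_atomic_devices() {
    sed -n 's/.*missing support for atomic operations for device \([^ ]*\).*/\1/p' "$@" |
        sort -u
}

# Example invocation (log location as used elsewhere in this flash):
#   list_non_atomic_devices /var/adm/ras/mmfs.log*
```

An empty result only indicates that no such warning was logged in the files searched; the ibv_devinfo check described in this flash remains the direct way to query the adapters.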
The default behavior of disabling RDMA for HCAs that do not support atomic operations can be changed by setting the configuration option verbsRdmaWriteFlush to "no". This will enable all HCAs to be used for RDMA even if they do not support atomic operations. This option allows the cluster to continue RDMA use with older HCAs at the expense of the potential data integrity issues seen in environments with high utilization and multiple HCAs per node. The following rules apply:
• For nodes that have only HCAs that support atomic operations the potential data integrity issue is fixed.
• If at least one HCA on a client node does not support atomic operations the RDMA write flush will be disabled only for that client node and thus that client node may experience the data integrity issue.
• If at least one HCA on an NSD server node does not support atomic operations, the RDMA write flush will be disabled on that server node and consequently also for all clients of that server node. All nodes without the atomic operations may experience the data integrity issue.
In this case the mmfslog shows the message:
[W] VERBS RDMA WRITE flush support disabled on this node as at least one RDMA adapter does not support atomic operations. See option verbsRdmaWriteFlush.
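Whether a node is running with the write flush disabled can be checked by searching its log for this message. The sketch below is illustrative only: the function name write_flush_disabled is hypothetical, and the commented mmlsconfig and mmchconfig lines assume that verbsRdmaWriteFlush is handled like other configuration attributes; this flash does not state whether a change requires a daemon restart.

```shell
#!/bin/bash
# Illustrative helper (not part of IBM Spectrum Scale): succeed (exit status
# 0) if any of the given log files contains the warning quoted above.
write_flush_disabled() {
    grep -q 'VERBS RDMA WRITE flush support disabled on this node' "$@"
}

# Example on a node (log location as used elsewhere in this flash):
#   if write_flush_disabled /var/adm/ras/mmfs.log*; then
#       echo "RDMA write flush is disabled on this node"
#   fi
#
# ASSUMPTION: the option is displayed and changed with the standard
# configuration commands, for example:
#   mmlsconfig verbsRdmaWriteFlush       # display the current setting
#   mmchconfig verbsRdmaWriteFlush=no    # accept the risk described above
```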
Upgrading / Installing the efix:
A rolling code upgrade is supported without additional restrictions. This is because, during RDMA connection setup, the two nodes automatically negotiate whether both sides support atomic operations.
If all HCAs support atomic operations nothing needs to be done. The new code includes the patch and atomic operations will be used for flushing file data.
HCAs not supporting atomic operations will be automatically disabled, as documented above. Therefore, verify in advance that the HCAs support atomic operations. Otherwise, RDMA connectivity may be lost after the upgrade (for options, see "Some or none of the HCAs support atomic operations" above and the configuration option verbsRdmaWriteFlush).
Checking whether HCAs support atomic operations
To check whether the RDMA HCAs on a node support atomic operations, the command ibv_devinfo can be used:
[root@myhost-58 ~]# ibv_devinfo -v | egrep "hca_id|atomic"
hca_id: mlx5_0
atomic_cap: ATOMIC_HCA (1)
atomic_cap: ATOMIC_HCA (1)
hca_id: mlx5_1
atomic_cap: ATOMIC_HCA (1)
hca_id: mlx5_2
atomic_cap: ATOMIC_HCA (1)
hca_id: mlx5_3
atomic_cap: ATOMIC_HCA (1)
In this example all HCAs support atomic operations.
Here is a counter example:
[root@myhost-02 ~]# ibv_devinfo -v | egrep "hca_id|atomic"
hca_id: mlx5_0
atomic_cap: ATOMIC_NONE (0)
If the HCA does not support atomic operations, see section "Some or none of the HCAs support atomic operations" above.
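On nodes with many adapters it can help to print only the HCAs that lack atomic support. The following sketch filters the ibv_devinfo -v output shown above; the function name list_hcas_without_atomics is illustrative, and the filter assumes that the output uses the "hca_id:" and "atomic_cap:" labels exactly as in the examples.

```shell
#!/bin/bash
# Illustrative helper: read "ibv_devinfo -v" output on stdin and print the
# hca_id of every device that reports atomic_cap ATOMIC_NONE.
list_hcas_without_atomics() {
    awk '
        $1 == "hca_id:"                            { hca = $2 }    # remember current device
        $1 == "atomic_cap:" && $2 == "ATOMIC_NONE" { print hca }   # device lacks atomics
    ' | sort -u
}

# Example invocation on a node:
#   ibv_devinfo -v | list_hcas_without_atomics
```

An empty result means that no device on the node reported ATOMIC_NONE. For the counter example above, the function would print mlx5_0.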
Fix Version:
Users running IBM Spectrum Scale V5.1.3.0 or V5.1.3.1 code levels should upgrade to IBM Spectrum Scale V5.1.4.0 or later, available from Fix Central.
Users running IBM Spectrum Scale V5.1.2.0 to V5.1.2.4 code levels should upgrade to IBM Spectrum Scale V5.1.2.5 or later, available from Fix Central.
If you cannot apply the above code level, please contact IBM service to request an efix:
• for Spectrum Scale 5.1.3: APAR IJ39051
• for Spectrum Scale 5.1.2: APAR IJ40280
Users running IBM Spectrum Scale V5.1.0.0 to V5.1.1.4 code levels should contact Support for an efix.
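The affected code levels listed in this flash can be checked mechanically. The sketch below is illustrative: the function names version_in_range and is_affected_level are hypothetical, the comparison relies on the version ordering of GNU sort -V, and the level must be passed in as a four-part string (the installed level can be displayed with mmdiag --version).

```shell
#!/bin/bash
# Illustrative helper: succeed if version $1 lies in the inclusive range
# [$2, $3], using the version ordering of GNU "sort -V".
version_in_range() {
    local v=$1 lo=$2 hi=$3
    # lo <= v holds when lo sorts first; v <= hi holds when v sorts first.
    [ "$(printf '%s\n' "$lo" "$v" | sort -V | head -n 1)" = "$lo" ] &&
        [ "$(printf '%s\n' "$v" "$hi" | sort -V | head -n 1)" = "$v" ]
}

# Succeed if the given level falls into one of the affected ranges named in
# this flash: 5.1.0.0 to 5.1.2.4 and 5.1.3.0 to 5.1.3.1.
is_affected_level() {
    version_in_range "$1" 5.1.0.0 5.1.2.4 ||
        version_in_range "$1" 5.1.3.0 5.1.3.1
}

# Example:
#   is_affected_level 5.1.2.4 && echo "affected code level"
```

The check covers the code level only. A system at an affected level is exposed only if RDMA is enabled and non-GNR NSDs are in use, as described under "Users Affected".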
Summary
If there are HCAs that do not support atomic operations the following options exist:
• Leave the older HCAs disabled (verbsRdmaWriteFlush=yes, the default), assuming there are also newer ones that can handle the RDMA traffic.
• If available, install up-to-date firmware to allow for atomic operations.
• Replace an older HCA with a model that supports atomic operations.
• Analyze the risk of data integrity issues (single card? high utilization? issues seen in the past?). If the risk is acceptable, set verbsRdmaWriteFlush=no.
• Set nsdCksumTraditional to “yes” but be aware that it can result in significant I/O performance degradation and a considerable increase in CPU usage.
Document Information
Modified date:
24 June 2022
UID
ibm16589893