IBM Support

IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error

Flashes (Alerts)


Abstract

IBM has identified an issue that affects all IBM GPFS and IBM Spectrum Scale versions when the NSD server is enabled to use RDMA for file IO and the storage in your GPFS cluster that is accessed via NSD servers (not fully SAN accessible) includes anything other than IBM Elastic Storage Server (ESS) or GPFS Storage Server (GSS). Under these conditions, the failure of an RDMA-enabled network adapter may result in undetected data corruption for file write or read operations.

Content

Problem Summary: Because of a logic error in how the NSD server processes a file write or read request after the RDMA-enabled network adapter fails, the data intended to be written to the file is not written, or the data intended to be read from the file is not read. In both cases, the application file operation returns success as if the file IO had completed.

Users Affected: This issue affects all IBM GPFS and IBM Spectrum Scale versions. All of the following conditions must be true in order for the problem to occur:


1) The storage in your GPFS cluster that is accessed via NSD servers (not fully SAN accessible) includes anything other than IBM Elastic Storage Server (ESS) or GPFS Storage Server (GSS).

2) The NSD Server for any storage devices other than IBM Elastic Storage Server (ESS) or GPFS Storage Server (GSS) is enabled for RDMA and an RDMA-enabled network adapter fails in that server.

3) After the RDMA network adapter fails, the NSD Server processes a network file IO request using RDMA.
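The three conditions above combine with a logical AND. A minimal sketch of that check follows; the `NsdServerState` class and its field names are hypothetical and are not part of any IBM tool or API, and the values would have to be gathered by an administrator from their own cluster.

```python
from dataclasses import dataclass


@dataclass
class NsdServerState:
    """Hypothetical summary of one NSD server, filled in by hand."""
    serves_non_ess_gss_storage: bool  # condition 1: storage other than ESS/GSS, accessed via NSD servers
    rdma_enabled: bool                # condition 2: the NSD server is enabled for RDMA
    rdma_adapter_failed: bool         # condition 2: an RDMA-enabled adapter failed in that server
    rdma_io_after_failure: bool       # condition 3: file IO was processed using RDMA after the failure


def conditions_met(state: NsdServerState) -> bool:
    """Return True only when every condition listed in the flash holds."""
    return (
        state.serves_non_ess_gss_storage
        and state.rdma_enabled
        and state.rdma_adapter_failed
        and state.rdma_io_after_failure
    )
```

If any one field is false, the flash's conditions for the problem are not all satisfied for that server.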


Recommendations:
1. Any affected customer (for whom all of the above conditions are true) should upgrade to one of the IBM GPFS product levels (4.1.0.0 through 4.1.0.8) or IBM Spectrum Scale product levels (4.1.1.0 through 4.1.1.14, or 4.2.0.0 through 4.2.3.0) for which a PTF level or efix can be provided to remedy this issue:

(a) For IBM Spectrum Scale V4.2.0.0 through V4.2.3.0, upgrade to Spectrum Scale V4.2.3 PTF1 or later, available from Fix Central at https://www-945.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.3&platform=All&function=all

For IBM Spectrum Scale V4.1.0.0 through V4.1.1.14, upgrade to Spectrum Scale V4.1.1 PTF15 or later, available from Fix Central at https://www-945.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.1.1&platform=All&function=all

or

(b) contact IBM Service to obtain the efix for the following code levels:

For IBM Spectrum Scale V4.2.0.0 through V4.2.3.0, reference APAR IV96037.

For IBM Spectrum Scale V4.1.1.0 through V4.1.1.14, reference APAR IV96068.

For IBM GPFS V4.1.0.0 through V4.1.0.8, reference APAR IV96068.

2. If you believe that your Spectrum Scale file system may be affected by this issue, you should contact IBM Service as soon as possible for further guidance and assistance.
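As an illustration only, the version ranges, PTF targets, and APAR numbers listed in recommendation 1 can be encoded as a small lookup. The function and table names below are hypothetical; the sketch assumes four-part version strings such as `4.2.1.2` and compares them numerically, which is an assumption about how the levels order.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]

# (lowest level, highest level, PTF upgrade target, APAR for the efix),
# transcribed from the recommendations in this flash.
REMEDIES = [
    ((4, 1, 0, 0), (4, 1, 0, 8), "Spectrum Scale V4.1.1 PTF15 or later", "IV96068"),
    ((4, 1, 1, 0), (4, 1, 1, 14), "Spectrum Scale V4.1.1 PTF15 or later", "IV96068"),
    ((4, 2, 0, 0), (4, 2, 3, 0), "Spectrum Scale V4.2.3 PTF1 or later", "IV96037"),
]


def parse_version(text: str) -> Version:
    """Turn 'V4.2.1.2' or '4.2.1.2' into a tuple of four integers."""
    parts = text.strip().lstrip("Vv").split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a four-part version, got {text!r}")
    return tuple(int(p) for p in parts)


def remedy_for(version_text: str) -> Optional[dict]:
    """Return the listed PTF target and APAR, or None if the level is not listed."""
    version = parse_version(version_text)
    for low, high, ptf, apar in REMEDIES:
        if low <= version <= high:  # tuples compare element by element
            return {"ptf": ptf, "apar": apar}
    return None
```

A result of `None` means only that the level falls outside the ranges listed here; it does not by itself indicate whether that level is affected, so contacting IBM Service remains the appropriate step in that case.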


Document Information

Modified date:
26 September 2022

UID

ssg1S1010233