Flashes (Alerts)
Abstract
IBM has identified a problem in IBM Spectrum Scale V5.1.2.0 that may result in TCP connection resets and node expels. The problem can affect Spectrum Scale (including Spectrum Scale Erasure Code Edition) on all operating systems when a TCP/IP network is used, so upgrading to V5.1.2.0 is not recommended.
Content
Problem Summary:
This problem is caused by an issue in a new feature introduced in Spectrum Scale V5.1.2. The feature uses a scatter buffer to read TCP network data from kernel space, which improves performance by reducing the number of system calls into the kernel. However, depending on how many bytes have been received from the TCP network, the implementation may pass only a zero-length buffer into kernel space. The system call then returns zero. Because a return value of zero has a special meaning for a socket read (normally, that the peer has closed the connection), the logic incorrectly resets the TCP connection. If all TCP connections are reset, a node expel may also occur.
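The following Python sketch illustrates the general pitfall described above. It is not IBM's code, and the helper names `read_scatter` and `demo` are hypothetical. On Linux, a `readv()` call whose buffers are all zero-length returns 0 even when data is waiting, and that 0 looks exactly like the end-of-stream value. One possible guard, shown here under that assumption and not necessarily the approach taken in V5.1.2.1, is to skip the call when no buffer space is offered, so that a 0 from the kernel can only mean that the peer closed the connection.

```python
import os
import socket


def read_scatter(fd, buffers):
    """Read from fd into a list of bytearrays with a single readv() call.

    Returns the number of bytes read, 0 if no buffer space was offered,
    or None if the peer performed an orderly shutdown.
    """
    capacity = sum(len(b) for b in buffers)
    if capacity == 0:
        # Zero-length buffers only: skip the syscall, because its 0 return
        # would be indistinguishable from end-of-stream.
        return 0
    n = os.readv(fd, buffers)
    if n == 0:
        # At least one byte of buffer space was offered, so 0 means the peer closed.
        return None
    return n


def demo():
    """Show the pitfall on a local socket pair (Linux behavior assumed)."""
    a, b = socket.socketpair()
    try:
        b.sendall(b"abc")
        # Naive call: returns 0 even though 3 bytes are waiting to be read.
        naive = os.readv(a.fileno(), [bytearray(0)])
        bufs = [bytearray(2), bytearray(4)]
        guarded = read_scatter(a.fileno(), bufs)
        data = b"".join(bufs)[:guarded]
        return naive, guarded, data
    finally:
        a.close()
        b.close()
```

On Linux, `demo()` yields the naive 0, followed by the 3 bytes that the guarded read actually receives. Code that treated the naive 0 as "connection closed" would tear down a healthy connection.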
Users Affected:
This problem affects customers running IBM Spectrum Scale V5.1.2.0 (including Spectrum Scale Erasure Code Edition) with a TCP/IP network, on all operating systems.
Problem Determination:
Messages like the following in /var/adm/ras/mmfs.log may indicate this problem if they occur repeatedly:
2021-10-25_15:25:10.882+0800: [E] Close connection to 10.10.17.6 node06 :[0] (Connection reset by peer). Attempting reconnect.
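To check whether such messages recur, you can count them per peer. The Python sketch below is hypothetical. Its regular expression is built only from the sample message shown above, and the threshold of 3 is an arbitrary cut-off, not an IBM guideline. A match shows only the symptom, because connections can be reset for other reasons too.

```python
import re
from collections import Counter

# Pattern built from the sample message in this Flash; other wording is not matched.
RESET_RE = re.compile(
    r"\[E\] Close connection to (?P<ip>\S+) (?P<node>\S+) .*"
    r"\(Connection reset by peer\)"
)


def count_resets(lines):
    """Count 'Connection reset by peer' close messages per (IP, node name)."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[(match.group("ip"), match.group("node"))] += 1
    return counts


def repeated_resets(lines, threshold=3):
    """Return the peers whose reset count reaches threshold (an arbitrary cut-off)."""
    return {peer: n for peer, n in count_resets(lines).items() if n >= threshold}
```

For example, `with open("/var/adm/ras/mmfs.log", errors="replace") as f: print(repeated_resets(f))` lists each peer whose resets reach the threshold.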
Recommendations:
1.) Customers running IBM Spectrum Scale V5.1.2.0 should apply IBM Spectrum Scale V5.1.2.1 or later.
If you cannot apply the above PTF level, request an efix via APAR IJ35791.
2.) Customers planning to migrate to the 5.1.2 code level and using a TCP/IP network should apply V5.1.2.1.
3.) Based on its release schedule, ESS is not impacted by this problem.
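The recommendations above reduce to a simple comparison of code levels. The Python sketch below makes that comparison explicit. It assumes that you already have the installed or planned level as a dotted numeric string; how to obtain that string is outside the sketch. The helper names are hypothetical, and treating a three-part level such as "5.1.2" as "5.1.2.0" is also an assumption.

```python
AFFECTED_LEVEL = (5, 1, 2, 0)  # level named in this Flash
FIX_LEVEL = (5, 1, 2, 1)       # first level recommended by this Flash


def parse_level(level):
    """Turn a dotted level such as '5.1.2.0' into an int tuple, padding to 4 parts."""
    parts = [int(p) for p in level.strip().split(".")]
    parts += [0] * (4 - len(parts))  # assumption: '5.1.2' is treated as '5.1.2.0'
    return tuple(parts)


def classify(level):
    """Return 'affected', 'at-or-above-fix', or 'earlier' for a dotted level string."""
    v = parse_level(level)
    if v == AFFECTED_LEVEL:
        return "affected"
    if v >= FIX_LEVEL:
        return "at-or-above-fix"
    return "earlier"
```

A result of "affected" corresponds to recommendation 1. A result of "earlier" means that any planned migration to the 5.1.2 code level should target V5.1.2.1, as recommendation 2 states.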
Document Information
Modified date:
30 November 2021
UID
ibm16513196