APAR status
Closed as program error.
Error description
In the current design of GNR, a recovery group (RG) resigns each time it is recovered if a vdisk with 8+2p is failing its fault tolerance. Once this pattern has repeated often enough to reach a threshold, I/O is returned as an error for any disks retrying the I/O.
Local fix
Problem summary
In a mirrored disk environment, I/O is expected to continue with the surviving disk if a disk experiences a problem. When a disk is created with a recovery group vdisk, and the recovery group keeps resigning after each successful recovery because some vdisk's fault tolerance is exceeded, a race condition exists that can cause the check for this state to be skipped. As a result, I/O continues to be retried against the problem disk instead of moving on to the surviving disks.
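The intended behavior described above can be illustrated with a minimal sketch. It is not the GNR implementation: the class `RecoveryGroupState`, the function `submit_io`, and the threshold value are hypothetical names chosen for illustration. It assumes the resign count is read and updated under a single lock, so the state check cannot be skipped, and that I/O moves on to the next disk once the threshold is reached.

```python
import threading


class RecoveryGroupState:
    """Hypothetical tracker for resign-after-recovery cycles of one recovery group."""

    def __init__(self, threshold=3):
        self.threshold = threshold      # assumed value, not taken from the APAR
        self._lock = threading.Lock()
        self._resign_cycles = 0

    def record_resign_after_recovery(self):
        # Called each time the group resigns right after a successful recovery.
        with self._lock:
            self._resign_cycles += 1

    def record_stable_recovery(self):
        # Called when a recovery is not followed by a resign.
        with self._lock:
            self._resign_cycles = 0

    def is_repeatedly_resigning(self):
        # Read under the same lock as the updates so the check sees a
        # consistent count.
        with self._lock:
            return self._resign_cycles >= self.threshold


def submit_io(disks, data):
    """Try each disk in order and return the name of the disk that took the I/O.

    `disks` is a list of (name, rg_state, write_fn) tuples; rg_state is None
    for a disk that is not backed by a recovery group vdisk.
    """
    for name, rg_state, write_fn in disks:
        if rg_state is not None and rg_state.is_repeatedly_resigning():
            continue                    # stop retrying the problem disk
        try:
            write_fn(data)
            return name
        except IOError:
            continue                    # fall through to the next disk
    raise IOError("no surviving disk accepted the I/O")
```

In this sketch, a disk whose recovery group has reached the threshold is skipped outright, so the request goes to a surviving disk instead of waiting on the problem disk.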
Problem conclusion
Benefits of the solution: With the fix, I/O will continue with the surviving disks instead of hanging with the problem disk.
Work around: None
Problem trigger: A race condition in the logic that determines if a recovery group is repetitively experiencing a resign after a successful recovery due to a fault tolerance being reached.
Symptom: I/O hang due to long waiters 'waiting for stateful NSD server error takeover (2)'
Platforms affected: N/A
Functional Area affected: GNR
Customer Impact: Suggested
Changed Externals: None
Temporary fix
Comments
APAR Information
APAR number
IJ29555
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
510
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-12-02
Closed date
2020-12-14
Last modified date
2020-12-14
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
IBM Spectrum Scale, Version 510, Platform Independent (Business Unit: IBM Infrastructure w/TPS; Line of Business: Storage)
Document Information
Modified date:
15 December 2020