APAR status
Closed as program error.
Error description
A race condition between disks reporting errors and recovery groups or log groups resigning can expose a bug in which GNR log vtrack recovery fails to scrub and repair stale data on disk. This can further lead to data corruption if all good copies of the mirrored data are lost.
Local fix
The bug does not cause a problem right away, but it creates a condition in which additional events can lead to a recovery failure. By the time the recovery failure is detected, data has already been corrupted. Special steps are required to allow recovery to complete, and in the meantime a certain amount of data will be lost.
Problem summary
A race condition between disks reporting errors and recovery groups or log groups resigning can expose a bug in which GNR log vtrack recovery fails to scrub and repair stale data on disk.
Problem conclusion
Benefits of the solution: The code is fixed so that the log vtrack scrubbing operation properly repairs inconsistent data during recovery.
Work Around: None
Problem trigger: This is normally exposed by a race in which a number of disks gradually hit errors, causing recovery groups or log groups to resign.
Symptom: This problem can affect either the fast-write log or the VCD log, and the impact is failure to complete recovery. The log will indicate:
2020-05-29_16:25:21.693-0700: [E] Recovery failure: at least 3400 sectors of the fast-write log for LG root of RG RG1 could not be recovered.
2020-05-29_16:25:21.693-0700: [E] Beginning to resign log group root in recovery group RG1 due to "recovery failure", caller err 333 when "recovering log group worker"
or
2020-04-29_09:04:42.567-0400: [E] Recovery failure: at least 41616 sectors of the VCD log for LG LG001 of RG rg1 could not be recovered.
2020-04-29_09:04:42.567-0400: [E] Beginning to resign log group LG001 in recovery group rg1 due to "recovery failure", caller err 333 when "recovering log group worker"
Platforms affected: N/A
Functional Area affected: GNR
Customer Impact: High Importance
Changed Externals: None
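As an illustration only, the "Recovery failure" symptom messages quoted above could be searched for in a saved log with a short script. This is a minimal sketch, not an IBM-provided tool: the function name find_recovery_failures, the RecoveryFailure record, and the regular expression are hypothetical and are derived solely from the two example messages in this APAR, so other message variants may not match.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class RecoveryFailure(NamedTuple):
    """One parsed 'Recovery failure' message (hypothetical record type)."""
    timestamp: str
    sectors: int
    log_type: str
    log_group: str
    recovery_group: str


# Pattern inferred only from the two sample messages in this APAR (assumption).
RECOVERY_FAILURE = re.compile(
    r"^(?P<ts>\S+): \[E\] Recovery failure: at least (?P<sectors>\d+) sectors "
    r"of the (?P<log>.+?) log for LG (?P<lg>\S+) of RG (?P<rg>\S+) "
    r"could not be recovered\."
)


def find_recovery_failures(lines: Iterable[str]) -> Iterator[RecoveryFailure]:
    """Yield a record for every line that matches the symptom message."""
    for line in lines:
        match = RECOVERY_FAILURE.search(line.strip())
        if match:
            yield RecoveryFailure(
                timestamp=match.group("ts"),
                sectors=int(match.group("sectors")),
                log_type=match.group("log"),
                log_group=match.group("lg"),
                recovery_group=match.group("rg"),
            )
```

The function accepts any iterable of lines, so an open log file can be passed directly, for example find_recovery_failures(open(path, errors="replace")), where path is whatever log file the administrator has collected. A match only indicates that the symptom text is present; it does not by itself confirm that this APAR is the cause.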
Temporary fix
Comments
APAR Information
APAR number
IJ25146
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
505
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-06-01
Closed date
2020-06-01
Last modified date
2020-07-17
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
IBM Spectrum Scale, Version 505, Platform Independent (Business Unit: IBM Infrastructure w/TPS; Line of Business: Storage)
Document Information
Modified date:
18 July 2020