IJ56679: SIGNAL 11 AT STRIPEGROUPDESC::READSGDESC

APAR status

Closed as program error.

Error description

After the file system manager is restarted to break a deadlock
during mmadddisk, some nodes might hit Signal 11, as shown
below:

2025-05-22_19:26:29.912+0800: [E] Signal 11 at location
0x55A0024E6B0E in process 316197, link reg
0xFFFFFFFFFFFFFFFF.
2025-05-22_19:26:29.912+0800: [I] Freezing overwrite mode
tracing to preserve failure data
2025-05-22_19:26:29.912+0800: [I] rax
0x0000000000000000  rbx    0x000000000000070A
2025-05-22_19:26:29.912+0800: [I] rcx
0x0000000000000031  rdx    0x0000537EC0000000
2025-05-22_19:26:29.912+0800: [I] rsp
0x00007FAA2EF4D0B0  rbp    0x00007FAA2EF4D180
2025-05-22_19:26:29.912+0800: [I] rsi
0x00007FACC40EF450  rdi    0x00007FACC40EF450
2025-05-22_19:26:29.912+0800: [I] r8
0x0000000000000000  r9     0x0000000000000001
2025-05-22_19:26:29.912+0800: [I] r10
0x00000000000B3910  r11    0x0000000000000000
2025-05-22_19:26:29.912+0800: [I] r12
0x00007FACC4047A00  r13    0x000055A0041E74E8
2025-05-22_19:26:29.912+0800: [I] r14
0x0000018035A322B0  r15    0x00007FACC40102F0
2025-05-22_19:26:29.912+0800: [I] rip
0x000055A0024E6B0E  eflags 0x0000000000010246
2025-05-22_19:26:29.912+0800: [I] csgsfs
0x002B000000000033  err    0x0000000000000004
2025-05-22_19:26:29.912+0800: [I] trapno
0x000000000000000E  oldmsk 0x0000000010017807
2025-05-22_19:26:29.912+0800: [I] cr2
0x0000000000000008
2025-05-22_19:26:31.081+0800: [D] Traceback:
2025-05-22_19:26:31.081+0800: [D] #0: 0x000055A0024E6B0E
StripeGroupDesc::readSGDesc(int,
StripeGroupDesc::SeedDiskInfo*, StripeGroup*, unsigned
int, int, int*, clientCmdComm*) [clone .cold.139] + 0x30
at ??:0
2025-05-22_19:26:31.081+0800: [D] #1: 0x000055A002A5CC98
StripeGroupDesc::rereadSGDesc(StripeGroup*, unsigned int,
int, int*, clientCmdComm*) + 0x108 at ??:0
2025-05-22_19:26:31.081+0800: [D] #2: 0x000055A002BB85EC
SGMgrData::sg_mgr_init(clientCmdComm*, unsigned int, int)
+ 0xCAC at ??:0
2025-05-22_19:26:31.081+0800: [D] #3: 0x000055A002BB9FD1
SGMgrData::sg_node_failure_recovery() + 0x411 at ??:0
2025-05-22_19:26:31.081+0800: [D] #4: 0x000055A002B968BC
StripeGroup::handlePhase3Recovery() + 0xAC at ??:0
2025-05-22_19:26:31.081+0800: [D] #5: 0x000055A002B9621F
StripeGroupCfg::exceptionHandlerBody(void*) + 0x14F at
??:0
2025-05-22_19:26:31.081+0800: [D] #6: 0x000055A002577332
Thread::callBody(Thread*) + 0x42 at ??:0
2025-05-22_19:26:31.081+0800: [D] #7: 0x000055A002564330
Thread::callBodyWrapper(Thread*) + 0xA0 at ??:0
2025-05-22_19:26:31.081+0800: [D] #8: 0x00007FB6E49003FB
start_thread + 0xEB at ??:0
2025-05-22_19:26:31.082+0800: [D] #9: 0x00007FB6E412BE83
__GI___clone + 0x43 at ??:0
2025-05-22_19:26:31.099+0800: [N] Starting mmsdrserv:
enter
2025-05-22_19:26:31.100+0800: [N] Starting mmsdrserv:
started child process rc: 0
2025-05-22_19:27:07.571+0800: [E] Signal 6 at location
0x7FB6E41F6A31 in process 316197, link reg
0xFFFFFFFFFFFFFFFF.
2025-05-22_19:27:07.571+0800: [I] Calling user exit
script mmSignalHandler: event signalHandler, Async
command /usr/lpp/mmfs/bin/mmsysmonc.
2025-05-22_19:27:07.579+0800: [N] mmfsd is shutting down.
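
The register dump above can be read as an x86-64 page fault: trapno 0x0E is the page-fault vector, err holds the page-fault error code, and cr2 holds the faulting address. The following Python sketch is not part of the original APAR. It decodes those three fields using the standard x86 error-code bits. The function name decode_page_fault and the 4096-byte "near null" threshold are assumptions made for illustration.

```python
def decode_page_fault(trapno: int, err: int, cr2: int) -> dict:
    """Decode x86 page-fault fields as they appear in an mmfsd register dump."""
    if trapno != 0x0E:
        raise ValueError("trapno is not the page-fault vector (0x0E)")
    return {
        # Bit 0: 0 = page not present, 1 = protection violation
        "page_present": bool(err & 0x1),
        # Bit 1: 0 = read access, 1 = write access
        "access": "write" if err & 0x2 else "read",
        # Bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if err & 0x4 else "kernel",
        # Bit 4: 1 = fault happened on an instruction fetch
        "instruction_fetch": bool(err & 0x10),
        "fault_address": cr2,
        # Assumption: an address inside the first 4 KiB page is treated as
        # "near null", which usually means a field access via a null pointer.
        "near_null": cr2 < 4096,
    }


# Values taken from the log above: trapno 0x0E, err 0x04, cr2 0x08.
info = decode_page_fault(trapno=0x0E, err=0x04, cr2=0x08)
for key, value in info.items():
    print(f"{key}: {value}")
```

With the values from this log, the decode reports a user-mode read of a non-present page at address 0x8. Together with rax being 0, this is consistent with reading a field at a small offset from a null pointer, although the dump alone does not prove which pointer was null.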


Reported in: 5.2.1

Local fix

Problem summary

```
mmfsd hits signal 11 on readSGDesc
```

Problem conclusion

This problem is fixed in 6.0.0.1
To see all Spectrum Scale APARs and their respective
fix solutions, refer to:
https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html

Benefits of the solution:
The code is fixed so that the file system manager no longer hits
Signal 11 when it processes a more recent SG descriptor, read
from disk, before the data structure associated with the new SG
descriptor is populated.

Work Around:
There is no workaround. Scale crashes and restarts on its own.

Problem trigger:
After the file system manager is restarted to break a deadlock
during mmadddisk, some nodes might hit Signal 11. The cause is
that the file system manager processes a more recent SG
descriptor, read from disk, before the data structure associated
with the new SG descriptor is populated.

Symptom:
Abend/Crash
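
To check whether a saved log shows this crash signature, one option is to look for a Signal 11 entry followed by a frame #0 that names StripeGroupDesc::readSGDesc. The Python sketch below is not part of the original APAR. It assumes the log uses the same message format as the excerpt in the error description, and the function name has_readsgdesc_crash is hypothetical. Whitespace is collapsed first so that wrapped and unwrapped log lines match the same way.

```python
import re

# Patterns based on the log excerpt in this APAR (assumed format).
SIGNAL_RE = re.compile(r"\[E\] Signal 11 at location")
FRAME0_RE = re.compile(r"#0: 0x[0-9A-Fa-f]+ StripeGroupDesc::readSGDesc\(")


def has_readsgdesc_crash(log_text: str) -> bool:
    """Return True if a Signal 11 is followed by a readSGDesc frame #0."""
    # Collapse all whitespace so hard-wrapped entries read as one line.
    flat = " ".join(log_text.split())
    sig = SIGNAL_RE.search(flat)
    if sig is None:
        return False
    # Only accept a frame #0 that appears after the Signal 11 entry.
    return FRAME0_RE.search(flat, sig.end()) is not None


sample = """2025-05-22_19:26:29.912+0800: [E] Signal 11 at location
0x55A0024E6B0E in process 316197, link reg
0xFFFFFFFFFFFFFFFF.
2025-05-22_19:26:31.081+0800: [D] Traceback:
2025-05-22_19:26:31.081+0800: [D] #0: 0x000055A0024E6B0E
StripeGroupDesc::readSGDesc(int,
"""

unrelated = "2025-05-22_19:27:07.571+0800: [E] Signal 6 at location 0x7FB6E41F6A31"

print(has_readsgdesc_crash(sample))
print(has_readsgdesc_crash(unrelated))
```

A match only indicates that the signature resembles the one in this APAR. Confirming the root cause still requires IBM support to review the failure data.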

Platforms affected:
All platforms

Functional Area affected:
All Scale Users

Customer Impact:
High Importance

Temporary fix

Comments

APAR Information

APAR number: IJ56679
Reported component name: SPEC SCALE DME
Reported component ID: 5737F34AP
Reported release: 522
Status: CLOSED PER
PE: NoPE
HIPER: NoHIPER
Special Attention: NoSpecatt / Xsystem
Submitted date: 2025-11-03
Closed date: 2025-12-01
Last modified date: 2025-12-01

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP

Applicable component levels

[{"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"STXKQY"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"522","Line of Business":{"code":"LOB69","label":"Storage TPS"}}]

Document Information

Modified date:
07 December 2025
