APAR status
Closed as program error.
Error description
After the file system manager is restarted to break a deadlock during mmadddisk, some nodes might hit Signal 11, as shown below:
    2025-05-22_19:26:29.912+0800: [E] Signal 11 at location 0x55A0024E6B0E in process 316197, link reg 0xFFFFFFFFFFFFFFFF.
    2025-05-22_19:26:29.912+0800: [I] Freezing overwrite mode tracing to preserve failure data
    2025-05-22_19:26:29.912+0800: [I] rax 0x0000000000000000 rbx 0x000000000000070A
    2025-05-22_19:26:29.912+0800: [I] rcx 0x0000000000000031 rdx 0x0000537EC0000000
    2025-05-22_19:26:29.912+0800: [I] rsp 0x00007FAA2EF4D0B0 rbp 0x00007FAA2EF4D180
    2025-05-22_19:26:29.912+0800: [I] rsi 0x00007FACC40EF450 rdi 0x00007FACC40EF450
    2025-05-22_19:26:29.912+0800: [I] r8 0x0000000000000000 r9 0x0000000000000001
    2025-05-22_19:26:29.912+0800: [I] r10 0x00000000000B3910 r11 0x0000000000000000
    2025-05-22_19:26:29.912+0800: [I] r12 0x00007FACC4047A00 r13 0x000055A0041E74E8
    2025-05-22_19:26:29.912+0800: [I] r14 0x0000018035A322B0 r15 0x00007FACC40102F0
    2025-05-22_19:26:29.912+0800: [I] rip 0x000055A0024E6B0E eflags 0x0000000000010246
    2025-05-22_19:26:29.912+0800: [I] csgsfs 0x002B000000000033 err 0x0000000000000004
    2025-05-22_19:26:29.912+0800: [I] trapno 0x000000000000000E oldmsk 0x0000000010017807
    2025-05-22_19:26:29.912+0800: [I] cr2 0x0000000000000008
    2025-05-22_19:26:31.081+0800: [D] Traceback:
    2025-05-22_19:26:31.081+0800: [D] #0: 0x000055A0024E6B0E StripeGroupDesc::readSGDesc(int, StripeGroupDesc::SeedDiskInfo*, StripeGroup*, unsigned int, int, int*, clientCmdComm*) [clone .cold.139] + 0x30 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #1: 0x000055A002A5CC98 StripeGroupDesc::rereadSGDesc(StripeGroup*, unsigned int, int, int*, clientCmdComm*) + 0x108 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #2: 0x000055A002BB85EC SGMgrData::sg_mgr_init(clientCmdComm*, unsigned int, int) + 0xCAC at ??:0
    2025-05-22_19:26:31.081+0800: [D] #3: 0x000055A002BB9FD1 SGMgrData::sg_node_failure_recovery() + 0x411 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #4: 0x000055A002B968BC StripeGroup::handlePhase3Recovery() + 0xAC at ??:0
    2025-05-22_19:26:31.081+0800: [D] #5: 0x000055A002B9621F StripeGroupCfg::exceptionHandlerBody(void*) + 0x14F at ??:0
    2025-05-22_19:26:31.081+0800: [D] #6: 0x000055A002577332 Thread::callBody(Thread*) + 0x42 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #7: 0x000055A002564330 Thread::callBodyWrapper(Thread*) + 0xA0 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #8: 0x00007FB6E49003FB start_thread + 0xEB at ??:0
    2025-05-22_19:26:31.082+0800: [D] #9: 0x00007FB6E412BE83 __GI___clone + 0x43 at ??:0
    2025-05-22_19:26:31.099+0800: [N] Starting mmsdrserv: enter
    2025-05-22_19:26:31.100+0800: [N] Starting mmsdrserv: started child process rc: 0
    2025-05-22_19:27:07.571+0800: [E] Signal 6 at location 0x7FB6E41F6A31 in process 316197, link reg 0xFFFFFFFFFFFFFFFF.
    2025-05-22_19:27:07.571+0800: [I] Calling user exit script mmSignalHandler: event signalHandler, Async command /usr/lpp/mmfs/bin/mmsysmonc.
    2025-05-22_19:27:07.579+0800: [N] mmfsd is shutting down.
Reported in: 5.2.1
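The register dump uses the x86-64 layout. trapno 0xE is the page-fault exception (#PF), cr2 holds the faulting linear address and err is the page-fault error code. The minimal sketch below decodes those three fields. With the logged values it reports a user-mode read of a not-present page at address 0x8. That is consistent with, but does not by itself prove, a read of a field at offset 8 through a null pointer in readSGDesc. The function name decode_page_fault and the "near_null" check are hypothetical, and the check assumes the first 4 KiB page is unmapped.

```python
PAGE_FAULT_TRAPNO = 0xE  # x86 trap number 14 is the page-fault exception (#PF)


def decode_page_fault(trapno: int, err: int, cr2: int) -> dict:
    """Interpret trapno/err/cr2 from an x86-64 register dump (hypothetical helper)."""
    if trapno != PAGE_FAULT_TRAPNO:
        raise ValueError(f"trap {trapno:#x} is not a page fault")
    return {
        "fault_address": cr2,                   # CR2 holds the faulting linear address
        "page_present": bool(err & 0x1),        # bit 0 (P): 0 = page not present
        "write_access": bool(err & 0x2),        # bit 1 (W/R): 0 = read, 1 = write
        "user_mode": bool(err & 0x4),           # bit 2 (U/S): 1 = fault raised in user mode
        "instruction_fetch": bool(err & 0x10),  # bit 4 (I/D): 1 = instruction fetch
        # Assumption: the first 4 KiB page is unmapped, so tiny addresses
        # usually mean "null pointer plus a small field offset".
        "near_null": cr2 < 0x1000,
    }


# Values copied from the Signal 11 register dump above.
info = decode_page_fault(trapno=0xE, err=0x4, cr2=0x8)
print(info)
```

This prints a dictionary in which fault_address is 8, page_present and write_access are False, and user_mode and near_null are True.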
Local fix
Problem summary
mmfsd hits Signal 11 in readSGDesc
Problem conclusion
This problem is fixed in 6.0.0.1.
To see all Spectrum Scale APARs and their respective fix solutions, refer to: https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: The code was fixed so that mmfsd no longer hits Signal 11 when the file system manager processes a more recent SG descriptor, read from disk, before the data structure associated with the new SG descriptor is populated.
Work Around: None. Scale crashes and restarts on its own.
Problem trigger: After the file system manager is restarted to break a deadlock during mmadddisk, some nodes might hit Signal 11. The cause is that the file system manager processes a more recent SG descriptor read from disk before the data structure associated with the new SG descriptor is populated.
Symptom: Abend/Crash
Platforms affected: All platforms
Functional Area affected: All Scale Users
Customer Impact: High Importance
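The trigger described above is an ordering problem. A newer SG descriptor is acted on before the in-memory structure for that descriptor exists. IBM does not publish the actual fix. The minimal sketch below is only a hypothetical illustration of one general way to close this kind of race: check that the structure for the newer version is populated, and defer if it is not. All class, field and method names (SGDescriptor, DiskTable, FileSystemManager, process_descriptor) are invented for illustration and are not Storage Scale internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SGDescriptor:
    """Hypothetical stand-in for a stripe group descriptor read from disk."""
    version: int


@dataclass
class DiskTable:
    """Hypothetical in-memory structure that must exist for a descriptor version."""
    version: int
    disks: List[str] = field(default_factory=list)


class FileSystemManager:
    """Hypothetical manager; illustrates the guard, not IBM's implementation."""

    def __init__(self, current: SGDescriptor, table: DiskTable) -> None:
        self.current = current
        # Populated in-memory structures, keyed by descriptor version.
        self.tables: Dict[int, DiskTable] = {table.version: table}

    def process_descriptor(self, on_disk: SGDescriptor) -> str:
        if on_disk.version <= self.current.version:
            return "ignored: not newer"
        table: Optional[DiskTable] = self.tables.get(on_disk.version)
        if table is None:
            # The on-disk descriptor is newer, but its structure is not populated
            # yet. Using it now is the unsafe path, so defer instead.
            return "deferred: structure not populated"
        self.current = on_disk
        return f"applied version {on_disk.version} with {len(table.disks)} disks"


mgr = FileSystemManager(SGDescriptor(7), DiskTable(7, ["d1", "d2"]))
newer = SGDescriptor(8)  # hypothetical newer descriptor found on disk
print(mgr.process_descriptor(newer))  # deferred: structure not populated
mgr.tables[8] = DiskTable(8, ["d1", "d2", "d3"])
print(mgr.process_descriptor(newer))  # applied version 8 with 3 disks
```

Deferring is just one choice. Populating the structure synchronously before applying the descriptor would also close the window. Which approach the real fix takes is not stated in this APAR.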
Temporary fix
Comments
APAR Information
APAR number
IJ56679
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
522
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2025-11-03
Closed date
2025-12-01
Last modified date
2025-12-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
Business Unit: IBM Software (BU048); Product: STXKQY; Platform: Platform Independent (PF025); Version: 522; Line of Business: Storage TPS (LOB69)
Document Information
Modified date:
07 December 2025