IBM Support

IJ56679: SIGNAL 11 AT STRIPEGROUPDESC::READSGDESC

APAR status

  • Closed as program error.

Error description

  • After the file system manager is restarted to break a
    deadlock during mmadddisk, some nodes might hit Signal 11,
    as shown below:
    
    2025-05-22_19:26:29.912+0800: [E] Signal 11 at location 0x55A0024E6B0E in process 316197, link reg 0xFFFFFFFFFFFFFFFF.
    2025-05-22_19:26:29.912+0800: [I] Freezing overwrite mode tracing to preserve failure data
    2025-05-22_19:26:29.912+0800: [I] rax    0x0000000000000000  rbx    0x000000000000070A
    2025-05-22_19:26:29.912+0800: [I] rcx    0x0000000000000031  rdx    0x0000537EC0000000
    2025-05-22_19:26:29.912+0800: [I] rsp    0x00007FAA2EF4D0B0  rbp    0x00007FAA2EF4D180
    2025-05-22_19:26:29.912+0800: [I] rsi    0x00007FACC40EF450  rdi    0x00007FACC40EF450
    2025-05-22_19:26:29.912+0800: [I] r8     0x0000000000000000  r9     0x0000000000000001
    2025-05-22_19:26:29.912+0800: [I] r10    0x00000000000B3910  r11    0x0000000000000000
    2025-05-22_19:26:29.912+0800: [I] r12    0x00007FACC4047A00  r13    0x000055A0041E74E8
    2025-05-22_19:26:29.912+0800: [I] r14    0x0000018035A322B0  r15    0x00007FACC40102F0
    2025-05-22_19:26:29.912+0800: [I] rip    0x000055A0024E6B0E  eflags 0x0000000000010246
    2025-05-22_19:26:29.912+0800: [I] csgsfs 0x002B000000000033  err    0x0000000000000004
    2025-05-22_19:26:29.912+0800: [I] trapno 0x000000000000000E  oldmsk 0x0000000010017807
    2025-05-22_19:26:29.912+0800: [I] cr2    0x0000000000000008
    2025-05-22_19:26:31.081+0800: [D] Traceback:
    2025-05-22_19:26:31.081+0800: [D] #0: 0x000055A0024E6B0E StripeGroupDesc::readSGDesc(int, StripeGroupDesc::SeedDiskInfo*, StripeGroup*, unsigned int, int, int*, clientCmdComm*) [clone .cold.139] + 0x30 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #1: 0x000055A002A5CC98 StripeGroupDesc::rereadSGDesc(StripeGroup*, unsigned int, int, int*, clientCmdComm*) + 0x108 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #2: 0x000055A002BB85EC SGMgrData::sg_mgr_init(clientCmdComm*, unsigned int, int) + 0xCAC at ??:0
    2025-05-22_19:26:31.081+0800: [D] #3: 0x000055A002BB9FD1 SGMgrData::sg_node_failure_recovery() + 0x411 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #4: 0x000055A002B968BC StripeGroup::handlePhase3Recovery() + 0xAC at ??:0
    2025-05-22_19:26:31.081+0800: [D] #5: 0x000055A002B9621F StripeGroupCfg::exceptionHandlerBody(void*) + 0x14F at ??:0
    2025-05-22_19:26:31.081+0800: [D] #6: 0x000055A002577332 Thread::callBody(Thread*) + 0x42 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #7: 0x000055A002564330 Thread::callBodyWrapper(Thread*) + 0xA0 at ??:0
    2025-05-22_19:26:31.081+0800: [D] #8: 0x00007FB6E49003FB start_thread + 0xEB at ??:0
    2025-05-22_19:26:31.082+0800: [D] #9: 0x00007FB6E412BE83 __GI___clone + 0x43 at ??:0
    2025-05-22_19:26:31.099+0800: [N] Starting mmsdrserv: enter
    2025-05-22_19:26:31.100+0800: [N] Starting mmsdrserv: started child process rc: 0
    2025-05-22_19:27:07.571+0800: [E] Signal 6 at location 0x7FB6E41F6A31 in process 316197, link reg 0xFFFFFFFFFFFFFFFF.
    2025-05-22_19:27:07.571+0800: [I] Calling user exit script mmSignalHandler: event signalHandler, Async command /usr/lpp/mmfs/bin/mmsysmonc.
    2025-05-22_19:27:07.579+0800: [N] mmfsd is shutting down.
    
    
    Reported in: 5.2.1
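
To check whether a node's log shows the same signature, the signal entries and traceback frames can be pulled out of a log excerpt with a small script. The following is a minimal sketch, not an IBM tool: the regular expressions are inferred only from the excerpt above, the function names `join_wrapped` and `summarize` are hypothetical, and the embedded sample is a shortened copy of the excerpt. It re-joins entries that were wrapped across lines, so it accepts both wrapped and unwrapped input.

```python
import re

# Timestamp prefix as it appears in the excerpt, e.g. "2025-05-22_19:26:29.912+0800: "
TS = re.compile(r"^\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}: ")
# "[E] Signal 11 at location 0x... in process 316197"
SIGNAL = re.compile(r"\[E\] Signal (\d+) at location (0x[0-9A-Fa-f]+) in process (\d+)")
# "[D] #1: 0x... Symbol(args) + 0x108 at ??:0"
FRAME = re.compile(r"\[D\] #(\d+): (0x[0-9A-Fa-f]+) (.+?) \+ (0x[0-9A-Fa-f]+) at ")


def join_wrapped(text):
    """Re-join wrapped entries; a new entry is assumed to start with a timestamp."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TS.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def summarize(text):
    """Return (signals, frames) found in a log excerpt."""
    signals, frames = [], []
    for entry in join_wrapped(text):
        m = SIGNAL.search(entry)
        if m:
            signals.append((int(m.group(1)), m.group(2), int(m.group(3))))
        m = FRAME.search(entry)
        if m:
            frames.append((int(m.group(1)), m.group(3)))
    return signals, frames


# Shortened copy of the excerpt above, in its wrapped form.
SAMPLE = """\
2025-05-22_19:26:29.912+0800: [E] Signal 11 at location
0x55A0024E6B0E in process 316197, link reg
0xFFFFFFFFFFFFFFFF.
2025-05-22_19:26:31.081+0800: [D] #1: 0x000055A002A5CC98
StripeGroupDesc::rereadSGDesc(StripeGroup*, unsigned int,
int, int*, clientCmdComm*) + 0x108 at ??:0
2025-05-22_19:26:31.081+0800: [D] #2: 0x000055A002BB85EC
SGMgrData::sg_mgr_init(clientCmdComm*, unsigned int, int)
+ 0xCAC at ??:0
"""

signals, frames = summarize(SAMPLE)
print(signals)
for index, symbol in frames:
    print(index, symbol)
```

A log that matches this APAR would be expected to show a Signal 11 entry together with frames naming StripeGroupDesc::readSGDesc and StripeGroupDesc::rereadSGDesc; the script only extracts these fields and does not by itself confirm the defect.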
    

Local fix

Problem summary

  • mmfsd hits signal 11 in StripeGroupDesc::readSGDesc
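
The register dump in the error description can be read as an x86-64 page fault: trapno 0x0E is exception vector 14 (#PF), err carries the page-fault error code, and cr2 carries the faulting address. The sketch below decodes those three values using the architectural bit layout of the error code. The function name `decode_page_fault` and the "near null" heuristic (an address inside the first 4 KiB page) are assumptions made for illustration and are not part of any Scale tooling.

```python
PAGE_FAULT_VECTOR = 0x0E  # x86 exception vector 14 (#PF)
FIRST_PAGE = 0x1000       # heuristic bound: address inside the first 4 KiB page


def decode_page_fault(trapno, err, cr2):
    """Decode the x86 page-fault error code reported in a signal context."""
    if trapno != PAGE_FAULT_VECTOR:
        raise ValueError(f"trapno {trapno:#x} is not a page fault")
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if err & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if err & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if err & 0x4 else "kernel",
        # bit 4: 1 = fault caused by an instruction fetch
        "instruction_fetch": bool(err & 0x10),
        "address": cr2,
        # Heuristic only: a very low address often means a field was
        # accessed through a null pointer.
        "near_null": cr2 < FIRST_PAGE,
    }


# Values taken from the register dump in the error description.
info = decode_page_fault(trapno=0x0E, err=0x4, cr2=0x8)
for key, value in info.items():
    print(f"{key}: {value}")
```

For the values in this APAR the decode gives a user-mode read of a non-present page at address 0x8. That is consistent with reading a member at offset 8 through a null pointer, although the dump alone does not prove which pointer was null.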
    

Problem conclusion

  • This problem is fixed in 6.0.0.1.
    To see all Spectrum Scale APARs and their respective
    fix solutions, refer to this page:
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    Benefits of the solution:
    The code is fixed so that the file system manager no longer
    hits signal 11 when it processes a more recent SG descriptor
    read from disk before the data structure associated with the
    new SG descriptor is populated.
    
    Work Around:
    There is no workaround. Scale crashes and restarts on its
    own.
    
    Problem trigger:
    After the file system manager is restarted to break a deadlock
    during mmadddisk, some nodes might hit Signal 11. The file
    system manager processes a more recent SG descriptor that was
    read from disk before the data structure associated with the
    new SG descriptor is populated.
    
    Symptom:
    Abend/Crash
    
    Platforms affected:
    All platforms
    
    Functional Area affected:
    All Scale Users
    
    Customer Impact:
    High Importance
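
The trigger described above is an ordering problem: a newer descriptor is processed before the in-memory structure that belongs to it exists. The sketch below illustrates that general pattern and one common way to guard against it. It is purely illustrative and is not IBM's code or the actual fix; every class and method name in it (`Descriptor`, `DiskTable`, `Manager`, `process_unsafe`, `process_guarded`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descriptor:
    """Hypothetical stand-in for a descriptor read from disk."""
    version: int
    disk_count: int


@dataclass
class DiskTable:
    """Hypothetical in-memory structure built from a descriptor."""
    version: int
    disks: List[str]


class Manager:
    def __init__(self) -> None:
        self.table: Optional[DiskTable] = None  # not populated yet

    def populate(self, desc: Descriptor) -> None:
        names = [f"disk{i}" for i in range(desc.disk_count)]
        self.table = DiskTable(desc.version, names)

    def process_unsafe(self, desc: Descriptor) -> int:
        # Assumes the table already matches desc. If it was never
        # populated, this dereferences None (the Python analogue of
        # a null-pointer access).
        return len(self.table.disks)

    def process_guarded(self, desc: Descriptor) -> int:
        # Populate (or refresh) the table before using it when it is
        # missing or older than the descriptor being processed.
        if self.table is None or self.table.version < desc.version:
            self.populate(desc)
        return len(self.table.disks)


newer = Descriptor(version=2, disk_count=3)

mgr = Manager()
try:
    mgr.process_unsafe(newer)
    unsafe_failed = False
except AttributeError:
    unsafe_failed = True

guarded_result = Manager().process_guarded(newer)
print("unsafe path failed:", unsafe_failed)
print("guarded path disk count:", guarded_result)
```

The guard trades a small check on every call for never touching a structure that is missing or stale. How the real fix orders these steps is not described in this APAR.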
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ56679

  • Reported component name

    SPEC SCALE DME

  • Reported component ID

    5737F34AP

  • Reported release

    522

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2025-11-03

  • Closed date

    2025-12-01

  • Last modified date

    2025-12-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE DME

  • Fixed component ID

    5737F34AP

Applicable component levels

[{"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"STXKQY"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"522","Line of Business":{"code":"LOB69","label":"Storage TPS"}}]

Document Information

Modified date:
07 December 2025