IBM Support

IJ41553: LINUX OS CRASH CAUSED BY MMCCR AND TRACEDEV MODULE

APAR status

  • Closed as program error.

Error description

  • The kernel panics around the time the Spectrum Scale daemon
    is stopped or started. The backtrace from the crash
    looks like this:
    
    [12334.372189] CPU: 28 PID: 244084 Comm: mmccr Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-1160.66.1.el7.x86_64 #1
    [12334.385060] Hardware name: Lenovo ThinkSystem SR650 - XXXX
    [12334.396680] task: ffff9cb339e54200 ti: ffff9cb322958000 task.ti: ffff9cb322958000
    [12334.405012] RIP: 0010:[<ffffffffa404ba86>]  [<ffffffffa404ba86>] filp_close+0x26/0x90
    [12334.413747] RSP: 0018:ffff9cb32295bef8  EFLAGS: 00010282
    [12334.419660] RAX: ffffffffc08470e0 RBX: ffff9cb31a95e600 RCX: ffff9cb31a95e600
    [12334.427606] RDX: 0000000000000000 RSI: ffff9cb37b4a0500 RDI: ffff9cb31a95e600
    [12334.435550] RBP: ffff9cb32295bf10 R08: 0000000000000000 R09: 0000561807f21fc0
    [12334.443495] R10: 0000000000000022 R11: 0000000000000246 R12: ffff9cb37b4a0500
    [12334.451438] R13: ffff9cb37b4a0540 R14: 0000000000000000 R15: 0000000000000000
    [12334.459385] FS:  00007fdb88800780(0000) GS:ffff9cb39d100000(0000) knlGS:0000000000000000
    [12334.468394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [12334.474790] CR2: ffffffffc0847140 CR3: 0000002f65680000 CR4: 00000000005607e0
    [12334.482734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [12334.490678] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [12334.498623] PKRU: 55555554
    [12334.501633] Call Trace:
    [12334.504359]  [<ffffffffa406f8ac>] __close_fd+0x8c/0xb0
    [12334.510082]  [<ffffffffa404d513>] SyS_close+0x23/0x50
    [12334.515707]  [<ffffffffa4599f92>] system_call_fastpath+0x25/0x2a
    [12334.522386] Code: ff 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 47 38 48 89 fb 48 85 c0 74 5b 48 8b 47 28 49 89 f4 48 85 c0 74 4a <48> 8b 40 60 45 31 ed 48 85 c0 74 08 e8 a9 c2 14 00 41 89 c5 f6
    [12334.544214] RIP  [<ffffffffa404ba86>] filp_close+0x26/0x90
    [12334.550332]  RSP <ffff9cb32295bef8>
    [12334.554213] CR2: ffffffffc0847140
    
    Reported in:
    Spectrum Scale 5.1.3.1
    RHEL 7.9
    kernel 3.10.0-1160.66.1.el7.x86_64
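
The register dump is consistent with a read through a stale pointer. As a rough check, assume that the marked bytes `<48> 8b 40 60` in the Code line decode to `mov 0x60(%rax),%rax` on x86_64. This is our reading of the opcode bytes and is not stated in the APAR. Under that assumption the faulting address in CR2 should equal RAX plus 0x60. The sketch below does the arithmetic with the values from the trace; the variable names are ours.

```python
# Values copied from the register dump above.
RAX = 0xffffffffc08470e0   # pointer loaded just before the faulting instruction
CR2 = 0xffffffffc0847140   # address whose access raised the page fault
RIP = 0xffffffffa404ba86   # filp_close+0x26, inside the core kernel text

# Assumption: the marked bytes <48> 8b 40 60 decode to mov 0x60(%rax),%rax,
# i.e. an 8-byte load at displacement 0x60 from RAX.
DISPLACEMENT = 0x60

fault_address = RAX + DISPLACEMENT
matches_cr2 = fault_address == CR2

# RAX and CR2 share their upper bits with each other but not with RIP,
# which suggests the pointer targets memory outside the core kernel image.
same_region_as_rip = (RAX >> 28) == (RIP >> 28)

print(hex(fault_address), matches_cr2, same_region_as_rip)
# prints: 0xffffffffc0847140 True False
```

If the assumption holds, filp_close faulted while following the pointer in RAX into memory that was no longer mapped, which fits a module being unloaded while one of its files was still open.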
    

Local fix

Problem summary

  • GPFS includes kernel modules that are loaded at
    startup and used by other components. The tracedev
    module did not handle its usage counters correctly,
    so the module could be unloaded while it was still
    in use, resulting in a kernel crash. One case where
    this is possible is running the
    "mmvdisk server configure" and
    "mmvdisk server unconfigure" commands with the
    --recycle option.
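
The failure mode described above can be illustrated with a small user-space simulation. This is a toy model, not GPFS or kernel code: `ToyModule`, `ToyFile`, and `stop_while_file_is_open` are hypothetical names, and the "crash" is represented by a Python exception. It only shows why an unload must be refused while the usage counter is non-zero.

```python
class ToyModule:
    """User-space stand-in for a loadable module with a usage counter."""

    def __init__(self, take_reference_on_open):
        self.loaded = True
        self.usage_count = 0
        self.take_reference_on_open = take_reference_on_open

    def open(self):
        # A correct module pins itself for as long as a file is open.
        if self.take_reference_on_open:
            self.usage_count += 1
        return ToyFile(self)

    def unload(self):
        # Unloading is refused only while the counter is non-zero.
        if self.usage_count > 0:
            return False
        self.loaded = False
        return True


class ToyFile:
    """Open file whose close handler lives inside the module."""

    def __init__(self, module):
        self.module = module

    def close(self):
        # In a real kernel this would touch unmapped memory and crash.
        if not self.module.loaded:
            raise RuntimeError("close handler called after module unload")
        if self.module.take_reference_on_open:
            self.module.usage_count -= 1


def stop_while_file_is_open(take_reference_on_open):
    module = ToyModule(take_reference_on_open)
    handle = module.open()
    unloaded = module.unload()      # shutdown tries to remove the module
    try:
        handle.close()              # a process closes its file afterwards
        return unloaded, "closed cleanly"
    except RuntimeError as err:
        return unloaded, str(err)


print(stop_while_file_is_open(take_reference_on_open=False))
print(stop_while_file_is_open(take_reference_on_open=True))
```

With the counter ignored, the unload succeeds and the later close fails; with the counter maintained, the unload is refused and the close completes normally.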
    

Problem conclusion

  • This problem is fixed in 5.1.5 PTF 1.
    To see all Spectrum Scale APARs and
    their respective fix solutions, refer to
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    
    Benefits of the solution:
    Avoids the kernel crash by handling the usage
    counters of the tracedev module correctly.
    
    Work Around:
    Avoid stopping GPFS immediately after starting up.
    
    Problem trigger:
    Run GPFS shutdown and startup. The problem is rare,
    so the shutdown and startup sequence, or the "mmvdisk server"
    commands mentioned above, must be run in a loop to trigger it.
    
    Symptom: Abend/Crash
    Platforms affected: ALL Linux OS environments
    Functional Area affected: All Scale Users
    Customer Impact: Suggested
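
To check whether an installed level already contains the fix, a dotted level string can be compared against the first fixed level. The sketch below assumes that "5.1.5 PTF 1" corresponds to the dotted level 5.1.5.1 and that later levels keep the fix; neither is stated in this APAR. It does not account for fixes delivered in other release streams. The function names are ours.

```python
def parse_level(text):
    """Turn a dotted level such as '5.1.3.1' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


# Assumption: "5.1.5 PTF 1" corresponds to the dotted level 5.1.5.1.
FIRST_FIXED_LEVEL = parse_level("5.1.5.1")


def has_fix(installed_level):
    """Return True if the level is at or above the first fixed level."""
    return parse_level(installed_level) >= FIRST_FIXED_LEVEL


print(has_fix("5.1.3.1"))   # the level this APAR was reported in
print(has_fix("5.1.5.1"))
```

Comparing integer tuples avoids the pitfalls of comparing version strings directly, where "5.1.10" would sort before "5.1.5".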
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ41553

  • Reported component name

    SPEC SCALE DME

  • Reported component ID

    5737F34AP

  • Reported release

    513

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-08-04

  • Closed date

    2022-08-23

  • Last modified date

    2022-08-23

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE DME

  • Fixed component ID

    5737F34AP

Applicable component levels

Document Information

Modified date:
23 August 2022