IBM Support

IV88438: CRASH IN JFS2 SNAPSHOT CODE IN SECOWIOWAIT() APPLIES TO AIX 7100-03


APAR status

  • Closed as program error.

Error description

  • A rare race condition can cause a crash with a stack
    similar to:
    (4)> f
    pvthread+042700 STACK:
    [00308324]seCOWIOWait+000404 (F1000A003AE20000, F1000A0EE01CC090 [??])
    [0030DCCC]seCOW+000B4C (??, ??)
    [0030CC68]seCOD+0000A8 (??, ??, ??, ??)
    [0032C03C]txFreeMap+00029C (??, ??, ??, ??)
    [0035EFFC]xtFreeMap+00021C (??, ??, ??)
    [0032A420]txUpdateMap+000240 (??, ??)
    [0032D4D0]txCommit+0006D0 (??, ??, ??, ??)
    [0034EE50]j2_remove+000470 (??, ??, ??, ??)
    [0066DE78]vnop_remove+000438 (??, ??, ??, ??)
    [006C0A60]kunlinkat+000460 (FFFFFFFEFFFFFFFE, 0000000111CAED11,
       0000000000000000, 0000000000000000)
    [00003938]syscall+000230 ()
    [kdb_get_virtual_memory] no real storage @ FFFFFFFFFFFB840
    [100047098]0000000100047098 ()
    [kdb_read_mem] no real storage @ FFFFFFFFFFF8D60
    
    ==============================
    The crash occurs in the seCOWIOWait function. A second
    thread named snapshot will also be present, in which the
    snapshot command is running to either delete a snapshot
    or create a new one.
    
    The snapshot command deletes a snapshot resource that the
    crashing thread still needs.
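
When comparing a local dump against the stack above, it can help to reduce the kdb output to its function names. The following is a minimal sketch, not an IBM tool: the helper names `frame_symbols` and `matches_iv88438` are hypothetical, and it assumes frames are printed in the `[address]symbol+offset` form shown in this APAR. Because the APAR only says the stack is "similar", a strict prefix match like this one may be too narrow for some dumps.

```python
import re

# A frame line looks like "[00308324]seCOWIOWait+000404 (...)":
# a hex address in brackets, then symbol+hex offset.
# Lines such as "[kdb_read_mem] no real storage ..." do not match.
FRAME_RE = re.compile(r"\[[0-9A-Fa-f]+\]([A-Za-z_][A-Za-z0-9_]*)\+[0-9A-Fa-f]+")

# Call path taken from the stack in this APAR, innermost frame first.
APAR_SIGNATURE = [
    "seCOWIOWait", "seCOW", "seCOD", "txFreeMap",
    "xtFreeMap", "txUpdateMap", "txCommit", "j2_remove",
]


def frame_symbols(kdb_output):
    """Return the function names of all stack frames found in kdb output."""
    return FRAME_RE.findall(kdb_output)


def matches_iv88438(kdb_output):
    """True if the innermost frames equal the call path listed in the APAR."""
    symbols = frame_symbols(kdb_output)
    return symbols[:len(APAR_SIGNATURE)] == APAR_SIGNATURE
```

A match on the stack alone is only a hint. The description also expects a second thread running the snapshot command at the time of the crash.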
    

Local fix

Problem summary

  • A rare race condition can cause a crash with a stack
    similar to:
    
    (4)> f
    pvthread+042700 STACK:
    [00308324]seCOWIOWait+000404 (F1000A003AE20000, F1000A0EE01CC090 [??])
    [0030DCCC]seCOW+000B4C (??, ??)
    [0030CC68]seCOD+0000A8 (??, ??, ??, ??)
    [0032C03C]txFreeMap+00029C (??, ??, ??, ??)
    [0035EFFC]xtFreeMap+00021C (??, ??, ??)
    [0032A420]txUpdateMap+000240 (??, ??)
    [0032D4D0]txCommit+0006D0 (??, ??, ??, ??)
    [0034EE50]j2_remove+000470 (??, ??, ??, ??)
    [0066DE78]vnop_remove+000438 (??, ??, ??, ??)
    [006C0A60]kunlinkat+000460 (FFFFFFFEFFFFFFFE, 0000000111CAED11,
       0000000000000000, 0000000000000000)
    [00003938]syscall+000230 ()
    [kdb_get_virtual_memory] no real storage @ FFFFFFFFFFFB840
    [100047098]0000000100047098 ()
    [kdb_read_mem] no real storage @ FFFFFFFFFFF8D60
    
    ==============================
    The crash occurs in the seCOWIOWait function. A second
    thread named snapshot will also be present, in which the
    snapshot command is running to either delete a snapshot
    or create a new one.
    
    The snapshot command deletes a snapshot resource that the
    crashing thread still needs.
    

Problem conclusion

  • Added a check to prevent the synchronization issue between
    the snapshot delete command and file removal, which avoids
    the system crash.
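
The APAR does not describe the check itself. As an illustration only, the general pattern of guarding a resource against a concurrent delete can be sketched in user-space Python. Everything here is hypothetical (`SnapshotResource`, `try_hold`, `remove_file`) and is not the JFS2 kernel code or IBM's actual fix: a user must pass a check before touching the resource, and the deleter waits until current users have released it.

```python
import threading


class SnapshotResource:
    """Hypothetical stand-in for snapshot state that a copy-on-write wait uses."""

    def __init__(self):
        self._cond = threading.Condition()
        self._users = 0
        self._deleted = False

    def try_hold(self):
        """The check: refuse to use the resource once deletion has started."""
        with self._cond:
            if self._deleted:
                return False
            self._users += 1
            return True

    def release(self):
        """Drop a hold and wake any deleter waiting for users to finish."""
        with self._cond:
            self._users -= 1
            self._cond.notify_all()

    def delete(self):
        """Mark the resource deleted, then wait for in-flight users to release."""
        with self._cond:
            self._deleted = True
            while self._users > 0:
                self._cond.wait()
        # Only at this point would it be safe to free the underlying state.


def remove_file(snapshot, do_cow_wait):
    """Model of the file-remove path: copy on write only if the snapshot is live."""
    if not snapshot.try_hold():
        return "skipped"  # snapshot is going away, so there is nothing to wait on
    try:
        do_cow_wait()
        return "copied"
    finally:
        snapshot.release()
```

Without the check in `try_hold`, the remove path could begin waiting on state that the deleter has already torn down, which is the kind of ordering problem the error description reports.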
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV88438

  • Reported component name

    AIX V7.1

  • Reported component ID

    5765H4000

  • Reported release

    710

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-08-19

  • Closed date

    2016-08-19

  • Last modified date

    2017-01-20

  • APAR is sysrouted FROM one or more of the following:

    IV79798

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX V7.1

  • Fixed component ID

    5765H4000

Applicable component levels

  • R710 PSY U872346

       UP17/01/19 I 1000

Document Information

Modified date:
20 April 2022