IBM Support

IV92880: SYSTEM HANGS IN DMAPI CODE WHEN RECYCLING INODES


APAR status

  • Closed as program error.

Error description

  • A deadlock can occur between two threads with similar
    stacks in which a vGet appears early in the stack and a
    vPut appears later.
    Example of two such stacks:
    stack 1:
    (0)> f 326
    pvthread+014600 STACK:
    [0057B05C]slock+00051C (00000000000E4878, 8000000000001032 [??])
    [00009558].simple_lock+000058 ()
    [00289FFC]vPut+0001DC (??)
    [0028ABA4]iUnbind+000404 (??, ??, ??)
    [0028F58C]iRecycle+0001AC (??, ??, ??)
    [0028B0EC]iAlloc+0003AC (??, ??)
    [0028FCB0]iGet+0001D0 (??, ??, ??, ??, ??)
    [002E4AA4]eaLookup+0000E4 (??, ??, ??, ??, ??)
    [002F8468]dmGetEA+000068 (??, ??, ??, ??, ??)
    [002F8AD8]dmGetAttr+000098 (??, ??)
    [002FADB8]kvAlloc+0000F8 (??, ??)
    [00292298]vGet@AF90_21+000138 (??, ??, ??, ??)
    [00290294]iGet+0007B4 (??, ??, ??, ??, ??)
    .....
    and stack 2:
    (0)> f 2612
    pvthread+0A3400 STACK:
    [0057B05C]slock+00051C (00000000000E4878, 8000000000001032 [??])
    [00009558].simple_lock+000058 ()
    [00289FFC]vPut+0001DC (??)
    [0028ABA4]iUnbind+000404 (??, ??, ??)
    [0028F58C]iRecycle+0001AC (??, ??, ??)
    [0028B0EC]iAlloc+0003AC (??, ??)
    [0028FCB0]iGet+0001D0 (??, ??, ??, ??, ??)
    [002E4BDC]eaLookup+00021C (??, ??, ??, ??, ??)
    [002F8468]dmGetEA+000068 (??, ??, ??, ??, ??)
    [002F8AD8]dmGetAttr+000098 (??, ??)
    [002FADB8]kvAlloc+0000F8 (??, ??)
    [00292298]vGet@AF90_21+000138 (??, ??, ??, ??)
    .....
    =================================================
    The vGet in each thread obtains a lock on one of two
    filesystems. Each vPut then tries to obtain a lock on the
    opposite filesystem, resulting in a deadlock.
    The problem is only seen in DMAPI event handling on J2
    filesystems.
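    The lock-order inversion described above can be modeled in a few lines of Python. This is a minimal sketch, not kernel code: fs_a_lock and fs_b_lock are hypothetical stand-ins for the two filesystem locks, the outer `with` plays the role of vGet, and the second acquire plays the role of the vPut reached through iRecycle. A timeout replaces the real hang so the example terminates.

```python
import threading

# Hypothetical stand-ins for the two per-filesystem locks described above.
# These are ordinary Python locks, not the AIX kernel's simple_lock.
fs_a_lock = threading.Lock()
fs_b_lock = threading.Lock()

# Barriers make the interleaving deterministic: both threads take their first
# lock, both try for the second, and only then does either let go.
both_hold_first = threading.Barrier(2)
both_tried_second = threading.Barrier(2)
results = {}

def worker(name, first, second):
    with first:                          # like vGet: lock "this" filesystem
        both_hold_first.wait()
        # Like vPut during inode recycling: reach for the other filesystem's
        # lock. The timeout stands in for the hang so the demo terminates.
        got = second.acquire(timeout=0.5)
        if got:
            second.release()
        results[name] = got
        both_tried_second.wait()         # hold the first lock until both have tried

t1 = threading.Thread(target=worker, args=("thread1", fs_a_lock, fs_b_lock))
t2 = threading.Thread(target=worker, args=("thread2", fs_b_lock, fs_a_lock))
t1.start()
t2.start()
t1.join()
t2.join()

print(results)  # both values are False: each thread is blocked by the other
```

    Each thread holds the lock the other one needs, which is the same cycle the two kernel stacks show.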
    

Local fix

  • You can reduce the likelihood of this deadlock by increasing
    the number of in-memory inodes, either by raising the
    j2_inodeCacheSize ioo tunable or by adding memory to the
    LPAR, because the number of inodes available is sized as a
    percentage of memory.
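    In both stacks, iGet calls iAlloc, which calls iRecycle, and it is that recycling path (iRecycle, iUnbind, vPut) that reaches the lock on the other filesystem. The sketch below is a hypothetical model of why a larger inode cache lowers the odds: it assumes an LRU replacement policy, which the APAR does not describe, and simply counts how often a fixed-size cache must recycle an entry for a repeated workload. The function name count_recycles and the workload are illustrative only.

```python
from collections import OrderedDict

def count_recycles(cache_size, accesses):
    """Count how often a fixed-size cache must recycle an entry.

    Hypothetical model: LRU replacement is assumed only to show the
    effect of cache size; it is not the JFS2 inode cache algorithm.
    """
    cache = OrderedDict()
    recycles = 0
    for ino in accesses:
        if ino in cache:
            cache.move_to_end(ino)      # hit: no allocation, so no recycling
            continue
        if len(cache) >= cache_size:
            cache.popitem(last=False)   # cache full: recycle the least recently used entry
            recycles += 1
        cache[ino] = True               # allocate the new in-memory inode
    return recycles

# Hypothetical workload: 200 distinct inodes, each looked up three times in turn.
workload = list(range(200)) * 3

for size in (100, 150, 200):
    print(size, count_recycles(size, workload))
```

    With these inputs it prints 100 500, 150 450, and 200 0: once the cache can hold the whole working set, recycling stops. Real JFS2 sizing and replacement differ; only the trend is meant to carry over.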
    

Problem summary

  • A deadlock can occur between two threads with similar stacks
    in which a vGet appears early in the stack and a vPut later:
    stack 1:
    (0)> f 326
    pvthread+014600 STACK:
    [0057B05C]slock+00051C (00000000000E4878, 8000000000001032 [??])
    [00009558].simple_lock+000058 ()
    [00289FFC]vPut+0001DC (??)
    [0028ABA4]iUnbind+000404 (??, ??, ??)
    [0028F58C]iRecycle+0001AC (??, ??, ??)
    [0028B0EC]iAlloc+0003AC (??, ??)
    [0028FCB0]iGet+0001D0 (??, ??, ??, ??, ??)
    [002E4AA4]eaLookup+0000E4 (??, ??, ??, ??, ??)
    [002F8468]dmGetEA+000068 (??, ??, ??, ??, ??)
    [002F8AD8]dmGetAttr+000098 (??, ??)
    [002FADB8]kvAlloc+0000F8 (??, ??)
    [00292298]vGet@AF90_21+000138 (??, ??, ??, ??)
    [00290294]iGet+0007B4 (??, ??, ??, ??, ??)
    .....
    and stack 2:
    (0)> f 2612
    pvthread+0A3400 STACK:
    [0057B05C]slock+00051C (00000000000E4878, 8000000000001032 [??])
    [00009558].simple_lock+000058 ()
    [00289FFC]vPut+0001DC (??)
    [0028ABA4]iUnbind+000404 (??, ??, ??)
    [0028F58C]iRecycle+0001AC (??, ??, ??)
    [0028B0EC]iAlloc+0003AC (??, ??)
    [0028FCB0]iGet+0001D0 (??, ??, ??, ??, ??)
    [002E4BDC]eaLookup+00021C (??, ??, ??, ??, ??)
    [002F8468]dmGetEA+000068 (??, ??, ??, ??, ??)
    [002F8AD8]dmGetAttr+000098 (??, ??)
    [002FADB8]kvAlloc+0000F8 (??, ??)
    [00292298]vGet@AF90_21+000138 (??, ??, ??, ??)
    .....
    

Problem conclusion

  • We changed the serialization points in two places to eliminate
    the possible deadlock. The problem can only be seen in DMAPI event
    handling on J2 filesystems.
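    The APAR does not say how the serialization points were changed. As a generic illustration only, one textbook way to remove this kind of lock-order deadlock is to make every thread acquire the two locks in a single global order. The sketch below assumes a hypothetical FsLock wrapper with a sortable fs_id and hypothetical helpers lock_both and unlock_both; it is not IBM's fix.

```python
import threading

class FsLock:
    """Hypothetical per-filesystem lock tagged with a sortable id."""

    def __init__(self, fs_id):
        self.fs_id = fs_id
        self.lock = threading.Lock()

def lock_both(x, y):
    """Acquire two filesystem locks in ascending fs_id order, whatever order the caller names them."""
    first, second = sorted((x, y), key=lambda f: f.fs_id)
    first.lock.acquire()
    second.lock.acquire()
    return first, second

def unlock_both(first, second):
    """Release in the reverse order of acquisition."""
    second.lock.release()
    first.lock.release()

fs_a, fs_b = FsLock(1), FsLock(2)
completed = []
completed_guard = threading.Lock()

def worker(name, x, y, rounds=1000):
    for _ in range(rounds):
        first, second = lock_both(x, y)
        # ... work that needs both filesystems would go here ...
        unlock_both(first, second)
    with completed_guard:
        completed.append(name)

# As in the stacks above, the two threads name the filesystems in opposite
# order, but lock_both() normalizes the order, so no cycle of waiters forms.
t1 = threading.Thread(target=worker, args=("thread1", fs_a, fs_b))
t2 = threading.Thread(target=worker, args=("thread2", fs_b, fs_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(sorted(completed))  # ['thread1', 'thread2']
```

    Other remedies, such as releasing the first lock before taking the second, are equally standard; which one fits depends on what the code must keep consistent while both filesystems are involved.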
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV92880

  • Reported component name

    AIX V7.1

  • Reported component ID

    5765H4000

  • Reported release

    710

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-01-27

  • Closed date

    2017-01-27

  • Last modified date

    2017-10-13

  • APAR is sysrouted FROM one or more of the following:

    IV69815

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX V7.1

  • Fixed component ID

    5765H4000

Applicable component levels

  • R710 PSY U870555

       UP17/10/13 I 1000


Document Information

Modified date:
20 April 2022