IBM Support

IZ73824: MELLANOX DD CRASH AFTER CALLING MXIB_DB_ALLOC. APPLIES TO AIX 5300-08

A fix is available

APAR status

  • Closed as program error.

Error description

  • "hcr_cmd timeout" in the error log, possibly followed by a system
    crash with a call chain similar to the following:

      003CBE28 rmmap_add_io+000168 (000000000001C007, 0000000000000080, 00000003C0400080, 0000000000000001, 0000002500000025)
      003E8410 io_map_init+0002F0 (??, ??, ??)
      003E85B0 io_map_init_global@AF45_40+000110 (??, ??, ??)
      F1000000A0292ED4 mxib_db_alloc+0002F4 (F100068800B30000, 0000000040000003, F00000002FF424C0)
      F1000000A0293484 mxib_db_create+000284 (F100068800B30000, F100068800AB2200, 0000000040000003, F100068800AA9000)
      F1000000A0298718 mxib_ctl+000538 (8000001A00000000, 0000000500000005, F100068800AA9000, 0000000040000003, 0000000000000000, 0000000000000000)
      00014D70 .hkey_legacy_gate+00004C ()
      004DF8AC rdevioctl+0000CC (??, ??, ??, ??, ??, ??)
      006805C0 spec_ioctl+000080 (??, ??, ??, ??, ??, ??)
      004EBA10 vnop_ioctl+000050 (??, ??, ??, ??, ??, ??)
      00556F3C vno_ioctl+00009C (??, ??, ??, ??, ??)
      005D478C fp_ioctl+00006C (??, ??, ??, ??)
      00014F50 .kernel_add_gate_cstack+000030 ()
      F1000000A02A9224 mxibHcaOpened+0001C4 (F00000002FF42B38)
      049FF270 IbHcaOpen+000950 (??, ??, ??, ??, ??)
      049F9750 IcmOpenQp1Stage1+000BB0 (??, ??)

    If the system does not crash and a firmware (f/w) upgrade is
    attempted at this point, firmware corruption may occur.
    

Local fix

Problem summary

  • "hcr_cmd timeout" in the error log, possibly followed by a system
    crash with a call chain similar to the following:

      003CBE28 rmmap_add_io+000168 (000000000001C007, 0000000000000080, 00000003C0400080, 0000000000000001, 0000002500000025)
      003E8410 io_map_init+0002F0 (??, ??, ??)
      003E85B0 io_map_init_global@AF45_40+000110 (??, ??, ??)
      F1000000A0292ED4 mxib_db_alloc+0002F4 (F100068800B30000, 0000000040000003, F00000002FF424C0)
      F1000000A0293484 mxib_db_create+000284 (F100068800B30000, F100068800AB2200, 0000000040000003, F100068800AA9000)
      F1000000A0298718 mxib_ctl+000538 (8000001A00000000, 0000000500000005, F100068800AA9000, 0000000040000003, 0000000000000000, 0000000000000000)
      00014D70 .hkey_legacy_gate+00004C ()
      004DF8AC rdevioctl+0000CC (??, ??, ??, ??, ??, ??)
      006805C0 spec_ioctl+000080 (??, ??, ??, ??, ??, ??)
      004EBA10 vnop_ioctl+000050 (??, ??, ??, ??, ??, ??)
      00556F3C vno_ioctl+00009C (??, ??, ??, ??, ??)
      005D478C fp_ioctl+00006C (??, ??, ??, ??)
      00014F50 .kernel_add_gate_cstack+000030 ()
      F1000000A02A9224 mxibHcaOpened+0001C4 (F00000002FF42B38)
      049FF270 IbHcaOpen+000950 (??, ??, ??, ??, ??)
      049F9750 IcmOpenQp1Stage1+000BB0 (??, ??)

    If the system does not crash and a firmware (f/w) upgrade is
    attempted at this point, firmware corruption may occur.
    

Problem conclusion

  • Allow HCA commands sufficient time to complete before timing
    out, and ensure that start_adapter returns the correct error
    code in all cases.
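    The conclusion names two changes: waiting long enough for an HCA
    command to complete, and reporting a failure instead of masking it.
    The Python sketch below illustrates that general pattern only; it is
    not the Mellanox driver code. run_hca_command, the start_adapter
    stand-in, FakeClock, SlowCommand and all timeout values are
    hypothetical, and a simulated clock is used so the timeout behavior
    can be checked without real waiting.

```python
import errno
import time
from typing import Callable, Iterable


def run_hca_command(
    post: Callable[[], None],
    is_done: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.001,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Post a command, then poll until it completes or the deadline passes.

    Returns 0 on completion, or errno.ETIMEDOUT if the (simulated) adapter
    did not finish within timeout_s seconds.
    """
    post()
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return 0
        if clock() >= deadline:
            return errno.ETIMEDOUT
        sleep(poll_interval_s)


def start_adapter(steps: Iterable[Callable[[], int]]) -> int:
    """Hypothetical stand-in for start_adapter: run each step in order and
    return the first nonzero error code rather than reporting success."""
    for step in steps:
        rc = step()
        if rc != 0:
            return rc
    return 0


class FakeClock:
    """Deterministic clock: sleep() advances time instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, dt: float) -> None:
        self.now += dt


class SlowCommand:
    """Simulated HCA command that completes `latency` seconds after posting."""

    def __init__(self, clock: FakeClock, latency: float) -> None:
        self.clock = clock
        self.latency = latency
        self.posted_at = None

    def post(self) -> None:
        self.posted_at = self.clock.time()

    def done(self) -> bool:
        return (self.posted_at is not None
                and self.clock.time() - self.posted_at >= self.latency)
```

    With a simulated command latency of 2 seconds, a 0.5-second timeout
    returns ETIMEDOUT, while a 5-second timeout lets the command finish
    and returns 0. The start_adapter stand-in then passes that code
    through unchanged, so a timeout is never reported as success.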
    

Temporary fix

Comments

  • 5300-08 - use AIX APAR IZ73824
    5300-09 - use AIX APAR IZ73800
    6100-02 - use AIX APAR IZ75428
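    The level-to-APAR list above can be expressed as a lookup table. This
    is a minimal sketch: apar_for_level is a hypothetical helper, and it
    assumes the input is a level string in the form printed by
    "oslevel -s" (for example 5300-08-02-0822), of which only the first
    two fields (release and technology level) are used.

```python
from typing import Optional

# Technology level -> APAR that carries this fix, per the Comments section.
APAR_BY_LEVEL = {
    "5300-08": "IZ73824",
    "5300-09": "IZ73800",
    "6100-02": "IZ75428",
}


def apar_for_level(oslevel: str) -> Optional[str]:
    """Return the APAR for an AIX level string, or None if it is not listed."""
    parts = oslevel.strip().split("-")
    if len(parts) < 2:
        return None
    return APAR_BY_LEVEL.get(f"{parts[0]}-{parts[1]}")
```

    Levels outside the three listed above return None, since this APAR
    names no fix for them.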
    

APAR Information

  • APAR number

    IZ73824

  • Reported component name

    AIX 5.3

  • Reported component ID

    5765G0300

  • Reported release

    530

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2010-03-31

  • Closed date

    2010-03-31

  • Last modified date

    2013-04-17

  • APAR is sysrouted FROM one or more of the following:

    IZ73800

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX 5.3

  • Fixed component ID

    5765G0300

Applicable component levels

  • R530 PSY U832239

       UP10/05/17 I 1000

PTF to Fileset Mapping

Document Information

Modified date:
17 April 2013