IBM Support

IV48269: SYSTEM CRASHED DUE TO CAA DMS TIMEOUT AND UNSYNCED CPUS

A fix is available

APAR status

  • Closed as program error.

Error description

  • When CPUs are out of sync, a CAA DMS (dead man switch)
    timeout can occur. If the last heartbeat carries a timebase
    value later than that of the CPU checking for the timeout,
    the elapsed time since the heartbeat is negative, and the
    heartbeat timeout value is interpreted as a huge positive
    number instead of its actual value.
    
    The stack trace is similar to the following:
    
    CRASH INFORMATION:
    CPU 5 CSA F1000815B0170D00 at time of crash, error code for LEDs: 70000000
    pvthread+001D00 STACK:
    [00102C44].panic_trap+000000 ()
    [00014E48].kernel_add_gate+000048 ()
    [F1000000C03EEE34]dms_trap+000014 ()
    [00014E48].kernel_add_gate+000048 ()
    [000AAFCC]itmr_timeout+00004C ()
    [F1000000C03EEA74]dms_timeout+000174 (??, ??, ??)
    [F1000000C03EEEB4]dms_handler+000034 (??)
    [00014D70].hkey_legacy_gate+00004C ()
    [031E47E4]skey_kmode+000000 ()
    [001F1F4C].dec_itmr_check_fixup+000014 ()
    ____ Exception (F00000002FF47600) ____
    
    In kdb, you can also see that the last heartbeat is greater
    than the timebase of the current CPU:
    
    (6)> dd F1000000C04445B0+B50
    F1000000C0445100: F1000000C0452900 F1000000C0442398  .....E)......D#.
    
    (6)> dd F1000000C0452900
    F1000000C0452900: 0000000000000002 0017BD25B8A829BF  ...........%..).
    
    (6)> ppda | grep dec_flih
    dec_flih_ref_tb.... 0017BD25B8A278CB
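The kdb values above show why the timeout value looks huge. The following Python sketch emulates C's 64-bit unsigned (uint64_t) subtraction and subtracts the last heartbeat from the current CPU's timebase. It assumes, for illustration only, that dms_timeout() computes the elapsed time as an unsigned difference of timebase values; the APAR states only that the value is interpreted as a huge positive number. The helper name unsigned_elapsed is hypothetical.

```python
# Values copied from the kdb session above.
LAST_HEARTBEAT = 0x0017BD25B8A829BF  # last heartbeat recorded
CURRENT_TB = 0x0017BD25B8A278CB      # dec_flih_ref_tb of the current CPU

MASK64 = (1 << 64) - 1  # keep results in 64 bits, like C's uint64_t


def unsigned_elapsed(now: int, last: int) -> int:
    """Hypothetical helper: (now - last) computed as a 64-bit unsigned value."""
    return (now - last) & MASK64


skew = LAST_HEARTBEAT - CURRENT_TB  # how far ahead of this CPU the heartbeat is
elapsed = unsigned_elapsed(CURRENT_TB, LAST_HEARTBEAT)
print(f"heartbeat ahead by {skew:#x} ticks")  # heartbeat ahead by 0x5b0f4 ticks
print(f"unsigned elapsed:  {elapsed:#018x}")  # unsigned elapsed:  0xfffffffffffa4f0c
```

The heartbeat is only 0x5b0f4 ticks ahead of the current CPU, yet the wrapped difference is just below 2**64, far larger than any plausible heartbeat timeout.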
    

Local fix

Problem summary

  • dms_timeout panic:
    pvthread+001D00 STACK:
     00102C44 .panic_trap+000000 ()
     00014E48 .kernel_add_gate+000048 ()
     F1000000C03EEE34 dms_trap+000014 ()
     00014E48 .kernel_add_gate+000048 ()
     000AAFCC itmr_timeout+00004C ()
     F1000000C03EEA74 dms_timeout+000174 (??, ??, ??)
     F1000000C03EEEB4 dms_handler+000034 (??)
     00014D70 .hkey_legacy_gate+00004C ()
     031E47E4 skey_kmode+000000 ()
     001F1F4C .dec_itmr_check_fixup+000014 ()
    

Problem conclusion

  • An additional check was added to dms_timeout() to confirm the
    "current time" is later than the previous heartbeat.
    
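A minimal sketch of the kind of guard described above, again in Python. The function name dms_should_fire and the timeout_ticks parameter are hypothetical, and the APAR does not say what the real dms_timeout() does when the check fails; this sketch simply treats the timer as not expired in that case.

```python
def dms_should_fire(now: int, last_heartbeat: int, timeout_ticks: int) -> bool:
    """Hypothetical DMS expiry check with the guard described in the APAR.

    now            -- timebase of the CPU evaluating the timeout
    last_heartbeat -- timebase value recorded at the last heartbeat
    timeout_ticks  -- hypothetical heartbeat timeout, in timebase ticks
    """
    # Guard: only measure elapsed time when the current time is later
    # than the previous heartbeat. Without it, C's unsigned subtraction
    # would wrap to a huge value when this CPU's timebase lags behind.
    if now <= last_heartbeat:
        return False
    # Past the guard, now - last_heartbeat is positive and cannot wrap.
    return now - last_heartbeat > timeout_ticks


# kdb values from the error description: the heartbeat is "in the future".
LAST_HEARTBEAT = 0x0017BD25B8A829BF
CURRENT_TB = 0x0017BD25B8A278CB
print(dms_should_fire(CURRENT_TB, LAST_HEARTBEAT, timeout_ticks=1_000_000))  # False
```

With the kdb values, the guard returns False instead of reporting an expired timer, while a genuinely stale heartbeat, such as dms_should_fire(LAST_HEARTBEAT + 2_000_000, LAST_HEARTBEAT, 1_000_000), still returns True.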

Temporary fix

Comments

APAR Information

  • APAR number

    IV48269

  • Reported component name

    AIX 610 STD EDI

  • Reported component ID

    5765G6200

  • Reported release

    610

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2013-09-11

  • Closed date

    2013-09-11

  • Last modified date

    2014-02-17

  • APAR is sysrouted FROM one or more of the following:

    IV42736

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX 610 STD EDI

  • Fixed component ID

    5765G6200

Applicable component levels

  • R610 PSY U859947

       UP14/02/17 I 1000

PTF to Fileset Mapping


Document Information

Modified date:
17 February 2014