IBM Support

IJ04977: ERROR RELATED TO NODE TIMEOUT DURING SNAPSHOT RESTORE

APAR status

  • Closed as program error.

Error description

  • Environment details:
    Cluster type: standard cluster
    Number of nodes: 2
    Node details: r1r2m1p34 (172.19.67.44), r1r2m1p35
    (172.19.67.45) root/Ar1cent@2
    HMC details (if nodes are not public LPARs): 172.19.69.5
    
    PowerHA version:
    Pre-migration: PHA 720 SP5
    Post-migration: PHA 7.2.1 SP3
    
    AIX version: oslevel is the same, 7100-04-03-1642
    
    Preconditions (if any):
    
    Problem statement:
    1) Created a cluster on 720 SP5; the cluster was stable.
    2) Started a rolling migration from 720 SP5 to 721 SP3.
    3) After migration, the cluster was stable.
    4) Took a cluster snapshot of the migrated 721 SP3 cluster.
    5) Deleted the cluster successfully.
    6) Restored the snapshot of the earlier 721 SP3 cluster
       (snap721); the restore was successful.
    7) During verify and sync, the error below was reported.
    
    ERROR: node_timeout must be greater than network_fdt by at
    least 10 seconds. Correct these values and try synchronization
    again.
    
    cldare: Error detected during synchronization.
    
    Steps to reproduce: as described above
    Problematic behavior: verify and sync failed
    Expected behavior: after snapshot restoration, verify and sync
    should pass
    References (if applicable):
    % of reproducibility:
    Logs location or list of logs attached:
    
    (0) root @ r1r2m1p34: /
    
        /usr/sbin/clctrl -tune -L network_fdt
        NAME         DEF    MIN    MAX     UNIT          SCOPE
             ENTITY_NAME(UUID)                                        CUR
        --------------------------------------------------------------------------------
        network_fdt  0      0      590000  milliseconds  c
             r1r2m1p34_cluster(a7299d1c-1193-11e8-8042-caaa6459140c)  20000
        --------------------------------------------------------------------------------
    
    (0) root @ r1r2m1p34: /
    
        /usr/sbin/clctrl -tune -L node_timeout
        NAME          DEF    MIN    MAX     UNIT          SCOPE
             ENTITY_NAME(UUID)                                        CUR
        --------------------------------------------------------------------------------
        node_timeout  20000  10000  600000  milliseconds  c n
             r1r2m1p34_cluster(a7299d1c-1193-11e8-8042-caaa6459140c)  30000
        --------------------------------------------------------------------------------
    

Local fix

Problem summary

  • The following error occurs during PHA 721 SP5 snapshot
    restoration:
    ERROR: node_timeout must be greater than network_fdt by at
    least 10 seconds.
    The cause is a mismatch between the ODM values before and after
    the snapshot is restored (values in seconds):
    Before snapshot restore: node_timeout=20, network_fdt=0
    After snapshot restore: node_timeout=20, network_fdt=20
    After the restore both values are equal, so the following
    condition is not met:
    node_timeout - network_fdt >= 10
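
The check behind the error message can be reproduced with simple arithmetic. The following is a minimal sketch, assuming the values are in seconds as quoted in this summary; the function name timeouts_are_valid and the constant MIN_GAP_SECONDS are illustrative and are not PowerHA identifiers.

```python
# Illustrative only: names are hypothetical, not taken from PowerHA code.
MIN_GAP_SECONDS = 10  # margin quoted in the error message


def timeouts_are_valid(node_timeout: int, network_fdt: int) -> bool:
    """Return True when node_timeout exceeds network_fdt by at least the margin."""
    return node_timeout - network_fdt >= MIN_GAP_SECONDS


# ODM values quoted in the problem summary (seconds).
before_restore = timeouts_are_valid(node_timeout=20, network_fdt=0)
after_restore = timeouts_are_valid(node_timeout=20, network_fdt=20)
print(before_restore, after_restore)
```

With the values from the summary, the pair before the restore satisfies the condition and the pair after the restore does not, which matches the reported verify and sync failure.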
    

Problem conclusion

  • The default value of node_timeout is 30 seconds and the default
    value of network_fdt is 20 seconds. The existing code sets
    network_fdt and node_timeout to their default values only when
    the ODM values are 0. Because the node_timeout ODM value is not
    0, it is not set to the default and instead keeps the existing
    ODM value of 20. The fix changes the code so that node_timeout
    is set to its default value of 30 when the ODM value is 20
    rather than 0.
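
The behavior described above can be sketched as follows. This is a hedged illustration, not the shipped PowerHA code: the function names and constants are hypothetical, values are in seconds, and treating both 0 and 20 as triggers for the node_timeout default is an assumption about how the fix is implemented.

```python
# Hypothetical sketch of the default-handling logic described in this APAR.
DEFAULT_NODE_TIMEOUT = 30  # seconds
DEFAULT_NETWORK_FDT = 20   # seconds


def apply_defaults_old(node_timeout: int, network_fdt: int) -> tuple[int, int]:
    """Pre-fix behavior: a default replaces only a stored value of 0."""
    if node_timeout == 0:
        node_timeout = DEFAULT_NODE_TIMEOUT
    if network_fdt == 0:
        network_fdt = DEFAULT_NETWORK_FDT
    return node_timeout, network_fdt


def apply_defaults_fixed(node_timeout: int, network_fdt: int) -> tuple[int, int]:
    """Post-fix behavior (assumed): a stored node_timeout of 20 also gets the default."""
    if node_timeout in (0, 20):
        node_timeout = DEFAULT_NODE_TIMEOUT
    if network_fdt == 0:
        network_fdt = DEFAULT_NETWORK_FDT
    return node_timeout, network_fdt


# ODM values from the snapshot: node_timeout=20, network_fdt=0.
print(apply_defaults_old(20, 0))    # both values end up equal
print(apply_defaults_fixed(20, 0))  # node_timeout is raised to its default
```

Under these assumptions the pre-fix path yields node_timeout=20 and network_fdt=20, which fails the 10-second margin, while the post-fix path yields node_timeout=30 and network_fdt=20, which satisfies it.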
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ04977

  • Reported component name

    POWERHA SYSMIR

  • Reported component ID

    5765H3900

  • Reported release

    721

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2018-03-16

  • Closed date

    2018-04-17

  • Last modified date

    2018-04-17

  • APAR is sysrouted FROM one or more of the following:

    IJ04976

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    POWERHA SYSMIR

  • Fixed component ID

    5765H3900

Document Information

Modified date:
19 October 2021