IBM Support

IJ08529: BALLOT COMPARISON BUG IN CCR CAUSING LONG RUNNING CLUSTER MANAGE

APAR status

  • Closed as program error.

Error description

  • Ballot comparison bug in CCR causing long running cluster
    manager elections
    
    Reported In:
    Spectrum Scale 4.2.3
    
    Known Impact:
    If concurrent synods are running and/or the CCR is under
    heavy load/stress, the cluster manager election may take a
    long time.
    
    Verification steps:
    After "Running election", even though enough quorum nodes
    are available to form a cluster, electing a new cluster
    manager takes a very long time or never completes.
    

Local fix

Problem summary

  • Long-running cluster manager elections caused by CCR synods
    running concurrently on different quorum nodes while the
    quorum nodes (CCR) are under heavy load.
    

Problem conclusion

  • Concurrently running CCR synods on different quorum nodes are
    now prioritized by introducing a ballot counter increment.
    Nodes with lower node Ids are preferred over nodes with
    higher node Ids, because the lower-numbered nodes may start a
    new cluster manager election. In addition, the static delay
    logic at the beginning of a synod was replaced by talking
    directly to the node that started it.
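
    The prioritization described above can be illustrated with a
    minimal sketch. This is not the CCR implementation, whose
    details are not given in this APAR. It assumes a Paxos-style
    ballot made of a counter and a node Id, where a higher counter
    wins and, on equal counters, the lower node Id is preferred.
    The names Ballot, next_ballot and winner are hypothetical.

```python
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(frozen=True)
class Ballot:
    """Hypothetical ballot: not the actual CCR data structure."""
    counter: int  # assumed ballot counter, incremented per new attempt
    node_id: int  # assumed quorum node Id

    def __lt__(self, other):
        if not isinstance(other, Ballot):
            return NotImplemented
        # A higher counter wins. On equal counters the LOWER node Id
        # wins, so node_id is negated in the comparison key.
        return (self.counter, -self.node_id) < (other.counter, -other.node_id)


def next_ballot(highest_seen: Ballot, node_id: int) -> Ballot:
    """Build a ballot that outranks the highest ballot seen so far."""
    return Ballot(highest_seen.counter + 1, node_id)


def winner(ballots):
    """Return the ballot that takes priority among concurrent synods."""
    return max(ballots)


if __name__ == "__main__":
    # Nodes 2 and 5 start synods concurrently with the same counter.
    concurrent = [Ballot(3, 5), Ballot(3, 2), Ballot(2, 1)]
    print(winner(concurrent))
```

    With a total order like this, two synods started at the same
    time never tie, so one of them can always make progress.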
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ08529

  • Reported component name

    SPECTRUM SCALE

  • Reported component ID

    5725Q01AP

  • Reported release

    423

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-08-16

  • Closed date

    2018-08-16

  • Last modified date

    2019-06-28

  • APAR is sysrouted FROM one or more of the following:

    IJ07733

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPECTRUM SCALE

  • Fixed component ID

    5725Q01AP

Applicable component levels

  • R423 PSY U885025

       19/06/28 I 1000

Document Information

Modified date:
28 June 2019