IBM Support

PI81552: Application state becomes stale at the Liberty collective controller

Fixes are available

17.0.0.4: WebSphere Application Server Liberty 17.0.0.4
18.0.0.1: WebSphere Application Server Liberty 18.0.0.1
18.0.0.2: WebSphere Application Server Liberty 18.0.0.2
18.0.0.3: WebSphere Application Server Liberty 18.0.0.3
18.0.0.4: WebSphere Application Server Liberty 18.0.0.4
19.0.0.1: WebSphere Application Server Liberty 19.0.0.1
19.0.0.2: WebSphere Application Server Liberty 19.0.0.2
19.0.0.3: WebSphere Application Server Liberty 19.0.0.3
19.0.0.4: WebSphere Application Server Liberty 19.0.0.4
19.0.0.5: WebSphere Application Server Liberty 19.0.0.5
19.0.0.6: WebSphere Application Server Liberty 19.0.0.6
19.0.0.7: WebSphere Application Server Liberty 19.0.0.7
19.0.0.8: WebSphere Application Server Liberty 19.0.0.8
19.0.0.9: WebSphere Application Server Liberty 19.0.0.9
19.0.0.10: WebSphere Application Server Liberty 19.0.0.10
19.0.0.11: WebSphere Application Server Liberty 19.0.0.11
19.0.0.12: WebSphere Application Server Liberty 19.0.0.12
20.0.0.1: WebSphere Application Server Liberty 20.0.0.1
20.0.0.2: WebSphere Application Server Liberty 20.0.0.2
20.0.0.3: WebSphere Application Server Liberty 20.0.0.3
20.0.0.4: WebSphere Application Server Liberty 20.0.0.4
20.0.0.5: WebSphere Application Server Liberty 20.0.0.5
20.0.0.6: WebSphere Application Server Liberty 20.0.0.6
20.0.0.7: WebSphere Application Server Liberty 20.0.0.7
20.0.0.8: WebSphere Application Server Liberty 20.0.0.8
20.0.0.9: WebSphere Application Server Liberty 20.0.0.9
20.0.0.10: WebSphere Application Server Liberty 20.0.0.10
20.0.0.11: WebSphere Application Server Liberty 20.0.0.11
20.0.0.12: WebSphere Application Server Liberty 20.0.0.12

APAR status

  • Closed as program error.

Error description

  • If a Liberty server becomes overloaded (high CPU, thread-pool
    exhaustion, high heap usage) and cannot send its heartbeat to
    the collective controller, the controller marks it as "down"
    (CWWKX9078I: The collective member ... on host ... with the
    user directory ... missed 3 heart beats. Attempting
    deregistration).
    
    Once the server completes its long-running tasks (or grows its
    thread pool enough), it resumes sending its heartbeat, and the
    controller reports that the server has rejoined the collective
    (CWWKX9076I: The collective member ... on host ... with the
    user directory ... connected to the collective controller).
    
    The web server Plug-in nevertheless continues to send an HTTP
    503 (ws_common: ODR says reject with status code 503) to any
    client trying to access a service hosted on the previously
    unavailable server.
    
    This problem happens because the application state becomes
    stale at the Liberty collective controller.
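
    The message sequence above (CWWKX9078I followed later by
    CWWKX9076I for the same member) can be used to shortlist members
    whose application state may be stale.  The following is a minimal
    sketch only: the function name, the regular expressions, and the
    sample lines are hypothetical, built from the message templates
    quoted in this description, and real controller log lines may
    carry extra prefixes or differ in detail.

```python
import re

# Assumed patterns derived from the message templates quoted in this APAR.
# The captured group is the whole "The collective member ..." phrase, so no
# assumption is made about how member, host, or user directory are written.
DOWN = re.compile(r"CWWKX9078I: (The collective member .*?) missed \d+ heart beats")
UP = re.compile(r"CWWKX9076I: (The collective member .*?) connected to the collective controller")


def members_dropped_then_rejoined(lines):
    """Return member descriptions that logged CWWKX9078I and later CWWKX9076I."""
    dropped = set()
    rejoined = []
    for line in lines:
        down = DOWN.search(line)
        if down:
            dropped.add(down.group(1))
            continue
        up = UP.search(line)
        if up and up.group(1) in dropped:
            dropped.discard(up.group(1))
            rejoined.append(up.group(1))
    return rejoined


# Hypothetical sample lines; member, host, and directory values are invented.
SAMPLE = [
    "CWWKX9078I: The collective member member1 on host hostA with the user "
    "directory /opt/wlp/usr missed 3 heart beats. Attempting deregistration",
    "CWWKX9076I: The collective member member1 on host hostA with the user "
    "directory /opt/wlp/usr connected to the collective controller",
]

if __name__ == "__main__":
    for member in members_dropped_then_rejoined(SAMPLE):
        print("possible stale application state:", member)
```

    A member that appears in the result is only a candidate; whether
    its applications are actually reported as stopped has to be
    confirmed at the controller.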
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server Liberty Network Deployment -         *
    *                  collectives                                 *
    ****************************************************************
    * PROBLEM DESCRIPTION: Performance issues on member hosts can  *
    *                      cause applications to be reported as    *
    *                      stopped at the collective controller.   *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    A collective controller monitors members of the collective via
    heart beats sent by the members at a set interval.  If a member
    misses three consecutive heart beats, the controller marks that
    member and its applications as stopped and issues message
    CWWKX9078I.  Temporary performance issues on a member can delay
    the heart beat task long enough to trigger this.  When the
    member resumes heart beating, the controller resets the member
    server to the started state but does not reset its applications,
    so they remain marked as stopped.
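
    The state handling described above can be illustrated with a toy
    model.  This is a sketch under stated assumptions, not the
    controller's implementation: the class, method, and state names
    are hypothetical, and only the threshold of three missed heart
    beats comes from this APAR.  Passing reported application states
    on resume stands in for the corrected behavior, in which
    application state is updated when a member resumes heart beating.

```python
from dataclasses import dataclass, field

MISSED_LIMIT = 3  # three consecutive missed heart beats, per the APAR text


@dataclass
class MemberRecord:
    """Hypothetical controller-side view of one collective member."""
    server_state: str = "STARTED"
    app_states: dict = field(default_factory=dict)
    missed: int = 0

    def on_interval_without_heartbeat(self):
        """Called once per heart beat interval in which nothing arrived."""
        self.missed += 1
        if self.missed >= MISSED_LIMIT:
            # Member and all of its applications are marked as stopped.
            self.server_state = "STOPPED"
            for app in self.app_states:
                self.app_states[app] = "STOPPED"

    def on_heartbeat(self, reported_apps=None):
        """Called when a heart beat arrives.

        reported_apps=None models the behavior described in the problem
        summary: only the server state is reset.  Supplying a dict models
        the corrected behavior, where application state is refreshed too.
        """
        self.missed = 0
        self.server_state = "STARTED"
        if reported_apps is not None:
            self.app_states.update(reported_apps)


if __name__ == "__main__":
    member = MemberRecord(app_states={"shop": "STARTED"})
    for _ in range(MISSED_LIMIT):
        member.on_interval_without_heartbeat()
    member.on_heartbeat()  # server is STARTED again, "shop" stays STOPPED
    print(member.server_state, member.app_states)
    member.on_heartbeat(reported_apps={"shop": "STARTED"})
    print(member.server_state, member.app_states)
```

    In this model the stale entry is what a router consulting the
    controller's view would act on, which is consistent with the
    HTTP 503 responses in the error description.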
    

Problem conclusion

  • Application state is updated in the collective repository when a
    member resumes heart beating.
    
    The fix for this APAR is currently targeted for inclusion in fix
    pack 17.0.0.3.  Please refer to the Recommended Updates page for
    delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI81552

  • Reported component name

    LIBERTY PROFILE

  • Reported component ID

    5724J0814

  • Reported release

    CD0

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-05-12

  • Closed date

    2017-09-14

  • Last modified date

    2017-09-14

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    LIBERTY PROFILE

  • Fixed component ID

    5724J0814

Applicable component levels

  • RCD0 PSY

       UP

Document Information

Modified date:
18 October 2021