IBM Support

PI49225: RUNTIME GOES IN DENIAL OF SERVICE MODE WHEN SERVER FARM HEARTBEAT FAILS WITH TIMEOUT

APAR status

  • Closed as program error.

Error description

  • In a server farm running MobileFirst, there is a heartbeat
    between the MobileFirst Runtime and the Administration Services.
    The heartbeat ensures that the Administration Services know
    whether the runtime is still alive and running. The heartbeat
    mechanism is implemented through a JMX call.
    If the server is very busy, this JMX call can time out. In this
    case, the runtime is immediately set into "require
    synchronization" mode, in which all other requests to the
    runtime are answered with a 503 "Denial of service" response.
    The runtime cannot exit this mode because no code triggers a
    re-synchronization, so it stays in this mode until the server
    is restarted.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of MobileFirst Server                                  *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * In a server farm, after a while and some potential network   *
    * instability or high load, the MobileFirst Server starts      *
    * responding with the HTTP code 503 "Denial of service". Even  *
    * when the network is back or the load decreased, the Server   *
    * remains in this mode and the user must restart the server.   *
    * The expected behavior is that the MobileFirst Server is more *
    * tolerant of network/load fluctuations and does not           *
    * enter the 503 "Denial of service" mode on the first          *
    * temporary failure, but only when there are unrecoverable     *
    * failures over a longer time.                                 *
    *                                                              *
    * The problem is related to the heartbeat mechanism between    *
    * Administration Services and MobileFirst Runtime. If the      *
    * heartbeat fails the first time due to a timeout, the server  *
    * enters the 503 "Denial of service" mode immediately.         *
    * Instead, it should retry and enter the mode only when        *
    * multiple heartbeats fail over a long time.                   *
    *                                                              *
    * Only server farm topologies are affected. WebSphere Network  *
    * Deployment or any Standalone topology is not affected by     *
    * this problem.                                                *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * -                                                            *
    ****************************************************************
    

Problem conclusion

  • The problem was solved by changing the code so that the
    server enters the 503 "Denial of service" mode only if many
    heartbeats fail over a longer time.
    
    At a minimum, worklight-jee-library.jar must be reinstalled to
    apply the fix.
    
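The behavior described in the conclusion (tolerating isolated heartbeat timeouts and entering the 503 mode only after repeated failures over a longer period) can be illustrated with a minimal Java sketch. This is not the actual MobileFirst implementation: the class name HeartbeatMonitor, its methods, and both thresholds are hypothetical, and the assumption that a successful heartbeat clears the mode is made for illustration only.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of a failure-tolerant heartbeat tracker; not IBM code. */
public class HeartbeatMonitor {
    private final int maxFailures;         // assumed: consecutive failures required
    private final Duration minFailureSpan; // assumed: minimum time the failures must span
    private int consecutiveFailures = 0;
    private Instant firstFailure = null;
    private boolean unavailable = false;

    public HeartbeatMonitor(int maxFailures, Duration minFailureSpan) {
        this.maxFailures = maxFailures;
        this.minFailureSpan = minFailureSpan;
    }

    /** A successful heartbeat resets the failure streak (assumed recovery path). */
    public synchronized void recordSuccess() {
        consecutiveFailures = 0;
        firstFailure = null;
        unavailable = false;
    }

    /** A failed heartbeat (for example, a timed-out JMX call) extends the streak. */
    public synchronized void recordFailure(Instant now) {
        if (consecutiveFailures == 0) {
            firstFailure = now;
        }
        consecutiveFailures++;
        boolean longEnough =
                Duration.between(firstFailure, now).compareTo(minFailureSpan) >= 0;
        // Trip only when there are many failures AND they span a long enough time.
        if (consecutiveFailures >= maxFailures && longEnough) {
            unavailable = true;
        }
    }

    /** True when requests should be answered with 503. */
    public synchronized boolean isUnavailable() {
        return unavailable;
    }
}
```

With, say, new HeartbeatMonitor(3, Duration.ofSeconds(30)), a single timeout leaves the runtime serving requests; only a streak of at least three failures spread over at least 30 seconds would switch it to the 503 mode.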

Temporary fix

Comments

APAR Information

  • APAR number

    PI49225

  • Reported component name

    MFPF/WORKLIGHT

  • Reported component ID

    5725I4301

  • Reported release

    700

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2015-09-22

  • Closed date

    2015-09-23

  • Last modified date

    2015-09-23

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MFPF/WORKLIGHT

  • Fixed component ID

    5725I4301

Applicable component levels

  • R700 PSY

       UP

  • R710 PSY

       UP

Document Information

Modified date:
17 October 2021