2 replies. Latest post 2012-11-12T14:10:47Z by SystemAdmin.
SystemAdmin

Pinned topic: how to prevent reboot due to an unresponsive backend

2012-11-07T14:06:41Z
Hi,
We recently had production downtime that we tracked down to an unresponsive backend causing a high number of open TCP connections and, consequently, high memory consumption. The whole appliance group rebooted because one web service consumed too much memory.
Is it possible to configure the appliance to reject service calls instead of rebooting in such a case, ideally by assigning a memory quota to each web service (or at least to each domain)? Or is the only way to monitor memory consumption (e.g. via SNMP) and have someone manually spot the "bad" web service and stop it?
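For reference, the best we have come up with so far is a watchdog script along these lines; a rough sketch that polls via the XML Management Interface (SOMA) rather than SNMP, where the host, credentials, threshold, and the exact MemoryStatus field name are all placeholders we would still have to verify against the firmware documentation:

```python
# Rough sketch: poll overall appliance memory through the XML Management
# Interface (SOMA). Host, port, credentials, the alert threshold, and the
# 'Usage' field name are placeholders/assumptions to verify for your firmware.
import requests
import xml.etree.ElementTree as ET

SOMA_URL = "https://dp-host.example.com:5550/service/mgmt/current"  # hypothetical host
AUTH = ("admin", "secret")  # placeholder credentials

GET_STATUS = """<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <dp:request xmlns:dp="http://www.datapower.com/schemas/management" domain="default">
      <dp:get-status class="MemoryStatus"/>
    </dp:request>
  </env:Body>
</env:Envelope>"""

def memory_usage_percent():
    # verify=False because the management interface typically runs with a
    # self-signed certificate; pin the certificate instead in real use.
    resp = requests.post(SOMA_URL, data=GET_STATUS, auth=AUTH, verify=False)
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    # Match on the local element name so the lookup does not depend on the
    # namespace of the status payload; 'Usage' is assumed to be the
    # percentage field of MemoryStatus.
    usage = next(el for el in root.iter() if el.tag.endswith("Usage"))
    return int(usage.text)

if __name__ == "__main__":
    pct = memory_usage_percent()
    print(f"appliance memory usage: {pct}%")
    if pct > 80:  # arbitrary alert threshold
        print("WARNING: investigate which service is holding connections")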

Kind Regards
Robert
Updated on 2012-11-12T14:10:47Z by SystemAdmin
  • msiebler

    Re: how to prevent reboot due to an unresponsive backend

    2012-11-07T14:56:48Z, in response to SystemAdmin
    That is a common question, and we have a few best practices in this area.
    There is no single correct answer for all topologies, but there are some patterns that can help.
    First, you cannot do exactly what you ask: the appliance has no per-service (or per-domain) memory quota.

    Some common things to look at are:

    • The timeouts for backends and other calls; by default these are typically very high. Lowering them lets stalled calls get cleaned up faster, which keeps memory consumption down (see the sketch after this list).
    • SLM policies and/or message-count monitors, to help limit the number of calls in flight to a backend.
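    For the timeout change, you can script it through the XML Management Interface as well as make it in the WebGUI. A minimal sketch, assuming an XML firewall service; the host, credentials, domain, service name, and the BackTimeout element are illustrative, so confirm the property name with a dp:get-config request against your own configuration first:

```python
# Minimal sketch: lower the back-side timeout of one service via the XML
# Management Interface (SOMA). Host, credentials, domain, service name, and
# the BackTimeout element name are assumptions; confirm them with a
# dp:get-config request for your firmware level before applying.
import requests

SOMA_URL = "https://dp-host.example.com:5550/service/mgmt/current"  # hypothetical host
AUTH = ("admin", "secret")  # placeholder credentials

MODIFY_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <dp:request xmlns:dp="http://www.datapower.com/schemas/management" domain="production">
      <dp:modify-config>
        <XMLFirewallService name="OrderService">
          <!-- seconds; the shipped default is much higher -->
          <BackTimeout>30</BackTimeout>
        </XMLFirewallService>
      </dp:modify-config>
    </dp:request>
  </env:Body>
</env:Envelope>"""

resp = requests.post(SOMA_URL, data=MODIFY_CONFIG, auth=AUTH, verify=False)
resp.raise_for_status()
print(resp.text)  # the response indicates whether the modify was accepted
```

    The same modify-config pattern extends to the persistent-connection timeouts and, with a considerably larger payload, to SLM policy objects; the equivalent interactive change is the front-side/back-side timeout fields on the service in the WebGUI.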
  • SystemAdmin

    Re: how to prevent reboot due to an unresponsive backend

    2012-11-12T14:10:47Z, in response to SystemAdmin
    Thanks for your answer. We've tried the first point and it helped a lot. The second is probably not feasible in our case, because even the normal number of calls at peak times seems to be high enough to bring the machine down when backends respond with timeouts.
    But thanks anyway; the first point has helped.