• 2 replies
  • Latest Post - 2012-11-12T14:10:47Z by SystemAdmin

Pinned topic how to prevent reboot due to an unresponsive backend

2012-11-07T14:06:41Z
We recently had production downtime that we tracked down to an unresponsive backend: it caused a high number of open TCP connections and, consequently, high memory consumption. The whole appliance group rebooted because one web service consumed too much memory.
Is it possible to configure the appliance to reject service calls instead of rebooting in such a case, ideally by assigning a memory quota to each web service (or at least to a domain)? Or is the only option to monitor memory consumption (e.g. via SNMP) and have someone manually spot the "bad" web service and stop it?
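For the manual-monitoring fallback, the quota check itself is easy to script off-box. A minimal sketch, assuming the per-service memory figures have already been collected (e.g. via periodic SNMP polls of the appliance); the service names, quota values, and the idea that SNMP usage can be attributed to individual services are all assumptions for illustration:

```python
# Flag "bad" services whose memory use exceeds an assigned quota.
# The usage numbers would come from periodic polling; the names and
# byte values below are hypothetical.

def find_offenders(usage_bytes, quota_bytes):
    """Return service names whose memory use exceeds their quota.

    Services with no quota configured are never flagged.
    """
    return sorted(
        name for name, used in usage_bytes.items()
        if used > quota_bytes.get(name, float("inf"))
    )

usage = {"orders-ws": 900_000_000, "billing-ws": 120_000_000}
quotas = {"orders-ws": 512_000_000, "billing-ws": 512_000_000}
print(find_offenders(usage, quotas))  # → ['orders-ws']
```

An operator (or a cron job) could then stop or quiesce the flagged service before memory pressure takes the whole appliance down.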

Kind Regards
Updated on 2012-11-12T14:10:47Z by SystemAdmin
  • msiebler
    142 Posts

    Re: how to prevent reboot due to an unresponsive backend

    That is a common question that we see, and we have a few best practices in this area.
    There is no single correct answer for all topologies, but there are some patterns that can help.
    First, you cannot do exactly what you are asking for: the appliance does not support per-service memory quotas.

    Some common things to look at are:

    Check the timeouts for backends and other calls; by default these are typically very high. With lower timeouts, stalled calls get cleaned up faster, which keeps memory consumption lower.
    Also, you probably want to add SLM policies and/or message monitors to help limit the number of calls to a backend.
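    The two ideas above can be sketched outside of DataPower: cap the number of in-flight calls to the back end and fail fast instead of queueing forever. This is a generic concurrency-limiting sketch, not DataPower's SLM implementation; the limit and timeout values are illustrative only:

```python
import threading

# Cap concurrent back-end calls and shed load instead of queueing,
# so a slow back end cannot accumulate unbounded in-flight requests.

class BackendGate:
    def __init__(self, max_concurrent, acquire_timeout_s):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._timeout = acquire_timeout_s

    def call(self, fn, *args):
        # Reject the call if no slot frees up in time; rejecting is
        # cheaper than letting requests pile up and exhaust memory.
        if not self._slots.acquire(timeout=self._timeout):
            raise RuntimeError("backend saturated; call rejected")
        try:
            return fn(*args)
        finally:
            self._slots.release()

gate = BackendGate(max_concurrent=2, acquire_timeout_s=0.1)
print(gate.call(lambda x: x * 2, 21))  # → 42
```

    In DataPower terms, an SLM policy with a throttle/reject action plays the role of the gate, and the lowered backend timeouts bound how long each slot stays occupied.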
  • SystemAdmin
    6772 Posts

    Re: how to prevent reboot due to an unresponsive backend

    Thanks for your answer. We've tried the first point and it helped a lot. The second is probably not feasible in our case, because even the normal number of calls at peak times seems to be high enough to bring the machine down when backends stop responding.
    But thanks anyway, the first point has helped.