We recently had production downtime that we tracked down to an unresponsive backend, which caused a high number of open TCP connections and consequently high memory consumption. The whole appliance group rebooted because one web service consumed too much memory.
Is it possible to configure the appliance to reject service calls rather than reboot in such a case, ideally by assigning a memory quota to each web service (or at least to each domain)? Or is the only option to monitor memory consumption (e.g. via SNMP) and have someone manually spot the "bad" web service and stop it?
2 replies Latest Post - 2012-11-12T14:10:47Z by SystemAdmin
Pinned topic: how to prevent reboot due to an unresponsive backend
msiebler 2700005RPQ 136 Posts ACCEPTED ANSWER
Re: how to prevent reboot due to an unresponsive backend
2012-11-07T14:56:48Z in response to SystemAdmin
That is a common question, and we have a few best practices in this area.
There is no single correct answer for all topologies, but there are some patterns that can help.
First, you cannot do exactly what you are asking for with memory quotas.
Some common things to look at are:
The timeouts for backends and other calls. By default these are typically very high. If the timeouts are lower, calls get cleaned up faster, which leads to lower memory use.
Also, you probably want to add SLM and/or message monitors to help limit the number of calls to a backend.
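To illustrate the idea behind limiting calls to a backend, here is a minimal Python sketch of a limiter that rejects a request immediately when too many calls are already in flight, instead of letting them pile up. This is not appliance configuration and does not reflect how SLM or message monitors are implemented; the names `BackendLimiter` and `BackendBusy` are hypothetical and exist only for this example.

```python
import threading


class BackendBusy(Exception):
    """Raised when the in-flight limit is reached (hypothetical name)."""


class BackendLimiter:
    """Caps concurrent backend calls and rejects the excess instead of queueing."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: if no slot is free, reject right away
        # so the caller does not hold a connection and memory while waiting.
        if not self._slots.acquire(blocking=False):
            raise BackendBusy("too many calls in flight, request rejected")
        try:
            return func(*args, **kwargs)
        finally:
            # Always free the slot, even if the backend call raised.
            self._slots.release()
```

The design choice here is to fail fast: a rejected call costs almost nothing, whereas a call waiting on an unresponsive backend holds its resources until the timeout expires.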
SystemAdmin 110000D4XK 6772 Posts ACCEPTED ANSWER
Re: how to prevent reboot due to an unresponsive backend
2012-11-12T14:10:47Z in response to SystemAdmin
Thanks for your answer. We've tried the first point and it helped a lot. The second is probably not feasible in our case, because even the normal number of calls at peak times seems to be high enough to bring the machine down if backends respond with timeouts.
But thanks anyway, the first point has helped.
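As a rough way to see why the timeout setting matters so much, the number of connections held open while a backend is unresponsive can be estimated as the arrival rate multiplied by the time each call is held (Little's law). The sketch below uses made-up figures, not values measured in this thread, and the function name is hypothetical.

```python
def estimated_open_connections(calls_per_second, seconds_held):
    """Little's law: average calls in flight = arrival rate * time each call is held."""
    return calls_per_second * seconds_held


# Hypothetical figures for illustration only.
peak_rate = 200  # calls per second at peak

# When the backend never answers, each call is held until the timeout fires.
with_long_timeout = estimated_open_connections(peak_rate, 120)
with_short_timeout = estimated_open_connections(peak_rate, 5)

print(with_long_timeout, with_short_timeout)
```

Under these assumed numbers, cutting the timeout from 120 seconds to 5 seconds reduces the estimated number of simultaneously open connections by a factor of 24 at the same call rate.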