Linux Memory Overcommitment and the OOM Mechanism
By default, the Linux kernel is often configured to allow memory over-commitment. Memory over-commitment means that the kernel lets applications allocate more memory than is actually available, where "available" covers not only the physically installed RAM but also the swap space. The kernel can permit this because it treats an allocation more like a reservation: as long as the applications do not actually use all of the memory they have allocated, that memory does not need to exist. Only when an application accesses (i.e. writes to) a portion of the allocated memory does the kernel make sure that real memory backs it. The rationale is that a memory shortage may arise merely because applications allocate more memory than they use, or at least more than they use right away. By the time an application really wants to use all of its allocated memory, the shortage may long since have been resolved (e.g. because another application has meanwhile released memory or exited). Refusing the first application's allocation with an error would then have been a 'false alarm'.
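The 'reservation' behaviour can be made visible with a small experiment. The sketch below is a Python illustration that assumes a Linux system with /proc mounted. It maps anonymous private memory (comparable to what glibc's malloc() uses for large requests) and reads the process's resident set size (VmRSS) before mapping, after mapping and after writing one byte to every page. The helper names rss_kib and demo are hypothetical and exist only for this illustration.

```python
import mmap
from typing import Tuple


def rss_kib() -> int:
    """Return this process's resident set size (VmRSS) in KiB from /proc/self/status."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def demo(size_mib: int = 64) -> Tuple[int, int, int]:
    """Map size_mib MiB of anonymous memory, then touch it; return the RSS at each step."""
    size = size_mib * 1024 * 1024
    before = rss_kib()
    # Anonymous private mapping (fileno -1). At this point the kernel has only
    # granted address space; no real memory backs it yet.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    after_alloc = rss_kib()
    # The first write to each page forces the kernel to supply a real page.
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1
    after_touch = rss_kib()
    buf.close()
    return before, after_alloc, after_touch


if __name__ == "__main__":
    before, after_alloc, after_touch = demo()
    print(f"RSS growth after mapping:  {after_alloc - before} KiB")
    print(f"RSS growth after touching: {after_touch - after_alloc} KiB")
```

On a typical system the first difference should be close to zero and the second close to the mapped size; the exact figures vary with the kernel and the Python build.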
However, this behavior can lead to some rather strange situations. For example, on a machine with 32 GB of physical RAM and 32 GB of swap space, a single application can allocate 120 GB of memory (in smaller pieces), or even more if memory over-commitment is configured accordingly. The application will not receive an error from the allocation call (such as malloc()). Here, too, the kernel assumes that the application will never use all of this memory. But for any sensible application, one that does not allocate excess memory it will never use, this situation can be fatal. This is even more true for server applications such as the Informix server or IWA, because they eventually will want to use all of the allocated memory, even if the amount is so large that to the kernel it may look like an 'unrealistic' allocation of 'memory that will never be used anyway'. Sensible applications are also written to check the result of every memory allocation (such as a call to malloc()) diligently and to react gracefully when the memory cannot be allocated due to a shortage. With overcommitted memory, however, the failure only surfaces later, when the application tries to use the previously allocated memory. A failure at that point is unexpected, since the memory was believed to be reserved already, and therefore it often is not handled gracefully. To the application and its error handling, such a failure looks more like a defective RAM chip, a case in which exiting ungracefully is considered acceptable.
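Whether such a 120 GB allocation succeeds depends on the kernel setting vm.overcommit_memory. As we understand the kernel's overcommit-accounting documentation, mode 0 (the default) refuses only obviously excessive individual requests, mode 1 never refuses, and mode 2 enforces a CommitLimit of RAM * vm.overcommit_ratio / 100 + swap (huge-page reservations ignored here), with a default ratio of 50. The Python sketch below, which assumes a Linux system with /proc mounted and uses hypothetical helper names, reads the current mode and computes that limit for the example machine: 32 GB * 50 / 100 + 32 GB = 48 GB, so in strict mode the 120 GB would be refused at allocation time.

```python
from pathlib import Path

MODES = {
    0: "heuristic overcommit (default): obviously excessive requests are refused",
    1: "always overcommit: requests are never refused for lack of memory",
    2: "strict accounting: total commitments may not exceed CommitLimit",
}


def overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> int:
    """Return the current vm.overcommit_memory setting."""
    return int(Path(path).read_text().strip())


def commit_limit_gb(ram_gb: float, swap_gb: float, overcommit_ratio: int = 50) -> float:
    """CommitLimit enforced in mode 2 (ignoring huge pages): RAM * ratio / 100 + swap."""
    return ram_gb * overcommit_ratio / 100 + swap_gb


def meminfo_kib(field: str) -> int:
    """Return a field from /proc/meminfo in kB."""
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)


if __name__ == "__main__":
    mode = overcommit_mode()
    print(f"vm.overcommit_memory = {mode}: {MODES.get(mode, 'unknown')}")
    # The example machine from the text: 32 GB RAM, 32 GB swap, default ratio 50.
    print(f"CommitLimit in mode 2 for the example: {commit_limit_gb(32, 32):.0f} GB")
    print(f"This machine: Committed_AS {meminfo_kib('Committed_AS')} kB, "
          f"CommitLimit {meminfo_kib('CommitLimit')} kB")
```

Comparing Committed_AS with CommitLimit on a running system gives a rough idea of how far the applications have already over-committed.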
As a remedy for overcommitted memory situations, Linux kernels typically have the Out-Of-Memory (OOM) mechanism in place. Its intent is to avoid a real out-of-memory situation in the first place, even though memory over-commitment is allowed. When the kernel detects that a serious out-of-memory situation is imminent, it invokes the so-called OOM Killer. The OOM Killer singles out one or more application processes as having caused the out-of-memory situation and kills them, which is meant to protect the remaining processes from problems or damage. One important criterion for selecting the process to be killed is, naturally, how much memory it has allocated and is actually using. This is unfortunate for Informix server or IWA processes, because they usually are the ones that have allocated and are using most of the memory, or at least more than any other process on the system. As a result, instead of protecting such server applications, the OOM mechanism often kills them first.
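To see which processes the OOM Killer would currently favour, one can look at /proc/<pid>/oom_score, where the kernel exposes its 'badness' estimate for each process; a higher value makes a process a more likely victim. The Python sketch below assumes a Linux system with /proc mounted; the function name oom_candidates is hypothetical. On a machine running an Informix server or IWA, those processes would be expected near the top of the list.

```python
from pathlib import Path
from typing import List, Tuple


def oom_candidates(top: int = 5) -> List[Tuple[int, int, str]]:
    """Return (oom_score, pid, command) for the `top` highest-scoring processes."""
    result = []
    for proc in Path("/proc").iterdir():
        if not proc.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            score = int((proc / "oom_score").read_text())
            comm = (proc / "comm").read_text().strip()
        except (OSError, ValueError):
            # The process exited meanwhile, or its files are not readable.
            continue
        result.append((score, int(proc.name), comm))
    result.sort(reverse=True)  # highest score, i.e. most likely victim, first
    return result[:top]


if __name__ == "__main__":
    print(f"{'score':>6} {'pid':>8} command")
    for score, pid, comm in oom_candidates():
        print(f"{score:6d} {pid:8d} {comm}")
```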
For an Informix server, the OOM mechanism is not much of a problem as long as the server runs alone on the machine and its memory use (mainly the SHMTOTAL parameter in the onconfig file) is configured sensibly. Because the Informix server will not try to allocate much more memory than configured, the OOM situation can be avoided in the first place. But if other applications on the same machine cause an OOM situation, the Informix server can easily become the OOM Killer's first target. With IWA it is only possible to configure the amount of shared memory. The private memory that the IWA process allocates on an as-needed basis cannot be configured. IWA monitors this itself to avoid an OOM situation, but in some situations the kernel may come to a different conclusion and invoke the OOM Killer before IWA puts the brakes on its memory use. Therefore IWA is somewhat more vulnerable to the OOM mechanism than the Informix server.
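A quick sanity check of the Informix side is to compare SHMTOTAL, which is given in KB, with the physical memory of the machine. The Python sketch below is only an illustration: the helper names are hypothetical, it assumes the default onconfig location $INFORMIXDIR/etc/$ONCONFIG (or a path given as argument), and it treats a value of 0 or a missing entry as "no explicit limit", which is our reading of the Informix documentation and should be checked for your version. What share of RAM counts as sensible is left to the administrator.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def shmtotal_kb(onconfig_text: str) -> Optional[int]:
    """Return the SHMTOTAL value in KB from onconfig text, or None if it is not set."""
    for line in onconfig_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split on whitespace
        if len(fields) >= 2 and fields[0] == "SHMTOTAL":
            return int(fields[1])
    return None


def mem_total_kb() -> int:
    """Return MemTotal from /proc/meminfo in kB."""
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise KeyError("MemTotal")


def shm_share_of_ram(onconfig_text: str, ram_kb: int) -> Optional[float]:
    """SHMTOTAL as a fraction of RAM; None when no explicit limit is configured."""
    total = shmtotal_kb(onconfig_text)
    if not total:  # 0 or unset: treated here as "no explicit limit"
        return None
    return total / ram_kb


if __name__ == "__main__":
    if len(sys.argv) > 1:
        path = Path(sys.argv[1])
    else:
        path = Path(os.environ.get("INFORMIXDIR", "")) / "etc" / os.environ.get("ONCONFIG", "onconfig")
    if not path.is_file():
        print(f"onconfig not found: {path}")
    else:
        share = shm_share_of_ram(path.read_text(), mem_total_kb())
        if share is None:
            print("SHMTOTAL is 0 or unset: no explicit limit on shared memory")
        else:
            print(f"SHMTOTAL is {share:.0%} of physical RAM")
```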
To conclude this series of configuration tips, my next blog entry will look at the size of /dev/shm and how to configure it.