More details about Linux memory size and throughput

To better understand what happens in the various scenarios, we compare the guest memory behavior for the rule resulting in the largest guests (configuration 1) with the rule resulting in the smallest guests (configuration 4).

Linux™ memory size for individual guests and three different configuration files

Figure 1 shows the Linux memory size for the individual guests relative to the manually sized configuration:
Figure 1. Relative Linux memory size for the individual guests for two different cpuplugd configuration files (manually sized = 100%)

Observation

The automated memory sizing of the web server systems shows the worst results: none of our configurations allocates less than twice the manually sized memory. The database systems behave similarly, but the rule with the direct scans allocates only around 50% too much memory. The sizing for the WebSphere® systems is very good, especially when direct scans are used; with this rule, the combos are even smaller than the manually sized systems.

Conclusion

The oversizing of the web servers by this much is almost certainly caused by the small size of these systems when sized manually (342 MiB). The same argument applies to the database servers. Applying stronger conditions, especially with regard to the lower limits, will probably result in better sizing. For a pure WebSphere system, the direct scan rule is a very good fit.
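Such stronger conditions would be expressed in the memory section of the cpuplugd configuration file. The following sketch only illustrates where the lower limit enters the rules; it assumes the cpuplugd.conf keywords and the meminfo variables described in the cpuplugd man page, and all thresholds and pool limits are hypothetical rather than the configurations measured here:

    # Illustrative cpuplugd memory rules (hypothetical values, not the
    # configurations evaluated in this report).
    CMM_MIN="0"
    CMM_MAX="262144"        # upper limit of the CMM pool in 4 KiB pages (1 GiB);
                            # together with the guest size this defines the lower
                            # limit of memory that remains in the guest
    CMM_INC="meminfo.MemFree / 40"
    # A stricter unplug condition for a small guest: keep returning memory to
    # z/VM while more than a fixed 100 MiB (meminfo values are in KiB) is free
    # or cached, instead of comparing against a fraction of MemTotal.
    MEMUNPLUG="(meminfo.MemFree + meminfo.Cached) > 102400"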

Throughput reached for the individual components and three different configuration files

The fact that some of the combos are smaller than when manually sized immediately raises the question of whether these systems are now too small. This should be evident from the throughput reached, shown in Figure 2:
Figure 2. Throughput reached for the individual components and three different configuration files.

Observation

For both triplets, applying the direct scan rule leads to similar or even higher throughput than applying the manually sized configuration. Throughput for Combo2 is comparable to the throughput for the manually sized configuration, while throughput for Combo1 is lower, especially when using the direct page scan rule.

Conclusion

The setup is relatively sensitive to memory sizes. The reason for the lower throughput of Combo1 is visible in the log output from cpuplugd: the CPU plugging rule using direct scans provides only two CPUs for this system, whereas in the other scenarios Combo1 is allocated three CPUs. This confirms the impression that it will be difficult to optimize throughput and memory size with the same set of rules. It is worth mentioning that the CPU cost spent to drive a certain throughput with these workloads is very similar, even with the variations in throughput, which means that the cpuplugd rules introduce no additional CPU overhead.
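For reference, the number of online CPUs is controlled by the CPU hotplug rules in the same cpuplugd configuration file. The following is only a minimal sketch of such rules, assuming the cpuplugd.conf keywords and the loadavg/onumcpus variables from the cpuplugd man page; the thresholds are illustrative and not the rules used in these measurements:

    # Illustrative cpuplugd CPU hotplug rules (hypothetical thresholds).
    UPDATE="1"        # evaluate the rules once per second
    CPU_MIN="1"
    CPU_MAX="0"       # 0 = no limit beyond the CPUs defined for the guest
    # Plug an additional CPU when the load clearly exceeds the CPUs currently
    # online; unplug when roughly one CPU is spare.
    HOTPLUG="(loadavg - onumcpus) > 0.75"
    HOTUNPLUG="(onumcpus - loadavg) > 1.25"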

CMM pool size

Looking at the size of the CMM pool over time shows that servers of the same type always behave in a similar manner, even when the load on the triplets differs. The exceptions here are the combos; see Figure 3 for an example:
Figure 3. CMM Pool size over time with configuration 1
The other interesting topic is the impact of the rules on the memory sizing. Figure 4, Figure 5, and Figure 6 show the CMM pool sizes over time for lnwas1, lnudb1, and lncombo2, for the rules resulting in the largest memory sizes (configuration 1) and for the rules resulting in the smallest memory sizes (configuration 4).
Figure 4. CMM Pool size over time with WebSphere Application Server 1
Figure 5. CMM Pool size over time for a database system
Figure 6. CMM Pool size over time for a Combo System (IHS, WebSphere Application Server, DB2®)
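The CMM pool size plotted in these figures can be read on the guest itself: the cmm module that cpuplugd drives exports the current pool size, in 4 KiB pages, through /proc/sys/vm/cmm_pages. A curve like the ones above could be recorded with a simple sampling loop; the actual collection method used for the figures is not stated here, so this is only an assumption of how it could be done:

    # Sample the CMM pool size once per second; each output line holds a
    # UNIX timestamp and the number of 4 KiB pages currently in the pool.
    while true; do
        echo "$(date +%s) $(cat /proc/sys/vm/cmm_pages)"
        sleep 1
    done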

Observation

In all scenarios we observe that the CMM pool first increases (meaning that the guest systems yield memory to the hypervisor) when the cpuplugd daemon is started. After 30 seconds the middleware servers are started, and after 60 seconds the workload is started and left running for 10 minutes. The size of the CMM pool is relatively constant for all systems during the workload phase.

The largest difference between configuration 4 and configuration 1 occurs for the combos, followed by the database systems; the smallest difference occurs for the WebSphere systems.

Conclusion

All configurations are very stable and react quickly to changing requirements. There are only small overswings, where the pool is reduced by a large amount and then increased again. The configurations using direct page scans react more slowly and with smaller pool decreases than the configurations that also include kswapd page scans. In light of these results, the latter configuration was not evaluated further.
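The distinction between the two rule families can be made concrete as two alternative MEMPLUG rules. This is only a sketch assuming cpuplugd's user-defined variables and its vmstat.* counters with history indexing ([0] = current interval, [1] = previous interval); the counter names and the threshold are illustrative rather than the exact rules evaluated here:

    # Variant A: plug memory only on direct page scans, i.e. when an
    # allocating process itself has to scan for reclaimable pages.
    pgscan_d="vmstat.pgscan_direct_dma[0] + vmstat.pgscan_direct_normal[0] - vmstat.pgscan_direct_dma[1] - vmstat.pgscan_direct_normal[1]"
    MEMPLUG="pgscan_d > 20"

    # Variant B: additionally react to kswapd page scans, which start earlier
    # and therefore shrink the CMM pool sooner and in larger steps.
    pgscan_k="vmstat.pgscan_kswapd_dma[0] + vmstat.pgscan_kswapd_normal[0] - vmstat.pgscan_kswapd_dma[1] - vmstat.pgscan_kswapd_normal[1]"
    MEMPLUG="(pgscan_d + pgscan_k) > 20"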