Summary

This paper describes how to pause a running Linux® on IBM® System z® instance and later resume operations. It also analyzes the time needed to suspend and resume, and the system resources required.

There are two methods available:
  • The Linux suspend and resume mechanism, which hibernates the Linux guest and restarts it
  • The z/VM® CP STOP and BEGIN commands, which stop and restart the virtual CPUs

The Linux suspend mechanism requires SUSE Linux Enterprise Server 11 (SLES 11) or Red Hat Enterprise Linux 6. The CP STOP and BEGIN mechanism is a general z/VM feature.
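As an illustration of the first method, the sketch below requests a suspend to disk through the standard Linux power-management file `/sys/power/state`. The helper name `request_suspend_to_disk` and its `state_file` parameter are hypothetical and are not taken from the paper. The sketch assumes that the guest is already configured for suspend to disk (for example, with a swap device large enough to hold the memory image) and that the caller has root authority.

```python
from pathlib import Path

# Standard Linux power-management control file. Reading it lists the sleep
# states the kernel offers; writing "disk" requests a suspend to disk.
DEFAULT_STATE_FILE = Path("/sys/power/state")


def request_suspend_to_disk(state_file: Path = DEFAULT_STATE_FILE) -> None:
    """Ask the kernel to suspend the system to disk (hibernate).

    Hypothetical helper for illustration only. The state_file parameter
    exists so that the function can be exercised against an ordinary file
    instead of the real sysfs entry.
    """
    supported = state_file.read_text().split()
    if "disk" not in supported:
        raise RuntimeError(
            f"suspend to disk is not offered by {state_file}: {supported}"
        )
    # Against the real sysfs file, this write returns only after the
    # system has been resumed.
    state_file.write_text("disk")
```

Calling `request_suspend_to_disk()` without arguments on the guest would start the hibernation. The second method needs no code inside the guest, because the CP STOP and BEGIN commands are issued to z/VM for the guest's virtual CPUs.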

For smaller guests (768 MB), suspending the guest takes 10 - 30 seconds, depending on the amount of memory used, and resuming it takes approximately 20 seconds. With larger guests the resume time increases, but more slowly than the guest size. For example, in a slightly different environment (a 5.2 GB guest running six WebSphere® Application Servers), it took 30 seconds after the resume until the first Web page could be delivered. For comparison, merely starting the same six WebSphere Application Servers serially in that guest took approximately 80 seconds. The CP STOP and BEGIN commands take effect immediately.
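The scaling claim can be checked with a short calculation using only the figures quoted above. This is a rough comparison, not a measurement: the two environments differ slightly, the 30 seconds for the larger guest includes the time until the first page was delivered, and 1 GB is assumed to be 1024 MB.

```python
# Figures quoted in the summary.
small_guest_mb = 768
small_resume_s = 20           # approximate resume time, small guest
large_guest_mb = 5.2 * 1024   # 5.2 GB guest, assuming 1 GB = 1024 MB
large_resume_s = 30           # resume until first page delivered
serial_start_s = 80           # serial start of six application servers

size_growth = large_guest_mb / small_guest_mb      # growth in guest size
resume_growth = large_resume_s / small_resume_s    # growth in resume time
speedup = serial_start_s / large_resume_s          # resume vs. serial start

print(f"guest size grew {size_growth:.1f}x, resume time grew {resume_growth:.1f}x")
print(f"resume is about {speedup:.1f}x faster than starting the servers serially")
# prints:
# guest size grew 6.9x, resume time grew 1.5x
# resume is about 2.7x faster than starting the servers serially
```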

The test paused a set of guests that had been warmed up beforehand to ensure that their memory pages were really in use, and moved the workload to a set of standby guests that had been idle until then. After a while, the paused guests were restarted and the workload was directed back to them. The z/VM system was sized so that activating the idle guests would create a memory shortage and force paging. The expected behavior was that the paused guests would become dormant and their pages would be moved to the paging space.

Both mechanisms put the guests in the dormant state, but that state is permanent only with the Linux suspend and resume method. With the CP STOP and BEGIN commands, the guests are still scheduled (become active) more or less frequently, depending on the type of servers running (WebSphere ND, relational database). The reason is that this mechanism stops only the CPUs but leaves virtual interfaces such as network or FICON® adapters active. When an external event causes an interrupt, z/VM reacts on behalf of that guest.

With both mechanisms, when memory is under pressure, the pages of the paused guests are moved to the paging space in preference to others, which makes both mechanisms suitable for running with higher levels of memory overcommitment. The z/VM CP STOP and BEGIN mechanism is available in all z/VM versions and takes effect immediately, but it keeps everything in main memory. If the guest is logged off or the LPAR is IPLed, the memory content of the guest is lost. The Linux suspend and resume mechanism performs a controlled shutdown of the interfaces and writes the memory image to disk. This makes the state of the guest as robust as that of a fully stopped system, while still providing a very fast restore.

Pausing production systems is not recommended, but this technique is very suitable for test or development systems.