When over-committing memory in a multi-layered virtual environment, the various tiers of memory virtualization must be considered.
The lower part of Figure 1 shows the first tier of memory virtualization that z/VM establishes for the virtual memory of a virtual system. The first tier virtualizes 1st level memory – the memory configured for the LPAR – to 2nd level memory – the virtual memory provided by z/VM to a virtual system.
The upper part of Figure 1 shows a second tier of memory virtualization that Linux® on System z® establishes for its processes. The second tier virtualizes 2nd level memory – the virtual memory available within the virtual system – to 3rd level memory – the virtual memory provided by Linux to its processes.
Note that the second tier of memory virtualization is established by Linux with respect to 2nd level memory that – while Linux perceives it as real memory – in fact is already virtualized by z/VM as the virtual system's memory. When the word real is subsequently used in this section with respect to memory, it designates memory that is at the bottom of a memory virtualization tier.
In order to focus on the central concepts of memory virtualization, Figure 1 is simplified in many aspects. For example, it does not account for the sharing of real memory that backs identical virtual memory regions of two or more consumers (such as shared code). Another simplification concerns elements of z/VM memory virtualization that aim at improving memory virtualization performance in layered environments, such as the use of shadow DAT tables. Furthermore, communication mechanisms that exist between Linux and z/VM for the purpose of coordinating the effective use of memory are not shown.
Dynamic address translation (simplified)
Dynamic address translation (DAT) is a mechanism that allows mapping virtual memory to real memory. The provider of virtual memory is typically a control program such as the z/VM control program, or the Linux kernel. The consumer of virtual memory is a System z processor that is executing some kind of workload and that is running with DAT enabled. Enabling DAT causes the processor's memory accesses (such as fetching instructions and reading from or writing to memory) to take place with respect to virtual memory.
The provider of virtual memory must ensure that each page of virtual memory that is accessed by the consumer is mapped to a page frame in real memory. The mapping from virtual to real memory is maintained in the form of a hierarchy of DAT tables. For simplicity, we limit the discussion here to the lowest element in that hierarchy, which is called a page table.
Page tables contain valid entries for virtual pages that have an assigned corresponding page frame in real memory, and invalid entries for pages that do not have a corresponding page frame assigned. The reason for invalid entries is that the provider – for example when facing a situation where insufficient real memory is available – can decide to store the content of certain consumer pages on persistent store (such as a paging or swap disk), and invalidate the corresponding page table entry. This enables the provider to then use the respective page frame for other purposes, such as backing a page for another consumer.
If an invalid entry is encountered while a consumer executes a program that accesses the corresponding virtual page, program execution is interrupted and the provider is notified. The provider reacts by fetching the content of the affected page from persistent store into a free page frame in real memory, updates the hierarchy of DAT tables accordingly, and then re-dispatches the consumer to resume execution with the now available data.
Benefits of memory virtualization
Virtual memory provides for economical use of real memory: It is important to realize that empty virtual memory pages do not need to be backed until they are accessed for the first time. This opens a huge potential for saving real memory, because consumers very often allocate virtual memory in large quantities, but with substantial subsets not being used for extended periods of time, or not used at all. For example, the size of a virtual system might be defined as 4 GiB, but the guest operating system and its applications might use only a fraction of those 4 GiB.
The key here is the subtle distinction between allocating and using memory: Allocating or defining virtual memory establishes structures that enable the consumer to access virtual memory later on. Only when the consumer actually uses (reads from or writes to) the virtual memory must it be backed by real memory.
The lazy provisioning of real memory also makes it possible to allocate more virtual memory than there is real memory available. This is called memory overcommitment and is discussed in Memory provisioning and memory overcommitment.
Virtual memory is pageable: A second opportunity for saving real memory arises when virtual memory is initially accessed at some point in time, but after that is accessed infrequently or not at all. This situation can be detected by the virtual memory provider, which may then elect to move such low-use pages to persistent store, thereby freeing the corresponding real memory page frames for other use, such as backing a different virtual address space.
Virtual memory is sharable: If the virtual memory of several consumers exhibits identical regions, these regions can be shared among the consumers. Thus the same real memory can be used for backing the virtual memory of all these consumers, resulting in substantial savings of real memory. For example, when a Linux process forks, almost all virtual memory of the parent process is initially identical with that of the child process. Linux can share the real memory backing these identical parts. Only when virtual memory is modified by either process do the resulting unique pages require individual backing in real memory. This technique is known as copy on write.
Virtual memory is mappable: A fourth advantage of memory virtualization is its capability to map data into virtual address spaces. Two flavors are possible: read only and read/write. The read only variant is mostly applied in order to map code elements such as Linux dynamic link libraries or z/VM shared segments into the virtual address space. This also provides for fast initialization of the respective virtual memory regions, and is typically implemented along with sharing. The read/write variant can be used for fast file access, and when used along with sharing, provides a means of communication between virtual memory consumers.
Impacts of memory virtualization
Memory access latency: For the typical consumer of virtual memory, the process of providing virtual memory occurs transparently, except for the latency. In other words, the consumer of virtual memory does not notice that virtual instead of real memory is used, except that certain memory accesses are delayed until the respective memory has been made available by the provider. An example of such a delayed access is hitting a virtual page not presently backed by a page frame in real memory.
This means that the main effect memory consumers observe when using virtual instead of real memory is delayed processing of memory accesses, resulting in slower workload execution. It is of particular importance to realize that when real memory pressure exists, virtual memory consumers do not observe shortages of virtual memory. In particular, the size of the virtual memory address space is unaffected, and does not depend on the availability of real memory in the first place. However, note that Linux provides controls that can limit virtual memory allocations depending on the availability of real memory. For details see Memory provisioning and memory overcommitment.
Memory provisioning and memory overcommitment
In general, a virtual memory provider should make sure that the sum of the amount of real memory and the space available on paging disks is larger than the total amount of virtual memory actually accessed by all virtual memory consumers.
The dilemma with that approach is that when initially providing a virtual address space of a certain size, a provider does not typically know in advance how much of that virtual memory will actually be used by the consumer. For example, the program executed by a Linux process might allocate 2 GiB of heap memory, but might only access a small fraction of that memory during later processing. Or a z/VM virtual system may be defined with 4 GiB, but only a fraction of that is actually used by the guest operating system.
For that reason, virtual memory providers may take the risk of allotting more virtual memory than they are able to back with real memory and paging disk space. However, if virtual memory usage at a later point in time requires more real resources than are available, drastic measures usually result. For example, Linux might terminate selected processes in this case. The z/VM control program might issue a PGT004 abend when unable to obtain additional page slots on its paging devices.
In order to avoid such situations, system administrators must estimate the virtual memory usage of their workloads in advance, and assign sufficient real memory and paging disk space to their virtual memory providers.
Another approach, available only under Linux, is called the overcommit handling mode. Since kernel 2.5.30, the sysctl parameters vm.overcommit_memory and vm.overcommit_ratio can be used to customize the way Linux over-commits memory. Because the focus of this paper is on z/VM memory overcommitment, and our virtual systems are configured not to overcommit memory at the Linux process level, the use of these parameters is not further investigated here.
Interactions between virtual memory consumers and providers
In general, a direct interaction between the virtual memory consumer and the virtual memory provider is not required.
However, many virtualization platforms offer more or less sophisticated mechanisms for a direct interaction between virtual memory consumer and provider. When exploited, such mechanisms have the potential of greatly improving the overall performance of both the virtual systems and the host system.
For example, z/VM provides the diagnose code x'10' (Release Pages), which enables a guest operating system to notify CP that a specified virtual memory range is no longer needed. As a result, CP can reclaim the real memory that had been backing the released virtual memory and use this freed real memory for other purposes. Note that releasing virtual memory in the described way does not remove that virtual memory from the virtual address space. Instead, it just communicates the virtual system's intent not to use the respective virtual memory range in the near future.
Another means of communication between CP and the guest operating system is known as page fault handshaking. When enabled (which is the default case), CP notifies the Linux on System z virtual system when a page fault is encountered. This enables Linux to select a different thread for dispatch, thereby using the time needed by CP to fetch the page from paging storage. Upon completion of the page read, CP sends another notification, enabling Linux to continue the previously interrupted thread.