As memory scales, so does the way it is accessed. This access is typically done in pages. A great deal of design goes into ensuring that the pages required for the execution of code are available well in advance, to reduce the latency of fetching them from resources farther away from the CPU core.
Power Systems hardware currently supports page sizes of 4 KB, 64 KB, 16 MB, and 16 GB.
The virtual address space of a program is divided into segments. On a Power System, each segment can be either 256 MB or 1 TB, and the virtual address space can consist of a mix of these segment sizes. The segments are in turn divided into units called pages. Similarly, the physical memory on the system is divided into page-size units called page frames. The role of the Virtual Memory Manager (VMM) is to manage the allocation of real memory page frames and to manage references to virtual memory pages (the virtual address space is always larger than the available real memory). The VMM must minimize the total processor time, disk bandwidth cost, and response time required to handle virtual memory page faults. The IBM Power Architecture provides support for multiple virtual memory page sizes, which provides performance benefits to an application because of hardware efficiencies that are associated with larger page sizes.
The POWER5+ and later processor chips support four virtual memory page sizes: 4 KB, 64 KB, 16 MB, and 16 GB. The POWER6 processor also supports using 64 KB pages inside segments with a base page size of 4 KB. The 16 GB pages can be used only within 1 TB segments.
Large pages provide multiple technical advantages:
- Reduced Page Faults and Translation Lookaside Buffer (TLB) Misses: A single, constantly referenced large page remains in memory. This eliminates the possibility of several small pages covering the same data being repeatedly swapped out.
- Unhindered Data Prefetching: Hardware data prefetching is constrained by page boundaries, so a large page allows prefetch streams to run further before reaching one.
- Increased TLB Reach: This feature saves space in the TLB by holding one translation entry instead of n entries, which increases the amount of memory that can be accessed by an application without incurring hardware translation delays.
- Increased ERAT Reach: The Effective-to-Real Address Translation (ERAT) cache on Power is a first-level, fully associative translation cache that translates directly from effective to real addresses. Large pages improve the efficiency and coverage of this cache as well.
Large segments (1 TB) also reduce Segment Lookaside Buffer (SLB) misses and increase the reach of the SLB. The SLB is a cache of the most recently used effective-to-virtual segment translations.
While 16 MB and 16 GB pages are intended only for particularly high-performance environments, 64 KB pages are considered general purpose, and most workloads benefit from using 64 KB rather than 4 KB pages.
For operating system details, see http://www.redbooks.ibm.com/redbooks/pdfs/sg248079.pdf