Using Active Memory Sharing on SLES11

Note: Active Memory Sharing is not the same as Active Memory Expansion recently introduced with POWER7 and AIX. Active Memory Expansion is not currently supported by SLES 11 or RHEL 5.


Introduction

Active Memory Sharing (AMS) technology is available through the Enterprise Edition of IBM PowerVM on select Power processor-based systems. AMS extends PowerVM virtualization by optimizing memory utilization across a physical system: its algorithms allow memory to be shared dynamically, moving it from one enabled partition to another on demand. As with processor sharing technology, servers that host partitions running variable workloads can benefit from AMS because memory is shifted automatically to where it is needed.

While AMS is designed to improve overall utilization of memory resources across host partitions, it is important to note that partitions configured to use AMS can experience changing memory performance characteristics, including higher memory latencies and fluctuating physical memory availability while memory pages are in transition.

We have observed that applications that require a high quality of service or strongly predictable performance may not be well suited to AMS-enabled partitions. Conversely, applications with fairly constant, unchanging memory usage gain little from AMS and do not need to run in AMS-enabled partitions.

More detailed information on AMS usage and performance is available in the whitepaper "IBM PowerVM Active Memory Sharing Performance".

 

Focus of this page

The purpose of this wiki page is to provide hints and usage tips for running SLES11 in an AMS enabled partition.

This page is not intended to address the HMC setup and LPAR definition activities needed to implement and enable AMS for each partition.


Active Memory Sharing on SLES11

Initial Required Setup

When setting up your SLES 11 based partition, there are two key setup steps required to avoid problems when running applications later.

Note: It is recommended that the SLES11 kernel be updated to a minimum level of 2.6.27.25-0.1-ppc64. This kernel is available for download from Novell at http://download.novell.com/Download?buildid=hfZd_kBCjug~

Increasing The Size of the Swap Partition During Install

The Linux installer automatically sizes the swap paging space based on the MemTotal value reported in /proc/meminfo during install. In an AMS environment this may be significantly smaller than the desired configured memory size for the Linux partition. Since the potential for swap usage is higher under AMS, it is even more important to make sure the size of the swap paging space is adequate.

During install it is recommended that the default size of the swap paging space be increased to at least the desired configured memory size of the Linux partition. This resizing of swap space should be done during the disk partitioning phase of the install.

Updates to the size and location of swap space can also be done post-install using the mkswap and swapon utilities.
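For example, additional swap can be added post-install with a swap file. A minimal sketch, assuming the path /swapfile0 and a 4GB size (both illustrative):

    # Create and activate a 4GB swap file (path and size are examples).
    dd if=/dev/zero of=/swapfile0 bs=1M count=4096
    chmod 600 /swapfile0
    mkswap /swapfile0
    swapon /swapfile0
    # Make it persistent across reboots:
    echo '/swapfile0 swap swap defaults 0 0' >> /etc/fstab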

In the future, it will be desirable for the Linux installer to use the desired configured memory size, rather than the currently assigned memory size, when calculating the default size of the swap space.

 

Raising Your Virtual Address Limit

At boot time on SLES 11, the script /etc/initscript calculates a per-process virtual address space limit. The intent of this limit is to prevent runaway applications from over-allocating memory and forcing Linux into swapping and out-of-memory conditions that would affect system-wide performance.

The initscript sets the virtual address space limit based on the amount of total memory and the size of the swap paging space. Since the total memory allocated to the partition by the PowerVM hypervisor at boot time may be significantly less than the desired configured memory size, the calculated virtual address space limit may be lower than intended.

Because these limits are set only once, at boot time, an undersized limit can later cause memory allocation failures (ENOMEM) for applications.

 

Be sure to set the virtual address limit in /etc/security/limits.conf

Failure to raise the per-process virtual address limit in /etc/security/limits.conf may result in unexpected application failures due to memory allocation errors (ENOMEM). Although applications receive error return codes indicating ENOMEM, no system logging of the error occurs.

To avoid this problem, the system administrator should add an entry to /etc/security/limits.conf that sets the virtual address space limit for all users, based on the desired configured memory for the partition, using these steps:

  • Get the size of the swap space for your partition, for example:
    # swapon -s | grep partition | awk '{print $3}'
    4194176
  • Get the size of the configured "desired" memory for your partition, for example:
    # cat /proc/ppc64/lparcfg | grep DesMem
    DesMem=4096
  • Calculate the size of the virtual address space limit in KB (DesMem is reported in MB, hence the multiplication by 1024)
    (size of swap space KB + size of configured memory KB) * 0.80
    (4194176 + (4096 * 1024)) * 0.80 = 6710784
  • Add the calculated virtual address space limit to /etc/security/limits.conf by adding the following line (replace our sample calculated value 6710784 with the value calculated on your system).
    The following line sets the address space limit (as) as a soft resource limit for all users (*) to the specified value, in this example approximately a 6.4GB address space. These limits are normally set at boot time; here we are simply overriding that value with the more desirable one.
    *               soft    as           6710784
  • Reboot the system.

If the swap space or memory configuration of the partition changes in the future, this virtual address space limit will need to be recalculated and the partition rebooted.
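The calculation above can also be scripted. A minimal sketch, assuming a single dedicated swap partition and the lparcfg output format shown earlier:

    #!/bin/bash
    # Sketch: compute the suggested per-process address space limit
    # (80% of swap size plus desired configured memory, in KB) and print
    # the corresponding /etc/security/limits.conf entry.
    swap_kb=$(swapon -s | grep partition | awk '{print $3}')
    des_mem_mb=$(grep DesMem /proc/ppc64/lparcfg | cut -d= -f2)
    limit_kb=$(( (swap_kb + des_mem_mb * 1024) * 80 / 100 ))
    echo "*               soft    as           ${limit_kb}"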

 


Specific SLES 11 considerations

In this section we cover a variety of application, partition, and operating system considerations:

 

The IBM PowerVM hypervisor may be more generous to Linux than to other partition types (AIX, i5/OS) running on the system

Under SLES 11, pages freed by the kernel are not marked with the UNUSED flag.

This affects the IBM PowerVM hypervisor algorithm that drives memory allocation targets, since pages that "should" be free are not seen as free. The general result is a slower than desired shifting of available memory away from Linux partitions.

This issue has been fixed in more recent versions of the SLES11 kernel. It is recommended that the SLES11 kernel be updated to a minimum level of 2.6.27.25-0.1-ppc64. This kernel is available for download from Novell at http://download.novell.com/Download?buildid=hfZd_kBCjug~

Pages freed by the kernel should be marked appropriately; a defect was opened to address this, and the fix is included in the minimum kernel level noted above.

 

Additional Memory Latency May Affect Some Applications

Because physical memory allocations to a partition may shift over time, an application may encounter physical memory allocation limits that result in the increased usage of swap.

The overhead and response time of swap may make applications and the overall system perform more slowly, which can be measured with standard performance monitoring tools.
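For example, swap-in and swap-out activity can be watched with vmstat; sustained non-zero values in the si/so columns indicate active swapping:

    # Report memory and swap activity every 5 seconds; watch the si/so columns.
    vmstat 5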

Additionally, to avoid memory thrashing at the hypervisor level, the IBM PowerVM hypervisor intentionally shifts memory between partitions at a capped rate. In the initial hypervisor release, the maximum rate at which memory is shifted between partitions is limited to the smaller of:

  • 1/256th of the partition's configured memory size
  • 1/512th of the shared pool's total memory size
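As a worked example (the 4GB partition and 16GB pool sizes are illustrative assumptions), the effective cap is the smaller of the two values:

    # Sketch: compute the per-interval memory shift cap in MB.
    partition_mb=4096    # partition's configured memory size (assumed)
    pool_mb=16384        # shared pool's total memory size (assumed)
    cap1=$(( partition_mb / 256 ))   # 16 MB
    cap2=$(( pool_mb / 512 ))        # 32 MB
    echo "shift cap: $(( cap1 < cap2 ? cap1 : cap2 )) MB"   # prints 16 MB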

Because memory shifts are limited by a capped rate, an application that has high initial demands for memory may also encounter physical memory allocation shortages that result in the usage of swap.

Application specific timeouts should be adjusted as needed.

 

AMS's Effect on Reported Memory Information in Linux

The memory levels reported by /proc/meminfo, as well as by other Linux utilities (top, vmstat, sar, etc.), reflect the current memory allocation made by the PowerVM hypervisor. Applications that use this information should be aware that these values vary over time.

 

The Cost of Using Shared Memory

The use of shared memory has a performance cost associated with it. The primary contributors are additional hypervisor management overhead, a potential for more TLB misses due to the smaller 4K page size being used at the hypervisor level, and Linux IO memory accounting.

 

SLES 11 64KB pages when using AMS

When AMS is enabled, the hypervisor manages partition memory on a 4KB basis even though SLES 11 on Power uses 64KB pages by default, so each 64KB Linux page is backed by multiple 4KB hypervisor pages; this contributes to the TLB-miss overhead described above.

 

Adjusting a Partition's Observed Memory Allocation

If you observe ongoing over- or under-allocation of memory during steady state, the partition's "Memory Weight" configuration parameter can be used to adjust the partition's weight for subsequent boots. The value is a whole number from 0 to 255; the default is 128.

 

Memory Allocation Differences Between Operating Systems

Linux's management of its swap and memory resources may result in a higher memory allocation than that of similarly configured AIX partitions sharing the same memory pool.

The partition's "Memory Weight" configuration parameter, described above, may need to be adjusted to compensate.


Monitoring

On Linux for Power, additional tools for monitoring AMS-specific attributes are provided by the powerpc-utils-papr package. These tools aggregate information from a variety of sources under the /proc and /sys file systems. The latest version of these tools is available at http://powerpc-utils.ozlabs.org/. Version 1.1.6 or later is required.
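To verify that the installed package meets this minimum, query it directly:

    # Confirm the powerpc-utils-papr version (1.1.6 or later is required).
    rpm -q powerpc-utils-papr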

amsstat

This is a command line utility for monitoring AMS data. This tool is capable of periodic logging and can be used for tuning and problem determination.

    # ./amsstat
    Wed Mar 18 10:00:03 CDT 2009
    System Memory Statistics:
        MemTotal: 690560 kB
        MemFree: 156096 kB
        Buffers: 159552 kB
        Cached: 223488 kB
        Inactive: 189120 kB
        SwapTotal: 2104384 kB
        SwapFree: 2102976 kB
        DesMem: 2048 MB
    Entitlement Information:
        entitled_memory: 67108864
        mapped_entitled_memory: 4792320
        entitled_memory_group_number: 32773
        entitled_memory_pool_number: 0
        entitled_memory_weight: 128
        entitled_memory_pool_size: 3221225472
        entitled_memory_loan_request: -36864
        backing_memory: 1001492480
        cmo_enabled: 1
        cmo_faults: 5930
        cmo_fault_time_usec: 8074654
        cmo_primary_psp: 1
        cmo_secondary_psp: 65535
    CMM Statistics:
        disable: 0
        debug: 0
        min_mem_mb: 256
        oom_kb: 1024
        delay: 1
        loaned_kb: 1119232
        loaned_target_kb: 1119232
        oom_freed_kb: 0
    VIO Bus Statistics:
        cmo_entitled: 67108864
        cmo_reserve_size: 7927808
        cmo_excess_size: 59181056
        cmo_excess_free: 59181056
        cmo_spare: 1562624
        cmo_min: 4687872
        cmo_desired: 7927808
        cmo_curr: 5353472
        cmo_high: 5357568
    VIO Device Statistics:
        v-scsi@30000002:
            cmo_desired: 2125824
            cmo_entitled: 2125824
            cmo_allocated: 1114112
            cmo_allocs_failed: 0
        l-lan@30000004:
            cmo_desired: 4239360
            cmo_entitled: 4239360
            cmo_allocated: 4239360
            cmo_allocs_failed: 0



The amsstat output is broken into several sections. All values are measured in bytes unless otherwise noted. Descriptions of some of the most useful fields follow:

  • System Memory Statistics Section
    • MemTotal — The current memory allocation assigned to this partition by the IBM PowerVM hypervisor. This total will fluctuate over time as memory usage is balanced between partitions configured to share the memory pool.
    • DesMem — The desired memory requested during the configuration of this partition.
  • Entitlement Information Section
    • entitled_memory — The amount of memory given to Linux that is guaranteed to be available for mapping I/O operations.
    • mapped_entitled_memory — The number of bytes of entitled_memory currently mapped for I/O operations.
    • cmo_enabled — Set to 1 when the partition is running in shared memory mode.
    • cmo_faults — A cmo fault occurs when an attempt is made to access a page of memory and the hypervisor must suspend the partition and request data from disk. This value is a sum of the number of faults since the operating system was booted. Increases in this value indicate contention for memory between partitions in the AMS environment.
    • cmo_fault_time_usec — The running sum of the amount of time (in microseconds), since boot, that Linux has been suspended by the hypervisor to handle cmo_faults.
  • CMM Statistics Section
    • disable — Set to 0 if CMM is enabled (the default system configuration). If set to 1, no loaning of pages to the hypervisor occurs and any memory shortfall results in use of the hypervisor's paging space.
    • min_mem_mb — The minimum amount of memory that will always be allocated to the partition.
    • loaned_kb — The amount of memory (in KB) the Linux kernel has loaned back to the hypervisor.
    • loaned_target_kb — The amount of memory (in KB), relative to the configured DesMem, that the IBM PowerVM hypervisor would like the Linux kernel to loan back to the shared pool.
  • Bus Statistics Section

    This section provides statistics for the VIO bus and all devices attached to it. The VIO Bus manages the operating system's entitled memory for devices which may perform DMA operations.
  • Device Statistics Section

    This section contains information for each device on the VIO bus that can map I/O memory.
    • cmo_allocs_failed — When the amount of memory allocated (cmo_allocated) has exhausted both the entitled memory (cmo_entitled) and the bus's excess pool, memory mapping failures occur.

      For each failed attempt, the value displayed here increases by 1. Large changes in this value indicate resource contention that may require system tuning through the HMC.
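amsstat aggregates these counters from sysfs, so they can also be read directly, for example to check every VIO device for mapping failures. A minimal sketch (the sysfs layout under /sys/bus/vio is an assumption for your kernel level):

    # List cmo_allocs_failed for every device on the VIO bus.
    for dev in /sys/bus/vio/devices/*; do
        [ -f "$dev/cmo_allocs_failed" ] || continue
        echo "$(basename $dev): $(cat $dev/cmo_allocs_failed)"
    done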

amsvis

A graphical tool that provides an at-a-glance view of AMS statistics in a Linux environment.

Viewing a single system: (screenshot not reproduced here)

Viewing multiple systems: (screenshot not reproduced here)

Additional details for both of these tools are available in the manual pages included in the powerpc-utils-papr package.


Questions and Discussion

AMS FAQs

1. Under Linux, how can I tell how much logical memory has been configured for the partition?

A. The DesMem field reported by amsstat contains the amount of memory configured in the partition's profile through the HMC.



2. How can I tell how much physical memory has been assigned to the partition by the PowerVM hypervisor?

A. The backing_memory field reported by amsstat contains the current amount of physical memory (in bytes) assigned to the partition by the PowerVM hypervisor. This value should also be very close to MemTotal value reported in /proc/meminfo.
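A minimal sketch comparing the two values (field names as shown in lparcfg and /proc/meminfo above; backing_memory is reported in bytes):

    # Compare hypervisor-assigned physical memory with MemTotal.
    backing=$(awk -F= '/backing_memory/ {print $2}' /proc/ppc64/lparcfg)
    memtotal=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
    echo "backing_memory: $(( backing / 1024 )) kB   MemTotal: ${memtotal} kB"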



3. What do the loaned_target_kb and loaned_kb values represent in the amsstat output?

A. Based on overall memory usage conditions, the PowerVM hypervisor may request a partition to loan back a target amount of physical memory. This value is contained in the loaned_target_kb field reported by amsstat. The SLES11 kernel will always work aggressively to meet this loan request. The amount of memory actually loaned by the SLES11 kernel back to the PowerVM hypervisor is reflected in the loaned_kb value reported by amsstat. As memory is loaned back to the PowerVM hypervisor, the MemTotal value in /proc/meminfo and the backing_memory value reported by amsstat will be updated to reflect this.
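Loaning activity can also be watched directly. A minimal sketch, assuming the sysfs path used by the cmm kernel module (confirm the path on your kernel level):

    # Watch loaned memory versus the hypervisor's loan target (both in KB).
    watch -n 2 'cat /sys/devices/system/cmm/cmm0/loaned_kb /sys/devices/system/cmm/cmm0/loaned_target_kb'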



4. What happens if the loaned_target_kb cannot be met by the Linux kernel?

A. In cases where the total active memory of all partitions exceeds the physical size of the memory pool, hypervisor paging (or swapping) occurs. This is harder to reach with SLES11 partitions alone if the pool is sized properly to avoid physical over-commitment. However, AIX partitions using the shared memory pool could be operating with a less aggressive loaning policy, and an over-commitment of the physical memory could occur. When the shared memory pool is physically over-committed, memory is stolen by the PowerVM hypervisor through paging. When a partition accesses memory that has been paged out to the PowerVM hypervisor paging storage pool, a fault occurs and is handled by the hypervisor.

Despite this aggressiveness, there are conditions where Linux will not loan memory to satisfy the PowerVM hypervisor's loaned_target_kb. When the Out-Of-Memory (OOM) threshold is reached, the Linux kernel reacts by reclaiming loaned memory and not honoring all or part of additional memory loan requests made by the PowerVM hypervisor. Additionally, Linux will not loan below 256MB (or 512MB if kdump is configured, with 256MB reserved for the kdump kernel). In a constrained environment these two factors might also lead to physical over-commitment of memory and hypervisor paging.
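The 256MB floor corresponds to the CMM module's min_mem_mb value, also visible in the amsstat CMM Statistics section above. Assuming the standard module parameter path, it can be inspected directly:

    # Inspect the CMM loaning floor (in MB); 256 is the default.
    cat /sys/module/cmm/parameters/min_mem_mb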



5. Where can I find out more information on the fields displayed by the amsstat utility?

A. The man page for the amsstat command provides complete documentation for all of the fields relevant to AMS (invocation: man amsstat).

 

For discussions or questions...

To start a Linux discussion or get a question answered, consider posting to the Linux for Power Architecture forum.

To get questions answered on PowerVM or the Active Memory Sharing implementation, consider posting on the PowerVM forum.