Memory pools

When you use a workload management (WLM) policy that classifies work into a WLM resource group with a real storage memory limit, you can explicitly restrict the physical real memory consumption of work that is running in concurrent address spaces. An address space that is associated with the resource group through classification connects to the resource group's memory pool. All real storage that is used to manage the address space and any data space or hiperspace that the address space creates, including the storage that backs its virtual storage, is counted toward the collective pool limit. However, auxiliary storage resources that are used for virtual storage paging continue to be shared by all address spaces in the system.

A memory pool does not reserve real storage frames for its exclusive use. Even when unused frames are available in the system, the collection of address spaces is restricted to using no more than the real memory limit of the memory pool. When a memory pool gets near its limit, the system starts to page out pages from the memory pool to free frames. It also reduces any real storage that is kept for performance-only reasons. The memory pool protects physical memory allocation from other units of work that are running on the system. If the system gets low on real storage, it initiates paging from any address space, regardless of whether the address space is in a memory pool. Address spaces that are not classified to a memory pool connect to the global pool. Common storage and shared storage are always backed by frames from the global pool.

Virtual storage and real storage are limited by different controls. The amount of virtual storage that an address space can allocate (obtain), but not necessarily reference, is controlled by limits that are not defined through WLM resource groups. Virtual storage limits are defined only on an address space basis, a level at which real storage limits cannot be controlled. Real storage limits can be defined only at the memory pool level, which applies to the collective address spaces in the pool.

The amount of virtual storage that an address space can obtain is limited by the following controls:
  • Region size for 31-bit storage
  • Memory limit (MEMLIMIT) for above the bar storage
  • Program Properties Table (PPT)
  • Installation exits
  • Data space and hiperspace region limits
For more information about virtual storage, see Virtual storage overview.

Allocated and referenced virtual storage can be either backed by real storage or paged out to auxiliary storage. More virtual storage can be allocated than is referenced, but no more storage can be referenced than is allocated. The real storage that backs the virtual storage of an address space can therefore never exceed its virtual storage limit. However, the real storage that is used for dynamic address translation (DAT) tables and other system-related control blocks that manage virtual storage and virtual I/O (VIO) is also counted as real storage that is used by the address space, as is storage that the system uses for performance reasons. This additional real storage makes it possible for real storage usage to be higher than the virtual storage limits and higher than what is allocated. In addition, all real storage that is used to manage hiperspaces, data spaces, and common area data spaces is associated with the address space that creates them.
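The accounting described above can be illustrated with a small arithmetic sketch. It is not how the system tracks storage internally: the class name, field names, and all sizes are hypothetical and chosen only to show why the real storage that is charged to an address space can be larger than its virtual storage limit.

```python
from dataclasses import dataclass


@dataclass
class AddressSpaceUsage:
    """Hypothetical real storage accounting for one address space, in MB."""

    backed_virtual_mb: int  # real storage that backs referenced virtual storage
    dat_tables_mb: int      # DAT tables and other system control blocks
    performance_mb: int     # storage the system keeps for performance reasons

    def real_storage_mb(self) -> int:
        # All three components are charged to the address space,
        # and therefore to its memory pool.
        return self.backed_virtual_mb + self.dat_tables_mb + self.performance_mb


def exceeds_virtual_limit(usage: AddressSpaceUsage, virtual_limit_mb: int) -> bool:
    """Return True when the charged real storage is above the virtual limit."""
    return usage.real_storage_mb() > virtual_limit_mb


# Illustrative numbers only: 1000 MB of backed virtual storage stays within a
# 1024 MB virtual limit, yet the charged real storage is 1050 MB.
example = AddressSpaceUsage(backed_virtual_mb=1000, dat_tables_mb=40, performance_mb=10)
```

In this sketch, exceeds_virtual_limit(example, 1024) is true even though the backed virtual storage alone is below the limit.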

An allocation service, such as STORAGE OBTAIN, GETMAIN, or IARV64, can reject a request that would cause the virtual storage limit of the associated address space to be exceeded. For real storage, the system derives various memory pool thresholds, starting with the OK threshold, from the memory limit of the memory pool's resource group and uses them to control system behavior. The threshold attributes are defined in Table 1.
Table 1. Threshold limits in ascending order
Threshold name | Threshold's main attributes
OK | Within normal limits.
Steal | The system initiates paging in an attempt to return to the OK threshold.
High | Selective real storage requesters are suspended until real storage usage falls below the high threshold. When the pool remains above the high threshold for the system-defined period, pool members are subject to abend X'E22'.
Maximum | The memory pool limit that is defined by the WLM policy.
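
The ordering of the thresholds in Table 1 can be sketched as a simple classification function. This is only an illustration: the system derives the real threshold values itself and this topic does not state them, so the two percentages below are assumptions, as are the function and constant names.

```python
from enum import IntEnum


class Threshold(IntEnum):
    """Threshold names from Table 1, in ascending order."""

    OK = 0
    STEAL = 1
    HIGH = 2
    MAXIMUM = 3


# Assumed percentages of the pool limit. The actual values are derived by the
# system and are not documented in this topic.
ASSUMED_STEAL_PERCENT = 80
ASSUMED_HIGH_PERCENT = 95


def classify(used_frames: int, limit_frames: int) -> Threshold:
    """Return the highest threshold that the pool usage has reached."""
    if limit_frames <= 0:
        raise ValueError("limit_frames must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    if used_frames >= limit_frames:
        return Threshold.MAXIMUM
    if used_frames * 100 >= ASSUMED_HIGH_PERCENT * limit_frames:
        return Threshold.HIGH
    if used_frames * 100 >= ASSUMED_STEAL_PERCENT * limit_frames:
        return Threshold.STEAL
    return Threshold.OK
```

With these assumed percentages, a pool that uses 80 of 100 frames is classified as Steal, and a pool that uses 95 of 100 frames is classified as High.
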
When the memory pool is over its high threshold, system services might reject fixed storage allocation and page fix requests, in the same way that they reject requests that exceed virtual storage limits. The system might also take the following actions.
  • Suspend users of services that require real storage and keep services suspended until the memory pool falls below its high threshold. For example, a unit of work that takes an enabled page fault is suspended when the memory pool is above the high threshold. When the pool falls below the high threshold, the unit of work resumes.
  • Allow others to temporarily exceed the high threshold, even when the pool is above the memory pool limit. For example, a unit of work that runs with a program status word (PSW) that is disabled for interrupts and that page fixes a virtual page can exceed the limit because it cannot be suspended.
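
The two actions above amount to a decision that depends on whether the requester can be suspended. The following sketch shows that decision under stated assumptions: the flags follow the requester types that this topic lists under the high threshold actions, while the class, function, and return values are hypothetical and do not correspond to any system interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requester:
    """Hypothetical description of a unit of work that needs real storage."""

    srb_mode: bool = False            # operating in SRB mode
    dump_processing: bool = False     # running on behalf of DUMP processing
    disabled: bool = False            # PSW disabled for interrupts
    holds_suspend_lock: bool = False  # for example, holding a local lock
    terminating: bool = False         # going through termination

    def can_be_suspended(self) -> bool:
        return not (
            self.srb_mode
            or self.dump_processing
            or self.disabled
            or self.holds_suspend_lock
            or self.terminating
        )


def handle_real_storage_request(pool_above_high: bool, requester: Requester) -> str:
    """Return the illustrative action for one real storage request."""
    if not pool_above_high:
        return "grant"
    if requester.can_be_suspended():
        # Suspended until the pool falls below its high threshold.
        return "suspend"
    # The requester cannot be suspended, so usage is allowed to grow.
    return "grant-over-threshold"
```

For example, an enabled task that takes a page fault while the pool is above the high threshold maps to "suspend", and a disabled requester maps to "grant-over-threshold".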

To contain a pool within its memory pool limit and reduce the need for suspending requesters, the system begins paging the storage of memory pool member address spaces when the memory pool is at its steal threshold. Suspended units of work are resumed when the memory pool is below its high threshold. The system issues messages at different thresholds and might end the current job step of any pool member if the memory pool remains above the high threshold for the system-defined time frame.

The following information summarizes the types of thresholds and actions.

When at the steal threshold
  • The system issues message IAR055I to indicate that paging of memory pool members is started.
  • The system starts to page out backed virtual storage from any number of members of the memory pool to bring the memory pool's overall consumption down below the OK threshold. Only pageable virtual storage, that is, storage that is neither fixed nor disabled reference (DREF), can be paged out.
When at or above the high threshold
  • Message IAR052E is issued when the high threshold is reached. The system continues paging to reduce the memory pool real storage usage to the OK threshold, and then message IAR052E is deleted.
  • Task processing that requires the system to back referenced virtual memory with real memory is suspended and not resumed until the memory pool falls below the high threshold. The suspension is done to prevent the pool from going over its limit. Requests can be in the form of page faults, STORAGE OBTAIN, GETMAIN, IARV64, and others. To accommodate system requesters that must not be suspended because of memory pool limits, the system allows certain requesters to increase the real storage usage above any threshold. The requesters include the following types.
    • Units of work that are:
      • Operating in service request block (SRB) mode
      • Running on behalf of DUMP processing
      • Disabled (and cannot be suspended)
      • Holding a system suspend lock (such as a local lock)
      • Going through termination
    • Other cases when the system code must obtain real storage.
  • When real storage usage does not decrease below the high threshold within the system-defined time frame, the system action depends on the initial reason for reaching the high threshold.
    • The high threshold was not reached because of reclassification of an address space into the memory pool or a reduction in the pool size.
      • The system might abnormally end the currently running job step of one or more address spaces that are classified to the memory pool. The job step ends with an X'E22' abend code, which might or might not terminate the job step. Real storage usage can decrease either when the system pages storage or when jobs within the memory pool release storage (including fixed storage that the system cannot page).
    • The high threshold was reached because of reclassification of an address space into the memory pool or a reduction in the pool size.
      • The system issues message IAR058E to indicate that the memory pool is over the limit and that the system was not able to reduce real storage usage below the high threshold within the system-defined time frame. Unless a new steal reason is triggered, the system stops stealing pages from the memory pool. The memory pool is allowed to persist over the high threshold. However, you must take the appropriate action to either increase the size of the memory pool or remove members from the pool. For more information about the appropriate actions, see message IAR058E in z/OS MVS System Messages, Vol 6 (GOS-IEA).
When back at or below the OK threshold
  • Message IAR054I is issued indicating that the memory pool is no longer approaching the memory pool limit.
  • Any outstanding related memory pool messages are deleted.
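
The message sequence above lends itself to simple automation. The following sketch tracks the most recent pool state from a list of console lines. The message identifiers and their meanings come from this topic; the line format (message identifier as a whitespace-separated token), the state names, and the function names are assumptions for illustration only.

```python
# Message identifiers from this topic, mapped to hypothetical state names.
MESSAGE_STATE = {
    "IAR055I": "steal",       # paging of memory pool members has started
    "IAR052E": "high",        # the high threshold is reached
    "IAR058E": "over-limit",  # the pool persists over the high threshold
    "IAR054I": "ok",          # the pool is no longer approaching its limit
}


def pool_state_from_messages(lines, initial="ok"):
    """Return the state implied by the last memory pool message that is seen."""
    state = initial
    for line in lines:
        for token in line.split():
            if token in MESSAGE_STATE:
                state = MESSAGE_STATE[token]
                break
    return state


def needs_attention(state):
    """Return True for states in which work can be suspended or ended."""
    return state in ("high", "over-limit")
```

An automation routine might raise an alert whenever needs_attention returns true and clear it again when the state returns to "ok".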

For more information about all IAR messages, see the IAR messages in z/OS MVS System Messages, Vol 6 (GOS-IEA).

Things to consider when your installation uses memory pools

  • Understand how workloads behave in a memory pool. For more information, see z/OS MVS Planning: Workload Management.
  • IBM® suggests that you use memory pools when you need to limit memory consumption for workloads. For example, Apache Spark provides guidance about how to operate workloads in a memory pool. For more information, see Configuring z/OS workload management for Apache Spark in IBM Knowledge Center.
  • The WLM policy is sysplex wide, which results in resource groups that exist on each system in the sysplex, yet memory pools are managed on a per system basis. Therefore, a 100 GB memory pool in a two-way sysplex might use nearly 200 GB of real storage, but only 100 GB per system. For the resource group memory pool definition and policy scope, review z/OS MVS Planning: Workload Management.
  • Understand the difference between virtual storage and real storage limits. For more information, see Virtual storage overview.
  • Paging to auxiliary storage

    The system can start paging for one or more memory pools and for all or some of the pool members simultaneously. The system does not differentiate between auxiliary storage resources that are used for memory pools and those that are used for the global pool. As such, reevaluate your real memory to auxiliary storage size requirements to ensure that auxiliary storage resources meet your paging needs.

    Understand memory pool storage requirements and appropriate classification rules for a workload before you run the workload in production. Review the related product documentation for guidance on memory pool usage. For example, putting all elements of a workload in the same memory pool might not be correct for that workload.

    Memory pool paging uses system resources, which might cause a performance impact on work that runs outside the memory pool. Evaluate frequent memory pool paging in the overall system context. Understand whether paging is acceptable always, sometimes, for a long period, at specific times, or never, and when you must intercede if an unacceptable condition occurs. For example, consider system automation that monitors memory pool-related paging messages and changes in paging, and raises alerts.

  • Consider automation that can increase the memory pool size to prevent unnecessary processing delays or termination. See Paging to auxiliary storage for paging considerations and automation.
  • Prevent contention for resources that are shared outside memory pool boundaries. Because a memory pool caps the amount of real storage that its members can use, execution threads or work units might be suspended when limits are reached. Resources must therefore not be shared across memory pool boundaries in a way that might block others outside the pool, including work in other memory pools and work that runs under the global pool. Doing so can result in unnecessary delays. For example, a memory pool member must not get exclusive ownership of a data set or file system that is used by a unit of work that is running outside its memory pool.
  • In general, an address space that runs a function that the installation considers important is not to be capped and must not be classified to a resource group with a memory limit.
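
The sysplex-wide consideration in this list reduces to simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that the resource group has members that run on every system that is counted.

```python
def sysplex_real_storage_ceiling_gb(pool_limit_gb: int, systems: int) -> int:
    """Upper bound on real storage that one memory pool definition can use
    across a sysplex, given that the limit applies separately on each system."""
    if pool_limit_gb < 0 or systems < 1:
        raise ValueError("pool_limit_gb must be >= 0 and systems must be >= 1")
    return pool_limit_gb * systems


# The example from this topic: a 100 GB memory pool in a two-way sysplex.
two_way_ceiling_gb = sysplex_real_storage_ceiling_gb(100, 2)
```

The result for the two-way example is 200 GB in total, with no more than 100 GB on either system.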

Reducing memory pool thresholds and reclassifying pool members

Consider the following points when you want to reduce the memory limit of a resource group, reclassify address spaces from a memory pool to the global pool, or reclassify address spaces from the global pool to a memory pool.

These actions can cause the memory pool to reach thresholds that have negative effects on what is running in the memory pool. The effects of reaching the various thresholds can vary by what is running in the memory pool and how the program reacts to the limitation. Consider the following effects:
  • If the steal threshold is reached, the system starts to steal frames, which causes paging in an attempt to enforce the memory pool threshold.
    • Long-term paging can have negative effects on other programs that are running outside the memory pool. For more information, see Paging to auxiliary storage in Things to consider when your installation uses memory pools.
    • If the system cannot reduce the memory pool real storage usage to an acceptable level, message IAR058E is issued. Message IAR058E indicates that the pool is over its limit, the system cannot reduce it, and your installation must take further action. If reclassification or reduction in the memory pool caused the exceeded limit, the system cannot terminate any address spaces in the memory pool. For more information, see message IAR058E in z/OS MVS System Messages, Vol 6 (GOS-IEA).
      The system cannot reduce real storage usage in this case because, with the previously larger pool or before the newly reclassified address spaces joined the pool, the collective memory pool address spaces were able to fix more storage than the memory pool now allows. Because the system cannot page fixed storage, it cannot reduce the current usage. Your installation must correct the condition.
      Note: If the high threshold is reached, units of work in the address space can be suspended until the condition is fixed.
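
The reasoning about fixed storage can be expressed as a check that an installation might make before it reduces a memory limit or reclassifies address spaces. This is a minimal sketch under stated assumptions: the function names and the frame counts that are passed in are hypothetical, and how an installation obtains those counts is outside the scope of this topic.

```python
def lowest_usage_by_paging(fixed_frames: int, dref_frames: int = 0) -> int:
    """Fixed and DREF storage cannot be paged out, so together they form
    the floor that paging alone can reach."""
    return fixed_frames + dref_frames


def reduction_is_enforceable(
    new_high_threshold_frames: int, fixed_frames: int, dref_frames: int = 0
) -> bool:
    """Return True when paging alone could bring usage to the new threshold."""
    return lowest_usage_by_paging(fixed_frames, dref_frames) <= new_high_threshold_frames
```

When the check returns false, paging cannot bring the pool below the new threshold, which corresponds to the condition that message IAR058E reports.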