Operating system partitioning and virtualization on Oracle Solaris and IBM AIX

This topic describes how LSF works in an OS partitioning and virtualization environment, focusing on Oracle Solaris containers and IBM AIX partitions.

Summary

Solaris
In a non-global Solaris container, use a dedicated fixed number of CPUs.
AIX
On IBM AIX, use logical partitions (LPARs) and dynamic logical partitions (DLPARs) with dedicated processors. Note the following points:
  • With LPARs, DLPARs, and micropartitions, all LSF daemons can run, and LSF can dispatch and run jobs on the partitions just as on a physical host.
  • LSF cannot run on WPARs.
  • Regarding CPU utilization reporting:
    1. In micro-partitions with shared processors (capped or uncapped), LSF reports CPU utilization correctly. The CPU utilization is calibrated relative to the entitled capacity of the node.
    2. On LPARs and DLPARs with dedicated processors, LSF works well and reports CPU utilization correctly. LSF also reports the number of processors correctly after CPU allocation is changed dynamically on DLPARs.

Solaris 10 containers

In a global zone of Solaris 10, LSF works well without any known issues.

In a non-global zone of Solaris 10, LSF has the following known limitations:

Privileges
The root user in a non-global zone of a Solaris container does not have all the privileges that the root user in the global zone has. For LSF to work in a non-global zone, you must ensure that all required privileges are configured for the root user. For example, make sure that the proc_priocntl privilege is assigned to the root user of the non-global zone by using the zonecfg command.
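A minimal zonecfg session for this might look like the following sketch, where the non-global zone name lsfzone is a placeholder:

  # From the global zone, add proc_priocntl to the zone's privilege set
  zonecfg -z lsfzone
  zonecfg:lsfzone> set limitpriv="default,proc_priocntl"
  zonecfg:lsfzone> commit
  zonecfg:lsfzone> exit

The new privilege set takes effect the next time the zone boots.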
Resource management
When all required privileges are available to the root user, all LSF daemons can run, and LSF can dispatch and run jobs in a non-global zone just as on a physical host. However, the number of CPUs and the CPU utilization that LSF detects might not be correct for a non-global zone, depending on which resource management mechanism is configured. This might interfere with LSF CPU-based scheduling.

The resource management features that are provided by Solaris 10 include a combination of Fair Share Scheduler, CPU caps, dedicated CPUs, and resource pools.

In a non-global Solaris container, use a dedicated fixed number of CPUs.

Dedicated CPUs
The dedicated-cpu resource specifies that a subset of the system's processors is dedicated to a non-global zone while it is running. When the zone boots, the system dynamically creates a temporary pool for use while the zone is running. You can specify either a fixed number of dedicated CPUs or a range, such as 2-4 CPUs.

When dedicated CPUs are configured with a fixed number of CPUs (that is, the minimum number of CPUs is equal to the maximum), LSF works well and correctly reports the number of CPUs and the CPU usage in the non-global container.

If a range is specified for the number of CPUs (that is, the minimum number of CPUs does not equal the maximum), LSF does not accurately report the number of CPUs and CPU usage when CPUs are added or removed from the container.
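For illustration, the following zonecfg sketch (again assuming a zone named lsfzone) configures a fixed allocation of 4 dedicated CPUs, which is the recommended configuration. Setting ncpus to a range such as 2-4 instead would trigger the limitation described above.

  zonecfg -z lsfzone
  zonecfg:lsfzone> add dedicated-cpu
  zonecfg:lsfzone:dedicated-cpu> set ncpus=4
  zonecfg:lsfzone:dedicated-cpu> end
  zonecfg:lsfzone> commit
  zonecfg:lsfzone> exit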

Capped CPUs
The capped-cpu resource provides an absolute, fine-grained limit (up to two decimal places) on the amount of CPU resources that a zone can consume.
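For reference, a cap of this kind is typically configured with zonecfg. The following is a sketch only, assuming a zone named z1 and a cap of 0.8 CPUs as in the example later in this section:

  zonecfg -z z1
  zonecfg:z1> add capped-cpu
  zonecfg:z1:capped-cpu> set ncpus=0.8
  zonecfg:z1:capped-cpu> end
  zonecfg:z1> commit
  zonecfg:z1> exit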

CPU capping for Solaris containers, by design, makes all CPUs in the global zone visible to each of the individual containers. LSF detects the number of CPUs for the container to be the total number of CPUs on the physical host. The CPU utilization that LSF reports is always consistent with mpstat output, which is the host/global-level metric for CPU-capped Solaris containers. No Solaris commands are available that show the CPU utilization specific to a single container when CPU capping is in effect.

For example, a physical host has 2 CPUs and two Solaris containers (z1 and z2), each with a CPU cap of 0.8. If z1 has a job running that uses all of the CPU available to that container (0.8) and z2 has no jobs running, the CPU utilization seen from both z1 and z2 is the host-level value, 40%. This value is what mpstat reports on both z1 and z2, and LSF reports the same number. In this case, all else being equal, LSF has an equal probability of dispatching new jobs to z1 and z2, even though z1 has no capacity left and z2 has its full capacity available. This can be mitigated by the job slots that are configured for z1 and z2: if both are manually configured with one job slot, the slot on z1 is used and the slot on z2 is available, so LSF dispatches the new job to z2. In general, however, CPU-based scheduling in LSF does not work in this environment. Do not use CPU capping for Solaris containers with LSF because they are conflicting workload management tools.

Fair Share Scheduler
Oracle Solaris Fair Share Scheduler can be used to control the allocation of available CPU resources among zones, based on their importance. This importance is expressed by the number of shares of CPU resources that is assigned to each zone.

When Fair Share Scheduler is configured, the case is similar to capped CPUs for LSF. All CPUs are visible to the sharing zones and global-level CPU utilization is reported in the zones. Therefore, CPU-based scheduling for LSF does not work in this environment.
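For reference, CPU shares are typically assigned to a zone through the cpu-shares property, with FSS made the default scheduling class on the host. The following is a sketch only; the zone name z1 and the share value are placeholders:

  # In the global zone: make FSS the default scheduling class (takes effect at the next boot)
  dispadmin -d FSS
  # Assign CPU shares to the zone
  zonecfg -z z1
  zonecfg:z1> set cpu-shares=10
  zonecfg:z1> commit
  zonecfg:z1> exit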

Resource pool
Resource pools can be configured to include a processor set with an optional scheduling class. Dynamic resource pools provide a mechanism for dynamically adjusting each pool's resource allocation in response to system events and application load changes. A zone can then be associated with a configured resource pool.

Though not tested, if a non-global zone is associated with a static resource pool whose processor set has a fixed number of CPUs, LSF is expected to work well in this zone. In all other cases, CPU-based scheduling for LSF might not work well.
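A static configuration of this kind would typically resemble the following sketch, which creates a processor set with a fixed size of 4 CPUs, binds it to a pool, and associates a zone with that pool. The pool, processor-set, and zone names are placeholders:

  pooladm -e          # enable the resource pools facility
  poolcfg -c 'create pset lsf_pset (uint pset.min = 4; uint pset.max = 4)'
  poolcfg -c 'create pool lsf_pool'
  poolcfg -c 'associate pool lsf_pool (pset lsf_pset)'
  pooladm -c          # activate the edited configuration
  zonecfg -z lsfzone
  zonecfg:lsfzone> set pool=lsf_pool
  zonecfg:lsfzone> commit
  zonecfg:lsfzone> exit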

AIX partitions

There are several types of AIX partitions.

Logical partition (LPAR)
Logical partitions were introduced with AIX 5.1 and POWER4 technology. Multiple operating systems, including AIX and Linux, can run in separate partitions on the same system without interfering with each other. However, a reboot is required to move resources between LPARs.
Dynamic logical partition (DLPAR)
Dynamic logical partitions were introduced with AIX 5.2. They provide more flexibility because CPUs, I/O adapters, and memory can be moved between partitions dynamically, without rebooting the LPARs.
Micropartition
Micropartitions were introduced with AIX 5.3. Micro-Partitioning maps virtual processors to physical processors, and virtual processors, rather than physical processors, are assigned to the partitions. On IBM eServer p5 servers, a partition can be configured with either dedicated processors or shared processors (a micropartition).
Workload partition (WPAR)
Workload partitions were introduced with AIX 6.1. A WPAR is a purely software-based partitioning solution that is provided by the operating system. WPARs partition a single AIX operating system instance into multiple environments, each of which is a WPAR. WPARs can be created within multiple AIX instances of the same physical server, whether those instances run in dedicated LPARs or in micropartitions. A powerful feature of WPARs is WPAR mobility with checkpoint/restart.

Two types of WPAR are available: Application WPAR and System WPAR.

An application WPAR is a lightweight environment that is suitable for running one or more processes. It is transient and has the same lifetime as the application.

A system WPAR is almost a full AIX environment. Each system WPAR has dedicated writable file systems, although it can share the global environment's /usr and /opt file systems in read-only mode. When a system WPAR is started, an init process is created for the WPAR, which in turn spawns other processes and daemons.

LSF on AIX partitions

With LPARs, DLPARs, and micropartitions, all LSF daemons can run, and LSF can dispatch and run jobs on the partitions just as on a physical host.

LSF cannot run on WPARs.

LPARs with dedicated processors

On LPARs with dedicated processors (whether or not the system supports micropartitioning), LSF works well and correctly reports CPU utilization.

DLPARs with dedicated processors

On DLPARs with dedicated processors (whether or not the system supports micropartitioning), LSF works well and correctly reports CPU utilization. LSF also correctly reports the number of processors when CPU allocation is changed dynamically on DLPARs.
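As an illustration, dedicated processors are usually added to or removed from a running DLPAR from the Hardware Management Console (HMC) command line. The following sketch adds one dedicated processor; the managed system name (p5_system) and partition name (lsf_lpar) are placeholders, and the exact options can vary by HMC version:

  chhwres -r proc -m p5_system -o a -p lsf_lpar --procs 1

After such a change, LSF on the partition reports the new processor count.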

Micropartitions

In a micropartition that is configured with shared processors, LSF detects the number of CPUs to be the same as the number of configured virtual processors, and it reports CPU utilization independently for each partition.

LSF reports CPU utilization correctly. Note that the CPU utilization is calibrated relative to the entitled capacity of the node. For example, if a node is consuming 1.5 times its entitled capacity and the CPU utilization from the mpstat output is 60%, LSF reports the ut load index as 90%.
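To see the entitlement that this calibration is based on, you can check the partition's entitled capacity and physical processor consumption with lparstat. The following is a sketch, and the exact output fields vary by AIX level:

  lparstat -i | grep -i 'Entitled Capacity'   # the partition's entitled capacity
  lparstat 1 1                                # the physc column shows physical processors consumed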

WPARs

Application WPARs are transient and are not suitable for server applications like LSF daemons.

LSF cannot run on system WPARs because LSF must have access to devices such as /dev/mem and /dev/kmem, which are typically not available in a WPAR.