Control groups (cgroups) for limiting resource usage on Linux

Before you set limits on memory or CPU usage on Linux®, you must install a control group (cgroup) on each compute host. A cgroup is a Linux kernel feature that allows hierarchical management and allocation of system resources (for example, CPU, memory, and disk I/O) for service instance (SI) groups. For more information about cgroups, refer to your Linux kernel documentation.

IBM® Spectrum Symphony uses cgroups to limit the memory or CPU usage on the compute hosts. Additionally, each system resource (such as memory or CPU) is considered a subsystem. A hierarchy is created for one or more subsystems. Each root directory of the hierarchy (the root cgroup), and all its subdirectories (child cgroups), contain their own configuration files that define limitations for operating system resource usage.

A process can be attached to the root cgroup or one of the child cgroups so that the process is subject to the usage limitations for an operating system resource based on the cgroup configuration files.

When using cgroups, note the following:
  • Unmount the memory root cgroup on each compute host when a cluster is uninstalled or stopped.
  • If a service instance manager (SIM) is disabled due to an abnormal action, the cgroup directories remain. The cgroup directories exist only in memory, not on disk, and are cleaned up automatically after a system restart.

Subsystems

The subsystem parameters are a key component of cgroup resource controls: they set limits, restrict access, or define allocations for each subsystem. To control memory and CPU usage when IBM Spectrum Symphony services run on compute hosts, be familiar with two cgroup subsystems: memory and CPU. IBM Spectrum Symphony uses the memory cgroup parameters and the cpu cgroup parameters.

A tasks file keeps track of processes associated with the hierarchy that are governed by the parameter settings. The tasks file contains all the process IDs (PIDs) assigned to the cgroup.
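As a minimal sketch of how a tasks file can be read, the following Python function parses the file contents into a list of PIDs. It assumes the common layout of one PID per line; the function name read_task_pids is hypothetical and is not part of IBM Spectrum Symphony.

```python
from typing import List


def read_task_pids(tasks_text: str) -> List[int]:
    """Parse the text of a cgroup tasks file into a list of PIDs.

    Assumes one decimal PID per line; blank lines are ignored.
    """
    return [int(line) for line in tasks_text.splitlines() if line.strip()]
```

On a compute host, the text would typically come from reading the tasks file under the cgroup directory of interest.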

Memory subsystem
The memory subsystem of the cgroups feature isolates the memory behavior of a group of processes (tasks) from the rest of the system. It reports on memory resources used by the processes in a cgroup, and sets limits on memory used by those processes.
IBM Spectrum Symphony uses the following memory cgroup parameters:
memory.memsw.limit_in_bytes
Stores the value of the virtualMemoryLimit parameter, which is configured in the application profile or client API.
memory.limit_in_bytes
Stores the value of the virtualMemoryLimit and physicalMemoryLimit parameters, which are configured in the application profile or client API.
CPU subsystem
The CPU subsystem of the cgroups feature isolates the CPU time consumption of a group of processes (tasks) from the rest of the system. It reports on CPU usage by the processes in a cgroup, and sets limits on the number of CPUs used by those processes.
IBM Spectrum Symphony uses the following cpu cgroup parameters:
cpu.cfs_period_us
Specifies a period of time, in microseconds, for how regularly a cgroup's access to the CPU resources should be reallocated. Valid values range from 1000 microseconds to 1 second.
cpu.cfs_quota_us
Specifies the total amount of time, in microseconds, for which all tasks in a cgroup can run during one period (as defined by cpu.cfs_period_us). As soon as tasks in a cgroup use up all the time specified by the quota, they are throttled for the remainder of the time specified by the period and not allowed to run until the next period.
Together, the cpu.cfs_period_us and cpu.cfs_quota_us parameters store the value of the cpuLimit parameter, which is configured either within the application profile, or during session creation by the client API. The cpuLimit parameter stores the number of cores on which an IBM Spectrum Symphony service is expected to run. If a service takes up to m cores when it runs, then the cpuLimit is defined as m cores per service. IBM Spectrum Symphony translates the cpuLimit value to cpu cgroup parameters using these formulas:
cpu.cfs_period_us = 100000 (0.1 second)
cpu.cfs_quota_us = m * cpu.cfs_period_us
where m is greater than or equal to 1, and defaults to 1. Valid values for cpuLimit range from 1 to 262144. The value can be changed dynamically as a multi-threaded SI receives more incoming parallel tasks. The following table outlines how the cpuLimit value translates to the cpu.cfs_period_us and cpu.cfs_quota_us cpu cgroup parameters:
cpuLimit    cpu.cfs_quota_us    cpu.cfs_period_us
1           100000              100000
2           200000              100000
3           300000              100000
m           m*100000            100000
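The translation from cpuLimit to the two cpu cgroup parameters can be sketched in Python as follows. This is only an illustration of the formulas given here, not the product's implementation; the names CFS_PERIOD_US and cpu_limit_to_cfs are hypothetical.

```python
from typing import Tuple

# Fixed period used in the formulas: 100000 microseconds (0.1 second).
CFS_PERIOD_US = 100_000


def cpu_limit_to_cfs(cpu_limit: int = 1) -> Tuple[int, int]:
    """Return (cpu.cfs_quota_us, cpu.cfs_period_us) for a cpuLimit of m cores.

    The accepted range of 1 to 262144 follows the documented valid values.
    """
    if not 1 <= cpu_limit <= 262144:
        raise ValueError("cpuLimit must be between 1 and 262144")
    return cpu_limit * CFS_PERIOD_US, CFS_PERIOD_US
```

For example, a cpuLimit of 2 gives a quota of 200000 microseconds in every 100000-microsecond period, which lets the tasks in the cgroup use up to two cores' worth of CPU time.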

Cgroup configuration

Configure cgroups using one of two modes:
  • Simplified workload execution mode (WEM), which requires you to manually set up and mount the cgroup, assign the SIM permission to create subdirectories for its SIs, and then run the supplied symcgroup.sh script file.
  • Advanced WEM, where pem automatically sets up and mounts the cgroup, and assigns the SIM permissions.
In simplified WEM, you run the script file called symcgroup.sh, which is included with IBM Spectrum Symphony. Use it to set up and configure the cgroups hierarchy on each compute host. The script file performs the following functions:
  1. Configures cgroups.

    The script detects the cgroup memory or CPU configuration on the host it is running on. If the memory or CPU subsystem has been attached to a hierarchy, the script generates the IBM Spectrum Symphony product hierarchy IBM_SYMPHONY_PRODUCT in the hierarchy where the subsystem is located. If the subsystem has not been attached, the script attaches it to the default hierarchy (/cgroup/memory or /cgroup/cpu), and generates the IBM_SYMPHONY_PRODUCT hierarchy at that location.

  2. Assigns the proper permissions to each cgroup directory.
    For each leaf consumer, the script creates a cgroup directory, as follows:
    • /cgroup/memory/IBM_SPECTRUM_SYMPHONY/clusterName/consumer
    • /cgroup/cpu/IBM_SPECTRUM_SYMPHONY/clusterName/consumer
    The script also assigns directory ownership to the consumer execution user.
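The detection step that the script performs can be illustrated with a short Python sketch. It assumes cgroup v1 mounts listed in the /proc/mounts format (device, mount point, file system type, options), where the subsystem name appears among the mount options. The function names are hypothetical, and this is not the logic of symcgroup.sh itself.

```python
from typing import Optional


def find_cgroup_mount(mounts_text: str, subsystem: str) -> Optional[str]:
    """Return the mount point of the hierarchy that has the subsystem attached."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect at least: device, mount point, file system type, options.
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        if subsystem in fields[3].split(","):
            return fields[1]
    return None


def hierarchy_root(mounts_text: str, subsystem: str) -> str:
    """Use the existing hierarchy if one is found, else the default location."""
    return find_cgroup_mount(mounts_text, subsystem) or f"/cgroup/{subsystem}"
```

When the subsystem is not attached anywhere, the sketch falls back to /cgroup/memory or /cgroup/cpu, matching the default hierarchy named above. Actually mounting the subsystem is left out of the sketch.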
Note:
The SIM completes the cgroup configuration, as follows:
  1. Creates a cgroup directory for each SI.
    For example, the SIM creates:
    • /cgroup/memory/IBM_SPECTRUM_SYMPHONY/clusterName/consumer/applicationName/resourceGroup/serviceName/SI_PID/
    • /cgroup/cpu/IBM_SPECTRUM_SYMPHONY/clusterName/consumer/applicationName/resourceGroup/serviceName/SI_PID/

    A file named tasks is created under this folder.

  2. Sets the resource limits.

    Sets the memory limit values of memory.memsw.limit_in_bytes and memory.limit_in_bytes for the SI group.

    Sets the CPU limit values of cpu.cfs_period_us and cpu.cfs_quota_us for the SI group.

  3. Attaches each SI to its cgroup directory and puts the SI PID into the following locations:
    • /cgroup/memory/IBM_SPECTRUM_SYMPHONY/clusterName/consumer/applicationName/resourceGroup/serviceName/SI_PID/tasks
    • /cgroup/cpu/IBM_SPECTRUM_SYMPHONY/clusterName/consumer/applicationName/resourceGroup/serviceName/SI_PID/tasks
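The three SIM steps above can be walked through with a small Python sketch that runs against a scratch directory rather than a real cgroup mount. All function names and sample values (cluster1, consumerA, appX, rg1, svc1, PID 4242) are hypothetical. On a real cgroup hierarchy the kernel provides the parameter and tasks files when the directory is created, whereas this sketch creates them as ordinary files.

```python
import tempfile
from pathlib import Path
from typing import Dict


def si_cgroup_dir(root, cluster, consumer, app, resource_group, service, si_pid) -> Path:
    """Build the per-SI cgroup directory path under one subsystem root."""
    return (Path(root) / "IBM_SPECTRUM_SYMPHONY" / cluster / consumer
            / app / resource_group / service / str(si_pid))


def configure_si_cgroup(cgroup_dir: Path, si_pid: int, limits: Dict[str, int]) -> None:
    """Create the directory, write each limit file, and record the SI PID."""
    cgroup_dir.mkdir(parents=True, exist_ok=True)       # step 1: create directory
    for name, value in limits.items():                  # step 2: set limits
        (cgroup_dir / name).write_text(f"{value}\n")
    with open(cgroup_dir / "tasks", "a") as tasks:      # step 3: attach the SI
        tasks.write(f"{si_pid}\n")


# Dry run in a temporary directory standing in for /cgroup/cpu.
with tempfile.TemporaryDirectory() as scratch:
    demo_dir = si_cgroup_dir(Path(scratch) / "cpu", "cluster1", "consumerA",
                             "appX", "rg1", "svc1", 4242)
    configure_si_cgroup(demo_dir, 4242,
                        {"cpu.cfs_period_us": 100000, "cpu.cfs_quota_us": 200000})
    demo_tasks = (demo_dir / "tasks").read_text()
    demo_quota = (demo_dir / "cpu.cfs_quota_us").read_text()
```

The same helper would be called once per subsystem, for example once for the memory hierarchy with the memory limit parameters and once for the cpu hierarchy with the CPU limit parameters.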

In advanced WEM, pem and SIM automatically perform the cgroups configuration steps as described for simplified WEM.