Enforcing IBM® Spectrum LSF job memory and swap with Linux cgroups

Enable cgroup memory and swap enforcement in LSF.

About this task

Linux control groups (cgroups) limit, account for, and isolate the resource usage (CPU, memory, swap space, disk I/O, and other resources) of process groups by aggregating and partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behavior. The Linux kernel has supported cgroups since version 2.6.24.

All LSF job processes are controlled by the Linux cgroup system. If job processes on a host use more memory than the defined limit, the Linux cgroup memory subsystem immediately kills the job.

Linux kernel 2.6.34 introduced a generic, eventfd-based API for notification of cgroup status changes. With eventfd, LSF is notified when job processes use more memory than the limit. LSF then kills all processes of the job and records a specific termination reason, which is written to the LSF job accounting file and displayed by bjobs -l.
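The two kernel versions named above can be checked with a short script. The following is a minimal sketch, not part of LSF: the function names version_ge and check_cgroup_kernel and the three result keywords are made up for illustration. It compares only the leading numeric part of the release string, so it is a first approximation; distribution kernels can backport features to older version numbers.

```shell
#!/bin/sh
# version_ge A B: exit status 0 when version A >= version B.
# Only the leading numeric major.minor.patch part is compared.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # Drop any suffix, e.g. "3.10.0-1160.el7" becomes "3.10.0"
        sub(/[^0-9.].*$/, "", a); sub(/[^0-9.].*$/, "", b)
        na = split(a, x, "."); nb = split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# check_cgroup_kernel [RELEASE]: classify a kernel release string.
# RELEASE defaults to the running kernel (uname -r).
# The result keywords are hypothetical labels for this sketch.
check_cgroup_kernel() {
    release=${1:-$(uname -r)}
    if ! version_ge "$release" 2.6.24; then
        echo "no-cgroup"        # older than 2.6.24
    elif ! version_ge "$release" 2.6.34; then
        echo "cgroup-only"      # cgroups, but older than 2.6.34
    else
        echo "cgroup-eventfd"   # cgroups and eventfd notification
    fi
}
```

Running check_cgroup_kernel with no argument classifies the running kernel; passing a release string such as 2.6.32-754.el6 classifies that string instead.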

The following steps enable cgroup memory and swap enforcement in LSF.

Procedure

  1. Mount the cgroup system.
    1. Check whether cgroups are supported by your Linux system.

      The Linux kernel has supported cgroups since v2.6.24, and eventfd-based notification of cgroup status changes since v2.6.34.

    2. Check whether the cgroup system is already mounted.
      $ grep cgroup /proc/mounts
      cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
      cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
      cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
      cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0

      Four cgroup subsystems are mounted: cpuset, memory, freezer, and cpuacct.

    3. Mount cgroup subsystems.

      Add the following lines to /etc/fstab:

      cgroup /cgroup/freezer cgroup freezer 0 0
      cgroup /cgroup/cpuacct cgroup cpuacct 0 0
      cgroup /cgroup/memory cgroup memory 0 0
      cgroup /cgroup/cpuset cgroup cpuset 0 0
    4. Run the following command as root:
      # mount -a -t cgroup
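After mounting, the check against /proc/mounts can be scripted. The following is a minimal sketch; the function name missing_cgroup_subsystems is hypothetical, and it only looks at mount entries of type cgroup in the format shown in the grep output above.

```shell
#!/bin/sh
# missing_cgroup_subsystems [MOUNTS_FILE]
# Prints each of the four subsystems used in this procedure that has no
# mount entry of type "cgroup". MOUNTS_FILE defaults to /proc/mounts.
missing_cgroup_subsystems() {
    mounts_file=${1:-/proc/mounts}
    for subsys in cpuset memory freezer cpuacct; do
        # Field 3 is the file system type, field 4 the comma-separated options.
        if ! awk -v s="$subsys" '
            $3 == "cgroup" {
                n = split($4, opts, ",")
                for (i = 1; i <= n; i++) if (opts[i] == s) found = 1
            }
            END { exit !found }' "$mounts_file"
        then
            echo "$subsys"
        fi
    done
}
```

Running missing_cgroup_subsystems with no argument prints nothing when all four subsystems are mounted; otherwise it prints one missing subsystem name per line.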
  2. Enable LSF memory and swap enforcement.
    1. Define the LSB_RESOURCE_ENFORCE parameter in the $LSF_ENVDIR/lsf.conf file.
      LSB_RESOURCE_ENFORCE="memory"
    2. Run the lsfrestart command as the LSF administrator to restart LSF on all hosts.
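Before restarting LSF, it can help to confirm that the parameter is really set. The following is a minimal sketch; the function name memory_enforcement_enabled is hypothetical. It assumes that the value is a quoted, space-separated list and that the last uncommented definition in the file is the one that counts, neither of which is stated in this topic.

```shell
#!/bin/sh
# memory_enforcement_enabled [CONF_FILE]
# Exit status 0 when LSB_RESOURCE_ENFORCE in CONF_FILE includes "memory".
# CONF_FILE defaults to $LSF_ENVDIR/lsf.conf.
memory_enforcement_enabled() {
    conf_file=${1:-$LSF_ENVDIR/lsf.conf}
    # Take the last uncommented assignment (an assumption) and strip the quotes.
    value=$(sed -n 's/^[[:space:]]*LSB_RESOURCE_ENFORCE[[:space:]]*=[[:space:]]*//p' "$conf_file" \
        | tail -n 1 | tr -d '"')
    # Match "memory" as a whole word in the space-separated value.
    case " $value " in
        *" memory "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, memory_enforcement_enabled && echo "memory enforcement is configured" reports on the lsf.conf file under $LSF_ENVDIR.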
  3. Verify LSF memory and swap enforcement.

    Submit a job with memory or swap enforcement requirements.

    Use bsub -M to specify memory limits for the job. Use bsub -v to specify swap limits for the job. You can also configure memory and swap limits in a queue (lsb.queues) and in an application profile (lsb.applications) with the MEMLIMIT and SWAPLIMIT parameters. Memory and swap limits are enforced per slot/task.

    In the following example, my_program asks for 2 slots and a 100 MB memory limit per slot. Because the job runs on a single host, LSF sets up a cgroup memory subsystem with a 200 MB limit. When the job uses more than 200 MB, the job is terminated.

    $ bsub -n 2 -M 100 -R "span[hosts=1]" my_program

    In the following example, my_program asks for 100 MB memory and 50 MB swap. After my_program uses more than 100 MB of memory, the cgroup starts to use swap for the job processes. The job is not killed until the application reaches 150 MB of combined usage (100 MB memory + 50 MB swap).

    $ bsub -M 100 -v 50 my_program
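The arithmetic behind the two examples can be written out as follows. This is only an illustrative sketch of the numbers quoted above; the function names are hypothetical and nothing here queries LSF.

```shell
#!/bin/sh
# host_memory_limit_mb SLOTS PER_SLOT_MB
# cgroup memory limit on one host when all slots run there.
host_memory_limit_mb() {
    echo $(( $1 * $2 ))
}

# kill_threshold_mb MEM_MB SWAP_MB
# Combined memory plus swap usage at which the job is killed.
kill_threshold_mb() {
    echo $(( $1 + $2 ))
}

host_memory_limit_mb 2 100   # prints 200 (bsub -n 2 -M 100 on one host)
kill_threshold_mb 100 50     # prints 150 (bsub -M 100 -v 50)
```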
  4. Check job status.

    If the cgroup system supports eventfd notification, bjobs -l shows the job termination reason when the job is killed because it exceeds a memory limit.

    For example:

    $ bjobs -l 950
    Job <950>, User <user1>, Project <default>, Status <EXIT>, Queue <normal>, Comm
                         and <./eatmem 200>
    Thu Oct 17 02:25:49: Submitted from host <hostA>, CWD </home/user1>;
     MEMLIMIT
        100 M
    Thu Oct 17 02:25:49: Started on <hostA>, Execution Home </home/user1>,
                         Execution CWD </home/user1>;
    Thu Oct 17 02:26:01: Exited with exit code 137. The CPU time used is 0.7 second
                         s.
    Thu Oct 17 02:26:01: Completed <exit>; TERM_MEMLIMIT: job killed after reaching
                          LSF memory usage limit.
     MEMORY USAGE:
     MAX MEM: 100 Mbytes
     SCHEDULING PARAMETERS:
               r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
     loadSched   -     -     -     -       -     -    -     -     -      -      - 
     loadStop    -     -     -     -       -     -    -     -     -      -      - 
     RESOURCE REQUIREMENT DETAILS:
     Combined: select[type == any] order[r15s:pg]
     Effective: select[type == any] order[r15s:pg]
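To pick the termination reason out of this output in a script, a filter such as the following can be used. It is a minimal sketch; the function name term_reason is hypothetical, and it assumes that the TERM_ keyword is not split across a wrapped line, as in the sample above.

```shell
#!/bin/sh
# term_reason: reads bjobs -l output on standard input and prints the
# first TERM_* keyword found, or nothing if there is none.
term_reason() {
    grep -o 'TERM_[A-Z_]*' | head -n 1
}
```

For the job shown above, bjobs -l 950 | term_reason would print TERM_MEMLIMIT.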