Preparing LSF to run Podman jobs

Prepare LSF to run jobs in Podman containers.

Before you begin

  • You cannot run Docker container jobs on LSF if you are also using LSF to run Podman jobs.
  • Get the podman and podman-docker packages for the Podman 3.3.1 container engine.

    Install these packages on the LSF server hosts where you intend to run Podman container jobs. The minimum Podman version is Podman 3.3.1 on a RHEL 8.2 host. For optimal performance, use Podman 3.3.1, or newer, on a RHEL 8.2.1 host.

    All LSF server hosts for Podman container jobs must be using the same version of the Podman packages.

  • cgroups v1 or v2 must be enabled on the LSF server hosts where you intend to run Podman container jobs. Because Podman does not use cgroups when running in daemon-less mode, all Podman-related processes (podman, fuse-overlayfs, and conmon) are put in the job or task cgroup.

Procedure

  1. Set up the Podman container engine on each LSF server host that will run Podman container jobs.
    1. Log in to the LSF server host as root.
    2. Install the podman and podman-docker packages on the LSF server host.

      For example, for Podman 3.3.1:

      # rpm -qa |grep podman
      podman-3.3.1.module_el8.2.0+305+5e198a41.x86_64
      podman-docker-3.3.1.module_el8.2.0+305+5e198a41.noarch
    3. Create /etc/subuid and /etc/subgid files for LSF users.

      For example, the following files set subordinate UIDs and GIDs for the lsfadmin user for Podman containers:

      # cat /etc/subuid
      lsfadmin:100000:65536
      # cat /etc/subgid
      lsfadmin:100000:65536

      For a user to submit an LSF Podman job, the subordinate UID and GID for that user must be in these files.
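
      The requirement above can be checked before users submit jobs. The following is a minimal sketch, not part of LSF or Podman: has_subid_entry is a hypothetical helper name, and it only confirms that an entry for the user exists, not that the ID range is valid.

```shell
# Hypothetical helper (an assumption, not an LSF or Podman command):
# succeed if the user named in $1 has an entry in the subordinate ID file $2.
has_subid_entry() {
    # Entries have the form name:start:count, so compare the first field.
    awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' "$2"
}

# Example use on an LSF server host (lsfadmin is the user from the example above):
#   for f in /etc/subuid /etc/subgid; do
#       has_subid_entry lsfadmin "$f" || echo "lsfadmin is missing from $f"
#   done
```

      The file path is a parameter so that the helper can be tried against a copy of the file before it is used on the real /etc/subuid and /etc/subgid files.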

    4. Enable the user linger state for the LSF users.
      $ sudo systemctl restart systemd-logind.service
      $ sudo loginctl enable-linger user_name

      The user linger state is required because LSF runs Podman jobs on execution hosts even if the user is not logged in to the execution host. LSF also needs the linger state to be enabled so that it can kill Podman jobs.
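
      To confirm that lingering is enabled, one option is the loginctl show-user user_name --property=Linger command. The sketch below takes a different route and rests on an assumption: that systemd-logind records each lingering user as a file under /var/lib/systemd/linger. The linger_enabled name is hypothetical.

```shell
# Hypothetical helper: succeed if lingering is enabled for the user in $1.
# Assumption: systemd-logind keeps one file per lingering user in
# /var/lib/systemd/linger. $2 optionally overrides that directory for testing.
linger_enabled() {
    [ -e "${2:-/var/lib/systemd/linger}/$1" ]
}

# Example use (user_name is a placeholder, as in the commands above):
#   linger_enabled user_name || echo "linger is not enabled for user_name"
```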

    5. Update the UID and GID values.

      Run the following commands to set the file capabilities on the newuidmap and newgidmap executables, which Podman uses to apply the subuid and subgid settings:

      $ sudo setcap cap_setuid+eip /usr/bin/newuidmap
      $ sudo setcap cap_setgid+eip /usr/bin/newgidmap
    6. Create a symbolic link from /usr/bin/python to the Python executable file.

      In RHEL 8.x, python3 is the default installed version, and there is no /usr/bin/python executable file on the host. Because the execution driver uses /usr/bin/python to run the Python script, you must create a symbolic link from /usr/bin/python to an available Python executable file (both python2.x and python3.x are supported).

      For example, to link to python3:
      $ ln -s /usr/bin/python3 /usr/bin/python
    7. Edit the /etc/containers/registries.conf file and remove redundant entries to optimize the container image download time.

      For example, comment out all the registries except docker.io.

      [registries.search]
      #registries = ['registry.redhat.io', 'registry.access.redhat.com', 'quay.io', 'docker.io'] 
      registries = ['docker.io']
    8. Create a /etc/containers/nodocker file to suppress redundant Docker messages.
      # touch /etc/containers/nodocker
    9. Log in to the LSF server host as a user that will submit Podman jobs.
    10. If you use an LDAP server to manage the LSF users and the user home directories are on an NFS server, change the Podman user configuration file to save Podman images to a local file system instead of an NFS directory.

      Podman does not support building or loading container images on NFS home directories. For more details, refer to the Podman troubleshooting website (https://github.com/containers/podman/blob/master/troubleshooting.md#14-rootless-podman-build-fails-eperm-on-nfs).
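
      Whether this step applies to a given user can be checked by looking at the file system type of the home directory. The following is a minimal sketch, assuming GNU stat (as shipped with RHEL), where stat -f -c %T prints the file system type; is_on_nfs is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the path in $1 is on an NFS file system.
# Assumes GNU coreutils stat; "stat -f -c %T" prints the file system type.
is_on_nfs() {
    case "$(stat -f -c %T "$1" 2>/dev/null)" in
        nfs*) return 0 ;;   # NFS mounts are reported with a type starting with "nfs"
        *)    return 1 ;;   # any other type, or a path that could not be examined
    esac
}

# Example use as the user that will submit Podman jobs:
#   is_on_nfs "$HOME" && echo "Move Podman storage to a local file system"
```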

      For example, perform the following steps to save Podman images to a local file system:

      1. Run the podman system reset command to reset the Podman configuration.
      2. Create Podman configuration files from the templates.
        $ mkdir -p $HOME/.config/containers
        $ cp /usr/share/containers/containers.conf $HOME/.config/containers
        $ cp /etc/containers/storage.conf $HOME/.config/containers
      3. Create a directory on the local file system, then confirm that your current user name is the owner of the directory and has read/write permissions for that directory.
        $ ls -l -d /podmanfs/userA/
        drwxr-xr-x 3 userA groupA 20 Nov 11 22:30 /podmanfs/userA/
      4. Update the volume_path, tmp_dir, and static_dir parameters in the containers.conf file, and update the graphroot, runroot, and [storage.options] parameters in the storage.conf file.

        For example, if you are using the bash shell, run the following commands:

        $ podmandir="/podmanfs/userA/"
        $ userconf=$HOME/.config/containers/containers.conf
        $ storageconf=$HOME/.config/containers/storage.conf
        $ userid=`id -u`
        $ sed -i "s|^#.*volume_path.*|volume_path=\"$podmandir/volumes\"|g" $userconf
        $ sed -i "s|.*tmp_dir =.*|tmp_dir =\"/tmp/run-$userid\"|g" $userconf
        $ sed -i "s|^#.*static_dir =.*|static_dir =\"$podmandir/libpod/\"|g" $userconf
        $ sed -i "s|graphroot =.*|graphroot=\"$podmandir/storage\"|g"  $storageconf
        $ sed -i "s|runroot =.*|runroot=\"/run/user/$userid\"|g"  $storageconf
        $ sed -i '/\[storage.options\]/a mount_program = "/usr/bin/fuse-overlayfs"'  $storageconf
      5. Run the podman system migrate command to apply the changes.
      6. Run the podman info command to confirm that the changes are applied:
        $ podman info |grep -niE 'volumepath|graphroot|runroot'
        73:  graphRoot: /podmanfs/username/storage
        81:  runRoot: /run/user/userid
        82:  volumePath: /podmanfs/username/volumes
    11. Check to see if Podman is running correctly.

      Pull a Docker image and run a Podman container to check whether a cgroup directory is created for the container ID.

      For example, to pull a CentOS image and run a Podman job, run the following commands:

      $ podman pull centos
      $ podman run --rm --detach centos sleep 200
      15ec62cece2cc2fa5b9fae6c114e97d655240623994814c9a97c64b9635c607c
      $ podman ps
      CONTAINER ID IMAGE                           COMMAND   CREATED       STATUS PORTS NAMES
      15ec62cece2c docker.io/library/centos:latest sleep 200 7 seconds ago Up     12345 fervent_sutherland

      The Podman job output is the container ID. Verify that the container ID directory is created:

      $ find /sys/fs/cgroup/ -iname 15ec62cece2cc2fa5b9fae6c114e97d655240623994814c9a97c64b9635c607c

      If the container ID directory is not created, log out, then log back in and try to start the container again.
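
      The check above can be wrapped in a small function so that it is easy to repeat after logging back in. This is a sketch under stated assumptions: the function names are hypothetical, and the match uses wildcards around the container ID because some cgroup setups embed the ID in a longer directory name, which is an assumption that goes beyond the exact-name find command shown above.

```shell
# Hypothetical helpers, not part of LSF or Podman.
# $1 = container ID, $2 = cgroup root (defaults to /sys/fs/cgroup).

# Print every cgroup directory whose name contains the container ID.
find_container_cgroup() {
    find "${2:-/sys/fs/cgroup}" -type d -iname "*$1*" 2>/dev/null
}

# Succeed if at least one such directory exists.
container_cgroup_exists() {
    [ -n "$(find_container_cgroup "$1" "$2")" ]
}

# Example use:
#   cid=$(podman run --rm --detach centos sleep 200)
#   container_cgroup_exists "$cid" || echo "No cgroup directory for $cid"
```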

  2. Edit the lsf.conf file and configure the following parameters:
    LSF_PROCESS_TRACKING=Y
    LSF_LINUX_CGROUP_ACCT=Y
    LSB_RESOURCE_ENFORCE="cpu memory"
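
    After editing the file, the three settings can be verified with a short script. The following is a minimal sketch: lsf_conf_has is a hypothetical helper name, and the comparison is exact (apart from optional double quotes), so a value such as "memory cpu" would not be accepted even if LSF itself treats it the same way.

```shell
# Hypothetical helper: succeed if the lsf.conf file in $1 sets the
# parameter named in $2 to the value in $3 (surrounding quotes ignored).
lsf_conf_has() {
    awk -v name="$2" -v want="$3" '
        /^[ \t]*#/ { next }                       # skip comment lines
        index($0, "=") == 0 { next }              # skip lines with no assignment
        {
            key = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", key)                # drop spaces around the name
            if (key != name) next
            val = substr($0, index($0, "=") + 1)
            sub(/^[ \t]*"?/, "", val)             # drop leading spaces and quote
            sub(/"?[ \t]*$/, "", val)             # drop trailing quote and spaces
            if (val == want) found = 1
        }
        END { exit !found }
    ' "$1"
}

# Example use (the path to lsf.conf depends on your installation):
#   lsf_conf_has "$LSF_ENVDIR/lsf.conf" LSF_PROCESS_TRACKING Y
#   lsf_conf_has "$LSF_ENVDIR/lsf.conf" LSF_LINUX_CGROUP_ACCT Y
#   lsf_conf_has "$LSF_ENVDIR/lsf.conf" LSB_RESOURCE_ENFORCE "cpu memory"
```
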
  3. Edit the lsf.shared file and configure a new Boolean resource named docker. For example, copy and paste the following snippet into your lsf.shared file:
    ...
    Begin Resource
    RESOURCENAME  TYPE    INTERVAL INCREASING DESCRIPTION  # Keywords
    ...
       docker     Boolean ()       ()         (Podman-Docker container)
    ...
    End Resource 
  4. Edit the lsf.cluster.cluster_name file and attach the docker Boolean resource to the LSF server hosts that are running the Podman container engine.

    This configuration enables LSF to automatically dispatch Podman jobs to the LSF server hosts that are running the Podman container engine.

    ...
    Begin Host
        HOSTNAME  model  type  server  r1m  mem  swp  RESOURCES
        ...
        host1     !      !     1       3.5  ()   ()   (docker)
        ...
    End Host
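
    To review which hosts carry the docker resource, the Host section can be parsed directly. The following is a minimal sketch, not an LSF command: hosts_with_resource is a hypothetical helper name, and it assumes that the RESOURCES column is the last parenthesized group on each host line with no trailing comment.

```shell
# Hypothetical helper: print the hosts in the Host section of the cluster
# file in $1 whose RESOURCES column lists the resource named in $2.
hosts_with_resource() {
    awk -v res="$2" '
        /^[ \t]*#/               { next }               # skip comment lines
        /^[ \t]*Begin[ \t]+Host/ { in_host = 1; next }
        /^[ \t]*End[ \t]+Host/   { in_host = 0; next }
        # Assumption: the last "(...)" group on the line is the RESOURCES column.
        in_host && $1 != "HOSTNAME" && match($0, /\([^()]*\)[ \t]*$/) {
            list = substr($0, RSTART + 1, RLENGTH - 1)
            sub(/\)[ \t]*$/, "", list)                  # drop the closing parenthesis
            n = split(list, names, /[ \t]+/)
            for (i = 1; i <= n; i++)
                if (names[i] == res) { print $1; break }
        }
    ' "$1"
}

# Example use (the file location depends on your installation):
#   hosts_with_resource /path/to/lsf.cluster.cluster_name docker
```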