Configuring LSF to run NVIDIA Docker jobs

Configure the NVIDIA Docker application profile or queue in LSF to run NVIDIA Docker jobs.

About this task

If you are using the NVIDIA Docker integration, you need to configure separate application profiles or queues to run NVIDIA Docker jobs.

You cannot run pre-execution and post-execution scripts in container jobs. The following are workarounds for specific pre-execution and post-execution operations:
  • To prepare data for the container as a pre-execution or post-execution operation, put the data into a directory that is mounted into the job container.
  • To customize the environment inside the job container, modify the starter scripts so that they prepare the appropriate environment.

Procedure

Edit the lsb.applications or lsb.queues file and define the CONTAINER parameter for the application profile or queue to run NVIDIA Docker jobs.

If this parameter is specified in both files, the parameter value in the lsb.applications file overrides the value in the lsb.queues file.

CONTAINER=nvidia-docker[image(image_name) options(docker_run_options)]
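
To show where the parameter sits, the following is a minimal sketch of an application profile in the lsb.applications file. The profile name nvdocker_app and the description are hypothetical; image_name and docker_run_options are the same placeholders as in the syntax above.

```
# lsb.applications (sketch; NAME and DESCRIPTION values are hypothetical)
Begin Application
NAME        = nvdocker_app
DESCRIPTION = Runs jobs in an NVIDIA Docker container
CONTAINER   = nvidia-docker[image(image_name) options(docker_run_options)]
End Application
```

After editing the file, the change is typically applied with badmin reconfig, and a job would then be submitted to the profile with bsub -app nvdocker_app.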

In the following examples, LSF uses the ubuntu image to run the job in the Docker container.

  • For sequential jobs:
    CONTAINER=nvidia-docker[image(ubuntu) options(--rm)]

    The docker run --rm option removes the container for the job after the job is done.

  • For parallel jobs:
    CONTAINER=nvidia-docker[image(ubuntu) options(--rm --net=host --ipc=host --runtime=nvidia -v /path/to/my/passwd:/etc/passwd)]

    This setting uses the following docker run options:

    --rm
    The container for the job is removed after the job is done.
    --net=host
    LSF needs the host network for launching parallel tasks.
    --ipc=host
    The container shares the IPC namespace of the host, including shared memory.
    --runtime=nvidia
    You must specify this option if you are using NVIDIA Docker version 2.0.
    -v /path/to/my/passwd:/etc/passwd
    Mounts the passwd file into the container as /etc/passwd. LSF needs the user ID and user name for launching parallel tasks.
    Note: The passwd file must be in the standard format for UNIX and Linux password files, such as the following format:
    user1:x:10001:10001:::
    user2:x:10002:10002:::
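
The parallel-job setting can be defined on a queue instead of an application profile. The following is a minimal sketch of an lsb.queues entry, assuming a hypothetical queue name and description, and assuming that the passwd mapping is the argument of the -v option.

```
# lsb.queues (sketch; QUEUE_NAME and DESCRIPTION values are hypothetical)
Begin Queue
QUEUE_NAME  = nvdocker_parallel
DESCRIPTION = Parallel jobs in NVIDIA Docker containers
CONTAINER   = nvidia-docker[image(ubuntu) options(--rm --net=host --ipc=host --runtime=nvidia -v /path/to/my/passwd:/etc/passwd)]
End Queue
```

A job submitted with bsub -q nvdocker_parallel would use this value unless the job also runs under an application profile that defines CONTAINER, in which case the application profile value takes precedence.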

For more details, refer to the CONTAINER parameter in the lsb.applications file or lsb.queues file.