Submit jobs with affinity resource requirements on IBM POWER8 systems

Use the esub.p8aff external submission (esub) script to automatically generate the optimal job-level affinity requirements for job submissions on IBM POWER8 (ppc64le) systems.

To simplify job submission with affinity requirements for IBM POWER8 (ppc64le) systems, LSF includes the esub.p8aff external submission (esub) script, which automatically generates affinity requirements based on the input requirements for the jobs. With the generated affinity requirements, LSF attempts to reduce the risk of CPU bottlenecks in the CPU allocation at the MPI task and OpenMP thread levels.

Requirements

To submit affinity jobs on IBM POWER8 (ppc64le) systems, the execution hosts must be able to retrieve and change SMT configurations. Ensure that the following requirements are met in the LSF cluster:

  • Configure the initial SMT mode on all ppc64le execution hosts to the maximum available SMT number.
  • Install the ppc64_cpu command on all ppc64le execution hosts.
  • Enable at least one of the Linux cgroup features in LSF to support customized SMT. That is, ensure that at least one of the following parameters is enabled in the lsf.conf file:
    • LSB_RESOURCE_ENFORCE
    • LSF_LINUX_CGROUP_ACCT
    • LSF_PROCESS_TRACKING
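
The cgroup requirement above can be checked with a small script. The following is a minimal sketch, not an LSF tool: the function name enabled_cgroup_params is hypothetical, and the sketch assumes that a parameter counts as enabled when it has any non-empty value other than N.

```python
# Parameters named in the requirements list; at least one must be enabled.
CGROUP_PARAMS = (
    "LSB_RESOURCE_ENFORCE",
    "LSF_LINUX_CGROUP_ACCT",
    "LSF_PROCESS_TRACKING",
)


def enabled_cgroup_params(lsf_conf_text):
    """Return the cgroup-related parameters that appear enabled.

    Assumption: "enabled" means a non-empty value other than N/no.
    """
    found = []
    for raw in lsf_conf_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        value = value.strip('"').strip()
        if name in CGROUP_PARAMS and value and value.lower() not in ("n", "no"):
            found.append(name)
    return found


if __name__ == "__main__":
    sample = 'LSF_PROCESS_TRACKING=Y\nLSB_RESOURCE_ENFORCE="cpu memory"\n'
    print(enabled_cgroup_params(sample))
```

If the returned list is empty for the contents of your lsf.conf file, none of the three parameters is enabled under the assumption stated above.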

How to use the esub.p8aff script

Use the bsub -a command to run the esub.p8aff script:

bsub -a "p8aff (num_threads_per_task, SMT, cpus_per_core, distribution_policy)"

num_threads_per_task
The number of OpenMP threads per MPI task.

LSF uses this number to calculate the list of logical CPUs that are bound to each OpenMP thread. LSF sets the OMP_NUM_THREADS environment variable to this value for SMT jobs.

SMT
Optional. The required per-job SMT mode on the execution hosts.

Use this argument to specify the expected SMT mode that is enabled on the execution hosts. LSF sets the LSB_JOB_SMT_MODE environment variable to this value for SMT jobs.

LSF automatically adds the exclusive job option (-x) to the job submission to ensure that the execution hosts are not allocated for other jobs.

If you do not specify this argument, SMT mode is not used on the execution hosts.

cpus_per_core
Optional. The number of logical CPUs used per core for each MPI task.

LSF uses this argument to determine how many cores are spanned for each MPI task.

For example, if this argument is specified as 2 (two logical CPUs per core), LSF allocates the MPI task on three cores if the task requires a total of six threads.

distribution_policy
Optional. The required task distribution policy for the job.

This argument specifies the expected task distribution policy of the MPI tasks. Valid values are balance, pack, and any, which are the same as the corresponding values for the distribute keyword in the LSF affinity[] resource requirement string.
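
The cpus_per_core arithmetic described above (six threads at two logical CPUs per core span three cores) can be sketched as follows. The function name cores_per_task is hypothetical, and rounding up when the thread count does not divide evenly is an assumption, not documented LSF behavior.

```python
def cores_per_task(num_threads_per_task, cpus_per_core):
    """Estimate how many cores one MPI task spans.

    Assumption: a partially used core still counts as one core,
    so the division is rounded up.
    """
    if num_threads_per_task <= 0 or cpus_per_core <= 0:
        raise ValueError("both arguments must be positive integers")
    # Integer ceiling division without floating point.
    return -(-num_threads_per_task // cpus_per_core)


if __name__ == "__main__":
    # The documented case: six threads, two logical CPUs per core.
    print(cores_per_task(6, 2))
```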

If you are not specifying the optional arguments, leave those arguments blank:

  • Use "-a p8aff(10,8,,pack)" if the cpus_per_core argument is not specified.
  • Use "-a p8aff(10,,,pack)" if the SMT and cpus_per_core arguments are not specified.
  • Use "-a p8aff(10,8,2)" or "-a p8aff(10,8,2,)" if the distribution_policy argument is not specified.
  • Use "-a p8aff(10,8)" or "-a p8aff(10,8,,)" if the cpus_per_core and distribution_policy arguments are not specified.
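
The blank-argument convention in the list above can be sketched as a small helper that builds the value for the bsub -a option. The function name p8aff_arg is hypothetical. The sketch drops trailing empty arguments, which matches the shorter of the two forms shown above.

```python
VALID_POLICIES = ("balance", "pack", "any")


def p8aff_arg(num_threads_per_task, smt=None, cpus_per_core=None,
              distribution_policy=None):
    """Build a p8aff(...) string, leaving unspecified arguments blank."""
    if distribution_policy is not None and distribution_policy not in VALID_POLICIES:
        raise ValueError("distribution_policy must be balance, pack, or any")
    fields = [num_threads_per_task, smt, cpus_per_core, distribution_policy]
    text = ["" if field is None else str(field) for field in fields]
    # Trailing blanks are optional, so remove them; the first field stays.
    while len(text) > 1 and text[-1] == "":
        text.pop()
    return "p8aff(" + ",".join(text) + ")"


if __name__ == "__main__":
    print(p8aff_arg(10, 8, None, "pack"))
    print(p8aff_arg(10, distribution_policy="pack"))
    print(p8aff_arg(10, 8, 2))
    print(p8aff_arg(10, 8))
```

The resulting string can then be passed as the argument to bsub -a when you assemble the submission command.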

Submitting jobs without job-level affinity requirements

If the user specifies at least the first argument (num_threads_per_task) when submitting jobs without job-level affinity requirements (that is, without specifying -R "affinity[]"), the esub.p8aff script automatically generates the job-level affinity requirements for the job.

For example, if you specify the following arguments, the esub.p8aff script informs LSF that each MPI task in the job uses 10 OpenMP threads and that the execution hosts must be configured with an SMT mode of 4:

bsub -n 2 -a "p8aff(10,4,1,balance)" myjob

The esub.p8aff script generates the following job-level affinity requirement based on these arguments:

-R "affinity[thread(1, exclusive=(core,intask))*10:cpubind=thread:distribute=balance]"

LSF generates the affinity requirements to ensure that it allocates one logical CPU on each physical core for each MPI task. LSF attempts to distribute the job tasks equally across all processor units on the allocated hosts.

Submitting jobs with job-level affinity requirements

If you specify the num_threads_per_task or SMT arguments when submitting jobs with job-level affinity requirements (that is, by specifying -R "affinity[]"), esub.p8aff sets the OMP_NUM_THREADS or LSB_JOB_SMT_MODE environment variables for the job. LSF configures OpenMP thread affinity and the requested SMT mode on the execution hosts according to these environment variables.

Note: If you specify the num_threads_per_task, SMT, or both arguments while using job-level affinity requirements, esub.p8aff ignores the cpus_per_core and distribution_policy arguments.

For example, if you specify the following arguments while using job-level affinity requirements, the esub.p8aff script informs LSF that each MPI task in the job uses 10 OpenMP threads and that the execution hosts must be configured with an SMT mode of 4:

bsub -n 2 -a "p8aff(10,4)" -R "affinity[thread(1, exclusive=(core,intask))*10:cpubind=thread:distribute=balance]" myjob

In this example, you specify your own affinity requirements instead of relying on esub.p8aff to generate them automatically.

Per-job SMT mode configurations

The LSF administrator must set the initial SMT mode to the maximum available SMT number on all execution hosts before LSF schedules jobs. LSF uses this number as the default SMT mode if there are no jobs running on the hosts. If the initial SMT mode is not the maximum allowed SMT number, LSF uses this mode as the default SMT mode on the hosts, which means that LSF assumes that this smaller value is the maximum number of SMT resources available for job scheduling.

LSF uses the ppc64_cpu command before starting tasks on each host to configure the SMT mode according to the LSB_JOB_SMT_MODE environment variable. After the job finishes, LSF sets the SMT mode on each host back to the default SMT number for that host.
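
LSF changes and restores the SMT mode itself, but when you verify the initial mode on a host it can help to see the ppc64_cpu calls involved. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the sketch assumes that ppc64_cpu --smt reports the mode as "SMT=N" or "SMT is off". It only builds and parses strings and does not run the command.

```python
import re


def parse_smt_mode(output):
    """Parse the text printed by `ppc64_cpu --smt` into an integer mode.

    Assumption: the output is either "SMT=N" or "SMT is off";
    "off" is treated as one hardware thread per core.
    """
    text = output.strip()
    match = re.fullmatch(r"SMT=(\d+)", text)
    if match:
        return int(match.group(1))
    if text == "SMT is off":
        return 1
    raise ValueError("unrecognized ppc64_cpu output: %r" % text)


def smt_set_command(mode):
    """Return the argument list that sets the given SMT mode."""
    if mode <= 0:
        raise ValueError("SMT mode must be a positive integer")
    return ["ppc64_cpu", "--smt=%d" % mode]


if __name__ == "__main__":
    default_mode = parse_smt_mode("SMT=8\n")
    print(smt_set_command(4))             # mode requested for the job
    print(smt_set_command(default_mode))  # restore after the job finishes
```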

Integration with OpenMP thread affinity

LSF supports OpenMP thread affinity in the blaunch distributed application framework. LSF MPI distributions must integrate with LSF to enable the OpenMP thread affinity.

When the OMP_NUM_THREADS environment variable is set for the job, LSF automatically sets the following OpenMP thread affinity environment variables based on the logical CPUs that are allocated to the task:

  • OMP_PROC_BIND: LSF sets this environment variable to "TRUE" if the OMP_NUM_THREADS environment variable is available.
  • OMP_PLACES: LSF sets this environment variable to the calculated list of logical CPUs so that each OpenMP thread is bound to one logical CPU.

Each OpenMP thread is bound to an individual logical CPU to avoid thread-switching overhead. LSF evenly binds each OpenMP thread to one logical CPU from the list of allocated logical CPUs for the current MPI task.

LSF applies the OpenMP thread affinity in the following manner:

  • If the number of allocated logical CPUs is equal to the number of OpenMP threads per task (that is, the OMP_NUM_THREADS environment variable value), each OpenMP thread is bound to a separate logical CPU.

    For example, if the logical CPUs that are allocated to an MPI task with two OpenMP threads are logical CPUs 0 and 8, the OMP_PLACES environment variable is set to "{0},{8}".

  • If the number of allocated logical CPUs is larger than the value of the OMP_NUM_THREADS environment variable, the list of logical CPUs is divided into as many groups as the OMP_NUM_THREADS environment variable specifies. Each OpenMP thread is bound to the first logical CPU in its group. Any remaining CPUs that do not divide evenly among the OpenMP threads are added to the last group.

    For example, if the logical CPUs that are allocated to an MPI task are 0, 1, 2, 3, and 4, and this task starts two OpenMP threads, there are two CPU groups: {0,1} and {2,3,4}. The OMP_PLACES environment variable is then set to "{0},{2}".

  • If the number of allocated logical CPUs is smaller than the value of the OMP_NUM_THREADS environment variable, the threads share the allocated logical CPUs, and LSF binds each thread to one of these CPUs in a round-robin manner.

    For example, if a job's OpenMP thread number is 5 and the affinity requirement is "thread(2)", LSF evenly binds the OpenMP threads to the allocated CPUs in a round-robin manner. The OMP_PLACES environment variable is set to "{0},{1},{0},{1},{0}" if LSF allocates logical CPUs 0 and 1 to the task.
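
The three cases above can be sketched in a few lines. This is an illustration of the documented rules only, not the LSF implementation, and the function name omp_places is hypothetical.

```python
def omp_places(allocated_cpus, num_threads):
    """Compute an OMP_PLACES value from the rules described above.

    allocated_cpus: logical CPU IDs allocated to one MPI task, in order.
    num_threads:    the OMP_NUM_THREADS value for the task.
    """
    if not allocated_cpus or num_threads <= 0:
        raise ValueError("need at least one CPU and one thread")
    count = len(allocated_cpus)
    if count >= num_threads:
        # Split the CPUs into num_threads groups of equal size; leftover
        # CPUs fall into the last group. Each thread is bound to the
        # first CPU of its group. Equal counts give groups of size 1.
        group_size = count // num_threads
        chosen = [allocated_cpus[i * group_size] for i in range(num_threads)]
    else:
        # Fewer CPUs than threads: assign the CPUs round-robin.
        chosen = [allocated_cpus[i % count] for i in range(num_threads)]
    return ",".join("{%d}" % cpu for cpu in chosen)


if __name__ == "__main__":
    print(omp_places([0, 8], 2))           # equal counts
    print(omp_places([0, 1, 2, 3, 4], 2))  # more CPUs than threads
    print(omp_places([0, 1], 5))           # fewer CPUs than threads
```

The three calls correspond to the three examples in the list and produce the OMP_PLACES values that are shown there.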