jsrun command
Purpose
Syntax
jsrun [option] [command]
Description
The jsrun command is a tool within the IBM® Job Step Manager (JSM) software package that manages an allocation that is provided by an external resource manager. You can use this command to specify requirements for a subset of the resources that are allocated by a resource manager.
The jsrun command provides features for stdio management, signal propagation, and task termination cleanup. This command starts parallel tasks within the context of a PMIx server that allows processes to register as PMIx clients and access PMIx functions such as publishing of data and event notification, peer identification, and data exchange.
An allocation request must specify a number of CPUs. An allocation request can also specify GPU and memory resources, and can request information about how resources are selected from available sockets and nodes. A job step launch request must specify the number of tasks to start.
The following terms are used in JSM:
- Job
A collection of resources that is allocated by the resource manager, together with any processes that are run within that allocation.
- Step allocation
An allocation that is created by the jsrun command.
- Task
A process started by the jsrun command.
- Job step
A step allocation and a group of tasks that are created and started together.
- Resource set
A collection of resources that are grouped into sets that contain the same number of CPUs and GPUs and the same amount of memory. All resources within a set are from the same host (and, by default, from the same socket).
Options
Requesting resource sets
You can use the following options for the jsrun command to determine resource sets:
- -n, --nrs #|ALL_HOSTS
Indicates the number of resource sets to allocate. A resource set is a grouping of allocated resources that can be assigned to one or more tasks. If you do not use this option, the number of resource sets that are created is equal to the value specified by the --np option.
To oversubscribe resources, specify a value for the --np option that is greater than the --nrs option value. A warning message is shown when CPUs are oversubscribed. The warning alerts users who have inadvertently oversubscribed some resource sets while other resource sets are underutilized. You can ignore the warning when CPUs are intentionally oversubscribed in order to use multiple hardware threads. To suppress the warning, set JSM_JSRUN_NO_WARN_OVERSUBSCRIBE=1 in your environment.
A resource set does not contain resources from multiple hosts. If you specify ALL_HOSTS, the number of resource sets is defined as the number of available hosts in the allocation.
- -c, --cpu_per_rs #|ALL_CPUS
Specify the number of CPUs to assign to each resource set within an allocation. The default value is 1. If you specify ALL_CPUS, each resource set contains the number of CPUs that are available on each compute node.
- -g, --gpu_per_rs #|ALL_GPUs
Specify the number of GPUs to assign to a resource set. The default value is 0. If you specify ALL_GPUs, each resource set contains the number of GPUs that are available on each compute node.
- -J, --use_reservation #
Specify an existing resource set allocation to be used for a job step. The preexisting resource allocation is called a reservation. You cannot use this option with any other resource set options (-n, -c, -g, -l, -m, and -r).
You can run multiple job steps by using the same reservation either simultaneously or consecutively. When you run multiple job steps simultaneously within the same reservation, those job steps do not attempt to coordinate the use of the resources within the reservation. In this scenario, the tasks from different job steps can be bound to the same CPU cores within a reservation.
- -l, --latency_priority LIST
Determines the priority of assigning resources. You can list each criterion one time in a comma-separated list. Resources are selected to optimize the first criterion in the list. When multiple potential allocations satisfy a criterion equally well, the next criterion in the list is used to decide between them. JSM does not attempt to create a balanced allocation that is not optimal for any single criterion but performs well across all criteria. The criteria also identify whether more optimal resources can be obtained by waiting for job steps that are currently running to complete. In that case, JSM waits until those job steps are complete before it schedules the job step that you specified with this option.
The following criteria are used for the -l option.
Note: Each of the following criteria can be specified with capital or lowercase letters. When a criterion is capitalized, it must be met as optimally as possible even if that requires waiting for other job steps to complete. When a criterion is lowercase, it is met as optimally as possible with the hardware that is available at the time.
- cpu-cpu: Indicates that CPUs are assigned to a resource set to minimize the latency between CPUs. This criterion attempts to assign CPUs that are physically close to one another. Using this criterion minimizes the number of sockets that are assigned to each resource set and therefore minimizes the time for cache line sharing, locks, and other CPU-to-CPU communication.
- gpu-cpu: Indicates that CPUs and GPUs are assigned to give the best performance for transferring data between the CPU registers and cache and the GPU. In many cases, GPU performance might be improved by selecting memory that is closest to the GPUs. A resource set might contain multiple CPUs and multiple GPUs. An ideal allocation does not require that all CPUs have good performance with all GPUs in the resource set. Rather, each CPU is required to have good performance with a number of GPUs that is determined by the ratio of CPUs to GPUs in the resource set.
- gpu-gpu: Indicates that GPUs are assigned to minimize the time that is required to transfer data between GPUs of the same resource set.
- memory-memory: Indicates that memory is assigned to reduce the number of memory-to-memory transfers between resource sets that use an RDMA engine. This criterion attempts to select blocks of memory that are contiguous and come from the same physical memory banks. This criterion prefers memory that is close together even when the available CPUs might not be physically close to the assigned memory. For example, given the choice between selecting 200 MB from socket 0 or 100 MB from socket 1 and 100 MB from socket 2, selecting memory-memory causes the 200 MB to be chosen even if the CPUs assigned to the resource set come from socket 0 and 1.
- cpu-memory: Indicates that memory and CPUs are assigned to reduce the time to transfer data between memory and CPU registers. The memory is divided equally by the number of CPUs. A possible allocation is evaluated based on the distance of each CPU to the corresponding portion of the memory.
- gpu-memory: Indicates that memory and GPUs are assigned to provide the best performance for transferring data between GPUs and memory. The memory that is assigned to each resource set is divided equally across the GPUs. A possible allocation is evaluated based on the distance of each GPU to the corresponding portion of the memory.
- -m, --memory_per_rs # | ALL_MEM | SHARE_ALL_MEM | AVAIL_MEM | SHARE_AVAIL_MEM | FLEXIBLE_MEM | DEFAULT_MEM
Specify the number of megabytes of memory (1 megabyte is 1,048,576 bytes) to assign to a resource set. The default value (DEFAULT_MEM) is the number of megabytes of physical memory per CPU on the host multiplied by the number of CPUs in the resource set.
If you specify ALL_MEM, each resource set contains the amount of memory available on each compute node. Use this option when you want to ensure that JSM assigns only a single resource set per node and that the resource set has access to all the memory on the node.
If you specify SHARE_ALL_MEM, JSM assigns only nodes that have no other job steps running on them, and it assigns the full amount of memory on the node by dividing it equally among the newly created resource sets on that node. Use this option when you want your job step to have access to all the memory on each node but it is permissible for JSM to place multiple resource sets from this job step onto a single node.
If you specify SHARE_AVAIL_MEM, all memory on the node that is not currently assigned to running job steps or reservations is divided equally across any newly created resource sets. Use this option when you want your job step to use all remaining memory on a node but it is permissible for JSM to assign your resource sets to nodes that are currently running other job steps.
If you specify FLEXIBLE_MEM, the newly created resource sets have access to all unassigned memory on the node and to any memory that is currently assigned to existing job steps that also specified the FLEXIBLE_MEM option. This flexible memory pool is not divided across resource sets; each resource set has access to the full amount of memory in the pool at the time the job step is created. Use this option when you want multiple job steps to share memory resources between them.
- -r, --rs_per_host #
Indicates the number of resource sets that are distributed per host. The value that you specify must evenly divide the number of resource sets; otherwise, an error is returned. For example, if you specify 3 resource sets per host and have 10 resource sets, an error is returned.
- -K, --rs_per_socket #
Indicates the number of resource sets that are created per hardware socket. If the number of resource sets to be created is not evenly divisible by the value specified, one socket contains fewer than the specified number of resource sets. For example, if you specify 10 resource sets and 3 resource sets per socket, JSM allocates 3 resource sets on each of 3 sockets, and a fourth socket contains a single resource set.
When both the --rs_per_socket and the --rs_per_host options are specified, both criteria must be satisfied. If --rs_per_socket is specified without specifying the number of resource sets to create (by using --nrs, --rs_per_host, or --np), each socket in the allocation is assigned the number of resource sets specified.
If --rs_per_socket is specified but --rs_per_host is not, JSM may allocate a different number of resource sets on each host as long as the resource sets per socket specification is satisfied. For example, a request for 4 resource sets that uses the command jsrun --rs_per_socket 2 --nrs 4 could be satisfied by using a single dual-socket host or by using a single socket on each of 2 different dual-socket hosts.
- -x, --exclude_hosts
Indicates that the JSM resource allocator should not select resources from the list of hosts specified when filling the job step resource request. Host names that are not part of the user's allocation are ignored. If domain names are included in the name specification, those names are shortened to ignore the domain name. When both --np and --nrs are unspecified, JSM calculates these values based on the number of hosts available. These calculations account for the hosts that are excluded with the -x option.
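The arithmetic in the --rs_per_host and --rs_per_socket examples above can be sketched in Python. This is only an illustrative model of the documented rules, not JSM's allocator; the function names hosts_needed and fill_sockets are hypothetical, and filling sockets in order is an assumption.

```python
from typing import List


def hosts_needed(nrs: int, rs_per_host: int) -> int:
    """Model of the --rs_per_host rule: the value must evenly divide
    the number of resource sets, otherwise an error is reported."""
    if nrs % rs_per_host != 0:
        raise ValueError(
            f"{rs_per_host} resource sets per host does not evenly "
            f"divide {nrs} resource sets"
        )
    return nrs // rs_per_host


def fill_sockets(nrs: int, rs_per_socket: int) -> List[int]:
    """Model of the --rs_per_socket rule: sockets are filled with the
    requested count and the last socket receives any remainder.
    Filling sockets strictly in order is an assumption of this sketch."""
    full, remainder = divmod(nrs, rs_per_socket)
    counts = [rs_per_socket] * full
    if remainder:
        counts.append(remainder)
    return counts


if __name__ == "__main__":
    print(fill_sockets(10, 3))  # 10 resource sets, 3 per socket
    print(hosts_needed(4, 1))   # 4 resource sets, 1 per host
```

With 10 resource sets and 3 per socket, fill_sockets returns [3, 3, 3, 1], which matches the example in the --rs_per_socket description. Calling hosts_needed(10, 3) raises ValueError, which mirrors the error case in the --rs_per_host description.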
Explicit specification of resource sets
JSM's normal command line options allow the user to describe the number of CPUs and GPUs and the amount of memory to assign to resource sets. These options also give JSM the freedom to choose the exact resources to use to satisfy the user's request. While the selection of resources can be further influenced by options such as --rs_per_host, --rs_per_socket, and --latency_priority, the final selection of resources is at JSM's discretion. If the user requires the use of specific resources, such as when consecutive job steps require the exact same resources, JSM supports two file-based specifications for requesting resources.
The first approach presented here is a legacy approach that might be deprecated in the future. This approach uses the -U option to specify a file that contains the specification of the resource sets to be used by an allocation. The resource sets described in the file must be ones that JSM could have allocated by using only non-explicit command line options. Specifically, each resource set must contain the same number of CPUs and GPUs and the same amount of memory, and a CPU or GPU can be assigned to only a single resource set.
You can use the following options for the jsrun command when consecutive job steps require the exact same resources:
- -S, --save_resources filename
Indicates that the resources that are used for the job step are stored in the file that you specify. The file that is created when you run this option can be passed to other job steps by using the --use_resource option.
- -U, --use_resource filename
Specifies a resource file in which each resource set is explicitly described in the following format:
RS {host # cpu list of CPU identifiers gpu list of GPU identifiers mem list of memory region identifiers}
The delimiters "host", "cpu", "gpu", and "mem" must have a space after the '' character.
Memory is specified as numa_id-#megabytes. Each line in the file contains a single resource set specification.
In the following example, three resource sets are indicated. The first and last resource sets are on host 1 of the allocation, and the second resource set is on host 2 of the allocation. Each resource set must have the same number of CPUs and GPUs and the same total amount of memory. In the following example, the memory for the third resource set is split across two NUMA domains:
RS 0 {host 1 cpu 0 1 2 3 gpu 1 mem 0-4096 }
RS 1 {host 2 cpu 0 1 2 3 gpu 1 mem 0-4096 }
RS 2 {host 1 cpu 4 5 6 7 gpu 0 mem 0-3072 1-1024 }
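To make the layout of a resource file line concrete, here is a minimal Python sketch that parses lines written exactly as in the example above. It is not part of JSM; the function name parse_rs_line is hypothetical, and the sketch assumes the keywords appear as bare words (as shown in the example lines) and that the number after RS is the resource set index.

```python
import re
from typing import Dict, List

KEYWORDS = ("host", "cpu", "gpu", "mem")


def parse_rs_line(line: str) -> Dict[str, object]:
    """Parse one line such as 'RS 0 {host 1 cpu 0 1 2 3 gpu 1 mem 0-4096 }'."""
    match = re.fullmatch(r"\s*RS\s+(\d+)\s*\{(.*)\}\s*", line)
    if match is None:
        raise ValueError(f"not a resource set line: {line!r}")

    # Collect the tokens that follow each keyword.
    fields: Dict[str, List[str]] = {}
    current = None
    for token in match.group(2).split():
        if token in KEYWORDS:
            current = token
            fields[current] = []
        elif current is None:
            raise ValueError(f"unexpected token before any keyword: {token!r}")
        else:
            fields[current].append(token)

    # Memory regions are written as numa_id-megabytes.
    mem = []
    for region in fields.get("mem", []):
        numa_id, megabytes = region.split("-")
        mem.append((int(numa_id), int(megabytes)))

    return {
        "rs": int(match.group(1)),
        "host": int(fields["host"][0]),
        "cpu": [int(c) for c in fields.get("cpu", [])],
        "gpu": [int(g) for g in fields.get("gpu", [])],
        "mem": mem,
    }


if __name__ == "__main__":
    rs = parse_rs_line("RS 2 {host 1 cpu 4 5 6 7 gpu 0 mem 0-3072 1-1024 }")
    print(rs["cpu"], rs["mem"], sum(mb for _, mb in rs["mem"]))
```

For the third example line, the memory regions are (0, 3072) and (1, 1024), which total 4096 MB. This matches the requirement that every resource set has the same total amount of memory.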
Note: The -S and -U options are deprecated and may be removed in a future release. Use the ERF file options --erf_input and --erf_output instead (see the erf_format man page). These options provide more features and can easily achieve the functionality that is provided by the -S and -U options.
The other method for describing explicit resources is the --erf_input and --erf_output options of the jsrun command. These options also specify a file-based approach to describing the resources assigned to a job step. However, the file can be used to specify irregular or heterogeneous resource sets and resource sets with overlapping resources. In addition, the file must specify the mapping of ranks to resource sets and the binding of ranks to smt-threads within each resource set. The ERF file can also specify a mapping of ranks to applications. If a mapping of ranks to applications is not specified, a single application is assumed and is taken from the jsrun command line.
The following options cannot be used with --erf_input: --tasks_per_rs, --np, --cpu_per_rs, --gpu_per_rs, --latency_priority, --memory_per_rs, --nrs, --rs_per_host, --rs_per_socket, --appfile, --allocate_only, --launch_node_task, --use_reservation, --use_resources, and --bind.
- --erf_input filename
Specifies an explicit resource file containing a description of the resources, ranks, applications, rank mapping, and binding for the job step.
- --erf_output filename
Indicates that the resources that are used for the job step are stored in the file that you specify, along with the mapping of ranks to resource sets and the binding of ranks to smt-threads. The file that is created when you run this option can be passed to other job steps by using the --erf_input option.
- --erf_output_pidx filename
Stores the resources and rank mappings as documented by --erf_output, except that physical IDs are used for CPU descriptions. Using this option allows applications to repeatedly execute on exactly the same physical CPUs regardless of how cgroups may have changed the CPU identification for the original execution.
Launching job steps
You can use the following options for the jsrun command to control the number of tasks to start and how to map those tasks to resource sets:
- -a, --tasks_per_rs #
Specifies the number of tasks to start per resource set. You cannot use this option with the -p or --np options.
- -A, --allocate_only LABEL
Creates a reservation with the JOBID that is the LABEL value. This option does not create processes. You cannot use this option with any options that are related to creating a task (such as -p, -a, -f, or -d). A reservation exists until you kill the reservation by running the jskill command or you terminate the JSM instance.
The LABEL value can be a maximum of 5 characters and must begin with an uppercase or lowercase letter (a-z, A-Z). The LABEL value cannot already be in use by another job step within the JSM instance. The LABEL value is used instead of assigning a number as the job step identifier. When you list jobs with the jslist command, the LABEL value is displayed as the JOBID of the reservation.
- -d, --launch_distribution cyclic, packed, plane #
Specify how tasks are started on the available resource sets within the step allocation. The default value is cyclic.
- Cyclic: Places consecutive tasks on consecutive resource sets in a round-robin fashion.
- Packed: Assigns tasks to the first resource set until each CPU in the resource set is assigned to a task, and then starts assigning tasks to the second resource set, third resource set, fourth resource set, and so on.
- Plane: Uses a cyclic method for assigning tasks but assigns them in blocks of the specified plane value. For example, the -d plane4 option assigns the first 4 ranks to the first resource set, the next 4 ranks to the second resource set, and so on, wrapping back to the first resource set if necessary to assign all tasks.
The --launch_distribution option has priority over the --bind option. In the following example, all 4 ranks are assigned to the first resource set and all of the resources in the second resource set are unused:
jsrun --launch_distribution packed --nrs 2 --cpu_per_rs 4 --bind packed2 --np 4
The following examples use the plane and cyclic options, which assign tasks to each resource set:
jsrun --launch_distribution plane2 --nrs 2 --cpu_per_rs 4 --bind packed2 --np 4
jsrun --launch_distribution cyclic --nrs 2 --cpu_per_rs 4 --bind packed2 --np 4
- -f, --appfile filename
Specify a file that lists one or more executables to start a MIMD (multiple instruction multiple data) application. Each executable that is listed in the file is preceded by the number of tasks to create from that executable and the reservation that is used to launch the tasks. The three fields are separated by "" delimiter. Tasks are ranked in the order they are specified in the file. The following example file results in 8 ranks, where ranks 0 - 2 use the a.out executable and run on a reservation with mstr as the JOBID, and ranks 3 - 7 use the b.out executable and run on a reservation with wrkr as the JOBID:
3 mstr a.out
5 wrkr b.out
You can use the same reservation in multiple lines in the file. The tasks that are listed in the file coordinate their use of the resources for the specified reservation. For example, if you run the jsrun -A myres --nrs 4 --rs_per_host 1 command with the following file, a single task is created on each of the four nodes of myres:
2 myres a.out
2 myres b.out
- -H, --launch_node_task #
Specify the rank number of a task that is launched on the launch node instead of a compute node. This task is not assigned specific CPUs, GPUs, or memory, and the task is not bound to specific resources. Other tasks are scheduled in rank order. For example, if task x is specified for placement on the launch node, tasks x+1 - n-1 are distributed to resource sets and bound to CPUs as if they were tasks x - n-2 of the same command without the -H option. Also, in this example, one less process is created on the compute nodes.
- -p, --np #
Specifies the number of tasks to start. By default, each task is assigned its own resource set that contains a single CPU.
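The three --launch_distribution modes described above can be modeled with a short Python sketch. This is an illustration of the documented behavior, not JSM code; the function name assign_tasks is hypothetical, and wrapping back to the first resource set in packed mode is an assumption (the description states wrap-around only for plane).

```python
from typing import List


def assign_tasks(ntasks: int, nrs: int, cpu_per_rs: int,
                 distribution: str = "cyclic", plane: int = 1) -> List[int]:
    """Return the resource set index that each rank is placed on."""
    if distribution == "cyclic":
        block = 1            # one rank per resource set, round-robin
    elif distribution == "packed":
        block = cpu_per_rs   # one rank per CPU before moving on
    elif distribution == "plane":
        block = plane        # 'plane' consecutive ranks per resource set
    else:
        raise ValueError(f"unknown distribution: {distribution!r}")
    # Ranks are grouped into blocks; blocks are dealt out round-robin.
    return [(rank // block) % nrs for rank in range(ntasks)]


if __name__ == "__main__":
    # Mirrors the examples: --nrs 2 --cpu_per_rs 4 --np 4
    print(assign_tasks(4, 2, 4, "packed"))
    print(assign_tasks(4, 2, 4, "plane", plane=2))
    print(assign_tasks(4, 2, 4, "cyclic"))
```

For the packed example, every rank maps to resource set 0, which agrees with the statement that the second resource set is left unused. The plane2 call yields [0, 0, 1, 1] and the cyclic call yields [0, 1, 0, 1].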
Process management
You can use the following options for the jsrun command to manage processes that were created by the jsrun command:
- -e, --stdio_mode individual | collected | prepended
You can choose one of the following modes for how stdio is handled for tasks:
- individual: The output of each task is sent to a separate file. The names of the files are based on the values of the --stdio_stdout and --stdio_stderr options.
- collected: The output is gathered and presented as either the stdio of the jsrun command or sent to the file names specified by the --stdio_stdout and --stdio_stderr options. This mode is the default option.
- prepended: The output is similar to the collected option, except that each task output is prepended with its corresponding task identifier.
- -I, --stdin_rank rank number
Changes the rank that reads standard input (stdin) from the default rank to the rank that you specify. If you do not use this option, the default value for the rank is zero. The following example runs a three-task MPI program that is called mpi_HelloWorld, in which rank 1 reads the sea.txt file:
jsrun --np 3 -I 1 -t ./sea.txt mpi_HelloWorld
- -k, --stdio_stderr filename
Specifies the name of the file to use for standard error (stderr) connections. If you use collected mode, the file is created by JSM and contains the output of the ranks as collected by JSM. If you use individual mode, a file is created by each rank and the output of each rank is directed to the file that it created. When you use individual mode, the file name can contain the special symbols %j, %h, %p, and %t as described in the --stdio_input option.
If you do not specify a special symbol within a file name, the jsrun command converts the file name that you specified to <name>.%h.%j.%t.%p for stdout and stderr files to avoid multiple tasks writing to the same file name. For example, -o myoutput --np 2 results in files such as myoutput.ibmpower03.3.0.1385 and myoutput.ibmpower04.3.1.22438. If the file cannot be opened, the output is sent to /dev/null. The use of the special symbols allows different tasks within a job to receive stdin from different file names.
- -o, --stdio_stdout filename
Specifies the name of the file to use for standard output (stdout) connections. If you use collected mode, the file is created by JSM and contains the output of the ranks as collected by JSM. If you use individual mode, a file is created by each rank and the output of each rank is directed to the file that it created. When you use individual mode, the file name can contain the special symbols %j, %h, %p, and %t as described in the --stdio_input option.
If you do not specify a special symbol within a file name, the jsrun command converts the file name that you specified to <name>.%h.%j.%t.%p for stdout and stderr files to avoid multiple tasks writing to the same file name. For example, -o myoutput --np 2 results in files such as myoutput.ibmpower03.3.0.1385 and myoutput.ibmpower04.3.1.22438. If the file cannot be opened, the output is sent to /dev/null. The use of the special symbols allows different tasks within a job to receive stdin from different file names. Special symbols are not supported in collected mode.
- -t, --stdio_input filename
Specifies the name of the files to use for the standard input connection. When you use individual mode, the file name can contain the %j, %h, %p, and %t special symbols. These special symbols expand to the job step identifier, host name, pid, and task number, respectively. To prevent a % symbol from being converted, add another % before the first % symbol. For example, %%p is converted to %p. The %p and %j special symbols are intended only for use with stdout and stderr files.
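The special-symbol expansion described above for individual mode can be illustrated with a small Python sketch. It models only the rules stated in this section (the %j, %h, %p, and %t symbols, the %% escape, and the default <name>.%h.%j.%t.%p suffix); it is not JSM's implementation, and the function name expand_stdio_name is hypothetical.

```python
import re


def expand_stdio_name(name: str, host: str, job_step: int,
                      task: int, pid: int) -> str:
    """Expand %h, %j, %t, %p and the %% escape in a stdio file name."""
    values = {
        "h": host,
        "j": str(job_step),
        "t": str(task),
        "p": str(pid),
        "%": "%",  # '%%' collapses to a literal '%'
    }
    # Scan left to right so that '%%p' is read as an escaped '%'
    # followed by a literal 'p', not as the '%p' symbol.
    tokens = re.findall(r"%([hjtp%])", name)
    if not any(token in "hjtp" for token in tokens):
        # No special symbol was given: apply the documented default suffix.
        name += ".%h.%j.%t.%p"
    return re.sub(r"%([hjtp%])", lambda m: values[m.group(1)], name)


if __name__ == "__main__":
    print(expand_stdio_name("myoutput", "ibmpower03", 3, 0, 1385))
    print(expand_stdio_name("out.%t", "ibmpower04", 3, 1, 22438))
```

The first call produces myoutput.ibmpower03.3.0.1385, the same name that appears in the -o example. The second call produces out.1 because a special symbol is present and the default suffix is therefore not added.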
Environment propagation
You can use the following options for the jsrun command to set environment variables.
- IBM_PMIX_JSRUN_PORT
- PMIX_SERVER_URI
- PMIX_NAMESPACE
- PMIX_RANK
- PMIX_SECURITY_MODE
- HOSTNAME
- SHELL
- -E, --env var=value
Exports the specified environment variable to the started processes before the program is run. This option overrides any locally defined environment variable with the same name. Existing environment variables can be propagated, or new variables can be exported with corresponding values. For example, jsrun -E OMP_NUM_THREADS=4 -E PROG_INPUT.
- -F, --env_eval var=val
Evaluates the environment variable in the remote environment and sets the environment variable in the environment of the started processes. This option overwrites the existing var value in the current environment with the new value that you specify. Verify that you escape any special symbols in the val. For example, you must escape the $ symbol as in the following example: -F FOO=\\$BAR.
System controls
You can use the following options for the jsrun command to control the larger software environment that runs tasks:
- --spawn_support
This option must be specified if the application being launched intends to create new job steps by using PMIx API calls directly or indirectly, such as through the MPI calls MPI_Comm_spawn and MPI_Comm_spawn_multiple. This option causes JSM to perform the additional communication and setup that PMIx requires to support the PMIx_Spawn API.
- -L, --use_spindle 0|1
If you specify a value greater than 0, Spindle is used (if available) to start jobs. If you specify 0, Spindle is not used. The default value is 0.
- -M, --smpiargs quoted_arg_list
By default, the jsrun command sets the minimum number of environment variables to support a Spectrum™ MPI program. If you do not want to set any environment variables for a Spectrum MPI program, you can run the jsrun --smpiargs off command. To add more Spectrum MPI command line arguments for a Spectrum MPI program, you can specify a quoted string argument to the --smpiargs option. For example, jsrun --smpiargs "-port -gpu" enables the Spectrum MPI interconnect display (-port) and GPU support for Spectrum MPI applications (-gpu). The additional command line arguments are processed and the environment is updated accordingly.
Note: Not all of the Spectrum MPI command line arguments are supported by the --smpiargs option because of the difference in the runtime environments between Spectrum MPI programs that are launched with ORTE (mpirun command) and Spectrum MPI programs that are launched with JSM (jsrun command). If an argument is not supported, a warning message might be displayed at the start of the job request. For example, the affinity option (-aff) of Spectrum MPI is not supported by JSM because it is ORTE-specific. Affinity must be specified by using the affinity options of the jsrun command. The interconnection selection options (-TCP, -intra, -port, -gpu) are supported by JSM. The MCA parameters that are passed through the --smpiargs argument list are propagated to the Spectrum MPI program. For example, jsrun --smpiargs "-mca foo bar". Note that jsrun --smpiargs "--showonly" will not work in PTF10.
- -P, --pre_post_exec scriptinfo
Indicates that the specified script information is used to run a prologue script before the job step and an epilogue script after the job step. The script that you specify must be run as root. This option is ignored in a non-CSM environment.
- -X, --exit_on_error 0|1
If you specify a value greater than 0, all processes in the namespace are terminated if any process exits with a nonzero error status. The default value is 1.
Miscellaneous options
The following are other options that you can use with the jsrun command:
- -b, --bind none | rs | packed[:smt][:#] | strided[:smt]<#>:
Specifies the binding of tasks within a resource set. The default option is packed, which indicates that the tasks are divided evenly across sockets and non-uniform memory access (NUMA) domains based on the number of CPUs available in each socket and NUMA domain. For example, if a resource set has 4 CPUs from L3 cacheA, 1 CPU from L3 cacheB, and 1 CPU from L3 cacheC, then 2/3 of the tasks are assigned to the CPUs from L3 cacheA, 1/6 are assigned to the CPU from L3 cacheB, and 1/6 are assigned to the CPU from L3 cacheC.
The tasks that are assigned to each CPU are numbered consecutively. If an integer value is specified after the --bind packed option, each task is assigned the specified number of CPUs (up to the number of CPUs in each resource set).
The assigned CPUs are reflected in the OMP_PLACES environment variable. However, the task itself is bound, by default, to only the first assigned CPU because the default is --bind packed1. Adjusting the number that is specified in the --bind packed option changes the number of CPUs that are bound to the task, up to the number of CPUs assigned to the resource set.
You can use the --bind packed[smt] option to specify the number of simultaneous multithreading (SMT) slots, also known as hardware threads, that you want to use during the binding of tasks. With this option, you do not have to bind to a full core. For example, the --bind packedsmt2 option binds each process to the first two hardware threads in the cores that are available for the resource set. The number of hardware threads that you specify can be greater than the SMT level of the system. In this scenario, each process is bound to a sequential set of hardware threads on multiple cores.
If you want the task to be bound to the full set of assigned CPUs (matching the set of CPUs that is defined by OMP_PLACES), you can use the --bind rs option.
If you do not want any binding, you can set the --bind option to none. If a one-to-one mapping of tasks to resource sets exists and the --bind option is set to none, the OMP_PLACES environment variable is set to reflect the number of CPUs in the resource set.
The strided designation works like the packed option, except that after cores/smt are bound to a task, the next core/smt to be assigned is the core/smt whose offset is equal to the first core/smt assigned to the last resource set plus the stride amount. The stride must be greater than or equal to the number of cores/smts being assigned to each task.
- -B, --bind_gpus none | packed[:<#>] | strided:<#>:
Specifies the binding of tasks to GPUs within a resource set. The default option is none, which indicates that all tasks in a resource set have access to all GPUs in the resource set. A setting of strided:x:y assigns x GPUs to each task starting at GPU (x * y) % n, where n is the total number of GPUs in the resource set. A binding of packed:n is equivalent to strided:n:n.
The assigned GPUs are reflected in the CUDA_VISIBLE_DEVICES environment variable setting for each task.
- -h, --chdir path
Starts the process with the current working directory set to the specified path.
- -i, --immediate
Requests that the jsrun command exit before the job step completes. This is the preferred method to start multiple job steps that might run concurrently. The exit value of the jsrun command is 0 if the job step was successfully queued for execution. The exit value of the jsrun command is -1 if the job step failed before it was queued for resources. You can determine the exit value of the application by viewing the output of the jslist command or by the return value of jswait <job-step id>. By default, the standard output of the job step is redirected to /dev/null when the job step runs successfully. You can change this location by using the --stdio_stdout and --stdio_stderr options to specify the names of the files in which to place stdout and stderr. Alternatively, you can use the --stdio_mode individual option. The --stdio_mode individual option, however, can create a significant number of files at large scale.
- -?, --help
Displays help text for the command.
- -V, --version
Displays the version of JSM.
Examples
- To run five tasks with each task on its own core, run the jsrun --np 5 a.out command.
- To assign one core and one GPU to each of the five tasks, run the jsrun --np 5 --nrs 5 --cpu_per_rs 1 --gpu_per_rs 1 a.out command.
- To run eight tasks on 16 cores and to prefer cores that are colocated on the same socket, run the jsrun --np 8 --nrs 1 --cpu_per_rs 16 --latency_priority cpu-cpu ./a.out command.
- To run 128 ranks where every 8 ranks share a GPU, run the jsrun --np 128 --tasks_per_rs 8 --gpu_per_rs 1 --cpu_per_rs 8 ./a.out command.
- To run eight tasks on four nodes, with each task having three GPUs and located as close as possible to the assigned GPUs, run the jsrun --np 8 --nrs 8 --rs_per_host 2 --gpu_per_rs 3 --latency_priority gpu-cpu command.
See also
jskill(1), jslist(1), jswait(1)