Specifying data requirements for your job

Use the bsub -data option to specify files or folders to copy into the staging area before LSF schedules the job for execution.

About this task

Specify each required data file or folder by its source host and absolute path, in the following format:
  • For files: "[host_name:]/absolute_file_path"
  • For folders: "[host_name:]/absolute_folder_path/[*]"

Use this format to specify one or more data files or folders with the bsub -data option.

When you use the asterisk character at the end of the path, enclose the data file requirement in quotation marks so that the shell does not expand the asterisk.
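
The effect of quoting can be seen with ordinary shell behavior, independent of LSF. The following is a minimal sketch that uses a scratch directory in place of a real data folder; the directory, the file names, and the count_args helper are illustrative and are not part of LSF.

```shell
# Illustrative only: a scratch directory stands in for a real data folder.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.dat" "$demo_dir/b.dat"

# Hypothetical helper: reports how many arguments a command would receive.
count_args() { echo "$#"; }

# Unquoted: the shell expands the asterisk into one argument per file.
unquoted=$(cd "$demo_dir" && count_args ./*)

# Quoted: the pattern is passed through as a single, literal argument.
quoted=$(cd "$demo_dir" && count_args "./*")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -rf "$demo_dir"
```

With quotation marks, the pattern reaches the command unchanged, which is what a data requirement that ends in an asterisk needs.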

Note:
  • By default, if you do not specify a host name in the data file requirement, LSF uses the submission host name.
  • If the requested data files are on a different host than the submission host, the submission user must have passwordless ssh configured to the specified host. Passwordless ssh allows LSF to collect information about each file and determine whether the file is already in the cache.
  • The source file names can contain only alphanumeric characters, dot (.), underscore (_), and hyphen (-). The file names cannot contain spaces.
  • Path names to files and folders can contain the colon character (:).
  • If you request a folder that contains symbolic links, the symbolic links are respected. Symbolic links within the folder are examined to determine if the links are valid. If a broken symbolic link is detected, the submission is rejected.
  • If you request a folder, you must have access to the folder and its contents. You must have read and execute permission on folders, and read permission on regular files. If you do not have access to the top-level folder and the folder contents, the submission is rejected.
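
Some of these restrictions can be checked locally before submission. The following is a rough sketch under stated assumptions: valid_source_name and list_broken_links are hypothetical helpers, not LSF commands, and they only approximate the rules in the notes above. The link check is meaningful only when the folder is reachable from the host where you run it. LSF performs its own validation at submission time.

```shell
# Hypothetical helper: succeeds if the last path component of a file
# requirement uses only alphanumeric characters, dot, underscore, and hyphen.
# Intended for file requirements, not for folder paths that end in a slash.
valid_source_name() {
    name=${1##*/}
    case $name in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: prints every symbolic link under a folder whose
# target does not exist (test -e follows the link).
list_broken_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# Example use with the file requirement from this topic.
if valid_source_name "hostA:/data/file1.dat"; then
    echo "file name is acceptable"
fi
```

A non-empty result from list_broken_links for a folder suggests that a request for that folder would be rejected.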

For each requested data file, the LSF data manager submits a single transfer job to LSF, unless the file already exists in the LSF data manager cache.

Specifying a folder as a data requirement generates a single transfer job, not separate transfer jobs for each file in the folder.

Procedure

  • Specify a single data file requirement by defining a single source host and file path in the bsub -data command.
    bsub -data "hostA:/data/file1.dat" myjob
  • Specify multiple data file requirements by using multiple -data options.
    bsub -data "hostA:/data/file1.dat" -data "hostA:/data/file2.dat" myjob
  • Specify multiple data file requirements within a single -data option by defining a space-separated list of source hosts and file paths.
    bsub -data "hostA:/data/file1.dat hostA:/data/file2.dat" myjob
  • Specify an entire directory recursively by defining the directory as the file path in the -data option.
    bsub -data "hostA:/data/" myjob

    This command requests all files in the folder data, and recursively requests all files in all subfolders, as the required data files.

  • Specify the immediate contents of a directory (but not recursively) by defining the directory with the asterisk character (*) as the file path in the -data option.
    bsub -data "hostA:/data/*" myjob

    This command requests all files at the top level of the folder data as the required data files, but does not recursively include the contents of any subfolders.

  • Specify a set of data files that are indexed by job array element by using the %I special character in the file name in the -data option and by defining the job array with the -J option.
    bsub -data "hostA:/data/file%I.dat" -J "MyJobArray[1-10]" myjob

    The %I special character is replaced with the index of each job array element, so each element has its own data file requirement. For example, /data/file1.dat is the data file requirement for MyJobArray[1], /data/file2.dat is the data file requirement for MyJobArray[2], and the remaining job array elements have similar data file requirements.
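
The mapping between job array elements and data files can be previewed with a short loop. This is only an illustration of the substitution described above: LSF performs the replacement itself, and expand_index is a hypothetical helper that is not part of LSF.

```shell
# Values taken from the job array example in this topic.
pattern="hostA:/data/file%I.dat"
array_name="MyJobArray"

# Hypothetical helper: substitutes one array index for %I in a requirement.
expand_index() {
    printf '%s\n' "$1" | sed "s/%I/$2/"
}

# Print the data file requirement for each of the ten array elements.
i=1
while [ "$i" -le 10 ]; do
    echo "${array_name}[$i] requires $(expand_index "$pattern" "$i")"
    i=$((i + 1))
done
```

Listing the expanded names before submission makes it easier to confirm that every expected file exists on the source host.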