Data specification file format

If you need to transfer a large number of files as a job data requirement, use a data specification file to provide a list of files that are required by the job.

Use the following rules to create a data specification file:
  • The first line of the file must be the string #@dataspec.
  • Blank lines and commented lines (beginning with #) preceding and following the #@dataspec string are ignored.
  • The host_name:file_path pair specifies the location of the required data file.
  • The host name must be a full host name. If no host name is specified, the submission host is used.
  • IP addresses are not supported in lieu of host names.
  • The file_path must be an absolute path (not a relative path) on the host.
  • The file_path can contain only alphanumeric characters (A-Z, a-z, and 0-9) and the following special characters: period (.), underscore (_), and dash (-). Spaces and other special characters are not supported, except for wildcard characters. The path to the data specification file itself must also conform to this convention, apart from wildcard characters: every path that a wildcard resolves to is interpreted as a file to transmit.
  • Symbolic links are not permitted, unless they result from resolving a wildcard character.
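The rules above can be sketched as a small checker. This is an illustrative Python sketch, not part of LSF: the names check_entry and check_spec are hypothetical, the slash (/) is assumed to be allowed as the path separator, the header check assumes that only blank and comment lines may precede #@dataspec, and IP address detection covers dotted IPv4 notation only. The trailing / and /* forms and the %I wildcard are accepted. Symbolic links are not checked, because that requires access to the host.

```python
import re

HEADER = "#@dataspec"
# Allowed in a path: alphanumerics, period, underscore, dash, the "/"
# separator (assumed), and the %I job array index wildcard.
_PATH_BODY = re.compile(r"^(?:[A-Za-z0-9._/-]|%I)*$")
# Dotted IPv4 notation only; other address forms are not detected.
_IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def check_entry(entry):
    """Return a list of problems found in one host_name:file_path entry."""
    problems = []
    host, sep, path = entry.partition(":")
    if not sep:
        # No host given: the submission host is used.
        host, path = "", entry
    if host and _IPV4.match(host):
        problems.append("IP address used instead of a host name")
    if not path.startswith("/"):
        problems.append("file_path is not absolute")
    # An asterisk is accepted only as a trailing "/*".
    body = path[:-1] if path.endswith("/*") else path
    if not _PATH_BODY.match(body):
        problems.append("file_path contains unsupported characters")
    return problems


def check_spec(text):
    """Return (line_number, problem) pairs for a data specification file."""
    findings = []
    header_seen = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line == HEADER:
            header_seen = True
        elif not line or line.startswith("#"):
            continue  # blank and comment lines are ignored
        else:
            if not header_seen:
                findings.append((number, "entry appears before #@dataspec"))
            findings.extend((number, p) for p in check_entry(line))
    return findings
```

For example, check_spec can be called on the full text of a data specification file. An empty result means that no rule covered by this sketch was violated; it does not guarantee that LSF accepts the file.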

The following rules apply to wildcard characters in the file paths that are used in the data specification file:
  • Ending the file path with a slash character (/) transfers all of the files in the directory and all of its subdirectories.
  • Ending the file path with a slash and an asterisk (/*) transfers all files in the immediate directory without recursion into subdirectories.
  • The asterisk (*) wildcard is only permitted after a slash (/) at the end of the file path.

    When you use the asterisk character at the end of the path, the data requirement must be enclosed in quotation marks.

  • If the data requirement is accessible from the submission host, bsub checks to determine whether it is a directory. If it is a directory on a remote host, bsub rejects the job.
  • The %I wildcard resolves to an array index in a job array submission and can be used anywhere in the path. If the %I wildcard appears in the data specification file, but the job is not an array, %I is interpreted as 0.
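
The wildcard rules above can be modeled in a few lines. This is a minimal Python sketch for illustration only: the function names resolve_array_index and transfer_mode are hypothetical, and the actual resolution is performed by LSF, not by user code.

```python
def resolve_array_index(path, array_index=None):
    """Replace every %I with the job array index, or 0 for a non-array job."""
    index = 0 if array_index is None else array_index
    return path.replace("%I", str(index))


def transfer_mode(path):
    """Classify a file path by its trailing wildcard."""
    if path.endswith("/*"):
        # Files in the immediate directory only.
        return "directory, no recursion"
    if path.endswith("/"):
        # The directory and all of its subdirectories.
        return "directory, recursive"
    return "single file"
```

Under these assumptions, the path /proj/run%I/in.dat resolves to /proj/run7/in.dat for array index 7 and to /proj/run0/in.dat for a job that is not an array.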