Manage I/O performance of the info directory

In large clusters, the large number of jobs results in a large number of job files stored in the LSB_SHAREDIR/cluster_name/logdir/info directory at any time. When the total size of the job files reaches a certain point, I/O operations in the info directory become significantly slower because of file server directory limits, which depend on the file system implementation.

About this task

By dividing the total file size of the info directory among subdirectories, your cluster can process more job operations before reaching the total size limit of the job files.

Note: Job script files for jobs that are stored in the jobinfo cache are not stored in the info directory; they are stored in the lsb.jobinfo.events file.

Procedure

  1. Use MAX_INFO_DIRS in lsb.params to create subdirectories and enable mbatchd to distribute the job files evenly throughout the subdirectories.
    MAX_INFO_DIRS=num_subdirs

    where num_subdirs specifies the number of subdirectories that you want to create under the LSB_SHAREDIR/cluster_name/logdir/info directory. Valid values are positive integers between 1 and 1024. By default, MAX_INFO_DIRS is not defined.

  2. Run badmin reconfig to create and use the subdirectories.
    Note: If you enabled duplicate event logging, you must run badmin mbdrestart instead of badmin reconfig to restart mbatchd.
  3. Run bparams -l to display the value of the MAX_INFO_DIRS parameter.
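Before running badmin reconfig, it can help to confirm that the value in lsb.params falls within the documented range of 1 to 1024. The following Python sketch is a hypothetical helper, not part of LSF: the name read_max_info_dirs is illustrative, and the parsing is a simplification that looks only for a MAX_INFO_DIRS=value line and ignores the rest of the lsb.params syntax.

```python
import re
from typing import Optional

# Hypothetical helper (not shipped with LSF). It only mirrors the documented
# rule that MAX_INFO_DIRS must be a positive integer between 1 and 1024.
# Commented-out lines such as "# MAX_INFO_DIRS=10" do not match this pattern.
_LINE = re.compile(r"^\s*MAX_INFO_DIRS\s*=\s*(\S+)\s*$")


def read_max_info_dirs(lsb_params_text: str) -> Optional[int]:
    """Return MAX_INFO_DIRS from lsb.params text, or None if it is not defined.

    Raises ValueError if the parameter is present but not an integer
    between 1 and 1024.
    """
    value = None
    for line in lsb_params_text.splitlines():
        match = _LINE.match(line)
        if match:
            value = match.group(1)  # this sketch keeps the last occurrence
    if value is None:
        return None  # default: parameter not defined
    if not value.isdigit():
        raise ValueError(f"MAX_INFO_DIRS must be an integer, got {value!r}")
    num = int(value)
    if not 1 <= num <= 1024:
        raise ValueError(f"MAX_INFO_DIRS must be between 1 and 1024, got {num}")
    return num


if __name__ == "__main__":
    sample = "Begin Parameters\nMAX_INFO_DIRS=10\nEnd Parameters\n"
    print(read_max_info_dirs(sample))
```

bparams -l remains the authoritative way to see the value that mbatchd is actually using; a check like this only catches out-of-range values before reconfiguration.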

Example

MAX_INFO_DIRS=10

mbatchd creates ten subdirectories from LSB_SHAREDIR/cluster_name/logdir/info/0 to LSB_SHAREDIR/cluster_name/logdir/info/9.
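The following Python sketch lists the subdirectory paths for the example above and illustrates how splitting job files across subdirectories reduces the number of files in any one directory. It assumes, purely for illustration, that each job file is placed by job ID modulo MAX_INFO_DIRS; this document does not describe the placement rule that mbatchd actually uses. The share directory path and cluster name are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def info_subdirs(sharedir: str, cluster_name: str, max_info_dirs: int) -> list[PurePosixPath]:
    """Return the paths info/0 .. info/(max_info_dirs - 1) under the share directory."""
    info = PurePosixPath(sharedir) / cluster_name / "logdir" / "info"
    return [info / str(i) for i in range(max_info_dirs)]


def subdir_for_job(job_id: int, max_info_dirs: int) -> int:
    """Hypothetical placement rule: job ID modulo the number of subdirectories.

    This is an illustrative assumption, not mbatchd's documented algorithm.
    """
    return job_id % max_info_dirs


if __name__ == "__main__":
    # Hypothetical LSB_SHAREDIR and cluster name.
    dirs = info_subdirs("/shared/lsf/work", "cluster1", 10)
    print(dirs[0], dirs[-1])
    # Count how many of 100,000 sequential job IDs land in each subdirectory.
    counts = Counter(subdir_for_job(job_id, 10) for job_id in range(1, 100_001))
    print(sorted(counts.items()))
```

Under this assumed rule, 100,000 sequential job IDs with MAX_INFO_DIRS=10 give each subdirectory 10,000 files instead of placing all 100,000 in a single directory.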

Configure a job information directory

Job file I/O operations can affect cluster performance when an LSF cluster has millions of jobs. To improve cluster performance, you can set LSB_JOBINFO_DIR to a directory on a high-performance I/O file system. This directory is separate from the LSB_SHAREDIR directory defined in lsf.conf, and LSF accesses it to retrieve the job information files. If the directory does not exist, mbatchd tries to create it; if that fails, mbatchd exits.
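The create-or-exit behavior described above can be sketched in Python as follows. This illustrates the documented behavior only and is not mbatchd source code. The name ensure_jobinfo_dir is hypothetical, and the explicit chmod to 700 reflects the permission requirement for this directory rather than anything mbatchd is documented to do.

```python
import os
import sys


def ensure_jobinfo_dir(path: str) -> None:
    """Use an existing directory, or try to create it; exit if creation fails.

    Hypothetical sketch of the documented LSB_JOBINFO_DIR startup behavior.
    """
    if os.path.isdir(path):
        return  # directory already exists: use it as is
    try:
        os.makedirs(path, mode=0o700)
        # makedirs applies the process umask, so set 700 explicitly.
        os.chmod(path, 0o700)
    except OSError as err:
        # Mirrors "if that fails, mbatchd exits": raise SystemExit with a message.
        sys.exit(f"cannot create LSB_JOBINFO_DIR {path}: {err}")
```

The check is intentionally strict: any failure to create the directory (for example, a regular file already occupying the path, or missing permissions on the parent) ends the process instead of continuing without a job information directory.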

The LSB_JOBINFO_DIR directory must be:

  • Owned by the primary LSF administrator
  • Accessible from all hosts that can potentially become the management host
  • Accessible from the management host with read and write permission
  • Set to 700 permissions
Note: Using the LSB_JOBINFO_DIR parameter requires draining the whole cluster.
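To verify the directory requirements above, a check like the following Python sketch could be run on each host that can become the management host. It is a hypothetical helper, not an LSF command. admin_uid is an assumed parameter that you supply yourself, for example from pwd.getpwnam() with the account name of your primary LSF administrator.

```python
import os
import stat


def check_jobinfo_dir(path: str, admin_uid: int) -> list[str]:
    """Return a list of problems with path; an empty list means it passes.

    Checks what can be tested from the current host: the path is a
    directory owned by admin_uid, its permissions are exactly 700, and
    this process can read and write it. Accessibility from every
    management candidate host must be checked on each of those hosts.
    """
    try:
        st = os.stat(path)
    except OSError as err:
        return [f"cannot stat {path}: {err}"]
    problems = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path} is not a directory")
    if st.st_uid != admin_uid:
        problems.append(f"owner uid {st.st_uid} is not the LSF administrator uid {admin_uid}")
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o700:
        problems.append(f"permissions are {mode:o}, expected 700")
    if not os.access(path, os.R_OK | os.W_OK):
        problems.append(f"{path} is not readable and writable by this process")
    return problems
```

Returning a list of problems, rather than stopping at the first failure, lets you fix every misconfiguration in one pass before draining the cluster.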