Test your LSF installation

Before you make LSF available to users, make sure LSF is installed and operating correctly. You should:

  • Check the cluster configuration
  • Start the LSF daemons (LSF services)
  • Verify that your new cluster is operating correctly

If you have a mixed UNIX and Windows cluster, make sure you can perform operations from both UNIX and Windows hosts.

Check the cluster

Before you begin

Before using any LSF commands, wait a few minutes for LSF services to start.

Procedure

  1. Log on to any host in the cluster.
  2. Check the configuration files.

    C:\LSF_10.1.0> lsadmin ckconfig -v

    Typical output is as follows:

    C:\LSF_10.1.0>lsadmin ckconfig -v
    Checking configuration files ...
    EGO 3.4.0 build 497310, Aug 15 2018
    Copyright International Business Machines Corp. 1992, 2016.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted
    by GSA ADP Schedule Contract with IBM Corp.
    
    binary type: win-x64
    fixes: P101982
    Reading configuration from C:\LSF_10.1.0\conf\ego\cluster1\kernel/ego.conf
    Sep 10 20:40:57 2018 3740:5008 5 3.4.0 EGO 3.4.0 build 497310, Aug 15 2018
    Copyright International Business Machines Corp. 1992, 2016.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted 
    by GSA ADP Schedule Contract with IBM Corp.
    
    binary type: win-x64
    fixes: P101982
    
    Sep 10 20:40:57 2018 3740:5008 6 3.4.0 Lim starting...
    Sep 10 20:40:57 2018 3740:5008 6 3.4.0 LIM is running in advanced workload execution mode.
    Sep 10 20:40:57 2018 3740:5008 6 3.4.0 Master LIM is not running in EGO_DISABLE_UNRESOLVABLE_HOST mode.
    Sep 10 20:40:57 2018 3740:5008 5 3.4.0 C:\LSF_10.1\10.1\etc/lim.exe -C
    Sep 10 20:40:57 2018 3740:5008 6 3.4.0 LIM is running as IBM Spectrum LSF Standard Edition.
    Sep 10 20:40:57 2018 3740:5008 6 3.4.0 LIM is running as EGO Edition.
    Sep 10 20:40:57 2018 3740:5008 6 3.4.0 LIM is running as IBM Spectrum Conductor Edition.
    Sep 10 20:40:57 2018 3740:5008 6 3.4.0 reCheckClass: numhosts 2 so reset exchIntvl to 15.00
    Sep 10 20:40:57 2018 3740:5008 6 3.4.0 Checking Done.
    ---------------------------------------------------------
    No errors found.
    
  3. Start the LSF cluster.
    1. If you have a Windows-only cluster, start the LSF cluster:
      C:\lsf\10.1.0\bin> lsfstartup
      

      This command starts the LSF services (LIM, RES, and SBD) on all LSF Windows hosts. Startup can take up to 20 seconds.

    2. If you have a mixed UNIX-Windows cluster, log on to a UNIX host and run lsfstartup to start the UNIX daemons. Then log on to a Windows host and run lsfstartup to start the LSF services on all Windows hosts.
  4. Display the cluster name and management host name:

    lsid
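
    The check in step 2 can be scripted. The following Python sketch assumes that you have already captured the text printed by lsadmin ckconfig -v and that a successful check prints the line "No errors found.", as in the sample output above. The function name config_check_passed is hypothetical and is not part of LSF.

```python
def config_check_passed(output: str) -> bool:
    """Return True if any line of the captured ckconfig output is exactly 'No errors found.'"""
    return any(line.strip() == "No errors found." for line in output.splitlines())


# Tail of the sample output shown in step 2.
sample_tail = """\
Sep 10 20:40:57 2018 3740:5008 6 3.4.0 Checking Done.
---------------------------------------------------------
No errors found.
"""

if __name__ == "__main__":
    print(config_check_passed(sample_tail))
```

    The sketch only inspects text. It does not run lsadmin itself, so it behaves the same way on any host.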

Check the Load Information Manager (LIM)

Procedure

  1. Display cluster configuration information about resources, host types, and host models:

    lsinfo

    The information displayed by lsinfo is configured in LSF_CONFDIR\lsf.shared.

  2. Display configuration information and status of LSF hosts:

    lshosts

    The output contains one line for each host in the cluster. Type, model, and resource information is configured in the LSF_CONFDIR\lsf.cluster.cluster_name file. The cpuf matches the CPU factor given for the host model in LSF_CONFDIR\lsf.shared.

  3. Display the current load levels of the cluster:

    lsload

    The output contains one line for each host in the cluster. The status should be ok for all hosts in your cluster.
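
    To check the status column for many hosts, a small script can help. The following Python sketch assumes that the captured lsload output starts with a header line whose first field is HOST_NAME and that each host line has the host name in the first column and the status in the second. The sample text is made up for illustration and is not real cluster output. The function name hosts_not_ok is hypothetical.

```python
from typing import List


def hosts_not_ok(lsload_output: str) -> List[str]:
    """Return the names of hosts whose status column is anything other than 'ok'."""
    not_ok = []
    for line in lsload_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header line (assumed to start with HOST_NAME).
        if len(fields) < 2 or fields[0] == "HOST_NAME":
            continue
        if fields[1] != "ok":
            not_ok.append(fields[0])
    return not_ok


# Illustrative text only; the column layout is an assumption.
illustrative_output = """\
HOST_NAME  status   r15s  r1m  r15m  ut  pg   ls  it  tmp  swp  mem
hostA      ok       0.0   0.0  0.0   1%  0.0  1   0   10G  4G   8G
hostB      unavail
"""

if __name__ == "__main__":
    print(hosts_not_ok(illustrative_output))
```

    An empty result means that every listed host reported the status ok.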

Check the Remote Execution Server (RES)

Before you begin

You must register your user password using lspasswd.

Procedure

  1. Run a command on one LSF host, using the RES:

    lsrun -v -m hostA hostname

  2. Run a command on a group of hosts, using the RES:

    lsgrun -v -m "hostA hostB hostC" hostname

  3. Display cross-cluster configuration information and check that the status is ok:

    lsclusters -l
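
    If you script step 2, note that the quoted host list is passed to lsgrun as a single argument after -m. The following Python sketch only builds the argument list and does not run the command. The function name lsgrun_args is hypothetical. Passing the result to a process launcher such as subprocess.run is left to the caller.

```python
from typing import List, Sequence


def lsgrun_args(hosts: Sequence[str], command: Sequence[str]) -> List[str]:
    """Build the argument list for: lsgrun -v -m "host1 host2 ..." command"""
    if not hosts:
        raise ValueError("at least one host is required")
    if not command:
        raise ValueError("a command to run is required")
    # The host names are joined with spaces into ONE argument, matching
    # the quoted "hostA hostB hostC" form used in step 2.
    return ["lsgrun", "-v", "-m", " ".join(hosts), *command]


if __name__ == "__main__":
    print(lsgrun_args(["hostA", "hostB", "hostC"], ["hostname"]))
```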

LSF on EGO

LSF on EGO allows EGO to serve as the central resource broker, enabling enterprise applications to benefit from sharing of resources across the enterprise grid.

How to handle parameters in lsf.conf with corresponding parameters in ego.conf

When EGO is enabled, existing LSF parameters (parameter names beginning with LSB_ or LSF_) that are set only in lsf.conf operate as usual because LSF daemons and commands read both lsf.conf and ego.conf.

Some existing LSF parameters have corresponding EGO parameter names in ego.conf (LSF_CONFDIR\lsf.conf is a separate file from LSF_CONFDIR\ego\cluster_name\kernel\ego.conf). You can keep your existing LSF parameters in lsf.conf, or you can set the corresponding EGO parameters in ego.conf if they have not already been set in lsf.conf.

You cannot set LSF parameters in ego.conf, but you can set the following EGO parameters related to LIM, PIM, and ELIM in either lsf.conf or ego.conf:
  • EGO_DAEMONS_CPUS
  • EGO_DEFINE_NCPUS
  • EGO_SLAVE_CTRL_REMOTE_HOST
  • EGO_WORKDIR
  • EGO_PIM_SWAP_REPORT

You cannot set any other EGO parameters (parameter names beginning with EGO_) in lsf.conf. If EGO is not enabled, you can set the parameters listed above only in lsf.conf.
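
The placement rules above can be expressed as a simple check. The following Python sketch assumes that EGO is enabled and that you have already extracted the parameter names from each file (the parsing is not shown). The function name misplaced_params is hypothetical. The allowed set is exactly the five parameters listed above.

```python
from typing import Iterable, List, Tuple

# EGO parameters that the text above allows in either lsf.conf or ego.conf.
EGO_PARAMS_ALLOWED_IN_LSF_CONF = {
    "EGO_DAEMONS_CPUS",
    "EGO_DEFINE_NCPUS",
    "EGO_SLAVE_CTRL_REMOTE_HOST",
    "EGO_WORKDIR",
    "EGO_PIM_SWAP_REPORT",
}


def misplaced_params(
    lsf_conf_names: Iterable[str], ego_conf_names: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (parameter, file) pairs that break the placement rules."""
    problems = []
    for name in lsf_conf_names:
        # Only the five listed EGO_ parameters may appear in lsf.conf.
        if name.startswith("EGO_") and name not in EGO_PARAMS_ALLOWED_IN_LSF_CONF:
            problems.append((name, "lsf.conf"))
    for name in ego_conf_names:
        # LSF parameters (LSF_ or LSB_ prefix) may not appear in ego.conf.
        if name.startswith(("LSF_", "LSB_")):
            problems.append((name, "ego.conf"))
    return problems


if __name__ == "__main__":
    print(misplaced_params(
        ["LSF_LIM_PORT", "EGO_WORKDIR", "EGO_LIM_PORT"],
        ["EGO_LIM_PORT", "LSF_LOG_MASK"],
    ))
```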

Note:

If you specify a parameter in lsf.conf and you also specify the corresponding parameter in ego.conf, the parameter value in ego.conf takes precedence over the conflicting parameter in lsf.conf.

If the parameter is not set in either lsf.conf or ego.conf, the default takes effect depending on whether EGO is enabled. If EGO is not enabled, then the LSF default takes effect. If EGO is enabled, the EGO default takes effect. In most cases, the default is the same.

Some parameters in lsf.conf do not have exactly the same behavior, valid values, syntax, or default value as the corresponding parameter in ego.conf, so in general, you should not set them in both files. If you need LSF parameters for backwards compatibility, you should set them only in lsf.conf.

If you have LSF 6.2 hosts in your cluster, they can only read lsf.conf, so you must set LSF parameters only in lsf.conf.
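
The precedence described in this note can be sketched as a small function. The following Python sketch is an illustration only: the function name effective_value and its arguments are hypothetical, None stands for "not set", and the sketch assumes that ego.conf is consulted only when EGO is enabled.

```python
from typing import Optional


def effective_value(
    lsf_value: Optional[str],
    ego_value: Optional[str],
    lsf_default: str,
    ego_default: str,
    ego_enabled: bool,
) -> str:
    """Pick the value that takes effect for one lsf.conf/ego.conf parameter pair."""
    # Assumption: ego.conf is read only when EGO is enabled.
    if ego_enabled and ego_value is not None:
        return ego_value  # ego.conf takes precedence over lsf.conf
    if lsf_value is not None:
        return lsf_value
    # Not set in either file: the default depends on whether EGO is enabled.
    return ego_default if ego_enabled else lsf_default


if __name__ == "__main__":
    # Placeholder values, not real parameter settings.
    print(effective_value("from-lsf", "from-ego", "lsf-default", "ego-default", True))
    print(effective_value(None, None, "lsf-default", "ego-default", False))
```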

LSF and EGO corresponding parameters

The following table summarizes existing LSF parameters that have corresponding EGO parameter names. You must continue to set other LSF parameters in lsf.conf.

lsf.conf parameter            ego.conf parameter
LSF_API_CONNTIMEOUT           EGO_LIM_CONNTIMEOUT
LSF_API_RECVTIMEOUT           EGO_LIM_RECVTIMEOUT
LSF_CLUSTER_ID (Windows)      EGO_CLUSTER_ID (Windows)
LSF_CONF_RETRY_INT            EGO_CONF_RETRY_INT
LSF_CONF_RETRY_MAX            EGO_CONF_RETRY_MAX
LSF_DEBUG_LIM                 EGO_DEBUG_LIM
LSF_DHPC_ENV                  EGO_DHPC_ENV
LSF_DYNAMIC_HOST_TIMEOUT      EGO_DYNAMIC_HOST_TIMEOUT
LSF_DYNAMIC_HOST_WAIT_TIME    EGO_DYNAMIC_HOST_WAIT_TIME
LSF_ENABLE_DUALCORE           EGO_ENABLE_DUALCORE
LSF_GET_CONF                  EGO_GET_CONF
LSF_GETCONF_MAX               EGO_GETCONF_MAX
LSF_LIM_DEBUG                 EGO_LIM_DEBUG
LSF_LIM_PORT                  EGO_LIM_PORT
LSF_LOCAL_RESOURCES           EGO_LOCAL_RESOURCES
LSF_LOG_MASK                  EGO_LOG_MASK
LSF_MASTER_LIST               EGO_MASTER_LIST
LSF_PIM_INFODIR               EGO_PIM_INFODIR
LSF_PIM_SLEEPTIME             EGO_PIM_SLEEPTIME
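
Because corresponding parameters should generally not be set in both files, it can be useful to list the pairs that are. The following Python sketch encodes the table as a dictionary and reports every pair that appears in both files. The function name set_in_both is hypothetical, and the sketch assumes that each file has already been parsed into a name-to-value dictionary.

```python
from typing import Dict, List, Tuple

# lsf.conf parameter -> corresponding ego.conf parameter (from the table above).
CORRESPONDING_PARAMS = {
    "LSF_API_CONNTIMEOUT": "EGO_LIM_CONNTIMEOUT",
    "LSF_API_RECVTIMEOUT": "EGO_LIM_RECVTIMEOUT",
    "LSF_CLUSTER_ID": "EGO_CLUSTER_ID",  # Windows
    "LSF_CONF_RETRY_INT": "EGO_CONF_RETRY_INT",
    "LSF_CONF_RETRY_MAX": "EGO_CONF_RETRY_MAX",
    "LSF_DEBUG_LIM": "EGO_DEBUG_LIM",
    "LSF_DHPC_ENV": "EGO_DHPC_ENV",
    "LSF_DYNAMIC_HOST_TIMEOUT": "EGO_DYNAMIC_HOST_TIMEOUT",
    "LSF_DYNAMIC_HOST_WAIT_TIME": "EGO_DYNAMIC_HOST_WAIT_TIME",
    "LSF_ENABLE_DUALCORE": "EGO_ENABLE_DUALCORE",
    "LSF_GET_CONF": "EGO_GET_CONF",
    "LSF_GETCONF_MAX": "EGO_GETCONF_MAX",
    "LSF_LIM_DEBUG": "EGO_LIM_DEBUG",
    "LSF_LIM_PORT": "EGO_LIM_PORT",
    "LSF_LOCAL_RESOURCES": "EGO_LOCAL_RESOURCES",
    "LSF_LOG_MASK": "EGO_LOG_MASK",
    "LSF_MASTER_LIST": "EGO_MASTER_LIST",
    "LSF_PIM_INFODIR": "EGO_PIM_INFODIR",
    "LSF_PIM_SLEEPTIME": "EGO_PIM_SLEEPTIME",
}


def set_in_both(lsf_conf: Dict[str, str], ego_conf: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the (lsf.conf name, ego.conf name) pairs that are set in both files."""
    return sorted(
        (lsf_name, ego_name)
        for lsf_name, ego_name in CORRESPONDING_PARAMS.items()
        if lsf_name in lsf_conf and ego_name in ego_conf
    )


if __name__ == "__main__":
    # Example values are placeholders for illustration.
    print(set_in_both({"LSF_LIM_PORT": "7869", "LSF_LOG_MASK": "LOG_WARNING"},
                      {"EGO_LIM_PORT": "7869"}))
```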


Parameters that have changed in LSF

The default for LSF_LIM_PORT has changed to accommodate EGO default port configuration. On EGO, default ports start with lim at 7869, and are numbered consecutively for pem, vemkd, and egosc.

This differs from previous LSF releases, where the default LSF_LIM_PORT was 6879. The res, sbatchd, and mbatchd daemons continue to use the default pre-version 7 ports 6878, 6881, and 6882.

Upgrade installation preserves existing port settings for lim, res, sbatchd, and mbatchd. EGO pem, vemkd, and egosc use default EGO ports starting at 7870, if they do not conflict with existing lim, res, sbatchd, and mbatchd ports.
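
The conflict condition for an upgrade can be checked with a few lines of code. The following Python sketch assumes the default EGO assignment described above (pem 7870, vemkd 7871, egosc 7872) and takes the existing daemon ports as a dictionary. The function name ego_port_conflicts is hypothetical. The sketch only reports conflicts and does not model what the installer does when one occurs.

```python
from typing import Dict

# Default EGO ports that follow lim at 7869, numbered consecutively.
EGO_DEFAULT_PORTS = {"pem": 7870, "vemkd": 7871, "egosc": 7872}


def ego_port_conflicts(existing_ports: Dict[str, int]) -> Dict[str, str]:
    """Map each EGO daemon to the existing daemon that already uses its default port."""
    daemon_by_port = {port: name for name, port in existing_ports.items()}
    return {
        ego_daemon: daemon_by_port[port]
        for ego_daemon, port in EGO_DEFAULT_PORTS.items()
        if port in daemon_by_port
    }


if __name__ == "__main__":
    # Pre-version 7 default ports mentioned in this section.
    preserved = {"lim": 6879, "res": 6878, "sbatchd": 6881, "mbatchd": 6882}
    print(ego_port_conflicts(preserved))
```

An empty dictionary means that none of the default EGO ports is already taken by lim, res, sbatchd, or mbatchd.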

EGO connection ports and base port

On every host, a set of connection ports must be free for use by LSF and EGO components.

LSF and EGO require exclusive use of certain ports for communication. EGO uses the same four consecutive ports on every host in the cluster. The first of these is called the base port.

The default EGO base connection port is 7869. EGO uses four consecutive ports starting from the base port, so by default it uses ports 7869-7872.

To customize the ports, change the base port. For example, if the base port is 6880, EGO uses ports 6880-6883.

LSF and EGO need the same ports on every host, so you must specify the same base port on every host.
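
The relationship between the base port and the ports that EGO uses is simple arithmetic. The following Python sketch computes the four consecutive ports for a given base port. The function name ego_ports is hypothetical, and the upper-bound check against 65535 is an added assumption based on the valid TCP port range.

```python
from typing import List

EGO_PORT_COUNT = 4          # EGO uses four consecutive ports
DEFAULT_EGO_BASE_PORT = 7869


def ego_ports(base_port: int = DEFAULT_EGO_BASE_PORT) -> List[int]:
    """Return the four consecutive ports that start at the base port."""
    last_port = base_port + EGO_PORT_COUNT - 1
    if base_port < 1 or last_port > 65535:
        raise ValueError("base port must leave room for four valid ports")
    return list(range(base_port, last_port + 1))


if __name__ == "__main__":
    print(ego_ports())      # default base port
    print(ego_ports(6880))  # customized base port from the example above
```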

Check LSF

Before you begin

The LIM and mbatchd must be running on the management host and on the submission host (the host from which you run the command).

Procedure

  1. Verify the LSF daemon configuration:

    C:\LSF_10.1.0>badmin ckconfig -v

    The following message appears: No errors found.

  2. Run bhosts and bqueues, and check that all hosts have the status ok and all queues have the status Open:Active:

    bhosts

    bqueues

  3. Display the default queue:

    C:\lsf\bin>bparams

  4. Submit a test job to the default queue named normal:

    C:\lsf\10.1.0\bin> bsub sleep 60

    Job <1> is submitted to default queue <normal>.

    Note that the LSF installer for Windows sets "Log on as batch job" rights on Windows execution hosts as a basic requirement to run jobs.

  5. Display the job status:

    C:\lsf\10.1.0\bin> bjobs

    If all hosts are busy, the job is not started immediately and the STAT column says PEND. The job sleep 60 should take one minute to run. When the job completes, LSF sends mail reporting the job completion.
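
    If you script the test submission, you can extract the job ID and queue name from the bsub confirmation message shown in step 4. The following Python sketch parses that message. The function name parse_bsub_output is hypothetical, and the message form without the word "default" is an assumption about how submissions to a named queue are reported.

```python
import re
from typing import Optional, Tuple

# Matches: Job <1> is submitted to default queue <normal>.
# The optional "default " also accepts the assumed non-default form.
_SUBMIT_RE = re.compile(r"Job <(\d+)> is submitted to (?:default )?queue <([^>]+)>\.")


def parse_bsub_output(text: str) -> Optional[Tuple[int, str]]:
    """Return (job ID, queue name) from a bsub confirmation, or None if not found."""
    match = _SUBMIT_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_bsub_output("Job <1> is submitted to default queue <normal>."))
```

    The returned job ID can then be used to find the matching line in the bjobs output.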