Control hosts

About this task

Hosts are opened and closed by:

  • an LSF administrator or root issuing a command
  • configured dispatch windows

Close a host

Procedure

Run badmin hclose:
badmin hclose hostB
Close <hostB> ...... done

If the command fails, the host might be unreachable because of network problems, or the daemons on the host might not be running.

Open a host

Procedure

Run badmin hopen:
badmin hopen hostB
Open <hostB> ...... done

Configure dispatch windows

About this task

A dispatch window specifies one or more time periods during which a host receives new jobs. The host does not receive jobs outside of the configured windows. Dispatch windows do not affect job submission or running jobs (running jobs are allowed to run until completion). By default, dispatch windows are not configured.

To configure dispatch windows:

Procedure

  1. Edit lsb.hosts.
  2. Specify one or more time windows in the DISPATCH_WINDOW column:
    Begin Host
    HOST_NAME     r1m      pg    ls     tmp    DISPATCH_WINDOW
    ...
    hostB         3.5/4.5  15/   12/15  0      (4:30-12:00)
    ...
    End Host
    
  3. Reconfigure the cluster:
    1. Run lsadmin reconfig to reconfigure LIM.
    2. Run badmin reconfig to reconfigure mbatchd.
  4. Run bhosts -l to display the dispatch windows.
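
The effect of a dispatch window can be illustrated with a minimal Python sketch. It is not part of LSF. It assumes that the DISPATCH_WINDOW column holds one or more space-separated windows in the hour:minute form shown above, that the end of a window is exclusive, and that a window whose end is earlier than its start wraps past midnight. The function names are hypothetical, and other window forms are not handled.

```python
from datetime import time


def _to_time(text):
    """Convert 'H:MM' (for example '4:30') to a datetime.time."""
    hour, minute = text.split(":")
    return time(int(hour), int(minute))


def parse_windows(column):
    """Parse a DISPATCH_WINDOW value such as '(4:30-12:00)'.

    Assumption: windows are space-separated and use hour:minute only.
    """
    windows = []
    for spec in column.strip("()").split():
        begin, end = spec.split("-")
        windows.append((_to_time(begin), _to_time(end)))
    return windows


def accepts_new_jobs(column, now):
    """Return True if 'now' falls inside any configured window."""
    for begin, end in parse_windows(column):
        if begin <= end:
            if begin <= now < end:
                return True
        elif now >= begin or now < end:
            # Assumption: a window such as 22:00-6:00 wraps past midnight.
            return True
    return False


if __name__ == "__main__":
    print(accepts_new_jobs("(4:30-12:00)", time(9, 0)))
    print(accepts_new_jobs("(4:30-12:00)", time(13, 0)))
```

With the hostB configuration above, the sketch treats 9:00 as inside the window and 13:00 as outside it. Jobs that are already running at 12:00 are unaffected, because dispatch windows control only whether the host receives new jobs.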

Log a comment when closing or opening a host

Procedure

  1. Use the -C option of badmin hclose and badmin hopen to log an administrator comment in lsb.events:
    badmin hclose -C "Weekly backup" hostB

    The comment text Weekly backup is recorded in lsb.events. If you close or open a host group, each member of the host group is displayed with the same comment string.

    A new event record is recorded for each host open or host close event. For example:

    badmin hclose -C "backup" hostA

    followed by

    badmin hclose -C "Weekly backup" hostA
    

    generates the following records in lsb.events:

    "HOST_CTRL" "7.0" 1050082346 1 "hostA" 32185 "lsfadmin" "backup"
    "HOST_CTRL" "7.0" 1050082373 1 "hostA" 32185 "lsfadmin" "Weekly backup"
  2. Use badmin hist or badmin hhist to display administrator comments for closing and opening hosts:
    badmin hhist
    Fri Apr  4 10:35:31: Host <hostB> closed by administrator
    <lsfadmin> Weekly backup.

    bhosts -l also displays the comment text:

    bhosts -l
     
    HOST  hostA
    STATUS     CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
    closed_Adm 1.00     -      -      0      0      0      0      0      -
     
     CURRENT LOAD USED FOR SCHEDULING:
                  r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   slots
     Total         0.0   0.0   0.0    2%   0.0    64    2    11 7117M  512M  432M       8
     Reserved      0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M       8
     
     LOAD THRESHOLD USED FOR SCHEDULING:
               r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
     loadSched   -     -     -     -       -     -    -     -     -      -      -
     loadStop    -     -     -     -       -     -    -     -     -      -      -
     
                    cpuspeed    bandwidth
     loadSched          -            -
     loadStop           -            -
     
     THRESHOLD AND LOAD USED FOR EXCEPTIONS:
                JOB_EXIT_RATE
     Threshold    2.00
     Load         0.00
     ADMIN ACTION COMMENT: "Weekly backup"
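
To pick up the administrator comment in a script, the ADMIN ACTION COMMENT line can be extracted from saved bhosts -l output. The following Python sketch is one possible approach, not an LSF interface. It assumes the single-comment layout shown above, with the comment on one line in double quotes. The function name admin_comment and the trimmed sample text are hypothetical.

```python
import re

# Matches a line such as:  ADMIN ACTION COMMENT: "Weekly backup"
COMMENT_RE = re.compile(
    r'^\s*ADMIN ACTION COMMENT:\s*"(.*)"\s*$', re.MULTILINE
)


def admin_comment(bhosts_output):
    """Return the administrator comment, or None if no comment line exists."""
    match = COMMENT_RE.search(bhosts_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Trimmed sample based on the bhosts -l output above.
    sample = (
        "HOST  hostA\n"
        " THRESHOLD AND LOAD USED FOR EXCEPTIONS:\n"
        ' ADMIN ACTION COMMENT: "Weekly backup"\n'
    )
    print(admin_comment(sample))
```

The text passed to admin_comment could come from running bhosts -l for a host and capturing its standard output. The sketch returns None when the host has no administrator comment.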

Use lock IDs to specify multiple reasons for closing a host

About this task

Different users can close a host for multiple reasons by specifying a different lock ID for each reason. For example, userA might be updating an application while userB is configuring the operating system. The host remains closed until both users complete their updates and open the host using their specific lock IDs.

Procedure

  1. Use the -i option of badmin hclose when closing a host to specify a lock ID to attach to the closed host. Optionally, use the -C option to attach a comment to the lock ID that explains the closing reason in more detail.

    badmin hclose -i lock_id [-C comment]

    If the host is already closed, this command stacks the new lock ID with any existing lock IDs on the closed host to ensure that the host remains closed if at least one lock ID is still attached to the host.

    Each lock ID is a string that can contain up to 128 alphanumeric and underscore (_) characters. The keyword all is reserved and cannot be used as the lock ID.

    userA closes the host to update application1:
    badmin hclose -i "lock_update_app1" -C "Updating application1"
    userB closes the host to configure the operating system:
    badmin hclose -i "lock_config_os" -C "Configuring OS"
  2. Use the bhosts -l command to view all lock IDs and comments that are attached to the host. If any lock IDs are attached, they are displayed in tabular format:
    ...
    ADMIN ACTION COMMENTS:
    LockId           EventTime                Admin        Message
    lock_update_app1 Mon Dec  2 19:41:44      userA        Updating application1
    lock_config_os   Mon Dec  2 19:51:03      userB        Configuring OS
    ...
  3. Use the -i option of badmin hopen to remove the specified lock ID from the closed host. Optionally, use the -C option to add comments.

    badmin hopen -i "lock_id ... | all" [-C comment]

    Specify a space-separated list of lock IDs to remove multiple lock IDs, or use the all keyword to remove all lock IDs from the closed host. If there are no more lock IDs attached to the host, this command also opens the host.

    userB finished configuring the operating system and removes the lock_config_os lock ID:
    badmin hopen -i "lock_config_os" -C "Finished OS configuration"
    Since userA is still updating application1 and the lock ID is still attached to this host, the host remains closed.
    userA finished updating application1 and removes the lock_update_app1 lock ID:
    badmin hopen -i "lock_update_app1" -C "Finished updating application1"
    There are no more lock IDs attached to the host, so this command also opens the host.
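
The lock ID rules and the stacking behavior in this procedure can be modeled with a short Python sketch. It is an illustration only and does not call LSF. The class HostLocks and the function is_valid_lock_id are hypothetical names. The sketch assumes that "alphanumeric" means ASCII letters and digits, and that the reserved keyword all is matched exactly. It models only closures that use lock IDs.

```python
import re

# Up to 128 alphanumeric and underscore characters (ASCII assumed).
LOCK_ID_RE = re.compile(r"[A-Za-z0-9_]{1,128}")


def is_valid_lock_id(lock_id):
    """Check the lock ID rules described above; 'all' is reserved."""
    return lock_id != "all" and LOCK_ID_RE.fullmatch(lock_id) is not None


class HostLocks:
    """Toy model of the lock IDs that are attached to one closed host."""

    def __init__(self):
        self.locks = {}  # lock ID -> comment

    def hclose(self, lock_id, comment=""):
        """Attach a lock ID, stacking it with any existing lock IDs."""
        if not is_valid_lock_id(lock_id):
            raise ValueError("invalid lock ID: %r" % lock_id)
        self.locks[lock_id] = comment

    def hopen(self, *lock_ids):
        """Remove the given lock IDs, or every lock ID for 'all'."""
        if lock_ids == ("all",):
            self.locks.clear()
            return
        for lock_id in lock_ids:
            self.locks.pop(lock_id, None)

    @property
    def closed(self):
        """The host stays closed while at least one lock ID is attached."""
        return bool(self.locks)


if __name__ == "__main__":
    host = HostLocks()
    host.hclose("lock_update_app1", "Updating application1")
    host.hclose("lock_config_os", "Configuring OS")
    host.hopen("lock_config_os")
    print(host.closed)  # userA's lock ID is still attached
    host.hopen("lock_update_app1")
    print(host.closed)  # no lock IDs remain
```

The sequence at the bottom follows the userA and userB example: the host stays closed after userB removes lock_config_os, and it opens only when the last lock ID is removed.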

How events are displayed and recorded in the lease model of the LSF multicluster capability

In the resource lease model of the LSF multicluster capability, host control administrator comments are recorded only in the lsb.events file on the local cluster. badmin hist and badmin hhist display only events that are recorded locally. Host control messages are not passed between clusters in the lease model. For example, if you close an exported host in both the consumer and the provider cluster, the host close events are recorded separately in each cluster's local lsb.events file.