Creating external scripts to monitor applications

Configure the application profile to enable the watchdog feature and specify the LSF Application Center Notifications server to receive notifications.

Before you begin

To ensure that the watchdog scripts can send notifications to the LSF Application Center Notifications server, define the LSF_AC_PNC_URL parameter in the lsf.conf file.
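As a quick pre-check, the following minimal sketch tests whether a given lsf.conf file contains an uncommented LSF_AC_PNC_URL definition. The helper name pnc_url_defined is hypothetical and is not an LSF command. The sketch only checks that a value is present, not that the value is a valid URL.

```shell
#!/bin/sh
# Sketch: report whether lsf.conf defines LSF_AC_PNC_URL.
# pnc_url_defined is a hypothetical helper name, not part of LSF.
pnc_url_defined() {
    conf_file=$1
    # Match an uncommented assignment with a non-empty value,
    # allowing leading whitespace. Commented-out lines do not match.
    grep -Eq '^[[:space:]]*LSF_AC_PNC_URL=.' "$conf_file"
}
```

For example, assuming that LSF_ENVDIR points to the directory that contains lsf.conf, you might run pnc_url_defined "$LSF_ENVDIR/lsf.conf" || echo "LSF_AC_PNC_URL is not defined" >&2.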

Procedure

  1. Create a watchdog script to monitor the application (by checking application data, logs, and other information) and send notification messages.

    In the script, use the bpost -N command option to send a notification to the LSF Application Center Notifications server. Specify the message text with the -d option and the notification level (WARNING, ERROR, CRITICAL, or INFO) as the argument to the -N option:

    bpost -d "message" -N WARNING | ERROR | CRITICAL | INFO
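The syntax above can be wrapped in a small helper so that a watchdog script cannot pass an unsupported level by mistake. This is a minimal sketch, not part of LSF: the function name send_notification is hypothetical, and the bpost call uses only the -N and -d options in the form shown in this topic.

```shell
#!/bin/sh
# Sketch: validate the notification level before calling bpost.
# send_notification is a hypothetical wrapper name, not an LSF command.
send_notification() {
    level=$1
    message=$2
    case $level in
        WARNING|ERROR|CRITICAL|INFO)
            # Same form as the bpost calls in this topic: -N level, -d message.
            bpost -N "$level" -d "$message"
            ;;
        *)
            echo "send_notification: unsupported level: $level" >&2
            return 2
            ;;
    esac
}
```

A watchdog script might then call send_notification WARNING "scratch space is running low". An unsupported level returns status 2 without calling bpost.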

    All job environment variables are available to the watchdog scripts. In addition, the following LSF job-level resource consumption environment variables are available to the watchdog scripts:

    • LSB_GPU_ALLOC_INFO
    • LSB_JOB_AVG_MEM
    • LSB_JOB_CPU_TIME
    • LSB_JOB_MAX_MEM
    • LSB_JOB_MEM
    • LSB_JOB_NTHREAD
    • LSB_JOB_PGIDS
    • LSB_JOB_PIDS
    • LSB_JOB_RUN_TIME
    • LSB_JOB_SWAP

    The watchdog script might have the following format:

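    #!/bin/sh
    # Source the LSF environment before running any LSF commands.
    . <lsf_conf_dir>/profile.lsf
    <application_checking_commands>
    if <okay>; then
      exit 0
    elif <warning_level>; then
      # Warning condition: send a notification and exit with status 0.
      bpost -N WARNING -d "WARNING: <warning_message>"
      exit 0
    else
      # Critical condition: send a notification and exit with status 1.
      bpost -N CRITICAL -d "FATAL: <critical_message>"
      exit 1
    fi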
    Note: You must add a command to source the LSF environment at the beginning of the watchdog script.
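To make the format above concrete, the following sketch fills in the checking step with a simple log scan. It is one possible shape, not a required one. The function names log_status and run_watchdog are hypothetical, and the FATAL and WARN search patterns are assumptions about the application's log format. A missing or unreadable log file is treated as OK in this sketch, which might not suit every application.

```shell
#!/bin/sh
# In a real watchdog script, source the LSF environment first:
# . <lsf_conf_dir>/profile.lsf

# Sketch: derive a level from an application log.
# log_status, run_watchdog, and the FATAL/WARN patterns are hypothetical.
log_status() {
    log_file=$1
    if grep -q 'FATAL' "$log_file" 2>/dev/null; then
        echo CRITICAL
    elif grep -q 'WARN' "$log_file" 2>/dev/null; then
        echo WARNING
    else
        echo OK
    fi
}

# Send a notification when needed and return the exit status that the
# format above uses: 0 for okay or warning, 1 for critical.
run_watchdog() {
    level=$(log_status "$1")
    case $level in
        OK)
            return 0
            ;;
        WARNING)
            bpost -N WARNING -d "WARNING: warnings found in $1"
            return 0
            ;;
        *)
            bpost -N CRITICAL -d "FATAL: fatal errors found in $1"
            return 1
            ;;
    esac
}
```

The script body would then end with run_watchdog "$APP_LOG" followed by exit $?, where APP_LOG is a hypothetical variable that holds the path to the application log.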
  2. Set the permissions on the script so that the job submission user can execute it.
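One way to set the permissions is sketched below. The helper name make_watchdog_executable is hypothetical. Granting read and execute access to all users is an assumption made for simplicity, so restrict it to a specific group if your site policy requires. The job submission user also needs search (execute) permission on every directory in the path to the script.

```shell
#!/bin/sh
# Sketch: make a watchdog script readable and executable.
# make_watchdog_executable is a hypothetical helper name.
make_watchdog_executable() {
    # A shell script must be readable as well as executable to run.
    # a+rx grants both to owner, group, and others without changing
    # any other permission bits.
    chmod a+rx "$1"
}
```

For example, make_watchdog_executable /path/to/watchdog.sh, where the path is a placeholder for the location of your script.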