Viewing LSF resource connector job events

The JOB_FINISH2 LSF event contains details about LSF resource connector jobs. The LSF lsb.stream file can then capture and stream actions about the JOB_FINISH2 event. Starting in Fix Pack 14, to provide more details in the JOB_FINISH2 event logs, LSF includes the RC_ACCOUNT and VM_TYPE fields with the JOB_FINISH2 event.

Procedure

  1. Enable JOB_FINISH2 LSF events:
    1. Log on to the host as the primary LSF administrator.
    2. Edit the lsb.params configuration file to include these settings:
      • ENABLE_EVENT_STREAM=Y
      • ALLOW_EVENT_TYPE=JOB_FINISH2
    3. Save your changes to the lsb.params file and reconfigure the cluster for your changes to take effect:
      badmin reconfig
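      Before reconfiguring, it can help to confirm that both settings are actually present in the file. The following is a minimal sketch, not part of LSF: check_event_stream is a hypothetical helper name, and the path to lsb.params is an assumption that you replace with the location used by your cluster.

```shell
#!/bin/sh
# check_event_stream is a hypothetical helper, not an LSF command.
# It succeeds only if the given lsb.params file enables the event
# stream and allows the JOB_FINISH2 event type.
check_event_stream() {
    params_file=$1
    # ENABLE_EVENT_STREAM must be set to Y (or y).
    grep -Eq '^[[:space:]]*ENABLE_EVENT_STREAM[[:space:]]*=[[:space:]]*[Yy]' "$params_file" &&
    # ALLOW_EVENT_TYPE can list several event types; JOB_FINISH2 must be one of them.
    grep -Eq '^[[:space:]]*ALLOW_EVENT_TYPE[[:space:]]*=.*JOB_FINISH2' "$params_file"
}
```

      For example, check_event_stream /path/to/lsb.params && badmin reconfig runs the reconfiguration only when both settings are found.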
  2. Starting in Fix Pack 14, to provide more details in the JOB_FINISH2 event logs, the RC_ACCOUNT field is included with the JOB_FINISH2 event. This field reflects the RC_ACCOUNT value that is assigned to the job and follows existing override policies if a value is specified at the job, project, application, or queue level. (If no value is specified at any of these levels, the RC_ACCOUNT field shows a value of default.)
    1. The RC_ACCOUNT values for these levels are set in different configuration files; if required, set the value with the appropriate parameter in the appropriate configuration file.
    2. If you made changes to any of the configuration files, save your changes to the files and reconfigure the cluster for your changes to take effect:
      badmin reconfig
    Tip: If you do not set these resource connector account parameters, or if the LSF resource connector is not enabled, the RC_ACCOUNT value cannot be retrieved, but the JOB_FINISH2 event still shows the RC_ACCOUNT field with a value of default.
  3. Starting in Fix Pack 14, to provide more details in the JOB_FINISH2 event logs, you can include the VM_TYPE field with the JOB_FINISH2 event. To show the VM_TYPE field:
    1. Edit the lsf.shared file:
      1. Define the vm_type resource in the lsf.shared file. For example, to define a String resource called vm_type, specify:
        Begin Resource
        RESOURCENAME  TYPE    INTERVAL INCREASING  DESCRIPTION                       # Keywords
        ...
        vm_type       String  ()       ()          (The type of VM)
        ...
        End Resource
      2. Reconfigure LIM and restart the mbatchd daemon for your lsf.shared file change to take effect:
        lsadmin reconfig
        badmin mbdrestart
    2. Edit the user_data.sh script file for each cloud provider:
      1. LIM on the LSF management host retrieves the VM_TYPE field. A resource connector host then sends this information to the management host by way of the vm_type setting that the LSF_TOP/10.1/resource_connector/provider/scripts/user_data.sh script adds to the LSF_LOCAL_RESOURCES parameter in the lsf.conf file.
        Enable the user_data.sh script of each cloud provider to modify the value of the LSF_LOCAL_RESOURCES parameter in the lsf.conf file, by adding the following lines to the file:
        if [ -n "${vm_type}" ]; then
          sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap ${vm_type}*vm_type]\"/" $LSF_CONF_FILE
        fi
        Note that the method of getting this VM_TYPE field varies depending on the cloud provider. For example:
        Example IBM® Cloud user_data.sh script
        vm_type=$(dmidecode |grep Manufacturer|grep IBM| cut -d ':' -f 4)
        if [ -n "$vm_type" ]; then
            sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap $vm_type*vm_type]\"/" $LSF_CONF_FILE
            echo "Update LSF_LOCAL_RESOURCES in $LSF_CONF_FILE successfully, add [resourcemap ${vm_type}*vm_type]" >> $logfile
        else
            echo "Can not get instance VM type" >> $logfile
        fi
        Example Amazon Web Services (AWS) user_data.sh script
        vm_type=$(curl http://169.254.169.254/latest/meta-data/instance-type)
        #Note that this is cloud provider specific
        if [ -n "$vm_type" ]; then
            sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap $vm_type*vm_type]\"/" $LSF_CONF_FILE
        else
            echo "vm_type doesn't exist in environment variable" >> $logfile
        fi
        Example Google Cloud Platform user_data.sh script
        vm_type=$(dmidecode |grep Manufacturer|grep IBM| cut -d ':' -f 4)
        if [ -n "$vm_type" ]; then
            sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap $vm_type*vm_type]\"/" $LSF_CONF_FILE
            echo "Update LSF_LOCAL_RESOURCES in $LSF_CONF_FILE successfully, add [resourcemap ${vm_type}*vm_type]" >> $logfile
        else
            echo "Can not get instance VM type" >> $logfile
        fi
        Example Microsoft Azure CycleCloud user_data.sh script
        vm_type=$(curl -H Metadata:true "http://169.254.169.254/metadata/instance/compute/vmSize?api-version=2018-10-01&format=text")
        #Note that this is cloud provider specific
        if [ -n "$vm_type" ]; then
            sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap $vm_type*vm_type]\"/" $LSF_CONF_FILE
        else
            echo "vm_type doesn't exist in environment variable" >> $logfile
        fi
        Example Microsoft Azure user_data.sh script
        vm_type=$(curl -H Metadata:true "http://169.254.169.254/metadata/instance/compute/vmSize?api-version=2018-10-01&format=text")
        #Note that this is cloud provider specific
        if [ -n "$vm_type" ]; then
            sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap $vm_type*vm_type]\"/" $LSF_CONF_FILE
        else
            echo "vm_type doesn't exist in environment variable" >> $logfile
        fi
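
To see what the sed expression in these scripts does to the lsf.conf file, the same substitution can be tried on a single line without modifying any file. This is a sketch for illustration only: add_vm_type is a hypothetical function name, and the awshost resource in the sample line is an assumed existing value of LSF_LOCAL_RESOURCES, not something the scripts require.

```shell
#!/bin/sh
# add_vm_type is a hypothetical helper for illustration.
# It applies the same substitution as the user_data.sh scripts, but reads
# from standard input and writes to standard output instead of editing
# the file in place with sed -i.
add_vm_type() {
    vm_type=$1
    # Capture everything up to the closing quotation mark of the
    # LSF_LOCAL_RESOURCES value, then append the resourcemap entry
    # before that closing quotation mark. Other lines pass through unchanged.
    sed "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap ${vm_type}*vm_type]\"/"
}

# Sample input line (the awshost resource is an assumption).
printf '%s\n' 'LSF_LOCAL_RESOURCES="[resource awshost]"' | add_vm_type cx2-4x8
# Prints: LSF_LOCAL_RESOURCES="[resource awshost] [resourcemap cx2-4x8*vm_type]"
```

Because the expression uses a forward slash as the sed delimiter, a vm_type value that itself contains a forward slash would break the substitution; in that case the value would need to be trimmed or a different delimiter chosen.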

Results

The following JOB_FINISH2 event log shows the addition of the rc_account and vm_type fields:
"JOB_FINISH2" "10.11" 1666989002 2321 45 "userId" "0" "userName" "root" "numProcessors" "1" 
"options" "33816578" "jStatus" "64" "submitTime" "1666988304" "termTime" "0" "startTime" "1666988401" 
"endTime" "1666989002" "queue" "normal" "resReq" "profile==cx2_4x8" "fromHost" "rhel7x-mgmt-1" "cwd" 
"/share/10.1/lsf_rc" "jobFile" "1666988304.2321" "numExHosts" "1" "execHosts" "icgen2host-10-240-0-10" 
"slotUsages" "1" "cpuTime" "0.441864" "command" "sleep 600" "ru_utime" "0.229670" "ru_stime" "0.212194" 
"ru_maxrss" "3072" "ru_nswap" "0" "projectName" "default" "exitStatus" "0" "maxNumProcessors" "1" 
"exitInfo" "0" "chargedSAAP" "/root" "numhRusages" "0" "runtime" "601" "maxMem" "3072" "avgMem" "2048" 
"effectiveResReq" "select[(profile == cx2_4x8 ) && (type == any)] order[r15s:pg] " "subcwd" 
"/share/10.1/lsf_rc" "serial_job_energy" "0.000000" "numAllocSlots" "1" "allocSlots" 
"icgen2host-10-240-0-10" "ineligiblePendingTime" "-1" "options2" "1040" "hostFactor" "12.500000" 
"cpuPeak" "0.000000" "cpuEfficiency" "0.000000" "memEfficiency" "0.000000" 
"rc_account" "default" "vm_type" "cx2-4x8"