Monitoring job throughput with benchmark jobs

You can configure a benchmark job to measure job throughput in your cluster. A benchmark job is any job that you configure RTM to submit to the LSF cluster at a specific interval. You can then graphically view time values for the job over time, such as the communication time between RTM and LSF and how long the job took to start.

Configure a benchmark job to monitor job throughput

About this task

You can configure benchmark jobs to measure job throughput in your cluster. These jobs are submitted by RTM to LSF at the configured interval.
Important: For benchmark jobs to be submitted to LSF, RTM and LSF must be in a uniform user namespace.
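Conceptually, each benchmark run is a periodic submission of a whitelisted command to LSF. The sketch below is illustrative only: RTM performs the submission internally, and the interval and command values here are assumptions, not RTM defaults. `bsub` is LSF's standard job-submission command.

```python
import shlex
import subprocess
import time

INTERVAL_SECONDS = 300          # example submission interval, not an RTM default
BENCHMARK_COMMAND = "sleep 60"  # example command; must be listed in cmd-whitelist.txt

def build_bsub_argv(command: str) -> list[str]:
    # bsub is LSF's standard job-submission command; the benchmark
    # command string is split into arguments shell-style.
    return ["bsub"] + shlex.split(command)

def submit_periodically(count: int, run=subprocess.run):
    # Submit the benchmark job `count` times, waiting between submissions.
    for _ in range(count):
        run(build_bsub_argv(BENCHMARK_COMMAND), check=True)
        time.sleep(INTERVAL_SECONDS)

print(build_bsub_argv(BENCHMARK_COMMAND))  # ['bsub', 'sleep', '60']
```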

Procedure

  1. Select Console > Clusters > Benchmark Jobs and click the + button.
  2. Select values for your benchmark job and make sure that your benchmark job is enabled.
    Note:

    You can add users to the User Name drop-down list by editing the file RTM_TOP/cacti/plugins/benchmark/user-whitelist.txt, and you can add commands to the Command for the Job Submission drop-down list by editing the file RTM_TOP/cacti/plugins/benchmark/cmd-whitelist.txt. You must be logged in as root to edit these files.

  3. Click Create.
  4. After your benchmark job has run, you can see its job information. Click the graph icon in the Actions column to view job information graphically.
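The whitelist files are plain text with one entry per line. A minimal sketch of appending entries follows; the /tmp/rtm-demo path is a stand-in for your actual RTM_TOP, and the user and command values are examples. Against the real files, run as root.

```python
from pathlib import Path

# Stand-in for RTM_TOP; point this at your real RTM installation root
# (and run as root) when editing the live files.
rtm_top = Path("/tmp/rtm-demo")
whitelist_dir = rtm_top / "cacti" / "plugins" / "benchmark"
whitelist_dir.mkdir(parents=True, exist_ok=True)

# Append a user to the User Name drop-down list (example user name).
with open(whitelist_dir / "user-whitelist.txt", "a") as f:
    f.write("bmuser1\n")

# Append a command to the Command for the Job Submission drop-down
# list (example command).
with open(whitelist_dir / "cmd-whitelist.txt", "a") as f:
    f.write("sleep 60\n")

print((whitelist_dir / "user-whitelist.txt").read_text())
```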

Using the Benchmark Stats graph to analyze benchmark results

After your benchmark job has run, you can access a graphical view of the results by selecting Console > Clusters > Benchmark Jobs and clicking the Graph icon in the Actions column.

When the Benchmark Stats graph is displayed, click inside the graph to show daily, weekly, monthly, and yearly averages for the benchmark job.

The time values displayed for a benchmark job can indicate how fast your cluster is processing jobs that are sent by clients and can highlight issues in your cluster.

The following list describes the time values used to calculate each duration that is displayed in the graph.

Submit Time

Description: The average time that it took for the job to be submitted from RTM to LSF. The value is calculated as an average for all jobs in the time period:

LSF Submit Time - RTM Submit Time to LSF

The time stamps that are used for the calculation are:

  • LSF Submit Time: the date and time that LSF recorded as the submit time.
  • RTM Submit Time to LSF: the date and time that RTM submitted the job to LSF.

Use the value to identify: the communication duration from a client to LSF for job submission.

Seen Time

Description: The average time from when RTM submitted the job to LSF until RTM recognized the job as submitted. The value is calculated as an average for all jobs in the time period:

RTM Seen Submit Time - RTM Submit Time to LSF

The time stamps that are used for the calculation are:

  • RTM Seen Submit Time: the date and time that RTM identified that the job was submitted in LSF.
  • RTM Submit Time to LSF: the date and time that RTM submitted the job to LSF.

Use the value to identify: the full duration of communication from a client to LSF and from LSF back to the client for job submission.

Start Time

Description: The average time from when RTM submitted the job until RTM recognized the job as started in LSF. The value is calculated as an average for all jobs in the time period:

RTM Seen Start Time - RTM Submit Time to LSF

The time stamps that are used for the calculation are:

  • RTM Seen Start Time: the date and time that RTM identified that the job was started in LSF.
  • RTM Submit Time to LSF: the date and time that RTM submitted the job to LSF.

Use the value to identify: how long the job took to start, from the time it was submitted until the client identified it as started.

Run Time

Description: The average time that it took for a job to run to completion. This value is the actual LSF run time:

LSF End Time - LSF Start Time

The time stamps that are used for the calculation are:

  • LSF End Time: the date and time that the job ended in LSF.
  • LSF Start Time: the date and time that the job started in LSF.

Use the value to identify: the actual time that the job took to run.

Done Time

Description: The average time from when RTM submitted the job until the job finished in LSF. The value is calculated as an average for all jobs in the time period:

LSF End Time - RTM Submit Time to LSF

The time stamps that are used for the calculation are:

  • LSF End Time: the date and time that the job ended in LSF.
  • RTM Submit Time to LSF: the date and time that RTM submitted the job to LSF.

Use the value to identify: the duration from job submission by the client until the job finished in LSF.

Seen Done Time

Description: The average time from when RTM submitted the job until RTM identified the job as finished in LSF. The value is calculated as an average for all jobs in the time period:

RTM Seen Done Time - RTM Submit Time to LSF

The time stamps that are used for the calculation are:

  • RTM Seen Done Time: the date and time that RTM identified that the job was finished in LSF.
  • RTM Submit Time to LSF: the date and time that RTM submitted the job to LSF.

Use the value to identify: the full duration from job submission by the client until the job is recognized as finished by the client. Compare Seen Done Time - Done Time to evaluate the communication duration between LSF and RTM after the job finished.
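As a concrete check of these formulas, the durations for a single job can be computed from its time stamps. The sketch below uses hypothetical time stamps; RTM averages these per-job values over all jobs in the graph's time period.

```python
from datetime import datetime

# Hypothetical time stamps for one benchmark job; keys mirror the
# time-stamp names used in the duration definitions above.
ts = {
    "rtm_submit_to_lsf": "2024-06-01 10:00:00",
    "lsf_submit":        "2024-06-01 10:00:02",
    "rtm_seen_submit":   "2024-06-01 10:00:05",
    "lsf_start":         "2024-06-01 10:00:15",
    "rtm_seen_start":    "2024-06-01 10:00:20",
    "lsf_end":           "2024-06-01 10:01:15",
    "rtm_seen_done":     "2024-06-01 10:01:20",
}
t = {k: datetime.strptime(v, "%Y-%m-%d %H:%M:%S") for k, v in ts.items()}

def secs(later: str, earlier: str) -> float:
    # Difference between two time stamps, in seconds.
    return (t[later] - t[earlier]).total_seconds()

metrics = {
    "Submit Time":    secs("lsf_submit", "rtm_submit_to_lsf"),
    "Seen Time":      secs("rtm_seen_submit", "rtm_submit_to_lsf"),
    "Start Time":     secs("rtm_seen_start", "rtm_submit_to_lsf"),
    "Run Time":       secs("lsf_end", "lsf_start"),
    "Done Time":      secs("lsf_end", "rtm_submit_to_lsf"),
    "Seen Done Time": secs("rtm_seen_done", "rtm_submit_to_lsf"),
}
for name, value in metrics.items():
    print(f"{name}: {value:.0f} s")

# Seen Done Time - Done Time gives the post-completion
# communication lag between LSF and RTM (5 s here).
print(f"LSF-to-RTM lag: {metrics['Seen Done Time'] - metrics['Done Time']:.0f} s")
```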

Viewing benchmark job results over a specific time period

About this task

View benchmark job results over a selected time period. You can filter by cluster name, benchmark name, and job status, and you can search across benchmark results.

Procedure

Select Cluster > Reports > Benchmark Results.

Viewing benchmark jobs that exceed thresholds

About this task

View benchmark jobs that exceed alert thresholds or that reach the maximum job run time in the Cluster Dashboard.

Procedure

  1. Select Cluster > Dashboards > Cluster and look at the section Benchmark Jobs Exceptions.
  2. Click the value in the Benchmark Name column to view more details about the benchmark job and the Benchmark Submission Stats graph.