IBM Support

'GRID - Cluster/Host Load Average' graph

Question & Answer


Question

What does the 'GRID - Cluster/Host Load Average' graph show?

Answer


GRID - Cluster/Host Load Average

This graph shows the LSF load averages collected from the Platform LSF LIMs (Load Information Managers) running on each monitored batch host. It reports three load averages: 15 seconds (r15s), 1 minute (r1m) and 15 minutes (r15m).

These Platform LSF load indices report the average CPU run-queue length (the number of processes ready to use the CPU) over the measured period, so busier systems show higher values.

Load indices collected by LIM:

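| Index | Measures         | Units     | Direction  | Averaged over | Update interval |
|-------|------------------|-----------|------------|---------------|-----------------|
| r15s  | run queue length | processes | Increasing | 15 seconds    | 15 seconds      |
| r1m   | run queue length | processes | Increasing | 1 minute      | 15 seconds      |
| r15m  | run queue length | processes | Increasing | 15 minutes    | 15 seconds      |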
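The table can be illustrated with a small sketch. It assumes each index is a plain moving average of run-queue samples taken at the 15-second update interval; the LIM may average differently internally, and the class and constant names are hypothetical. It shows why r15s reacts to a short burst immediately while r15m barely moves.

```python
from collections import deque

# Simplified model (an assumption, not LSF's actual algorithm): each index
# is a plain moving average of run-queue samples taken every 15 seconds.
UPDATE_INTERVAL_S = 15
WINDOWS_S = {"r15s": 15, "r1m": 60, "r15m": 900}


class LoadIndices:
    """Keeps recent run-queue samples and reports the three averages."""

    def __init__(self):
        longest = max(WINDOWS_S.values()) // UPDATE_INTERVAL_S
        self.samples = deque(maxlen=longest)  # 60 samples cover 15 minutes

    def update(self, run_queue_length):
        """Record one 15-second sample and return r15s, r1m and r15m."""
        self.samples.append(run_queue_length)
        history = list(self.samples)
        result = {}
        for name, window_s in WINDOWS_S.items():
            n = window_s // UPDATE_INTERVAL_S  # samples inside this window
            recent = history[-n:]
            result[name] = sum(recent) / len(recent)
        return result


# 15 minutes of an idle host, ending with one 15-second burst of
# 8 runnable processes.
li = LoadIndices()
for q in [0.0] * 59 + [8.0]:
    indices = li.update(q)
print({name: round(value, 2) for name, value in indices.items()})
```

This prints {'r15s': 8.0, 'r1m': 2.0, 'r15m': 0.13}: the same burst appears at full height in r15s, diluted over four samples in r1m, and over sixty samples in r15m.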

Because Platform RTM averages these already-averaged results again, this graph is an “average of averages.” There is one graph for every LSF batch host for each tracked metric.

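```shell
# The LINE1 statements are an illustrative completion (colors and legends
# assumed); the original excerpt stops after the three DEF statements.
/usr/bin/rrdtool graph - \
--title='rbplsf913_Summary - GRID Load Average' \
DEF:a='/opt/IBM/cacti/rra/rbplsf913_summary_r1m_7.rrd':'r15s':AVERAGE \
DEF:b='/opt/IBM/cacti/rra/rbplsf913_summary_r1m_7.rrd':'r1m':AVERAGE \
DEF:c='/opt/IBM/cacti/rra/rbplsf913_summary_r1m_7.rrd':'r15m':AVERAGE \
LINE1:a#FF0000:'15 seconds' \
LINE1:b#00AA00:'1 minute' \
LINE1:c#0000FF:'15 minutes'
```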
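To make the “average of averages” concrete, the sketch below takes hypothetical r1m values reported at four successive RTM polls and averages them, as an RRD AVERAGE consolidation does when several polled values fall into one plotted point (for example, on longer time ranges). The numbers are invented for illustration only.

```python
from statistics import mean

# Hypothetical r1m values reported by one host's LIM at four successive RTM
# polls. Each value is already LSF's own 1-minute average.
r1m_at_each_poll = [0.5, 0.6, 6.0, 0.4]

# Consolidating these polls into one plotted point with AVERAGE averages
# the already-averaged values again: an "average of averages".
graph_point = mean(r1m_at_each_poll)
peak = max(r1m_at_each_poll)

print(f"plotted value: {graph_point:.2f}  highest polled r1m: {peak:.2f}")
```

The plotted value (1.875) is far below the 6.0 reading from a single poll, which is why short peaks look flatter on the graph than in lsload output.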


The graph (see attachment) shows how the load averages vary over time and how the 15-second, 1-minute and 15-minute averages differ from each other. When interpreting this data, keep in mind that Platform RTM samples these metrics only at its polling interval.
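The effect of sampling only at the polling interval can be sketched as follows. The 300-second polling interval, the burst, and the moving-average model are all assumptions made for illustration, not RTM or LSF defaults. A 45-second burst between two polls never shows up in the polled r15s values, but it does leave a trace in r15m.

```python
# Hypothetical setup: run-queue samples every 15 s, an RTM polling interval
# of 300 s (an assumed value), and indices modeled as plain moving averages.
STEP_S = 15
POLL_S = 300

# 30 minutes of samples (index i is time i * 15 s) for an idle host, except
# an 8-process burst of three samples (t = 450 s to 480 s) between two polls.
run_queue = [0.0] * 120
for i in range(30, 33):
    run_queue[i] = 8.0


def moving_average(samples, end, window_s):
    """Mean of the samples in the window_s-second window ending at index end."""
    n = window_s // STEP_S
    window = samples[max(0, end - n + 1): end + 1]
    return sum(window) / len(window)


# Values RTM would record: only the sample at each poll time is seen.
polled = []
for i in range(POLL_S // STEP_S, len(run_queue), POLL_S // STEP_S):
    r15s = moving_average(run_queue, i, 15)
    r15m = moving_average(run_queue, i, 900)
    polled.append((i * STEP_S, r15s, r15m))
    print(f"t={i * STEP_S:5d}s  r15s={r15s:.2f}  r15m={r15m:.2f}")
```

Every polled r15s value is 0.00, while r15m stays above zero for several polls after the burst. The longer averaging windows are therefore the more reliable signal when the polling interval is much longer than 15 seconds.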

RTM has one related setting, under Console > Grid Settings > Poller > CPU Run Queue Length Load Indices Type, which accepts DEFAULT, EFFECTIVE or NORMALIZED.

With EFFECTIVE, LSF scales the run queue value on multiprocessor systems so that the CPU load of uniprocessors and multiprocessors is comparable (lsload -E). With NORMALIZED, LSF additionally adjusts the CPU run queue for the relative speeds of the processors, using the CPU factor (lsload -N). DEFAULT reports the raw data (lsload -l).
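The sketch below only illustrates why raw run-queue values from differently sized hosts are hard to compare. The scaling functions are hypothetical (divide by processor count, then by CPU factor) and are not LSF's actual EFFECTIVE or NORMALIZED formulas; the host names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    ncpus: int
    cpu_factor: float  # relative processor speed, like LSF's CPU factor
    raw_r1m: float     # raw run-queue length (the DEFAULT view)


def per_cpu(host):
    """Hypothetical EFFECTIVE-style scaling: run queue per processor."""
    return host.raw_r1m / host.ncpus


def speed_adjusted(host):
    """Hypothetical NORMALIZED-style scaling: also divide by CPU factor."""
    return per_cpu(host) / host.cpu_factor


hosts = [
    Host("small", ncpus=2, cpu_factor=1.0, raw_r1m=4.0),
    Host("large", ncpus=32, cpu_factor=2.0, raw_r1m=16.0),
]
for h in hosts:
    print(h.name, h.raw_r1m, per_cpu(h), speed_adjusted(h))
```

By raw value the large host looks four times busier, but per processor (and even more so after the speed adjustment) it is the less loaded of the two. Use `lsload -E` and `lsload -N` to see the values LSF itself computes.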

Product: IBM Spectrum LSF RTM
Version: 10.1 (Standard edition)
Component: Graphs
Platform: Platform Independent

Document Information

Modified date:
17 June 2018

UID

isg3T1026279