Job Info section
The Job Info section is in the Grid menu bar.
Host Job Statistics Viewer
- Host Name. The name of the host. Click a host name to show running jobs for this host (on the page).
- Cluster. The LSF cluster to which this host belongs.
- Type. The type of host, as defined in the LSF configuration.
- Model. The model of the host, as defined in the LSF configuration.
- Load/Batch. The current Load and Batch status of the host.
If you choose the Out of Service option in the Status filter, hosts with RES-Down status are shown in this column. If you choose In Service, hosts with RES-Down status are not displayed here. You must set how the Down Service Status is treated (through ) for the corresponding option to be available in the Status filter.
- CPU Fact. The processor factor of the host, as defined in the LSF configuration.
- CPU Pct. The current processor utilization on the host.
- RunQ 1m. The exponentially averaged effective processor run queue length for this host over the last minute.
- Mem Usage. The memory used by all jobs that are running on this host, as a percentage of the host's total memory.
- Page Usage. The page usage of all jobs that are running on this host, as a percentage of the total page size.
- Page Rate. The memory page scan rate of the host.
- Max Slots. The maximum number of job slots that can be allocated to this host.
- Num Slots. The number of job slots that are used by jobs that are dispatched to this host.
- Run Slots. The number of job slots that are used by jobs that are running on this host.
- SSUSP Slots. The number of job slots that are used by system-suspended jobs on the host.
- USUSP Slots. The number of job slots that are used by user-suspended jobs on the host.
- Reserve Slots. The number of job slots that are reserved on this host for pending jobs.
If graphs are created for this host, a graph icon is displayed to the left of the host name. Click the icon to view graphs for the host.
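RunQ 1m is described as an exponentially averaged run queue length. The following Python sketch illustrates exponential averaging in general; it is not LSF's implementation, and the 15-second sampling interval and 60-second time constant are assumptions chosen only for illustration.

```python
import math

def exp_average(prev_avg, sample, interval_s, window_s=60.0):
    """Blend a new run queue sample into an exponentially decaying average.

    The old average keeps weight exp(-interval/window), so older samples fade
    out smoothly instead of dropping off at a hard cutoff. The window value is
    an illustrative assumption, not an LSF constant.
    """
    decay = math.exp(-interval_s / window_s)
    return prev_avg * decay + sample * (1.0 - decay)

def run_queue_average(samples, interval_s=15.0, window_s=60.0):
    """Fold instantaneous run queue lengths (oldest first) into one average."""
    if not samples:
        raise ValueError("need at least one sample")
    avg = float(samples[0])  # seed with the first sample to avoid a zero bias
    for s in samples[1:]:
        avg = exp_average(avg, s, interval_s, window_s)
    return avg
```

A short spike raises the average only partially, which is why RunQ 1m moves more smoothly than the instantaneous queue length.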
Host Group Job Statistics
Go to the Host Group Job Statistics Viewer by clicking By Host Group under the Job Info section of the Grid menu bar. In many respects, the information that is shown is similar to that obtained through the LSF bhosts command with condensed host groups.
The Status filter is populated with all unique Load and Batch statuses that are currently experienced by hosts in any cluster.
- Host Group. The name of the LSF host group. Click a host group name to show running jobs for this group (on the page).
- Cluster. The LSF cluster to which this host group belongs.
- Load/Batch. The current Load and Batch status for the host group. If no Status filter is set, this field shows N/A. Otherwise, it shows the value that is currently selected in the Status filter.
If you choose the Out of Service option in the Status filter, hosts with RES-Down status are displayed. If you choose In Service, hosts with RES-Down status are not displayed. You must set how the Down Service Status is treated (through ) for the corresponding option to be available in the Status filter.
- Total Hosts. The total number of hosts in this host group.
- Avg CPU %. The average processor utilization for hosts in this host group.
- Avg r1m. The exponentially averaged effective processor run queue length over the last minute, averaged across the hosts in this host group.
- Avg Effic. The average efficiency of the host group.
- Total CPU. The overall processor utilization rate of the host group.
- Max Memory. The maximum memory that is used by the host group.
- Max Swap. The maximum swap usage of the host group.
- Max Slots. The maximum number of job slots available for this host group.
- Num Slots. The number of job slots that are used by jobs that are dispatched to this host group.
- Run Slots. The number of job slots that are used by jobs that are running in this host group.
- SSUSP Slots. The number of job slots that are used by system-suspended jobs in the host group.
- USUSP Slots. The number of job slots that are used by user-suspended jobs in the host group.
- Reserve Slots. The number of job slots that are reserved within the host group for pending jobs.
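The host group columns summarize the per-host values from the Host Job Statistics Viewer. The exact aggregation RTM uses is not stated here, so the following Python sketch assumes unweighted means for the averaged columns and plain sums for the slot columns; the `HostRow` type and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HostRow:
    """One row of the Host Job Statistics Viewer (hypothetical field names)."""
    name: str
    cpu_pct: float   # current processor utilization, percent
    r1m: float       # 1-minute effective run queue length
    max_slots: int
    run_slots: int

def summarize_host_group(hosts):
    """Roll per-host rows up into a single host group row."""
    if not hosts:
        raise ValueError("host group contains no hosts")
    n = len(hosts)
    return {
        "total_hosts": n,
        # Assumption: every host counts equally in the averages.
        "avg_cpu_pct": sum(h.cpu_pct for h in hosts) / n,
        "avg_r1m": sum(h.r1m for h in hosts) / n,
        # Assumption: slot columns are sums over the member hosts.
        "max_slots": sum(h.max_slots for h in hosts),
        "run_slots": sum(h.run_slots for h in hosts),
    }
```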
You can view Job Statistics graphs by clicking the View Host Group Graphs action icon for the host group. These graphs present statistics that help you manage jobs at the host group level.
- GRID - Host Group - CPU Utilization
Presents processor utilization statistics for server hosts whose status is not Unavail, Unreach, Unlicensed, or Closed-Admin.
- GRID - Host Group - Available Memory
Presents an overview of available memory at the host group level.
- GRID - Host Group - Host Details
Presents an overview of memory usage at the host group level.
- GRID - Host Group - Memory Stats
Presents average or maximum memory statistics for jobs whose status is RUNNING, USUSP, or SSUSP.
Project Viewer
Go to the Project Viewer by clicking By Project under the Job Info section of the Grid menu bar. Resource usage is shown by project within each cluster.
- Project Name. The name of the project. Click a project name to show running jobs for this project (on the page).
- Cluster Name. The LSF cluster to which this project belongs.
- Total Slots. The total number of job slots that are used for this project.
- Pending Slots. The number of job slots that are used by pending jobs for this project.
- Running Slots. The number of job slots that are used by running jobs for this project.
- Avg Effic. The average efficiency of this project.
- Max Mem. The maximum memory that is used by this project.
- Avg Mem. The average amount of memory that is used by this project.
- Max Swap. The maximum swap space that is used by this project.
- Avg Swap. The average swap space that is used by this project.
- Total CPU. The overall processor utilization of this project.
License Project Viewer
Go to the License Project Viewer by clicking By License Project under the Job Info section of the Grid menu bar. Resource usage is shown by license project within each cluster.
- License Project. The name of the license project. Click a license project name to show running jobs for this license project (on the page).
- Cluster Name. The LSF cluster to which this license project belongs.
- Total Slots. The total number of job slots that are used for this license project.
- Pending Slots. The number of job slots that are used by pending jobs for this license project.
- Running Slots. The number of job slots that are used by running jobs for this license project.
- Avg Effic. The average efficiency of this license project.
- Max Mem. The maximum memory that is used by this license project.
- Avg Mem. The average amount of memory that is used by this license project.
- Max Swap. The maximum swap space that is used by this license project.
- Avg Swap. The average swap space that is used by this license project.
- Total CPU. The overall processor utilization of this license project.
Queue Viewer
Go to the Queue Viewer by clicking By Queue under the Job Info section of the Grid menu bar. This display is similar to the output of the LSF bqueues command, except that it also includes the average and maximum run time of jobs in each queue and the average and maximum pending time for each queue.
- Queue Name. The name of the LSF queue. Click a queue name to show running jobs in this queue (on the page).
- Cluster Name. The LSF cluster to which this queue belongs.
- Priority. The priority of the queue.
- Status Reason. The status of the queue, with further detail about the status.
- Max Slots. The maximum number of job slots that can be used by the jobs in the queue.
- Num Slots. The total number of available slots for this queue.
- Run Slots. The number of job slots that are used by running jobs in the queue.
- Pend Slots. The number of job slots that are used by pending jobs in the queue.
- Suspend Slots. The number of jobs slots that are used by suspended jobs in the queue.
- AVG Pend. The average pending time of jobs in the queue.
- MAX Pend. The maximum pending time of jobs in the queue.
- AVG Run. The average run time of jobs in the queue.
- MAX Run. The maximum run time of jobs in the queue.
If you select any queue, you are directed to a display of all RUNNING jobs within that queue.
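The Queue Viewer adds average and maximum run and pending times per queue. A minimal Python sketch of that aggregation follows, assuming each job record is a dictionary with hypothetical keys `queue`, `pend_s`, and `run_s` (seconds); it is an illustration, not RTM's poller logic.

```python
from statistics import mean

def queue_time_stats(jobs):
    """Group job records by queue and compute pending and run time statistics.

    Every record passed in is included; filtering (for example, only
    finished jobs) is left to the caller.
    """
    by_queue = {}
    for job in jobs:
        by_queue.setdefault(job["queue"], []).append(job)
    stats = {}
    for queue, qjobs in by_queue.items():
        pend = [j["pend_s"] for j in qjobs]
        run = [j["run_s"] for j in qjobs]
        stats[queue] = {
            "avg_pend": mean(pend), "max_pend": max(pend),
            "avg_run": mean(run), "max_run": max(run),
        }
    return stats
```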
Viewing a fair share tree
A fair share tree shows the slot usage and the configured share ratio of all charge groups at the same level of the tree.
To access a queue's fair share tree, open the Queue Viewer. When fair share tree data is found for a queue, a View Fairshare Tree icon appears in the Actions icon list for that queue. The plug-in shows the immediate sub-level of the selected node and displays the usage and configured ratio in a Fairshare Tree tab.
To configure the Fairshare Tree Host Slot Selection Criteria, go to . Choose a cluster or add a cluster, and select the Advanced tab.
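The Fairshare Tree tab compares usage with the configured ratio for sibling charge groups. The following Python sketch shows that comparison for one level, assuming each sibling is given as a hypothetical `(configured_shares, slots_in_use)` pair; it does not model LSF's dynamic fair share priority formula.

```python
def share_ratios(siblings):
    """Compare configured share ratio with actual slot usage at one tree level.

    siblings: dict of charge group name -> (configured_shares, slots_in_use).
    Returns name -> (configured_ratio, usage_ratio), each a fraction of the
    totals across all siblings at this level.
    """
    total_shares = sum(shares for shares, _ in siblings.values())
    total_slots = sum(used for _, used in siblings.values())
    result = {}
    for name, (shares, used) in siblings.items():
        configured = shares / total_shares if total_shares else 0.0
        usage = used / total_slots if total_slots else 0.0
        result[name] = (configured, usage)
    return result
```

A group whose usage ratio is well above its configured ratio is consuming more than its share at that level.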
View Job Array Listing page
Go to the View Job Array Listing page by clicking By Array under the Job Info section of the Grid menu bar. The page shows information similar to the output of the LSF bjobs -A <job_id> command, but also includes aggregate information for the job array as a whole.
- Array ID. The job array ID.
- Job Name. The name of the job.
- User ID. The identifier of the user who submitted the job array.
- Total Jobs. The total number of jobs in the job array.
- Pending Jobs. The number of jobs that remain pending in the job array.
- Running Jobs. The number of currently running jobs.
- Done Jobs. The number of jobs that are completed without error.
- Exit Jobs. The number of jobs where errors prevented the job from completing.
- Array Effic. The average CPU efficiency of jobs in the job array.
- Avg Memory. The average memory that is used by jobs in the array.
- Avg Swap. The average swap space that is used by jobs in the array.
- Total CPU Time. The total CPU time that is used by all started jobs in the job array.
If you select any array, you are directed to a display of all "ACTIVE" and "FINISHED" jobs within the job array.
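The array-level columns are aggregates over the array's member jobs. As an illustration only, the following Python sketch derives them from hypothetical per-job records with `status`, `cpu_effic`, and `cpu_time_s` keys, using LSF job state names; it assumes that only started jobs (not PEND or PSUSP) contribute to efficiency and CPU time.

```python
from collections import Counter

def summarize_array(jobs):
    """Aggregate per-job records into one job array summary row."""
    counts = Counter(j["status"] for j in jobs)
    # Assumption: jobs that never started do not contribute CPU data.
    started = [j for j in jobs if j["status"] not in ("PEND", "PSUSP")]
    effic = [j["cpu_effic"] for j in started if j.get("cpu_effic") is not None]
    return {
        "total": len(jobs),
        "pending": counts["PEND"],
        "running": counts["RUN"],
        "done": counts["DONE"],
        "exit": counts["EXIT"],
        "array_effic": sum(effic) / len(effic) if effic else 0.0,
        "total_cpu_time_s": sum(j["cpu_time_s"] for j in started),
    }
```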
View Job Listing page
Go to the View Job Listing page by clicking Details under the Job Info section of the Grid menu bar. Filter batch job information to view only the job types you are interested in.
You can filter your view of the data by providing a resource requirement string in the format that the LSF bhosts -R command accepts. Only jobs that are running on hosts that match the resource requirement are displayed. The filter does not take any job-specific resource requirements into account.
Clear the Dynamic check box if you do not want to immediately update page information each time you change a filter setting, and instead want to wait until you complete all filter settings and then click Go.
- Job ID. The job ID that LSF® assigned to the job. Click a job number to view an information page that contains details about that job (including general job information, job submission details, the job execution environment, current/last job status, and a graphical job history).
- Job Name. The name of the job.
- Status. The status of the job.
- State Changes. The number of times that the status of the job was changed.
- User ID. The LSF user who submitted the job.
- Mem Usage. Total resident memory usage of all processes in the job, in MB.
- VM Size. Total virtual memory usage of all processes in the job, in MB.
- CPU Usage. CPU utilization for this job.
- CPU Effic. The efficiency with which this job is using the CPU allocated to it, expressed as a percentage.
- Start Time. The time at which the job was started.
- Pend. The length of time for which the job was in the pending state.
- Run. The length of time for which the job was running.
- SSusp. The length of time the job was suspended by the system.
At the bottom of the Details page, color codes indicate job efficiency thresholds, including Warning, Alarm, Flapping, and Dependencies. You can set the color for each threshold, along with the thresholds themselves, from the Console tab, on the page.
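CPU Effic is expressed as a percentage and is compared against the Warning and Alarm thresholds. RTM's exact formula is not given here, so the following Python sketch assumes a common definition (CPU time divided by run time multiplied by allocated slots) and assumes that lower efficiency is worse, so the alarm threshold sits below the warning threshold. Both functions are hypothetical.

```python
def cpu_efficiency(cpu_time_s, run_time_s, slots):
    """CPU efficiency in percent: CPU time used over CPU time available.

    Returns 0.0 when the job has not accumulated any run time yet.
    """
    available = run_time_s * slots
    if available <= 0:
        return 0.0
    return 100.0 * cpu_time_s / available

def efficiency_level(effic_pct, warning_pct, alarm_pct):
    """Classify an efficiency value; assumes alarm_pct <= warning_pct."""
    if effic_pct < alarm_pct:
        return "Alarm"
    if effic_pct < warning_pct:
        return "Warning"
    return "OK"
```

For example, a 4-slot job that used 2 hours of CPU time over 1 hour of run time is 50% efficient under this definition.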
View Jobs by Application
Go to the Batch Application page by clicking By Application under the Job Info section of the Grid menu bar. You can quickly check job details by application on the page.
By default, only applications with running jobs are listed. To view all applications, select the Include Unused Applications check box when you filter the search.
- LSF applications must first be configured in $LSF_ENVDIR/lsbatch/<clustername>/configdir/lsb.applications.
- Unused application statuses are cleared at regular intervals. You can configure the interval on the Poller tab, under the Cluster Graph Management section.
Click the View Active Jobs by Application action icon to view the active jobs for an application.
To see the running jobs for an application, click the Application name link.
You can also click the View Application Graphs action icon to see the job application graphs. The following six graphs are available:
- GRID - Applications - Efficiency
- GRID - Applications - Memory Stats
- GRID - Applications - Pending Jobs
- GRID - Applications - Running Jobs
- GRID - Applications - Total CPU
- GRID - Applications - VM Stats
View Application Profile in Job Detail page
Go to the Job Detail page by clicking Details under the Job Info section of the Grid menu bar. Filter batch job information to view only the job types you are interested in. You can filter the jobs by application by selecting All or the specific application through the Apps field. Click Go after you complete the filter setting.
Double-click the JobID of the job for which you want to see more details, and then click the Job Detail tab.
The Application details are displayed in the Submission Details section of the page.
Purge Application information
Purge old application information by setting the application purge frequency.
- Click .
- Click the Poller tab.
- In the Cluster Graph Management section, set the Queue Host Group and Application Purge Frequency to how long the corresponding graphs are kept after the application is removed from the system.
- Click Save.
View Jobs by Group
Go to the Batch Job Groups page by clicking By Group under the Job Info section of the Grid menu bar. You can quickly check job details by group from the page.
Click the View Active Jobs by Job Group action icon to view jobs by job group.
To view running jobs by job group, click the corresponding Group Name link.
- If you filter the Group by Level, data from hidden sublevels is shown in the upper-level job group.
The maximum level in the Level drop-down list is based on the job group aggregation level setting.
- By default, unused job groups are not displayed. To show unused groups, select the Include Unused Groups search criterion.
The following job group graphs are available:
- GRID - Job Groups - Efficiency
- GRID - Job Groups - Memory Stats
- GRID - Job Groups - Pending Jobs
- GRID - Job Groups - Running Jobs
- GRID - Job Groups - Total CPU
- GRID - Job Groups - VM Stats
Set Job Group Aggregation Level Limit
Set the job group aggregation level to limit the Level filter options.
- Click the Console tab.
- Under the Configuration section of the Console menu bar, click Grid Settings.
- Click the Aggregation tab.
- In the Job Group Tracking section, specify the Aggregation Level.
RTM aggregates job group information only by the specified level. If the level is set to 0, the job group data is not aggregated.
Note: Only users with the View Job Group Data permission can see the group data. You can set this permission in the realm permissions for each user account.
- Click Save.
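LSF job group names are hierarchical paths such as /proj/sim/run1. The following Python sketch illustrates how data from hidden sublevels can appear in an upper-level group, assuming that aggregating by level means truncating each path to that many components and summing its counters, and that level 0 means no aggregated data is produced. The function names and the running-job counter are hypothetical.

```python
from collections import defaultdict

def truncate_group(path, level):
    """Keep only the first `level` components of a job group path."""
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join(parts[:level])

def rollup(group_counts, level):
    """Sum per-group counters into their ancestors at the given level.

    group_counts: dict of job group path -> running job count.
    Level 0 is treated as 'aggregation disabled', so nothing is returned.
    """
    if level <= 0:
        return {}
    totals = defaultdict(int)
    for path, count in group_counts.items():
        totals[truncate_group(path, level)] += count
    return dict(totals)
```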
Purge Job Group information
Purge old job group information by setting the job group purge frequency.
- Click the Console tab.
- Under the Configuration section of the Console menu bar, click Grid Settings.
- Click the Poller tab.
- In the Cluster Graph Management section, set the User, User Group, Job Group, Project, License Project Purge Frequency to how long a job group can go without being updated before its graphs are automatically removed.
- Click Save.
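Both purge frequency settings follow the same rule: graphs are removed once the object has gone unseen for longer than the configured period. A minimal Python sketch of that check follows; `should_purge` and `graphs_to_purge` are hypothetical names, not part of RTM, and the sketch assumes the period is compared strictly (an object exactly at the limit is kept).

```python
from datetime import datetime, timedelta

def should_purge(last_seen, now, purge_after):
    """True once an object has gone without updates for longer than purge_after."""
    return now - last_seen > purge_after

def graphs_to_purge(last_seen_by_name, now, purge_after):
    """Names (applications, job groups, ...) whose graphs are due for removal."""
    return sorted(
        name for name, seen in last_seen_by_name.items()
        if should_purge(seen, now, purge_after)
    )
```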
View Job Group in Job Detail page
Go to the Job Detail page by clicking Details under the Job Info section of the Grid menu bar. Filter batch job information to view only the job types you are interested in. You can filter the jobs by job group by selecting the group in the JGroup field. Click Go after you complete the filter setting.
Double-click the JobID of the job for which you want to see more details, and then click the Job Detail tab.
The Job Group details are displayed in the Submission Details section of the page.
View Exception Status in Job Detail page
To view exception status, first configure the following exception-handling parameters for a queue in the LSF lsb.queues file:
- JOB_IDLE for exception handling of idle jobs
- JOB_OVERRUN for exception handling of jobs that run longer than the specified run time
- JOB_UNDERRUN for exception handling of jobs that exit before the specified number of minutes
After you configure these parameters, you can submit jobs to the queue, which now includes the exception status definition.
Go to the Job Detail page by clicking Details under the Job Info section of the Grid menu bar. Filter batch job information to view only the job types you are interested in. Click Go after you complete the filter settings.
Double-click the JobID of the job for which you want to see more details, and then click the Job Detail tab.
The Exception Status details (such as idle, overrun, or underrun) are shown in the Current/Last Status section of the page. Abnormal exit shows job exit information that is predefined in LSF.
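The three queue parameters define when a job is flagged as idle, overrun, or underrun. The following Python sketch is a simplified illustration of those conditions, not LSF's implementation: it assumes the idle threshold is a CPU time to run time ratio, the overrun and underrun limits are in minutes, idle and overrun apply to unfinished jobs, and underrun applies to finished jobs. The function and argument names are hypothetical stand-ins for JOB_IDLE, JOB_OVERRUN, and JOB_UNDERRUN.

```python
def exception_flags(cpu_time_s, run_time_s, finished,
                    job_idle=None, job_overrun_min=None, job_underrun_min=None):
    """Return simplified exception flags for one job.

    job_idle: stand-in for JOB_IDLE (CPU time / run time ratio).
    job_overrun_min, job_underrun_min: stand-ins for JOB_OVERRUN and
    JOB_UNDERRUN, in minutes. None means the parameter is not configured.
    """
    flags = []
    run_min = run_time_s / 60.0
    if not finished:
        # Idle: little CPU time relative to elapsed run time.
        if job_idle is not None and run_time_s > 0 and cpu_time_s / run_time_s < job_idle:
            flags.append("idle")
        # Overrun: still running past the configured limit.
        if job_overrun_min is not None and run_min > job_overrun_min:
            flags.append("overrun")
    elif job_underrun_min is not None and run_min < job_underrun_min:
        # Underrun: finished before the minimum expected run time.
        flags.append("underrun")
    return flags
```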
Monitor Host level rusage in the Job Details page
You can monitor host-level memory usage for each job that you submit with the LSF blaunch command. Make sure that the LSF_HPC_EXTENSIONS = HOST_RUSAGE parameter is configured in the lsf.conf file.
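To confirm the prerequisite, you can check lsf.conf for HOST_RUSAGE. The following Python sketch assumes that LSF_HPC_EXTENSIONS may list several space-separated values, optionally in quotes, and that if the parameter appears more than once the last definition wins; `host_rusage_enabled` is a hypothetical helper, not an LSF tool.

```python
def host_rusage_enabled(lsf_conf_text):
    """Return True if LSF_HPC_EXTENSIONS in the given lsf.conf text lists HOST_RUSAGE."""
    enabled = False
    for raw in lsf_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LSF_HPC_EXTENSIONS":
            # Strip optional quotes, then split the space-separated value list.
            values = value.strip().strip("\"'").split()
            # Assumption: a later definition overrides an earlier one.
            enabled = "HOST_RUSAGE" in values
    return enabled
```

To check a real installation, pass the contents of the lsf.conf file from your LSF_ENVDIR directory.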
Enable Pending Reason History Reporting
You can collect and store all pending reasons throughout a job’s lifecycle. From this information, you can identify the reasons for which jobs pend the longest.
- By default, the pending reason collection and aggregation feature is disabled.
- Collecting pending reasons from LSF increases the polling time.
- For large clusters with over 100K jobs per day, you must configure the CONDENSE_PENDING_REASONS parameter in LSF. Refer to the LSF documentation for more details.
- Go to the Console tab.
- Click the Poller tab.
- In the Job Collection Settings section, select the Enable Pending Reason History Reporting check box.
- Click Save.
Pending reason duration analysis starts, and a Gantt chart displays the Pending Reason History Report on the Pending Reasons tab of the job’s Details page.
Filter pending reasons on the Settings page
- Click the person icon in the top right and select Edit Profile.
- Click the Reasons tab.
- Select the check boxes corresponding to the reasons that you want the system to ignore.
You can also enter a reason in the Ignore Pending Reason by field.
- Click Return to save your settings and return to the previous page.
- The Pending Reason and Suspend Reason sections list the normal pending reasons (such as job slot limit reached, load information unavailable, or idle time is not long enough).
- The Load Indices section lists resource-based reasons (such as r15s, the load averaged over the last 15 seconds; r1m, the load averaged over the last minute; and io, the disk I/O rate averaged over the last minute).
The list includes all currently known RTM pending and suspend reasons. It might not include all pending and suspend reasons that LSF implements.
View Job Pending Reason History page
Go to the Job Pending Reason History page by clicking Details under the Job Info section of the Grid menu bar. Filter batch job information to view only the job types you are interested in. Click Go after you complete the filter settings.
Double-click the JobID of the job for which you want to see the pending reason history details.
Click the Pending Reasons tab. The Job Pending Reason History page displays the Pending Reason History report.
- The Pending Reasons tab is visible only if a job has at least one pending reason that occurred in the past.
- By default, the report is sorted by pending duration, lists the top 20 pending reasons, and applies the ignore settings of the RTM system. The report can be sorted by the following criteria:
- Pending Duration: sorts reasons by the total time that the job spent pending for each reason
- Pending Reason Text: sorts by the pending reason text
- Start Time of the Pending Reason: sorts by the time at which each pending reason started
- End Time of the Pending Reason: sorts by the time at which each pending reason ended
- The time header unit (second, minute, hour, or day) adapts automatically to the pending reason history.
- The report shows a vertical Now line if the job is pending or suspended.
- A red line is displayed in the report when the job state changes to running. While the job is not yet finished, the line marks the job's current time.
The Pending Reasons tab remains even if the job is finished.
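The default report ranks pending reasons by total duration, keeps the top 20, and skips ignored reasons. The following Python sketch reproduces that ranking from hypothetical `(reason, start_s, end_s)` intervals and adds a time-unit picker for the header; the unit thresholds are assumptions, not RTM's actual cutoffs.

```python
from collections import defaultdict

def pending_reason_report(intervals, ignore=frozenset(), top_n=20):
    """Rank pending reasons by total pending duration, longest first.

    intervals: iterable of (reason, start_s, end_s) tuples.
    ignore: reasons to drop, like the ignore list on the Reasons tab.
    """
    totals = defaultdict(float)
    for reason, start, end in intervals:
        if reason in ignore:
            continue
        totals[reason] += max(0.0, end - start)
    # Longest total first; ties broken alphabetically for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

def time_unit(span_s):
    """Pick a header unit for a history span (hypothetical thresholds)."""
    if span_s < 120:
        return "second"
    if span_s < 2 * 3600:
        return "minute"
    if span_s < 2 * 86400:
        return "hour"
    return "day"
```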