Configuring RTM cluster settings

Configure IBM Spectrum LSF clusters to interact with IBM Spectrum LSF RTM.

Go to the Grid Settings page by clicking Console > Configuration > RTM Settings.

In Grid Settings, there is a tab for each category of RTM settings that you can change.

  • General: Configure the default user settings for your cluster.

    Among the fields that are displayed, take note of the following specific fields:
    • Summary Host name Substitute

      Conserve display space for common host name substrings.

    • Minimum User Screen Refresh Interval

      Restricts the minimum refresh interval that your users can set to reduce load on the system. Set a higher refresh interval for larger clusters to reduce system load.

    • Maximum Job Zoom Time Range

      Restricts the maximum time range that your users can zoom in to on jobs; beyond this limit, they cannot zoom in any further. Restrict this setting to reduce load because the job zoom function is system-intensive. Set a smaller window for larger clusters.

    • Maximum Export Rows

      Restricts the maximum number of rows that your users can export. Raising this limit reduces system performance during export operations, so keep it low to protect performance.

    • Cluster CPU Factor Leveling

      Important:

      Do not enable this setting unless you understand how to apply CPU factoring to hosts in your cluster. For more information about CPU factors, refer to the LSF Administration documentation.

    • Enable Ajax filter

      Enables an Ajax search engine that refines searches by filtering hosts, host groups, users, or user groups. Enter at least three '%' characters in the Grid tab to get a list of all hosts, host groups, users, or user groups.

  • Visual: Configure the tree categories that are visible in the tree view when a new cluster is added.
  • Poller: Configure poller defaults for data collection, interval settings, and thresholds.
  • Maint: Configure system maintenance settings. You can keep more data for smaller clusters, because there are fewer records for these clusters.
    Among the fields that the page displays, take note of the following specific fields:
    • Detail Job Data Retention Period

      Records job details such as total process count, threads, memory, PID, PGID, and swap, and these details are used to generate Job Graphs.

      The size of each job record depends on job volume and your cluster settings. The system can hold a maximum of 10 million records. Use this upper limit along with the approximate number of jobs per week in your cluster to determine the ideal retention period.

    • Summary Job Data Retention Period

      Individual job records are kept for this period after the job ended. The size of each job record depends on job volume and your cluster settings.

    • Daily Statistics Retention Period

      Records of daily summary statistics are kept for this period after the job ended. Because these records are added every day, you can keep records for a longer time, depending on the job volume. Smaller clusters with fewer than one million jobs per year can have a retention period as high as three years.

    • Backup Cacti Database

      This field enables a disaster recovery backup to restore your Cacti and RTM configuration. Some job data is lost during the database restoration, though you can use other utilities to restore all the job data.

      Note:

      Database backup files are disk-intensive for larger clusters.
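
The sizing guidance for the Detail Job Data Retention Period can be worked through as simple arithmetic. The following is a minimal sketch, not an RTM tool: the function name and the `records_per_job` parameter are hypothetical, and the assumption that each job contributes a fixed number of detail records is a simplification that you should replace with figures measured on your own cluster.

```python
import math

# Upper limit on detail job records, as stated in the text above.
MAX_DETAIL_RECORDS = 10_000_000


def max_retention_weeks(jobs_per_week, records_per_job=1.0):
    """Estimate how many whole weeks of detail job data fit under the limit.

    jobs_per_week   -- approximate number of jobs your cluster runs per week
    records_per_job -- ASSUMPTION: average detail records stored per job
    """
    if jobs_per_week <= 0 or records_per_job <= 0:
        raise ValueError("jobs_per_week and records_per_job must be positive")
    records_per_week = jobs_per_week * records_per_job
    # Round down so that the estimate never exceeds the record limit.
    return math.floor(MAX_DETAIL_RECORDS / records_per_week)


if __name__ == "__main__":
    print(max_retention_weeks(250_000))
    print(max_retention_weeks(250_000, records_per_job=4))
```

Rounding down keeps the estimate conservative. Treat the result as a starting point and adjust it after you observe actual record growth.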

  • Archiving: Configure database archiving settings.

    Data archiving allows you to send job data to a non-local database to be used by other tools such as Cognos, Crystal Reports, and Tableau. Currently, only MySQL and MariaDB are supported as external databases. To use another database, contact Support.

  • Paths: Configure cluster directories and file paths. If the directories and file paths are found and verified, the message [OK: FILE FOUND] is displayed below the corresponding fields.
  • Thold: Configure cluster alert and threshold settings, including alerts and thresholds to identify when resources are idle/closed, low, busy, or starved.

    You can also set how the RES-Down Status is treated when you search for jobs By Host or By Host Group. In the Down Service Status section, select Out of Service or In Service. The specified option becomes available in the Status field on the Batch Host Filters when you click By Host or By Host Group under the Job Info section of the Grid tab. If you choose In Service, jobs with RES-Down status are not shown on the list. To see jobs with RES-Down status, choose Out of Service.

  • Aggregation: Configure default behavior for project information aggregation, host information aggregation, and memory tracking.
    Among the fields that the page displays, take note of the following specific fields:
    • User Group Aggregation Details

      Specifies how your cluster handles user group filtering. There are two ways to filter jobs by user group. The User Group Membership option displays all the jobs of the users that belong to the group. The Job Specification option displays only the jobs that are submitted with the bsub -G option.

      To set the method that matches your configuration:

      1. Click the Console tab.

      2. Under the Configuration section of the Console tab, click Grid Settings.

      3. Click the Aggregation tab.

      4. Select the User Group Aggregation Details (User Group Membership or Job Specification) from the list.

      5. Click Save.

      To see the result, go to the Grid tab and click Groups under the User/Group Info section. Click the Group Name link to view the list of jobs for the group based on the specified setting.
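
The difference between the two User Group Aggregation Details options can be illustrated with a small filter. This is a sketch for illustration only and does not reflect how RTM implements the setting: the `Job` record, the `jobs_for_group` function, and the method names `"membership"` and `"job_specification"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Job:
    job_id: int
    user: str
    # Group named with "bsub -G" at submission, or None if -G was not used.
    submit_group: Optional[str] = None


def jobs_for_group(jobs: List[Job], group: str, members: Set[str], method: str) -> List[Job]:
    """Return the jobs listed for a group under the chosen (hypothetical) method."""
    if method == "membership":
        # User Group Membership: every job of every user in the group.
        return [job for job in jobs if job.user in members]
    if method == "job_specification":
        # Job Specification: only jobs submitted with -G naming this group.
        return [job for job in jobs if job.submit_group == group]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    jobs = [
        Job(101, "alice", submit_group="design"),
        Job(102, "alice"),
        Job(103, "bob", submit_group="design"),
    ]
    members = {"alice"}
    print([j.job_id for j in jobs_for_group(jobs, "design", members, "membership")])
    print([j.job_id for j in jobs_for_group(jobs, "design", members, "job_specification")])
```

In this sample data, the membership method lists both of alice's jobs, while the job specification method lists only the jobs that named the group at submission, including one from a user outside the membership set.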

    • Wall clock Calculation Method

      Set this field for chargeback calculations, depending on whether you charge for suspend time.

    • Should Project Names be Aggregated?

      Enable this field to collect job data based on project names.

    • Project Aggregation Method

      Project aggregation is used for project names that contain hierarchical metadata to assist with tracking.

    • Should License Project Job Performance be Tracked?

      Enable this field to track running jobs that are associated with a License Project in Grid > By License Project. By default, it is disabled and license project information is not shown.

    • Aggregation Level

      When you build a job group hierarchy, only some of the levels are significant for analysis. If you set a positive integer value, RTM aggregates job groups only to the specified level.
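
One way to picture the Aggregation Level setting is as truncation of a slash-separated job group path to its first N levels before counting. The sketch below rests on that assumption, which the text does not confirm in detail: the function names are hypothetical, and treating a non-positive level as "no aggregation" is an illustrative choice rather than documented RTM behavior.

```python
from collections import Counter
from typing import Iterable


def truncate_job_group(path: str, level: int) -> str:
    """Keep only the first `level` components of a job group path."""
    parts = [part for part in path.split("/") if part]
    if level > 0:
        parts = parts[:level]
    # ASSUMPTION: a non-positive level leaves the path unaggregated.
    return "/" + "/".join(parts)


def count_jobs_by_level(paths: Iterable[str], level: int) -> Counter:
    """Count jobs per job group after truncating each path to `level`."""
    return Counter(truncate_job_group(path, level) for path in paths)


if __name__ == "__main__":
    job_groups = [
        "/risk/portfolio1/current",
        "/risk/portfolio1/archive",
        "/risk/portfolio2/current",
    ]
    for group, count in sorted(count_jobs_by_level(job_groups, 2).items()):
        print(group, count)
```

With a level of 2, the two `/risk/portfolio1` subgroups in the sample data collapse into a single entry, which is the kind of grouping that makes a deep hierarchy easier to analyze.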

  • Status/Events: Configure default behavior, thresholds, and visual cues for job flapping, cluster and job efficiency, PID levels, and job dependencies. If you set the color to None, the legend item does not appear and the GUI is not colorized on the Job Details page.
  • Idle Jobs: Configure detection and email notification settings for idle jobs. You can configure the following global settings: Filter Name, Email Subject, Email Message, and Legend Background Color. Specific idle job settings are set under the Cluster Management page in the Idle Jobs tab.
  • Memory Exceptions: Configure detection and email notification settings for memory rusage violations.
  • Runlimit Exceptions: Configure detection and email notification settings for job runtime limit exceptions.