Known issues and limitations

Details of the RTM known issues and limitations.


Issue Description
New for 10.2.0 Fix Pack 1
Duplicate pending job records may appear on the Job Detail page (Cluster > Job Info > Details) when it is filtered by Pend Level. This occurs because DISTINCT was removed from the RTM queries that select records across tables, to improve performance.
The RTM Benchmark plugin does not work with LSF 10.1.0.10 (LSF 10.1.0 Fix Pack 10) or later if the RTM server host is not included in LSF_ADDON_HOSTS on the LSF management host. Since LSF 10.1.0 Fix Pack 10, operations as root are rejected if LSF_ROOT_USER is set to N or is not configured, except when the RTM server host is included in LSF_ADDON_HOSTS on the LSF management host. However, the RTM Benchmark plugin works only when it is run as root.
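To check the condition above, you can look for the RTM server host in lsf.conf on the LSF management host. The following shell function is a minimal sketch, assuming that LSF_ADDON_HOSTS is written on one line as a space-separated list that may be enclosed in double quotes. The function name rtm_in_addon_hosts is hypothetical and is not part of RTM or LSF.

```shell
# Hypothetical helper (not part of RTM or LSF): succeed if HOST is listed in
# LSF_ADDON_HOSTS in the given lsf.conf file. Assumes a line such as
#   LSF_ADDON_HOSTS="rtmhost1 rtmhost2"
rtm_in_addon_hosts() {
    local conf=$1 host=$2 value h
    # Keep only uncommented LSF_ADDON_HOSTS lines, then drop the key and quotes.
    value=$(grep -E '^[[:space:]]*LSF_ADDON_HOSTS[[:space:]]*=' "$conf" |
        sed -e 's/^[^=]*=//' -e 's/"//g')
    # Compare each listed host name literally with HOST.
    for h in $value; do
        if [ "$h" = "$host" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, rtm_in_addon_hosts "$LSF_ENVDIR/lsf.conf" rtmhost1 succeeds only if rtmhost1 (a placeholder name) is listed. The comparison is literal, so pass the host name in the same form (short or fully qualified) as it appears in lsf.conf.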
New for 10.2.0
License daily statistics results are empty for some User/Host/Cluster filter combinations if Disable license feature peak utilization calculation is enabled.

If the “Disable license feature peak utilization calculation” option under Configuration > Settings > License is selected, the License > Daily Statistics page is not accurate. When you change filter combinations, some combinations have no data.

The Cacti Aggregate Graph function does not work. See https://github.com/Cacti/cacti/issues/2869 for the Cacti issue on GitHub.
A threshold cannot be created when a data query item is required. See https://github.com/Cacti/plugin_thold/issues/317 for the Cacti issue on GitHub.
The threshold (thold) name cannot be replaced with the correct one, and the Warning/Alert HRULE descriptions are displayed incorrectly. See https://github.com/Cacti/plugin_thold/issues/357 for the Cacti issue on GitHub.
Issues from 10.1.0 and older
The Rtmssh plugin (which controls the ssh action icon on the Grid > Job Info, Grid > Load Info, and Grid > Host Info pages) does not work in some 64-bit Firefox browsers. For example, this feature does not work correctly in 64-bit Firefox version 52.6.0 because Firefox changed its support for the Java plug-in. See https://support.mozilla.org/en-US/kb/use-java-plugin-to-view-interactive-content for more information.
The new Map feature can map a license feature name to a new name. However, the new name is not used on the License > Usage Reports > Charts page, which continues to show the old feature name. For example, if the Map feature has changed myfeature1 to my1 on the Console > License accounting > Feature page, the License > Usage Reports > Charts page continues to show myfeature1, while other RTM pages show the new name my1.
If the host name contains a period, RTM cannot identify the domain name, and database_shorten_hostname.sh produces an incorrectly shortened name.

For example, if the host name is abc.dd (FQDN abc.dd.domain.com), the shorten hostname script gets the domain name as dd.domain.com and the host name as abc, not abc.dd.
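The result above is what you get when a name is split at its first period. The following shell sketch illustrates that pitfall and one alternative, which strips a known domain suffix instead. It is not the code of database_shorten_hostname.sh, and the function names are hypothetical.

```shell
# Illustration only, not the code of database_shorten_hostname.sh.
# Splitting at the first period treats "dd" as part of the domain:
naive_short_name() { printf '%s\n' "${1%%.*}"; }   # abc.dd.domain.com -> abc
naive_domain()     { printf '%s\n' "${1#*.}"; }    # abc.dd.domain.com -> dd.domain.com

# Stripping a known domain suffix keeps the period inside the host name.
# If the FQDN does not end in ".DOMAIN", it is printed unchanged.
short_name_for_domain() { printf '%s\n' "${1%.$2}"; }  # (abc.dd.domain.com, domain.com) -> abc.dd
```

The alternative only works when the real DNS domain is known in advance, which is exactly the information RTM cannot derive from the name alone.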

For two LSF single pending reasons, RTM is not able to fetch their customized descriptions:

  Reason (ID)                    Default description
  PEND_NO_CANDIDATE_HOST (62)    There are no suitable hosts for the job
  PEND_JOB_SPREAD_TASK (312)     Not enough hosts to meet the job's spanning requirement

The ALL cluster option in the metadata settings does not work.
Incorrect examples given for DiskU client poller installation. In the section titled "Setting up remote DiskU client pollers", step 3 of the procedure gives incorrect examples. It should read as follows:

Install rtm-client-10.2.0.<ARCH>.rpm:

# rpm -ivh /mnt/rtm/x86_64/rtm-client-10.2.0.<ARCH>.rpm
Note: If you are installing the RTM client on RHEL or SLES, use the following command:
rpm -ivh --prefix=Client_TOP /mnt/rtm/x86_64/rtm-client-<version>.<arch>.rpm

If you are installing the RTM client on Ubuntu, use the following command:

rpm -ivh --nodeps /mnt/rtm/x86_64/rtm-client-<version>.<arch>.rpm
DiskU plugin limitation: An incorrect value for total users is shown on the By TagName page. Because there is no tag-name-level aggregation (the value is a SUM based on the user level, or disku_users_totals), the value may be larger than the actual value when the same users appear in different scan paths with the same tag name.
Errors occur when upgrading RTM from a previous version to 10.1 if an LSF 7.x cluster has been used. Scenarios:
  1. Upgrading RTM from 8.3, 9.1, or 9.1.2 to 10.1: If the customer has an LSF7 cluster, after upgrading, the customer can still use the old LSF7 cluster, but cannot add new LSF7 pollers or clusters. The LSF7 cluster status is Dismissed, and CLog shows the warning "gridpend binary does not exist". However, the LSF7 poller works and data can be collected.
  2. Upgrading RTM from 9.1.3 to 10.1: If the customer has an LSF7 cluster, RTM disables the LSF7 cluster after upgrading. To continue using it, the customer must manually upgrade the poller to 9.1.4 or upgrade the LSF cluster. Otherwise, the LSF7 cluster cannot be used normally.
  3. Upgrading RTM from 9.1.4 to 10.1: If the customer has an LSF7 cluster, after upgrading, the customer can still use the old LSF7 cluster, but cannot add new LSF7 pollers or clusters.
LS plugin limitation - RTM License Scheduler plugin does not support multiple licenses in one token. If License Scheduler is configured to support multiple licenses in one token, the RTM License Scheduler plugin does not support this configuration.
Benchmark job limitation - No records found if the number of days selected is greater than the retention period. If the number of benchmark job history days is greater than the Job Data Retention Period, the job information is cleaned from the grid_jobs and grid_jobs_finished tables once the Job Data Retention Period has passed. Clicking the job ID on the Benchmark job result page then shows "No records found".
The daily_replay and other exit jobs aggregation reports are broken by changes to the lsb.acct JOB_FINISH record format in LSF 10.1. LSF 10.1 changed the lsb.acct JOB_FINISH record format to merge killed pending array jobs into one line per kill action, and changed the original jobFinishLog->idx to '0/-1'. As a result, gridacct inserts only one job record, with an incorrect indexid, for a series of killed pending array jobs and skips all the other killed pending array jobs on that line. If the LSF administrator enables the new JOB_FINISH format (lsb.params: JOB_ARRAY_EVENTS_COMBINE=Y), the daily_replay and other exit jobs aggregation reports are broken. Note: The JOB_ARRAY_EVENTS_COMBINE parameter is set to Y by default for fresh installations of LSF.
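To find out whether a cluster sets the combined format explicitly, you can check lsb.params for the parameter. The following is a minimal sketch, assuming one NAME = VALUE assignment per line, as in the Begin Parameters section. The function name job_array_events_combine is hypothetical. An "unset" result only means that the file does not set the parameter; check the LSF documentation for the value that then applies.

```shell
# Hypothetical helper (not part of RTM or LSF): print the value that an
# lsb.params file assigns to JOB_ARRAY_EVENTS_COMBINE, or "unset" if the
# parameter is not present. If it appears more than once, the last
# occurrence is printed.
job_array_events_combine() {
    local v
    # Drop the key, any trailing "# comment", and all whitespace.
    v=$(grep -E '^[[:space:]]*JOB_ARRAY_EVENTS_COMBINE[[:space:]]*=' "$1" |
        tail -n 1 |
        sed -e 's/^[^=]*=//' -e 's/#.*//' -e 's/[[:space:]]//g')
    printf '%s\n' "${v:-unset}"
}
```

For example, run it against LSB_CONFDIR/<cluster_name>/configdir/lsb.params on the LSF management host, where <cluster_name> stands for your cluster name.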

When hosts are filtered by Resreq in the Grid > Host dashboard, an error "Hosts not found" is displayed.

As a workaround, after selecting a cluster and filtering hosts by Resreq, add "Type=any" along with your search keyword in the Resreq field. For example, if you want to filter "1s=1", then enter "Type=any && 1s=1".

When queues are filtered by User in the Grid > Job Info > By Queue page, the Active Slots number is not equal to the sum of Run Slots, Pend Slots, and Suspend Slots.

The slot numbers do not match when All is selected in the User field. As a workaround, filter by any user name instead of All to match the slot numbers.

If the HPC Allocation feature is enabled, the starting and running tasks values are not consistent in the Grid > User / Group Info page.

This is a limitation of LSF 9.1.3, because fairshare displays the number of slots instead of the number of tasks.

The State Changes column shows incorrect values in the Grid > Job Info > Details page.

This is a limitation of RTM. The State Changes column may show an incorrect value in the Grid > Job Info > Details page for jobs that have been requeued.

LS Plugin Limitation - Inaccurate reserve token shown to users.

If a user over-reserves tokens for a job and the job consumes fewer tokens, RTM accurately shows the reserved amount, but the user may not be able to search for those reserving jobs. This can cause several problems. For example, the Demand column shows an incorrect number in the Grid > Dashboard > License Scheduler > Feature tab.

The job count displayed on the JobIQ details page does not match the counts on the RTM Summary Job History page.

This limitation is due to the difference in polling intervals. The heuristics poller runs every 5 minutes, but the job poller runs more frequently, so the counts may be out of sync until the heuristics poller runs again.

Job efficiency is over 100% in the Job Details tab on the Grid > Job Info > Details page.

This is a limitation of RTM. When a job is requeued and submitted to one of the specified queues, job efficiency shows over 100% in the Job Details tab.

Issues with MySQL performance.

Change the innodb_flush_log_at_trx_commit setting in my.cnf to 2. With this value, the InnoDB log is flushed to disk about once per second instead of at every transaction commit. This change reduces the amount of random disk I/O and thus increases RTM's scalability.
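If you prefer to script the my.cnf change, the following is a minimal sketch, assuming the option file has a [mysqld] section; the function name set_innodb_flush_log is hypothetical. Restart MySQL afterwards for the file change to take effect. Because innodb_flush_log_at_trx_commit is a dynamic variable, SET GLOBAL innodb_flush_log_at_trx_commit = 2; also applies it to a running server, but that does not persist across restarts. With a value of 2, an operating system crash or power failure can lose up to about one second of committed transactions.

```shell
# Hypothetical helper (not part of RTM): set innodb_flush_log_at_trx_commit = 2
# in a MySQL option file. An existing setting is replaced; otherwise a new line
# is added right after the [mysqld] header. Only the underscore spelling of the
# option name is recognized.
set_innodb_flush_log() {
    local cnf=$1 tmp
    tmp=$(mktemp) || return 1
    if grep -qE '^[[:space:]]*innodb_flush_log_at_trx_commit[[:space:]]*=' "$cnf"; then
        sed -E 's/^[[:space:]]*innodb_flush_log_at_trx_commit[[:space:]]*=.*/innodb_flush_log_at_trx_commit = 2/' \
            "$cnf" > "$tmp"
    else
        awk '{ print } /^\[mysqld\]/ { print "innodb_flush_log_at_trx_commit = 2" }' \
            "$cnf" > "$tmp"
    fi
    # Copy the content back so that the file keeps its owner and permissions.
    cat "$tmp" > "$cnf" && rm -f "$tmp"
}
```

For example, set_innodb_flush_log /etc/my.cnf, assuming the MySQL server reads /etc/my.cnf.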

An error 'A database insert failed! Error:'1064', ODBC:37000, SQL Fragment:'REPLACE INTO grid_clusters_perfmon_usage_metrics (clusterid,metric,used,free,total,present) VALUES ' is logged when perfmon collection is enabled on a cluster running on Power Systems.

This is a limitation of LSF on Power Systems because LSF does not collect MBD file descriptor usage metrics.

The machine runs out of memory when the Cluster Dashboard is run with auto refresh.

The machine runs out of memory because the session files overload the memory of the worker threads. As a workaround, change the following settings in /etc/httpd/conf/httpd.conf:
StartServers 1
MinSpareServers 1
MaxSpareServers 5
MaxClients 25
MaxRequestsPerChild 300

If you want to tune the web server or recycle Apache processes, refer to http://www.hostinginside.com/billing/knowledgebase.php?action=displayarticle&id=4

All license-related alerts are no longer valid after upgrading RTM to 9.1.3.

If you are upgrading to 9.1.3, then all license-related alerts must be re-created.

SQL syntax error occurs when running RTM on RHEL 6.4 and 6.5 on IBM Power Systems.

This SQL syntax error occurs when the fix for Red Hat Bugzilla bug 1054953 is not applied. Contact Red Hat OS support to get the fix.

No records are displayed under Syslog when a remote database is configured for RTM installed on SLES 11.

This issue is due to the wrong syslog path. As a workaround, follow these steps:
  1. Search for the syslogtomysql file. It can be under /opt/IBM/rtm/bin/syslogtomysql or /opt/rtm/bin/syslogtomysql based on where you have installed RTM.
  2. Edit /etc/init.d/syslog, for example:
    # vim /etc/init.d/syslog
  3. Search for the keyword syslogtomysql to find the line $RTM_TOP/rtm/bin/syslogtomysql > /dev/null 2>&1 &
  4. Replace $RTM_TOP/rtm/bin/syslogtomysql in this line with the path that you found in step 1. For example, change the line to /opt/IBM/rtm/bin/syslogtomysql > /dev/null 2>&1 &
  5. Restart the service: /etc/init.d/syslog restart
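Steps 1 through 4 can also be scripted. The following is a sketch, assuming that the init script refers to the binary literally as $RTM_TOP/rtm/bin/syslogtomysql; the function names find_syslogtomysql and fix_syslogtomysql_path are hypothetical. Review /etc/init.d/syslog after running it.

```shell
# Hypothetical helpers (not part of RTM) for the steps above.

# Print the first syslogtomysql binary found in the two locations from step 1.
find_syslogtomysql() {
    local c
    for c in /opt/IBM/rtm/bin/syslogtomysql /opt/rtm/bin/syslogtomysql; do
        if [ -e "$c" ]; then
            printf '%s\n' "$c"
            return 0
        fi
    done
    return 1
}

# Replace the literal "$RTM_TOP/rtm/bin/syslogtomysql" in SCRIPT with BIN,
# keeping the original file as SCRIPT.bak.
fix_syslogtomysql_path() {
    local script=$1 bin=$2
    cp "$script" "$script.bak" || return 1
    sed 's|\$RTM_TOP/rtm/bin/syslogtomysql|'"$bin"'|' "$script.bak" > "$script"
}
```

For example: bin=$(find_syslogtomysql) && fix_syslogtomysql_path /etc/init.d/syslog "$bin" && /etc/init.d/syslog restart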

If you have removed an ELIM value from LSF, it continues to be displayed in the various host tables in RTM.

When an ELIM is removed from LSF, it continues to show up on the Grid > Settings > Hosts page. Go to that page and click Save to refresh the page and remove the ELIM.

No data is displayed when Previous Day is selected in License Daily Statistics Report.

Data is not displayed if you navigate to License > Usage Reports > License Daily Statistics Report and filter as Previous Day. As a workaround, select Yesterday instead of Previous Day to view data.

Sometimes the audible alert is not activated for unavailable hosts.

When you mute a triggered Unavailable Hosts alert, the mute is sustained for all other triggered alerts. Resume the dashboard alerts to trigger the audible alerts again.

When a SQL query is modified, the layout items occasionally contain the old SQL query. When a SQL query in the data source is modified, the existing layout items normally change according to the new query. However, sometimes the existing layout items do not change even after the SQL query is modified. As a workaround, delete all existing layout items. The new layout items are then generated again according to the modified SQL query.
If both Xz_string and xz_string are defined as shared resources, only the first one is taken into account. RTM is not case-sensitive for non-binary string searches. If you search for Xz_string, you get resources that match either Xz_string or xz_string.
Browser hangs on the host dashboard after the resource requirement value is entered.

When LSF® LIM is down for a specific cluster, the Resource Requirement string filter does not work, and the page locks up until the LSF API times out. Restart your browser to correct this behavior. To avoid it, do not use the Resource Requirement filter when the cluster is offline.

The error "Error:'1060', Message:'Duplicate column name 'jobid'" is displayed after using a cross join SQL query in a grid alert. When you define a grid alert by using a cross join SQL query, do not use "select *" for the list of qualified column names. Instead, list the fields that you want to query after the word select in the SQL statement.
When a job triggers an alert, a notification is sent to only one of the defined email addresses. If you submit jobs and assign multiple email addresses for alert notification, then the alert notification is sent only to the last email address.
ssusp time differs between LSF accounting and RTM accounting. It is difficult to get the ssusp time in IBM Spectrum LSF RTM if the total ssusp time is less than the poller interval. A workaround is to decrease the poller interval, but this may not be practical for all systems, depending on their size.
stime, utime, and mem rusage reports for finished jobs are not the same in RTM and LSF. For IBM® Platform LSF 7.0.2 and earlier versions, the stime, utime, and mem rusage reports for finished jobs in RTM and LSF do not match.
License data filtering does not work when fields contain commas or quotation marks. Filtering does not work if the license server has commas in any of the filter fields. If the vendor name has a comma, it displays correctly on the detail page. However, if you try to filter by the vendor name, all data is removed and the vendor filter is reset to "All".
Rsyslog cannot start due to a missing module. The following error is seen in /var/log/messages when starting rsyslogd:
Oct 11 21:40:06 xrh51612 rsyslogd: could not load module '/usr/lib64/rsyslog/ommysql', dlopen: /usr/lib64/rsyslog/ommysql: cannot open shared object file: No such file or directory

Workaround:

  1. Go to the /usr/lib64/rsyslog/ directory.
  2. Check for the ommysql.so shared library.

    If it is missing, the mysql module for rsyslog is not installed. Check your OS documentation to install it.

  3. Create a symbolic link:
    ln -s ommysql.so ommysql
    
  4. Restart the service:
    service rsyslog restart
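Steps 2 and 3 can be combined into one check. The following is a minimal sketch; the function name link_ommysql is hypothetical, and its default directory is the one from step 1.

```shell
# Hypothetical helper (not part of RTM): create the "ommysql" link that
# rsyslogd tried to load, pointing at ommysql.so in the same directory.
link_ommysql() {
    local dir=${1:-/usr/lib64/rsyslog}
    if [ ! -e "$dir/ommysql.so" ]; then
        echo "ommysql.so not found in $dir; install the rsyslog MySQL module first" >&2
        return 1
    fi
    # -s creates a symbolic link; -f replaces an existing ommysql entry.
    ln -sf ommysql.so "$dir/ommysql"
}
```

For example: link_ommysql && service rsyslog restart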
    
Running jobs are shown with the status Exited, and their records are not found on the Job Graph/Job Detail page, if the time zone of the lsfpoller on the RTM server is not adjusted to match the remote lsfpoller. All remote pollers must be in the time zone of the cluster. The time zone of the cluster is set in Console > Grid Management > Cluster > [edit] > Configuration.
The job graph is not drawn if the RRD file's last update time is later than the rusage update time. The RRD file update times are based on the RTM host, whereas the rusage update times are based on the cluster. This inconsistency occurs when the clock of the RTM host is out of sync with the LSF cluster.

Follow these workaround steps:

  1. Update the RRD file. If the update is rejected, the error message reports the last update time of the file. Convert both timestamps to dates.

    Example:

    /usr/bin/rrdtool update /opt/cacti/gridcache/923_0_3_1280202981_absolute.rrd --template utime:stime:mem:swap:npids:npgids:threads 1280203322:67:1:577764:801592:3:1:4 1280203478:165:2:577764:801592:3:1:4

    ERROR: illegal attempt to update using time 1280203322 when last update time is 1280203428 (minimum one second step)

    # date -d '1970-01-01 1280203428 sec utc'
    Tue Jul 27 12:03:48 CST 2010

    # date -d '1970-01-01 1280203322 sec utc'
    Tue Jul 27 12:02:02 CST 2010

  2. Delete the record with update_time="2010-07-27 12:02:02" (the rejected time 1280203322) from the grid_jobs_rusage table.

    The correct job graph is then drawn.
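The timestamp comparison in step 1 can also be done in the shell. The following is a minimal sketch with hypothetical function names, assuming GNU date. It prints UTC, whereas the date commands above print local time (CST in the example, UTC+8), so 1280203322 appears as 04:02:02 UTC and 12:02:02 CST.

```shell
# Hypothetical helpers (not part of RTM) for the rrdtool error above.

# Print a Unix timestamp as a UTC date; uses GNU date's "@SECONDS" syntax.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Succeed if rrdtool would reject an update at time NEW because it is not
# later than the file's last update time LAST (minimum one-second step).
rrd_update_rejected() {
    [ "$1" -le "$2" ]
}
```

To read the last update time of an RRD file directly, you can use rrdtool last, for example last=$(rrdtool last /opt/cacti/gridcache/923_0_3_1280202981_absolute.rrd), and then run rrd_update_rejected 1280203322 "$last".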

Syslog messages cannot be forwarded to the RTM host. This issue occurs when the RTM host uses rsyslog and the other host, which sends the messages, uses syslog.

To resolve this issue, edit /etc/rsyslog.conf by adding this line:

:hostname, contains, "syslogd"

When embedding graphs in Lotus Notes® email, an icon shows as a red X. When a graph is embedded in an email, the icon shows as a red X. The graph is attached to the email so you can view it, but it is not inline as expected.

As a workaround:

  1. Go to Console > Configuration > Settings
  2. Click the Misc tab.
  3. Select Enable Lotus Notus (R) tweak
Job time is not reported correctly for jobs with pre-execution scripts. If a job has a pre-execution script, LSF includes the pre-execution time in the running value, and RTM also includes this time in the pending value.
Issues with requeued jobs
  • The final job status is not updated after a requeued job is finished.

    Sometimes IBM Spectrum LSF RTM does not show the correct status of a finished job that was requeued. Since LSF resets the run time when the job is requeued, the total time that is shown is based on the last requeue.

  • Value of PEND time on the Job List page is not updated for requeued jobs.

    Currently, IBM Spectrum LSF RTM does not show the correct details for the pending time in the Job List page. Get the correct information from the Job Details page.

  • Job RUN TIME does not show correct information if a job is requeued multiple times.
Internet Explorer cannot handle URLs with an underscore ("_"). If you use the Internet Explorer (IE) browser to log in to an IBM Spectrum LSF RTM system that has an underscore in the host name, you can enter the login and password, but you cannot proceed past the Login page. This problem applies to both IE7 and IE8.

As a workaround, use a different alias for the host or its IP address in the URL.

Existing host graphs are not updated after the host template's Associated Graph Templates or Associated Data Queries are changed. After the host template's Associated Graph Templates/Associated Data Queries are changed, the Data Queries are not automatically added and are only reindexed. Graph Templates are updated only after more than 10 minutes.

Fixed bugs

Bugs fixed in each release of IBM Spectrum LSF RTM are listed in the Readme document available with the product download on IBM Fix Central (www.ibm.com/support/fixcentral).