IBM Support

Recover MySQL and RTM

Troubleshooting


Problem

Recover MySQL and RTM

Symptom

The RTM database is corrupted beyond recovery and all other repair attempts have failed. The steps below bring RTM back up quickly from the last known good backup.

Resolving The Problem

The sequence of steps is: bring up the MySQL database, restore data from backup, import the job data missed since the outage, and replay the daily stats.

1) Shut down services

service lsfpollerd stop
service licpollerd stop
service advocate stop
service crond stop
service httpd stop
service mysqld stop
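
The stop sequence above can also be wrapped in a small loop that reports any service that fails to stop. This is a minimal sketch and not part of the original procedure: the function name `stop_rtm_services` and the `DRY_RUN` variable are hypothetical, and the service names are the ones listed in step 1.

```shell
#!/bin/sh
# Hypothetical helper: stop the RTM-related services in the order given in step 1.
# Set DRY_RUN=1 to print the commands instead of running them.
stop_rtm_services() {
    rc=0
    for svc in lsfpollerd licpollerd advocate crond httpd mysqld; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $svc stop"
        else
            # Keep going on failure so the remaining services are still stopped.
            service "$svc" stop || { echo "warning: could not stop $svc" >&2; rc=1; }
        fi
    done
    return "$rc"
}
```

Running `DRY_RUN=1 stop_rtm_services` first shows the commands without touching any service.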

2) Remove the corrupted InnoDB files and the cacti database directory, then bring up MySQL

rm -f /var/lib/mysql/ibdata*
rm -f /var/lib/mysql/ib_logfile*

rm -rf /var/lib/mysql/cacti

service mysqld start

- Check that there are no errors in /var/lib/mysql/<hostname>.err
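
One way to make this check repeatable is to search the log for error-level entries. The sketch below assumes that MySQL marks such lines with the tag `[ERROR]`; the function name `check_mysql_errlog` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print [ERROR] lines (with line numbers) from a MySQL error log.
# Returns 0 if none are found, 1 if any are found, 2 if the log cannot be read.
check_mysql_errlog() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    if grep -n '\[ERROR\]' "$log"; then
        return 1
    fi
    return 0
}
```

For example, `check_mysql_errlog "/var/lib/mysql/$(hostname).err"`, assuming the log file is named after the output of `hostname` on this machine. Older entries from before the restart also match, so compare timestamps before acting on a hit.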

3) Create cacti database

mysql> create database cacti;

4) Grant the DB user cacti on localhost all privileges on the cacti database

mysql> GRANT USAGE ON *.* TO 'cacti'@'localhost' IDENTIFIED BY PASSWORD '*4ACFE3202A5FF5CF467898FC58AAB1D615029441';
mysql> GRANT ALL PRIVILEGES ON `cacti`.* TO 'cacti'@'localhost';

mysql> GRANT SELECT ON `mysql`.`time_zone_name` TO 'cacti'@'localhost';

5) Import cacti data

mkdir /tmp/bak

cp /opt/IBM/cacti/backup/cacti_db_backup_95.tgz /tmp/bak     # use the logs to decide which backup is the last good one

cd /tmp/bak

tar -zxvf cacti_db_backup_95.tgz

cd cacti_backup

mysql cacti < cacti_db_struct_backup.sql

mysql cacti < cacti_db_backup.sql
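
Before importing, it can help to confirm that the chosen archive actually contains both dump files. The following is a sketch only: the function name `backup_has_dumps` is hypothetical, and the two file names are the ones used in step 5.

```shell
#!/bin/sh
# Hypothetical helper: confirm a backup archive lists both SQL dumps named in step 5.
# Returns 0 only if both files appear in the archive listing.
backup_has_dumps() {
    archive="$1"
    # Listing fails (non-zero) if the archive is missing or not a valid gzip tar.
    listing=$(tar -tzf "$archive" 2>/dev/null) || return 1
    for f in cacti_db_struct_backup.sql cacti_db_backup.sql; do
        printf '%s\n' "$listing" | grep -Eq "(^|/)$f\$" || {
            echo "$archive: missing $f" >&2
            return 1
        }
    done
    return 0
}
```

For example, `backup_has_dumps /tmp/bak/cacti_db_backup_95.tgz` before running the two `mysql cacti < ...` imports. This checks only that the files are present, not that their contents are intact.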

6) Import partitioned data. This step applies only if partitioning is enabled and the partition backup option was selected. If there are archive files in the /opt/IBM/cacti/backup/partition_backups/ directory, run the following as root:

cd ${RTM_TOP}/cacti/plugins/grid

php database_restore_partitions.php -f
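
The condition in step 6 can be tested before running the restore script. This is a minimal sketch: the function name `has_partition_backups` is hypothetical, and because the archive extension is not stated here, it accepts any regular file in the directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory exists and holds at least one regular file.
has_partition_backups() {
    dir="$1"
    [ -d "$dir" ] || return 1
    # -maxdepth keeps the search to the directory itself (supported by GNU and BSD find).
    [ -n "$(find "$dir" -maxdepth 1 -type f | head -n 1)" ]
}
```

For example, run `has_partition_backups /opt/IBM/cacti/backup/partition_backups` and go on to `php database_restore_partitions.php -f` only if it succeeds.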

 

7) Add ${RTM_TOP}/rtm/bin/ to the shared library cache

ldconfig /opt/IBM/rtm/bin/

8) Start Services

service httpd start
service advocate start
service lsfpollerd start
service licpollerd start
service crond start

=> At this point, log on to the RTM web GUI and check that the cluster is up, that graphs are being drawn, that no new red errors appear in the cacti logs, and that recently submitted jobs (running, finished, or pending) are visible on the grid Jobs tab.

9) Import missing job data

Identify the cluster ID and LSF version of the cluster, then run the commands below.

cd /opt/IBM/rtm/lsf<version>/bin
./gridacct -C 1 -B "2016:01:31,2016:02:26" -D <path_to_lsb.acct>

In the example above, the cluster ID is 1 (-C 1), and -B selects jobs submitted within the given range, in the format YYYY:MM:DD[:HH:MM],YYYY:MM:DD[:HH:MM].

10) Rebuild daily stats

cd ${RTM_TOP}/cacti/plugins/grid

php database_replay_daily_stats.php --start='2016-01-31' --end='2016-02-26'

In the example above, daily stats are replayed for the period whose job data was re-imported in step 9.
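
Steps 9 and 10 take the same period in two formats: colon-separated for gridacct -B and dash-separated for --start and --end. The sketch below derives the -B range from the dash-separated dates so that the two commands stay in step. The function name `gridacct_range` is hypothetical, and it handles only the date-only form, not the optional :HH:MM part.

```shell
#!/bin/sh
# Hypothetical helper: build the gridacct -B range from two YYYY-MM-DD dates.
gridacct_range() {
    for d in "$1" "$2"; do
        case "$d" in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) echo "expected YYYY-MM-DD, got: $d" >&2; return 1 ;;
        esac
    done
    # Swap dashes for colons and join the two dates with a comma.
    printf '%s,%s\n' "$(printf '%s' "$1" | tr '-' ':')" "$(printf '%s' "$2" | tr '-' ':')"
}
```

For example, `gridacct_range 2016-01-31 2016-02-26` prints `2016:01:31,2016:02:26`, which matches the -B argument used in step 9.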

Related Information

Product: Platform RTM (SSVMSD); IBM Spectrum LSF RTM (SSZT2D)
Component: Database
Platform: Platform Independent
Version: 9.1.0; 9.1.2; 9.1.3; 9.1.4

Document Information

Modified date:
30 August 2019

UID

isg3T1023543