Troubleshooting
Problem
Recover MySQL and RTM
Symptom
The RTM DB is corrupted beyond recovery and all other repair attempts have failed. The quick steps below bring RTM back up and running from the last known good backup.
Resolving The Problem
The sequence of steps is: bring up the MySQL DB, restore data from backup, import the job data missing since the outage, and replay daily stats.
1) Shut down services
service lsfpollerd stop
service licpollerd stop
service advocate stop
service crond stop
service httpd stop
service mysqld stop
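The next step deletes InnoDB files from the MySQL data directory. As a precaution, the directory could be copied aside first, once mysqld is stopped. This is a minimal sketch, not part of the original procedure: the function name snapshot_datadir and the default destination name are hypothetical, and it assumes there is enough free disk space for a full copy.

```shell
# Copy a data directory to a new location so the old files can be inspected later.
# Usage: snapshot_datadir <source_dir> [destination_dir]
# Prints the destination path on success.
snapshot_datadir() {
    src="$1"
    # Default destination (hypothetical naming): <source>.bak.<timestamp>
    dest="${2:-${src}.bak.$(date +%Y%m%d%H%M%S)}"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    # -a preserves ownership, permissions, and timestamps.
    cp -a "$src" "$dest" && echo "$dest"
}

# Example invocation (run as root, with mysqld stopped):
# snapshot_datadir /var/lib/mysql
```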
2) Bring up MySQL
rm -f /var/lib/mysql/ibdata*
rm -f /var/lib/mysql/ib_logfile*
rm -rf /var/lib/mysql/cacti
service mysqld start
- Check that there are no errors in /var/lib/mysql/<hostname>.err
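The error log check can be sketched as a small helper. This assumes the log marks errors with the literal tag [ERROR], which is the usual MySQL convention, and that the file name matches the output of hostname. The function name check_mysql_errlog is hypothetical. The log may also contain errors from before the restart, so only entries after the latest startup are relevant.

```shell
# Print lines flagged as errors in a MySQL error log.
# Return status: 0 if no [ERROR] lines were found, 1 if some were found,
# 2 if the log file cannot be read.
check_mysql_errlog() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # -n prefixes each match with its line number in the log.
    if grep -n '\[ERROR\]' "$log_file"; then
        return 1
    fi
    return 0
}

# Example invocation:
# check_mysql_errlog "/var/lib/mysql/$(hostname).err"
```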
3) Create cacti database
mysql> create database cacti;
4) Give permissions on all databases to DB user cacti on localhost
mysql> GRANT USAGE ON *.* TO 'cacti'@'localhost' IDENTIFIED BY PASSWORD '*4ACFE3202A5FF5CF467898FC58AAB1D615029441';
mysql> GRANT ALL PRIVILEGES ON `cacti`.* TO 'cacti'@'localhost';
mysql> GRANT SELECT ON `mysql`.`time_zone_name` TO 'cacti'@'localhost';
5) Import cacti data
mkdir /tmp/bak
cp /opt/IBM/cacti/backup/cacti_db_backup_95.tgz /tmp/bak # use the logs to decide which backup is the last known good one
cd /tmp/bak
tar -zxvf cacti_db_backup_95.tgz
cd cacti_backup
mysql cacti < cacti_db_struct_backup.sql
mysql cacti < cacti_db_backup.sql
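When choosing which backup to restore, it may help to list the available archives newest first and confirm that each one is readable before copying it. This is a minimal sketch: the function name list_backups is hypothetical, the cacti_db_backup_*.tgz pattern is inferred from the single file name used above, and paths are assumed to contain no spaces. A readable archive is not necessarily a good backup, so the logs still decide which one to use.

```shell
# List cacti backup archives newest first and report whether tar can read each one.
# Usage: list_backups <backup_dir>
list_backups() {
    backup_dir="$1"
    # ls -t sorts by modification time, newest first.
    for archive in $(ls -t "$backup_dir"/cacti_db_backup_*.tgz 2>/dev/null); do
        # tar -tzf lists the contents without extracting; it fails on a damaged archive.
        if tar -tzf "$archive" >/dev/null 2>&1; then
            echo "OK      $archive"
        else
            echo "CORRUPT $archive"
        fi
    done
}

# Example invocation:
# list_backups /opt/IBM/cacti/backup
```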
6) Import partitioned data if partitioning is enabled and the partition backup option was chosen. If there are tar/zip files in the /opt/IBM/cacti/backup/partition_backups/ directory, run the following as root:
cd ${RTM_TOP}/cacti/plugins/grid
php database_restore_partitions.php -f
7) Include ${RTM_TOP}/rtm/bin/ in the default library cache
ldconfig /opt/IBM/rtm/bin/
8) Start Services
service httpd start
service advocate start
service lsfpollerd start
service licpollerd start
service crond start
=> At this point, log on to the RTM web GUI and check that the cluster is up, graphs are being drawn, no further errors are shown in red in the cacti logs, and recently submitted jobs (running, finished, or pending) can be seen on the grid Jobs tab.
9) Import missing job data
Identify the cluster ID and the LSF version of the cluster, then run the commands below:
cd /opt/IBM/rtm/lsf<version>/bin
./gridacct -C 1 -B "2016:01:31,2016:02:26" -D <path_to_lsb.acct>
In the example above, the cluster ID is 1 (-C 1), and -B selects jobs submitted in the given range, formatted as YYYY:MM:DD[:HH:MM],YYYY:MM:DD[:HH:MM].
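The -B range uses colons between date fields, while the daily stats replay in the next step takes the same period with dashes (YYYY-MM-DD). To keep the two in sync, the colon form can be derived from the dash form. This is a small sketch covering only the date-only form without the optional :HH:MM part. The function name acct_range is hypothetical.

```shell
# Convert two YYYY-MM-DD dates into the "YYYY:MM:DD,YYYY:MM:DD" range form.
# Usage: acct_range <start_date> <end_date>
acct_range() {
    # Replace every dash with a colon in each date.
    start=$(printf '%s' "$1" | sed 's/-/:/g')
    end=$(printf '%s' "$2" | sed 's/-/:/g')
    printf '%s,%s\n' "$start" "$end"
}

acct_range 2016-01-31 2016-02-26   # prints 2016:01:31,2016:02:26
```

The result can then be passed on the command line, for example: ./gridacct -C 1 -B "$(acct_range 2016-01-31 2016-02-26)" -D <path_to_lsb.acct>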
10) Rebuild daily stats
cd /opt/IBM/rtm/cacti/plugins/grid
php database_replay_daily_stats.php --start='2016-01-31' --end='2016-02-26'
In the example above, daily stats for the repopulated job period are replayed.
Document Information
Modified date:
30 August 2019
UID
isg3T1023543