Update IBM Spectrum LSF Suite for
Enterprise on a basic or large installation. IBM Spectrum LSF Suite for
Enterprise is locally installed, but with a shared configuration directory. This means that
the binary files are local, but the configuration files are on a shared file system.
Before you begin
IBM Spectrum LSF Suite for
Enterprise must be on a basic or large installation. For more details, refer to Determining the cluster configuration.
Choose a time when your cluster is not running any jobs, such as during a scheduled
maintenance window, or quiesce the cluster to minimize the number of running jobs while you are
updating it.
Important: Remote databases are not updated automatically. If your system uses a
remote database, you must update it manually.
About this task
Because the management hosts use local binary installations, you can use a secondary management host to evaluate the
Fix Pack without affecting the live cluster.
Procedure
-
Download the Fix Pack from IBM Fix Central.
-
Log in to the deployment host and back up the deployment files.
For local installations, the existing IBM Spectrum LSF Suite for
Enterprise installation RPM files in the deployment are necessary in case there are
problems with the Fix Pack and you need to roll back to the previous version.
For example, back up the contents of the /var/www/html/lsf_suite_pkgs
directory:
cd /var/www/html
tar zcvf lsf-rpm-backup.tgz lsf_suite_pkgs/
-
Log in to the primary LSF
management host and
back up the contents of the LSF
conf directory.
cd /opt/ibm/lsfsuite/lsf/
tar zcvf lsf-conf-backup.tgz conf/
-
Log in to the database host and back up the LSF
database.
If you are updating a basic installation, the database host is the secondary LSF
management host.
For example, if you are using a MySQL database, use the mysqldump command to
back up the database.
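A minimal sketch of such a backup, assuming the LSF Suite reporting database is named lsf_db and is accessible to the root database user (substitute your actual database name, user, and output location):
mysqldump -u root -p lsf_db > /tmp/lsf-db-backup.sql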
- Back up Elasticsearch.
As of version 10.2 Fix Pack 10, Elasticsearch, Logstash, and Kibana (ELK) are no longer
bundled with the installation package. Customers who want to use a newer version, or who need
specific Elasticsearch, Kibana, or Logstash features, must download and install them separately.
Otherwise, customers can continue to use the ELK packages that were installed with 10.2 Fix Pack 9.
Note: The supported ELK version for version 10.2 Fix Pack 10 is 7.2.x or higher (but less than
version 8). IBM Spectrum LSF Suite for
Enterprise 10.2 Fix Pack 10 was fully tested on ELK 7.2.1.
See Installing Elasticsearch, Kibana, and Logstash for instructions on installing an external version
of Elasticsearch and configuration requirements for upgrading from a previous version of IBM Spectrum LSF Suite for
Enterprise using a bundled version of Elasticsearch.
Updating Elasticsearch re-indexes the current indices, so it is strongly recommended that you
back up your data before proceeding. On configurations with multiple Elasticsearch nodes, the
backup directory must be mounted on each node using NFS; creating the snapshot writes the backup
from each node to that NFS directory. Refer to
https://www.elastic.co/guide/en/elasticsearch/reference/6.6/modules-snapshots.html for more details.
Note: The default ES_PORT is 9200.
- Log in to every GUI_Role machine as root.
- Configure the Elasticsearch snapshot repository.
- If there is only one GUI_Role machine, put the snapshot repository on a local disk.
- Create the directory /opt/ibm/elastic/elasticsearch_repo with write and
execute permission for lsfadmin.
- In /opt/ibm/elastic/elasticsearch/config/elasticsearch.yml, set the path.repo
parameter to /opt/ibm/elastic/elasticsearch_repo (see the sketch after this list).
- If there are multiple GUI_Role machines, the snapshot repository MUST be on a shared file system
(NFS) that all GUI_Role machines can access.
- On each GUI_Role machine, define the same shared location.
Create a directory
[share_dir]/elasticsearch_repo with write and execute permission for
lsfadmin. For example: /mnt/elasticsearch_repo
- In /opt/ibm/elastic/elasticsearch/config/elasticsearch.yml, set the path.repo
parameter to the shared directory, for example /mnt/elasticsearch_repo.
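For example, on a single GUI_Role machine, the repository setup might look like the following minimal sketch (the ownership and permission commands are one way to grant lsfadmin write and execute access; for the multiple-machine case, use the shared directory such as /mnt/elasticsearch_repo instead):
mkdir -p /opt/ibm/elastic/elasticsearch_repo
chown lsfadmin /opt/ibm/elastic/elasticsearch_repo
chmod u+rwx /opt/ibm/elastic/elasticsearch_repo
echo 'path.repo: ["/opt/ibm/elastic/elasticsearch_repo"]' >> /opt/ibm/elastic/elasticsearch/config/elasticsearch.yml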
- Restart Elasticsearch to make the above changes take effect on each GUI_ROLE machine:
systemctl restart elasticsearch-for-lsf.service
- Stop the following services on each GUI_Host machine:
perfadmin stop all
pmcadmin stop
systemctl stop logstash-for-lsf.service
systemctl stop metricbeat-for-lsf.service
systemctl stop filebeat-for-lsf.service
- Log in to a GUI_Role machine.
- Create the snapshot repository named es_backup in Elasticsearch. At a command
prompt, enter the following command, where es_backup_location is the repository
directory that you created in the previous steps:
curl -XPUT "[GUI_ROLE machine IP]:ES_PORT/_snapshot/es_backup" -H 'Content-Type: application/json' -d '{"type": "fs","settings": {"location": "es_backup_location","include_global_state": true,"compress": true}}'
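For example, on a GUI_Role machine with the hypothetical IP address 10.1.1.10, the default port 9200, and the local repository directory created earlier, the command would be:
curl -XPUT "10.1.1.10:9200/_snapshot/es_backup" -H 'Content-Type: application/json' -d '{"type": "fs","settings": {"location": "/opt/ibm/elastic/elasticsearch_repo","include_global_state": true,"compress": true}}'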
- Create a snapshot named data_backup in the es_backup repository:
curl -XPOST [GUI_ROLE machine IP]:ES_PORT/_snapshot/es_backup/data_backup?wait_for_completion=true -H 'Content-Type: application/json' -d '{ "indices": "lsf*,mo*,ibm*", "ignore_unavailable": true, "include_global_state": false }'
- Check the status of the
snapshot:
curl -XGET [GUI_ROLE machine IP]:ES_PORT/_snapshot/es_backup/data_backup?pretty
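The output is JSON; the snapshot is complete when its state field reports SUCCESS. A quick check might look like this (same host and port placeholders as above):
curl -XGET [GUI_ROLE machine IP]:ES_PORT/_snapshot/es_backup/data_backup?pretty | grep '"state"'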
- Restart the services on each GUI_Host machine:
perfadmin start all
pmcadmin start
systemctl start logstash-for-lsf.service
systemctl start metricbeat-for-lsf.service
systemctl start filebeat-for-lsf.service
-
If your cluster does not have a secondary management host, promote one
LSF server host to become a secondary LSF
management host.
Using a secondary management host allows you to evaluate the Fix Pack without affecting the
primary management host or the live cluster. If your cluster has only one management host and no
secondary management host, you can promote a single LSF server host to act as the secondary LSF
management host.
-
Log in to the deployment host.
-
Navigate to the /opt/ibm/lsf_installer/playbook directory.
cd /opt/ibm/lsf_installer/playbook
-
Edit the lsf-inventory file.
Move one machine from the LSF_Servers role to the
LSF_Masters role, as shown in the example below.
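For example, assuming hostA is the existing management host and hostB is the server host being promoted (both names are placeholders), the relevant inventory sections might look like this:
[LSF_Masters]
hostA
hostB

[LSF_Servers]
hostC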
-
Test the new configuration by running the lsf-predeploy-test.yml
playbook.
ansible-playbook -i lsf-inventory lsf-predeploy-test.yml
Note: Do not use the --limit option because you are rebuilding configuration files
for the entire cluster.
-
If the test is successful, deploy the new configuration by running the
lsf-deploy.yml playbook.
ansible-playbook -i lsf-inventory lsf-deploy.yml
Note: Do not use the --limit option because you are rebuilding configuration files
for the entire cluster.
-
Apply the Fix Pack to the secondary LSF management host without affecting the primary LSF
management host.
From the Fix Pack downloaded in step 1, run the suite_fix.bin or
suite_fixpack.bin file on the deployment host.
From the /opt/ibm/lsf_installer/playbook directory, run the installation
with the lsf-upgrade.yml playbook with the --limit option to
apply the Fix Pack to just the secondary LSF
management host.
ansible-playbook -i lsf-inventory --limit {Secondary_Host} lsf-upgrade.yml
For example:
ansible-playbook -i lsf-inventory --limit hostB lsf-upgrade.yml
-
Switch to the secondary LSF
management host to
evaluate the Fix Pack on the cluster.
-
Edit the /opt/ibm/lsfsuite/lsf/conf/lsf.conf file and switch the primary
and secondary management hosts.
Navigate to the LSF_MASTER_LIST parameter and switch the order of the
primary and secondary hosts:
LSF_MASTER_LIST="{Secondary_Host} {Primary_Host}"
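For example, if hostA is the primary management host (a placeholder name) and hostB is the secondary host used in the earlier --limit example, the line would read as follows; the same reordering applies to EGO_MASTER_LIST in the next step:
LSF_MASTER_LIST="hostB hostA"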
-
Edit the
/opt/ibm/lsfsuite/lsf/conf/ego/cluster_name/ego.conf file
and switch the primary and secondary management hosts.
Navigate to the EGO_MASTER_LIST parameter and switch the order of the
primary and secondary hosts:
EGO_MASTER_LIST="{Secondary_Host} {Primary_Host}"
-
Restart the cluster to apply your changes.
Run the lsadmin reconfig command, wait until the command is complete, then run
the badmin mbdrestart command.
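That is, from the management host:
lsadmin reconfig
badmin mbdrestart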
-
Run the lsid command to verify that the secondary host is functioning as the
LSF
management host.
-
Run some commands to verify the update.
-
Run the lsid command to see your cluster name and management host name.
-
Run the lshosts command to see the LSF
management host. The
LSF server hosts and client hosts are also listed.
-
Run the bhosts command to check that the status of each host is
ok, and the cluster is ready to accept work.
-
Test the cluster to evaluate the Fix Pack.
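For example, one simple smoke test is to submit a short job and confirm that it is dispatched and finishes (the job name and sleep duration here are arbitrary):
bsub -J fixpack_test sleep 60
bjobs -J fixpack_test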
-
If the Fix Pack is working correctly, apply the Fix Pack to the rest of the cluster.
Switch back to the primary LSF
management host and
apply the Fix Pack to the rest of the cluster.
-
Edit the /opt/ibm/lsfsuite/lsf/conf/lsf.conf file and switch back to the
primary management
host.
Navigate to the LSF_MASTER_LIST parameter and switch the order of the
primary and secondary hosts:
LSF_MASTER_LIST="{Primary_Host} {Secondary_Host}"
-
Edit the
/opt/ibm/lsfsuite/lsf/conf/ego/cluster_name/ego.conf file
and switch back to the primary management host.
Navigate to the EGO_MASTER_LIST parameter and switch the order of the
primary and secondary hosts:
EGO_MASTER_LIST="{Primary_Host} {Secondary_Host}"
-
If you want to revert the secondary LSF
management host back
to LSF server
host, edit the /opt/ibm/lsf_installer/playbook/lsf-inventory file.
Move the secondary management host from the
LSF_Masters role to the LSF_Servers role.
-
Run the installation with the lsf-upgrade.yml playbook to deploy the Fix
Pack to the rest of your cluster.
ansible-playbook -i lsf-inventory lsf-upgrade.yml
This playbook shuts down the LSF
daemons, updates and rebuilds the contents of the shared configuration directory, then restarts the
LSF daemons.
Troubleshooting: If the Fix Pack is not working correctly, contact IBM Support for
assistance or revert your cluster to its prior state.
To revert your cluster to its prior state, switch back to the original primary LSF
management host and
revert the files using the backups.
- Edit the /opt/ibm/lsfsuite/lsf/conf/lsf.conf file and switch back to the
primary management
host.
Navigate to the LSF_MASTER_LIST parameter and switch the order of the
primary and secondary
hosts:
LSF_MASTER_LIST="{Primary_Host} {Secondary_Host}"
- Edit the
/opt/ibm/lsfsuite/lsf/conf/ego/cluster_name/ego.conf file
and switch back to the primary management host.
Navigate to the EGO_MASTER_LIST parameter and switch the order of
the primary and secondary
hosts:
EGO_MASTER_LIST="{Primary_Host} {Secondary_Host}"
- Revert the contents of the deployer (that is, the YUM repository) in the
/var/www/html/lsf_suite_pkgs directory.
You can either use a backup or an
older .bin file to restore the repository contents.
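For example, to restore from the backup archive that was created in step 2 (assuming it was saved as lsf-rpm-backup.tgz under /var/www/html):
cd /var/www/html
tar zxvf lsf-rpm-backup.tgz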
- Close the secondary LSF
management
host.
badmin hclose {Secondary_Host}
- Wait for any running jobs to finish on the secondary LSF
management host.
Run
the bhosts command to see if there are any running jobs on the secondary LSF
management
host.
bhosts {Secondary_Host}
You can proceed when the number of jobs
(NJOBS) is 0.
- Remove the LSF
packages from the secondary LSF
management
host.
yum remove lsf*
yum remove *lsf
yum clean all
- Reinstall the older packages to the entire
cluster.
ansible-playbook -i lsf-inventory lsf-deploy.yml
Troubleshooting: Restoring backup Elasticsearch data
To restore backed up Elasticsearch data, perform the following steps:
- Stop services on each GUI_Host machine:
perfadmin stop all
pmcadmin stop
systemctl stop logstash-for-lsf.service
systemctl stop metricbeat-for-lsf.service
systemctl stop filebeat-for-lsf.service
- To restore an index, first delete the existing index that you want to restore, and then
restore it from the snapshot:
curl -XDELETE [GUI_ROLE machine IP]:ES_PORT/[index_name]
curl -X POST "[GUI_ROLE machine IP]:ES_PORT/_snapshot/es_backup/data_backup/_restore" -H 'Content-Type: application/json' -d' { "indices": "index_name*", "ignore_unavailable": true, "include_global_state": true }'
For example, to restore the lsf_events* indices:
curl -XDELETE [GUI_ROLE machine IP]:ES_PORT/lsf_events*
curl -X POST "[GUI_ROLE machine IP]:ES_PORT/_snapshot/es_backup/data_backup/_restore" -H 'Content-Type: application/json' -d' { "indices": "lsf_events*", "ignore_unavailable": true, "include_global_state": true }'
- Restart the services on each GUI_Host machine:
perfadmin start all
pmcadmin start
systemctl start logstash-for-lsf.service
systemctl start metricbeat-for-lsf.service
systemctl start filebeat-for-lsf.service
- Clear browser data before logging in.