IBM Support

Stopping and Starting the HA Cluster - IBM PureData Systems for Analytics

Troubleshooting


Problem

This document describes the steps to stop the HA cluster for maintenance work and to start it again afterward.

Environment

N100x, N200x

Resolving The Problem

Use the steps below to shut down the cluster (HA) services:

Record the host and IMM IP addresses from the /etc/hosts file.

Open two PuTTY sessions, one directly to each host's IP address, and log in as root. Do not log in as nz and then su to root.

Verify the state of the cluster.
As root run:
# crm_mon -1
Check that nothing is reported as Stopped, OFFLINE, or Error.
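A minimal sketch of the crm_mon check, assuming bash on the host. It filters `crm_mon -1` output for the three problem keywords named above and prints any matching lines. The function name `cluster_has_problems` is hypothetical, and the keyword list only mirrors this document's check; it is not an exhaustive health test.

```shell
#!/bin/bash
# Hypothetical helper: reads `crm_mon -1` output on stdin and prints every
# line containing Stopped, OFFLINE, or Error.
# Returns 0 if at least one problem line was found, 1 otherwise.
cluster_has_problems() {
    grep -E 'Stopped|OFFLINE|Error'
}

# Usage on a live host, as root:
#   if crm_mon -1 | cluster_has_problems; then
#       echo "Resolve the lines above before continuing" >&2
#   fi
```

Reading from stdin keeps the check usable against saved crm_mon output as well as a live pipe.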

As root, run on both hosts:
# service drbd status
# service heartbeat status
The host whose drbd status shows /export/home and /nz mounted is the primary node; the other host is the passive node.
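The primary-node check can also be expressed as a small filter. This sketch assumes the tabular `service drbd status` output of DRBD 8.x, where a mount point appears in the `mounted` column only on the node that has the device mounted. The function name `is_drbd_primary` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `service drbd status` output on stdin.
# Returns 0 only when both /export/home and /nz appear as whole fields
# (so /nzscratch does not count as /nz), i.e. this host has them mounted.
is_drbd_primary() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "/export/home") home = 1
                if ($i == "/nz") nz = 1
            }
        }
        END { exit !(home && nz) }'
}

# Usage on each host, as root:
#   if service drbd status | is_drbd_primary; then
#       echo "$(hostname) is the primary node"
#   fi
```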


As the nz user on the primary node, run the following:

$ nzstop

$ /nz/kit/bin/nzhostbackup /nzscratch/20141115.bkp

When the backup completes, use scp to copy the .bkp file to /nzscratch/ on the other host. (20141115 in the file name is an example date stamp.)
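The nz-user steps above can be sketched as a single function. This is an illustration under assumptions: `backup_host_config` and `run` are hypothetical helpers, the first argument is a placeholder for the passive host's name or IP address, and the date-stamped file name simply follows the pattern of the example. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/bash
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper, run as nz on the primary node.
#   $1 = other (passive) host name or IP address
#   $2 = optional date stamp; defaults to today's date as YYYYMMDD
backup_host_config() {
    local other_host=$1
    local stamp=${2:-$(date +%Y%m%d)}
    local bkp="/nzscratch/${stamp}.bkp"

    run nzstop || return 1
    run /nz/kit/bin/nzhostbackup "$bkp" || return 1
    # Copy the finished backup to the same directory on the other host.
    run scp "$bkp" "${other_host}:/nzscratch/"
}

# Example: preview the three commands without running them.
#   DRY_RUN=1 backup_host_config ha2
```

Each step stops the sequence on failure, so the backup is never copied if nzstop or nzhostbackup did not succeed.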

Run the remaining steps as root:

Stop heartbeat on the passive node first:
# service heartbeat stop
Then run the same command on the primary node.

Stop drbd on the passive node first:
# service drbd stop
Then run the same command on the primary node.
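The stop order above (heartbeat before drbd, and the passive node before the primary for each service) can be written out as a checklist generator. `stop_sequence` is a hypothetical helper and its two arguments are placeholder host names; it only prints the commands, which you still run as root in the PuTTY session for each host.

```shell
#!/bin/bash
# Hypothetical helper: prints the stop commands in the required order.
#   $1 = passive host, $2 = primary host
stop_sequence() {
    local passive=$1 primary=$2
    local svc host
    # heartbeat must be stopped on both nodes before drbd is stopped.
    for svc in heartbeat drbd; do
        # For each service, the passive node always goes first.
        for host in "$passive" "$primary"; do
            echo "${host}: service ${svc} stop"
        done
    done
}

# Example:
#   stop_sequence ha2 ha1
```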

Disable both services so that they do not start automatically after a reboot. Run on both hosts:
# chkconfig heartbeat off
# chkconfig drbd off


On both hosts, verify that both services are set to off:
# chkconfig --list heartbeat
# chkconfig --list drbd
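To avoid reading each runlevel column by eye, a small filter can confirm that runlevels 2 through 5 all show the expected state. This assumes the usual `name 0:off 1:off 2:off ...` layout of `chkconfig --list` output. `runlevels_are` is a hypothetical helper; it also accepts `on`, which matches what the start procedure expects.

```shell
#!/bin/bash
# Hypothetical helper: reads `chkconfig --list <service>` output on stdin.
#   $1 = expected state, "on" or "off"
# Returns 0 when runlevels 2-5 are all in that state, 1 otherwise
# (including when no runlevel columns are found at all).
runlevels_are() {
    local want=$1
    awk -v want="$want" '
        {
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                level = kv[1] + 0
                if (level >= 2 && level <= 5) {
                    seen++
                    if (kv[2] != want) bad = 1
                }
            }
        }
        END { exit (bad || seen == 0) }'
}

# Usage on each host, as root:
#   chkconfig --list heartbeat | runlevels_are off && echo "heartbeat is off"
#   chkconfig --list drbd | runlevels_are off && echo "drbd is off"
```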


On both hosts, confirm that the services are stopped and DRBD is not loaded:
# service drbd status
# service heartbeat status
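As an additional check on the DRBD side, this sketch assumes DRBD 8.x, which exposes `/proc/drbd` only while its kernel module is loaded, and whose init script unloads the module on `stop`. `drbd_unloaded` is a hypothetical helper; it does not cover heartbeat, so still read the `service heartbeat status` output.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the DRBD kernel module is not loaded.
#   $1 = optional path to check (defaults to /proc/drbd; override for testing)
drbd_unloaded() {
    local proc_file=${1:-/proc/drbd}
    if [ -e "$proc_file" ]; then
        # Module still loaded: show its status so you can see what is active.
        echo "DRBD still loaded:" >&2
        cat "$proc_file" >&2
        return 1
    fi
    echo "DRBD module not loaded"
}

# Usage on each host, as root:
#   drbd_unloaded
```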


At this point, issue the shutdown or reboot command.

Use the steps below to start the cluster:

Run all of the following commands as root.
Start DRBD on both nodes:
In the PuTTY window for HA1, run:
# service drbd start
In the separate PuTTY window for HA2, run:
# service drbd start

Start heartbeat on both nodes: first on the node you want to be primary, then on the other node.
# service heartbeat start
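The start order above can be captured as a checklist generator: DRBD on both nodes first, then heartbeat on the intended primary before the other node. `start_sequence` is a hypothetical helper and its arguments are placeholder host names; it only prints commands for you to run in the matching PuTTY session. Using separate sessions matters because the DRBD init script may wait for its peer to connect before returning.

```shell
#!/bin/bash
# Hypothetical helper: prints the start commands in the required order.
#   $1 = node you want to be primary, $2 = the other node
start_sequence() {
    local primary=$1 other=$2
    # DRBD must be running on both nodes before heartbeat starts.
    echo "${primary}: service drbd start"
    echo "${other}: service drbd start"
    # heartbeat starts on the intended primary first.
    echo "${primary}: service heartbeat start"
    echo "${other}: service heartbeat start"
}

# Example:
#   start_sequence ha1 ha2
```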

On both hosts, re-enable the services so that they start automatically after a reboot:
# chkconfig heartbeat on
# chkconfig drbd on


On both nodes, verify drbd and heartbeat:
# service drbd status
# service heartbeat status
# chkconfig --list drbd
# chkconfig --list heartbeat
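When reading `service drbd status` after startup, each resource line should normally show the connection state `Connected` and the disk states `UpToDate/UpToDate`. This sketch assumes the DRBD 8.x column order (`m:res cs ro ds ...`). `drbd_healthy` is a hypothetical helper, and a resource that is still resynchronizing is reported as not healthy until the sync finishes.

```shell
#!/bin/bash
# Hypothetical helper: reads `service drbd status` output on stdin.
# Resource lines start with "<minor>:<resource>", e.g. "0:r0".
# Returns 0 when every resource line is Connected and UpToDate/UpToDate,
# 1 otherwise (including when no resource lines are found).
drbd_healthy() {
    awk '
        $1 ~ /^[0-9]+:/ {
            seen++
            # Column 2 is the connection state, column 4 the disk states.
            if ($2 != "Connected" || $4 != "UpToDate/UpToDate") bad = 1
        }
        END { exit (bad || seen == 0) }'
}

# Usage on each node, as root:
#   service drbd status | drbd_healthy && echo "DRBD resources look healthy"
```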


As root, watch the cluster and verify that all cluster services have started (-i2 refreshes the display every 2 seconds; press Ctrl-C to exit):
# crm_mon -i2

Once all the services have started, switch to the nz user on the primary node and run the following command to watch the SPUs boot:
$ watch nzhw -type spu

Once the SPUs have booted, press Ctrl-C and check that nzstate reports Online:

$ nzstate
Online
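Instead of watching by eye, the final check can be scripted as a polling loop around `nzstate`, run as nz on the primary node. This is a sketch under assumptions: it treats any output containing `Online` as success (the output shown above is just `Online`, but your release may print a longer sentence), `wait_for_online` is a hypothetical helper, and the default attempt count and delay are arbitrary, not IBM guidance.

```shell
#!/bin/bash
# Hypothetical helper: polls nzstate until it reports Online.
#   $1 = maximum number of checks (default 60)
#   $2 = seconds to wait between checks (default 10)
wait_for_online() {
    local attempts=${1:-60} delay=${2:-10}
    local i
    for ((i = 1; i <= attempts; i++)); do
        if nzstate 2>/dev/null | grep -q 'Online'; then
            echo "System is Online"
            return 0
        fi
        # Do not sleep after the final failed check.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    echo "System not Online after $attempts checks" >&2
    return 1
}

# Example: check every 10 seconds for up to 10 minutes.
#   wait_for_online 60 10
```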


Document Information

Modified date:
17 October 2019

UID

swg21690487