How to move SAP HANA database resources back to the correct node after a failover

How To


Summary

In a scale-up SLES for SAP cost-optimized configuration, one or more non-production HANA databases run on the secondary node. If there is a problem with the production database resource on the primary node, it fails over to the secondary node, and the non-production database resources are stopped until the production database can be failed back to the primary node.

Objective

The aim is to return to normal operation, with all the resources running in the usual location.

Environment

  • SLES for SAP cost-optimized configuration
  • Production HANA database resource runs on both nodes for high availability
  • One or more non-production HANA databases run on the secondary node

Steps


  1. Check that both cluster nodes are online and that the resources have failed over. You can run the following command as root on either node; here is an example.
    # crm_mon -A -1
    Stack: corosync
    Current DC: saphana-node4-kvm (version 1.1.19+20181105.ccd6b5b10-3.16.1-1.1.19+20181105.ccd6b5b10) - partition with quorum
    Last updated: Thu Jul  2 08:05:59 2020
    Last change: Thu Jul  2 08:05:27 2020 by root via crm_attribute on saphana-node4-kvm
    
    2 nodes configured
    7 resources configured
    
    Online: [ saphana-node3-kvm saphana-node4-kvm ]
    
    Active resources:
    
     rsc_ip_HA1_HDB10	(ocf::heartbeat:IPaddr2):	Started saphana-node4-kvm
     Master/Slave Set: msl_SAPHana_HA1_HDB10 [rsc_SAPHana_HA1_HDB10]
         Masters: [ saphana-node4-kvm ]
         Slaves: [ saphana-node3-kvm ]
     Clone Set: cln_SAPHanaTopology_HA1_HDB10 [rsc_SAPHanaTopology_HA1_HDB10]
         Started: [ saphana-node3-kvm saphana-node4-kvm ]
     stonith-sbd	(stonith:external/sbd):	Started saphana-node3-kvm
    
    Node Attributes:
    * Node saphana-node3-kvm:
        + hana_ha1_clone_state            	: DEMOTED   
        + hana_ha1_op_mode                	: logreplay 
        + hana_ha1_remoteHost             	: saphana-node4-kvm
        + hana_ha1_roles                  	: 4:S:master1:master:worker:master
        + hana_ha1_site                   	: SITEA     
        + hana_ha1_srmode                 	: syncmem   
        + hana_ha1_sync_state             	: SOK       
        + hana_ha1_version                	: 2.00.040.00.1553674765
        + hana_ha1_vhost                  	: saphana-node3-kvm
        + lpa_ha1_lpt                     	: 30        
        + maintenance                     	: off       
        + master-rsc_SAPHana_HA1_HDB10    	: 100       
    * Node saphana-node4-kvm:
        + hana_ha1_clone_state            	: PROMOTED  
        + hana_ha1_op_mode                	: logreplay 
        + hana_ha1_remoteHost             	: saphana-node3-kvm
        + hana_ha1_roles                  	: 4:P:master1:master:worker:master
        + hana_ha1_site                   	: SITEB     
        + hana_ha1_srmode                 	: syncmem   
        + hana_ha1_sync_state             	: PRIM      
        + hana_ha1_version                	: 2.00.040.00.1553674765
        + hana_ha1_vhost                  	: saphana-node4-kvm
        + lpa_ha1_lpt                     	: 1593691527
        + master-rsc_SAPHana_HA1_HDB10    	: 150
    In this case, node3 (saphana-node3-kvm) is the primary node and node4 (saphana-node4-kvm) is the secondary.
    Both nodes are online, but the "master" database resource is running on the secondary node, and the "slave" resource is running on the primary.

    Check the hana_<sid>_sync_state attribute for each node. In this example, it is now PRIM on the failover node (node4) and SOK on the primary node (node3). A short way to filter for this attribute is shown below.
    Note: If the sync state on the intended primary node is not SOK, fail back is not possible.
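    If you only want to see the sync state, you can filter the node attributes. The command and output below are a convenience only and assume the host names and SID (HA1) used in this example.
    # crm_mon -A -1 | grep -E 'Node saphana|sync_state'
    * Node saphana-node3-kvm:
        + hana_ha1_sync_state             	: SOK
    * Node saphana-node4-kvm:
        + hana_ha1_sync_state             	: PRIM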
  2. Check that AUTOMATED_REGISTER is set to "true" for the production database resource primitive, so that the cluster automatically registers the former primary as the new secondary after a takeover. Run this command as root on either node and look for AUTOMATED_REGISTER in the output; an illustrative excerpt follows the command.
    /usr/sbin/crm configure show
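    For example, you can filter the configuration output for the parameter:
    # /usr/sbin/crm configure show | grep -i automated_register
    In the full output, the production database primitive looks something like the illustrative excerpt below. The resource name, SID, and instance number follow the example cluster in step 1 and will differ in your configuration; operation definitions and other attributes are omitted here.
    primitive rsc_SAPHana_HA1_HDB10 ocf:suse:SAPHana \
            params SID=HA1 InstanceNumber=10 PREFER_SITE_TAKEOVER=true \
            DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=true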
  3. If the sync state is SOK on the intended primary node, put the failover node into standby mode. You can run this command on either cluster node; the -w option waits for the resources to move before the command finishes. An illustrative status excerpt follows the command.
    crm -w node standby saphana-node4-kvm
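    While the node is in standby, a status check such as crm_mon -1 shows the node marked as standby and the master promoted on the primary node. The excerpt below is illustrative only, uses the host and resource names from this example, and omits the rest of the output.
    Node saphana-node4-kvm: standby
    Online: [ saphana-node3-kvm ]
     Master/Slave Set: msl_SAPHana_HA1_HDB10 [rsc_SAPHana_HA1_HDB10]
         Masters: [ saphana-node3-kvm ]
         Stopped: [ saphana-node4-kvm ]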
  4. Next, run this command as root on either node to monitor the resources and watch the production database resource fail back to the primary node.
    crm_mon -r -n
  5. Check the sync state (sr_state) once more to confirm that it now says PRIM on the primary node and SOK on the secondary; one way to check it is shown below.
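    One way to check it is to reuse the filter from step 1; the host names and SID (HA1) follow this example, and the values should now be reversed compared with step 1, that is, PRIM on saphana-node3-kvm and SOK on saphana-node4-kvm.
    # crm_mon -A -1 | grep -E 'Node saphana|sync_state'
    If you also want to confirm the replication role from the HANA side, you can query the production instance as its <sid>adm user (assumed here to be ha1adm):
    # su - ha1adm -c "hdbnsutil -sr_state"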
  6. When that is done, you can take the secondary node out of standby.
    crm node online saphana-node4-kvm
  7. Check the status of the resources once more.
    crm_mon -A -1
  8. We expect these results (an illustrative example follows this list):
    The production database master/slave (msl) resource is master on the primary node.
    The production database resource is slave on the secondary node.
    The primary node is PROMOTED and the secondary is DEMOTED.
    The sync state is PRIM on the primary node and SOK on the secondary.
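    For reference, after a successful fail back the relevant parts of the crm_mon -A -1 output look something like the illustrative excerpt below. Host names and the SID follow the example in step 1; scores, timestamps, and the remaining attributes are omitted.
     Master/Slave Set: msl_SAPHana_HA1_HDB10 [rsc_SAPHana_HA1_HDB10]
         Masters: [ saphana-node3-kvm ]
         Slaves: [ saphana-node4-kvm ]
    * Node saphana-node3-kvm:
        + hana_ha1_clone_state            	: PROMOTED
        + hana_ha1_roles                  	: 4:P:master1:master:worker:master
        + hana_ha1_sync_state             	: PRIM
    * Node saphana-node4-kvm:
        + hana_ha1_clone_state            	: DEMOTED
        + hana_ha1_roles                  	: 4:S:master1:master:worker:master
        + hana_ha1_sync_state             	: SOK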

Additional Information

Here are some general tips:
  • Read this page carefully.
  • Make sure you understand the concepts before you make changes to a production system.
  • Make one change at a time.
  • Allow time for each action to finish.
  • Monitor the cluster resources for some time to make sure there are no unintended consequences, before you continue with the next step.

Document Location

Worldwide


[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SGMV168","label":"SUSE Linux Enterprise Server"},"ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Version(s)","Line of Business":{"code":"LOB57","label":"Power"}}]

Document Information

More support for:
SUSE Linux Enterprise Server

Software version:
All Version(s)

Document number:
6243390

Modified date:
01 April 2021

UID

ibm16243390
