Cancelling Db2 fixpack/modpack update results in RSCT subsystems failing to start up causing the host to become unavailable

Troubleshooting


Problem

Cancelling an uncommitted online or offline Db2 fixpack/modpack update causes the RSCT subsystems to fail to start, which makes the host unavailable.

Symptom

1. While cancelling an uncommitted online/offline fixpack/modpack update to Db2 V11.1+ from higher releases of Db2 or to higher releases of Db2 V11.5.0.0 Special build from V11.5.0.0, the installFixPack command fails with the following error:

Task #6 start 
Description: Exiting cluster management software out of maintenance mode 
Estimated time 6 second(s) 
Task #6 end 
 
rc = 5078 
For more information see the DB2 installation log at "/tmp/rollback2.log". 
DBI1264E  This program failed. Errors encountered during execution were 
     written to the installation log file. Program name: 
     installFixPack. Log file name: /tmp/rollback2.log. 

 
Explanation: 
 
This message is returned when some processes and operations have failed. 
Detailed information about the error was written to the log file. 
 
User response: 
 
Contact IBM support to get assistance in resolving this issue. Keep the 
log file intact as this file is an important reference for IBM support. 
 
 
Related information: 
Contacting IBM Software Support 

2. After a reboot of a host where an update to Db2 V11.1.4.5+ or Db2 V11.5.0.0+ was cancelled, Db2 does not start on that host.

Cause

The file “/usr/lib/systemd/system/srcmstr.service” is removed and not re-created when the update is cancelled.
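
The condition described in this section can be checked directly on a host. The following is a minimal sketch, assuming a POSIX shell; the function name unit_file_state is hypothetical and not part of Db2 or RSCT, and the default path is the one named in this document.

```shell
# Hypothetical helper: report whether a systemd unit file is present.
# With no argument it checks the srcmstr.service path named in this document.
unit_file_state() {
    unit="${1:-/usr/lib/systemd/system/srcmstr.service}"
    if [ -f "$unit" ]; then
        echo "present: $unit"
    else
        echo "missing: $unit"
        return 1
    fi
}
```

Calling unit_file_state with no argument on an affected host would be expected to report the file as missing and return a non-zero status.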

Environment

Intel Linux and Power Linux LE (RHEL 7+ or SLES 12+) 

Diagnosing The Problem

Diagnosis for Symptom #1: 

The installFixPack log file, “/tmp/rollback2.log”, shows that exiting cluster manager maintenance mode fails:

Exiting cluster management software out of maintenance mode task of a rolling update operation failed.
Execution of a rolling update task failed with an error.
Error message:
Unable to exit maintenance mode on the local host.
A diagnostic log has been saved to '/tmp/ibm.db2.cluster.OVsHSZ'.
Refer to db2diag.log for more details.
DBI1580E  The fix pack update operation failed because the cluster
      manager cannot exit maintenance mode.

The following errors are logged in “/tmp/ibm.db2.cluster.OVsHSZ” (note: the file name may vary):

2019-11-25-13.36.53.059335-300 I3353E419             LEVEL: Error
PID     : 2819                 TID : 139997070636928 PROC : db2cluster
INSTANCE:                      NODE : 000
HOSTNAME: myhost1
FUNCTION: DB2 UDB, high avail services, getLastErrMsg, probe:390
DATA #1 : signed integer, 4 bytes
DATA #2 : String, 84 bytes
Line # : 390--- 2610-602 A session could not be established with the RMC subsystem.

..

2019-11-25-13.36.53.060593-300 E4183E383             LEVEL: Error
PID     : 2819                 TID : 139997070636928 PROC : db2cluster
INSTANCE:                      NODE : 000
HOSTNAME: myhost1
FUNCTION: DB2 UDB, high avail services, sqlhaSetupHAInfrastructure, probe:80
RETCODE : ECF=0x90000534=-1879046860=ECF_SQLHA_OPEN_FAILED
          Error Opening Cluster Manager
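
The log checks for Symptom #1 can be sketched as a small search for the error signatures quoted above. This is only an illustration, assuming a POSIX shell with grep; the function name scan_cluster_log is hypothetical, and the three signatures are the ones that appear in this document, not an exhaustive list.

```shell
# Hypothetical helper: print which of the error signatures quoted in this
# document appear in a log file. Returns non-zero when none are found.
scan_cluster_log() {
    log="$1"
    found=1
    for sig in DBI1580E 2610-602 ECF_SQLHA_OPEN_FAILED; do
        # -F treats the signature as a fixed string, not a regular expression.
        if grep -qF -- "$sig" "$log"; then
            echo "found: $sig"
            found=0
        fi
    done
    return "$found"
}
```

It could be run against the installFixPack log, for example scan_cluster_log /tmp/rollback2.log, or against the diagnostic log whose name varies from run to run.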

Diagnosis for Symptom #2: 

After a reboot of a host where an update to Db2 V11.1.4.5+ or Db2 V11.5.0.0+ was cancelled, Db2 does not start on that host because the RSCT subsystems are offline. Run the following command to verify the state:

$ lsrpdomain 

/opt/rsct/bin/lsrsrc-api: 2612-022 A session could not be established with the RMC daemon on "local_node". 
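
The check for Symptom #2 can also be scripted. The sketch below assumes a POSIX shell with grep; the function name classify_rmc_state is hypothetical, and it recognizes only the 2612-022 message quoted above.

```shell
# Hypothetical helper: read lsrpdomain output on stdin and classify it.
# Only the 2612-022 message quoted in this document is recognized.
classify_rmc_state() {
    if grep -qF -- "2612-022"; then
        echo "RMC session failed: RSCT subsystems appear to be offline"
    else
        echo "no 2612-022 error seen"
    fi
}
```

A possible invocation is lsrpdomain 2>&1 | classify_rmc_state; the 2>&1 is an assumption that covers the case where the message is written to standard error.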

Resolving The Problem

Run the following commands as the root user, except where a step says otherwise:

1. Verify that “/usr/lib/systemd/system/srcmstr.service” is missing 

2. Run “srcmstrctrl -A” to recreate the srcmstr.service file 

$ /sbin/srcmstrctrl -A 
Adding /usr/lib/systemd/system/srcmstr.service for systemctl... 

3. Run “rmcctrl -z; rmcctrl -A” to restart RSCT subsystems 

$ rmcctrl -z; rmcctrl -A 
0513-071 The ctrmc Subsystem has been added. 
Adding /usr/lib/systemd/system/ctrmc.service for systemctl ... 
Created symlink from /etc/systemd/system/multi-user.target.wants/ctrmc.service to /usr/lib/systemd/system/ctrmc.service. 
Created symlink from /etc/systemd/system/network-online.target.wants/ctrmc-start-only.service to /usr/lib/systemd/system/ctrmc-start-only.service. 
0513-059 The ctrmc Subsystem has been started. Subsystem PID is 19435. 

4. Verify that “/usr/lib/systemd/system/srcmstr.service” exists 

$ ls -l /usr/lib/systemd/system/srcmstr.service
-rw-r--r-- 1 root root 416 Nov 25 22:08 /usr/lib/systemd/system/srcmstr.service 

5. Run “export DB2INSTANCE=<instance-name>;<sqllib-path>/bin/db2cluster -cm -exit -maintenance” to take the cluster manager out of maintenance mode

6. Run “export DB2INSTANCE=<instance-name>;<sqllib-path>/bin/db2cluster -cfs -exit -maintenance” to take the cluster file system out of maintenance mode

7. Re-run the installFixPack command that failed

8. As the instance owner, start the Db2 instance on the host:
db2start instance on <host-name>

9. As the instance owner, start the database manager for the member or the cluster caching facility (CF) on the host with the db2start command:
db2start member <member-id>
or
db2start cf <cf-id>

Note: These steps must be run on every host where the update was cancelled.
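
Steps 1 to 6 above can be collected into a script. The following is a sketch under stated assumptions, not a supported tool: the function names run and recover_rsct are hypothetical, the instance name and sqllib path must be supplied by the caller, and the optional third argument exists only so the unit file path can be overridden. By default it runs in dry-run mode and only prints the commands from this document; setting DRY_RUN=0 executes them and must be done as root.

```shell
# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for steps 1-6.
# $1 = instance name, $2 = sqllib path, $3 = unit file path (optional).
recover_rsct() {
    instance="$1"
    sqllib="$2"
    unit="${3:-/usr/lib/systemd/system/srcmstr.service}"

    # Steps 1-2: recreate srcmstr.service only when it is missing.
    if [ ! -f "$unit" ]; then
        run /sbin/srcmstrctrl -A || return 1
    fi
    # Step 3: restart the RSCT subsystems ("rmcctrl -z; rmcctrl -A").
    run rmcctrl -z
    run rmcctrl -A || return 1
    # Step 4: confirm that the unit file exists.
    run ls -l "$unit" || return 1
    # Steps 5-6: take the cluster manager and the cluster file system
    # out of maintenance mode.
    DB2INSTANCE="$instance"
    export DB2INSTANCE
    run "$sqllib/bin/db2cluster" -cm -exit -maintenance || return 1
    run "$sqllib/bin/db2cluster" -cfs -exit -maintenance || return 1
}
```

For example, recover_rsct db2sdin1 /home/db2sdin1/sqllib prints the command sequence for review, where db2sdin1 and its sqllib path are placeholder values. Steps 7 to 9 are left out on purpose, because the installFixPack command line and the member or CF identifiers differ for each host.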

Document Location

Worldwide

Document Information

Modified date:
07 December 2022

UID

ibm11117533