IBM Support

How to rebuild the corrupted WDP-DB2-0 database

Troubleshooting


Problem

The wdp-db2-0 pod is in an inconsistent state and cannot recover because its database is corrupted.

Symptom

The wdp-db2-0 pod shows a Running status but is not ready. The pod log reports the following error:
DB20000I The ACTIVATE DATABASE command completed successfully.
Activate ILGDB DB
SQL1042C An unexpected system error occurred. SQLSTATE=58004

Cause

This usually happens when the node where the pod is running was restarted and the persistent storage attached to the pod is unavailable or corrupted.

Environment

Cloud Pak for Data
Cloud Pak for Data Systems
OpenShift 3.11, 4.3, 4.5, 4.6

Diagnosing The Problem

Check the status of the nodes, pods, and persistent storage:
oc get pods -o wide | egrep -iv '1/1|2/2|3/3|4/4|completed'
oc logs wdp-db2-0
oc get nodes
oc get pv
oc get pvc
oc describe pods wdp-db2-0
Check the status of the persistent volume claim and storage attached to the wdp-db2-0 pod
oc get pvc  $(oc describe pods wdp-db2-0 | grep ClaimName | awk '{ print $2 }' )

oc get pv $(oc get pvc  --no-headers $(oc describe pods wdp-db2-0 | grep ClaimName | awk '{ print $2 }' ) | awk '{ print $3 }')
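
The egrep filter above only recognizes fully ready pods with up to four containers. As an alternative sketch, the READY column can be compared numerically. The function name not_ready_pods is illustrative, and the code assumes the default column order of oc get pods (NAME, READY, STATUS, ...).

```shell
# Print pods whose READY count is incomplete, skipping Completed pods.
# Reads `oc get pods --no-headers` style output on stdin.
not_ready_pods() {
    awk '{
        split($2, r, "/")    # READY column looks like "1/2"
        if ($3 != "Completed" && r[1] != r[2]) print
    }'
}
```

Possible usage: oc get pods -o wide --no-headers | not_ready_pods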

Resolving The Problem

Option 1: 

Step 1: Stop and restart the db2 instance

Run the following commands:

oc rsh wdp-db2-0
su - db2inst1

db2 force application all
db2stop

db2start
db2 activate db ilgdb
db2 activate db bgdb
db2 activate db lineage
db2 activate db wfdb

exit
exit
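
The four activation commands in Step 1 can also be run in a loop that reports which databases failed to activate. This is a minimal sketch, meant to be run as db2inst1 inside the pod. The function name activate_all is illustrative, and the threshold rests on the assumption that the Db2 command line processor returns 4 or higher for errors and lower values for success or warnings.

```shell
# Activate each database named on the command line and print the names
# of those whose activation returned an error.
activate_all() {
    failed=""
    for db in "$@"; do
        db2 activate db "$db"
        rc=$?
        # Assumption: CLP return codes >= 4 are errors; lower values are
        # success or warnings (for example, database already active).
        if [ "$rc" -ge 4 ]; then
            failed="$failed $db"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed:$failed"
        return 1
    fi
    echo "all activated"
}
```

Possible usage: activate_all ilgdb bgdb lineage wfdb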



Step 2: Delete the pod to restart it

oc delete pods wdp-db2-0

Option 2:

Proceed to this option only if the steps above did not recover the instance.
If a system is rebooted without first shutting Db2 down gracefully, the database manager has no chance to release the locks on the files it was holding. Follow the technote below.

https://www.ibm.com/support/pages/how-resolve-sql1042c%C2%A0%C2%A0unexpected%C2%A0system%C2%A0error%C2%A0occurred%C2%A0upon-database-connection-after-unexpected-outage

Option 3: Rebuild wdp-db2-0. The database is reinitialized, and a manual re-sync is required.

In this option, the existing persistent storage is deleted and recreated to fix the corrupted storage volume.

Scale down the wdp-db2-0 pod:

oc scale sts wdp-db2  --replicas=0

Delete the persistent storage volume

# Look up the claim and volume names. These lookups read the pod spec,
# so they only work while the wdp-db2-0 pod object still exists.
export PVC=$(oc describe pods wdp-db2-0 | grep ClaimName | awk '{ print $2 }')
export PV=$(oc get pvc --no-headers "$PVC" | awk '{ print $3 }')

# delete pv
oc delete pv "$PV"

# delete pvc
oc delete pvc "$PVC"
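
As an alternative to parsing oc describe output with grep and awk, the claim and volume names can be read with -o jsonpath. This is a sketch that assumes the pod mounts a single persistent volume claim; the function names pod_claim and claim_volume are illustrative. The pod lookup needs the pod object to exist, so the names are best captured before the pod is removed.

```shell
# Print the PVC name(s) referenced by a pod's volumes.
pod_claim() {
    oc get pod "$1" -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'
}

# Print the name of the PV bound to a PVC.
claim_volume() {
    oc get pvc "$1" -o jsonpath='{.spec.volumeName}'
}
```

Possible usage: PVC=$(pod_claim wdp-db2-0); PV=$(claim_volume "$PVC")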

Scale wdp-db2 back to 1 replica:

oc scale sts wdp-db2  --replicas=1

This rebuilds the wdp-db2-0 database instance. You can check the status in the pod log:

oc logs wdp-db2-0 -f
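
Besides watching the log, readiness can be polled from the READY column of oc get pods. This is a minimal sketch; the function names, the default of 60 attempts, and the 10-second pause are illustrative choices, not values from this technote.

```shell
# Succeeds if a single `oc get pod --no-headers` line on stdin shows
# all containers ready (for example "1/1").
line_is_ready() {
    awk '{ split($2, r, "/"); ok = (r[1] == r[2] && r[2] > 0) }
         END { exit (ok ? 0 : 1) }'
}

# Poll a pod until it is ready, or give up after a number of attempts.
# Usage: wait_ready POD [ATTEMPTS] [SLEEP_SECONDS]
wait_ready() {
    pod=$1
    attempts=${2:-60}
    pause=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if oc get pod "$pod" --no-headers 2>/dev/null | line_is_ready; then
            echo "$pod is ready"
            return 0
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    echo "$pod is not ready after $attempts attempts" >&2
    return 1
}
```

Possible usage: wait_ready wdp-db2-0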

Wait for the pod to come up. After that, you need to resync the glossary terms and workflow by deleting the following service pods:

wkc-glossary-service, wkc-workflow-service, wdp-policy-service, wdp-lineage

oc delete pods $(oc get pods --no-headers | egrep 'wkc-glossary-service|wdp-policy-service|wdp-lineage|wkc-workflow-service' | grep -iv "Completed" | awk '{ print $1 }')

Check the status of the rebuilt database and its tables:

oc rsh wdp-db2-0

su - db2inst1

db2 connect to ilgdb

db2 "select tabname from syscat.tables where tabschema = 'DB2INST1'"

The table list in the output shows that the database has been rebuilt.

Document Location

Worldwide

Document Information

Modified date:
25 January 2021

UID

ibm16407820