Data Virtualization post-restore hooks fail because pods are unreachable
After an offline restore, Data Virtualization pods might show a Running status even though the IP addresses that are assigned to them are invalid, so the pods cannot connect to each other. Because the pods are unreachable, the Data Virtualization post-restore hooks might fail.
Symptoms
Data Virtualization instances can be provisioned in multiple namespaces (the control plane namespace or tethered namespaces). Be sure to use the correct Data Virtualization instance namespace when you complete these steps.
- Run the following command to check the pod logs:
    oc -n <namespace> logs <podname>
  - For example, to check the head pod logs for a Data Virtualization instance that is provisioned in the control plane namespace zen, run the following command:
      oc -n zen logs c-db2u-dv-db2u-0
  - To check the head pod logs for a Data Virtualization instance that is provisioned in the tethered namespace tn1, run the following command:
      oc -n tn1 logs c-db2u-dv-db2u-0
  Check whether the pod errors are similar to the following:
    Head pod c-db2u-dv-db2u-0 logs:
    Error: unable to initialize: Get "https://xxx.xxx.xx.x:443/api/v1/namespaces/dv1/configmaps/c-db2u-dv-db2u-api": dial tcp xxx.xxx.xx.x:443: connect: no route to host
    Hurricane pod logs:
    Error: dial tcp: lookup c-db2u-dv-db2u-internal.dv1.svc on 172.30.0.10:53: read udp 10.254.16.131:40797->172.30.0.10:53: i/o timeout
- Log in to the head pod and check the bigsql status:
    oc -n <namespace> rsh c-db2u-dv-db2u-0 bash
    su db2inst1
    bigsql status
  The bigsql status output might show errors like the following:
    [db2inst1@c-db2u-dv-db2u-0 - Db2U /]$ bigsql status
    SERVICE         HOSTNAME                                                         NODE  PID  STATUS
                    c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal.zen.svc.cluster.local   -     -    Unreachable
    Big SQL Master  c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.zen.svc.cluster.local   0     -    DB2 Not Running
    Error: dial tcp: lookup c-db2u-dv-db2u-internal.zen.svc on 172.30.0.10:53: read udp 10.254.20.103:45777->172.30.0.10:53: i/o timeout
    Error: dial tcp: lookup c-db2u-dv-db2u-internal.zen.svc on 172.30.0.10:53: read udp 10.254.20.103:52983->172.30.0.10:53: i/o timeout
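To check several pods for the symptoms above in one pass, you can pipe their logs through a small filter. The following is a minimal sketch, not an IBM-provided tool: the function names dv_log_has_network_errors and dv_scan_pod_logs are hypothetical, and the filter only matches the error strings shown above (no route to host, i/o timeout, Unreachable), so other failure messages are not detected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Data Virtualization or oc):
# reads log text on stdin and succeeds (exit status 0) if it contains
# one of the network errors shown in the Symptoms section.
dv_log_has_network_errors() {
  grep -Eq 'no route to host|i/o timeout|Unreachable'
}

# Hypothetical helper: check each named pod in namespace $1 and report
# which ones show the symptom. Requires an authenticated oc session.
dv_scan_pod_logs() {
  local ns=$1; shift
  local pod
  for pod in "$@"; do
    if oc -n "$ns" logs "$pod" 2>&1 | dv_log_has_network_errors; then
      echo "$pod: network errors found"
    else
      echo "$pod: no matching errors"
    fi
  done
}

# Example (namespace zen; adjust the pod names to your instance):
#   dv_scan_pod_logs zen c-db2u-dv-db2u-0 c-db2u-dv-db2u-1
```

The same filter also works on the bigsql status output, which is where the Unreachable status appears.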
Causes
After a restore, pods reuse the IP addresses that they had in the backup, but they can be scheduled on different worker nodes. As a result, a pod can be assigned an IP address that does not belong to the subnet of the worker node that it now runs on.
Diagnosing the problem
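The diagnosis comes down to one test: does a pod's IP address fall inside the subnet of the worker node that it runs on? As a minimal sketch, assuming you have already read the pod IP and the node's subnet in CIDR form (for example, from the oc describe output in the steps that follow), the check can be done in plain Bash. The function names ip_to_int and ip_in_cidr and the example subnet 10.254.16.0/22 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking IPv4 subnet membership in pure Bash.

# Convert a dotted-quad IPv4 address (a.b.c.d) to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  # 10# forces base 10 so that an octet such as "08" is not read as octal.
  echo $(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))
}

# Succeed (exit status 0) if IPv4 address $1 lies inside CIDR block $2.
ip_in_cidr() {
  local ip=$1 cidr=$2
  local net=${cidr%/*} bits=${cidr#*/}
  # Build the netmask; Bash arithmetic is 64-bit, so mask back to 32 bits.
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Example (hypothetical subnet for a worker node):
#   ip_in_cidr 10.254.16.131 10.254.16.0/22 && echo "inside" || echo "outside"
```

A pod whose IP falls outside its node's subnet is a candidate for the issue; the steps below confirm it against the backup.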
Data Virtualization instances can be provisioned in multiple namespaces (the control plane namespace or tethered namespaces). Be sure to use the correct Data Virtualization instance namespace when you complete these steps.
1. Get the worker node to which the pod is assigned:
    oc -n <namespace> describe po <podname> | grep Node
2. Get the IP address that is assigned to the pod:
    oc -n <namespace> describe po <podname> | grep IP
3. Check the IP address and subnet of the worker node that you obtained in step 1:
    oc describe node <workernodename> | grep IP
    oc describe node <workernodename> | grep subnet
4. Download the Kubernetes resources backup file (for example, tenant-offline-b2).
   - To get the backup file name, run the following command:
       cpd-cli oadp backup ls
   - To download the backup file, run the following command:
       cpd-cli oadp backup download <backupname>
5. Extract the backup archive <backupname>.tar.gz:
    tar -zxvf <backupname>.tar.gz
6. From the backup file, get the IP address and worker node of each pod that has the issue:
   - For the head pod c-db2u-dv-db2u-0, run the following commands:
       cat resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-0.json | python -m json.tool | grep nodeName
       cat resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-0.json | python -m json.tool | grep podIP
   - For the worker pod c-db2u-dv-db2u-1, run the following commands:
       cat resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-1.json | python -m json.tool | grep nodeName
       cat resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-1.json | python -m json.tool | grep podIP
   - For the hurricane pod, run the following commands:
       cat resources/pods/namespaces/<namespace>/<hurricane-podname>.json | python -m json.tool | grep nodeName
       cat resources/pods/namespaces/<namespace>/<hurricane-podname>.json | python -m json.tool | grep podIP
   - For the dv-utils pod c-db2u-dv-dvutils-0, run the following commands:
       cat resources/pods/namespaces/<namespace>/c-db2u-dv-dvutils-0.json | python -m json.tool | grep nodeName
       cat resources/pods/namespaces/<namespace>/c-db2u-dv-dvutils-0.json | python -m json.tool | grep podIP
7. Compare the worker node and IP address that you obtained in step 6 with the worker node and IP address that you obtained in steps 1 and 2. If the IP address is the same but the worker node is different, the pod is affected by the issue that is described here. Proceed to the resolution steps.
8. Repeat steps 1-7 for each Data Virtualization pod that has this issue.
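Steps 1, 2, 6, and 7 can be combined into a single check per pod. The following is a sketch under stated assumptions, not an IBM-provided tool: it assumes the backup archive was extracted so that the pod files are under <backupdir>/resources/pods/namespaces/<namespace>/, reads nodeName and podIP from the backup JSON with grep and sed instead of python -m json.tool, and reads the current values with oc get -o jsonpath. The function names backup_pod_field, dv_ip_node_mismatch, and dv_check_pod are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first string value of field $2
# (nodeName or podIP) from the backup pod JSON file $1.
backup_pod_field() {
  grep -o "\"$2\": *\"[^\"]*\"" "$1" | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Succeed if the pod kept its backup IP but now runs on another node
# (the condition in step 7). Args: backup node, backup IP, current node, current IP.
dv_ip_node_mismatch() {
  local b_node=$1 b_ip=$2 c_node=$3 c_ip=$4
  [[ -n "$b_ip" && "$b_ip" == "$c_ip" && "$b_node" != "$c_node" ]]
}

# Compare one pod in namespace $1 against the extracted backup in directory $3.
dv_check_pod() {
  local ns=$1 pod=$2 backup_dir=$3
  local json="$backup_dir/resources/pods/namespaces/$ns/$pod.json"
  [[ -f "$json" ]] || { echo "$pod: backup file $json not found" >&2; return 1; }
  local b_node b_ip c_node c_ip
  b_node=$(backup_pod_field "$json" nodeName)
  b_ip=$(backup_pod_field "$json" podIP)
  c_node=$(oc -n "$ns" get pod "$pod" -o jsonpath='{.spec.nodeName}')
  c_ip=$(oc -n "$ns" get pod "$pod" -o jsonpath='{.status.podIP}')
  if dv_ip_node_mismatch "$b_node" "$b_ip" "$c_node" "$c_ip"; then
    echo "$pod: affected (IP $c_ip moved from $b_node to $c_node)"
  else
    echo "$pod: not affected"
  fi
}

# Example (backup extracted in the current directory, namespace zen):
#   for p in c-db2u-dv-db2u-0 c-db2u-dv-db2u-1 c-db2u-dv-dvutils-0; do
#     dv_check_pod zen "$p" .
#   done
```

The grep pattern requires a closing quote right after the field name, so it matches "podIP" but not the neighboring "podIPs" list.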
Resolving the problem
- Restart the Data Virtualization pods to refresh their IP addresses:
    oc -n <namespace> delete pod <podname>
- If the Data Virtualization post-restore hooks failed, rerun them:
  - For the head, worker, and hurricane pods, run the following commands:
      oc -n <namespace> rsh <podname> bash
      su db2inst1
      /db2u/scripts/bigsql-exec.sh /usr/ibmpacks/current/bigsql/bigsql/bigsql-cli/BIGSQL/package/scripts/bigsql-db2ubar-hook.sh -H POST -M RESTORE
  - For the dv-utils pod, run the following commands:
      oc -n <namespace> rsh <dvutil-podname> bash
      su db2inst1
      /opt/dv/current/dv-utils.sh -o start --is-bar
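When several pods are affected, the restart and the hook reruns can be scripted. This is a sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical, and it replaces the interactive rsh and su session with oc exec plus su -c, which assumes that the containers allow a non-interactive su to db2inst1. Run it with DRY_RUN=1 first to review the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not IBM-provided) for the resolution steps.
# Set DRY_RUN=1 to print each oc command instead of running it.
dv_run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Restart (delete) each named pod in namespace $1 so that it gets a new IP.
dv_restart_pods() {
  local ns=$1; shift
  local pod
  for pod in "$@"; do
    dv_run oc -n "$ns" delete pod "$pod"
  done
}

DV_BAR_HOOK=/usr/ibmpacks/current/bigsql/bigsql/bigsql-cli/BIGSQL/package/scripts/bigsql-db2ubar-hook.sh

# Rerun the post-restore hook on a head, worker, or hurricane pod.
# Assumption: "oc exec ... su db2inst1 -c" is equivalent to the interactive
# rsh + su session in the resolution steps.
dv_rerun_bigsql_hook() {
  local ns=$1 pod=$2
  dv_run oc -n "$ns" exec "$pod" -- su db2inst1 -c "/db2u/scripts/bigsql-exec.sh $DV_BAR_HOOK -H POST -M RESTORE"
}

# Rerun the post-restore hook on the dv-utils pod (same assumption).
dv_rerun_dvutils_hook() {
  local ns=$1 pod=$2
  dv_run oc -n "$ns" exec "$pod" -- su db2inst1 -c "/opt/dv/current/dv-utils.sh -o start --is-bar"
}

# Example: preview first, then run for real. Wait until the restarted
# pods are Running again before you rerun the hooks.
#   DRY_RUN=1 dv_restart_pods zen c-db2u-dv-db2u-0 c-db2u-dv-db2u-1
#   dv_restart_pods zen c-db2u-dv-db2u-0 c-db2u-dv-db2u-1
#   dv_rerun_bigsql_hook zen c-db2u-dv-db2u-0
#   dv_rerun_dvutils_hook zen c-db2u-dv-dvutils-0
```

In dry-run mode the quoting of the su -c argument is not shown, so treat the printed lines as a preview rather than as commands to paste.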