Data Virtualization post-restore hooks fail because pods are unreachable

When you perform an offline restore, Data Virtualization pods might show a Running status even though the IP addresses that are assigned to them are invalid and the pods cannot connect to each other. Because the pods are unreachable, the Data Virtualization post-restore hooks might fail.

Symptoms

Note:

Data Virtualization instances can be provisioned on multiple namespaces (the control plane namespace or tethered namespaces). Be sure to use the correct Data Virtualization instance namespace when you complete these steps.

Run the following command to check the pod logs:

oc -n <namespace> logs <podname>

For example, to check head pod logs for a Data Virtualization instance that is provisioned in a control plane namespace zen, run the following command:
```
oc -n zen logs c-db2u-dv-db2u-0
```
To check head pod logs for a Data Virtualization instance that is provisioned in a tethered namespace tn1, run the following command:
```
oc -n tn1 logs c-db2u-dv-db2u-0
```

Check if the pod errors are similar to the following:

Head pod c-db2u-dv-db2u-0 logs:
Error: unable to initialize: Get "https://xxx.xxx.xx.x:443/api/v1/namespaces/dv1/configmaps/c-db2u-dv-db2u-api": dial tcp xxx.xxx.xx.x:443: connect: no route to host

Hurricane pod logs:
Error: dial tcp: lookup c-db2u-dv-db2u-internal.dv1.svc on 172.30.0.10:53: read udp 10.254.16.131:40797->172.30.0.10:53: i/o timeout

Log in to the head pod and check the Big SQL status:

oc -n <namespace> rsh c-db2u-dv-db2u-0 bash
su db2inst1
bigsql status

The bigsql status command might show the following errors:

[db2inst1@c-db2u-dv-db2u-0 - Db2U /]$ bigsql status
SERVICE              HOSTNAME                               NODE      PID STATUS
                     c-db2u-dv-db2u-1.c-db2u-dv-db2u-internal.zen.svc.cluster.local    -        - Unreachable
Big SQL Master       c-db2u-dv-db2u-0.c-db2u-dv-db2u-internal.zen.svc.cluster.local    0        - DB2 Not Running
Error: dial tcp: lookup c-db2u-dv-db2u-internal.zen.svc on 172.30.0.10:53: read udp 10.254.20.103:45777->172.30.0.10:53: i/o timeout
Error: dial tcp: lookup c-db2u-dv-db2u-internal.zen.svc on 172.30.0.10:53: read udp 10.254.20.103:52983->172.30.0.10:53: i/o timeout
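The log checks in this section can be scripted. The following is a minimal sketch, assuming that the two error strings shown above ("no route to host" and "i/o timeout") are the only patterns of interest. The function name has_network_errors is hypothetical and is not part of oc or Data Virtualization.

```shell
# Sketch: detect the connectivity errors that are shown in this topic.
# has_network_errors is a hypothetical helper name.

has_network_errors() {
  # Reads log text on stdin. Exits 0 if any line contains one of the
  # connectivity error strings, and 1 otherwise.
  grep -E -q 'no route to host|i/o timeout'
}

# Example usage against a live pod (requires cluster access):
#   oc -n <namespace> logs <podname> | has_network_errors && echo "pod looks unreachable"
```

A match only indicates that the pod has a network problem. Complete the diagnosis steps to confirm that the cause is a stale IP address.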

Causes

After a restore, pods keep the IP addresses that were recorded in the backup, but the pods are scheduled on different worker nodes. As a result, pods are assigned IP addresses that do not belong to the subnet of their worker node.
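To illustrate what "does not belong to the subnet" means, the following sketch tests whether an IPv4 address falls inside a CIDR range. The helper names ip_to_int and ip_in_cidr are hypothetical, and the subnet 10.254.16.0/23 is an assumed example value, not one taken from a real cluster. The pod IP addresses are the ones that appear in the sample logs in this topic.

```shell
# Sketch: check whether a pod IP address belongs to a worker node subnet.
# ip_to_int and ip_in_cidr are hypothetical helper names.

ip_to_int() {
  # Convert a dotted-quad IPv4 address to a 32-bit integer.
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

ip_in_cidr() {
  # Usage: ip_in_cidr <ip> <network>/<prefix>
  # Exits 0 if the address is inside the range, and 1 otherwise.
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  # Build the netmask from the prefix length.
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Assumed example: the worker node subnet is 10.254.16.0/23.
for pod_ip in 10.254.16.131 10.254.20.103; do
  if ip_in_cidr "$pod_ip" 10.254.16.0/23; then
    echo "$pod_ip is inside the node subnet"
  else
    echo "$pod_ip is outside the node subnet"
  fi
done
```

A pod whose IP address is outside the subnet of the worker node that it runs on matches the cause that is described here.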

Diagnosing the problem

Get the worker node to which the pod is assigned:

oc -n <namespace> describe po <podname> | grep Node

Get the IP address that is assigned to the pod:

oc -n <namespace> describe po <podname> | grep IP

Check the IP address and subnet of the worker node to which the pod is assigned:

oc describe node <workernodename> | grep IP
oc describe node <workernodename> | grep subnet

Download the Kubernetes resources backup file (for example, tenant-offline-b2).

To get the backup file name, run the following command:
```
cpd-cli oadp backup ls
```
To download the backup file, run the following command:
```
cpd-cli oadp backup download <backupname> 
```
Unzip the backup tar file <backupname>.tar.gz:
```
tar -zxvf <backupname>.tar.gz
```

Get the IP address and worker node details from the backup file of a pod that has the issue:

For the head pod c-db2u-dv-db2u-0, run the following commands:

cat resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-0.json | python -m json.tool | grep nodeName

cat resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-0.json | python -m json.tool | grep podIP

For the worker pod c-db2u-dv-db2u-1, run the following commands:

cat resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-1.json | python -m json.tool | grep nodeName

cat resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-1.json | python -m json.tool | grep podIP

For the hurricane pod, run the following commands:

cat resources/pods/namespaces/<namespace>/<hurricane-podname>.json | python -m json.tool | grep nodeName

cat resources/pods/namespaces/<namespace>/<hurricane-podname>.json | python -m json.tool | grep podIP

For the dv-utils pod c-db2u-dv-dvutils-0, run the following commands:

cat resources/pods/namespaces/<namespace>/c-db2u-dv-dvutils-0.json | python -m json.tool | grep nodeName

cat resources/pods/namespaces/<namespace>/c-db2u-dv-dvutils-0.json | python -m json.tool | grep podIP

Compare the worker node and IP address from the backup file with the worker node and IP address that the running pod currently reports. If the IP address is the same but the worker node is different, the pod has the issue that is described in the Causes section. Proceed to the resolution steps.
Repeat this diagnosis for every Data Virtualization pod that has this issue.
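The comparison between the backup and the running pod can be sketched in shell. This sketch assumes that nodeName and podIP appear as simple string fields in the backup JSON file, as the grep commands in this section imply. The helper names json_field and same_ip_different_node are hypothetical, and the text match is not a general JSON parser.

```shell
# Sketch: compare backup pod details with the running pod.
# json_field and same_ip_different_node are hypothetical helper names.

json_field() {
  # Usage: json_field <file> <key>
  # Prints the first string value that is found for the key.
  grep -o "\"$2\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" "$1" |
    head -n 1 |
    sed 's/.*:[[:space:]]*"\(.*\)"$/\1/'
}

same_ip_different_node() {
  # Usage: same_ip_different_node <backup_node> <backup_ip> <current_node> <current_ip>
  # Exits 0 when the IP address is unchanged but the worker node differs.
  [ "$2" = "$4" ] && [ "$1" != "$3" ]
}

# Example usage (requires the unpacked backup and cluster access):
#   f=resources/pods/namespaces/<namespace>/c-db2u-dv-db2u-0.json
#   backup_node=$(json_field "$f" nodeName)
#   backup_ip=$(json_field "$f" podIP)
#   current_node=$(oc -n <namespace> get pod c-db2u-dv-db2u-0 -o jsonpath='{.spec.nodeName}')
#   current_ip=$(oc -n <namespace> get pod c-db2u-dv-db2u-0 -o jsonpath='{.status.podIP}')
#   same_ip_different_node "$backup_node" "$backup_ip" "$current_node" "$current_ip" \
#     && echo "Pod has the stale IP address issue"
```

The jsonpath queries in the usage comments are an alternative to filtering the describe output with grep. Both approaches return the same node and IP address values.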

Resolving the problem

Restart the Data Virtualization pods to refresh the IP address:
```
oc -n <namespace> delete pod <podname>
```

Re-run the Data Virtualization post-restore hooks if they failed:

Run the following commands for the head, worker, and hurricane pods:

oc -n <namespace> rsh <podname> bash
su db2inst1
/db2u/scripts/bigsql-exec.sh /usr/ibmpacks/current/bigsql/bigsql/bigsql-cli/BIGSQL/package/scripts/bigsql-db2ubar-hook.sh -H POST -M RESTORE

Run the following commands for the dv-utils pod:

oc -n <namespace> rsh <dvutil-podname> bash
su db2inst1
/opt/dv/current/dv-utils.sh -o start --is-bar