Data Gate instance failed after the upgrade from version 2.6.0 to 4.1.0

After you upgrade from version 2.6.0 to 4.1.0, the Data Gate instance becomes unavailable. To resolve the problem, check whether the upgrade-related jobs completed successfully; if they did not, run the migration manually and restart the instance.

Symptoms

At the end of the upgrade process, the Data Gate instance CR shows a Failed status, and the instance pod is not Ready.

To check the status of the instance pod, run the following oc get pod command with the instance ID:

oc get pod -n ${PROJECT_CPD_INST_OPERANDS} -l icpdsupport/app=dg-instance-server,\
icpdsupport/serviceInstanceId=`echo $DG_INSTANCE_ID | sed 's/^dg//'`

Where $DG_INSTANCE_ID is the Data Gate instance identifier. To list the instance identifiers, run oc get dginstance -n ${PROJECT_CPD_INST_OPERANDS}.

The command returns output similar to the following example:

NAME                                                      READY     STATUS     RESTARTS   AGE
dg-1691644067377701-data-gate-658c74788b-c7rgj            4/5       Running    0          3d

Only 4 of the 5 containers in the instance pod are ready, so the pod does not become Ready and the instance cannot start.
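To see which container is not ready, you can list the ready flag of every container in the instance pod. The following is a minimal sketch, not part of the documented procedure. The function names are hypothetical, and the sketch assumes that $DG_INSTANCE_ID has the form dg<number> (for example, dg1691644067377701), so that removing the leading dg gives the serviceInstanceId label value, as the sed command above does.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the Data Gate instance pod.

# Print an instance ID without its leading "dg", like sed 's/^dg//'.
dg_service_id() {
  printf '%s\n' "${1#dg}"
}

# Succeed only if a READY value such as "4/5" shows that all containers are ready.
all_containers_ready() {
  case $1 in
    */*) ;;
    *) return 1 ;;
  esac
  ready=${1%%/*}
  total=${1##*/}
  [ -n "$ready" ] && [ "$ready" = "$total" ]
}

# Print each container of the instance pod with its ready flag (true or false).
# Requires a logged-in oc session, PROJECT_CPD_INST_OPERANDS, and DG_INSTANCE_ID.
show_container_readiness() {
  oc get pod -n "$PROJECT_CPD_INST_OPERANDS" \
    -l "icpdsupport/app=dg-instance-server,icpdsupport/serviceInstanceId=$(dg_service_id "$DG_INSTANCE_ID")" \
    -o jsonpath='{range .items[0].status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}'
}
```

For the pod in the example output, all_containers_ready 4/5 fails, and show_container_readiness prints false next to the container that is not ready.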

Causes

The Data Gate upgrade process automatically runs two jobs: one backs up the data and the other migrates the code. A container that is not ready and a failed instance indicate that one or both jobs did not run or did not complete successfully. To check the status of the jobs, run the following oc get pod commands:

oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep `echo $DG_INSTANCE_ID | sed 's/^dg//'`-backup-head-job
oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep `echo $DG_INSTANCE_ID | sed 's/^dg//'`-migrate-datagate-api-job

If both jobs completed successfully, the commands return output similar to the following examples:

zen1  dg-1691644067377701-backup-head-job-jvttt           0/1  Completed   0  2d17h
zen1  dg-1691644067377701-migrate-datagate-api-job-mxn5z  0/1  Completed   0  2d16h

If a command returns nothing, or returns a pod whose status is not Completed, the corresponding upgrade job did not run or did not complete successfully, which caused the instance to fail.
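You can run both job checks in one pass. The following sketch is an illustration, not an IBM-provided tool: the function names are hypothetical, and it assumes that a job succeeded when a line of the oc get pod output contains the job name shown above and the word Completed.

```shell
#!/bin/sh
# Hypothetical helpers for checking the two Data Gate upgrade jobs.

# Read `oc get pod` output on stdin; succeed only if some line contains
# the given job name and the word Completed.
job_completed() {
  awk -v job="$1" 'index($0, job) && /Completed/ { found = 1 } END { exit !found }'
}

# Report the state of the backup and migration jobs for $DG_INSTANCE_ID.
# Requires a logged-in oc session and PROJECT_CPD_INST_OPERANDS.
check_upgrade_jobs() {
  id=${DG_INSTANCE_ID#dg}
  pods=$(oc get pod -n "$PROJECT_CPD_INST_OPERANDS") || return 1
  for job in backup-head-job migrate-datagate-api-job; do
    if printf '%s\n' "$pods" | job_completed "$id-$job"; then
      echo "$job: Completed"
    else
      echo "$job: not found or not Completed"
    fi
  done
}
```

If check_upgrade_jobs reports either job as not found or not Completed, continue with the manual procedure in the next section.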

Resolving the problem

You can upgrade and start the Data Gate instance manually by completing the following steps:

  1. Set the DG_POD variable to the name of the instance pod by running the following oc get pod command:
    DG_POD=$(oc get pod -n ${PROJECT_CPD_INST_OPERANDS} \
    -l icpdsupport/app=dg-instance-server,icpdsupport/serviceInstanceId=`echo $DG_INSTANCE_ID | sed 's/^dg//'` \
    -o jsonpath='{.items[0].metadata.name}')
  2. Remove the flag files left over from the previous upgrade attempt by running the following command:
    oc exec -n ${PROJECT_CPD_INST_OPERANDS} ${DG_POD} -c data-gate-api \
    -- rm /head/backup/2.6.0/clone-api/migration_status_from_2.6.0_to_4.1.0 /head/clone-api/.JETTY.INITIALIZED
  3. Stop the instance API service by running the stop-api.sh script:
    oc exec -n ${PROJECT_CPD_INST_OPERANDS} ${DG_POD} -c data-gate-api -- /head/tools/datagate-api/stop-api.sh
  4. Upgrade the instance by running the migrate_data.sh script:
    oc exec -n ${PROJECT_CPD_INST_OPERANDS} ${DG_POD} -c data-gate-api \
    -- bash -c "/opt/ibm/dwa/bin/installation/clone-api/upgrade_rollback/migrate_data.sh 2.6.0 4.1.0"
  5. Start the instance API service by running the start-api.sh script:
    oc exec -n ${PROJECT_CPD_INST_OPERANDS} ${DG_POD} -c data-gate-api -- /head/tools/datagate-api/start-api.sh
  6. Verify that the upgrade completed successfully and that the instance started by running the following oc get dginstance command:
    oc get dginstance -n ${PROJECT_CPD_INST_OPERANDS} $DG_INSTANCE_ID -o jsonpath='{.status.datagateInstanceStatus} {"\n"}'

    Where $DG_INSTANCE_ID is the identifier you specified earlier.
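If you need to repeat steps 2 through 5, you might wrap them in a small script. This is a hypothetical sketch, not part of the documented procedure: it reuses the commands above unchanged, assumes that PROJECT_CPD_INST_OPERANDS and DG_POD are already set (step 1), stops if the stop, migrate, or start command fails, and only prints the commands when DRY_RUN=1.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 2-5 of the manual procedure.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run a command inside the data-gate-api container of the instance pod.
api_exec() {
  run oc exec -n "$PROJECT_CPD_INST_OPERANDS" "$DG_POD" -c data-gate-api -- "$@"
}

manual_migrate() {
  if [ -z "${PROJECT_CPD_INST_OPERANDS:-}" ] || [ -z "${DG_POD:-}" ]; then
    echo "Set PROJECT_CPD_INST_OPERANDS and DG_POD first (step 1)." >&2
    return 1
  fi
  # Step 2: remove leftover flag files; a missing file is reported by rm
  # but does not stop the sequence.
  api_exec rm /head/backup/2.6.0/clone-api/migration_status_from_2.6.0_to_4.1.0 \
    /head/clone-api/.JETTY.INITIALIZED
  # Step 3: stop the instance API service.
  api_exec /head/tools/datagate-api/stop-api.sh || return 1
  # Step 4: migrate the instance data from 2.6.0 to 4.1.0.
  api_exec bash -c "/opt/ibm/dwa/bin/installation/clone-api/upgrade_rollback/migrate_data.sh 2.6.0 4.1.0" || return 1
  # Step 5: start the instance API service.
  api_exec /head/tools/datagate-api/start-api.sh || return 1
}
```

For example, DRY_RUN=1 manual_migrate prints the four oc exec commands so that you can review them before you run manual_migrate without DRY_RUN. Run step 6 afterward to verify the result.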