Upgrading Data Virtualization from Version 4.8 to Version 5.0

An instance administrator can upgrade Data Virtualization from Cloud Pak for Data Version 4.8 to Version 5.0.

Important: Starting in IBM Cloud Pak for Data Version 5.0, Data Virtualization is not supported on Red Hat® OpenShift® Container Platform Version 4.12. You must ensure your cluster is running Red Hat OpenShift Container Platform Version 4.14 or later.
Who needs to complete this task?

Instance administrator
To upgrade Data Virtualization, you must be an instance administrator. An instance administrator has permission to manage software in the following projects:

The operators project for the instance

The operators for this instance of Data Virtualization are installed in the operators project. In the upgrade commands, the ${PROJECT_CPD_INST_OPERATORS} environment variable refers to the operators project.

The operands project for the instance

The custom resources for the control plane and Data Virtualization are installed in the operands project. In the upgrade commands, the ${PROJECT_CPD_INST_OPERANDS} environment variable refers to the operands project.

The tethered projects for the instance
If any projects are tethered to the operands project, you have permission to manage the software in the tethered projects.
When do you need to complete this task?

Review the following options to determine whether you need to complete this task:

  • If you want to upgrade the Cloud Pak for Data control plane and one or more services at the same time, follow the process in Upgrading an instance of Cloud Pak for Data instead.
  • If you didn't upgrade Data Virtualization when you upgraded the Cloud Pak for Data control plane, complete this task to upgrade Data Virtualization.

    Repeat as needed
    If you are responsible for multiple instances of Cloud Pak for Data, you can repeat this task to upgrade more instances of Data Virtualization on the cluster.

Information you need to complete this task

Review the following information before you upgrade Data Virtualization:

Version requirements

All the components that are associated with an instance of Cloud Pak for Data must be installed at the same release. For example, if the Cloud Pak for Data control plane is at Version 5.0.3, you must upgrade Data Virtualization to Version 5.0.3.

Environment variables
The commands in this task use environment variables so that you can run the commands exactly as written.
  • If you do not have the script that defines the environment variables, see Setting up installation environment variables.
  • To use the environment variables from the script, you must source the environment variables before you run the commands in this task. For example, run:
    source ./cpd_vars.sh
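If you need a starting point, a minimal cpd_vars.sh might look like the following sketch. All of the values shown are placeholders for illustration, not defaults; substitute the projects, release, profile name, and cluster credentials that apply to your environment.

```shell
#!/usr/bin/env bash
# Sketch of a cpd_vars.sh script. Every value below is a placeholder;
# replace each one with the correct value for your cluster.

export VERSION=5.0.3                            # Target Cloud Pak for Data release
export PROJECT_CPD_INST_OPERATORS=cpd-operators # Operators project for the instance
export PROJECT_CPD_INST_OPERANDS=cpd-instance   # Operands project for the instance
export CPD_PROFILE_NAME=admin-profile           # cpd-cli profile for service-instance commands

# Placeholders for your cluster URL and API token.
export OCP_URL="https://api.example.com:6443"
export OCP_TOKEN="sha256~REPLACE_ME"

# CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command;
# the exact arguments depend on how you authenticate to your cluster.
export CPDM_OC_LOGIN="cpd-cli manage login-to-ocp --server=${OCP_URL} --token=${OCP_TOKEN}"
```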
Common core services
Data Virtualization requires the Cloud Pak for Data common core services.

If the common core services are not at the correct version in the operands project for the instance, the common core services are automatically upgraded when you upgrade Data Virtualization. The common core services upgrade increases the amount of time the upgrade takes to complete.

Storage requirements
You don't need to specify storage when you upgrade Data Virtualization.

Before you begin

  • If you have Databricks connections, you must complete the Databricks pre-upgrade steps before you upgrade Data Virtualization. See Databricks pre-upgrade steps.
  • If you are upgrading Data Virtualization from a version older than Cloud Pak for Data 4.8.2 to a version newer than 4.8.2, you must temporarily update the Duplicate asset handling settings of your catalogs to Allow duplicates. After the upgrade, you can revert the Duplicate asset handling configuration. For steps on updating this setting, see Changing catalog settings.
  • Data Virtualization version numbers and upgrade paths
    Verify the Data Virtualization version numbers that you are upgrading from and to. See Supported upgrade paths in Data Virtualization.

This task assumes that the following prerequisites are met:

  • The cluster meets the minimum requirements for Data Virtualization. If this task is not complete, see System requirements.
  • The workstation from which you will run the upgrade is set up as a client workstation and has the following command-line interfaces:
      • Cloud Pak for Data CLI: cpd-cli
      • OpenShift CLI: oc
    If this task is not complete, see Updating client workstations.
  • The Cloud Pak for Data control plane is upgraded. If this task is not complete, see Upgrading an instance of Cloud Pak for Data.
  • For environments that use a private container registry, such as air-gapped environments, the Data Virtualization software images are mirrored to the private container registry. If this task is not complete, see Mirroring images to a private container registry.
  • For environments that use a private container registry, such as air-gapped environments, the cpd-cli is configured to pull the olm-utils-v3 image from the private container registry. If this task is not complete, see Pulling the olm-utils-v3 image from the private container registry.
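Before you start, it can save time to confirm that the required CLIs are actually on the PATH of the client workstation. The following helper is an illustrative sketch, not part of the product tooling:

```shell
#!/usr/bin/env bash
# Sketch: report any required CLI that is not on the PATH.
# missing_tools prints each tool that cannot be found and
# returns non-zero if any tool is missing.
missing_tools() {
  local rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return $rc
}

# Typical usage on the client workstation:
# missing_tools cpd-cli oc || echo "Install the missing CLIs before you continue."
```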

Prerequisite services

Before you upgrade Data Virtualization, ensure that the following services are upgraded and running:

  • Db2 Data Management Console: If you do not manually upgrade Db2 Data Management Console, Data Virtualization upgrades it for you. If you have already upgraded Db2 Data Management Console, make sure that a Db2 Data Management Console instance has been provisioned. For more information, see Upgrading Db2 Data Management Console.

Procedure

Complete the following tasks to upgrade Data Virtualization:

  1. Upgrading the service
  2. Validating the upgrade
  3. Upgrading existing service instances
  4. Upgrading any remote connectors that are installed
  5. What to do next

Upgrading the service

Important: The Operator Lifecycle Manager (OLM) objects for Data Virtualization were updated when you upgraded the Cloud Pak for Data platform. The cpd-cli manage apply-olm command updates all of the OLM objects in the operators project at the same time.

To upgrade Data Virtualization:

  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Update the custom resource for Data Virtualization.
    cpd-cli manage apply-cr \
    --components=dv \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
    --license_acceptance=true \
    --upgrade=true

Validating the upgrade

Data Virtualization is upgraded when the apply-cr command returns:
[SUCCESS]... The apply-cr command ran successfully

If you want to confirm that the custom resource status is Completed, you can run the cpd-cli manage get-cr-status command:

cpd-cli manage get-cr-status \
--cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
--components=dv
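If you script the validation, you can capture the get-cr-status output and check it for a Completed status. The helper below is a sketch; the exact text that cpd-cli prints can vary between versions, so treat the Completed pattern as an assumption and confirm it against your output.

```shell
#!/usr/bin/env bash
# Sketch: check captured `cpd-cli manage get-cr-status` output for a
# Completed status. The "Completed" pattern is an assumption; confirm it
# against the output that your cpd-cli version actually prints.
cr_completed() {
  # Reads the command output on stdin; succeeds only if "Completed" appears.
  grep -q "Completed"
}

# Typical usage:
# cpd-cli manage get-cr-status \
#   --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
#   --components=dv | cr_completed && echo "Data Virtualization CR is Completed"
```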

Upgrading existing service instances

Attention: During the Data Virtualization instance upgrade, the caching pod initially enters a CrashLoopBackOff state. This is expected behavior because Big SQL stops at the beginning of the upgrade process. The caching pod remains in this state until the Data Virtualization pods restart and load the new Docker images. If you suspect that the Data Virtualization upgrade has stalled, check the Data Virtualization head pod logs.

Do not do the following without consulting IBM Support:
  • Shut down the Data Virtualization pods
  • Manually start or stop Big SQL or Db2
After the Data Virtualization pods restart with the updated Docker images, the caching pod switches to a 0/1 Init state. It stays in this state until the Data Virtualization head pod completes the upgrade successfully.
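While you wait, you can poll the caching pod and watch it move through the states described above. The helper below only parses `oc get pods` output; the dv-caching pod name prefix is an assumption, so check the actual pod names in your operands project first.

```shell
#!/usr/bin/env bash
# Sketch: print the STATUS column for a pod from `oc get pods` output.
# The "dv-caching" pod name prefix used in the usage example is an
# assumption; list the pods in your operands project to find the real name.
pod_status() {
  # $1 = pod name prefix; reads `oc get pods` output on stdin
  awk -v pod="$1" '$1 ~ ("^" pod) { print $3 }'
}

# Typical usage, polling every 30 seconds while the upgrade runs:
# while true; do
#   oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | pod_status dv-caching
#   sleep 30
# done
```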

After you upgrade Data Virtualization, you must upgrade any service instances that are associated with Data Virtualization.

Before you begin

Create a profile on the workstation from which you will upgrade the service instances.

The profile must be associated with a Cloud Pak for Data user who has either of the following permissions:

  • Create service instances (can_provision)
  • Manage service instances (manage_service_instances)

For more information, see Creating a profile to use the cpd-cli management commands.

  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Change to the project where Data Virtualization pods are installed.
    oc project ${PROJECT_CPD_INST_OPERANDS}
  3. Get the list of Data Virtualization service instances:
    cpd-cli service-instance list \
    --service-type=dv \
    --profile=${CPD_PROFILE_NAME}
  4. Upgrade one instance at a time. For each instance that you want to upgrade, complete the following steps:
    1. Set the WQ_INSTANCE_NAME environment variable to the name of the service instance that you want to upgrade:
      export WQ_INSTANCE_NAME=<instance-name>
    2. Run the following command to upgrade the instance:
      cpd-cli service-instance upgrade \
      --instance-name=${WQ_INSTANCE_NAME} \
      --service-type=dv \
      --profile=${CPD_PROFILE_NAME}
    3. Run one of the following commands to verify that the version now reads 3.0.3 for the upgraded instance:
      • cpd-cli service-instance list --service-type=dv --profile=${CPD_PROFILE_NAME}
      • oc get bigsql db2u-dv -o jsonpath='{.status.version}{"\n"}'
  5. Wait until the instance upgrades are complete before you proceed to the next step.
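If you have many instances, the per-instance steps above can be wrapped in a simple loop. This sketch is a dry run by default (it echoes each command instead of running it), and the instance names in the usage example are placeholders for the names returned by cpd-cli service-instance list.

```shell
#!/usr/bin/env bash
# Sketch: upgrade several Data Virtualization service instances one at a time.
# RUN=echo makes this a dry run that prints each command; set RUN= (empty)
# to run the commands for real. Remember to verify each instance's version
# after its upgrade, as described in the steps above.
RUN=echo
upgrade_instances() {
  for WQ_INSTANCE_NAME in "$@"; do
    $RUN cpd-cli service-instance upgrade \
      --instance-name="${WQ_INSTANCE_NAME}" \
      --service-type=dv \
      --profile="${CPD_PROFILE_NAME}"
  done
}

# Example usage (instance names are placeholders):
# upgrade_instances dv-instance-1 dv-instance-2
```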

Upgrading remote connectors

If you installed remote connectors, you can upgrade them by using the UPDATEREMOTECONNECTOR stored procedure. You run this procedure by using the SQL editor or the Db2 console on the cluster.

  • To update all remote connectors, run the following stored procedure:
    call dvsys.updateremoteconnector('',?,?)
  • To upgrade a specific set of remote connectors, pass in a comma-separated list of node names:
    call dvsys.updateremoteconnector('<REMOTE_CONNECTOR_NODES>',?,?)
    You can obtain the <REMOTE_CONNECTOR_NODES> values by running the following query:
    select node_name from dvsys.listnodes where AGENT_CLASS='R'
  • On each remote connector host, update the Java path and restart the agent:
    1. In the datavirtualization.env file, change the export JAVA_HOME file path to the Java 21 JRE file path.
      export JAVA_HOME=<Java 21 JRE file path>
    2. Start the agent by running this Linux® command:
      nohup ./datavirtualization_start.sh &

What to do next

  1. After you upgrade Data Virtualization to Cloud Pak for Data Version 5.0.3, you must manually edit data sources that use SSL connections. Otherwise, the data sources are invalid and your queries on them fail. Complete the following steps to edit the SSL-enabled data sources so that they are valid for use:
    1. On the Data Virtualization Data sources page, find the SSL-enabled data sources that show a status of Invalid.
    2. Complete the following steps to edit each invalid data source:
      1. Select Edit connection at the end of the data source row.
      2. Make a minor change to the name or description, but don’t change any other values.
      3. Save the change to trigger an update to the data source.
      The data sources that you edited are now valid and you can proceed to use them in your queries.
  2. After you upgrade, all active or inactive caches with refresh schedules are reset. You must edit the active caches and set the refresh rate again. For more information, see Adding data caches in Data Virtualization.

Data Virtualization is ready to use. For more information, see Virtualizing data.