Upgrading Data Virtualization from Version 3.5 to 4.0

A project administrator can upgrade Data Virtualization after upgrading IBM® Cloud Pak for Data from Version 3.5 to Version 4.0.x.

Important: You must export users from Data Virtualization on Cloud Pak for Data 3.5.3 before you upgrade to Cloud Pak for Data 4.0.1 or later. If you do not want to export and import existing users, you must create an empty /mnt/PV/versioned/dv_data/dv_instance_users.txt file. You must also copy any JDBC driver JAR files that you downloaded when you configured data source connections in Cloud Pak for Data 3.5.3. For more information, see Exporting users and custom JARs before you upgrade Data Virtualization.
Permissions you need for this task
You must be an administrator of the OpenShift® project (Kubernetes namespace) where Data Virtualization is installed.
Information you need to confirm before you start this task
Before you upgrade Data Virtualization, confirm the following information:
  • The name of the project where Data Virtualization is installed.

    In Version 3.5, Data Virtualization is installed in the same project as Cloud Pak for Data.

  • The storage class or classes that you are using for your existing Data Virtualization installation. The storage must be the same as or equivalent to the storage classes listed in Information you need to complete this task.
Remember: In most cases, the name of the storage class matches that of the currently used volume; run oc -n project-name get pvc dv-engine-data-dv-engine-0 to identify it. If you are using Portworx storage, the storage class that is supported by Data Virtualization changes in Cloud Pak for Data 4.0.x to portworx-db2-rwx-sc. If you are using Red Hat® OpenShift Container Storage, starting with Data Virtualization 1.7.6, the recommended storage class is ocs-storagecluster-ceph-rbd.
Information you need to complete this task
  • Data Virtualization requires a custom security context constraint (SCC). For details, see Creating required SCCs.
  • Data Virtualization requires the Cloud Pak for Data common core services. If the common core services are not installed in the project or are not at the correct version, they are installed automatically when you upgrade Data Virtualization, which increases the time that the upgrade takes to complete.
  • Data Virtualization uses the following storage classes. If you don't use these storage classes on your cluster, ensure that you have a storage class with an equivalent definition (see the check after this list):
    • OpenShift Container Storage: ocs-storagecluster-ceph-rbd
    • IBM Spectrum® Scale: ibm-spectrum-scale-sc
    • NFS: managed-nfs-storage
    • Portworx: portworx-db2-rwx-sc
    • IBM Cloud File Storage: ibmc-file-gold-gid or ibm-file-custom-gold-gid
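
    To compare the storage classes that are available on your cluster against this list, you can run the following commands. This is an optional check, not a required step; the ocs-storagecluster-ceph-rbd class in the second command is only an example:

    oc get storageclass
    oc describe storageclass ocs-storagecluster-ceph-rbd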
Pre-upgrade tasks
  1. You must export users from Data Virtualization on Cloud Pak for Data 3.5.3 before you upgrade to Cloud Pak for Data 4.0.2. If you do not want to export and import existing users, you must create an empty /mnt/PV/versioned/dv_data/dv_instance_users.txt file.
  2. Copy any JDBC driver JAR files that you downloaded when you configured data source connections in Cloud Pak for Data 3.5.3 to /mnt/PV/versioned/private. For more information, see Exporting users and custom JARs before you upgrade Data Virtualization. An example of steps 1 and 2 follows this list.
  3. If you installed Db2® Data Management Console Version 3.5, you must upgrade it to the same version as Data Virtualization. For more information, see Upgrading Db2 Data Management Console from the Version 3.5 release.
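
  The following sketch shows one way to complete steps 1 and 2 from an OpenShift client session. The pod name and JAR file name are placeholders, not values from this documentation; substitute the name of your Version 3.5 Data Virtualization head pod and your own driver files:

    # Placeholder: identify your Version 3.5 Data Virtualization head pod first.
    HEAD_POD=<dv-head-pod-name>
    # Step 1 alternative: skip exporting users by creating an empty users file.
    oc exec $HEAD_POD -- touch /mnt/PV/versioned/dv_data/dv_instance_users.txt
    # Step 2: copy a previously downloaded JDBC driver JAR to the private directory.
    oc cp ./custom-driver.jar $HEAD_POD:/mnt/PV/versioned/private/custom-driver.jar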

Before you begin

Ensure that the cluster meets the minimum requirements for Data Virtualization. For details, see System requirements.

Additionally, ensure that a cluster administrator completed the required Upgrade preparation tasks for your environment. Specifically, verify that a cluster administrator completed the following tasks:

  1. Cloud Pak for Data was upgraded. For details, see Upgrading Cloud Pak for Data.
  2. For environments that use a private container registry, such as air-gapped environments, the Data Virtualization software images are mirrored to the private container registry. For details, see Mirroring images to your container registry.
  3. The cluster is configured to pull the Data Virtualization software images. For details, see Configuring your cluster to pull images.
  4. The Data Virtualization catalog source exists. For details, see Creating catalog sources.
  5. The Data Virtualization operator subscription exists. For details, see Creating operator subscriptions.
  6. The security context constraints (SCCs) required to run Data Virtualization exist. For details, see Creating required SCCs.
  7. The node settings are adjusted for Data Virtualization. For details, see Changing required node settings.

If these tasks are not complete, the Data Virtualization upgrade will fail.

About this task

Upgrading Data Virtualization involves a fresh installation of Data Virtualization and then a custom migration of Data Virtualization data sources.

Prerequisite services

Before you upgrade Data Virtualization, ensure that the following services are upgraded and running:

  • Db2U: If you have not already upgraded the ibm-db2uoperator-catalog, create the Db2U catalog source. For more information, see Configuring your cluster to pull software images before upgrading from Version 3.5. Then, create the Db2U operator subscription. For more information, see Creating operator subscriptions before upgrading from Version 3.5.
  • Db2 Data Management Console: If you do not manually install Db2 Data Management Console, Data Virtualization installs it for you. If you have already installed Db2 Data Management Console, make sure that a Db2 Data Management Console instance has been upgraded and provisioned. For more information, see Upgrading Db2 Data Management Console.
  • Cloud Pak for Data common core services: Data Virtualization installs the common core services on your Cloud Pak for Data cluster if they are not already installed, and upgrades them if they are.

Procedure

Complete the following tasks to upgrade Data Virtualization:

  1. Prepare for the upgrade.
  2. Upgrade the service.
  3. Upgrade the Data Virtualization instance.
  4. Verify the upgrade.
  5. Upgrade remote connectors.
  6. Find out what to do next.

Preparing for the upgrade

Prepare for the upgrade by backing up the Data Virtualization service and then deleting it.

  1. Log in to your OpenShift cluster as a project administrator:
    oc login OpenShift_URL:port
  2. Change to the project where the Cloud Pak for Data control plane is installed:
    oc project project-name
  3. Back up the addon and the service provider:
    oc get deployment dv-addon -o yaml > dv-addon-bak.yaml
    oc get deployment dv-service-provider -o yaml > dv-service-provider-bak.yaml
    oc get service dv-addon -o yaml > dv-addon-svc.yaml
    oc get service dv-service-provider -o yaml > dv-service-provider-svc.yaml
  4. Delete the addon and service provider:
    oc delete deployment dv-addon
    oc delete deployment dv-service-provider
    oc delete service dv-addon
    oc delete service dv-service-provider
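
    Optionally, confirm that the resources were deleted; each command should now return a NotFound error for the deleted resources:

    oc get deployment dv-addon dv-service-provider
    oc get service dv-addon dv-service-provider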

Upgrading the Data Virtualization service

  1. Follow the procedure for installing Data Virtualization in the upgraded IBM Cloud Pak for Data control plane.
    Important: Do not provision the Data Virtualization instance from the IBM Cloud Pak for Data user interface. Stop after the Data Virtualization service pods, dv-addon and dv-service-provider, are up and running.
  2. Get the status of Data Virtualization (dv-service):
    oc get DvService dv-service -o jsonpath='{.status.conditions[?(@.type == "Successful")].status} {"\n"}'

    The Data Virtualization service is upgraded successfully when the command returns True.
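
    If you script the upgrade, you can poll this status until it becomes True. The following loop is a minimal sketch that reuses the command above; the 60-second interval is arbitrary:

    while [ "$(oc get DvService dv-service -o jsonpath='{.status.conditions[?(@.type == "Successful")].status}')" != "True" ]; do
      echo "Waiting for the Data Virtualization service upgrade to complete..."
      sleep 60
    done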

Upgrading the Data Virtualization instance

To upgrade a Data Virtualization instance, complete the following steps:

  1. Download and extract the Data Virtualization migration .tar file.
    1. In your browser, log in to the Cloud Pak for Data web client.

      Because you are logged in to the web client, the download step does not prompt you to log in again.

    2. Download the .tar file by entering the following URL in your browser:
      https://<Cloud Pak for Data web client URL>/icp4data-addon/dv/add-ons/upgrade.tgz
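
      Alternatively, you can download the file from the command line. The following sketch assumes that you have a valid Cloud Pak for Data bearer token in the CPD_TOKEN environment variable; your authentication setup might differ:

      # Assumes CPD_TOKEN holds a valid Cloud Pak for Data bearer token.
      curl -k -o upgrade.tgz \
        -H "Authorization: Bearer $CPD_TOKEN" \
        "https://<Cloud Pak for Data web client URL>/icp4data-addon/dv/add-ons/upgrade.tgz"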
    3. Copy the .tar file to your OpenShift infrastructure node.
    4. Extract the .tar file:
      tar -xzf upgrade.tgz
  2. Update the values.template file in the templates subdirectory of the extracted .tar file by changing the following settings:
    headStorageSize
    This value must be a PersistentVolume capacity specification that matches the current setting. You can check the current setting by running the following command:
    oc -n project-name get pvc dv-engine-data-dv-engine-0 

    Where project-name is the OpenShift project where the Data Virtualization pods were created.

    Make sure to include the capacity unit (for example, Gi) after the numeric value.

    storageClassName
    This value must be the name of the storage class that the persistent volumes of the upgraded instance use. In most cases, the name of the storage class matches that of the currently used volume; run oc -n project-name get pvc dv-engine-data-dv-engine-0 to identify it. If you are using Portworx storage, the storage class that is supported by Data Virtualization changes in Cloud Pak for Data 4.0.x to portworx-db2-rwx-sc. If you are using Red Hat OpenShift Container Storage, starting with Data Virtualization 1.7.6, the recommended storage class is ocs-storagecluster-ceph-rbd.
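
    To read both values from the existing volume in a single command, you can use a custom-columns query like the following sketch; the column names are illustrative:

    oc -n project-name get pvc dv-engine-data-dv-engine-0 \
      -o custom-columns=CAPACITY:.status.capacity.storage,STORAGECLASS:.spec.storageClassName

    For example, if the command reports 50Gi and ocs-storagecluster-ceph-rbd, set headStorageSize to 50Gi and storageClassName to ocs-storagecluster-ceph-rbd.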
  3. Ensure that you have completed the pre-upgrade steps:
    1. You must export users from Data Virtualization on Cloud Pak for Data 3.5.3 before you upgrade to Cloud Pak for Data 4.0.2. If you do not want to export and import existing users, you must create an empty /mnt/PV/versioned/dv_data/dv_instance_users.txt file.
    2. Copy any JDBC driver JAR files that you downloaded when you configured data source connections in Cloud Pak for Data 3.5.3 to /mnt/PV/versioned/private. For more information, see Exporting users and custom JARs before you upgrade Data Virtualization.
    3. If you installed Db2 Data Management Console Version 3.5, you must upgrade it to the same version as Data Virtualization. For more information, see Upgrading Db2 Data Management Console from the Version 3.5 release.
  4. Run the upgrade script. The script makes a backup of Data Virtualization that is used during the Db2U provisioning stage. For more information, see Provisioning the Data Virtualization service.
    ./dv-migration.sh
  5. Provision Data Virtualization from the user interface. Data Virtualization automatically runs the upgrade as soon as you provision a Data Virtualization instance.

Verifying the upgrade

When you create the custom resource, the Data Virtualization operator processes its contents and updates the microservices that make up Data Virtualization, including DvService. (The DvService microservice is defined by the dv-service custom resource.) Data Virtualization is upgraded when the DvService status is True.

To check the status of the upgrade:

  1. Change to the project where the Cloud Pak for Data control plane is installed:
    oc project project-name
  2. Log in to the Data Virtualization head pod:
    oc rsh c-db2u-dv-db2u-0 bash
  3. Verify that the /mnt/bludata0/dv/versioned/marker_files/.upgraded file exists and that the output of db2uctl markers list contains the following content:
      (Db2u) QP_START_PERFORMED
      (Db2u) DV_CACHE_INITIALIZED
  4. Verify that the following files do not exist:
    • /mnt/bludata0/dv/versioned/marker_files/.fgac_state
    • /mnt/bludata0/dv/versioned/marker_files/.is_dv_upgrade
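
    You can combine these checks in a short sketch, run from the head pod shell that you opened in step 2:

    # This file must exist after a successful upgrade.
    ls /mnt/bludata0/dv/versioned/marker_files/.upgraded
    # List the markers; confirm that both appear in the output.
    db2uctl markers list | grep -E 'QP_START_PERFORMED|DV_CACHE_INITIALIZED'
    # These files must NOT exist; expect "No such file or directory".
    ls /mnt/bludata0/dv/versioned/marker_files/.fgac_state
    ls /mnt/bludata0/dv/versioned/marker_files/.is_dv_upgrade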

Upgrading remote connectors

You can upgrade remote connectors by using the UPDATEREMOTECONNECTOR stored procedure. You can run this procedure by using the SQL editor or the Db2 console on the cluster.

  • To update all remote connectors, run the following stored procedure.
    call dvsys.updateremoteconnector('',?,?)
  • If you need to upgrade only a subset of remote connectors, pass in a comma-separated list of node names.
    call dvsys.updateremoteconnector('<REMOTE_CONNECTOR_NODES>',?,?)

    You can obtain the <REMOTE_CONNECTOR_NODES> values by running the following SQL statement.

    select node_name from dvsys.listnodes where AGENT_CLASS='R'
  • If you notice that remote connectors do not appear in the user interface after the upgrade, run the following stored procedure on the head pod.
    CALL DVSYS.DEFINEGATEWAYS('<hostname>:<port>')

    Where <hostname> is the hostname of the remote connector and <port> is the port number that the remote connector uses to connect to Data Virtualization. After you run this stored procedure, the remote connector appears in the user interface and in the output of dvsys.listnodes.

    See also Defining gateway configuration to access isolated remote connectors.

  • To troubleshoot issues, see Updating remote connectors might fail with a Java™ exception after you upgrade Data Virtualization.

Rolling back the upgrade

Rolling back the upgrade is not supported when upgrading from the Version 3.5 release to a Version 4.0.x release.

What to do next

  1. Set the OwnerReference for any manually created PVCs. This step ensures that the PVC is deleted when the instance is deleted. It is not required for the upgrade to work and can be done at any time before you delete the instance.
    oc patch pvc/bigsql-c-db2u-dv-db2u-0 --type=merge \
      -p '{"metadata":{"ownerReferences":[{"apiVersion":"db2u.databases.ibm.com/v1","kind":"Formation","name":"db2u-dv","uid":"'$(oc get db2u db2u-dv -o jsonpath='{.metadata.uid}')'"}]}}'
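
    To confirm that the ownerReference was applied, you can run the following check; it should print Formation db2u-dv:

    oc get pvc bigsql-c-db2u-dv-db2u-0 \
      -o jsonpath='{.metadata.ownerReferences[0].kind} {.metadata.ownerReferences[0].name}{"\n"}'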
  2. Update the host and port of all connections to Data Virtualization to the new connection values, which you can retrieve from the Connection Information page. You must apply these values to all external connections, connections in notebooks, and connections in Cloud Pak for Data that weren't created with the Data Virtualization connection type.
  3. After Data Virtualization is upgraded, a new certificate is used for SSL connections to Data Virtualization. You can download the new SSL certificate from the Data Virtualization instance page in the IBM Cloud Pak for Data web interface and use it to update the configuration of applications that connect to Data Virtualization.
  4. Configure network requirements, including HAPROXY settings, if required. For more information, see Network requirements for Data Virtualization.
  5. Optionally, set up automatic pruning of the archive log.

Data Virtualization is ready to use. For more information, see Virtualizing data with Data Virtualization.