Provisioning the service (Data Virtualization)

Before you use Data Virtualization, you must provision an instance of the service on your IBM Cloud Pak for Data cluster.

Before you begin

Before you provision the Data Virtualization service, you must:

Ensure that you meet system, service, and semaphore requirements. See Preparing to install the service for more information.
Install and deploy the service.
Create the storage classes that you want to use for Data Virtualization. For details, see Storage considerations.
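To find out whether a node needs the kernel semaphore update, you can compare its kernel release with 4.6, the threshold that this topic gives in the Procedure section. The following Bash script is a minimal sketch of that check. The function name kernel_at_least_4_6 is hypothetical. The script prints the current kernel.sem values (SEMMSL, SEMMNS, SEMOPM, SEMMNI) but does not evaluate them, because the required values are listed in Preparing to install the service. On OpenShift you might run it on each compute node. For example, `oc debug node/<node-name> -- chroot /host uname -r` prints the kernel release of one node.

```shell
#!/usr/bin/env bash
# Sketch: report whether a kernel release is 4.6 or later. Nodes with an
# older kernel need the kernel semaphore parameter updated manually before
# Data Virtualization is provisioned.

# Hypothetical helper: succeeds (status 0) if the release string, such as
# "4.18.0-305.el8.x86_64", is 4.6 or later.
kernel_at_least_4_6() {
  local release="$1" major minor
  major="${release%%.*}"        # "4" from "4.18.0-305..."
  minor="${release#*.}"         # "18.0-305..."
  minor="${minor%%[!0-9]*}"     # "18"
  (( major > 4 || (major == 4 && minor >= 6) ))
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  release="$(uname -r)"
  if kernel_at_least_4_6 "$release"; then
    echo "Kernel $release is 4.6 or later."
  else
    echo "Kernel $release is earlier than 4.6: update the kernel semaphore parameter."
  fi
  # The four fields are SEMMSL SEMMNS SEMOPM SEMMNI; compare them with the
  # values in Preparing to install the service.
  echo "kernel.sem: $(cat /proc/sys/kernel/sem 2>/dev/null || echo unavailable)"
fi
```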

About this task

The Data Virtualization service is provisioned to any compute node in the Cloud Pak for Data cluster that has the specified resources (cores and memory) available.
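Because the service is placed only on a compute node that has the requested cores and memory, you might want to compare your planned per-node allocation with each node's capacity before you provision. The Bash script below is a minimal sketch under several assumptions. The helper names cpu_to_millicores, mem_to_mib, and node_fits are hypothetical. The script treats the GB value from the Nodes step as GiB. It also reads each node's allocatable capacity, and allocatable capacity does not subtract resources that other pods already request. A node that fits here can therefore still be too busy in practice.

```shell
#!/usr/bin/env bash
# Sketch: check whether a node's allocatable CPU and memory can hold one
# Data Virtualization worker of a given size. Allocatable capacity ignores
# what other pods already request, so treat "fits" as an upper bound.

# Hypothetical helper: Kubernetes CPU quantity ("8" or "7500m") -> millicores.
cpu_to_millicores() {
  local q="$1"
  if [[ "$q" == *m ]]; then
    echo "${q%m}"
  else
    echo $(( q * 1000 ))
  fi
}

# Hypothetical helper: memory quantity (Ki, Mi, Gi suffix or plain bytes)
# -> MiB, rounded down.
mem_to_mib() {
  local q="$1"
  case "$q" in
    *Gi) echo $(( ${q%Gi} * 1024 )) ;;
    *Mi) echo "${q%Mi}" ;;
    *Ki) echo $(( ${q%Ki} / 1024 )) ;;
    *)   echo $(( q / 1024 / 1024 )) ;;
  esac
}

# Hypothetical helper: succeeds if a node with allocatable CPU $3 and memory $4
# can hold $1 cores and $2 GB (treated as GiB).
node_fits() {
  local cores="$1" memory_gb="$2" alloc_cpu="$3" alloc_mem="$4"
  local cpu_m mem_mib
  cpu_m="$(cpu_to_millicores "$alloc_cpu")"
  mem_mib="$(mem_to_mib "$alloc_mem")"
  (( cpu_m >= cores * 1000 && mem_mib >= memory_gb * 1024 ))
}

# Example usage with 4 cores and 16 GB per node (example values only):
#   oc get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory |
#   while read -r name cpu mem; do
#     if node_fits 4 16 "$cpu" "$mem"; then echo "$name: fits"; else echo "$name: too small"; fi
#   done
```

The conversion helpers handle only the quantity formats named in their comments. Decimal suffixes such as G or M, and fractional CPU values such as 7.5, would need extra cases.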

Important: To complete this task, you must have the Provision databases permission. The default Cloud Pak for Data administrator role, Admin, has this permission.

Procedure

To provision the Data Virtualization service:

From the main menu, click Services > Instances.
From the list of instances, locate the Data Virtualization service, click the action menu, and select Provision instance.
If you manually updated the kernel semaphore parameter, select the "You must check this box if you updated the kernel semaphore parameter" check box.
If the Linux® kernel version on the cluster nodes is earlier than 4.6, you must update the kernel semaphore parameter. For details, see Preparing to install the service.
If you manually updated the kernel semaphore parameter but do not select this check box, provisioning of the Data Virtualization service fails.
In the Nodes step, specify the resources that you want to allocate to the Data Virtualization worker nodes:
1. Specify the number of Data Virtualization worker nodes to allocate to the service.
  
  Recommended: One worker node is sufficient for many workloads.
  
  To understand the difference between compute nodes and worker nodes, see Preparing to install the service.
2. Specify the number of cores to allocate per node.
  You are constrained by the total number of available cores on the OpenShift® compute nodes.
3. Specify the amount of memory in GB to allocate per node.
  You are constrained by the total amount of memory on the OpenShift compute nodes.
You can scale the Data Virtualization service up and down at any time after you provision it. For details, see Scaling services.
In the Storage step, specify the storage classes and persistent volume sizes that you want to use for the service nodes and caching storage.

If you use Portworx for your storage class, select portworx-dv-shared-gp3 for the Storage class option. For more information, see Storage considerations.
1. In the Head storage section, select the storage class and specify the amount of storage to allocate to the head node.
  
  A Data Virtualization head node corresponds to a dv-engine pod that runs on your Red Hat® OpenShift cluster.
2. In the Worker storage section, select the storage class and specify the amount of storage to allocate to your worker nodes.
  
  The term worker node in Data Virtualization refers to the worker service component that runs on each dv-worker pod. You can allocate multiple worker nodes, which are effectively multiple dv-worker pods, to the Data Virtualization service instance.
3. In the Caching storage section, select the storage class and specify the amount of storage to allocate to your data caches.
  
  Note: Part of the total cache storage space is used to refresh active caches that have a periodic refresh schedule. These refreshes reduce the storage space that is available for creating new cache entries.
Click Next.
Ensure that the summary is correct and click Configure.
Wait for the service to be provisioned.
Optional: If you want to use Cloud Pak for Data while you wait for the Data Virtualization provisioning process to complete, click Home.
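While you wait for provisioning to finish, you can check from the command line that the head and worker pods have started. The Bash function below is a minimal sketch. It assumes that head pod names begin with dv-engine and worker pod names begin with dv-worker, following the terms in the Storage step. It also assumes that you pipe in the output of oc get pods --no-headers for the project where Data Virtualization is installed. The function name summarize_dv_pods is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize Data Virtualization pods from `oc get pods --no-headers`
# output (columns: NAME READY STATUS RESTARTS AGE). Assumes head pods are
# named dv-engine* and worker pods are named dv-worker*.

# Hypothetical helper: reads the pod listing on stdin and prints
# "head=<n> worker=<n> pending=<n>". "pending" counts matching pods that are
# not Running or not fully ready (a READY value such as "0/1").
summarize_dv_pods() {
  awk '
    function not_ready(ready, status,   parts) {
      split(ready, parts, "/")
      return status != "Running" || parts[1] != parts[2]
    }
    $1 ~ /^dv-engine/ { head++;   if (not_ready($2, $3)) pending++ }
    $1 ~ /^dv-worker/ { worker++; if (not_ready($2, $3)) pending++ }
    END { printf "head=%d worker=%d pending=%d\n", head, worker, pending }
  '
}

# Example usage (replace Project with your project name):
#   oc get pods -n Project --no-headers | summarize_dv_pods
# When provisioning is complete, worker should match the number of worker
# nodes that you chose in the Nodes step, and pending should be 0.
```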

What to do next

Checking for available patches

Determine whether there are any patches available for the version of Data Virtualization that you installed:

Clusters connected to the internet

Run the following command to check for patches. Replace Project with the project where Data Virtualization is installed:

./cpd-cli status \
--repo ./repo.yaml \
--namespace Project \
--assembly dv \
--patches \
--available-updates

Air-gapped clusters

See the list of Available patches for Data Virtualization.

If you need to apply patches to the service, follow the guidance in Applying patches.

Additionally, you must follow these steps:

When you provision the Data Virtualization service, you are automatically assigned the Data Virtualization Admin role. After you provision the service, you must give other users access to it. For more information, see Managing users in Data Virtualization.
To connect to the Data Virtualization service, use the JDBC URL that is provided on the Connection details page for the service. If you use a load balancer, you must also open the port from the JDBC URL in your load balancer and in your firewall. For more information, see Network requirements for Data Virtualization.
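To find the port to open in your load balancer and firewall, you can read it from the JDBC URL on the Connection details page. The Bash script below is a minimal sketch. It assumes a Db2-style URL of the form jdbc:db2://host:port/database with an explicit port. The function names jdbc_host_port and port_reachable are hypothetical, as are the example hosts and ports. The reachability check relies on Bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a JDBC URL shaped like jdbc:db2://host:port/database
# (optionally followed by ":property=value;" settings) with an explicit port.

# Hypothetical helper: prints "host port" parsed from the JDBC URL.
jdbc_host_port() {
  local url="$1" rest hostport
  rest="${url#*://}"          # drop the "jdbc:db2://" prefix
  hostport="${rest%%/*}"      # keep "host:port", dropping "/database..."
  echo "${hostport%:*} ${hostport##*:}"
}

# Hypothetical helper: succeeds if a TCP connection to host $1, port $2
# opens within 5 seconds (uses Bash's /dev/tcp and coreutils timeout).
port_reachable() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example usage (paste the URL from the Connection details page):
#   read -r host port < <(jdbc_host_port "$JDBC_URL")
#   if port_reachable "$host" "$port"; then echo "port $port is open"; else echo "port $port is blocked"; fi
```

If you run this check from outside the cluster, a failure suggests that the port is still closed somewhere along the path, in either the load balancer or the firewall.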

You can start using the Data Virtualization service. For more information, see Virtualizing data.