Provisioning the service (Data Virtualization)

Before you can use Data Virtualization, you must provision the service in your IBM Cloud Pak for Data cluster.

Before you begin

Before you provision the Data Virtualization service, you must:

About this task

The Data Virtualization service is provisioned to any compute node in the Cloud Pak for Data cluster that has the specified resources (cores and memory) available.

Important: To complete this task, you must have the Provision Databases permission. The default Cloud Pak for Data administrator role, Admin, has this permission.
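
Step 3 of the procedure depends on whether the Linux kernel version on the cluster nodes is lower than 4.6. The following is a minimal sketch of that comparison, assuming you can run a shell on a node; the function name kernel_older_than_4_6 is hypothetical and is not part of any Cloud Pak for Data tooling.

```shell
#!/bin/sh
# Sketch: decide whether a kernel release string is older than 4.6,
# the threshold named in step 3 of the procedure.

# Returns 0 (true) when the release passed as $1 is lower than 4.6.
kernel_older_than_4_6() {
  # Keep only the leading numeric part, e.g. "3.10.0-1160.el7" -> "3.10.0"
  version=${1%%[!0-9.]*}
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 6 ]; }
}

# Check the node this script runs on.
release=$(uname -r)
if kernel_older_than_4_6 "$release"; then
  echo "Kernel $release is older than 4.6: the kernel semaphore parameter must be updated."
else
  echo "Kernel $release is 4.6 or later."
fi
```

The check applies only to the node where it runs, so it would need to be repeated on each cluster node. As an alternative, the output of oc get nodes -o wide includes a kernel version column that can be read for all nodes at once.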

Procedure

To provision the Data Virtualization service:

  1. In the Cloud Pak for Data web user interface, click the Services icon.
  2. From the list of services, locate the Data Virtualization service under the Data sources category. Click the action menu and select Provision instance.
  3. If you manually set the kernel semaphore parameter, select the check box labeled "You must check this box if you updated the kernel semaphore parameter".
    You must update the kernel semaphore parameter if the Linux® Kernel version on the cluster nodes is less than 4.6. For details, see Preparing to install the service.

    If you manually update the kernel semaphore parameter and you do not select the corresponding check box, provisioning of the Data Virtualization service fails.

  4. To configure the service, specify the resources that you want to allocate to the Data Virtualization worker nodes in the Nodes section.
    1. Specify the number of Data Virtualization worker nodes to allocate to the service.
      Recommended: One worker node is sufficient for most workloads.

      To understand the difference between compute nodes and worker nodes, see Preparing to install the service.

    2. Specify the number of cores to allocate to each worker node.
      You are constrained by the total number of available cores on the OpenShift® compute nodes.
    3. Specify the amount of memory to allocate to each worker node.
      You are constrained by the total amount of memory on the OpenShift compute nodes.
  5. In the Storage section, specify the resources that you want to use for persistent storage and cache storage.

    In the Persistent storage section, you can configure persistent storage for external libraries.

    In the Cache storage section, you can configure storage for your data caches.
    Note: Part of the total cache storage space is used to refresh active caches that have a periodic refresh schedule, which reduces the storage space that is available for creating new cache entries.
    • If you want to create a new persistent volume claim (PVC), select Create new claim. Then, select the storage class to use and specify the amount of storage to allocate to the persistent volume.

      If you use Portworx for your persistent storage, select portworx-dv-shared-gp in the Storage class option. For more information about storage considerations for Cloud Pak for Data, see Storage considerations.

    • If you want to use an existing PVC, you must first create the claim against a storage class that is supported in Data Virtualization.

      After you create the PVC, refresh your browser, select Use existing claim, and select the PVC from the drop-down list.

      You can use an empty PVC or a PVC that you used in an older Data Virtualization instance. If you want to use a PVC from an older service instance, ensure that the instance used the same service version and was on the same Cloud Pak for Data cluster.

  6. Click Next.
  7. Ensure that the summary is correct and click Provision.
  8. Optional: Provisioning can take approximately 10 minutes. To use Cloud Pak for Data while you wait for the provisioning process to complete, click Home.
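
Step 5 allows the use of an existing PVC, which must be created before you select Use existing claim. The following is a rough sketch of how such a claim could be rendered as a Kubernetes manifest. The function name render_pvc, the claim name dv-cache-pvc, the project my-project, and the 100Gi size are all hypothetical placeholders, and the ReadWriteMany access mode is an assumption that should be confirmed against the requirements of your storage class.

```shell
#!/bin/sh
# Sketch: print a PersistentVolumeClaim manifest for use with "Use existing claim".
# All names and sizes below are hypothetical placeholders.

# Usage: render_pvc NAME NAMESPACE STORAGE_CLASS SIZE
render_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: $2
spec:
  accessModes:
    - ReadWriteMany   # assumption: confirm the access mode for your storage class
  storageClassName: $3
  resources:
    requests:
      storage: $4
EOF
}

# Print a manifest that uses the Portworx storage class named in step 5.
render_pvc dv-cache-pvc my-project portworx-dv-shared-gp 100Gi
```

The script only prints the manifest. To create the claim, the output could be piped to oc apply -f - while you are logged in to the cluster.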

What to do next

  1. Determine whether there are any patches available for your installation:
    • To check for patches on a cluster that can connect to the internet, run the following command. Replace Operating_System and Project with the values for your environment:
      ./cpd-Operating_System --repo ./repo.yaml status \
      --namespace Project \
      --assembly dv \
      --patches \
      --available-updates
    • To check for patches on an air-gapped cluster:

      See the list of available patches for Data Virtualization.

    If you need to apply patches to the service, follow the guidance in Applying patches.

  2. You can start using the Data Virtualization service. For more information, see Virtualizing data.
  3. When you provision the Data Virtualization service, you are automatically assigned the Data Virtualization Admin role. After you provision the service, you must give other users access to it. For more information, see Managing users in Data Virtualization.
  4. To connect to the Data Virtualization service, use the JDBC URL that is provided on the Connection details page for the service. Additionally, if you have a load balancer, you must open the port in your load balancer and your firewall. For more information, see Network requirements for Data Virtualization.
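
The patch-check command in step 1 contains two placeholders, Operating_System and Project. The following sketch shows one way to fill them in and print the resulting command for review before running it. The function name patch_check_command and the project name my-project are hypothetical, and deriving the binary suffix from uname -s is an assumption; check the actual name of the cpd binary in your installation files.

```shell
#!/bin/sh
# Sketch: fill in the placeholders of the patch-check command and print it.
# The command is printed only; it is not executed.

# Usage: patch_check_command OS_SUFFIX PROJECT
patch_check_command() {
  printf '%s\n' "./cpd-$1 --repo ./repo.yaml status --namespace $2 --assembly dv --patches --available-updates"
}

# Assumption: the binary suffix is the lowercase kernel name (for example, linux).
os_suffix=$(uname -s | tr '[:upper:]' '[:lower:]')

patch_check_command "$os_suffix" my-project
```

Printing the command first makes it easy to confirm the binary name and project before running it from the directory that contains the cpd binary and repo.yaml.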