Preparing to install the service (Data Virtualization)

Plan and prepare to deploy the Data Virtualization service.

About this task

Data Virtualization pods: The Data Virtualization service instance has two main types of pods: c-db2u-dv-db2u-0 and c-db2u-dv-db2u-x.

The c-db2u-dv-db2u-0 pod runs the Data Virtualization head component (also known as the engine).

The term worker pod in Data Virtualization refers to the worker service component that runs on each c-db2u-dv-db2u-x pod, where x starts at 1. You can allocate multiple worker pods, which are effectively multiple c-db2u-dv-db2u-x pods, to the Data Virtualization service instance.

Do not confuse Data Virtualization worker pods with compute nodes, which are the machines that make up the Red Hat® OpenShift® cluster. For more information about cluster compute nodes, see Architecture for Cloud Pak for Data.
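
For example, after you provision the service, you can list the head and worker pods in your deployment (an illustration; replace <project-name> with the project where Cloud Pak for Data is installed):

    oc get pods -n <project-name> | grep c-db2u-dv-db2u

The head pod name ends in -0, and each worker pod name ends in -1, -2, and so on.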

Procedure

Before you install the Data Virtualization service, you must meet the following requirements.

System requirements

Ensure that you complete the pre-installation tasks for Cloud Pak for Data installations on Red Hat OpenShift.

The Data Virtualization service runs on x86_64 hardware only.

Service resource requirements

Ensure that you meet the service requirements listed in System requirements.

Additionally, the Cloud Pak for Data cluster must accommodate the initial provisioning request for Data Virtualization service pods. The Data Virtualization service pods have the following default resource requirements.
Table 1. Default requirements for service resources

    Service pod             CPU    Memory
    c-db2u-dv-db2u-0        4*     16Gi*
    c-db2u-dv-db2u-x        4*     16Gi*
    c-db2u-dv-dvutils-0     1      2Gi
    c-db2u-dv-dvapi         1      1Gi
    c-db2u-dv-dvcaching     0.5    1Gi
    dv-addon                0.1    100Mi
    dv-service-provider     0.1    100Mi

* You can configure the settings that are marked with an asterisk (*). For more information about editing a service instance, see Provisioning the service.

The Data Virtualization service is provisioned to any compute node in the Cloud Pak for Data cluster that has the specified resources (cores and memory) available.
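
For example, to confirm that a compute node has enough allocatable CPU and memory before you provision the service, you can inspect the node (a sketch; the node name is a placeholder for one of your compute nodes):

    oc describe node <compute-node-name> | grep -A 6 "Allocatable"

You can compare the allocatable cpu and memory values in the output against the totals in Table 1.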

Important: If you try to provision a Data Virtualization service instance and the cluster does not have enough resources available, the provisioning fails.

Scaling Data Virtualization: You can scale the Data Virtualization service up or down at any time after you provision it. For more information, see Scaling Data Virtualization.

Work with IBM Sales to get a more accurate sizing based on your expected workload.

IBM Sales helps you estimate the total demand for Data Virtualization; the service then redistributes these resources internally among its components. When you provision Data Virtualization, you can size the settings that are marked with an asterisk (*) in Table 1, Default requirements for service resources.

Storage requirements

Data Virtualization supports NFS, Red Hat OpenShift Container Storage, IBM Cloud File Storage (on Red Hat OpenShift Kubernetes Service), and Portworx for persistent storage. For information about storage requirements for Cloud Pak for Data, see Storage considerations.

At a minimum, the persistent storage must meet the following requirements for Data Virtualization:
  • Persistent volume for the Data Virtualization engine node: 50Gi.
  • Persistent volume for Data Virtualization caching: 100Gi.
  • XFS-formatted storage.
To target the storage that you want to use for the service, you must specify a storage class to create a persistent volume. You can check which storage classes are available on your cluster, as shown in the example after the following list.
Supported storage types:
  • NFS
    Required storage class:
    • nfs-client
    • managed-nfs-storage
  • Portworx
    Required storage class:
    • portworx-db2-rwx-sc
  • OpenShift Container Storage
    Required storage class:
    • ocs-storagecluster-cephfs
  • IBM Cloud File Storage (on Red Hat OpenShift Kubernetes Service)
    Required storage class:
    • ibmc-file-gold-gid
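
To confirm that the required storage class for your storage type exists on the cluster, list the available storage classes:

    oc get storageclass

If the required class is not listed, work with your cluster administrator to create it before you provision the service.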

External libraries

External libraries (that is, libraries that are not included in the Data Virtualization service) are stored on a persistent volume. Data Virtualization automatically creates persistent volume claims during the provisioning process.

The persistent volume claim for external libraries must have at least 50 GB available.
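
For example, after provisioning, you can verify the size of the persistent volume claims that Data Virtualization created (a sketch; the project name and grep pattern are placeholders that you might need to adjust for your deployment):

    oc get pvc -n <project-name> | grep db2u-dv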

Cache storage

A data cache holds temporary data that is used frequently. By using a data cache, you can reduce the processing and loading time that is required when you use this data.

Ephemeral storage: Ensure that you have a minimum of 100 GB of ephemeral storage available for Data Virtualization.
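
For example, to check the ephemeral storage that is available on a compute node, you can inspect the node's capacity (the node name is a placeholder):

    oc describe node <compute-node-name> | grep ephemeral-storage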

Kernel parameter settings

To ensure that Data Virtualization can run correctly, you must verify the kernel parameters:
  1. Complete the steps in Kernel parameter settings to specify the following parameters:
    • Virtual memory limit (vm.max_map_count)
    • Message limits (kernel.msgmax, kernel.msgmnb, and kernel.msgmni)
    • Shared memory limits (kernel.shmmax, kernel.shmall, and kernel.shmmni)
  2. If the Linux® kernel version on the nodes is earlier than 4.6, you must set the kernel semaphore limits. For Data Virtualization, the kernel semaphore limits must meet the following minimum values on all cluster compute nodes:
    kernel.sem="250 256000 100 4096"
    To obtain the Linux kernel version, run the uname -r command on each compute node. Alternatively, run the following command to see the kernel version of an OpenShift compute node:
    oc describe node <compute-node-name> | grep -i kernel

    To set the kernel semaphore parameter on compute nodes in the cluster, see Changing node settings.
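
For example, to check the current semaphore and virtual memory map settings on a compute node, you can start a debug pod (a sketch; it assumes that you have cluster-admin access and that you substitute one of your compute node names):

    oc debug node/<compute-node-name> -- chroot /host sysctl kernel.sem vm.max_map_count

Compare the output against the minimum values in this section; if a value is too low, follow Changing node settings to update it.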