Preparing to install the service (Data Virtualization)

Use this information to plan and prepare to deploy the Data Virtualization service.

Before you install the Data Virtualization service, you must meet the following requirements:
Data Virtualization worker nodes: The Data Virtualization service instance has two main nodes: dv-engine and dv-worker. The term worker node in Data Virtualization refers to the worker service component that runs on each dv-worker pod. You can allocate multiple worker nodes, which are effectively multiple dv-worker pods, to the Data Virtualization service instance.

Do not confuse Data Virtualization worker nodes with compute nodes, which are the physical nodes that make up the Red Hat® OpenShift® cluster. For more information about cluster compute nodes, see Architecture for Cloud Pak for Data.
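As a quick illustration of the distinction, you can filter `oc get pods` output down to the engine and worker pods. The helper function and the sample pod names below are illustrative assumptions; a real run would pipe in `oc get pods --no-headers` from your Cloud Pak for Data project:

```shell
# dv_pods: given "oc get pods" output on stdin, keep only the
# Data Virtualization engine and worker pods. The dv-engine and
# dv-worker name prefixes follow the naming used in this document.
dv_pods() {
  grep -E '^dv-(engine|worker)'
}

# Example with sample pod names (hypothetical suffixes):
printf 'dv-engine-0\ndv-worker-0\ndv-worker-1\ndv-api-7f9\n' | dv_pods
```

The dv-worker pods that this filter returns are the Data Virtualization "worker nodes"; `oc get nodes` would list the cluster compute nodes, which are a different concept.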

System requirements

Ensure that you meet the system requirements for Cloud Pak for Data installations on Red Hat OpenShift.

The Data Virtualization service runs on x86_64 hardware only.
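A minimal sketch for checking the architecture on a node (the function name is illustrative, not part of the product):

```shell
# check_arch: succeed only when the reported machine architecture is
# x86_64, the only hardware that Data Virtualization supports.
check_arch() {
  case "$1" in
    x86_64) echo "supported: $1" ;;
    *)      echo "unsupported: $1" >&2; return 1 ;;
  esac
}

# Check the node that this shell runs on.
check_arch "$(uname -m)" || true
```

Alternatively, `oc get nodes -o wide` shows the architecture that each cluster node reports.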

Service resource requirements

Ensure that you meet the service requirements that are listed in System requirements for services. In these requirements, a Data Virtualization head node corresponds to a dv-engine pod that runs on your Red Hat OpenShift cluster.

Additionally, the Cloud Pak for Data cluster must accommodate the initial provisioning request for Data Virtualization service pods. The Data Virtualization service pods have the following default resource requirements.
Table 1. Default requirements for Data Virtualization service resources

  Service pod           CPU    Memory
  dv-engine             4*     16Gi*
  dv-worker             4*     16Gi*
  dv-utils              1      3Gi
  dv-metastore          1      256Mi
  dv-unified-console    2      4Gi
  dv-api                1      1Gi
  dv-caching            0.5    1Gi
  dv-addon              0.1    100Mi
  dv-service-provider   0.1    100Mi
Important: If you try to provision a Data Virtualization service instance and the cluster does not have enough resources, the provisioning fails.

* You can configure the settings that are marked with an asterisk (*). For more information about editing a service instance, see Administering the service.
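To gauge whether the cluster can accommodate the initial provisioning request, you can total the default values from Table 1. The following is a rough sketch: it assumes a single dv-worker pod, converts dv-metastore's 256Mi to 0.25Gi, and rounds the two 100Mi pods to 0.1Gi each:

```shell
# Total default CPU request (cores) across the service pods in Table 1.
total_cpu=$(printf '4\n4\n1\n1\n2\n1\n0.5\n0.1\n0.1\n' | awk '{s += $1} END {print s}')

# Total default memory request (Gi), with Mi values converted/rounded.
total_mem=$(printf '16\n16\n3\n0.25\n4\n1\n1\n0.1\n0.1\n' | awk '{s += $1} END {print s}')

echo "Default request: ${total_cpu} cores, ${total_mem}Gi memory"
```

Each additional dv-worker pod adds its own request (4 cores and 16Gi by default) to these totals.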

Storage requirements

Data Virtualization supports NFS, Red Hat OpenShift Container Storage, and Portworx for persistent storage. The storage class that you use for Data Virtualization must support the ReadWriteMany access mode. For more information, see Access modes in the Red Hat OpenShift documentation.

At a minimum, the persistent storage must meet the following requirements:
  • At least 100 GB of capacity
  • Formatted as XFS
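For example, you can confirm the file system type of a mounted volume before you use it. The following is a minimal sketch; the helper name is illustrative and the mount point that you check would be your own:

```shell
# check_fs_type: succeed only when the given file system type is xfs.
check_fs_type() {
  if [ "$1" = "xfs" ]; then
    echo "ok: xfs"
  else
    echo "not xfs: $1" >&2
    return 1
  fi
}

# Query the file system type of a mount point with "stat -f -c %T",
# which prints the type name, and check it. The root mount point here
# is only an example; substitute your storage mount point.
check_fs_type "$(stat -f -c %T / 2>/dev/null)" || true
```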
You can use one of the following methods to target the storage that you want to use for the service:
  • You can specify an existing persistent volume claim that you created specifically for your Data Virtualization service. The existing persistent volume claim must meet storage class requirements for Data Virtualization.
  • You can use a storage class to create a persistent volume.
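If you create the claim yourself, a minimal PersistentVolumeClaim manifest might look like the following sketch. The claim name, namespace, and storage class are assumptions; substitute the values for your cluster, and make sure that the class supports ReadWriteMany:

```shell
# Write a minimal PersistentVolumeClaim manifest for Data Virtualization.
# The name "dv-storage-pvc", the namespace "cpd-instance", and the storage
# class are examples only; replace them with your own values.
cat > dv-storage-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dv-storage-pvc
  namespace: cpd-instance
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: ocs-storagecluster-cephfs
EOF

# Create the claim (requires cluster access):
#   oc apply -f dv-storage-pvc.yaml
```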
To use Portworx for persistent storage, select the portworx-dv-shared-gp storage class.
Note: You must create the portworx-dv-shared-gp storage class when you install Cloud Pak for Data. For more information, see Creating Portworx storage classes.

To use Red Hat OpenShift Container Storage, select the ocs-storagecluster-cephfs storage class.

For more information about storage requirements for Cloud Pak for Data, see Storage considerations.

The Data Virtualization service is provisioned to any compute node in the Cloud Pak for Data cluster that has the specified resources (cores and memory) available.

Two persistent volumes and associated persistent volume claims are required for external libraries, cache entries, and queries. You can use any physical storage in your environment for the persistent volumes. The persistent volume claim must be scoped to the namespace or OpenShift project that you choose.
External libraries
External libraries (that is, libraries that are not included in the Data Virtualization service) are stored on a persistent volume. You can have the provisioning process create the persistent volume claim by specifying a storage class, or you can choose an existing persistent volume claim. You can reuse the persistent volume claim from a deleted Data Virtualization service instance when you create a new service instance.

The persistent volume claim for external libraries must have at least 10 GB available.

Cache storage
A data cache holds temporary data that is used frequently. By using a data cache, you can reduce the processing and loading time that is required when you use this data.

Cleaning up IPC resources

To remove existing interprocess communication (IPC) resources that are owned by the Big SQL user, you must run the following commands on each compute node:
  1. To check for any remaining Big SQL IPC resources, run the following command:
    ls /dev/shm/sem.bigsql_*
    If the command lists Big SQL IPC resources, run the following command to remove these resources:
    rm -f /dev/shm/sem.bigsql_*
  2. To remove any remaining IPC resources, run the following command:
    BIGSQL_UID="1000322824"
    # Collect the IDs of the semaphore sets, shared memory segments, and
    # message queues that are owned by the Big SQL user ID.
    IPCS_S=$(ipcs -s | grep -E "0x[0-9a-f]+" | grep "$BIGSQL_UID" | awk '{print $2}')
    IPCS_M=$(ipcs -m | grep -E "0x[0-9a-f]+" | grep "$BIGSQL_UID" | awk '{print $2}')
    IPCS_Q=$(ipcs -q | grep -E "0x[0-9a-f]+" | grep "$BIGSQL_UID" | awk '{print $2}')
    for id in $IPCS_M; do
      echo "Remove shared memory segment ${id}"
      ipcrm -m "$id"
    done
    for id in $IPCS_S; do
      echo "Remove semaphore set ${id}"
      ipcrm -s "$id"
    done
    for id in $IPCS_Q; do
      echo "Remove message queue ${id}"
      ipcrm -q "$id"
    done
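After the cleanup, you can verify that no Big SQL IPC resources remain. The following sketch simply counts entries across all three resource types:

```shell
# Count any remaining IPC resources that are owned by the Big SQL user.
# A result of 0 means that the cleanup is complete on this node.
BIGSQL_UID="1000322824"
leftover=$( (ipcs -s; ipcs -m; ipcs -q) 2>/dev/null | grep -c "$BIGSQL_UID" || true )
echo "Remaining Big SQL IPC resources: $leftover"
```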

Setting the kernel semaphore parameter

If the Linux® kernel version on the nodes is earlier than 4.6, you must set the kernel semaphore parameter. For Data Virtualization, the kernel semaphore parameter must meet the following minimum values on all cluster compute nodes:
kernel.sem="250 256000 100 4096"
To obtain the Linux kernel version of each compute node in the cluster, run the uname -r command on the node. Alternatively, run the following command for each OpenShift compute node:
oc describe node <compute-node-name> | grep -i kernel
To set the kernel semaphore parameter on compute nodes in the cluster, see Changing node settings.
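You can compare a node's current value against the required minimums with a small helper. This is an illustrative sketch, not a product tool; it reads the four kernel.sem fields (SEMMSL, SEMMNS, SEMOPM, SEMMNI) in order:

```shell
# check_sem: compare a kernel.sem value ("SEMMSL SEMMNS SEMOPM SEMMNI")
# with the minimums that Data Virtualization requires (250 256000 100 4096).
check_sem() {
  echo "$1" | awk '{
    if ($1 >= 250 && $2 >= 256000 && $3 >= 100 && $4 >= 4096)
      print "ok"
    else
      print "too low"
  }'
}

# Check the current value on this node; /proc/sys/kernel/sem holds the
# four fields in the same order as the kernel.sem setting.
[ -r /proc/sys/kernel/sem ] && check_sem "$(cat /proc/sys/kernel/sem)" || true
```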
Important: If you manually set the kernel semaphore parameter, you must select the "You must check this box if you updated the kernel semaphore parameter" checkbox on the service provisioning Start page. Otherwise, provisioning of the Data Virtualization service fails.

What to do next

  1. Optional: Set up node affinity.
  2. Optional: Configure network requirements.
  3. Install the service.