Setting up the cluster for Data Virtualization

If you plan to install Data Virtualization on IBM® Cloud Pak for Data, a cluster administrator must set up the cluster for Data Virtualization.

Before you begin

Required role: To complete this task, you must be a cluster administrator.

Before you set up the cluster for Data Virtualization, ensure that:

Tip: For a list of all available options, enter the following command:
./cpd-cli adm --help
Requirement: Configure the default thread count on the OpenShift cluster so that containers can use up to 2000 PIDs on every compute node. For more information, see Software requirements.
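
As a rough way to check this requirement, the following sketch reads CRI-O style configuration text from standard input and reports whether its pids_limit value is at least 2000. It assumes that the compute nodes use CRI-O and that the limit is set through a pids_limit key, which might not match your OpenShift version. The function name check_pids_limit and the file path in the usage comment are hypothetical. Treat Software requirements as the authoritative procedure.

```shell
#!/bin/sh
# Hypothetical helper: reads crio.conf-style text on stdin and checks pids_limit.
# Assumption: the limit appears as a line such as "pids_limit = 2048".
check_pids_limit() {
  required=2000
  # Keep the last uncommented pids_limit assignment, if there is one.
  limit=$(sed -n 's/^[[:space:]]*pids_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | tail -n 1)
  if [ -z "$limit" ]; then
    echo "pids_limit not set"
    return 2
  fi
  if [ "$limit" -ge "$required" ]; then
    echo "ok: pids_limit=$limit"
  else
    echo "too low: pids_limit=$limit (need $required)"
    return 1
  fi
}

# Example usage on a compute node (the path is an assumption):
#   check_pids_limit < /etc/crio/crio.conf
```

The function only inspects the text that it is given. It does not change any node configuration.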

Procedure

  1. Set up the cluster for Data Virtualization by completing the appropriate task for your environment:
    • Preparing clusters connected to the internet
    • Preparing air-gapped clusters
  2. Complete the tasks listed in What to do next.

Preparing clusters connected to the internet

From your installation node:

  1. Change to the directory where you placed the Cloud Pak for Data command-line interface and the repo.yaml file.
  2. Log in to your Red Hat OpenShift cluster as an administrator:
    oc login OpenShift_URL:port
  3. Run the following command to see a preview of the list of resources that must be created on the cluster:
    ./cpd-cli adm \
    --repo ./repo.yaml \
    --assembly dv \
    --arch Cluster_architecture \
    --namespace Project \
    --latest-dependency
    Important: By default, this command gets the latest version of the assembly. If you want to install a specific version of Data Virtualization, add the following line to your command after the --assembly flag:
    --version Assembly_version \

    Additionally, you can remove the --latest-dependency flag to get the minimum required version of any software that Data Virtualization depends on.

    Tell the person who will install Data Virtualization whether you used either of these flags. The install command must be run with the same flags.

    Replace the following values:

    Variable              Replace with
    Assembly_version      The version of Data Virtualization that you want to install. The assembly versions are listed in System requirements for services.
    Cluster_architecture  Specify the architecture of your cluster hardware:
                          • For x86-64 hardware, remove this flag or specify x86_64
    Project               The project where the Cloud Pak for Data control plane is installed.

    The command returns a list of the changes that you must make to your cluster to ensure that Data Virtualization can run on your cluster.

  4. Make the necessary changes to your cluster. You can choose one of the following methods to make the changes:
    To automatically apply the changes to your cluster:
    Re-run the cpd-cli adm command with the --apply flag:
    ./cpd-cli adm \
    --repo ./repo.yaml \
    --assembly dv \
    --arch Cluster_architecture \
    --namespace Project \
    --apply

    Replace the variables with the same values that you used the last time you ran the command.

    To manually apply the changes to your cluster:
    Follow the appropriate procedures from the Red Hat OpenShift documentation to complete the required tasks.
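
To keep the preview run and the apply run consistent, the steps above can be sketched as a small wrapper that prints the cpd-cli adm command instead of running it, so that it can be reviewed first. This is a minimal sketch, not part of the product: the function name dv_adm_cmd and the project name zen in the usage comment are hypothetical, and the function uses only flags that appear in this topic. Optional flags such as --version or --latest-dependency are passed through exactly as you supply them.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the cpd-cli adm command for a connected cluster.
# Usage: dv_adm_cmd MODE PROJECT ARCH [optional flags...]
#   MODE is "preview" or "apply"
dv_adm_cmd() {
  if [ "$#" -lt 3 ]; then
    echo "usage: dv_adm_cmd MODE PROJECT ARCH [optional flags...]" >&2
    return 64
  fi
  mode=$1
  project=$2
  arch=$3
  shift 3
  cmd="./cpd-cli adm --repo ./repo.yaml --assembly dv"
  # Optional flags (for example --version or --latest-dependency) are
  # appended unchanged, directly after the --assembly flag.
  for flag in "$@"; do
    cmd="$cmd $flag"
  done
  cmd="$cmd --arch $arch --namespace $project"
  # Without --apply, the command only previews the required changes.
  if [ "$mode" = "apply" ]; then
    cmd="$cmd --apply"
  fi
  echo "$cmd"
}

# Example usage (zen is a hypothetical project name):
#   dv_adm_cmd preview zen x86_64 --latest-dependency
#   dv_adm_cmd apply zen x86_64 --latest-dependency
```

Because the optional flags are explicit arguments, the same list can be handed to the person who runs the install command.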

Preparing air-gapped clusters

From your installation node:

  1. Change to the directory where you placed the Cloud Pak for Data command-line interface.
  2. Log in to your Red Hat OpenShift cluster as an administrator:
    oc login OpenShift_URL:port
  3. Run the following command to see a preview of the list of resources that must be created on the cluster:
    ./cpd-cli adm \
    --assembly dv \
    --arch Cluster_architecture \
    --namespace Project \
    --load-from Image_directory_location \
    --latest-dependency
    Note: If the assembly was downloaded using the delta-images command, remove the --latest-dependency flag from the command. Otherwise, the command fails with an error indicating that the flag cannot be used.

    Tell the person who will install Data Virtualization whether you used the --latest-dependency flag. If you run this command with the --latest-dependency flag, the install command must also be run with the flag.

    Replace the following values:

    Variable                  Replace with
    Cluster_architecture      Specify the architecture of your cluster hardware:
                              • For x86-64 hardware, remove this flag or specify x86_64
    Project                   The project where the Cloud Pak for Data control plane is installed.
    Image_directory_location  The location of the cpd-cli-workspace directory.

    The command returns a list of the changes that you must make to your cluster to ensure that Data Virtualization can run on your cluster.

  4. Make the necessary changes to your cluster. You can choose one of the following methods to make the changes:
    To automatically apply the changes to your cluster:
    Re-run the cpd-cli adm command with the --apply flag:
    ./cpd-cli adm \
    --assembly dv \
    --arch Cluster_architecture \
    --namespace Project \
    --load-from Image_directory_location \
    --latest-dependency \
    --apply

    Replace the variables with the same values that you used the last time you ran the command.

    To manually apply the changes to your cluster:
    Follow the appropriate procedures from the Red Hat OpenShift documentation to complete the required tasks.
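
The air-gapped steps above differ mainly in the --load-from flag and in the rule about delta-images downloads. The following sketch prints the matching cpd-cli adm command and drops --latest-dependency when the assembly was downloaded with delta-images. The function name dv_airgap_cmd, its DELTA argument, and the sample values in the usage comment are hypothetical. The flags themselves are the ones shown in this topic.

```shell
#!/bin/sh
# Hypothetical helper: prints the cpd-cli adm command for an air-gapped cluster.
# Usage: dv_airgap_cmd MODE PROJECT ARCH IMAGE_DIR DELTA
#   MODE  is "preview" or "apply"
#   DELTA is "yes" when the assembly was downloaded with delta-images
dv_airgap_cmd() {
  if [ "$#" -ne 5 ]; then
    echo "usage: dv_airgap_cmd MODE PROJECT ARCH IMAGE_DIR DELTA" >&2
    return 64
  fi
  mode=$1
  project=$2
  arch=$3
  image_dir=$4
  delta=$5
  cmd="./cpd-cli adm --assembly dv --arch $arch --namespace $project --load-from $image_dir"
  # --latest-dependency cannot be used with a delta-images download.
  if [ "$delta" != "yes" ]; then
    cmd="$cmd --latest-dependency"
  fi
  # Without --apply, the command only previews the required changes.
  if [ "$mode" = "apply" ]; then
    cmd="$cmd --apply"
  fi
  echo "$cmd"
}

# Example usage (zen and the directory path are hypothetical values):
#   dv_airgap_cmd preview zen x86_64 ./cpd-cli-workspace no
```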

Results

When you run the cpd-cli adm command with the --apply flag, the following security resources are created:
Service accounts
  dv-sa
    This service account is bound to dv-role and is associated with the dv-scc security context constraint. It is used for tasks such as preparing persistent volume mounts and setting file permissions.
  dv-bar-sa

  The following table lists the Data Virtualization service account permissions:

  Service account  GET permissions  PUT/POST/DELETE permissions  Elevated security context
  dv-sa            Y                Y                            Y
  dv-bar-sa        Y                Y                            Y
Security context constraints (SCCs)
  dv-scc
    The dv-scc security context constraint enables the IPC_OWNER capability that Data Virtualization requires for controlling the process privilege.

    To get a description of the dv-scc security context constraint, run the following command:
    oc describe scc dv-scc
  dv-metastore-scc
    To get a description of the dv-metastore-scc security context constraint, run the following command:
    oc describe scc dv-metastore-scc

For more information, see Security in Data Virtualization.

Roles
  dv-role
    The dv-role role grants permission to get and list pods and services within the namespace. It also allows Data Virtualization pods to run commands inside a separate pod.
  dv-bar-role
    The dv-bar-role role is used for backup and restore.
RoleBindings
  dv-rb
  dv-bar-rb
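
To confirm that the resources listed above exist after an --apply run, one option is to generate an oc get command for each of them. The following sketch only prints those commands. The function name dv_check_cmds is hypothetical, and the sketch assumes that the service accounts, roles, and role bindings live in the project that was passed to --namespace, while the SCCs are cluster-scoped.

```shell
#!/bin/sh
# Hypothetical helper: prints one "oc get" command per resource listed above.
# Usage: dv_check_cmds PROJECT
dv_check_cmds() {
  project=$1
  # Namespaced resources: service accounts, roles, and role bindings.
  for res in "sa dv-sa" "sa dv-bar-sa" \
             "role dv-role" "role dv-bar-role" \
             "rolebinding dv-rb" "rolebinding dv-bar-rb"; do
    echo "oc get $res -n $project"
  done
  # SCCs are cluster-scoped, so no namespace is passed.
  for scc in dv-scc dv-metastore-scc; do
    echo "oc get scc $scc"
  done
}

# Example usage (zen is a hypothetical project name):
#   dv_check_cmds zen        # review the commands
#   dv_check_cmds zen | sh   # run them while logged in with oc
```

A missing resource shows up as a NotFound error from the corresponding oc get command.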

What to do next

Ensure that you meet the service prerequisites. See Preparing to install the service for details.