Setting up the cluster for Data Virtualization

If you plan to install Data Virtualization on IBM® Cloud Pak for Data, a cluster administrator must set up the cluster for Data Virtualization.

Before you begin

Required role: To complete this task, you must be a cluster administrator.

Before you set up the cluster for Data Virtualization, ensure that:

The Cloud Pak for Data control plane is already installed on your Red Hat® OpenShift® cluster. For details, see Installing IBM Cloud Pak for Data.
The cluster meets the minimum requirements for installing Data Virtualization. For details, see System requirements for services.
The machine from which you will run the commands meets the requirements described in Preparing your installation node.
On air-gapped clusters: You completed the steps in Preparing for air-gapped installations to download the required files for the service.

Tip: For a list of all available options, enter the following command:

```
./cpd-cli adm --help
```

Requirement: Ensure that the default thread count on your OpenShift cluster is configured to permit containers with 2000 PIDs on every compute node. For more information, see Software requirements.
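The comparison behind this requirement can be scripted. The following is a minimal sketch, assuming you have already read the configured PID limit for a compute node; how you obtain that value depends on your OpenShift version and container runtime and is not shown here. The function name `pids_limit_ok` is hypothetical and is not part of any IBM or Red Hat tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a node's PID limit meets the
# Data Virtualization requirement of 2000 PIDs per container.
REQUIRED_PIDS=2000

pids_limit_ok() {
  local limit="$1"
  # Reject empty or non-numeric input instead of letting the comparison error out.
  case "$limit" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$limit" -ge "$REQUIRED_PIDS" ]
}

# Example: pids_limit_ok 4096 && echo "PID limit is sufficient"
```

Run the check once per compute node, because the requirement applies to every compute node in the cluster.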

Procedure

Set up the cluster for Data Virtualization by completing the appropriate task for your environment:
- Preparing clusters connected to the internet
- Preparing air-gapped clusters
Complete the tasks listed in What to do next.

Preparing clusters connected to the internet

From your installation node:

Change to the directory where you placed the Cloud Pak for Data command-line interface and the repo.yaml file.
Log in to your Red Hat OpenShift cluster as an administrator:
```
oc login OpenShift_URL:port
```

Run the following command to see a preview of the list of resources that must be created on the cluster:

```
./cpd-cli adm \
--repo ./repo.yaml \
--assembly dv \
--arch Cluster_architecture \
--namespace Project \
--latest-dependency
```

Important: By default, this command gets the latest version of the assembly. If you want to install a specific version of Data Virtualization, add the following line to your command after the --assembly flag:

```
--version Assembly_version \
```

Additionally, you can remove the --latest-dependency flag to get the minimum required version of any software that Data Virtualization depends on.

Tell the person who will install Data Virtualization whether you used either of these flags. The install command must be run with the same flags.

Replace the following values:

Variable	Replace with
`Assembly_version`	The version of Data Virtualization that you want to install. The assembly versions are listed in System requirements for services.
`Cluster_architecture`	The architecture of your cluster hardware. For x86-64 hardware, remove this flag or specify `x86_64`.
`Project`	The project where the Cloud Pak for Data control plane is installed.

The command returns a list of the changes that you must make to your cluster to ensure that Data Virtualization can run on your cluster.
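If you run the preview often, the optional flags described above can be assembled by a small wrapper. The following is a minimal sketch, not an IBM-provided script: the function name `dv_adm_args` is hypothetical, and the project name `zen` and version `1.2.3` in the usage comment are placeholders. It emits only flags that appear in the preview command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the cpd-cli adm arguments for a preview run.
# Usage: dv_adm_args PROJECT [ARCH] [ASSEMBLY_VERSION]
dv_adm_args() {
  local project="$1" arch="${2:-}" version="${3:-}"
  local args="--repo ./repo.yaml --assembly dv"
  # --version, when given, goes directly after the --assembly flag.
  if [ -n "$version" ]; then
    args="$args --version $version"
  fi
  # On x86-64 hardware the --arch flag can be omitted.
  if [ -n "$arch" ]; then
    args="$args --arch $arch"
  fi
  args="$args --namespace $project --latest-dependency"
  printf '%s\n' "$args"
}

# Example (placeholder values; left unquoted so the shell splits the flags):
#   ./cpd-cli adm $(dv_adm_args zen x86_64 1.2.3)
```

Printing the arguments before running the command also gives you an exact record of the flags to pass on to the person who installs Data Virtualization.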

Make the necessary changes to your cluster. You can choose one of the following methods to make the changes:
To automatically apply the changes to your cluster:
Rerun the cpd-cli adm command with the --apply flag:
```
./cpd-cli adm \
--repo repo.yaml \
--assembly dv \
--arch Cluster_architecture \
--namespace Project \
--apply
```
Replace the variables with the same values that you used the last time you ran the command.
To manually apply the changes to your cluster:

Follow the appropriate procedures from the Red Hat OpenShift documentation to complete the required tasks.

Preparing air-gapped clusters

From your installation node:

Change to the directory where you placed the Cloud Pak for Data command-line interface.
Log in to your Red Hat OpenShift cluster as an administrator:
```
oc login OpenShift_URL:port
```

Run the following command to see a preview of the list of resources that must be created on the cluster:

```
./cpd-cli adm \
--assembly dv \
--arch Cluster_architecture \
--namespace Project \
--load-from Image_directory_location \
--latest-dependency
```

Note: If the assembly was downloaded using the delta-images command, remove the --latest-dependency flag from the command. If you leave the flag in, the command fails with an error indicating that the flag cannot be used.

Tell the person who will install Data Virtualization whether you used the --latest-dependency flag. If you run this command with the --latest-dependency flag, the install command must also be run with the flag.

Replace the following values:

Variable	Replace with
`Cluster_architecture`	The architecture of your cluster hardware. For x86-64 hardware, remove this flag or specify `x86_64`.
`Project`	The project where the Cloud Pak for Data control plane is installed.
`Image_directory_location`	The location of the cpd-cli-workspace directory.

The command returns a list of the changes that you must make to your cluster to ensure that Data Virtualization can run on your cluster.
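The rule about delta-images downloads in the note above is easy to overlook, so it can help to encode it once. The following is a minimal sketch, not an IBM-provided script: the function name `dv_airgap_args` and its `yes`/`no` fourth argument are hypothetical, and `zen` in the usage comment is a placeholder project name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the cpd-cli adm arguments for an air-gapped preview.
# Usage: dv_airgap_args PROJECT IMAGE_DIR ARCH [DELTA]
#   DELTA=yes means the assembly was downloaded with the delta-images command.
dv_airgap_args() {
  local project="$1" image_dir="$2" arch="$3" delta="${4:-no}"
  local args="--assembly dv --arch $arch --namespace $project --load-from $image_dir"
  # --latest-dependency cannot be combined with a delta-images download.
  if [ "$delta" != "yes" ]; then
    args="$args --latest-dependency"
  fi
  printf '%s\n' "$args"
}

# Example (placeholder values; left unquoted so the shell splits the flags):
#   ./cpd-cli adm $(dv_airgap_args zen ./cpd-cli-workspace x86_64)
```

Whichever variant the function prints, tell the person who installs Data Virtualization whether --latest-dependency was included.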

Make the necessary changes to your cluster. You can choose one of the following methods to make the changes:
To automatically apply the changes to your cluster:
Rerun the cpd-cli adm command with the --apply flag:
```
./cpd-cli adm \
--assembly dv \
--arch Cluster_architecture \
--namespace Project \
--load-from Image_directory_location \
--latest-dependency \
--apply
```
Replace the variables with the same values that you used the last time you ran the command.
To manually apply the changes to your cluster:

Follow the appropriate procedures from the Red Hat OpenShift documentation to complete the required tasks.

Results

When you run the cpd-cli adm command with the --apply flag, the following security resources are created:

Service accounts

dv-sa: This service account is bound to dv-role and is associated with the dv-scc security context constraint. It is used for setup tasks such as preparing persistent volume mounts and setting file permissions.
dv-bar-sa

The following table lists the Data Virtualization service account permissions:

Service account	GET permissions	PUT/POST/DELETE permissions	Elevated security context
`dv-sa`	Y	Y	Y
`dv-bar-sa`	Y	Y	Y

Security context constraints (SCCs)

dv-scc

The dv-scc security context constraint enables the IPC_OWNER capability that Data Virtualization requires for controlling the process privilege.

To get a description of the dv-scc security context constraint, run the following command:

```
oc describe scc dv-scc
```

dv-metastore-scc

To get a description of the dv-metastore-scc security context constraint, run the following command:

```
oc describe scc dv-metastore-scc
```

For more information, see Security in Data Virtualization.

Roles

dv-role: The dv-role role allows Data Virtualization to get and list pods and services within the namespace. It also allows Data Virtualization pods to run commands inside a separate pod.
dv-bar-role: The dv-bar-role is used for backup and restore.

RoleBindings

dv-rb
dv-bar-rb
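After you run the command with the --apply flag, you might want to confirm that every resource listed in this section exists. The following is a minimal sketch that uses `oc get`, assuming you are still logged in as a cluster administrator. The function name `check_dv_resources` and the `OC` override variable are hypothetical conveniences, not part of cpd-cli.

```shell
#!/usr/bin/env bash
# OC can be overridden, for example to point at a specific oc binary.
OC="${OC:-oc}"

# Hypothetical helper: reports which Data Virtualization security resources are missing.
# Usage: check_dv_resources PROJECT
check_dv_resources() {
  local project="$1" missing=0 res
  # Service accounts, roles, and role bindings live in the project.
  for res in sa/dv-sa sa/dv-bar-sa role/dv-role role/dv-bar-role \
             rolebinding/dv-rb rolebinding/dv-bar-rb; do
    if ! "$OC" get "$res" -n "$project" >/dev/null 2>&1; then
      echo "missing: $res"
      missing=1
    fi
  done
  # Security context constraints are cluster-scoped, so no project is given.
  for res in scc/dv-scc scc/dv-metastore-scc; do
    if ! "$OC" get "$res" >/dev/null 2>&1; then
      echo "missing: $res"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_dv_resources Project && echo "All resources found"
```

The function returns a nonzero status if any resource is missing, so it can gate a later installation step in a script.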

What to do next

Ensure that you meet the service prerequisites. See Preparing to install the service for details.