Creating a service instance for Data Virtualization programmatically

After you install Data Virtualization, you must create at least one Data Virtualization service instance. Each service instance must be in a different Red Hat® OpenShift® Container Platform project. You can create a service instance in the operands project or in a project that is tethered to the operands project. If you are an IBM® Software Hub user, you can use the /v3/service_instances REST API call to programmatically create service instances.

Who needs to complete this task?

To create a service instance programmatically by using the /v3/service_instances REST API call, you must have the Create service instances (can_provision) permission in IBM Software Hub.

When do you need to complete this task?

Complete this task only if you want to create a service instance programmatically by using the /v3/service_instances REST API call.

Alternative methods for creating a service instance

From the web client. For more information, see Creating a service instance for Data Virtualization from the web client.
By using the cpd-cli service-instance create command. For more information, see Creating a service instance for Data Virtualization with the cpd-cli service-instance create command.

Information you need to complete this task

Review the following information before you create a service instance for Data Virtualization:

Version requirements: All of the components that are associated with an instance of IBM Software Hub must be installed or created at the same release. For example, if Data Virtualization is installed at Version 5.2.2, you must create the service instance at Version 5.2.2.

Important: Data Virtualization uses a different version number from IBM Software Hub. This topic includes a table that shows the Data Virtualization version for each refresh of IBM Software Hub. Use this table to find the correct version based on the version of IBM Software Hub that is installed.

Environment variables

The commands in this task use environment variables so that you can run the commands exactly as written.

If you don't have the script that defines the environment variables, see Setting up installation environment variables.
To use the environment variables from the script, you must source the environment variables before you run the commands in this task. For example, run:
```
source ./cpd_vars.sh
```
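
Before you continue, you might want to confirm that the variables this task relies on are actually set in your shell. The following is a minimal sketch, assuming a POSIX-compatible shell; `check_required_vars` is an illustrative helper name, not part of the installation environment variables script.

```shell
# Report which of the named environment variables are unset or empty.
# Prints the missing names and returns non-zero if any are missing.
check_required_vars() {
    missing=""
    for name in "$@"; do
        # Indirect lookup that works in POSIX sh as well as bash.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing variables:$missing"
        return 1
    fi
    echo "All required variables are set."
}
```

For example, after you source the script, you could run `check_required_vars PROJECT_CPD_INST_OPERANDS STG_CLASS_BLOCK STG_CLASS_FILE`.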

Before you begin

This task assumes that the following prerequisites are met:

Prerequisite	Where to find more information
Data Virtualization is installed.	If this task is not complete, see Installing Data Virtualization.
You generated an API key. The API key must be associated with a user who has the Create service instances (`can_provision`) permission in IBM Software Hub.	If this task is not complete, see Generating an API authorization token.

Procedure

Complete the following tasks to create a service instance:

Creating a service instance
Validating that the service instance was created
What to do next

Creating a service instance

To create a service instance:

Change to the directory on your workstation where you want to create the JSON file that defines the service instance payload.
Set the environment variables that are used to populate the JSON payload for the service instance:
1. Set the INSTANCE_SHORT_NAME environment variable to the unique name that you want to use to identify the service instance:
```
export INSTANCE_SHORT_NAME="<display-name>"
```
  The short name is a string and can contain alphanumeric characters (a-z, A-Z, 0-9), dashes (-), and underscores (_).
2. Set the INSTANCE_PROJECT environment variable to the project where you want to create the service instance:
  Create the service instance in the operands project
```
export INSTANCE_PROJECT=${PROJECT_CPD_INST_OPERANDS}
```
  The command uses the PROJECT_CPD_INST_OPERANDS variable, which is already defined in your installation environment variables script.
  Create the service instance in a tethered project
  Important: If multiple tethered projects are associated with this instance of IBM Software Hub, make sure that the ${PROJECT_CPD_INSTANCE_TETHERED} environment variable is set to the correct project name before you run the export command:
```
echo $PROJECT_CPD_INSTANCE_TETHERED
```
```
export INSTANCE_PROJECT=${PROJECT_CPD_INSTANCE_TETHERED}
```
  Remember: You can create only one service instance in each project.
3. Set the INSTANCE_NAME environment variable:
```
export INSTANCE_NAME="watson-query-${INSTANCE_PROJECT}-${INSTANCE_SHORT_NAME}"
```
4. Set the INSTANCE_DESCRIPTION environment variable to the description that you want to use for the service instance:
```
export INSTANCE_DESCRIPTION="<description>"
```
  This description is displayed on the Instances page of the IBM Software Hub web client.
  
  The description is a string and can contain alphanumeric characters, spaces, dashes, underscores, and periods. Make sure that you surround the description with quotation marks, as shown in the preceding export command.
5. Set the INSTANCE_VERSION environment variable to the version that corresponds to the version of IBM Software Hub on your cluster:
```
export INSTANCE_VERSION=<version>
```
  Use the following table to determine the appropriate value:
  
  IBM Software Hub version	Service instance version
  5.2.2	3.2.2
  5.2.1	3.2.1
  5.2.0	3.2.0
6. Set the INSTANCE_AUTOSCALING environment variable based on whether you want the instance to scale automatically by increasing or decreasing the number of pods in response to CPU or memory consumption.
  Automatically scaling the instance
  If you want to automatically scale the service instance, run:
```
export INSTANCE_AUTOSCALING=true
```
  Important: If you automatically scale the service, you must use a predefined size for the INSTANCE_PREDEFINED_SIZE environment variable.
  Using only the specified resources
  If you do not want the service instance to scale automatically, run:
```
export INSTANCE_AUTOSCALING=false
```
7. Set the INSTANCE_PREDEFINED_SIZE environment variable.
  Using custom settings
  If you want to use custom settings rather than a predefined size, run:
```
export INSTANCE_PREDEFINED_SIZE=""
```
  Using a predefined size
  If you want to use a predefined size for the instance, run:
```
export INSTANCE_PREDEFINED_SIZE=<size>
```
  Valid values are:
  - extrasmall
  - small
  - medium
  - large
  For more information about the resources associated with each size, see the component scaling guidance PDF, which you can download from the IBM Entitled Registry.
  
  Important: You must set the INSTANCE_CPU and INSTANCE_MEMORY environment variables even if you want to use a predefined size. If you do not specify a value for these environment variables, the service instance provisioning will fail.
8. Set the INSTANCE_CPU environment variable to the amount of CPU to allocate to the service instance:
```
export INSTANCE_CPU=<integer>
```
  Specify a value between 4 and 64.
  
  Tip: If you are using a predefined instance size, set this parameter to 4.
  
  Size the instance based on your workload. For more information about the number of CPUs to allocate to the service instance, see the component scaling guidance PDF, which you can download from the IBM Entitled Registry.
9. Set the INSTANCE_MEMORY environment variable to the amount of memory to allocate to the service instance:
```
export INSTANCE_MEMORY=<integer>
```
  Specify a value between 16 Gi and 512 Gi. Specify the value as an integer. Omit the unit of measurement.
  
  Tip: If you are using a predefined instance size, set this parameter to 16.
  
  Size the instance based on your workload. For more information about the amount of memory to allocate to the service instance, see the component scaling guidance PDF, which you can download from the IBM Entitled Registry.
10. Set the INSTANCE_WORKERS environment variable to the number of worker nodes to run the service instance on:
```
export INSTANCE_WORKERS=<integer>
```
  The maximum number of workers that you can specify depends on whether Db2U is configured to run with elevated privileges:
  - If Db2U is configured to run with limited privileges, you can specify a value between 1 and the total number of worker nodes on the cluster.
  - If Db2U is configured to run with elevated privileges, you can specify a value between 1 and 999.
  Most workloads can run on 1 to 3 nodes. For more information about the number of nodes recommended based on your workload, see the component scaling guidance PDF, which you can download from the IBM Entitled Registry.
11. Set the PV_SIZE environment variable to the amount of storage that you want to allocate to the service instance:
```
export PV_SIZE=<integer>
```
  Specify a value between 50 Gi and 10240 Gi. The default recommendation is 50 Gi. Specify the value as an integer. Omit the unit of measurement.
  
  Size the volume based on the size of the queries that you plan to run. For guidance, see the component scaling guidance PDF, which you can download from the IBM Entitled Registry.
12. Set the PV_SIZE_CACHE environment variable to the amount of storage that you want to allocate to caching for the service instance:
```
export PV_SIZE_CACHE=<integer>
```
  Specify a value between 100 Gi and 10240 Gi. The default recommendation is 100 Gi. Specify the value as an integer. Omit the unit of measurement.
  
  Size the volume based on the size of the data cache. For guidance, see the component scaling guidance PDF, which you can download from the IBM Entitled Registry.
13. Set the PV_SIZE_AUDITING environment variable to the amount of storage that you want to allocate to audit logs for the service instance:
```
export PV_SIZE_AUDITING=<integer>
```
  Specify a value between 1 Gi and 10240 Gi. The default recommendation is 30 Gi. Specify the value as an integer. Omit the unit of measurement.
  
  Size the volume based on the number of auditable events that are logged. For guidance, see the component scaling guidance PDF, which you can download from the IBM Entitled Registry.
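
The limits that are described in the preceding steps can be checked before you build the payload. The following is a minimal sketch, assuming a POSIX-compatible shell; `in_range` and `validate_instance_settings` are illustrative helper names. The upper bound of 999 for INSTANCE_WORKERS applies only when Db2U runs with elevated privileges; with limited privileges, the real limit is the number of worker nodes on your cluster, which this sketch does not check.

```shell
# Print a message and return 1 unless $2 is an integer between $3 and $4 (inclusive).
in_range() {
    case "$2" in
        ''|*[!0-9]*) echo "$1 must be an integer, got '$2'"; return 1 ;;
    esac
    if [ "$2" -lt "$3" ] || [ "$2" -gt "$4" ]; then
        echo "$1 must be between $3 and $4, got $2"
        return 1
    fi
}

# Check the variables from the preceding steps; return non-zero if any check fails.
validate_instance_settings() {
    errors=0
    case "${INSTANCE_SHORT_NAME:-}" in
        ''|*[!a-zA-Z0-9_-]*)
            echo "INSTANCE_SHORT_NAME must contain only letters, digits, dashes, and underscores"
            errors=$((errors + 1)) ;;
    esac
    case "${INSTANCE_AUTOSCALING:-}" in
        true|false) ;;
        *) echo "INSTANCE_AUTOSCALING must be true or false"; errors=$((errors + 1)) ;;
    esac
    case "${INSTANCE_PREDEFINED_SIZE:-}" in
        ''|extrasmall|small|medium|large) ;;
        *) echo "INSTANCE_PREDEFINED_SIZE is not a valid size"; errors=$((errors + 1)) ;;
    esac
    # Autoscaling requires a predefined size.
    if [ "${INSTANCE_AUTOSCALING:-}" = "true" ] && [ -z "${INSTANCE_PREDEFINED_SIZE:-}" ]; then
        echo "INSTANCE_PREDEFINED_SIZE is required when INSTANCE_AUTOSCALING is true"
        errors=$((errors + 1))
    fi
    in_range INSTANCE_CPU "${INSTANCE_CPU:-}" 4 64 || errors=$((errors + 1))
    in_range INSTANCE_MEMORY "${INSTANCE_MEMORY:-}" 16 512 || errors=$((errors + 1))
    # Upper bound assumes elevated privileges; see the note in the lead-in.
    in_range INSTANCE_WORKERS "${INSTANCE_WORKERS:-}" 1 999 || errors=$((errors + 1))
    in_range PV_SIZE "${PV_SIZE:-}" 50 10240 || errors=$((errors + 1))
    in_range PV_SIZE_CACHE "${PV_SIZE_CACHE:-}" 100 10240 || errors=$((errors + 1))
    in_range PV_SIZE_AUDITING "${PV_SIZE_AUDITING:-}" 1 10240 || errors=$((errors + 1))
    [ "$errors" -eq 0 ]
}
```

Run `validate_instance_settings` after you set the variables. The function prints one line for each problem that it finds and prints nothing if every check passes.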

Create the data-virtualization-instance.json payload file.

The command that you need to run depends on the type of storage on your cluster.

Portworx storage

```
cat << EOF > ./data-virtualization-instance.json
{
    "addon_type": "dv",
    "display_name": "${INSTANCE_NAME}",
    "addon_version": "${INSTANCE_VERSION}",
    "namespace": "${INSTANCE_PROJECT}",
    "create_arguments": {
        "description": "${INSTANCE_DESCRIPTION}",
        "metaData": {},
        "parameters": {
            "autoscaling": ${INSTANCE_AUTOSCALING},
            "tshirtsize": "${INSTANCE_PREDEFINED_SIZE}",
            "resources.dv.requests.cpu": "${INSTANCE_CPU}",
            "resources.dv.requests.memory": "${INSTANCE_MEMORY}Gi",
            "image.pullPolicy": "IfNotPresent",
            "workerCount": "${INSTANCE_WORKERS}",
            "persistence.storageClass": "portworx-db2-rwo-sc",
            "persistence.size": "${PV_SIZE}Gi",
            "persistence.cachingpv.storageClass": "portworx-db2-rwx-sc",
            "persistence.cachingpv.size": "${PV_SIZE_CACHE}Gi",
            "persistence.auditpv.storageClass": "portworx-db2-rwx-sc",
            "persistence.auditpv.size": "${PV_SIZE_AUDITING}Gi"
        },
        "resources": {
            "cpu": "$(( (${INSTANCE_WORKERS} + 1) * ${INSTANCE_CPU} ))",
            "memory": "$(( (${INSTANCE_WORKERS} + 1) * ${INSTANCE_MEMORY} ))"
        },
        "transientFields": {}
    }
}
EOF
```

Amazon Elastic storage

```
cat << EOF > ./data-virtualization-instance.json
{
    "addon_type": "dv",
    "display_name": "${INSTANCE_NAME}",
    "addon_version": "${INSTANCE_VERSION}",
    "namespace": "${INSTANCE_PROJECT}",
    "create_arguments": {
        "description": "${INSTANCE_DESCRIPTION}",
        "metaData": {},
        "parameters": {
            "autoscaling": ${INSTANCE_AUTOSCALING},
            "tshirtsize": "${INSTANCE_PREDEFINED_SIZE}",
            "resources.dv.requests.cpu": "${INSTANCE_CPU}",
            "resources.dv.requests.memory": "${INSTANCE_MEMORY}Gi",
            "image.pullPolicy": "IfNotPresent",
            "workerCount": "${INSTANCE_WORKERS}",
            "persistence.storageClass": "${STG_CLASS_FILE}",
            "persistence.size": "${PV_SIZE}Gi",
            "persistence.cachingpv.storageClass": "${STG_CLASS_FILE}",
            "persistence.cachingpv.size": "${PV_SIZE_CACHE}Gi",
            "persistence.auditpv.storageClass": "${STG_CLASS_FILE}",
            "persistence.auditpv.size": "${PV_SIZE_AUDITING}Gi"
        },
        "resources": {
            "cpu": "$(( (${INSTANCE_WORKERS} + 1) * ${INSTANCE_CPU} ))",
            "memory": "$(( (${INSTANCE_WORKERS} + 1) * ${INSTANCE_MEMORY} ))"
        },
        "transientFields": {}
    }
}
EOF
```

The following environment variables use the values that are already defined in your installation environment variables script:

${STG_CLASS_FILE}

All other storage

```
cat << EOF > ./data-virtualization-instance.json
{
    "addon_type": "dv",
    "display_name": "${INSTANCE_NAME}",
    "addon_version": "${INSTANCE_VERSION}",
    "namespace": "${INSTANCE_PROJECT}",
    "create_arguments": {
        "description": "${INSTANCE_DESCRIPTION}",
        "metaData": {},
        "parameters": {
            "autoscaling": ${INSTANCE_AUTOSCALING},
            "tshirtsize": "${INSTANCE_PREDEFINED_SIZE}",
            "resources.dv.requests.cpu": "${INSTANCE_CPU}",
            "resources.dv.requests.memory": "${INSTANCE_MEMORY}Gi",
            "image.pullPolicy": "IfNotPresent",
            "workerCount": "${INSTANCE_WORKERS}",
            "persistence.storageClass": "${STG_CLASS_BLOCK}",
            "persistence.size": "${PV_SIZE}Gi",
            "persistence.cachingpv.storageClass": "${STG_CLASS_BLOCK}",
            "persistence.cachingpv.size": "${PV_SIZE_CACHE}Gi",
            "persistence.auditpv.storageClass": "${STG_CLASS_FILE}",
            "persistence.auditpv.size": "${PV_SIZE_AUDITING}Gi"
        },
        "resources": {
            "cpu": "$(( (${INSTANCE_WORKERS} + 1) * ${INSTANCE_CPU} ))",
            "memory": "$(( (${INSTANCE_WORKERS} + 1) * ${INSTANCE_MEMORY} ))"
        },
        "transientFields": {}
    }
}
EOF
```

The following environment variables use the values that are already defined in your installation environment variables script:

${STG_CLASS_BLOCK}
${STG_CLASS_FILE}
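
In each payload, the resources block is computed by shell arithmetic when the heredoc is written: the per-node CPU and memory values are multiplied by the number of workers plus one. The following worked example uses illustrative values only (2 workers with the minimum CPU and memory); the TOTAL_CPU and TOTAL_MEMORY names are introduced here for the example and are not used by the payload.

```shell
# Example values (illustrative only; size the instance for your own workload).
INSTANCE_WORKERS=2
INSTANCE_CPU=4
INSTANCE_MEMORY=16

# Same arithmetic expansion that the heredoc uses for the "resources" block.
TOTAL_CPU=$(( (INSTANCE_WORKERS + 1) * INSTANCE_CPU ))
TOTAL_MEMORY=$(( (INSTANCE_WORKERS + 1) * INSTANCE_MEMORY ))

echo "cpu=${TOTAL_CPU} memory=${TOTAL_MEMORY}"
# prints: cpu=12 memory=48
```

Because the expansion happens when the file is written, an unset or non-numeric variable produces a shell error or an incorrect total at that point, so it is worth setting all of the variables before you run the cat command.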

Set the PAYLOAD_FILE environment variable to the fully qualified name of the JSON payload file on your workstation:
```
export PAYLOAD_FILE=<fully-qualified-JSON-file-name>
```
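
If you are unsure of the fully qualified name, you can derive it from a relative path. The following is a minimal sketch, assuming a POSIX-compatible shell; `payload_path` is an illustrative helper name.

```shell
# Print the absolute path of an existing file, or fail if the file does not exist.
payload_path() {
    if [ ! -f "$1" ]; then
        echo "Payload file not found: $1" >&2
        return 1
    fi
    # Resolve the containing directory, then append the file name.
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    echo "$dir/$(basename "$1")"
}
```

For example, from the directory where you created the file, you could run `PAYLOAD_FILE=$(payload_path data-virtualization-instance.json) && export PAYLOAD_FILE`.
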
Set the environment variables that are used to connect to the instance of IBM Software Hub where you want to create the service instance:
1. Set the CPD_ROUTE environment variable:
```
export CPD_ROUTE=$(oc get route cpd -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.spec.host}')
```
  The command uses the PROJECT_CPD_INST_OPERANDS variable, which is already defined in your installation environment variables script.
2. Set the API_KEY environment variable to the API key that you created:
```
export API_KEY=<your_api_key>
```
Create the service instance from the payload file.
The command that you run depends on whether the instance of IBM Software Hub where you want to create the service instance uses a self-signed certificate or a certificate signed by a trusted certificate authority.
The instance uses a certificate signed by a trusted certificate authority
```
curl --request POST \
--url "https://${CPD_ROUTE}/zen-data/v3/service_instances" \
--header "Authorization: ZenApiKey ${API_KEY}" \
--header 'Content-Type: application/json' \
--data @${PAYLOAD_FILE}
```
The instance uses a self-signed certificate (default)
```
curl -k --request POST \
--url "https://${CPD_ROUTE}/zen-data/v3/service_instances" \
--header "Authorization: ZenApiKey ${API_KEY}" \
--header 'Content-Type: application/json' \
--data @${PAYLOAD_FILE}
```
If the request was successful, the command returns one of the following HTTP response codes:
- 200 - The request was successfully completed and the service instance was provisioned.
- 202 - The request was successfully submitted. The service instance is being provisioned.
If the request was not successful, use the HTTP response code to determine the reason.
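
By default, curl prints the response body but not the HTTP response code. One way to see both is to save the body to a file and print the code with the --write-out option. The following is a minimal sketch, assuming a POSIX-compatible shell; `post_instance` and `describe_post_status` are illustrative helper names, and the messages that they print are not product output.

```shell
# Map the HTTP response code of the POST request to the outcomes described in this task.
describe_post_status() {
    case "$1" in
        200) echo "provisioned" ;;
        202) echo "provisioning in progress" ;;
        *)   echo "request failed with HTTP $1"; return 1 ;;
    esac
}

# Send the payload, save the response body to the file named in $1,
# and print only the HTTP response code.
# Add -k to the curl options if the instance uses a self-signed certificate.
post_instance() {
    curl --silent --request POST \
        --url "https://${CPD_ROUTE}/zen-data/v3/service_instances" \
        --header "Authorization: ZenApiKey ${API_KEY}" \
        --header 'Content-Type: application/json' \
        --data @"${PAYLOAD_FILE}" \
        --output "$1" \
        --write-out '%{http_code}'
}
```

For example, `describe_post_status "$(post_instance response.json)"` sends the request, leaves the response body in response.json, and reports which of the two success cases applies.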

Note: The c-db2u-dv-dvcaching pod remains in the "0/1 Init" state during the entire Data Virtualization instance-provisioning process. The pod switches to the "1/1 Running" state after the process is complete.

Validating that the service instance was created

To validate that the service instance was created:

Set the INSTANCE_ID environment variable to the ID that was returned by the POST cURL command:
```
export INSTANCE_ID=<ID-from-response>
```
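
If you saved the POST response, you can extract the ID instead of copying it by hand. The following is a minimal sketch that assumes the response body contains an "id" field; that field name and the sample response are assumptions, so check them against the response that you actually received. `extract_id` is an illustrative helper name.

```shell
# Print the value of an "id" field found in JSON text on standard input.
# Handles both quoted and unquoted (numeric) values.
extract_id() {
    sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\{0,1\}\([^",}[:space:]]*\).*/\1/p' | head -n 1
}

# Hypothetical response body, for illustration only.
echo '{"id": "1234567890"}' | extract_id
# prints: 1234567890
```

For example, if the response body is in response.json, you could run `INSTANCE_ID=$(extract_id < response.json) && export INSTANCE_ID`. If jq is installed, `jq -r '.id' response.json` is a more robust alternative under the same assumption about the field name.
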
Get the status of the service instance.
The command that you run depends on whether the instance of IBM Software Hub where you want to create the service instance uses a self-signed certificate or a certificate signed by a trusted certificate authority.
The instance uses a certificate signed by a trusted certificate authority
```
curl --request GET \
  --url "https://${CPD_ROUTE}/zen-data/v3/service_instances/${INSTANCE_ID}" \
  --header "Authorization: ZenApiKey ${API_KEY}" \
  --header 'Content-Type: application/json'
```
The instance uses a self-signed certificate (default)
```
curl -k --request GET \
  --url "https://${CPD_ROUTE}/zen-data/v3/service_instances/${INSTANCE_ID}" \
  --header "Authorization: ZenApiKey ${API_KEY}" \
  --header 'Content-Type: application/json'
```
- If the request was successful, the command returns the following HTTP response code: 200
  Find the provision_status parameter in the JSON response.
  - If the value is PROVISIONED, the service instance was successfully created.
  - If the value is PROVISION_IN_PROGRESS, wait a few minutes and run the command again.
  - If the value is FAILED, review the pod logs for the zen-core-api and zen-watcher pods for possible causes.
- If the request was not successful, use the HTTP response code to determine the reason.
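
Rather than rerunning the GET command by hand, you can poll until the provision_status value changes. The following is a minimal sketch, assuming a POSIX-compatible shell; `extract_provision_status`, `wait_for_instance`, and `get_instance` are illustrative helper names, and TIMED_OUT is a value that this sketch prints, not a value that the API returns.

```shell
# Print the value of the provision_status field found in JSON text on standard input.
extract_provision_status() {
    sed -n 's/.*"provision_status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Call a fetch command repeatedly until provisioning finishes.
# $1: command that prints the GET response body
# $2: seconds to wait between attempts
# $3: maximum number of attempts
wait_for_instance() {
    attempt=1
    while [ "$attempt" -le "$3" ]; do
        prov_status=$("$1" | extract_provision_status)
        case "$prov_status" in
            PROVISIONED) echo "PROVISIONED"; return 0 ;;
            FAILED)      echo "FAILED"; return 1 ;;
        esac
        attempt=$((attempt + 1))
        sleep "$2"
    done
    echo "TIMED_OUT"
    return 2
}

# Fetch command for a self-signed certificate; remove -k for a trusted certificate.
get_instance() {
    curl -k --silent --request GET \
        --url "https://${CPD_ROUTE}/zen-data/v3/service_instances/${INSTANCE_ID}" \
        --header "Authorization: ZenApiKey ${API_KEY}" \
        --header 'Content-Type: application/json'
}
```

For example, `wait_for_instance get_instance 60 30` checks once a minute for up to 30 attempts. Any other value, including PROVISION_IN_PROGRESS, is treated as "not finished yet".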

What to do next

To connect to the Data Virtualization service, use the JDBC URL that is provided on the Configure connection page for the service. Additionally, if you have a load balancer, you must open the port in your load balancer and your firewall. For more information, see Configuring network requirements for Data Virtualization.
Optional: Configure dedicated OpenShift worker nodes.
Complete post-installation administration tasks to configure service instances.
Assign the Data Virtualization Admin user role for service setup. When you provision the Data Virtualization service, you are automatically assigned the Data Virtualization Admin role. After you provision the service, either give at least one other user the Data Virtualization Admin role so that they can configure the features of the Data Virtualization service, or complete those tasks yourself.

Now you can use the Data Virtualization service. For more information, see Getting started with Data Virtualization.