Installing DataStage

A project administrator can install DataStage on IBM® Cloud Pak for Data.

The installation process is the same for both DataStage® Enterprise and DataStage Enterprise Plus. The edition that is installed depends on the catalog source that you created.

Permissions you need for this task
You must be an administrator of the OpenShift® project (Kubernetes namespace) where you will deploy DataStage.
Information you need to complete this task
  • DataStage needs only the restricted security context constraint (SCC).
  • DataStage must be installed in the same project as Cloud Pak for Data.
  • DataStage requires the Cloud Pak for Data common core services. If the common core services are not already installed in the project where you plan to install DataStage, they are installed automatically when you install DataStage, which lengthens the installation.
  • DataStage uses the following storage classes. If you don't use these storage classes on your cluster, ensure that you have a storage class with an equivalent definition:
    • OpenShift Container Storage: ocs-storagecluster-cephfs
    • IBM Spectrum®: ibm-spectrum-scale-sc
    • NFS: managed-nfs-storage
    • Portworx: portworx-shared-gp3
    • IBM Cloud File Storage: ibmc-file-gold-gid or ibm-file-custom-gold-gid
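
One way to see which of the storage classes listed above is available is to query the cluster with oc get storageclass. The following is a minimal sketch, assuming that you are already logged in with oc; the function names storage_class_exists and find_datastage_storage_class are hypothetical helpers, not part of the product.

```shell
# Return success if the named storage class exists on the cluster.
storage_class_exists() {
  oc get storageclass "$1" >/dev/null 2>&1
}

# Print the first storage class from the documented list that exists.
# Returns non-zero if none of them is present, in which case you need
# a storage class with an equivalent definition.
find_datastage_storage_class() {
  local sc
  for sc in ocs-storagecluster-cephfs ibm-spectrum-scale-sc \
            managed-nfs-storage portworx-shared-gp3 \
            ibmc-file-gold-gid ibm-file-custom-gold-gid; do
    if storage_class_exists "$sc"; then
      echo "$sc"
      return 0
    fi
  done
  return 1
}
```

Running find_datastage_storage_class prints a name that you can use for the storageClass entry in the DataStage custom resource.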

Before you begin

Ensure that the cluster meets the minimum requirements for installing DataStage. For details, see System requirements.

Additionally, ensure that a cluster administrator completed the required Pre-installation tasks for your environment. Specifically, verify that a cluster administrator completed the following tasks:

  1. Cloud Pak for Data is installed. For details, see Installing Cloud Pak for Data.
  2. For environments that use a private container registry, such as air-gapped environments, the DataStage software images are mirrored to the private container registry. For details, see Mirroring images to your container registry.
  3. The cluster is configured to pull the DataStage software images. For details, see Configuring your cluster to pull images.
  4. The DataStage catalog source exists. For details, see Creating catalog sources.
  5. The DataStage operator subscription exists. For details, see Creating operator subscriptions.

If these tasks are not complete, the DataStage installation will fail.
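
Before you start, you might want to confirm that the catalog source and operator subscription exist. The following is a rough sketch only. It assumes that the names of both resources contain the string "datastage" (an assumption; check the names that your cluster administrator used) and that your user is allowed to list Operator Lifecycle Manager resources across all namespaces. The function names are hypothetical.

```shell
# List resources of the given kind in all namespaces and keep the rows
# that mention "datastage". ASSUMPTION: the resource names contain
# "datastage"; adjust the pattern if your cluster uses other names.
find_datastage_olm_resources() {
  local kind="$1"
  oc get "$kind" --all-namespaces --no-headers 2>/dev/null | grep -i datastage
}

# Report whether a matching catalog source and subscription were found.
# Returns non-zero if either one is missing.
check_datastage_prerequisites() {
  local kind missing=0
  for kind in catalogsources.operators.coreos.com \
              subscriptions.operators.coreos.com; do
    if find_datastage_olm_resources "$kind" >/dev/null; then
      echo "found: $kind"
    else
      echo "missing: $kind"
      missing=1
    fi
  done
  return "$missing"
}
```

A result of "missing" does not prove that the task was skipped, because the resource might simply use a different name. Treat it as a prompt to check with the cluster administrator.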

Procedure

Complete the following tasks to install DataStage:

  1. Installing the service
  2. Verifying the installation
  3. Choosing a service upgrade plan
  4. What to do next

Installing the service

To install DataStage:

  1. Log in to Red Hat® OpenShift Container Platform as a user with sufficient permissions to complete the task:
    oc login OpenShift_URL:port
  2. Create a DataStage custom resource to install DataStage. Follow the appropriate guidance for your environment.
    Important: By creating a DataStage custom resource with spec.license.accept: true, you are accepting the license terms for DataStage. You can find links to the relevant licenses in IBM Cloud Pak for Data License Information.
    cat <<EOF |oc apply -f -
    apiVersion: ds.cpd.ibm.com/v1alpha1
    kind: DataStage
    metadata:
      name: datastage     # This is the recommended name, but you can change it
      namespace: project-name     # Replace with the project where you will install DataStage
    spec:
      license:
        accept: true
        license: Enterprise | Standard     # Specify one value, Enterprise or Standard, to match the license you purchased
      version: 4.0.9
      storageClass: storage-class-name     # See the guidance in "Information you need to complete this task"
    EOF

When you create the custom resource, the DataStage operator installs DataStage.
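
If you install DataStage in more than one environment, it can help to generate the custom resource from parameters instead of editing the YAML by hand. The following is a minimal sketch, not the documented procedure. The function render_datastage_cr is a hypothetical helper. It reuses the apiVersion, kind, and version values shown above and writes spec.license.accept: true, so using it means that you accept the license terms.

```shell
# Print a DataStage custom resource for the given project, license,
# and storage class. Rejects any license value other than the two
# shown in the example above.
render_datastage_cr() {
  local project="$1" license="$2" storage_class="$3"
  case "$license" in
    Enterprise|Standard) ;;
    *) echo "license must be Enterprise or Standard" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: ds.cpd.ibm.com/v1alpha1
kind: DataStage
metadata:
  name: datastage
  namespace: ${project}
spec:
  license:
    accept: true
    license: ${license}
  version: 4.0.9
  storageClass: ${storage_class}
EOF
}

# Example usage (project-name and storage-class-name are placeholders):
#   render_datastage_cr project-name Enterprise storage-class-name | oc apply -f -
```

Because the function only prints YAML, you can review the output before you pipe it to oc apply.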

Verifying the installation

When you create the custom resource, the DataStage operator processes its contents and starts the microservices that make up DataStage. (The DataStage service itself is defined by the datastage custom resource.) DataStage is installed when the DataStage status is Completed.

To check the status of the installation:

  1. Change to the project where you installed DataStage:
    oc project project-name
  2. Get the status of DataStage (datastage):
    oc get DataStage datastage -o jsonpath='{.status.dsStatus} {"\n"}'

    DataStage is ready when the command returns Completed.

  3. Get the status of the PXRuntime instance (ds-px-default):
    oc get pxruntime ds-px-default -o jsonpath='{.status.dsStatus} {"\n"}'

    The PXRuntime instance is ready when the command returns Completed.
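
Because the installation can take some time, you might prefer to poll the status instead of rerunning the commands by hand. The following is a minimal sketch that wraps the same jsonpath query in a loop. The function wait_for_completed and its default of 60 attempts at 30-second intervals are assumptions chosen for illustration; it reads the status from the current project, so run oc project first.

```shell
# Poll .status.dsStatus of a resource until it reports Completed.
# Arguments: kind, name, optional number of attempts, optional
# interval in seconds between attempts.
wait_for_completed() {
  local kind="$1" name="$2" attempts="${3:-60}" interval="${4:-30}"
  local i=0 status
  while [ "$i" -lt "$attempts" ]; do
    status="$(oc get "$kind" "$name" -o jsonpath='{.status.dsStatus}' 2>/dev/null || true)"
    if [ "$status" = "Completed" ]; then
      echo "$kind/$name is Completed"
      return 0
    fi
    echo "$kind/$name status: ${status:-unknown}"
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$kind/$name did not reach Completed" >&2
  return 1
}

# Example usage:
#   wait_for_completed DataStage datastage
#   wait_for_completed pxruntime ds-px-default
```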

Choosing a service upgrade plan

You can choose how DataStage is upgraded when you install a newer version of the DataStage operator on the cluster.

Automatic upgrade (recommended)

If you want DataStage to be automatically upgraded when you install a newer version of the DataStage operator on the cluster, remove the version entry from the DataStage custom resource.

To remove the version entry, run the following command. You must update the command with the appropriate project name before you run the command.

oc patch DataStage datastage \
--namespace project-name \
--type=json \
--patch '[{ "op": "remove", "path": "/spec/version" }]'

Manual upgrade

If you want to manually upgrade DataStage after you install a newer version of the DataStage operator, you can pin the installation at a specific version in the DataStage custom resource.

By default, when you create the DataStage custom resource, it includes the version entry, so no additional action is required.

If you removed the version entry from the DataStage custom resource, run the following command to pin the installation at Version 4.0.9. You must update the command with the appropriate project name before you run the command.

oc patch DataStage datastage \
--namespace project-name \
--type=merge \
--patch '{"spec": {"version":"4.0.9"}}'

For a list of operand versions supported by the DataStage operator, see Operator and operand versions.
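
To find out which upgrade plan is currently in effect, you can check whether the version entry is present in the custom resource. The following is a minimal sketch, assuming the custom resource uses the recommended name datastage; the function datastage_upgrade_mode is a hypothetical helper.

```shell
# Report whether DataStage in the given project is pinned to a version
# (manual upgrade) or has no version entry (automatic upgrade).
datastage_upgrade_mode() {
  local project="$1" version
  if ! version="$(oc get DataStage datastage --namespace "$project" \
        -o jsonpath='{.spec.version}' 2>/dev/null)"; then
    echo "could not read the DataStage custom resource in $project" >&2
    return 1
  fi
  if [ -n "$version" ]; then
    echo "manual (pinned at $version)"
  else
    echo "automatic"
  fi
}

# Example usage (project-name is a placeholder):
#   datastage_upgrade_mode project-name
```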

What to do next

The service is ready to use. For details, see Transforming data (DataStage).