Upgrading DataStage from Version 4.0 to Version 4.6

A project administrator can upgrade DataStage from Cloud Pak for Data Version 4.0 to Version 4.6.

Important: To complete this task, you must be running DataStage Version 4.0.2 or later. (Version 4.0.2 was released with Cloud Pak for Data Version 4.0 Refresh 2.)
Supported upgrade paths
If you are running DataStage Version 4.0.2 or later, you can upgrade to Versions 4.6.0 - 4.6.2.
Unsupported upgrade paths
You cannot upgrade directly from Version 4.0 to Version 4.6.3 or later. You must first upgrade to Version 4.6.2, and then upgrade to Version 4.6.3 or later.
What permissions do you need to complete this task?
The permissions that you need depend on which tasks you must complete:
  • To update the DataStage operators, you must have the appropriate permissions to create operators and you must be an administrator of the project where the Cloud Pak for Data operators are installed. This project is identified by the ${PROJECT_CPD_OPS} environment variable.
  • To upgrade DataStage, you must be an administrator of the project where DataStage is installed. This project is identified by the ${PROJECT_CPD_INSTANCE} environment variable.
When do you need to complete this task?
If you didn't upgrade DataStage when you upgraded the platform, you can complete this task to upgrade your existing DataStage installation.

If you want to upgrade all of the Cloud Pak for Data components at the same time, follow the process in Upgrading the platform and services instead.

Important: All of the Cloud Pak for Data components in a deployment must be installed at the same release.
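
The supported and unsupported upgrade paths described earlier in this topic can be expressed as a small version check. The following is a minimal sketch, not part of the product: the function names version_le and can_upgrade_directly are hypothetical, and the sketch assumes a sort command that supports the -V (version sort) option, as GNU coreutils does.

```shell
# version_le A B
# Succeeds if version A is less than or equal to version B.
# Assumption: sort supports -V (version sort).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# can_upgrade_directly CURRENT TARGET
# Encodes the rules from this topic: the current version must be
# 4.0.2 or later within Version 4.0, and the target of a direct
# upgrade must be in the range 4.6.0 - 4.6.2.
can_upgrade_directly() {
  case "$1" in
    4.0.*) ;;
    *) return 1 ;;
  esac
  version_le "4.0.2" "$1" && version_le "4.6.0" "$2" && version_le "$2" "4.6.2"
}

# Example:
#   can_upgrade_directly 4.0.5 4.6.2 && echo "Direct upgrade is supported"
```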

Information you need to complete this task

Review the following information before you upgrade DataStage:

Environment variables
The commands in this task use environment variables so that you can run the commands exactly as written.
  • If you don't have the script that defines the environment variables, see Setting up installation environment variables.
  • To use the environment variables from the script, you must source the environment variables before you run the commands in this task, for example:
    source ./cpd_vars.sh
Installation location
DataStage is installed in the same project (namespace) as the Cloud Pak for Data control plane. This project is identified by the ${PROJECT_CPD_INSTANCE} environment variable.
Common core services
DataStage requires the Cloud Pak for Data common core services.

If the common core services are not at the required version for the release, the common core services will be automatically upgraded when you upgrade DataStage. This increases the amount of time the upgrade takes to complete.

Storage requirements
You must specify the storage that your existing DataStage installation uses. You cannot change the storage that is associated with DataStage during an upgrade. Ensure that the storage environment variables point to the correct storage classes for your environment.
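
Because the commands in this task depend on the environment variables, it can help to confirm that they are set after you source the script. The following is a minimal sketch; require_vars is a hypothetical helper, not part of cpd-cli, and the variable names in the example are the ones that this task uses.

```shell
# require_vars NAME...
# Reports every variable that is unset or empty and returns non-zero
# if at least one is missing. Uses eval for portability across shells.
require_vars() {
  rv_missing=0
  for rv_name in "$@"; do
    eval "rv_value=\${${rv_name}:-}"
    if [ -z "$rv_value" ]; then
      echo "Missing or empty environment variable: ${rv_name}" >&2
      rv_missing=1
    fi
  done
  return "$rv_missing"
}

# Example:
#   source ./cpd_vars.sh
#   require_vars VERSION PROJECT_CPD_OPS PROJECT_CPD_INSTANCE \
#     STG_CLASS_BLOCK STG_CLASS_FILE || echo "Fix the variables before you continue"
```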

Before you begin

This task assumes that the following prerequisites are met:

Prerequisites and where to find more information:
  • The cluster meets the minimum requirements for DataStage. If this task is not complete, see System requirements.
  • The workstation from which you will run the upgrade is set up as a client workstation and includes the Cloud Pak for Data CLI (cpd-cli) and the OpenShift® CLI (oc). If this task is not complete, see Setting up a client workstation.
  • The Cloud Pak for Data control plane is upgraded. If this task is not complete, see Upgrading the platform and services.
  • For environments that use a private container registry, such as air-gapped environments, the DataStage software images are mirrored to the private container registry. If this task is not complete, see Mirroring images to a private container registry.
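
The client workstation prerequisite can be checked quickly from the command line. The following is a minimal sketch; check_clis is a hypothetical helper that only confirms that each command is on the PATH, and it does not verify versions.

```shell
# check_clis NAME...
# Prints whether each command is available on the PATH and returns
# non-zero if any of them is missing.
check_clis() {
  cc_status=0
  for cc_cli in "$@"; do
    if command -v "$cc_cli" >/dev/null 2>&1; then
      echo "found: ${cc_cli}"
    else
      echo "not found on PATH: ${cc_cli}" >&2
      cc_status=1
    fi
  done
  return "$cc_status"
}

# Example:
#   check_clis cpd-cli oc
```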

Procedure

Complete the following tasks to upgrade DataStage:

Logging in to the cluster

To run cpd-cli manage commands, you must log in to the cluster.

To log in to the cluster:

  1. Run the cpd-cli manage login-to-ocp command to log in to the cluster as a user with sufficient permissions to complete this task. For example:
    cpd-cli manage login-to-ocp \
    --username=${OCP_USERNAME} \
    --password=${OCP_PASSWORD} \
    --server=${OCP_URL}
    Tip: The login-to-ocp command takes the same input as the oc login command. Run oc login --help for details.

Removing patches before upgrading

If patches were applied to DataStage through image digest overrides, you must remove them before you upgrade.

Complete the following steps to remove the patches:
  1. Remove digest overrides from all PX runtime custom resources (CRs) by running:
    oc -n ${PROJECT_CPD_INSTANCE} get pxruntime | awk 'NR>1 { print $1 }' | xargs -I % oc -n ${PROJECT_CPD_INSTANCE} patch pxruntime % --type='json' -p='[{"op": "remove", "path": "/spec/image_digests"}]'

  2. Remove digest overrides from the DataStage custom resource (CR) by running:
    oc -n ${PROJECT_CPD_INSTANCE} patch datastage datastage --type='json' -p='[{"op": "remove", "path": "/spec/image_digests"}, {"op": "remove", "path": "/spec/custom/image_digests"}]'
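
A JSON patch "remove" operation fails when the target path does not exist, and a patch that contains several operations is rejected as a whole if any one of them fails. It can therefore be useful to check which override fields are present before you run the commands above. The following is a minimal sketch; has_field is a hypothetical helper that wraps oc get with a JSONPath output expression, and the namespace is passed as an argument.

```shell
# has_field NAMESPACE KIND NAME JSONPATH
# Succeeds if the JSONPath expression returns a non-empty value
# for the resource, which indicates that the field is set.
has_field() {
  hf_value="$(oc -n "$1" get "$2" "$3" -o "jsonpath=$4" 2>/dev/null)"
  [ -n "$hf_value" ]
}

# Example (replace <project> with the project where DataStage is installed):
#   has_field <project> datastage datastage '{.spec.image_digests}' \
#     && echo "spec.image_digests is set"
#   has_field <project> datastage datastage '{.spec.custom.image_digests}' \
#     && echo "spec.custom.image_digests is set"
```

If only one of the two fields is set on the DataStage custom resource, include only the matching "remove" operation in the patch.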

Specifying which edition to upgrade

DataStage is available in two different editions: DataStage Enterprise and DataStage Enterprise Plus. You must specify which edition to upgrade.

Set the ${DATASTAGE_TYPE} environment variable to the edition of DataStage that you want to upgrade:
  • For DataStage Enterprise, run:
    export DATASTAGE_TYPE=datastage_ent
  • For DataStage Enterprise Plus, run:
    export DATASTAGE_TYPE=datastage_ent_plus
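
Because later commands pass this variable to the --components option, a typo here surfaces only when those commands run. The following is a minimal sketch of an early check; validate_datastage_type is a hypothetical helper that accepts only the two values listed above.

```shell
# validate_datastage_type VALUE
# Succeeds only for the two edition values that this topic lists.
validate_datastage_type() {
  case "${1:-}" in
    datastage_ent|datastage_ent_plus)
      return 0
      ;;
    *)
      echo "Unexpected DATASTAGE_TYPE: '${1:-}'" >&2
      return 1
      ;;
  esac
}

# Example:
#   validate_datastage_type "${DATASTAGE_TYPE}" || echo "Set DATASTAGE_TYPE first"
```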

Updating the operator

The DataStage operator simplifies the process of managing the DataStage service on Red Hat® OpenShift Container Platform.

To upgrade DataStage, ensure that all of the Operator Lifecycle Manager (OLM) objects in the ${PROJECT_CPD_OPS} project, such as the catalog sources and subscriptions, are upgraded to the appropriate release. All of the OLM objects must be at the same release.

Who needs to complete this task?
You must be a cluster administrator (or a user with the appropriate permissions to install operators) to create the OLM objects.
When do you need to complete this task?
Complete this task before you upgrade the DataStage service.

To update the operator:

  1. Update the OLM objects for Cloud Pak for Data, IBM Cloud Pak® foundational services, and any other services on the cluster.
    Important: Run this command only if you have not updated the OLM objects for these components.

    You do not need to run this command again if the OLM objects are already at the release that is specified by the --release=${VERSION} option.

    cpd-cli manage apply-olm \
    --release=${VERSION} \
    --cpd_operator_ns=${PROJECT_CPD_OPS} \
    --upgrade=true
  2. Note: Complete this step only if you are running DataStage Enterprise Plus.
    Update the OLM objects for DataStage:
    cpd-cli manage apply-olm \
    --release=${VERSION} \
    --cpd_operator_ns=${PROJECT_CPD_OPS} \
    --components=${DATASTAGE_TYPE}
    
    • If the command succeeds, it returns [SUCCESS]... The apply-olm command ran successfully.
    • If the command fails, it returns [ERROR] and includes information about the cause of the failure.
  3. If you ran step 2 and do not have Watson™ Knowledge Catalog installed, run the following command to remove the previous catalog source:
    oc -n openshift-marketplace delete catalogsource ibm-cpd-datastage-operator-catalog
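
If you save the output of the apply-olm command to a file, you can check it for the [SUCCESS] and [ERROR] markers that step 2 describes. The following is a minimal sketch; classify_result is a hypothetical helper, and it assumes only that those markers appear literally in the output.

```shell
# classify_result
# Reads command output from standard input. Prints "error" and returns 1
# if the output contains [ERROR], prints "success" and returns 0 if it
# contains [SUCCESS], and otherwise prints "unknown" and returns 2.
classify_result() {
  cr_log="$(cat)"
  if printf '%s\n' "$cr_log" | grep -qF '[ERROR]'; then
    echo "error"
    return 1
  fi
  if printf '%s\n' "$cr_log" | grep -qF '[SUCCESS]'; then
    echo "success"
    return 0
  fi
  echo "unknown"
  return 2
}

# Example (apply-olm.log is a hypothetical file that holds the saved output):
#   classify_result < apply-olm.log
```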

What to do next: Upgrade the DataStage service.

Upgrading the service

After the DataStage operator is updated, you can upgrade DataStage.

Who needs to complete this task?
You must be an administrator of the project where DataStage is installed.
When do you need to complete this task?
Complete this task for each instance of DataStage that is associated with an instance of Cloud Pak for Data Version 4.6.

To upgrade the service:

  1. Update the custom resource for DataStage.

    The command that you run depends on the storage on your cluster:


    Red Hat OpenShift Data Foundation storage

    Run the following command to update the custom resource.

    cpd-cli manage apply-cr \
    --components=${DATASTAGE_TYPE} \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --block_storage_class=${STG_CLASS_BLOCK} \
    --file_storage_class=${STG_CLASS_FILE} \
    --license_acceptance=true \
    --upgrade=true

    IBM Storage Scale Container Native storage

    Run the following command to update the custom resource.

    Remember: When you use IBM Storage Scale Container Native storage, both ${STG_CLASS_BLOCK} and ${STG_CLASS_FILE} point to the same storage class.
    cpd-cli manage apply-cr \
    --components=${DATASTAGE_TYPE} \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --block_storage_class=${STG_CLASS_BLOCK} \
    --file_storage_class=${STG_CLASS_FILE} \
    --license_acceptance=true \
    --upgrade=true

    Portworx storage

    Run the following command to update the custom resource.

    cpd-cli manage apply-cr \
    --components=${DATASTAGE_TYPE} \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --storage_vendor=portworx \
    --license_acceptance=true \
    --upgrade=true

    NFS storage

    Run the following command to update the custom resource.

    Remember: When you use NFS storage, both ${STG_CLASS_BLOCK} and ${STG_CLASS_FILE} point to the same storage class.
    cpd-cli manage apply-cr \
    --components=${DATASTAGE_TYPE} \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --block_storage_class=${STG_CLASS_BLOCK} \
    --file_storage_class=${STG_CLASS_FILE} \
    --license_acceptance=true \
    --upgrade=true

    IBM Cloud with IBM Cloud File Storage only

    Run the following command to update the custom resource.

    Remember: When you use IBM Cloud File Storage, both ${STG_CLASS_BLOCK} and ${STG_CLASS_FILE} point to the same storage class.
    cpd-cli manage apply-cr \
    --components=${DATASTAGE_TYPE} \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --block_storage_class=${STG_CLASS_BLOCK} \
    --file_storage_class=${STG_CLASS_FILE} \
    --license_acceptance=true \
    --upgrade=true

    IBM Cloud with IBM Cloud File Storage and IBM Cloud Block Storage

    Run the following command to update the custom resource.

    cpd-cli manage apply-cr \
    --components=${DATASTAGE_TYPE} \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --block_storage_class=${STG_CLASS_BLOCK} \
    --file_storage_class=${STG_CLASS_FILE} \
    --license_acceptance=true \
    --upgrade=true
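
The commands above differ only in their storage options: Portworx uses --storage_vendor=portworx, and every other storage type passes the block and file storage classes. The following is a minimal sketch that captures that difference; storage_flags and the short storage type labels are hypothetical names that are used for illustration only.

```shell
# storage_flags TYPE
# Prints the storage options for the apply-cr command.
# The TYPE labels are illustrative, not values that cpd-cli recognizes.
storage_flags() {
  case "${1:-}" in
    portworx)
      echo "--storage_vendor=portworx"
      ;;
    odf|scale|nfs|ibmcloud-file|ibmcloud-file-block)
      echo "--block_storage_class=${STG_CLASS_BLOCK} --file_storage_class=${STG_CLASS_FILE}"
      ;;
    *)
      echo "Unknown storage type: '${1:-}'" >&2
      return 1
      ;;
  esac
}

# Example (the substitution is left unquoted on purpose so that the
# options are passed as separate arguments):
#   cpd-cli manage apply-cr \
#     --components=${DATASTAGE_TYPE} \
#     --release=${VERSION} \
#     --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
#     $(storage_flags nfs) \
#     --license_acceptance=true \
#     --upgrade=true
```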

Validating the upgrade

DataStage is upgraded when the apply-cr command returns [SUCCESS]... The apply-cr command ran successfully.

Optionally, run the cpd-cli manage get-cr-status command to confirm that the custom resource status is Completed:

cpd-cli manage get-cr-status \
--cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
--components=${DATASTAGE_TYPE}
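
If you check the status while the upgrade is still in progress, you might want to poll until the status is Completed. The following is a minimal sketch of a generic polling loop; wait_for_status is a hypothetical helper, and the command that prints the status is supplied by you because the exact output format of get-cr-status is not described in this topic. The oc command in the example, including the .status.dsStatus field, is an assumption to verify on your cluster.

```shell
# wait_for_status EXPECTED ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until its output equals EXPECTED.
# Returns 0 on a match and 1 if all attempts are used up.
wait_for_status() {
  ws_expected="$1"
  ws_attempts="$2"
  ws_delay="$3"
  shift 3
  ws_i=1
  while [ "$ws_i" -le "$ws_attempts" ]; do
    ws_status="$("$@" 2>/dev/null)"
    if [ "$ws_status" = "$ws_expected" ]; then
      echo "Status is ${ws_expected}"
      return 0
    fi
    echo "Attempt ${ws_i}/${ws_attempts}: status is '${ws_status}'" >&2
    sleep "$ws_delay"
    ws_i=$((ws_i + 1))
  done
  return 1
}

# Example (assumed status field; adjust it to match your custom resource):
#   wait_for_status Completed 30 60 \
#     oc -n "${PROJECT_CPD_INSTANCE}" get datastage datastage \
#     -o 'jsonpath={.status.dsStatus}'
```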

What to do next

The service is ready to use. See Transforming data.