Upgrading Analytics Engine Powered by Apache Spark from Version 3.5 to Version 4.6

A project administrator can upgrade Analytics Engine Powered by Apache Spark from Cloud Pak for Data Version 3.5 to Version 4.6.0, 4.6.1, or 4.6.2.

Important: To complete this task, you must be running Analytics Engine Powered by Apache Spark Version 3.5.0 or later.
Supported upgrade paths
If you are running Version 3.5.0 or later, you can upgrade to Versions 4.6.0 through 4.6.2.
Unsupported upgrade paths
You cannot upgrade directly from Version 3.5 to Version 4.6.3 or later. You must first upgrade to Version 4.6.2 before you upgrade to Version 4.6.3 or later.
What permissions do you need to complete this task?
The permissions that you need depend on which tasks you must complete:
  • To create the Analytics Engine Powered by Apache Spark operators, you must have the appropriate permissions to create operators and you must be an administrator of the project where the Cloud Pak for Data operators are installed. This project is identified by the ${PROJECT_CPD_OPS} environment variable.
  • To upgrade Analytics Engine Powered by Apache Spark, you must be an administrator of the project where Analytics Engine Powered by Apache Spark is installed. This project is identified by the ${PROJECT_CPD_INSTANCE} environment variable.
When do you need to complete this task?
If you didn't upgrade Analytics Engine Powered by Apache Spark when you upgraded the platform, you can complete this task to upgrade your existing Analytics Engine Powered by Apache Spark installation.

If you want to upgrade all of the Cloud Pak for Data components at the same time, follow the process in Upgrading the platform and services instead.

Important: All of the Cloud Pak for Data components in a deployment must be installed at the same release.

Information you need to complete this task

Review the following information before you upgrade Analytics Engine Powered by Apache Spark:

Environment variables
The commands in this task use environment variables so that you can run the commands exactly as written.
  • If you don't have the script that defines the environment variables, see Setting up installation environment variables.
  • To use the environment variables from the script, you must source the environment variables before you run the commands in this task, for example:
    source ./cpd_vars.sh
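For reference, a minimal sketch of what such a script might define. All of the values shown are illustrative placeholders, not real credentials or project names; your generated script will contain your own values.

```shell
# Hypothetical excerpt of cpd_vars.sh; every value is a placeholder.
export OCP_URL=https://api.cluster.example.com:6443   # OpenShift API server URL
export OCP_USERNAME=cluster-admin                     # user with sufficient permissions
export OCP_PASSWORD=change-me                         # password for OCP_USERNAME
export PROJECT_CPD_OPS=cpd-operators                  # project where the Cloud Pak for Data operators are installed
export PROJECT_CPD_INSTANCE=cpd-instance              # project where the control plane is installed
export VERSION=4.6.2                                  # target release
```

Sourcing the script (source ./cpd_vars.sh) exports these variables into your current shell session so that the later commands in this task resolve them.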
Security context constraint requirements
Analytics Engine Powered by Apache Spark uses the restricted security context constraint (SCC).
Installation location
Analytics Engine Powered by Apache Spark is installed in the same project (namespace) as the Cloud Pak for Data control plane. This project is identified by the ${PROJECT_CPD_INSTANCE} environment variable.
Storage requirements
You don't need to specify storage when you upgrade Analytics Engine Powered by Apache Spark.

Before you begin

This task assumes that the following prerequisites are met:

  • The cluster meets the minimum requirements for Analytics Engine Powered by Apache Spark. If this task is not complete, see System requirements.
  • The workstation from which you will run the upgrade is set up as a client workstation and includes the following command-line interfaces:
      ◦ Cloud Pak for Data CLI: cpd-cli
      ◦ OpenShift® CLI: oc
    If this task is not complete, see Setting up a client workstation.
  • The Cloud Pak for Data control plane is upgraded. If this task is not complete, see Upgrading the platform and services.
  • For environments that use a private container registry, such as air-gapped environments, the Analytics Engine Powered by Apache Spark software images are mirrored to the private container registry. If this task is not complete, see Mirroring images to a private container registry.
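As a quick sanity check for the client-workstation prerequisite, a sketch like the following reports whether the required CLIs are on the PATH:

```shell
# Check that the CLIs needed for the upgrade are available on this workstation.
# Prints one line per CLI; a "NOT found" line means the client workstation
# setup is incomplete.
for cli in cpd-cli oc; do
  if command -v "$cli" >/dev/null 2>&1; then
    echo "$cli: found ($(command -v "$cli"))"
  else
    echo "$cli: NOT found - see Setting up a client workstation"
  fi
done
```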

Procedure

Complete the following tasks to upgrade Analytics Engine Powered by Apache Spark:

  1. Logging in to the cluster
  2. Installing the operator
  3. Upgrading the service
  4. Validating the upgrade
  5. Upgrading existing service instances
  6. What to do next

Logging in to the cluster

To run cpd-cli manage commands, you must log in to the cluster.

To log in to the cluster:

  1. Run the cpd-cli manage login-to-ocp command to log in to the cluster as a user with sufficient permissions to complete this task. For example:
    cpd-cli manage login-to-ocp \
    --username=${OCP_USERNAME} \
    --password=${OCP_PASSWORD} \
    --server=${OCP_URL}
    Tip: The login-to-ocp command takes the same input as the oc login command. Run oc login --help for details.

Installing the operator

The Analytics Engine Powered by Apache Spark operator simplifies the process of managing the Analytics Engine Powered by Apache Spark service on Red Hat® OpenShift Container Platform.

To upgrade Analytics Engine Powered by Apache Spark, you must install the Analytics Engine Powered by Apache Spark operator and create the Operator Lifecycle Manager (OLM) objects, such as the catalog source and subscription, for the operator.

Who needs to complete this task?
You must be a cluster administrator (or a user with the appropriate permissions to install operators) to create the OLM objects.
When do you need to complete this task?
Complete this task if the Analytics Engine Powered by Apache Spark operator and other OLM artifacts have not been created for the current release.

You do not need to run this command separately for each service that you plan to upgrade. If you complete this task and the OLM artifacts already exist on the cluster, the cpd-cli recreates the OLM objects for all of the existing components in the ${PROJECT_CPD_OPS} project.

To install the operator:

  1. Create the OLM objects for Analytics Engine Powered by Apache Spark:
    cpd-cli manage apply-olm \
    --release=${VERSION} \
    --cpd_operator_ns=${PROJECT_CPD_OPS} \
    --components=analyticsengine
    • If the command succeeds, it returns [SUCCESS]... The apply-olm command ran successfully.
    • If the command fails, it returns [ERROR] and includes information about the cause of the failure.
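If you capture the command output in a log file, you can also script the success check. The following is a sketch only: LOG_FILE is an assumed path of your choosing, and the check keys off the [SUCCESS] and [ERROR] markers quoted above.

```shell
# Sketch: check a captured cpd-cli log for the success marker.
# LOG_FILE is an assumed path; in practice you would have captured output with:
#   cpd-cli manage apply-olm ... | tee "${LOG_FILE}"
LOG_FILE=${LOG_FILE:-apply-olm.log}
# For illustration only: write the success line that the command prints.
echo "[SUCCESS]... The apply-olm command ran successfully" > "${LOG_FILE}"

if grep -q '\[SUCCESS\]' "${LOG_FILE}"; then
  echo "OLM objects created"
else
  echo "apply-olm failed; search ${LOG_FILE} for [ERROR] details" >&2
fi
```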

What to do next: Upgrade the Analytics Engine Powered by Apache Spark service.

Upgrading the service

After the Analytics Engine Powered by Apache Spark operator is installed, you can upgrade Analytics Engine Powered by Apache Spark.

Who needs to complete this task?
You must be an administrator of the project where Analytics Engine Powered by Apache Spark is installed.
When do you need to complete this task?
Complete this task for each instance of Analytics Engine Powered by Apache Spark that is associated with an instance of Cloud Pak for Data Version 4.6.

To upgrade the service:

  1. Create the custom resource for Analytics Engine Powered by Apache Spark.
    cpd-cli manage apply-cr \
    --components=analyticsengine \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --license_acceptance=true
    Remember: To specify advanced configuration options for Analytics Engine Powered by Apache Spark, add the following line after the --cpd_instance_ns entry:
    --param-file=/tmp/work/install-options.yml \
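    The parameter file referenced by --param-file might look like the following sketch. The keys under analyticsengine are hypothetical placeholders for illustration only; use the installation options that are documented for your release.

```yaml
# Hypothetical /tmp/work/install-options.yml; the option names shown are
# placeholders, not confirmed Analytics Engine settings.
custom_spec:
  analyticsengine:
    serviceConfig:
      exampleOption: true
```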

Validating the upgrade

Analytics Engine Powered by Apache Spark is upgraded when the apply-cr command returns [SUCCESS]... The apply-cr command ran successfully.

However, you can run the cpd-cli manage get-cr-status command if you want to confirm that the custom resource status is Completed:

cpd-cli manage get-cr-status \
--cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
--components=analyticsengine

Upgrading existing service instances

Upgrading a service instance from Cloud Pak for Data Version 3.5 to Version 4.x is not supported. You must create a new service instance to use the features that are introduced in Cloud Pak for Data Version 4.x.

What to do next

  1. Before you can submit Spark jobs by using the Spark jobs API, you must provision a service instance. See Provisioning the service instance.
  2. If you used self-signed certificates or CA certificates to securely connect between the Spark runtime and your resources, you need to add these certificates to the Spark truststore again after upgrading Analytics Engine Powered by Apache Spark. For details, see Adding self-signed certificates in Analytics Engine Powered by Apache Spark.
  3. Analytics Engine Powered by Apache Spark is ready to use. For details, see Extending analytics using Spark.