Upgrading Analytics Engine Powered by Apache Spark from Version 3.5 to Version 4.6.0 - 4.6.2
A project administrator can upgrade Analytics Engine Powered by Apache Spark from Cloud Pak for Data Version 3.5 to Version 4.6.0, 4.6.1, or 4.6.2.
- Supported upgrade paths
- If you are running 3.5.0 or later, you can upgrade to Versions 4.6.0 - 4.6.2.
- Unsupported upgrade paths
- You cannot upgrade from Version 3.5 to Version 4.6.3 or later. You must upgrade to 4.6.2 before you upgrade to 4.6.3 or later.
- What permissions do you need to complete this task?
- The permissions that you need depend on which tasks you must complete:
- To create the Analytics Engine Powered by Apache Spark operators, you must have the appropriate permissions to create operators and you must be an administrator of the project where the Cloud Pak for Data operators are installed. This project is identified by the ${PROJECT_CPD_OPS} environment variable.
- To upgrade Analytics Engine Powered by Apache Spark, you must be an administrator of the project where Analytics Engine Powered by Apache Spark is installed. This project is identified by the ${PROJECT_CPD_INSTANCE} environment variable.
- When do you need to complete this task?
- If you didn't upgrade Analytics Engine Powered by Apache Spark when you upgraded the platform, you can complete this task to upgrade your existing Analytics Engine Powered by Apache Spark installation.
If you want to upgrade all of the Cloud Pak for Data components at the same time, follow the process in Upgrading the platform and services instead.
Important: All of the Cloud Pak for Data components in a deployment must be installed at the same release.
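The supported and unsupported upgrade paths listed above can be checked in a script before you start. The following is a minimal sketch, not part of the product tooling: is_direct_target is a hypothetical helper name, and it assumes that the target release is held in the ${VERSION} variable that the commands in this task use.

```shell
# Sketch: decide whether a target release can be reached directly from
# Version 3.5. Per the upgrade paths above, only 4.6.0, 4.6.1, and 4.6.2
# qualify. is_direct_target is a hypothetical helper, not a cpd-cli command.
is_direct_target() {
  case "$1" in
    4.6.0|4.6.1|4.6.2) return 0 ;;  # supported direct targets
    *) return 1 ;;                  # anything else needs an intermediate step
  esac
}

# Example guard (commented out so that sourcing this file has no side effects):
# if ! is_direct_target "${VERSION}"; then
#   echo "Upgrade to 4.6.2 first, then upgrade to ${VERSION}." >&2
# fi
```

The function only compares strings, so it does not contact the cluster.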
Information you need to complete this task
Review the following information before you upgrade Analytics Engine Powered by Apache Spark:
- Environment variables
- The commands in this task use environment variables so that you can run the commands exactly as written.
- If you don't have the script that defines the environment variables, see Setting up installation environment variables.
- To use the environment variables from the script, you must source the environment variables before you run the commands in this task, for example:
source ./cpd_vars.sh
- Security context constraint requirements
- Analytics Engine Powered by Apache Spark uses the restricted security context constraint (SCC).
- Installation location
- Analytics Engine Powered by Apache Spark is installed in the same project (namespace) as the Cloud Pak for Data control plane. This project is identified by the ${PROJECT_CPD_INSTANCE} environment variable.
- Storage requirements
- You don't need to specify storage when you upgrade Analytics Engine Powered by Apache Spark.
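Because the commands in this task rely on the environment variables described above, it can help to confirm that they are set after you source the script. The following is a minimal sketch under stated assumptions: check_cpd_vars is a hypothetical helper name, and the list contains only the variable names that appear in the commands in this task. Adjust the list if your login method uses different variables.

```shell
# Sketch: confirm that the variables used by the commands in this task are
# set and non-empty. check_cpd_vars is a hypothetical helper name.
check_cpd_vars() {
  missing=0
  for name in VERSION PROJECT_CPD_OPS PROJECT_CPD_INSTANCE \
              OCP_USERNAME OCP_PASSWORD OCP_URL; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (commented out so that sourcing this file has no side effects):
# . ./cpd_vars.sh && check_cpd_vars
```

The function returns 0 when every variable is set and 1 otherwise, and it reports each missing name on standard error.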
Before you begin
This task assumes that the following prerequisites are met:
Prerequisite | Where to find more information |
---|---|
The cluster meets the minimum requirements for Analytics Engine Powered by Apache Spark. | If this task is not complete, see System requirements. |
The workstation from which you will run the upgrade is set up as a client workstation and includes the required command-line interfaces. | If this task is not complete, see Setting up a client workstation. |
The Cloud Pak for Data control plane is upgraded. | If this task is not complete, see Upgrading the platform and services. |
For environments that use a private container registry, such as air-gapped environments, the Analytics Engine Powered by Apache Spark software images are mirrored to the private container registry. | If this task is not complete, see Mirroring images to a private container registry. |
Procedure
Complete the following tasks to upgrade Analytics Engine Powered by Apache Spark:
Logging in to the cluster
To run cpd-cli manage commands, you must log in to the cluster.
To log in to the cluster:
- Run the cpd-cli manage login-to-ocp command to log in to the cluster as a user with sufficient permissions to complete this task. For example:
cpd-cli manage login-to-ocp \
--username=${OCP_USERNAME} \
--password=${OCP_PASSWORD} \
--server=${OCP_URL}
Tip: The login-to-ocp command takes the same input as the oc login command. Run oc login --help for details.
Installing the operator
The Analytics Engine Powered by Apache Spark operator simplifies the process of managing the Analytics Engine Powered by Apache Spark service on Red Hat® OpenShift Container Platform.
To upgrade Analytics Engine Powered by Apache Spark, you must install the Analytics Engine Powered by Apache Spark operator and create the Operator Lifecycle Manager (OLM) objects, such as the catalog source and subscription, for the operator.
- Who needs to complete this task?
- You must be a cluster administrator (or a user with the appropriate permissions to install operators) to create the OLM objects.
- When do you need to complete this task?
- Complete this task if the Analytics Engine Powered by Apache Spark operator and other OLM artifacts have not been created for the current release.
It is not necessary to run this command multiple times for each service that you plan to upgrade. If you complete this task and the OLM artifacts already exist on the cluster, the cpd-cli will recreate the OLM objects for all of the existing components in the ${PROJECT_CPD_OPS} project.
To install the operator:
- Create the OLM objects for Analytics Engine Powered by Apache Spark:
cpd-cli manage apply-olm \
--release=${VERSION} \
--cpd_operator_ns=${PROJECT_CPD_OPS} \
--components=analyticsengine
- If the command succeeds, it returns [SUCCESS]... The apply-olm command ran successfully.
- If the command fails, it returns [ERROR] and includes information about the cause of the failure.
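The [SUCCESS] and [ERROR] markers described above can be checked in a script once the command output has been captured. The following is a minimal sketch: check_cpd_output is a hypothetical helper name, and treating [ERROR] as taking precedence when both markers appear is an assumption, not documented behavior.

```shell
# Sketch: classify captured cpd-cli output by the markers described above.
# check_cpd_output is a hypothetical helper name.
# Returns 0 for success, 1 for an error, 2 when neither marker is present.
check_cpd_output() {
  case "$1" in
    *"[ERROR]"*)   return 1 ;;  # assumption: an error marker takes precedence
    *"[SUCCESS]"*) return 0 ;;
    *)             return 2 ;;
  esac
}

# Example (commented out so that sourcing this file has no side effects):
# output=$(cpd-cli manage apply-olm \
#   --release="${VERSION}" \
#   --cpd_operator_ns="${PROJECT_CPD_OPS}" \
#   --components=analyticsengine 2>&1)
# check_cpd_output "$output" || echo "apply-olm did not report success" >&2
```

The quoted brackets in the case patterns are matched literally, so the check looks for the exact marker text.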
What to do next: Upgrade the Analytics Engine Powered by Apache Spark service.
Upgrading the service
After the Analytics Engine Powered by Apache Spark operator is installed, you can upgrade Analytics Engine Powered by Apache Spark.
- Who needs to complete this task?
- You must be an administrator of the project where Analytics Engine Powered by Apache Spark is installed.
- When do you need to complete this task?
- Complete this task for each instance of Analytics Engine Powered by Apache Spark that is associated with an instance of Cloud Pak for Data Version 4.6.
To upgrade the service:
- Create the custom resource for Analytics Engine Powered by Apache Spark.
cpd-cli manage apply-cr \
--components=analyticsengine \
--release=${VERSION} \
--cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
--license_acceptance=true
Remember: To specify advanced configuration options for Analytics Engine Powered by Apache Spark, add the following line after the --cpd_instance_ns entry:
--param-file=/tmp/work/install-options.yml \
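If you script this step, the optional --param-file entry can be added only when a parameter file is supplied. The following is a minimal sketch, not an official wrapper: run_apply_cr is a hypothetical helper name, and CPD_CLI is a hypothetical override variable that exists only so that the assembled command can be printed (for example, with echo) instead of run.

```shell
# Sketch: assemble the apply-cr command, inserting --param-file after the
# --cpd_instance_ns entry only when a file path is passed as the first
# argument. run_apply_cr and CPD_CLI are hypothetical names.
run_apply_cr() {
  param_file="${1:-}"
  set -- manage apply-cr \
    --components=analyticsengine \
    --release="${VERSION}" \
    --cpd_instance_ns="${PROJECT_CPD_INSTANCE}"
  if [ -n "$param_file" ]; then
    set -- "$@" --param-file="$param_file"
  fi
  set -- "$@" --license_acceptance=true
  # Runs cpd-cli unless CPD_CLI names a different program.
  "${CPD_CLI:-cpd-cli}" "$@"
}

# Examples (commented out so that sourcing this file has no side effects):
# run_apply_cr                                   # no advanced options
# run_apply_cr /tmp/work/install-options.yml     # with advanced options
# CPD_CLI=echo run_apply_cr                      # print the arguments only
```

Building the argument list with set -- keeps each option as a separate word, which avoids quoting problems when a value contains spaces.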
Validating the upgrade
Analytics Engine Powered by Apache Spark is upgraded when the apply-cr command returns [SUCCESS]... The apply-cr command ran successfully.
However, you can optionally run the cpd-cli manage get-cr-status command if you want to confirm that the custom resource status is Completed:
cpd-cli manage get-cr-status \
--cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
--components=analyticsengine
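If you want to wait for the Completed status in a script, the status command above can be polled. The following is a rough sketch under stated assumptions: wait_for_completed and its arguments are hypothetical, and matching the word Completed anywhere in the raw command output is an assumption about the output format, not documented behavior.

```shell
# Sketch: run a status command repeatedly until its output contains
# "Completed". Usage: wait_for_completed ATTEMPTS DELAY_SECONDS COMMAND...
# The helper name and the plain-text match are assumptions.
wait_for_completed() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # "$@" is the status command; grep -q succeeds on the first match.
    if "$@" 2>&1 | grep -q "Completed"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1  # status was not seen within the allowed attempts
}

# Example (commented out so that sourcing this file has no side effects):
# wait_for_completed 30 60 cpd-cli manage get-cr-status \
#   --cpd_instance_ns="${PROJECT_CPD_INSTANCE}" \
#   --components=analyticsengine
```

The attempt count and delay in the example are illustrative values only; choose limits that suit your cluster.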
Upgrading existing service instances
Upgrading a service instance from Cloud Pak for Data 3.5 to 4.x.x is not supported. To use the new features introduced in Cloud Pak for Data 4.x.x, you must create a new instance.
What to do next
- Before you can submit Spark jobs by using the Spark jobs API, you must provision a service instance. See Provisioning the service instance.
- If you used self-signed certificates or CA certificates to securely connect between the Spark runtime and your resources, you need to add these certificates to the Spark truststore again after upgrading Analytics Engine Powered by Apache Spark. For details, see Adding self-signed certificates in Analytics Engine Powered by Apache Spark.
- Analytics Engine Powered by Apache Spark is ready to use. For details, see Extending analytics using Spark.