Changing the Spark default runtime
IBM Cloud Pak for Data supports multiple Spark runtime versions. For more information about the supported Spark versions, see Supported Spark version. The default version is Spark 3.4. You can change the default Spark runtime version.
You can change the Spark runtime version in the following ways:
- At the service instance level:
  - By using the Cloud Pak for Data user interface. For more information about updating the default runtime, see Managing Analytics Engine powered by Apache Spark instances.
  - By using the APIs. For more information about configurations, see Changing the Spark default runtime version by using APIs.
- At the Spark job or kernel level:
  - By using Spark jobs. For more information, see Setting the Spark runtime by using Spark jobs.
  - By using Spark kernels. For more information, see Setting the Spark runtime by using Spark kernels.
Changing default Spark runtime at the instance level
You can change the default Spark runtime version at the service instance level. IBM Cloud Pak for Data supports changing the default runtime version by using the UI or the API. A Spark job or kernel run uses the default Spark version that is set at the instance level unless a version is specified at the job or kernel level.
Changing the Spark default runtime version by using APIs
To edit the Spark runtime version:

1. Get the service endpoint:
   - From the navigation menu, click Services > Instances, find the instance, and click it to view the instance details.
   - On the Configurations tab, get the instance ID from the Spark jobs V4 endpoint. The format of the endpoint is: https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/spark_applications
2. Generate an access token. For more information on generating an access token, see Generating an API authorization token.
3. Set the Cloud Pak for Data URL, instance ID, and access token by using the following commands:
   export CPD_URL="<CloudPakforData_URL>"
   export INSTANCE_ID="<INSTANCE_ID>"
   export ACCESS_TOKEN="<ACCESS_TOKEN>"
4. Set the default Spark runtime version. Example:
   curl -X PUT \
     "$CPD_URL/v4/analytics_engines/$INSTANCE_ID/default_runtime" \
     --header "Authorization: ZenApiKey $ACCESS_TOKEN" \
     --header 'Content-Type: application/json' \
     --data-raw '{ "spark_version": "3.4" }'
   spark_version is the default Spark runtime version that is used to run the Spark applications. IBM Cloud Pak for Data supports Spark 3.4.
Getting the instance default runtime
Run the following command to get the default runtime environment on which all workloads of the instance run.
curl -X GET \
"$CPD_URL/v4/analytics_engines/$INSTANCE_ID/default_runtime" \
--header "Authorization: ZenApiKey $ACCESS_TOKEN"
Sample response:
{
"spark_version": "3.4"
}
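As a quick check, you can parse the GET response and confirm which version is active. The sketch below uses the sample response shown above in place of live curl output, and assumes python3 is available on the client for JSON parsing:

```shell
# In a live environment you would capture the curl output, for example:
# RESPONSE=$(curl -s -X GET \
#   "$CPD_URL/v4/analytics_engines/$INSTANCE_ID/default_runtime" \
#   --header "Authorization: ZenApiKey $ACCESS_TOKEN")
# Here the sample response from this page stands in for it.
RESPONSE='{ "spark_version": "3.4" }'

# Extract the spark_version field (python3 assumed available on the client).
SPARK_VERSION=$(echo "$RESPONSE" | python3 -c 'import json, sys; print(json.load(sys.stdin)["spark_version"])')
echo "Default Spark runtime: $SPARK_VERSION"
```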
Changing Spark runtime at Spark job or kernel level
You can override the Spark runtime version that is set at the instance level when you submit a Spark job or start a kernel.
Setting the Spark runtime by using Spark jobs
You can set the runtime version by using Spark jobs. To do that:
1. Submit a Spark application by using the API. For more information on how to submit a Spark application, see Submitting Spark jobs via API.
2. Use the runtime.spark_version API parameter for the Spark application. For more information on the Spark jobs API parameters, see Spark jobs API parameters.
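The steps above can be sketched as a submission payload that pins the runtime version. This is a sketch, not a complete job definition: the application path is a placeholder, and the nesting of runtime under application_details is assumed from the Spark jobs API parameters reference.

```shell
# Sketch of a Spark application submission payload that sets
# runtime.spark_version for this one job. The application path
# "/myapp/wordcount.py" is a placeholder for your own file.
PAYLOAD='{
  "application_details": {
    "application": "/myapp/wordcount.py",
    "runtime": {
      "spark_version": "3.4"
    }
  }
}'

# Submission against the Spark jobs V4 endpoint (commented out here,
# since it requires a live instance):
# curl -X POST \
#   "$CPD_URL/v4/analytics_engines/$INSTANCE_ID/spark_applications" \
#   --header "Authorization: ZenApiKey $ACCESS_TOKEN" \
#   --header 'Content-Type: application/json' \
#   --data-raw "$PAYLOAD"
echo "$PAYLOAD"
```

A version set this way applies only to the submitted job; other jobs on the instance continue to use the instance default.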
Setting the Spark runtime by using Spark kernels
You can set the runtime version by using the kernel API. To do that:
1. Launch an interactive application by using the Spark kernel API. For more information on how to use the kernel API, see Using the Kernel API.
2. Use the engine.runtime.spark_version API parameter for the Spark application. For more information on the Spark jobs API parameters, see Spark jobs API parameters.
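For the kernel case, the request body can be sketched in the same way. This is a hypothetical illustration: the kernel name "python3" and the <KERNEL_API_ENDPOINT> placeholder are assumptions, and the exact endpoint and payload fields come from Using the Kernel API; only the engine.runtime.spark_version parameter is taken from this page.

```shell
# Sketch of a kernel launch payload that sets engine.runtime.spark_version.
# The kernel name "python3" is an assumed example value.
PAYLOAD='{
  "name": "python3",
  "engine": {
    "runtime": {
      "spark_version": "3.4"
    }
  }
}'

# Launch against the kernel API endpoint (placeholder path; see
# Using the Kernel API for the actual endpoint):
# curl -X POST \
#   "$CPD_URL/<KERNEL_API_ENDPOINT>" \
#   --header "Authorization: ZenApiKey $ACCESS_TOKEN" \
#   --header 'Content-Type: application/json' \
#   --data-raw "$PAYLOAD"
echo "$PAYLOAD"
```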