Spark environment variables can be configured at three distinct levels: service, instance, and kernel.

Defining service level configuration

The Spark environment variables defined at service level (both immutable and mutable) in the Analytics Engine Custom Resource are global configurations that affect all instances and jobs under the Analytics Engine service.

apiVersion: ae.cpd.ibm.com/v1
kind: AnalyticsEngine
metadata:
  name: analyticsengine-sample
  namespace: cpd-instance
spec:
  blockStorageClass: managed-nfs-storage
  fileStorageClass: managed-nfs-storage
  sparkDefaults:
    immutableConfigs:
      spark.ui.requestheadersize: "12k"
    mutableConfigs:
      ae.kernel.idle_timeout: "1000"
    immutableEnvVars:
      SPARK_WORKER_CORES: "4"
      SPARK_EXECUTOR_INSTANCES: "2"
    mutableEnvVars:
      SPARK_EXECUTOR_MEMORY: "8g"
      SPARK_DRIVER_MEMORY: "4g"
  license:
    accept: true

Defining instance level configuration

The Spark environment variables defined at the instance level (mutable and immutable) are stored in the instance database table.

Use the following request to define immutable environment variables. Use PUT to replace the full set or PATCH to update individual variables:

curl -k -X PUT "<cpd-route>/v4/analytics_engines/<instance_id>/immutable_env_vars" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" --data-raw '{
  "SPARK_WORKER_CORES": "4",
  "SPARK_EXECUTOR_INSTANCES": "2"
}'

Use the following request to define mutable environment variables. Use PUT to replace the full set or PATCH to update individual variables:

curl -k -X PUT "<cpd-route>/v4/analytics_engines/<instance_id>/default_env_vars" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" --data-raw '{
  "SPARK_EXECUTOR_MEMORY": "10g",
  "SPARK_DRIVER_MEMORY": "6g"
}'
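The two instance-level calls above differ only in the endpoint path. As a sketch, the URL and JSON body could be built with a small helper; cpd_route, instance_id, and the example values here are placeholders, and only the endpoint paths (immutable_env_vars, default_env_vars) are taken from the requests above.

```python
import json

def build_env_request(cpd_route, instance_id, env_vars, immutable=False):
    """Build the URL and JSON body for an instance-level env var request.

    This is an illustrative helper, not part of the product API client.
    """
    endpoint = "immutable_env_vars" if immutable else "default_env_vars"
    url = f"{cpd_route}/v4/analytics_engines/{instance_id}/{endpoint}"
    for name, value in env_vars.items():
        # The API examples pass all values as strings (e.g. "10g", "4").
        if not isinstance(value, str):
            raise TypeError(f"{name}: env var values must be strings")
    return url, json.dumps(env_vars)

# Hypothetical route and instance ID, used only for illustration.
url, body = build_env_request(
    "https://cpd.example.com", "abc123",
    {"SPARK_EXECUTOR_MEMORY": "10g", "SPARK_DRIVER_MEMORY": "6g"},
)
```

The returned url and body can then be passed to any HTTP client, together with the Authorization bearer token shown in the curl examples.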

Defining kernel level configuration

The Spark environment variables defined at the kernel level are specified in the request payload when you submit a Spark job.

curl -k -X POST "$job_endpoint" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -d '{
  "name": "spark-job",
  "engine": {
    "env": {
      "SPARK_EXECUTOR_MEMORY": "12g",
      "SPARK_WORKER_CORES": "4"
    }
  }
}'

In this example, SPARK_EXECUTOR_MEMORY overrides the mutable setting defined at a higher level, while SPARK_WORKER_CORES is ignored because it is an immutable setting. (Comments are not valid inside a JSON payload, so the overriding behavior is described here rather than inline.)
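The precedence behavior described for the job submission above can be sketched as a small model. This is illustrative only, not the product's implementation; it assumes that instance-level values override service-level ones, and it encodes the documented rule that kernel-level values override mutable settings but are ignored for immutable ones.

```python
def resolve_env(service_mutable, instance_mutable,
                service_immutable, instance_immutable, kernel_env):
    """Illustrative model of Spark env var precedence (not the product code).

    Mutable values can be overridden by later levels; immutable values
    cannot be overridden by kernel-level (job submission) settings.
    """
    immutable = {**service_immutable, **instance_immutable}
    resolved = {**service_mutable, **instance_mutable}
    for name, value in kernel_env.items():
        if name in immutable:
            continue  # immutable setting: kernel-level override is ignored
        resolved[name] = value
    resolved.update(immutable)  # immutable values always win
    return resolved

# Values taken from the examples in this section.
env = resolve_env(
    service_mutable={"SPARK_EXECUTOR_MEMORY": "8g", "SPARK_DRIVER_MEMORY": "4g"},
    instance_mutable={"SPARK_EXECUTOR_MEMORY": "10g", "SPARK_DRIVER_MEMORY": "6g"},
    service_immutable={"SPARK_WORKER_CORES": "4", "SPARK_EXECUTOR_INSTANCES": "2"},
    instance_immutable={},
    kernel_env={"SPARK_EXECUTOR_MEMORY": "12g", "SPARK_WORKER_CORES": "8"},
)
```

With these inputs, the kernel-level SPARK_EXECUTOR_MEMORY of "12g" takes effect, while the attempt to change the immutable SPARK_WORKER_CORES is ignored and it stays at "4".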