Spark environment variables can be configured at three distinct levels.
- Service Level (Service-Wide Configuration): Applies to all Spark instances and jobs. Immutable variables defined here cannot be overridden at lower levels.
- Instance Level (Specific to Instances of Analytics Engine): Specific to a single instance of the Analytics Engine. You can override the mutable variables defined here, but you cannot override the immutable variables.
- Job/Kernel Level (Specific to individual Spark jobs): Environment variables that are passed when submitting a Spark job. Only mutable variables can be set at this level.
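The precedence rules above can be sketched as a simple merge: each more specific level overrides mutable values, while immutable values always win (the function and its structure here are illustrative, not part of the Analytics Engine API):

```python
def effective_env(service_immutable, service_mutable,
                  instance_immutable, instance_mutable, job_env):
    """Illustrative merge of Spark env vars across the three levels."""
    env = dict(service_mutable)        # service-level mutable defaults
    env.update(instance_mutable)       # instance level overrides mutable vars
    # Service-level immutables cannot be overridden, even at instance level.
    immutable = {**instance_immutable, **service_immutable}
    for key, value in job_env.items():
        if key not in immutable:
            env[key] = value           # allowed: mutable override
        # else: silently ignored, the variable is immutable
    env.update(immutable)              # immutable values always apply
    return env

print(effective_env(
    {"SPARK_WORKER_CORES": "4"}, {"SPARK_EXECUTOR_MEMORY": "8g"},
    {}, {"SPARK_EXECUTOR_MEMORY": "10g"},
    {"SPARK_EXECUTOR_MEMORY": "12g", "SPARK_WORKER_CORES": "8"}))
```

Here the job-level SPARK_EXECUTOR_MEMORY of "12g" takes effect, while the attempt to set the immutable SPARK_WORKER_CORES is ignored and the service-level "4" is kept.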
Defining service level configuration
The Spark environment variables defined at service level (both immutable and mutable) are global settings that affect all instances and jobs under the Analytics Engine service. Define them in the AnalyticsEngine Custom Resource:
apiVersion: ae.cpd.ibm.com/v1
kind: AnalyticsEngine
metadata:
  name: analyticsengine-sample
  namespace: cpd-instance
spec:
  blockStorageClass: managed-nfs-storage
  fileStorageClass: managed-nfs-storage
  sparkDefaults:
    immutableConfigs:
      spark.ui.requestheadersize: "12k"
    mutableConfigs:
      ae.kernel.idle_timeout: "1000"
    immutableEnvVars:
      SPARK_WORKER_CORES: "4"
      SPARK_EXECUTOR_INSTANCES: "2"
    mutableEnvVars:
      SPARK_EXECUTOR_MEMORY: "8g"
      SPARK_DRIVER_MEMORY: "4g"
  license:
    accept: true
Defining instance level configuration
The Spark environment variables defined at instance level (mutable and immutable) are stored in the instance database table.
Use the following API call to set immutable environment variables (the endpoint accepts PUT or PATCH):
curl -k -X PATCH "<cpd-route>/v4/analytics_engines/<instance_id>/immutable_env_vars" \
  -H "Authorization: Bearer $TOKEN" \
  --data-raw '{
  "SPARK_WORKER_CORES": "4",
  "SPARK_EXECUTOR_INSTANCES": "2"
}'
Use the following API call to set mutable environment variables (the endpoint accepts PUT or PATCH):
curl -k -X PATCH "<cpd-route>/v4/analytics_engines/<instance_id>/default_env_vars" \
  -H "Authorization: Bearer $TOKEN" \
  --data-raw '{
  "SPARK_EXECUTOR_MEMORY": "10g",
  "SPARK_DRIVER_MEMORY": "6g"
}'
Defining kernel level configuration
The Spark environment variables defined at kernel level are passed in the payload when submitting a Spark job.
curl -k -X POST "$job_endpoint" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -d '{
"name": "spark-job",
"engine": {
"env": {
"SPARK_EXECUTOR_MEMORY": "12g", # Overrides mutable setting
"SPARK_WORKER_CORES": "4" # Ignored (immutable setting)
}
}
}'
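Inside the job, the effective values surface as ordinary process environment variables, so driver code can read them with the standard library (the fallback defaults below are illustrative):

```python
import os

# Read the effective executor memory; fall back to a default if unset.
executor_memory = os.environ.get("SPARK_EXECUTOR_MEMORY", "4g")
worker_cores = int(os.environ.get("SPARK_WORKER_CORES", "1"))
print(f"executor memory={executor_memory}, worker cores={worker_cores}")
```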