spark-env.sh
The spark-env.sh configuration file supports Spark on EGO in Platform ASC by setting up the environment for a Spark application on the local host. Use the environment variables in this file to define settings such as the resource group used to start Spark executors and the resource requirements for a Spark application.
Location
$SPARK_HOME/conf/spark-env.sh, where SPARK_HOME is the Spark installation directory; for example, SPARK_HOME=/opt/spark-1.3.1-bin-hadoop2.4.
Environment variables
When setting environment variables, take note of the deployment mode. Some variables can be set only in the cluster deployment mode.
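For example, a minimal sketch of the two deployment-mode settings (the ego-client and ego-cluster values correspond to the MASTER entry in the table below):
# Client deployment mode: the Spark Driver runs on the client side.
MASTER=ego-client
# Cluster deployment mode: the Spark Driver runs in the EGO cluster.
# MASTER=ego-cluster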
Environment variable | Description | Default | Deployment mode |
---|---|---|---|
MASTER | Specifies the deployment mode, which determines whether the Spark Driver runs on the client side or in the EGO cluster. Valid values are ego-client (the Spark Driver runs on the client side) and ego-cluster (the Spark Driver runs in the EGO cluster). | No default | Not applicable |
SPARK_EGO_APP_NAME | Specifies the application name, which forms part of the EGO client name. Define this variable only when the Spark Driver runs as a service, to distinguish it from other Spark Drivers; otherwise, a Spark Driver whose client name is already registered with EGO is rejected. | An auto-generated UUID | Client and Cluster |
SPARK_EGO_CONSUMER | Specifies the consumer used to request resources from EGO. | SampleApplications/EclipseSamples | Client and Cluster |
SPARK_EGO_UNAME | Specifies the user name to log on to EGO. | Guest | Client and Cluster |
SPARK_EGO_PASSWD | Specifies the password used to authenticate the user name specified in SPARK_EGO_UNAME. | Guest | Client and Cluster |
SPARK_EGO_EXECUTOR_PLAN | Specifies the resource group used to start Spark executors. | ComputeHosts | Client and Cluster |
SPARK_EGO_EXECUTOR_RESREQ | Specifies the resource requirement string used to request specific resources from EGO when starting Spark executors. | No default | Client and Cluster |
SPARK_EGO_ENABLE_STANDBY | Enables standby mode for Spark executors, in which executors that no longer occupy slots do not exit until the executor idle timeout expires. | true | Client and Cluster |
SPARK_EGO_EXECUTOR_IDLE_TIMEOUT | When standby mode is enabled, specifies the duration (in seconds) that an executor stays alive when there is no workload on it. | 600 | Client and Cluster |
SPARK_EGO_EXECUTOR_SLOTS_MAX | Specifies the maximum number of tasks that can run concurrently in one Spark executor. To prevent the Spark executor process from running out of memory, define this variable only after evaluating Spark executor memory and memory usage per task. | No default | Client and Cluster |
SPARK_EGO_CLIENT_TIMEOUT | Specifies the duration (in seconds) that a client stays registered to EGO even when no workload is submitted. | 900 | Client and Cluster |
SPARK_EGO_DRIVER_PLAN | Specifies the resource group used to start the Spark Driver. | ManagementHosts | Cluster |
SPARK_EGO_DRIVER_RESREQ | Specifies the resource requirement string used to request specific resources from EGO when starting the Spark Driver. | No default | Cluster |
SPARK_EGO_SUBMIT_FILE_REPLICATION | Specifies an integer representing the replication factor used when uploading resource files to the distributed file system (DFS). | If a value is not defined, the default setting of the dfs.replication property in the Hadoop core-site.xml configuration file is used. | Cluster |
SPARK_EGO_CLIENT_REPORT_INTERVAL | Specifies the interval (in milliseconds) at which the client reports the status of the Spark Driver container. | 1000 | Cluster |
SPARK_EGO_CLIENT_CONTEXT_WAITTIME | Specifies the duration (in milliseconds) that the Spark Driver waits between attempts to detect context initialization. | 10000 | Cluster |
SPARK_EGO_CLIENT_CONTEXT_WAITTRIES | Specifies the maximum number of attempts that the Spark Driver makes to detect context initialization. The maximum total wait time equals SPARK_EGO_CLIENT_CONTEXT_WAITTRIES * SPARK_EGO_CLIENT_CONTEXT_WAITTIME. | 10 | Cluster |
SPARK_EGO_STAGING_DIR | Specifies the staging directory location in DFS/NFS. The client uploads .jar and other files to a subdirectory of the staging directory, from which the Spark Driver and executors fetch them. | No default | Cluster |
SPARK_SUBMIT_CLASSPATH | Specifies an additional classpath to append to the client classpath. Define this variable to add an extra classpath when configuring, for example, GPFS. | No default | Client and Cluster |
SPARK_SUBMIT_LIBRARY_PATH | Specifies an additional library path to append to the client library path. Define this variable to add an extra library path when configuring, for example, GPFS. | No default | Client and Cluster |
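As an illustration, a cluster-mode spark-env.sh might combine several of these variables as follows. This is a sketch only: the credentials, staging directory, and resource requirement string are hypothetical values, and the select() syntax assumes EGO-style resource requirement strings.
# Run the Spark Driver in the EGO cluster.
MASTER=ego-cluster
# Log on to EGO (hypothetical account; the default is Guest/Guest).
SPARK_EGO_UNAME=Admin
SPARK_EGO_PASSWD=Admin
# Resource groups for Spark executors and the Spark Driver.
SPARK_EGO_EXECUTOR_PLAN=ComputeHosts
SPARK_EGO_DRIVER_PLAN=ManagementHosts
# Hypothetical resource requirement string for executor hosts.
SPARK_EGO_EXECUTOR_RESREQ="select(mem > 4096)"
# Hypothetical staging directory in DFS for .jar and other resource files.
SPARK_EGO_STAGING_DIR=hdfs://namenode:9000/user/spark/staging
# Keep idle executors alive for 10 minutes (the default).
SPARK_EGO_EXECUTOR_IDLE_TIMEOUT=600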
Example
# The following line is generated by the installation script.
SPARK_EGO_NATIVE_LIBRARY=/opt/spark-1.3.1-bin-hadoop2.4/lib/native/libSparkVEMApi.so
# The following lines are optional. Change the settings as required.
SPARK_EGO_CONSUMER=/SampleApplications/EclipseSamples
SPARK_EGO_EXECUTOR_PLAN=ComputeHosts
SPARK_EGO_DRIVER_PLAN=ManagementHosts
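Scripts under SPARK_HOME/bin source spark-env.sh automatically, so no further setup is needed before submitting an application. A hypothetical submission using the settings above (the example class and jar ship with the Spark distribution; adjust the jar file name to match your build):
cd /opt/spark-1.3.1-bin-hadoop2.4
./bin/spark-submit --class org.apache.spark.examples.SparkPi lib/spark-examples-1.3.1-hadoop2.4.0.jar 100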