spark-env.sh

The spark-env.sh configuration file supports Spark on EGO in Platform ASC, setting up the environment for a Spark application on the local host. Use the environment variables in this file to define settings such as the resource group in which to start Spark executors and the resource requirements for a Spark application.

Location

This file is located at $SPARK_HOME/conf/, where $SPARK_HOME is the location of your Apache Spark installation. For example:
SPARK_HOME=/opt/spark-1.3.1-bin-hadoop2.4

Environment variables

When setting environment variables, take note of the deployment mode. Some variables can be set only in the cluster deployment mode.
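For example, to choose where the Spark Driver runs, set the MASTER variable (described in the list below) in spark-env.sh:

# Run the Spark Driver on the client side:
MASTER=ego-client
# Alternatively, run the Spark Driver in the EGO cluster:
# MASTER=ego-cluster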

MASTER
  Specifies the deployment mode, which determines whether the Spark Driver runs on the client side or in the EGO cluster. Valid values are:
  • ego-client: Runs the Spark Driver on the client side.
  • ego-cluster: Runs the Spark Driver in the EGO cluster.
  Default: No default. Deployment mode: Not applicable.

SPARK_EGO_APP_NAME
  Specifies the application name, which forms part of the EGO client name. Define this variable only when the Spark Driver runs as a service, to distinguish it from other Spark Drivers; otherwise, a Spark Driver whose client name is already registered with EGO is rejected.
  Default: An auto-generated UUID. Deployment mode: Client and Cluster.

SPARK_EGO_CONSUMER
  Specifies the consumer used to request resources from EGO.
  Default: SampleApplications/EclipseSamples. Deployment mode: Client and Cluster.

SPARK_EGO_UNAME
  Specifies the user name to log on to EGO.
  Default: Guest. Deployment mode: Client and Cluster.

SPARK_EGO_PASSWD
  Specifies the password used to authenticate the user name specified in SPARK_EGO_UNAME.
  Default: Guest. Deployment mode: Client and Cluster.

SPARK_EGO_EXECUTOR_PLAN
  Specifies the resource group in which to start Spark executors.
  Default: ComputeHosts. Deployment mode: Client and Cluster.

SPARK_EGO_EXECUTOR_RESREQ
  Specifies a resource requirement string based on which requests for specific resources are made to EGO to start Spark executors. See the sketch after this list for an illustrative setting.
  Default: No default. Deployment mode: Client and Cluster.

SPARK_EGO_ENABLE_STANDBY
  Enables standby mode for Spark executors. In standby mode, executors do not exit when they no longer occupy slots; they stay alive until the executor idle timeout expires.
  Default: true. Deployment mode: Client and Cluster.

SPARK_EGO_EXECUTOR_IDLE_TIMEOUT
  When standby mode is enabled, specifies the duration (in seconds) that an executor stays alive when it has no workload.
  Default: 600. Deployment mode: Client and Cluster.

SPARK_EGO_EXECUTOR_SLOTS_MAX
  Specifies the maximum number of tasks that can run concurrently in one Spark executor. To prevent the Spark executor process from running out of memory, define this variable only after evaluating Spark executor memory and the memory usage per task.
  Default: No default. Deployment mode: Client and Cluster.

SPARK_EGO_CLIENT_TIMEOUT
  Specifies the duration (in seconds) that a client stays registered with EGO when no workload is submitted.
  Default: 900. Deployment mode: Client and Cluster.

SPARK_EGO_DRIVER_PLAN
  Specifies the resource group in which to start the Spark Driver.
  Default: ManagementHosts. Deployment mode: Cluster.

SPARK_EGO_DRIVER_RESREQ
  Specifies a resource requirement string based on which requests for specific resources are made to EGO to start the Spark Driver.
  Default: No default. Deployment mode: Cluster.

SPARK_EGO_SUBMIT_FILE_REPLICATION
  Specifies an integer representing the replication number for the distributed file system (DFS) when uploading resource files to DFS.
  Default: If a value is not defined, the dfs.replication property in the Hadoop core-site.xml configuration file is used. Deployment mode: Cluster.

SPARK_EGO_CLIENT_REPORT_INTERVAL
  Specifies the interval (in milliseconds) at which the client reports the status of the Spark Driver container.
  Default: 1000. Deployment mode: Cluster.

SPARK_EGO_CLIENT_CONTEXT_WAITTIME
  Specifies the duration (in milliseconds) that the Spark Driver waits between attempts to detect context initialization.
  Default: 10000. Deployment mode: Cluster.

SPARK_EGO_CLIENT_CONTEXT_WAITTRIES
  Specifies the maximum number of attempts that the Spark Driver makes to detect context initialization. The maximum total wait time equals SPARK_EGO_CLIENT_CONTEXT_WAITTRIES * SPARK_EGO_CLIENT_CONTEXT_WAITTIME; with the defaults, that is 10 * 10000 ms = 100 seconds.
  Default: 10. Deployment mode: Cluster.

SPARK_EGO_STAGING_DIR
  Specifies the staging directory location in DFS/NFS. The client uploads .jar and other files to a subfolder inside the staging directory for the Spark Driver and executors to fetch.
  Default: No default. Deployment mode: Cluster.

SPARK_SUBMIT_CLASSPATH
  Appends an additional classpath to the client classpath. Define this variable to add an extra classpath when configuring, for example, GPFS.
  Default: No default. Deployment mode: Client and Cluster.

SPARK_SUBMIT_LIBRARY_PATH
  Appends an additional library path to the client library path. Define this variable to add an extra library path when configuring, for example, GPFS.
  Default: No default. Deployment mode: Client and Cluster.
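
As an illustration, the following sketch tunes executor scheduling and adds GPFS paths. The resource requirement string, the memory figures behind the slot count, and the GPFS paths are assumptions for illustration only; check your EGO documentation for the exact resource requirement syntax and adjust the values to your cluster.

# Start executors in the default ComputeHosts resource group.
SPARK_EGO_EXECUTOR_PLAN=ComputeHosts
# Hypothetical resource requirement string; the select() form is an
# assumption -- consult the EGO documentation for the exact syntax.
# SPARK_EGO_EXECUTOR_RESREQ="select(mem>4096)"
# Keep standby mode on, but reclaim idle executors after 5 minutes
# instead of the default 600 seconds.
SPARK_EGO_ENABLE_STANDBY=true
SPARK_EGO_EXECUTOR_IDLE_TIMEOUT=300
# Illustrative sizing: with 4096 MB of executor memory and roughly
# 512 MB per task, 4096 / 512 = 8 concurrent tasks is a safe ceiling.
SPARK_EGO_EXECUTOR_SLOTS_MAX=8
# Hypothetical GPFS locations for the extra classpath and library path.
SPARK_SUBMIT_CLASSPATH=/usr/lpp/mmfs/lib/gpfs.jar
SPARK_SUBMIT_LIBRARY_PATH=/usr/lpp/mmfs/lib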

Example

# The following line is generated by the installation script. 
SPARK_EGO_NATIVE_LIBRARY=/opt/spark-1.3.1-bin-hadoop2.4/lib/native/libSparkVEMApi.so
# The following lines are optional. Change the settings as required.
SPARK_EGO_CONSUMER=/SampleApplications/EclipseSamples
SPARK_EGO_EXECUTOR_PLAN=ComputeHosts
SPARK_EGO_DRIVER_PLAN=ManagementHosts
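
With these settings in place, submit an application as usual. A minimal sketch, assuming MASTER is set in spark-env.sh as shown earlier and using the SparkPi example class that ships with Spark; the examples jar glob is a placeholder for your own application jar:

# MASTER (ego-client or ego-cluster) is picked up from spark-env.sh,
# so no master setting is passed on the command line.
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/lib/spark-examples-*.jar 100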