spark-env.sh

The spark-env.sh configuration file supports Spark on EGO in Platform ASC, setting up the environment for a Spark application on the local host. Use the environment variables in this file to define settings such as the resource group in which to start Spark executors and the resource requirements for a Spark application.

Location

This file is located at $SPARK_HOME/conf/, where $SPARK_HOME is the location of your Apache Spark installation. For example:
SPARK_HOME=/opt/spark-1.3.1-bin-hadoop2.4

Environment variables

When setting environment variables, take note of the deployment mode. Some variables can be set only in the cluster deployment mode.
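For example, to choose where the Spark Driver runs, set the MASTER variable (described in the list below) in spark-env.sh:

# Run the Spark Driver on the client side:
MASTER=ego-client
# Alternatively, run the Spark Driver in the EGO cluster:
# MASTER=ego-cluster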

MASTER
  Specifies the deployment mode, which determines whether the Spark Driver runs on the client side or in the EGO cluster. Valid values are:
  • ego-client: Runs the Spark Driver on the client side.
  • ego-cluster: Runs the Spark Driver in the EGO cluster.
  Default: No default. Deployment mode: Not applicable.

SPARK_EGO_APP_NAME
  Specifies the application name, which forms part of the EGO client name. Define this variable only when the Spark Driver runs as a service, to distinguish it from other Spark Drivers; otherwise, a Spark Driver whose client name is already registered with EGO is rejected.
  Default: An auto-generated UUID. Deployment mode: Client and Cluster.

SPARK_EGO_CONSUMER
  Specifies the consumer used to request resources from EGO.
  Default: SampleApplications/EclipseSamples. Deployment mode: Client and Cluster.

SPARK_EGO_UNAME
  Specifies the user name to log on to EGO.
  Default: Guest. Deployment mode: Client and Cluster.

SPARK_EGO_PASSWD
  Specifies the password used to authenticate the user name specified in SPARK_EGO_UNAME.
  Default: Guest. Deployment mode: Client and Cluster.

SPARK_EGO_EXECUTOR_PLAN
  Specifies the resource group in which to start Spark executors.
  Default: ComputeHosts. Deployment mode: Client and Cluster.

SPARK_EGO_EXECUTOR_RESREQ
  Specifies a resource requirement string based on which requests for specific resources are made to EGO to start Spark executors. See the sketch after this list for an illustrative setting.
  Default: No default. Deployment mode: Client and Cluster.

SPARK_EGO_ENABLE_STANDBY
  Enables standby mode for Spark executors. In standby mode, executors do not exit when they no longer occupy slots; they stay alive until the executor idle timeout expires.
  Default: true. Deployment mode: Client and Cluster.

SPARK_EGO_EXECUTOR_IDLE_TIMEOUT
  When standby mode is enabled, specifies the duration (in seconds) that an executor stays alive when it has no workload.
  Default: 600. Deployment mode: Client and Cluster.

SPARK_EGO_EXECUTOR_SLOTS_MAX
  Specifies the maximum number of tasks that can run concurrently in one Spark executor. To prevent the Spark executor process from running out of memory, define this variable only after evaluating Spark executor memory and the memory usage per task.
  Default: No default. Deployment mode: Client and Cluster.

SPARK_EGO_CLIENT_TIMEOUT
  Specifies the duration (in seconds) that a client stays registered with EGO when no workload is submitted.
  Default: 900. Deployment mode: Client and Cluster.

SPARK_EGO_DRIVER_PLAN
  Specifies the resource group in which to start the Spark Driver.
  Default: ManagementHosts. Deployment mode: Cluster.

SPARK_EGO_DRIVER_RESREQ
  Specifies a resource requirement string based on which requests for specific resources are made to EGO to start the Spark Driver.
  Default: No default. Deployment mode: Cluster.

SPARK_EGO_SUBMIT_FILE_REPLICATION
  Specifies an integer representing the replication number for the distributed file system (DFS) when uploading resource files to DFS.
  Default: If a value is not defined, the dfs.replication property in the Hadoop core-site.xml configuration file is used. Deployment mode: Cluster.

SPARK_EGO_CLIENT_REPORT_INTERVAL
  Specifies the interval (in milliseconds) at which the client reports the status of the Spark Driver container.
  Default: 1000. Deployment mode: Cluster.

SPARK_EGO_CLIENT_CONTEXT_WAITTIME
  Specifies the duration (in milliseconds) that the Spark Driver waits between attempts to detect context initialization.
  Default: 10000. Deployment mode: Cluster.

SPARK_EGO_CLIENT_CONTEXT_WAITTRIES
  Specifies the maximum number of attempts that the Spark Driver makes to detect context initialization. The maximum total wait time equals SPARK_EGO_CLIENT_CONTEXT_WAITTRIES * SPARK_EGO_CLIENT_CONTEXT_WAITTIME; with the defaults, that is 10 * 10000 ms = 100 seconds.
  Default: 10. Deployment mode: Cluster.

SPARK_EGO_STAGING_DIR
  Specifies the staging directory location in DFS/NFS. The client uploads .jar and other files to a subfolder inside the staging directory for the Spark Driver and executors to fetch.
  Default: No default. Deployment mode: Cluster.

SPARK_SUBMIT_CLASSPATH
  Appends an additional classpath to the client classpath. Define this variable to add an extra classpath when configuring, for example, GPFS.
  Default: No default. Deployment mode: Client and Cluster.

SPARK_SUBMIT_LIBRARY_PATH
  Appends an additional library path to the client library path. Define this variable to add an extra library path when configuring, for example, GPFS.
  Default: No default. Deployment mode: Client and Cluster.
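
As an illustration, the following sketch tunes executor scheduling and adds GPFS paths. The resource requirement string, the memory figures behind the slot count, and the GPFS paths are assumptions for illustration only; check your EGO documentation for the exact resource requirement syntax and adjust the values to your cluster.

# Start executors in the default ComputeHosts resource group.
SPARK_EGO_EXECUTOR_PLAN=ComputeHosts
# Hypothetical resource requirement string; the select() form is an
# assumption -- consult the EGO documentation for the exact syntax.
# SPARK_EGO_EXECUTOR_RESREQ="select(mem>4096)"
# Keep standby mode on, but reclaim idle executors after 5 minutes
# instead of the default 600 seconds.
SPARK_EGO_ENABLE_STANDBY=true
SPARK_EGO_EXECUTOR_IDLE_TIMEOUT=300
# Illustrative sizing: with 4096 MB of executor memory and roughly
# 512 MB per task, 4096 / 512 = 8 concurrent tasks is a safe ceiling.
SPARK_EGO_EXECUTOR_SLOTS_MAX=8
# Hypothetical GPFS locations for the extra classpath and library path.
SPARK_SUBMIT_CLASSPATH=/usr/lpp/mmfs/lib/gpfs.jar
SPARK_SUBMIT_LIBRARY_PATH=/usr/lpp/mmfs/lib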

Example

# The following line is generated by the installation script. 
SPARK_EGO_NATIVE_LIBRARY=/opt/spark-1.3.1-bin-hadoop2.4/lib/native/libSparkVEMApi.so
# The following lines are optional. Change the settings as required.
SPARK_EGO_CONSUMER=/SampleApplications/EclipseSamples
SPARK_EGO_EXECUTOR_PLAN=ComputeHosts
SPARK_EGO_DRIVER_PLAN=ManagementHosts
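
With these settings in place, submit an application as usual. A minimal sketch, assuming MASTER is set in spark-env.sh as shown earlier and using the SparkPi example class that ships with Spark; the examples jar glob is a placeholder for your own application jar:

# MASTER (ego-client or ego-cluster) is picked up from spark-env.sh,
# so no master setting is passed on the command line.
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/lib/spark-examples-*.jar 100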