Spark on EGO instance group parameters
Configure an instance group for resource management and session scheduling using the Spark on EGO plug-in.
Spark on EGO settings
- Each uppercase property has a lowercase counterpart. Use the lowercase version in the spark-submit command.
- To configure host blocking, see the Host Blocking drop-down list. You can also check the settings for the BLOCKHOST parameters in the spark-env.sh configuration file.
- SPARK_EGO_ACCESS_NAMENODES
- In cluster mode, specifies the name nodes from which the application can get the delegation token.
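For example, a spark-env.sh entry might look like the following sketch, which assumes a comma-separated list of name node URIs (the host names and ports are placeholders, not values from your cluster):
# Hypothetical name nodes from which the application can obtain delegation tokens in cluster mode
SPARK_EGO_ACCESS_NAMENODES=hdfs://namenode1.example.com:9000,hdfs://namenode2.example.com:9000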
- SPARK_EGO_APP_MAX_IN_CACHE
-
Specifies the maximum number of running applications that can be retained in the Spark master memory cache. Valid value is a positive integer.
- SPARK_EGO_APPS_MAX_SEARCH_NUM
-
Specifies the maximum number of applications to be searched. When the Spark master retains some applications, only the top 100k applications (default) are searchable.
- SPARK_EGO_APPS_MAX_RESULTS_NUM
-
Specifies the maximum number of applications to be returned in one request. If you change this parameter to return more than 200 applications (default), ensure that you increase the Spark master's JVM memory.
- SPARK_EGO_AUTH_MODE
-
Specifies the authentication and authorization mode that is used with EGO security plug-ins.
Note: In the cluster management console, this parameter is controlled by selecting the Enable authentication and authorization for the submission user check box in the Basic Settings tab of an instance group. If selected, the Spark master authenticates and authorizes the specified user; if cleared, the Spark master trusts all specified users.
Valid values for SPARK_EGO_AUTH_MODE are:
- EGO_TRUST: The Spark master trusts all specified users. No password is required. When you run a Spark application from the spark-submit command, you can add spark.ego.uname with the user name. For example:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://MasterHost:7077 --conf spark.ego.uname=UserName $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100
- EGO_AUTH: The Spark master authenticates and authorizes the specified user. When you run a Spark application from the spark-submit command, you must add the parameters spark.ego.uname and spark.ego.passwd.
For spark.ego.passwd, you can either specify the password in the spark-submit command, or you can leave the value empty (--conf spark.ego.passwd=). If the value is empty, the spark-submit command expects the password from a prompt or from stdin (a specified file). For example:
Password included in the spark-submit command:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://MasterHost:7077 --conf spark.ego.uname=UserName --conf spark.ego.passwd=Password $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100
Password from a prompt (the user is prompted to enter the password):
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://MasterHost:7077 --conf spark.ego.uname=UserName --conf spark.ego.passwd= $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100
Password from stdin (a specified file):
cat password_file | bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://MasterHost:7077 --conf spark.ego.uname=UserName --conf spark.ego.passwd= $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100
- SPARK_EGO_CACHED_EXECUTOR_IDLE_TIMEOUT
-
For shared SparkContext applications, specifies how long (in seconds) an executor with cached data blocks stays alive without any workload running on it. When an executor has cached data, this timeout overrides the value of the SPARK_EGO_EXECUTOR_IDLE_TIMEOUT parameter. Valid value is an integer starting from 1.
- SPARK_EGO_CLIENT_TTL
-
When high availability is enabled for an instance group during creation, specifies how long resource allocations for the Spark master are retained after the Spark master goes down. If the Spark master restarts within the specified duration (defined in minutes), the Spark master retrieves those allocations for reuse.
- SPARK_EGO_DISTRIBUTE_FILES
-
Specifies whether to distribute files through a shared file system. Valid values are true and false. If set to true, resource files (.jar files and scripts) are uploaded to the staging directory.
- SPARK_EGO_DOCKER_CONTAINER_START_TIMEOUT
-
Specifies the wait duration (in seconds) for a Docker container to start. If the container is not started within this duration (default 60 seconds), the Docker controller terminates the container process. Valid value is a positive integer.
- SPARK_EGO_DOCKER_CONTAINER_STOP_TIMEOUT
-
Specifies the wait duration (in seconds) for a Docker container to stop. If the container is not stopped within this duration (default 60 seconds), the Docker controller terminates the container process. Valid value is a positive integer.
- SPARK_EGO_DRIVER_BLOCKHOST_CONF
-
Defines rules to block hosts when Spark drivers in an instance group fail a specified number of times within a specified duration of time on those hosts. When a host is blocked, the Spark master does not allocate resources from the host for driver startup.
Specify this parameter with one or more keywords that are enclosed within single quotation marks (') that can contain uppercase and lowercase letters, numbers, underscores (_), and dashes (-). Multiple keywords within each rule must be separated by a space ( ). Multiple rules must be separated by a semi-colon (;). Multiple phrases within each rule must be separated by a comma that represents an AND condition.
For example, when the configuration is defined as '[127];[100];rule1,rule2;rule3;rule4', a semi-colon (;) separates the rule sets: the numbers in the first bracket are built-in driver exit code rules, the numbers in the second bracket are user driver program exit codes, and the remaining entries are user-defined rules.
If the driver's exit reason on a host matches one or more of these rules, that host is blocked. For example:
- The driver does not start on a host because the startup command does not exist. The host is blocked and the resources are returned to the resource orchestrator. The resource orchestrator cannot allocate resources for the driver on this host until the user unblocks this host from the command line or from the cluster management console.
- The driver starts on a host and then fails; the exit reason contains the phrase rule1 and rule2, which meets the host blocking criteria. The host is blocked and the resources are returned to the resource orchestrator. The resource orchestrator cannot allocate resources for the driver on this host until the user unblocks this host from the command line or from the cluster management console.
- The driver starts on a host and then fails; the user program exit code is 100, which meets the host blocking criteria. The resource orchestrator cannot allocate resources for the driver on this host until the user unblocks this host from the command line or from the cluster management console.
The following example shows the SPARK_EGO_DRIVER_BLOCKHOST_CONF parameter, which is defined with additional rules:
SPARK_EGO_DRIVER_BLOCKHOST_CONF='[6,7,16,17,18,25,126,127];[d1,d2,d3];rule1,rule2;rule3;rule4'
In this example, the numbers in the first bracket are built-in rules, while the entries in the second bracket are user-defined driver program exit codes. Other user-defined rules can be added after the second bracket, separated by a semicolon. Rules that are separated by a comma represent AND conditions, while rules that are separated by a semicolon represent OR conditions.
Note: The exit codes 0-40 are occupied by the system. When you add program exit codes as host blocking rules, specify numbers outside of this scope.
Default: [6,7,16,17,18,25,126,127];[]
- SPARK_EGO_DRIVER_BLOCKHOST_DURATION
-
When host blocking is enabled, specifies the duration (in minutes) during which driver startup failures or driver errors on a host are counted toward blocking that host for the instance group. Within the specified duration, the host is blocked if it fails due to an error that hits the same rule as many times as specified for the SPARK_EGO_DRIVER_BLOCKHOST_MAX_FAILURE_TIMES parameter.
- SPARK_EGO_DRIVER_BLOCKHOST_MAX_FAILURE_TIMES
-
When host blocking is enabled, specifies the maximum number of times that a driver can fail to start, or exit due to an error that hits the same rule, before the host is blocked. The host is blocked if it fails this many times within the duration that is specified for the SPARK_EGO_DRIVER_BLOCKHOST_DURATION parameter. If the values of this parameter and the SPARK_EGO_EXECUTOR_BLOCKHOST_MAX_FAILURE_TIMES parameter are both 0, host blocking is disabled.
- SPARK_EGO_DRIVER_RESREQ
-
Specifies the resource requirement for drivers when requesting resources from EGO.
- SPARK_EGO_ENABLE_BLOCKHOST
-
Enables or disables host blocking. To configure host blocking, ensure that the SPARK_EGO_ENABLE_BLOCKHOST parameter is set to true. Valid values are true and false.
- SPARK_EGO_ENABLE_PREEMPTION
-
Enables preemption. This setting disables reclaim for performance tuning purposes. Valid values are true and false.
- SPARK_EGO_EXECUTOR_BLOCKHOST_CONF
-
Defines rules to block hosts when Spark executors in an instance group fail a specified number of times within a specified duration of time on those hosts. When a host is blocked, the Spark master does not allocate resources from the host for executor startup.
Specify this parameter with one or more keywords that are enclosed within single quotation marks (') that can contain uppercase and lowercase letters, numbers, underscores (_), and dashes (-). Multiple keywords within each rule must be separated by a space ( ). Multiple rules must be separated by a semi-colon (;). Multiple phrases within each rule must be separated by a comma that represents an AND condition.
For example, when the configuration is defined as '[127];[100];rule1,rule2;rule3;rule4', a semi-colon (;) separates the rule sets: the numbers in the first bracket are built-in executor exit code rules, the numbers in the second bracket are user executor program exit codes, and the remaining entries are user-defined rules.
If the executor's exit reason on a host matches one or more of these rules, that host is blocked. For example:
- The executor does not start on a host because the startup command does not exist. The host is blocked and the resources are returned to the resource orchestrator. The resource orchestrator cannot allocate resources for the executor on this host until the user unblocks this host from the command line or from the cluster management console.
- The executor starts on a host and then fails; the exit reason contains the phrase rule1 and rule2, which meets the host blocking criteria. The host is blocked and the resources are returned to the resource orchestrator. The resource orchestrator cannot allocate resources for the executor on this host until the user unblocks this host from the command line or from the cluster management console.
- The executor starts on a host and then fails; the user program exit code is 100, which meets the host blocking criteria. The resource orchestrator cannot allocate resources for the executor on this host until the user unblocks this host from the command line or from the cluster management console.
The following example shows the SPARK_EGO_EXECUTOR_BLOCKHOST_CONF parameter, which is defined with additional rules:
SPARK_EGO_EXECUTOR_BLOCKHOST_CONF='[6,7,16,17,18,25,126,127];[d1,d2,d3];rule1,rule2;rule3;rule4'
In this example, the numbers in the first bracket are built-in rules, while the entries in the second bracket are user-defined executor program exit codes. Other user-defined rules can be added after the second bracket, separated by a semicolon. Rules that are separated by a comma represent AND conditions, while rules that are separated by a semicolon represent OR conditions.
Note: The exit codes 0-40 are occupied by the system. When you add program exit codes as host blocking rules, specify numbers outside of this scope.
- SPARK_EGO_EXECUTOR_BLOCKHOST_DURATION
-
When host blocking is enabled, specifies the duration (in minutes) during which executor startup failures or executor errors on a host are counted toward blocking that host for the instance group. Within the specified duration, the host is blocked if it fails due to an error that hits the same rule as many times as specified for the SPARK_EGO_EXECUTOR_BLOCKHOST_MAX_FAILURE_TIMES parameter.
- SPARK_EGO_EXECUTOR_BLOCKHOST_MAX_FAILURE_TIMES
-
When host blocking is enabled, specifies the maximum number of times that an executor can fail to start, or exit due to an error that hits the same rule, before the host is blocked. The host is blocked if it fails this many times within the duration that is specified for the SPARK_EGO_EXECUTOR_BLOCKHOST_DURATION parameter. If the values of this parameter and the SPARK_EGO_DRIVER_BLOCKHOST_MAX_FAILURE_TIMES parameter are both 0, host blocking is disabled.
- SPARK_EGO_EXECUTOR_RESREQ
-
Specifies the resource requirement for executors when requesting resources from EGO.
- SPARK_EGO_EXECUTORS_MAX_RESULTS_NUM
-
Specifies the maximum number of executors to be returned for each application. If you change this parameter to return more than 200 executors per application (default), ensure that you increase the Spark master's JVM memory.
- SPARK_EGO_FAILOVER_INDEX_CLEANUP_TIMEOUT
-
Specifies the timeout (in hours) for in-memory indices and records to be cleaned up after a Spark master failover if the application, driver, or executor status remains unknown. The status is assumed to be FINISHED for applications and drivers, and EXITED for executors. Indices are not cleaned up if the Spark master receives an update within the timeout.
- SPARK_EGO_FREE_SLOTS_IDLE_TIMEOUT
-
Specifies how long (in seconds) the Spark driver must retain free slots before releasing them back to the Spark master. If new tasks are generated within the specified duration, the slots are allocated for those tasks. Valid value is an integer, starting from 0 up to the value of the SPARK_EGO_EXECUTOR_IDLE_TIMEOUT parameter.
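The following spark-env.sh sketch illustrates the relationship between the two timeouts; the values are illustrative assumptions, not recommendations:
# Executors stay alive for up to 600 seconds without workload
SPARK_EGO_EXECUTOR_IDLE_TIMEOUT=600
# The driver holds free slots for up to 60 seconds; this value must not exceed SPARK_EGO_EXECUTOR_IDLE_TIMEOUT
SPARK_EGO_FREE_SLOTS_IDLE_TIMEOUT=60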
- SPARK_EGO_HOST_CLOSE_GRACE_PERIOD
-
When a host is manually closed for maintenance, specifies the duration (in seconds) to wait for Spark drivers and executors that are running on the host to complete. After this period passes, any running drivers and executors are terminated. Valid value is an integer starting from 0.
- SPARK_EGO_IMPERSONATION
-
Specifies whether Spark applications run under the submission user's account. Valid values are true and false. If set to true, Spark applications that are submitted from the command line run as the submission user. If set to false, Spark applications run as the consumer execution user for the driver and executor.
Note: Consider the following when you configure SPARK_EGO_IMPERSONATION:
- The owner of the driver and executor log is the execution user of the driver and executor.
- If you submit Spark applications in client mode on a host inside the cluster, the user who is logged in to the client host must be the same as the submission user (when impersonation is enabled) or the consumer execution user (when impersonation is disabled) for the Spark executors; otherwise, you receive a permission error.
- If SPARK_EGO_IMPERSONATION is set to true and SPARK_EGO_AUTH_MODE is set to EGO_AUTH, the user of spark.ego.uname or spark.ego.credential must be the LDAP or OS execution user, rather than an EGO user such as Admin.
- In the cluster management console, this parameter is controlled by selecting the Enable impersonation to have Spark applications run as the submission user check box in the Basic Settings tab of an instance group. If selected, Spark applications run as the submission user; if cleared, Spark applications run as the consumer execution user for the driver and executor.
Default: false
- SPARK_EGO_LAST_UPDATED_TIMESTAMP
-
Specifies the timestamp for the instance group deployment in the Spark deployment script (spark-env.sh).
- SPARK_EGO_LOG_GROUP_READABLE
-
Enables group read permission for Spark service logs.
This parameter does not display by default. To show it in the Spark on EGO drop-down list:
- Click .
- In the Spark tab, click Configuration.
- Scroll to the Additional Environment Variables section, click Add an Environment Variable, and add an environment variable with the name SPARK_EGO_LOG_GROUP_READABLE and the value true.
- Click Save.
Any user who needs read access to the Spark service logs must be either a member of the primary user group of the execution user of the instance group, or a member of the user group that is provided as input in the Administrator user group parameter of the instance group.
Default: false
- SPARK_EGO_LOG_OTHER_READABLE_ENABLED
-
Enables read access for others to Spark driver, executor, and application event logs.
Note: If you modify an existing instance group and set SPARK_EGO_LOG_OTHER_READABLE_ENABLED=true, the setting affects only logs for new Spark applications.
This parameter does not display by default. To show it in the Spark on EGO drop-down list, configure SPARK_EGO_LOG_OTHER_READABLE_ENABLED=ON in the ascd.conf file.
To enable the read permission for others, set the value of this parameter to true.
- SPARK_EGO_LOGSERVICE_PORT
-
Specifies the UI port for the EGO log service.
- SPARK_EGO_MAX_SNAP_FILE_SIZE
-
Limits the maximum size of the snap file that is used to recover the application history. It is best practice to increase this value along with SPARK_DAEMON_MEMORY, which controls the total amount of memory that is allocated to the Spark master. Enter a positive integer, followed by any one of the following units:
- b for bytes. For example, 100b specifies a limit of 100 bytes.
- k for kibibytes. For example, 10k specifies 10 kibibytes.
- m for mebibytes. For example, 1m specifies one mebibyte.
- g for gibibytes. For example, 1g specifies one gibibyte.
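For example, a spark-env.sh sketch that pairs a larger snap file limit with more Spark master memory (both values are illustrative assumptions; size them for your own cluster):
# Allow snap files of up to 200 mebibytes for application history recovery
SPARK_EGO_MAX_SNAP_FILE_SIZE=200m
# Increase the Spark master JVM memory accordingly
SPARK_DAEMON_MEMORY=4g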
- SPARK_EGO_SHARED_CONTEXT_MAX_JOBS
-
Specifies the number of concurrent jobs that can be started in a shared context. New job submissions that exceed this limit (default 100) are rejected. Valid value is an integer greater than 0.
- SPARK_EGO_SHARED_CONTEXT_POOL_SIZE
-
For shared SparkContext applications, specifies the size of the shared context pool, wherein a pool of pre-started shared contexts is maintained. When the context pool reaches this specified limit and new shared contexts are submitted, the least recently used context is stopped. Valid value is an integer that starts from 0.
- SPARK_EGO_SHARED_RDD_WAIT_TIMEOUT
-
When a shared RDD is being created, specifies the duration (in milliseconds) that other requests for the RDD must wait before the operation times out. Valid value is an integer greater than 0. During this RDD creation period (default 60000 milliseconds), other access to this RDD is blocked.
- SPARK_EGO_SHARED_PYTHON
-
For shared SparkContext applications, enables or disables support for Python applications to use the sharable RDD API. When enabled, shared RDDs can be reused by both Scala and Python applications. Valid values are true or false.
- SPARK_EGO_SLOTS_REQUIRED_TIMEOUT
-
The time, in seconds, to wait for a Spark application to get the required number of slots, including CPU and GPU slots, before launching tasks. After this time, any slots that are held are released and the application fails.
- SPARK_EGO_STAGING_DIR
-
Specifies the staging directory for submission in a distributed file system (DFS). The staging directory is used to transfer .jar and data files.
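For example, a spark-env.sh sketch that uploads resource files to a staging directory (the HDFS URI is a placeholder; your deployment might expect a plain directory path instead):
# Distribute .jar files and scripts through the shared file system
SPARK_EGO_DISTRIBUTE_FILES=true
# Staging directory in the DFS that receives the uploaded files
SPARK_EGO_STAGING_DIR=hdfs://namenode.example.com:9000/user/spark/staging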
- SPARK_EGO_SUBMIT_FILE_REPLICATION
-
Specifies the number of replicas that the uploaded .jar file uses. If not defined, the default setting of dfs.replication in the Hadoop core-site.xml file is used.
- spark.ssl.ego.gui.enabled
-
Specifies that the configured instance group uses HTTPS for its web UIs.
- spark.ssl.ego.workload.enabled
-
Specifies that the configured instance group uses SSL communication between the following Spark processes: Spark master and Spark driver, Spark driver and executor, and executor to the shuffle service.
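A minimal sketch, assuming both properties are set in the instance group's Spark configuration (spark-defaults.conf style) and that the required SSL keystore settings are configured separately:
# Serve the instance group web UIs over HTTPS
spark.ssl.ego.gui.enabled true
# Use SSL between the Spark master and driver, the driver and executors, and executors and the shuffle service
spark.ssl.ego.workload.enabled true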
Session scheduler settings
With session scheduling, an application does not connect to the resource orchestrator directly. Instead, it registers to the Spark master and acquires resources. The Spark master holds a connection to the resource orchestrator and consolidates resources from all applications. It then acquires resources from the resource orchestrator and offers these resources to different applications.
When you edit the configuration for an instance group, the following settings are available from the Session Scheduler drop-down list:
- SPARK_EGO_AUTOSCALE_GPU_SLOTS_PER_TASK
-
Specifies a comma-separated list of SPARK_EGO_GPU_SLOTS_PER_TASK values. When specified, the Spark master service is scaled up to accommodate (at a minimum) one service instance for the total number of values specified. This parameter does not affect the Spark notebook master instance. With no list specified (default), the SPARK_EGO_GPU_SLOTS_PER_TASK value takes effect for all Spark master service instances. A maximum of five values are supported.
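For example, the following spark-env.sh sketch (the values are illustrative) scales the Spark master service to cover three SPARK_EGO_GPU_SLOTS_PER_TASK values:
SPARK_EGO_AUTOSCALE_GPU_SLOTS_PER_TASK=1,2,4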
- SPARK_EGO_APP_SCHEDULE_POLICY
-
Specifies the scheduling policy for the application. The following scheduling policies are supported:
- First-in, first-out scheduling (fifo): Meets the requirements of applications submitted earlier first.
- Fair-share scheduling (fairshare): Shares the entire resource pool among all submitted applications. Applications receive their share of resources according to their priority.
- Priority (priority): Meets the requirements of applications that are submitted with higher priority.
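For example, to share the resource pool among all submitted applications, a spark-env.sh entry might look like:
SPARK_EGO_APP_SCHEDULE_POLICY=fairshare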
- SPARK_EGO_CONF_DIR_EXTRA
-
Specifies an optional directory containing an additional spark-env.sh configuration file to load on startup. Use this spark-env.sh file to set additional third-party environment variables that you require beyond available Spark environment variables.
- SPARK_EGO_DRIVER_RELAUNCH_MAX_TIMES
-
Sets the maximum number of attempts to relaunch a Spark driver that is in the failed or error state. Valid values are 1 - 10.
- SPARK_EGO_EXECUTOR_IDLE_TIMEOUT
-
Specifies the duration (in seconds) that an executor stays alive when there is no workload on it. This variable requires SPARK_EGO_EXECUTOR_SLOTS_RESERVE to be defined.
- SPARK_EGO_EXECUTOR_SLOTS_MAX
-
Specifies the maximum number of tasks that can run concurrently in one Spark executor. To prevent the Spark executor process from running out of memory, define this variable only after you evaluate Spark executor memory and memory usage per task.
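As an illustrative sizing sketch only (the numbers are assumptions, not recommendations): if each executor has roughly 8 GB of memory and each task needs roughly 1 GB, capping concurrency at 6 tasks leaves headroom for overhead:
# Hypothetical sizing based on ~8 GB executor memory and ~1 GB per task
SPARK_EGO_EXECUTOR_SLOTS_MAX=6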
- SPARK_EGO_GPU_ADAPTIVE_ENABLE
-
Enables adaptive scheduling wherein tasks run first on a portion of GPU resources in the cluster. When GPU resources are no longer available, the remaining tasks run on CPU resources.
- SPARK_EGO_GPU_ADAPTIVE_EST_RATIO
-
When adaptive scheduling is enabled, specifies the estimated speedup ratio of CPU tasks to GPU tasks, based on the average duration of CPU tasks versus GPU tasks. Valid value is a positive integer, starting from 1.
- SPARK_EGO_GPU_ADAPTIVE_PERCENTAGE
-
When adaptive scheduling is enabled, specifies the maximum number of tasks in the GPU stage (as a percentage of total GPU tasks) that are transferred to CPU resources when GPU resources are no longer available. The number of transferred GPU tasks depends on a combination of factors and can be zero or any value up to, but not exceeding, the specified percentage. Valid values are between 0 and 1 (including decimal digits).
- SPARK_EGO_GPU_EXECUTOR_SLOTS_MAX
-
For GPU scheduling, specifies the maximum number of GPU tasks that can run concurrently in one GPU executor.
- SPARK_EGO_EXECUTOR_SLOTS_RESERVE
-
Specifies the minimum number of slots to reserve once the executor is started. The number of executor slots is the minimum of the reserved number and the owned number. The timeout for the reserve slots is set by SPARK_EGO_EXECUTOR_IDLE_TIMEOUT. For Spark 1.5.2, look for this parameter under the Spark on EGO configuration group.
- SPARK_EGO_GPU_EXECUTOR_SLOTS_RESERVE
-
For GPU scheduling, specifies the minimum number of GPU slots to reserve once the executor is started. The number of GPU executor slots is the minimum of the reserved number and the owned number. The timeout for the reserve GPU slots is set by SPARK_EGO_EXECUTOR_IDLE_TIMEOUT.
- SPARK_EGO_PRIORITY
-
Specifies the priority of Spark driver and executor scheduling for the instance group. The valid range for scheduling priority is 1 to 10000, with 10000 being the highest priority.
- SPARK_EGO_RECLAIM_GRACE_PERIOD
-
Specifies the duration (in seconds) that the Spark master waits to reclaim resources from applications. If set to 0, the Spark driver kills any running tasks and returns resources at once to the Spark master. If set to (for example) 10, the Spark driver waits 10 seconds for running tasks to complete. After 10 seconds, it kills the tasks and returns the resources.
Valid value is an integer in the range 0 - 8640000.
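For example, to give running tasks 30 seconds to finish before their slots are reclaimed, a spark-env.sh entry might look like:
SPARK_EGO_RECLAIM_GRACE_PERIOD=30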
- SPARK_EGO_SLOTS_MAX
-
Specifies the maximum number of slots that an application can get in Spark master mode.
- SPARK_EGO_GPU_SLOTS_MAX
-
For GPU scheduling, specifies the maximum number of slots that an application can get for GPU tasks in Spark master mode.
- SPARK_EGO_SLOTS_PER_TASK
-
For CPU scheduling, specifies the number of slots that are allocated to Spark tasks:
- To enable one CPU Spark task to run with multiple EGO slots, specify a positive integer that is greater than or equal to one (for instance, 1, 2, or 3 are valid values). As an example, setting SPARK_EGO_SLOTS_PER_TASK=2 means that each task can run on a maximum of two EGO slots.
- To enable multiple CPU Spark tasks to run with a single EGO slot, specify a negative integer that is less than -1 (for instance, -2, -3, or -4 are valid values). As an example, setting SPARK_EGO_SLOTS_PER_TASK=-2 means that two tasks run on a single EGO slot.
- SPARK_EGO_GPU_SLOTS_PER_TASK
-
For GPU scheduling, specifies the number of slots that are allocated to Spark tasks:
- To enable one GPU Spark task to run with multiple EGO slots, specify a positive integer that is greater than or equal to one (for instance, 1, 2, or 3 are valid values). As an example, setting SPARK_EGO_GPU_SLOTS_PER_TASK=2 means that each task can run on a maximum of two EGO slots.
- To enable multiple GPU Spark tasks to run with a single EGO slot, specify a negative integer that is less than -1 (for instance, -2, -3, or -4 are valid values). As an example, setting SPARK_EGO_GPU_SLOTS_PER_TASK=-2 means that two tasks run on a single EGO slot.
Modified Spark properties
The following Spark property configuration settings are modified in IBM® Spectrum Conductor from their configuration settings within the Spark community.
- spark.locality.wait
-
Specifies how long to wait to launch a data-local task before giving up and launching it on a less-local node. The same wait is used to step through multiple locality levels (process-local, node-local, rack-local and then any). For Spark on Mesos, if the locality does not match, the allocated resources can be returned. If possible, slots with better locality can be assigned on the next schedule. In IBM Spectrum Conductor, resources are not exchanged for better locality if the allocated slot number is greater than or equal to the demanded slot number.
- spark.shuffle.service.enabled
-
Enables or disables the external shuffle service. This service preserves the shuffle files that are written by executors so the executors can be safely removed. This property has two default configurations depending on your cluster configuration:
- When your cluster is installed to a local file system, spark.shuffle.service.enabled is set to true by default.
- When your cluster is installed to a shared file system (such as IBM Spectrum Scale), spark.shuffle.service.enabled is set to false by default. You can optionally enable the shuffle service when you create the instance group. To enable this service through the RESTful API, set this value to true and ensure that spark.local.dir is set to a directory in the shared file system.
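As a minimal sketch of enabling the shuffle service on a shared file system installation (the spark.local.dir path is a placeholder for a directory in your shared file system), the spark-defaults.conf-style settings might look like:
spark.shuffle.service.enabled true
spark.local.dir /sharedfs/spark/local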