Memory and CPU configuration options
You can configure a variety of memory and CPU options within Apache Spark, IBM® Java™, and z/OS®.
Apache Spark configuration options
There are two major categories of Apache Spark configuration options: Spark properties and environment variables.

Spark properties control most application settings. You can set them in any of the following ways:
- Directly on a SparkConf passed to your SparkContext
- At run time, as command-line options passed to spark-shell and spark-submit
- In the spark-defaults.conf properties file

In addition, you can configure certain Spark settings through environment variables, which are read from the $SPARK_CONF_DIR/spark-env.sh script.
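For illustration, here is the same executor heap setting expressed in the properties file and on the command line (the application class and jar names are placeholders):

```
# In spark-defaults.conf:
spark.executor.memory   2g

# Equivalent, at run time:
spark-submit --class com.example.App \
    --conf spark.executor.memory=2g \
    app.jar
```

A value set on a SparkConf in application code takes precedence over command-line options, which in turn take precedence over spark-defaults.conf.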
Table 1 through Table 5 summarize some of the Spark properties and environment variables that control memory and CPU usage by Spark applications.
Table 1. Environment variables that control memory use

| Environment variable | Default value | Description |
|---|---|---|
| SPARK_WORKER_MEMORY | Total memory on host system minus 1 GB | Total amount of memory that a worker can give to executors. Note: This value should equal the product of spark.executor.memory times the number of executor instances. |
| SPARK_DAEMON_MEMORY | 1 GB | Amount of memory to allocate to the Spark master and worker daemons. |
Table 2. Spark properties that control memory use

| Spark property | Default value | Description |
|---|---|---|
| spark.driver.memory | 1 GB | Amount of memory to use for the driver process (that is, where the SparkContext is initialized). Note: In client mode, do not set this property through the SparkConf directly in your application; instead, set it through the --driver-memory command-line option or in your default properties file. |
| spark.driver.maxResultSize | 1 GB | Limit on the total size of serialized results of all partitions for each Spark action (for instance, collect). Jobs fail if the size of the results exceeds this limit; setting a high limit can cause out-of-memory errors in the driver. |
| spark.executor.memory | 1 GB | Amount of memory to use per executor process. This sets the heap size of the executor JVM. |
| spark.memory.fraction | 0.6 | Fraction of (heap space - 300 MB) used for execution and storage. The lower this value, the more frequently spills and cached-data evictions occur. The remainder is set aside for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records. |
| spark.memory.storageFraction | 0.5 | Amount of storage memory that is immune to eviction, expressed as a fraction of the region set aside by spark.memory.fraction. The higher this value, the less working memory is available to execution, and tasks may spill to disk more often. |
| spark.memory.offHeap.enabled | false | If set to true, Spark attempts to use off-heap memory for certain operations. If off-heap memory use is enabled, spark.memory.offHeap.size must be positive. |
| spark.memory.offHeap.size | 0 | Absolute amount of memory, in bytes, that can be used for off-heap allocation. This setting has no impact on heap memory usage, so if your executors' total memory consumption must fit within a hard limit, shrink the JVM heap size accordingly. Must be set to a positive value when spark.memory.offHeap.enabled is true. |
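To see how the two fractions interact, the following sketch applies the formulas described above. The function name is ours, and Spark's actual runtime accounting differs in detail; this is only an approximation of the unified memory model.

```python
def spark_memory_regions(heap_bytes,
                         memory_fraction=0.6,
                         storage_fraction=0.5,
                         reserved=300 * 1024 * 1024):
    """Approximate the unified-memory regions of one executor JVM.

    Per the table above: unified memory is memory_fraction of
    (heap - 300 MB reserved); within it, storage_fraction is the
    portion of storage memory immune to eviction.
    """
    usable = heap_bytes - reserved
    unified = usable * memory_fraction      # execution + storage
    storage = unified * storage_fraction    # eviction-immune portion
    execution = unified - storage           # remainder for execution
    return unified, storage, execution

# Example: a 1 GB executor heap (the spark.executor.memory default)
unified, storage, execution = spark_memory_regions(1024 * 1024 * 1024)
```

With the defaults, roughly 434 MB of a 1 GB heap is unified memory, half of it immune to eviction, which is why small heaps leave little room for caching.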
Table 3. Environment variables that control CPU use

| Environment variable | Default value | Description |
|---|---|---|
| SPARK_WORKER_CORES | All cores on host system | Number of cores to use for all executors. Note: This value should equal the product of spark.executor.cores times the number of executor instances. |
Table 4. Spark properties that control CPU use

| Spark property | Default value | Description |
|---|---|---|
| spark.deploy.defaultCores | Infinite | Default number of cores to give to applications that do not set a spark.cores.max value. |
| spark.cores.max | (Not set) | Maximum number of cores to give to an application across the entire cluster. If not set, the default is the spark.deploy.defaultCores value. |
| spark.driver.cores | 1 | Number of cores to use for the driver process, only in cluster mode. |
| spark.executor.cores | All cores on host system | Number of cores to use on each executor. Setting this parameter allows an application to run multiple executors, provided that there are enough cores on the worker (SPARK_WORKER_CORES); otherwise, only one executor per application runs on each worker. |
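The worker-level and executor-level settings above combine to determine how many executors fit on one worker. The following sketch (our own helper, not a Spark API) shows the arithmetic:

```python
def max_executors(worker_cores, worker_memory_gb,
                  executor_cores, executor_memory_gb):
    """How many executors of a given size fit on one worker.

    SPARK_WORKER_CORES and SPARK_WORKER_MEMORY cap the totals; each
    executor consumes spark.executor.cores cores and
    spark.executor.memory of heap.
    """
    by_cores = worker_cores // executor_cores
    by_memory = int(worker_memory_gb // executor_memory_gb)
    return min(by_cores, by_memory)

# A worker with 8 cores and 15 GB can host three 2-core, 4 GB executors
print(max_executors(8, 15, 2, 4))  # prints 3
```

In this example memory, not cores, is the binding constraint: four executors would fit by cores (8 / 2) but only three by memory (15 // 4).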
Table 5. z/OS-specific Spark properties that control scheduling

| Spark property | Default value | Description |
|---|---|---|
| spark.zos.master.app.alwaysScheduleApps | false | Use this property to enable the Spark master to start applications, even when the current applications appear to be using all the CPU and memory. This enables the system to start the applications and then manage the resources across them, according to the WLM policy. |
| spark.zos.maxApplications | 5 | When spark.zos.master.app.alwaysScheduleApps is set to true, specifies the maximum number of applications that can be scheduled to run concurrently. Set this value based on the resources available according to the current WLM policy. |
| spark.dynamicAllocation.enabled | false | Specifies that the Spark master can request that the Spark worker add or remove executors based on the workload and available resources. |
| spark.shuffle.service.enabled | false | Specifies that the worker should start the Spark shuffle service, which allows executors to transfer (shuffle) data across executors, as needed. This is required when spark.dynamicAllocation.enabled is true. |
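As a sketch, a spark-defaults.conf fragment that enables these options together (the values shown are illustrative, not recommendations):

```
# Dynamic allocation requires the external shuffle service
spark.dynamicAllocation.enabled          true
spark.shuffle.service.enabled            true

# Let WLM manage resources across applications, up to 5 concurrent ones
spark.zos.master.app.alwaysScheduleApps  true
spark.zos.maxApplications                5
```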
For more information about the Apache Spark configuration options, see http://spark.apache.org/docs/2.4.8/configuration.html. For more information about tuning the Apache Spark cluster, see http://spark.apache.org/docs/2.4.8/tuning.html.
IBM Java configuration options
In addition to the Apache Spark settings, you can use IBM JVM runtime options to manage resources used by Apache Spark applications.
- IBM_JAVA_OPTIONS
- Use this environment variable to set generic JVM options.
- SPARK_DAEMON_JAVA_OPTS
- Use this environment variable to set additional JVM options for the Apache Spark master and worker daemons.
- spark.driver.extraJavaOptions
- Use this Apache Spark property to set additional JVM options for the Apache Spark driver process.
- spark.executor.extraJavaOptions
- Use this Apache Spark property to set additional JVM options for the Apache Spark executor process. You cannot use this option to set Spark properties or heap sizes.
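For example, a spark-defaults.conf fragment that turns on verbose garbage-collection logging for both processes (the option shown is illustrative; heap sizes must still come from the Spark memory properties, not from these options):

```
spark.driver.extraJavaOptions    -verbose:gc
spark.executor.extraJavaOptions  -verbose:gc
```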
Table 6 summarizes some of the IBM JVM runtime options that control resource usage.
Table 6. IBM JVM runtime options that control resource use

| Option | Default value | Description |
|---|---|---|
| -Xmx | Half of the available memory, with a minimum of 16 MB and a maximum of 512 MB | Maximum heap size. Note: Set Spark JVM heap sizes through Spark properties (such as spark.executor.memory), not through Java options. |
| -Xms | 4 MB | Initial heap size. |
| -Xss | 1024 KB | Maximum stack size. |
| -Xcompressedrefs | Enabled by default when the value of the -Xmx option is less than or equal to 57 GB | Enables the use of compressed references. |
| -Xlp | 1M pageable pages, when available, are the default size for the object heap and the code cache. If -Xlp is specified without a size, 1M non-pageable is the default for the object heap. | Requests that the JVM allocate the Java object heap using large page sizes (1M or 2G). Instead of -Xlp, you can use -Xlp:codecache and -Xlp:objectcache to set the JIT code cache and object heap separately. Note: There is a limit to the number of large pages that are available on a z/OS system. Consult your system administrator before changing this setting. |
| -Xmn | (Not set) | Sets the initial and maximum size of the new area to the specified value when using the -Xgcpolicy:gencon option. |
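A minimal spark-env.sh sketch that combines the daemon memory setting from Table 1 with an IBM JVM option (the values are illustrative):

```
# Read from $SPARK_CONF_DIR/spark-env.sh
SPARK_DAEMON_MEMORY=2g
# Pass an IBM JVM option to the master and worker daemons
SPARK_DAEMON_JAVA_OPTS="-Xcompressedrefs"
```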
For more information about IBM Java on z/OS, see IBM SDK, Java Technology Edition z/OS User Guide.
z/OS configuration parameters
The z/OS operating system provides additional configuration options that allow the system administrator to ensure that no workload uses more resources than it is allowed. Be sure that any application-level configuration does not conflict with the z/OS system settings. For example, the executor JVM will not start if you set spark.executor.memory=4G but the MEMLIMIT parameter for the user ID that runs the executor is set to 2G. You can set these z/OS options in the following ways:
- Use the ALTUSER RACF® command to modify the resource settings in the OMVS segment of the security profile for the user ID under which Apache Spark runs.
- Use the IEFUSI exit, which receives control before each job step starts.
- Set the system-wide defaults in the appropriate SYS1.PARMLIB member.
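As one example, the RACF route might look like the following TSO command. The user ID SPARKID and the sizes are illustrative; MEMLIMIT and ASSIZEMAX are keywords of the OMVS segment:

```
ALTUSER SPARKID OMVS(MEMLIMIT(6G) ASSIZEMAX(2147483647))
```

Here MEMLIMIT allows for a 4 GB executor heap plus roughly 2 GB of native memory, and ASSIZEMAX is set to the recommended minimum address space size for Java noted in Table 7.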
Table 7 summarizes some of the z/OS configuration options that might be relevant to your Open Data Analytics for z/OS environment.
Table 7. z/OS configuration parameters

| Parameter | Default value | Description | Where to set |
|---|---|---|---|
| MEMLIMIT | 2 GB | Amount of virtual storage above the bar that an address space is allowed to use. Note: Set the MEMLIMIT for the Spark user ID to the largest JVM heap size (executor memory size) plus the amount of native memory that Spark processing might require (typically, at least 2 GB). | |
| MAXMMAPAREA | 40960 | Maximum amount of data space storage, in pages, that can be allocated for memory mapping of z/OS UNIX files. | |
| MAXASSIZE | 200 MB | Maximum address space size for a new process. Note: The JVM fails to start if the MAXASSIZE for the Spark user ID is too small; the recommended minimum for Java is 2,147,483,647 bytes. | |
| MAXCPUTIME | 1000 | Maximum CPU time, in seconds, that a process can use. | |
| MAXPROCUSER | 25 | Maximum number of concurrently active processes for a single z/OS UNIX user ID. | |
For more information about z/OS parmlib members, see z/OS MVS Initialization and Tuning Reference.