Tuning parameters for shared Spark batch applications
Tune shared Spark batch application settings to prevent loss of RDD block data and to set the size of the shared Spark context pool.
Before you begin
- Shared Spark batch application settings can be tuned only with certain Spark versions; Spark versions 1.5.2 and 3.0.0 are not supported.
- To modify the configuration of the shared Spark batch application's instance group, you must be a cluster administrator or consumer administrator, or have the Spark Instance Groups Configure permission.
About this task
You might also want to configure the size of the shared Spark context pool, which maintains a set of prestarted contexts. Keeping prestarted contexts in the pool gives applications instant access to an existing context. When the number of shared contexts in the pool reaches the configured limit and a new shared context is submitted, the least recently used shared context is stopped.
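For illustration only, the following minimal Scala sketch shows how least-recently-used eviction over a bounded pool can work. ContextPool is a hypothetical class written for this example; it is not the product's implementation or API.

    import scala.collection.mutable

    // Conceptual sketch only: least-recently-used eviction over a
    // bounded pool of prestarted contexts. ContextPool is hypothetical
    // and is not part of the product API.
    class ContextPool[C](maxSize: Int, stop: C => Unit) {
      // LinkedHashMap keeps insertion order; removing and re-adding an
      // entry on access moves it to the most-recently-used position.
      private val pool = mutable.LinkedHashMap.empty[String, C]

      // Reuse an existing context, marking it most recently used.
      def acquire(id: String): Option[C] = pool.remove(id).map { ctx =>
        pool.put(id, ctx)
        ctx
      }

      // Add a new context, evicting the least recently used one if the
      // pool is already at its limit.
      def add(id: String, ctx: C): Unit = {
        if (pool.size >= maxSize) {
          val (lruId, lruCtx) = pool.head
          pool.remove(lruId)
          stop(lruCtx)
        }
        pool.put(id, ctx)
      }
    }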
To specify a longer timeout for executors or to set the shared context pool size, you must modify the configuration of the instance group to which the shared Spark batch application is submitted. Optionally, you can also change the defaults for the number of concurrent jobs that can be started in a shared context and the timeout duration for RDD creation.
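Such settings are typically expressed as Spark configuration properties for the instance group. The property names below are hypothetical placeholders used only to illustrate the shape of the configuration; the exact parameter names depend on your product version and appear on the instance group configuration page.

    # Hypothetical property names for illustration only; consult the
    # instance group configuration page for the exact parameter names.

    # Maximum number of prestarted shared contexts in the pool:
    spark.shared.context.pool.size 3
    # Maximum number of concurrent jobs per shared context:
    spark.shared.context.max.concurrent.jobs 8
    # Timeout for shared RDD creation:
    spark.shared.rdd.creation.timeout 600s
    # Longer idle timeout for executors:
    spark.executor.idle.timeout 1800s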
The sharable RDD API provides a data caching layer: shared RDD data is computed once and cached for reuse. To avoid losing cached RDD block data, you can set a larger memory amount for Spark executors. For information on Spark memory tuning, click the Spark Documentation link when you configure the Spark version, then search for the Memory Management Overview in the Tuning Guide.
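As a sketch, assuming the standard Spark unified memory model (Spark 1.6 and later), the instance group's Spark configuration could raise executor memory and protect a larger share of it for storage. The values here are illustrative starting points, not recommendations for your workload.

    # Standard Spark properties (Spark 1.6 and later); values are
    # illustrative only.

    # Total heap size per executor:
    spark.executor.memory 8g
    # Fraction of heap shared by execution and storage (default 0.6):
    spark.memory.fraction 0.6
    # Share of that region immune to eviction by execution, raised
    # from the 0.5 default to better protect cached RDD blocks:
    spark.memory.storageFraction 0.6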