Tuning Spark application tasks
For a Spark application, a task is the smallest unit of work that Spark sends to an executor. Monitoring the tasks in a stage can help identify performance issues. To view detailed information about the tasks in a stage, click the stage's description on the Jobs tab of the application web UI.
A task's execution time can be broken down as Scheduler Delay + Deserialization Time + Shuffle Read Time (optional) + Executor Runtime + Shuffle Write Time (optional) + Result Serialization Time + Getting Result Time. Tuning each of these components can help optimize performance.
Scheduler Delay
Spark relies on data locality and tries to execute tasks as close to the data as possible to minimize data transfer. A task's location can be either a host or a pair of a host and an executor. If no available executor satisfies a task's locality preference, the task waits until a timeout is reached before it is scheduled at a less local level. This timeout is controlled by the spark.locality.wait parameter. For tasks that read data from a distributed file system, the locality level can significantly affect data transfer time. In this case, configure a longer wait time to obtain better locality.
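For example, a longer locality wait can be set when the session is created. The following is a minimal sketch; the 10s value is an illustrative assumption, not a recommended default:

    import org.apache.spark.sql.SparkSession

    // Illustrative value only; the right timeout depends on your cluster.
    val spark = SparkSession.builder()
      .appName("locality-wait-example")
      // Wait up to 10 seconds (the default is 3s) for a slot that satisfies
      // the task's preferred locality before falling back to a less local level.
      .config("spark.locality.wait", "10s")
      .getOrCreate()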
To view scheduler delay, on the Stages page of the application web UI, click the Show Additional Metrics link and select Scheduler Delay to include this metric in the summary table. You can also use the Event Timeline link to visualize the timeline of the tasks in a stage; gaps between tasks indicate scheduler delay.
Task Deserialization Time
By default, Spark uses the Java serializer for object serialization. To enable the Kryo serializer, which outperforms the default Java serializer in both time and space, set the spark.serializer parameter to org.apache.spark.serializer.KryoSerializer.
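The following sketch shows one way to apply this setting; MyKey and MyRecord are hypothetical application classes defined only for illustration. Registering classes with Kryo is optional but further reduces the serialized size:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Placeholder application classes, defined here only for the sketch.
    case class MyKey(id: Long)
    case class MyRecord(key: MyKey, payload: String)

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registering classes lets Kryo write compact numeric IDs instead of
      // full class names, shrinking the serialized data further.
      .registerKryoClasses(Array(classOf[MyRecord], classOf[MyKey]))

    val spark = SparkSession.builder().config(conf).getOrCreate()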
Shuffle Read Time and Shuffle Write Time
Data shuffles negatively impact application performance, so minimizing the amount of shuffle read and write can help. High Shuffle Read Time and Shuffle Write Time values indicate a network-intensive workload. In this case, check the LOCALITY LEVEL of the stage's tasks. Tasks with locality level PROCESS_LOCAL are typically fast. If many tasks run at the RACK_LOCAL or even ANY level, increase the spark.locality.wait parameter value to avoid shuffling data over the network. You can also switch to a faster serializer or enable shuffle compression to reduce the amount of data being shuffled, thus shortening shuffle read and write operations.
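The settings below sketch these options together; the values shown are Spark defaults or illustrative assumptions rather than tuned recommendations:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Compress map output files written during shuffles (default: true).
      .set("spark.shuffle.compress", "true")
      // Compress data spilled to disk during shuffles (default: true).
      .set("spark.shuffle.spill.compress", "true")
      // Codec used for the compression above (default: lz4).
      .set("spark.io.compression.codec", "lz4")
      // Wait longer for a local slot to reduce RACK_LOCAL and ANY tasks.
      .set("spark.locality.wait", "6s")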
Executor Running Time
- For data read/write, Spark tries to place intermediate files in local directories. To control the location of these directories, set the spark.local.dir parameter to a directory on a local disk, rather than a network disk, for best performance (see the sketch after this list).
- For Java garbage collection, use the Show Additional Metrics link on the application web UI to check the GC Time metric. If GC accounts for a large proportion of task time, tune the Java GC to optimize performance (the sketch after this list shows how to pass GC flags to executors). For information on tuning GC behavior, see the following resources:
- Oracle GC tuning document.
- Best Practices for Tuning Java 5.0 Garbage Collection on IBM Platforms. Though the paper discusses best practices for Java 5.0, the methodologies can be generalized.
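As referenced in the list above, a minimal sketch of both settings might look as follows. The directory paths and GC flags are illustrative assumptions (the GC flags shown are HotSpot-style; IBM JVMs use different options):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Place shuffle and spill files on fast local disks rather than network
      // storage; multiple comma-separated directories spread the I/O load.
      .set("spark.local.dir", "/disk1/spark-tmp,/disk2/spark-tmp")
      // Pass GC flags to the executor JVMs. -verbose:gc emits the GC activity
      // behind the GC Time metric; -XX:+UseG1GC selects the G1 collector.
      .set("spark.executor.extraJavaOptions", "-verbose:gc -XX:+UseG1GC")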