Configuring networking for Apache Spark
Complete this task to configure the port access and other networking customization that Apache Spark requires.
About this task
Apache Spark makes heavy use of the network for communication between various processes, as shown in Figure 1.
These ports are further described in Table 1 and Table 2, which list the ports that Spark uses, both on the cluster side and on the driver side.
Table 1. Cluster-side network ports

| Port name | Default port number | Configuration property* | Notes |
|---|---|---|---|
| Master web UI | 8080 | spark.master.ui.port or SPARK_MASTER_WEBUI_PORT | The value set by the spark.master.ui.port property takes precedence. |
| Worker web UI | 8081 | spark.worker.ui.port or SPARK_WORKER_WEBUI_PORT | The value set by the spark.worker.ui.port property takes precedence. |
| History server web UI | 18080 | spark.history.ui.port | Optional; applies only if you use the history server. |
| Master port | 7077 | SPARK_MASTER_PORT | SPARK_MASTER_PORT (or the default, 7077) is the starting point for connection attempts, not necessarily the actual port that is used. In addition, the value can be 0, which means that a random port number is used. Therefore, SPARK_MASTER_PORT (or the default, 7077) might not be the port that the master uses. This is true for all methods of starting the master, including BPXBATCH, the start*.sh scripts, and the started task procedure. Note: Do not choose 0 for SPARK_MASTER_PORT if you intend to use client authentication. |
| Master REST port | 6066 | spark.master.rest.port | Not needed if the REST service is disabled. If you plan to use the REST service with authentication, configure AT-TLS port authentication for this port. As of APAR PH03469, the REST service is disabled by default. |
| Worker port | (random) | SPARK_WORKER_PORT | |
| Block manager port | (random) | spark.blockManager.port | |
| External shuffle server | 7337 | spark.shuffle.service.port | Optional; applies only if you use the external shuffle service. |
| PySpark daemon | (random) | spark.python.daemon.port | Optional; applies only if you use PySpark. APAR PI98042 (for Spark 2.2.0) is required to use this property. |
Table 2. Driver-side network ports

| Port name | Default port number | Configuration property* | Notes |
|---|---|---|---|
| Application web UI | 4040 | spark.ui.port | |
| Driver port | (random) | spark.driver.port | |
| Block manager port | (random) | spark.blockManager.port | The value set by the spark.driver.blockManager.port property takes precedence. |
| Driver block manager port | (Value of spark.blockManager.port) | spark.driver.blockManager.port | If spark.driver.blockManager.port is not set, the spark.blockManager.port configuration is used. |
*The Spark properties in the Configuration property column can be set either in the spark-defaults.conf file (if listed in lowercase) or in the spark-env.sh file (if listed in uppercase).
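For illustration, the lowercase properties go in spark-defaults.conf and the uppercase variables go in spark-env.sh. The port values below are example choices taken from the default column of the tables, not recommendations for your environment:

```shell
# spark-defaults.conf -- lowercase Spark properties (example values)
# spark.master.ui.port        8080
# spark.history.ui.port       18080
# spark.shuffle.service.port  7337

# spark-env.sh -- uppercase environment variables (example values)
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_WEBUI_PORT=8081
```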
Spark must be able to bind to all the required ports. If Spark cannot bind to a specific port, it retries with the next port number (+1). The maximum number of retries is controlled by the spark.port.maxRetries property (default: 16) in the spark-defaults.conf file. Assuming that the spark.port.maxRetries property is at its default (16), here are a few examples:

- If the Spark application web UI is enabled, which it is by default, no more than 17 Spark applications can run at the same time, because the 18th Spark driver process will fail to bind to an application UI port.
- When both spark.blockManager.port and spark.driver.blockManager.port are set, no more than 17 executor processes can run at the same time, because the 18th executor process will fail to bind to a block manager port.
- When spark.blockManager.port is set but spark.driver.blockManager.port is not set, the combined total of executor and driver processes cannot exceed 17, because the 18th process will fail to bind to a block manager port.
Consider these limits carefully; you might need to increase the spark.port.maxRetries value if you plan to run multiple Spark applications at the same time, or to use a large number of executors in the cluster simultaneously.
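As a sanity check on the arithmetic in the examples above, this small Python sketch (the function name is illustrative, not part of Spark) shows why a retry limit of 16 allows 17 concurrent processes:

```python
def max_concurrent_binders(max_retries: int) -> int:
    """Number of processes that can bind to a port range before one fails.

    Each process tries the base port plus up to max_retries successive
    ports, so max_retries + 1 distinct ports are available in total.
    """
    return max_retries + 1

# With the default spark.port.maxRetries of 16, ports 4040-4056 are
# available for the application web UI: 17 drivers succeed, the 18th fails.
print(max_concurrent_binders(16))  # prints 17
```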
Procedure
What to do next
Continue with Configuring z/OS Spark client authentication.