When your instances (instance groups, Anaconda distribution
instances, and application instances) are deployed to a local file system, the shuffle service is
enabled by default. When your instances are deployed to a shared file system, the shuffle service
is disabled by default to increase application performance.
About this task
When your instances (instance groups, Anaconda distribution
instances, and application instances) are deployed to a shared file system, data that is
written to a shared file system does not need to be redistributed across the network. Therefore, the
shuffle service is not available by default. You can optionally enable the shuffle service to run on
each host in your cluster independent of your Spark applications and their executors. When the
shuffle service is enabled, Spark executors fetch shuffle files from the service instead of from
each other. Thus, any shuffle state that is written by an executor continues to be served beyond the
executor’s lifetime.
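The behavior described here maps to standard Spark shuffle properties. As a point of reference only, the following is a minimal sketch of the equivalent settings in spark-defaults.conf form; in IBM Spectrum Conductor you normally set these values per instance group through the cluster management console as described in the procedure, and the directory path shown is only an example:
# Shuffle service settings as they map to standard Spark configuration
spark.shuffle.service.enabled  true
spark.shuffle.service.port     7337
spark.local.dir                /data/spark-local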
Procedure
- From the cluster management console, click .
- In
the Spark tab, click Configuration to define the
following Spark shuffle settings for the instance group:
| Spark parameter | Description |
| --- | --- |
| spark.shuffle.service.port | Define an exclusive port for use by the Spark shuffle service (default 7337). To ensure a unique environment for each instance group, the default port number increments by 1 for each instance group that you subsequently create: the shuffle service for your second instance group runs by default on port 7338, the next on port 7339, and so on. However, when you create an instance group by using RESTful APIs, the shuffle service uses port 7337 by default for all instance groups. Ensure that the shuffle service port is not already used by the system or by processes external to IBM® Spectrum Conductor. This port is not used on all hosts in the cluster; you need to check only the hosts in the shuffle service resource group. |
| spark.local.dir | When your instances (instance groups, Anaconda distribution instances, and application instances) are deployed to a local file system (with the shuffle service enabled), specify a directory for scratch space (including map output files and RDDs) that is on a fast, local disk in your system (default is /tmp). The value can also be a comma-separated list of multiple directories on different disks. If spark.local.dir is set to an existing directory, ensure that the execution user for the instance group and the execution user for the Spark shuffle service consumer have read, write, and execute (rwx) permissions on that directory. If you change the default consumer for each component, the consumer execution users of the driver and executor processes for any submitted applications, the batch Spark master service, and the notebook Spark master service must have execute (x) permission on the spark.local.dir directory. Note: With the shuffle service enabled, you must manually clean up data under the spark.local.dir location when the instance group is removed. |
| SPARK_EGO_LOGSERVICE_PORT | Specify the UI port for the EGO log service (default 28082). To ensure a unique environment for each instance group, the default port number increments by 1 for each instance group that you subsequently create: the log service for your second instance group runs by default on port 28083, the next on port 28084, and so on. However, when you create an instance group by using RESTful APIs, the log service uses port 28082 by default for all instance groups. Ensure that the log service UI port is not already used by the system or by processes external to IBM Spectrum Conductor. This port is not used on all hosts in the cluster; you need to check only the hosts in the shuffle service resource group. |

A quick command-line check of the shuffle service port and the spark.local.dir permissions is sketched after this procedure.
-
If you want to change the default shuffle service consumer, click the consumer for the shuffle service, and
select one of the following options:
- To select an existing child consumer, click Select an existing child
consumer, select the consumer under the top-level consumer, and click
Select.
- To create a new consumer, click Create a new consumer under
top_level_consumer, enter a consumer name, and click
Create.
If you plan to use exclusive slot allocation, the shuffle service consumer and the Spark
executors consumer must be different.
-
If you want the shuffle service and executors to use different resource groups, in the Resource
Groups and Plans section, select a different resource group for the shuffle service. Both resource
groups must use the same set of hosts; otherwise, applications that are submitted to this instance group fail.
- Click Save.
- Click Modify Instance Group.
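As noted in the settings table, before you save the configuration you can confirm from a shell on a host in the shuffle service resource group that the shuffle service port is free and that the spark.local.dir location has the required permissions. The following is a minimal sketch, assuming the default port 7337 and an example directory /data/spark-local (substitute your own values):
# Check whether anything is already listening on the shuffle service port
ss -ltn | grep ':7337'
# Check ownership and permissions on the scratch directory
ls -ld /data/spark-local
# Grant read, write, and execute permissions to the owner and group if needed (example only)
chmod ug+rwx /data/spark-local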
What to do next
Set up the SparkCleanup service to run on all hosts in the cluster.
- Edit the SparkCleanup service profile at
$EGO_CONFDIR/../../eservice/esc/conf/services/sparkcleanup_service.xml.
- Modify the consumer: Find the
<ego:ConsumerID> element and change it to use
the /ClusterServices/EGOClusterServices consumer:
<ego:ConsumerID>/ClusterServices/EGOClusterServices</ego:ConsumerID>
- Modify the resource group: Find the
<ego:ResourceGroupName> element and
change it to use the InternalResourceGroup resource group:
<ego:ResourceGroupName>InternalResourceGroup</ego:ResourceGroupName>
- Modify the number of service instances: Find the
<sc:MaxInstances> element
and change it to use 5000 instances (a consolidated sketch of all three edited elements appears after the restart commands):
<sc:MaxInstances>5000</sc:MaxInstances>
- Save the changes.
- From the command console, restart the
cluster by running the following commands:
egosh service stop all
egosh ego shutdown all
egosh ego start all
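For reference, after the edits in the previous steps, the three changed elements in the sparkcleanup_service.xml profile should look like the following fragment. This is only a sketch of the relevant lines; the surrounding elements of the service profile are omitted and can vary between releases:
<!-- Run the SparkCleanup service under the EGO cluster services consumer -->
<ego:ConsumerID>/ClusterServices/EGOClusterServices</ego:ConsumerID>
<!-- Use the internal resource group so that the service can run on all hosts -->
<ego:ResourceGroupName>InternalResourceGroup</ego:ResourceGroupName>
<!-- Allow up to 5000 service instances -->
<sc:MaxInstances>5000</sc:MaxInstances>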