Modify the configuration for an instance group, including Spark version
settings, notebook configuration, and container definitions.
Before you begin
- You must be a cluster administrator, consumer administrator, or have the Instance Groups Configure permission. To
change the user name that is used to request resources for the instance group, you must be a cluster
administrator or have the Services Assign Impersonation (Any User) permission.
- The instance group must be in the Registered, Ready, Register Error, or Deploy Error state. If the instance group is running workloads, stop the instance group and all associated notebooks before you change its configuration. See Stopping instance groups and Stopping notebooks in an instance group.
- To add a notebook or change the properties of an existing notebook, the notebook must be added
to the cluster before you modify the
instance group. See Notebooks.
- To add a package that the instance group requires to run, the package must
be created and ready for upload. See Creating dependent packages. If
you want to upload the package directly to the repository and select it when modifying the instance group, the package must exist in the
repository. See Adding packages to the Service Packages repository.
As a convenience, you can also create a package from a single file.
- If you are using a recovery directory that is pointing to a shared file system on NFS version 4
and you want to change the consumers for the Spark masters to have a different execution
user, then you must delete the instance group folders that are under the
recovery directory before you modify the instance group.
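The recovery-directory cleanup described above can be sketched as the following shell commands. This is a minimal demonstration, not the product's CLI: the recovery path, instance group name, and folder naming pattern are all assumptions, so substitute your actual values before deleting anything.

```shell
# Demo: clear one instance group's folders under a recovery directory before
# modifying the instance group. Paths and names below are placeholders.
RECOVERY_DIR="$(mktemp -d)"   # stand-in for the NFS recovery directory
IG_NAME="myspark"             # stand-in instance group name

# Simulate existing instance group folders left by the Spark masters.
mkdir -p "$RECOVERY_DIR/${IG_NAME}-master1" "$RECOVERY_DIR/${IG_NAME}-master2"

# Remove only that instance group's folders, leaving anything else intact.
find "$RECOVERY_DIR" -maxdepth 1 -type d -name "${IG_NAME}-*" -exec rm -rf {} +

ls "$RECOVERY_DIR"   # the instance group folders are gone
```

Listing with `find ... -print` first (before adding `-exec rm -rf`) is a prudent dry run on a shared file system.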
Procedure
-
From the cluster management console, click .
-
Select the instance group to
modify and click Configure. If required, click the all
consumers link to filter instance groups by consumer.
-
If prompted to update Spark version and notebook packages, click
Continue to apply these updates when the instance group is modified. When updated Spark
version and notebook packages are added to the cluster, those packages must be deployed for the
instance group.
Note: For updated notebook packages, the notebook is undeployed and the new version is deployed. Therefore, if you specified the notebook base data directory to be either in the same location as or under the notebook's deployment directory, the base data directory is removed. To retain your data, manually back up the contents of the base data directory before you update the instance group configuration.
-
On the Basic Settings tab, modify instance group settings as required. For a
description of each field, hover over the field to view a tooltip.
You can modify almost all settings for an instance group, including Spark version
parameters, high availability, and the duration to store application monitoring data. However, you
cannot change the instance group
name.
If you want to disable SSL for the instance group, change the Encryption settings for the Spark version and select Disable SSL. When this option is selected, the instance group does not use SSL for workloads or Spark UIs.
You can also add or remove a notebook and update the parameters for an existing notebook in the
instance group. To modify the base
notebook, including its Docker properties, the notebook with the required changes must be added as a
new notebook at the cluster level. Then, when you modify the instance group, select that notebook as you
would a new notebook and remove the one that you no longer require.
Important: Take note of the following considerations:
- When you change the execution user:
- If you are not modifying the administrator user group, the new user must belong to the
administrator user group of the instance group.
- Changing the execution user of an instance group does not change the execution user for existing consumers. The execution user of a consumer cannot be changed after the consumer is created if that execution user is not a member of the instance group's administrator user group. Therefore, to change the execution user of an instance group's consumers, either create consumers with the correct execution user and select them when you modify the instance group, or modify the instance group and select the Create a new consumer under the instance group top consumer option for each consumer (which creates a new consumer with the new instance group execution user).
- Changing the execution user/administrator group of an instance group does not automatically change
the execution user/administrator group of notebooks. If you want to change this for notebooks also,
you must explicitly modify those values in the notebook while configuring the instance group.
- When you disable the history server or high availability, the old directory (along with its data) is
removed. If you want to retain your data, manually move data in the old directory to the new
directory before you update the instance group configuration.
- Optional:
Click the Containers tab to modify container definitions, including
enabling or disabling Docker and cgroup limits. You can modify all container settings for Spark
drivers, executors, and services in the instance group.
Note: The Containers tab is only available with certain Spark versions.
Spark versions not supported: 1.5.2.
- Optional:
Click the Packages tab to modify packages that are used by the instance group.
You can add application-specific packages that must be deployed with Spark to the
instance group hosts. If the package uses the
same name as an existing repository package, the package in the repository is replaced. You can add
packages in several ways:
- To add packages from the central repository, click Add packages from
Repository. All packages that are registered to the instance group consumer (and its children in the
consumer tree) are available for selection.
- To add a package from your local computer and upload it to the central repository, click
Upload Package.
- To create a single-file package, deploy it as a package, and upload it to the central
repository, click Create Single-File Packages.
Once uploaded, you can also set optional configuration for packages. Note that the optional $DEPLOY_HOME and $SPARK_HOME check boxes are cleared by default for packages that are already added to the hosts in the instance group. For details on these check boxes and other optional configuration, see Adding dependent packages.
You can also remove any
packages that the instance group no
longer requires. Removing a package from the instance group does not remove the package from
the repository.
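As an illustration of preparing a package for the Upload Package option above, the following shell sketch bundles a small library into an archive. The tar.gz layout, directory name, and file contents are assumptions for demonstration only; check your cluster's packaging requirements (see Creating dependent packages) for the actual expected format.

```shell
# Sketch: bundle a dependent package for upload. Layout is an assumption.
WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/mylib"
printf 'print("hello from mylib")\n' > "$WORKDIR/mylib/mylib.py"

# Create the archive that could be uploaded from the Packages tab.
tar -czf "$WORKDIR/mylib.tar.gz" -C "$WORKDIR" mylib

# Inspect the archive contents before uploading.
tar -tzf "$WORKDIR/mylib.tar.gz"
```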
-
Click the Data Connectors tab to add data connectors, enable or disable existing
data connectors, or remove any
existing data connectors for the
instance group. You can also modify the
configuration settings for an existing data connector, and set the data connector that specifies
fs.defaultFS in the Hadoop configuration file.
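The data connector that supplies fs.defaultFS determines the default file system that Spark applications resolve relative paths against. In a Hadoop configuration file (core-site.xml), the resulting property looks like the following sketch; the namenode host and port are placeholders, not values from this product.

```xml
<!-- core-site.xml fragment; host and port are placeholders -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```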
-
Click Modify Instance Group.
Results
Based on your changes, the Spark package (along with other packages and notebooks) for the
instance group is redeployed. If
updated Spark version or notebook packages are available in the cluster, the updated versions are
deployed for the instance group. If
any notebooks were removed from the instance group, those notebooks (and associated
packages) are undeployed from the hosts.
What to do next
- If you updated the configuration to point to new directories, manually copy data from the old directories in order to retain that data. This step is especially important for changes to the notebook base data directory, the event log directory for the history server, or the recovery directory for high availability. If high availability is not configured, metadata is stored in the location set by the ELK_HARVEST_LOCATION environment variable, which defaults to /var/tmp/elk_logs.
To manually move data in the old directory (with the original file permissions) to the new directory, run the following command before you start the instance group:
sudo cp -a old-dir/*uuid* new-dir/
Copy the data after the instance group configuration is changed but before the instance group is started; otherwise, you may lose data.
- Start the instance group. See Starting instance groups.
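The copy step above can be demonstrated with the following self-contained sketch. The directory names, the uuid-style folder name, and the file contents are placeholders; the point is that cp -a preserves permissions, ownership, and timestamps when moving data to the new directory.

```shell
# Demo of copying uuid-named folders with original file permissions intact.
OLD_DIR="$(mktemp -d)"; NEW_DIR="$(mktemp -d)"
mkdir -p "$OLD_DIR/1a2b3c4d-uuid-dir"
printf 'event log data\n' > "$OLD_DIR/1a2b3c4d-uuid-dir/app.log"
chmod 640 "$OLD_DIR/1a2b3c4d-uuid-dir/app.log"

# Archive-copy (-a) the uuid-named folders to the new directory.
cp -a "$OLD_DIR"/*uuid* "$NEW_DIR"/

ls -l "$NEW_DIR/1a2b3c4d-uuid-dir/app.log"
```

In production the command typically needs sudo, as shown earlier, because the files belong to the instance group's execution user.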