Modifying instance groups

Modify the configuration for an instance group, including Spark version settings, notebook configuration, and container definitions.

Before you begin

  • You must be a cluster administrator, consumer administrator, or have the Instance Groups Configure permission. To change the user name that is used to request resources for the instance group, you must be a cluster administrator or have the Services Assign Impersonation (Any User) permission.
  • The instance group must be in the Registered, Ready, Register Error, or Deploy Error state. If the instance group is running workload, stop the instance group and all associated notebooks before you change its configuration. See Stopping instance groups and Stopping notebooks in an instance group.
  • To add a notebook or change the properties of an existing notebook, the notebook must be added to the cluster before you modify the instance group. See Notebooks.
  • To add a package that the instance group requires to run, the package must be created and ready for upload. See Creating dependent packages. If you want to upload the package directly to the repository and select it when you modify the instance group, the package must exist in the repository. See Adding packages to the Service Packages repository. As a convenience, you can also create a package from a single file.
  • If you are using a recovery directory that points to a shared file system on NFS version 4 and you want to change the consumers for the Spark masters to use a different execution user, you must delete the instance group folders under the recovery directory before you modify the instance group (see the example after this list).
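
As a point of reference, the following is a minimal sketch of removing instance group folders from a recovery directory before you modify the instance group; the recovery directory path is a hypothetical placeholder, and the assumption that each instance group keeps its own folders under that directory follows from the description above:

    # List the recovery directory to identify the folders that belong to the instance group
    ls /nfs/spark-recovery/
    # Remove the folders for the instance group that you are about to modify
    sudo rm -rf /nfs/spark-recovery/<instance-group-folder>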

Procedure

  1. From the cluster management console, click Workload > Instance Groups.
  2. Select the instance group to modify and click Configure. If required, click the all consumers link to filter instance groups by consumer.
  3. If prompted to update Spark version and notebook packages, click Continue to apply these updates when the instance group is modified. When updated Spark version and notebook packages are added to the cluster, those packages must be deployed for the instance group.
    Note: For updated notebook packages, the notebook is undeployed and the new version is deployed. Therefore, if you specified the notebook base data directory to be either in the same location or under the notebook's deployment directory, the base data directory is removed. To retain your data, manually back up the contents of the base data directory before you update the instance group configuration.
  4. On the Basic Settings tab, modify instance group settings as required. For a description of each field, hover over the field to view a tooltip.

    You can modify almost all settings for an instance group, including Spark version parameters, high availability, and the duration to store application monitoring data. However, you cannot change the instance group name.

    If you want to disable SSL for the instance group, change the Encryption settings for the Spark version and select Disable SSL. When this option is selected, the instance group does not use SSL for workload or Spark UIs.

    You can also add or remove a notebook and update the parameters for an existing notebook in the instance group. To modify the base notebook (including its Docker properties), add the notebook with the required changes as a new notebook at the cluster level. Then, when you modify the instance group, select that notebook as you would a new notebook and remove the one that you no longer require.

    Important: Take note of the following considerations:
    • When you change the execution user:
      • If you are not modifying the administrator user group, the new user must belong to the administrator user group of the instance group.
      • Changing the execution user of an instance group does not change the execution user for existing consumers. If the execution user for any of the consumers in the instance group is not a member of the instance group's administrator user group, keep in mind that the execution user of a consumer cannot be changed after the consumer is created. Therefore, to change the execution user of an instance group's consumers, either create consumers with the correct execution user and select them when you modify the instance group, or modify the instance group and select the option to Create a new consumer under the instance group top consumer for each consumer; this creates a new consumer with the new instance group execution user.
      • Changing the execution user or administrator group of an instance group does not automatically change the execution user or administrator group of its notebooks. If you want to change these values for notebooks as well, you must explicitly modify them in the notebook settings while configuring the instance group.
    • When you disable the history server or high availability, the old directory (along with its data) is removed. If you want to retain your data, manually move data in the old directory to the new directory before you update the instance group configuration.
  5. Optional: Click the Containers tab to modify container definitions, including enabling or disabling Docker and cgroup limits. You can modify all container settings for Spark drivers, executors, and services in the instance group.
    Note: The Containers tab is available only with certain Spark versions; it is not available with Spark 1.5.2.
  6. Optional: Click the Packages tab to modify packages that are used by the instance group.
    You can add application-specific packages that must be deployed with Spark to the instance group hosts. If the package uses the same name as an existing repository package, the package in the repository is replaced. You can add packages in several ways:
    • To add packages from the central repository, click Add packages from Repository. All packages that are registered to the instance group consumer (and its children in the consumer tree) are available for selection.
    • To add a package from your local computer and upload it to the central repository, click Upload Package.
    • To create a single-file package, deploy it as a package, and upload it to the central repository, click Create Single-File Packages.

    After packages are uploaded, you can also set optional configuration for them. Note that the optional $DEPLOY_HOME and $SPARK_HOME check box options are cleared by default for packages that are already added to the hosts in the instance group. For details on these check boxes and other optional configuration, see Adding dependent packages.

    You can also remove any packages that the instance group no longer requires. Removing a package from the instance group does not remove the package from the repository.

  7. Click the Data Connectors tab to add data connectors, enable or disable existing data connectors, or remove data connectors that the instance group no longer requires. You can also modify the configuration settings for an existing data connector and set the data connector that specifies fs.defaultFS in the Hadoop configuration file (see the example after this procedure).
  8. Click Modify Instance Group.
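
The data connector that you designate supplies the fs.defaultFS value, which identifies the default file system in the Hadoop configuration that the instance group uses. As a minimal sketch only, the fs.defaultFS property typically appears in a Hadoop configuration file (core-site.xml) in the following form; the host name and port are hypothetical placeholders that depend on your data connector settings:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://namenode.example.com:8020</value>
    </property>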

Results

Based on your changes, the Spark package (along with other packages and notebooks) for the instance group is redeployed. If updated Spark version or notebook packages are available in the cluster, the updated versions are deployed for the instance group.

If any notebooks were removed from the instance group, those notebooks (and associated packages) are undeployed from the hosts.

What to do next

  1. If you updated the configuration to point to new directories, manually copy data from the old directories to retain that data. This step is especially important for changes to the notebook base data directory, the event log directory for the history server, or the recovery directory for high availability. If high availability is not configured, metadata is stored in the location that is specified by the ELK_HARVEST_LOCATION environment variable (by default, /var/tmp/elk_logs).

    To manually move data in the old directory (with the original file permissions) to the new directory, run the following command before you start the instance group:

    sudo cp -a old-dir/*uuid* new-dir/

    Copy the data after the instance group configuration is changed but before the instance group is started; otherwise, you might lose data. For an illustration with sample directories, see the example after these steps.

  2. Start the instance group. See Starting instance groups.
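
For example, the following is a minimal sketch of the copy, assuming a hypothetical move of the history server event log directory from /data/sparkhistory-old to /data/sparkhistory-new; both paths are placeholders, and uuid stands for the UUID of your instance group:

    # Copy the instance group's event log data, preserving ownership and permissions (-a)
    sudo cp -a /data/sparkhistory-old/*uuid* /data/sparkhistory-new/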