Setting Docker/cgroup container definitions for an instance group

Optionally, set up the instance group to run within containers, either Docker containers or control groups (cgroups).

Before you begin

  • You can set Docker/cgroup container definitions for an instance group only with certain Spark versions; Spark 1.5.2 is not supported.
  • Ensure that you meet the prerequisites for creating an instance group. See Prerequisites for an instance group.
  • If you are using Docker, see the Docker image requirements and limitations section in Docker overview.

About this task

You can enable Spark drivers and executors in an instance group to run within Docker or cgroup containers. You can also run Spark services in Docker containers.
  • With Docker, you enable the Spark drivers, executors, and services of an instance group to run in separate Docker containers. A Docker container holds everything that an instance group needs to run, including an operating system, user-added files, metadata, and other dependencies.
  • With cgroups, you enable per-resource limits (such as CPU shares and memory) for Spark drivers and executors in an instance group. You can also set resource limits for Docker containers through Spark parameters so that the containers run within those CPU and memory bounds, as illustrated in the sketch below.
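
To make the cgroup option concrete, here is a minimal sketch of how such per-resource limits appear in the host's cgroup v1 interface. The instance group creates and manages these cgroups for you, so this is illustrative only; it assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup, root access, and a hypothetical group name.

```python
# Illustrative only: the instance group manages cgroups itself.
# Assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup and root access;
# the group name "spark-executor-example" is hypothetical.
from pathlib import Path

GIB = 1024 ** 3

# Create a child cgroup under the memory controller and cap it at 1 GiB,
# the default driver/executor memory limit.
mem = Path("/sys/fs/cgroup/memory/spark-executor-example")
mem.mkdir(parents=True, exist_ok=True)
(mem / "memory.limit_in_bytes").write_text(str(1 * GIB))

# Set a relative CPU weight; the product adjusts CPU shares dynamically
# based on the number of slots allocated to the executor.
cpu = Path("/sys/fs/cgroup/cpu/spark-executor-example")
cpu.mkdir(parents=True, exist_ok=True)
(cpu / "cpu.shares").write_text("1024")
```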

Procedure

  1. In the Containers tab, specify the container settings for Spark drivers, Spark executors, or Spark services in the instance group. By default, Spark drivers, Spark executors, and Spark services run without containers.
    • To run Spark drivers, executors, or services in Docker containers, click Docker containers.
    • To run Spark drivers and executors in cgroup containers, click cgroups.
  2. If you chose to run Spark drivers, executors, or services in Docker containers, create a definition for the Docker container.
    1. Click New definition.
    2. Enter the Definition name and the Docker image name.
    3. Optional: If the Docker image that you specified is not already loaded on the hosts' Docker installation, enter a Docker registry URL. By default, the URL of the public Docker registry (https://hub.docker.com/) is used.
      Note: If your host is not connected to the Internet, specify either the URL to an internal Docker registry that all hosts can access or load the image locally on all hosts.
    4. Optional: Add one or more host-level data volumes, which are mount points for the Docker container.
      1. Click Add a data volume.
      2. Enter the host path to the directory in the host OS for use inside the container, for example: /myHostpath/path.
      3. Enter the container path to the directory that is to be mounted inside the container, for example: /container/myHostpath/path.
      4. Enter the environment variable to export from the host to the container. For example, a DIR_ENV_VAR variable could hold a directory path, such as /container/myHostpath/path, so that applications inside the container can locate the mounted directory.
      5. Click Writable to specify read/write (rw) access to the directory inside the container.
      6. Repeat these steps to mount all the directories that your applications require in the container. (For the docker run options that these fields correspond to, see the sketch at the end of this step.)
      Note: If VEMKD is enabled, the file path of the SSL authentication certificate must be mounted to Docker, and the instance group execution user must have access to that directory.
    5. Click OK.

      The definition that you created is used as the default definition for Docker containers.

    6. Repeat the steps to create as many definitions as you require. With multiple Docker definitions, you can override the default container definition that Spark drivers or executors use when you submit applications.
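
The following sketch shows the docker run options that the data volume fields in the wizard correspond to. The instance group constructs the actual container start command itself, so this is illustrative only; the paths and variable name match the examples in the steps above, and the image name is hypothetical.

```python
# Illustrative only: the instance group builds the container start command
# itself. Paths and the variable name match the examples in the steps above;
# the image name "my-spark-image" is hypothetical.
import subprocess

host_path = "/myHostpath/path"                 # Host path field
container_path = "/container/myHostpath/path"  # Container path field
env_var = "DIR_ENV_VAR"                        # Environment variable field

subprocess.run(
    [
        "docker", "run",
        "-v", f"{host_path}:{container_path}:rw",  # Writable => read/write (rw)
        "-e", f"{env_var}={container_path}",       # export the variable into the container
        "my-spark-image",
    ],
    check=True,
)
```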
  3. Optional: Change the default per-resource limits for the container (both Docker and cgroup). In the Spark version configuration, you can set memory limits for drivers and executors (1 GB each by default). CPU shares are adjusted dynamically according to the number of slots that are allocated to the executor.
    1. Click the Basic Settings tab in the wizard.
    2. Click the Configuration link for the Spark version.
    3. To set memory limits for drivers and executors in the container, set the spark.driver.memory and spark.executor.memory parameters (1 GB each by default). A sketch of these parameters follows this procedure.
    4. Click Save.
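
As a reference for the two parameters, here is how the same memory limits might look if set programmatically through PySpark. This is a sketch with example values; in this task, you set the limits through the Configuration link in the cluster management console, not in application code.

```python
# Sketch: parameter names with example values. In this task, the limits are
# set through the cluster management console, not in application code.
from pyspark import SparkConf

conf = (
    SparkConf()
    .set("spark.driver.memory", "2g")    # memory limit for the driver container (default: 1g)
    .set("spark.executor.memory", "2g")  # memory limit for each executor container (default: 1g)
)
```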

Results

The instance group is set up to run in Docker or cgroup containers.

What to do next

  1. Optionally, to add any extra packages that the instance group requires, see Adding dependent packages. To add data connectors to the instance group, see Adding data connectors.
  2. Create and deploy the instance group.
    • Click Create and Deploy Instance Group to create the instance group and deploy its packages simultaneously. In this case, the new instance group appears on the Instance Groups page in the Ready state. Verify your deployment and then start the instance group.
    • Click Create Only to create the instance group but manually deploy its packages later. In this case, the new instance group appears on the Instance Groups page in the Registered state. When you are ready to deploy packages, deploy the instance group and verify the deployment. Then, start the instance group.