Configuring Spark clusters for automatic scaling

After installing Analytics Engine powered by Apache Spark, you must edit the Analytics Engine powered by Apache Spark custom resource to configure Spark clusters to scale automatically.

Updating the Analytics Engine custom resource

  1. Log in to the instance as an administrator.
  2. Run the following command to open the Analytics Engine (AE) custom resource (CR) for editing so that you can enable pre-pulling of Spark images.
    oc edit ae -n {PROJECT_CPD_INSTANCE}
  3. In the spec.serviceConfig section of the CR, add the enableImagePrepull parameter, or update it if it already exists, and set its value to true.
  4. Ensure the following entry exists:
    spec:
      serviceConfig:
        enableImagePrepull: true
    Important:

    Ensure that enableImagePrepull: true is set as a boolean (without quotes).
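
    If you prefer a scripted change over an interactive oc edit session, the same setting can be expressed as a JSON merge patch. The following sketch only builds the patch and the oc patch argument list; it does not run anything. The function name build_prepull_patch and the cr_name and namespace parameters are hypothetical, and the sketch assumes that the ae resource accepts a merge patch on spec.serviceConfig.

```python
import json


def build_prepull_patch(cr_name: str, namespace: str) -> list[str]:
    """Return an `oc patch` argument list that sets enableImagePrepull to true.

    cr_name and namespace are placeholders supplied by the caller; this
    function is an illustration, not part of any product tooling.
    """
    # Python's True serializes to the JSON boolean true (no quotes),
    # which matches the requirement that the value is not a string.
    patch = {"spec": {"serviceConfig": {"enableImagePrepull": True}}}
    return [
        "oc", "patch", "ae", cr_name,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own CR name and project.
    print(" ".join(build_prepull_patch("<cr-name>", "<project>")))
```

    Building the patch from a Python dictionary, rather than typing the JSON by hand, avoids accidentally quoting the value and turning the boolean into the string "true".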

Monitoring custom resource reconciliation

After updating the Analytics Engine custom resource (CR), the pre-pull operator reconciles automatically.

Run the following command to monitor the CR status. Ensure it reaches the Completed state.
oc get ae -n {PROJECT_CPD_INSTANCE} -w
The image pre-pull is successfully enabled when the status becomes Completed.

Configuring nodes at engine or application level

When the AE custom resource (CR) reaches the Completed state, the operator pulls all required container images onto the cluster nodes and applies the following label to each node:
"spark-image-prepulled.release-v1.0.0":"true"
Add the following configuration to the Default Spark configuration section of the engine so that Analytics Engine powered by Apache Spark schedules Spark runtimes only on nodes that have the new label.
ae.kubernetes.spec.nodeSelector=<encoded-string>
<encoded-string> is the base64 encoded value for {"spark-image-prepulled.release-v1.0.0":"true"}.
To select nodes by more than one label, encode a JSON object that contains all of the labels. For example:
ae.kubernetes.spec.nodeSelector=eyJub2RlLmt1YmVybmV0ZXMuaW8vaW5zdGFuY2UtdHlwZSI6ICJTdGFuZGFyZF9MNjRzX3YzIiwgInNwYXJrIjogInYxLjAuMCJ9
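Spark runtimes are then scheduled only on nodes that carry every label in the encoded object. The sample value above decodes to {"node.kubernetes.io/instance-type": "Standard_L64s_v3", "spark": "v1.0.0"}. To require the pre-pull label together with the instance type, encode {"node.kubernetes.io/instance-type":"Standard_L64s_v3","spark-image-prepulled.release-v1.0.0":"true"} instead.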
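
The encoded string for ae.kubernetes.spec.nodeSelector can be generated, and an existing value checked, with a few lines of Python. This is a minimal sketch that uses only the standard library; the helper names encode_node_selector and decode_node_selector are hypothetical and are not part of Analytics Engine.

```python
import base64
import json


def encode_node_selector(labels: dict[str, str]) -> str:
    """Serialize the labels to JSON and return the Base64 text of its UTF-8 bytes."""
    return base64.b64encode(json.dumps(labels).encode("utf-8")).decode("ascii")


def decode_node_selector(encoded: str) -> dict[str, str]:
    """Inverse of encode_node_selector; useful for inspecting an existing value."""
    return json.loads(base64.b64decode(encoded))


if __name__ == "__main__":
    selector = {
        "node.kubernetes.io/instance-type": "Standard_L64s_v3",
        "spark-image-prepulled.release-v1.0.0": "true",
    }
    encoded = encode_node_selector(selector)
    # Round-trip check: the encoded value must decode to the same labels.
    assert decode_node_selector(encoded) == selector
    print(f"ae.kubernetes.spec.nodeSelector={encoded}")
```

Note that json.dumps writes a space after each colon and comma, so the same labels serialized without spaces produce a different Base64 string that still decodes to identical JSON. Passing an existing encoded value to decode_node_selector is a quick way to confirm which labels it actually selects before you add it to the engine configuration.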

Running Spark application

After enabling image pre-pull and configuring the node selector, you can run your Spark application. For more information about the different ways of submitting an application, see Submitting Spark application.