Configuring Spark clusters for automatic scaling
After installing Analytics Engine powered by Apache Spark, you must edit the Analytics
Engine powered by Apache Spark custom resource to configure Spark clusters to scale
automatically.
Updating the Analytics Engine custom resource
- Log in to the instance as an administrator.
- Run the following command to edit the AE (Analytics Engine) custom resource (CR) so that
  Spark pre-pulls Spark images:
      oc edit ae -n {PROJECT_CPD_INSTANCE}
- Add the enableImagePrepull parameter, or update it if it already exists, with the value true
  under spec.serviceConfig in the CR.
- Ensure the following entry exists:
      spec:
        serviceConfig:
          enableImagePrepull: true
  Important: Ensure that enableImagePrepull: true is set as a boolean (without quotes).
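The boolean requirement is easy to get wrong when the change is scripted rather than typed into the editor. As a minimal sketch, the following snippet builds a JSON merge patch body for the same spec.serviceConfig.enableImagePrepull field and rejects a quoted "true". The helper name build_prepull_patch is illustrative and not part of the product, and applying the result with oc patch is an assumed alternative to the interactive oc edit step.

```python
import json


def build_prepull_patch(enabled=True):
    """Build a JSON merge patch body that sets spec.serviceConfig.enableImagePrepull."""
    # Reject the string "true": the CR expects a real boolean, not a quoted value.
    if not isinstance(enabled, bool):
        raise TypeError("enableImagePrepull must be a boolean, not " + type(enabled).__name__)
    patch = {"spec": {"serviceConfig": {"enableImagePrepull": enabled}}}
    # json.dumps renders Python True as the unquoted JSON literal true.
    return json.dumps(patch, separators=(",", ":"))


patch_body = build_prepull_patch(True)
print(patch_body)  # {"spec":{"serviceConfig":{"enableImagePrepull":true}}}
```

The printed body could then be passed to oc patch ae <name> -n {PROJECT_CPD_INSTANCE} --type merge -p '<body>', where <name> is a placeholder for the name of the Analytics Engine custom resource in your project.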
Monitoring custom resource reconciliation
After updating the Analytics Engine custom resource (CR), the pre-pull operator reconciles automatically.
Run the following command to monitor the CR status. Ensure it reaches the Completed
state.
    oc get ae -n {PROJECT_CPD_INSTANCE} -w
The image pre-pull is successfully enabled when the status becomes Completed.
Configuring nodes at engine or application level
When the AE Custom Resource (CR) reaches the Completed state, the
operator pulls all required container images onto the cluster nodes and labels each node with the
following label:
    "spark-image-prepulled.release-v1.0.0":"true"
Add the following configuration to the engine's Default Spark configuration section to direct
Analytics Engine powered by Apache Spark to schedule Spark runtimes only on nodes that have the
new label.
    ae.kubernetes.spec.nodeSelector=<encoded-string>
<encoded-string> is the base64-encoded value of
{"spark-image-prepulled.release-v1.0.0":"true"}.
To select nodes by more than one label, encode a JSON object that contains all of the labels.
For example:
    ae.kubernetes.spec.nodeSelector=eyJub2RlLmt1YmVybmV0ZXMuaW8vaW5zdGFuY2UtdHlwZSI6ICJTdGFuZGFyZF9MNjRzX3YzIiwgInNwYXJrIjogInYxLjAuMCJ9
This sample value is the base64 encoding of
{"node.kubernetes.io/instance-type": "Standard_L64s_v3", "spark": "v1.0.0"}.
To schedule Spark runtimes on nodes with the labels
{"node.kubernetes.io/instance-type":"Standard_L64s_v3","spark-image-prepulled.release-v1.0.0":"true"},
encode that JSON object in the same way.
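Producing the encoded string by hand is error-prone, so it can help to generate it and to decode any existing value before use. The following is a minimal sketch that uses only the Python standard library. The function names are illustrative, and it assumes the service parses the decoded value as ordinary JSON, so whitespace between keys and values is not significant.

```python
import base64
import json


def encode_node_selector(labels):
    """Serialize a label map as JSON and return its base64 encoding."""
    # Compact separators keep the encoded value short; this assumes the
    # service parses the decoded text as ordinary JSON (whitespace-insensitive).
    payload = json.dumps(labels, separators=(",", ":"))
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_node_selector(encoded):
    """Decode a nodeSelector value back into a label map for inspection."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Single label: only nodes that hold the pre-pulled images.
prepull_label = {"spark-image-prepulled.release-v1.0.0": "true"}
config_line = "ae.kubernetes.spec.nodeSelector=" + encode_node_selector(prepull_label)
print(config_line)

# Several labels go into the same JSON object before encoding.
combined = {
    "node.kubernetes.io/instance-type": "Standard_L64s_v3",
    "spark-image-prepulled.release-v1.0.0": "true",
}
print("ae.kubernetes.spec.nodeSelector=" + encode_node_selector(combined))

# Decode the sample value from the text to inspect the labels it carries.
sample = "eyJub2RlLmt1YmVybmV0ZXMuaW8vaW5zdGFuY2UtdHlwZSI6ICJTdGFuZGFyZF9MNjRzX3YzIiwgInNwYXJrIjogInYxLjAuMCJ9"
print(decode_node_selector(sample))
```

Decoding a value before pasting it into the Default Spark configuration is a quick way to confirm that it carries exactly the labels your nodes have.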
Running Spark application
After enabling image pre-pull and configuring the nodes, you can run your Spark application. For more information about the different ways to submit an application, see Submitting Spark application.