Managing service and instance level configurations for Analytics Engine Powered by Apache Spark

After installing the Analytics Engine Powered by Apache Spark service, the IBM Cloud Pak for Data administrator can configure certain service level defaults by editing or adding properties in the Analytics Engine custom resource, for example to update the maximum CPU core and memory values for the Spark driver and executors, or to change the maximum number of workers that a Spark application can request.

When a Spark application job is submitted, a Spark runtime is started and each Spark worker runs in one executor. Thus the number of executors must always match the number of workers requested in the Spark application payload.
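For example, a payload that requests two workers produces a runtime with two executors. The following sketch is illustrative only; the field names are assumptions, and the exact payload depends on the Spark jobs API version in your deployment:

{
  "engine": {
    "type": "spark",
    "size": {
      "num_workers": 2,
      "worker_size": { "cpu": 1, "memory": "4g" },
      "driver_size": { "cpu": 1, "memory": "4g" }
    }
  },
  "application": "/myapp/wordcount.py"
}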

To view the current service level configurations, use the following command:

kubectl get configmap spark-hb-resource-limit -o yaml -n ${PROJECT_CPD_INSTANCE}

Expected output:

apiVersion: v1
data:
  resource-limit-properties: |-
    max_driver_cpu_cores=5
    max_executor_cpu_cores=5
    max_driver_memory=40g
    max_executor_memory=40g
    max_num_workers=50
    default_instance_cpu_quota=20
    default_instance_memory_quota=80
kind: ConfigMap
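
To print only the properties block, you can use a jsonpath query (standard kubectl behavior):

kubectl get configmap spark-hb-resource-limit -n ${PROJECT_CPD_INSTANCE} \
  -o jsonpath="{.data['resource-limit-properties']}"

Memory limits such as 40g use Spark's size notation; the CPU values are core counts.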

Updating service level configurations

To change the service level default configurations:

  1. Log in to the Cloud Pak for Data cluster.

  2. Update the relevant properties in the Analytics Engine custom resource (CR) YAML file that was used to set up Analytics Engine Powered by Apache Spark (see Additional installation options and the sketch that follows these steps). Then apply the changes to the deployed CR using the following command:

    oc apply -f analyticsengine-cr.yaml -n ${PROJECT_CPD_INSTANCE}
    
  3. Wait for the Analytics Engine CR to be in Completed state:

    oc get analyticsengine -n ${PROJECT_CPD_INSTANCE}
    

The configuration changes take effect in a few minutes. All Spark applications that are subsequently submitted by a user will use the changed configuration values.
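
For example, the service level limits might appear in the CR spec as in the following sketch. The property names are those shown in the spark-hb-resource-limit ConfigMap; the nesting under spec (serviceConfig here) is a placeholder assumption, so verify the exact schema in your deployed CR before editing:

spec:
  serviceConfig:              # placeholder nesting; verify against your deployed CR
    max_driver_cpu_cores: 5
    max_executor_cpu_cores: 5
    max_driver_memory: 40g
    max_executor_memory: 40g
    max_num_workers: 50

To follow the reconciliation as it happens, you can add the -w (watch) flag to the oc get analyticsengine command.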

Updating service instance level configurations

When a project administrator creates an Analytics Engine Powered by Apache Spark instance, the default resource quota for CPU and memory usage applies to each instance.
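
For example, with the defaults shown earlier, default_instance_cpu_quota=20 and default_instance_memory_quota=80 mean that each newly created instance can use up to 20 CPU cores and 80 of memory (the unit is not shown in the ConfigMap; presumably gigabytes).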

Although the project administrator can change the resource quota for an individual instance at the time the instance is created, the default quota values themselves can be changed at the service level only by the IBM Cloud Pak for Data administrator, and the changed defaults apply to all subsequently created instances.

To change default_instance_cpu_quota and default_instance_memory_quota at the service instance level:

  1. Log in to the Cloud Pak for Data cluster.

  2. Update the default_instance_cpu_quota and default_instance_memory_quota properties in the Analytics Engine CR YAML file that was used to set up Analytics Engine Powered by Apache Spark (see Additional installation options). Then apply the changes to the deployed CR using the following command:

    oc apply -f analyticsengine-cr.yaml -n ${PROJECT_CPD_INSTANCE}
    
  3. Wait for the Analytics Engine CR to be in Completed state:

    oc get analyticsengine -n ${PROJECT_CPD_INSTANCE}
    

    The Analytics Engine Powered by Apache Spark instances that are subsequently created by project administrators will use the changed configuration values.
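
Assuming the operator propagates the CR change to the spark-hb-resource-limit ConfigMap shown earlier, you can confirm the new defaults by rereading it:

kubectl get configmap spark-hb-resource-limit -o yaml -n ${PROJECT_CPD_INSTANCE}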

Parent topic: Administering Analytics Engine Powered by Apache Spark