Specifying additional configurations for Analytics Engine powered by Apache Spark

An instance administrator can specify additional configurations for Analytics Engine powered by Apache Spark on IBM Cloud Pak for Data.

You can set additional configurations, beyond the default ones, as part of the post-installation steps.

Service-level configurations

The following specifications are optional and can be altered to change the service-level configurations:

serviceConfig:
  schedulerForQuotaAndQueuing: "ibm-cpd-scheduler"
  sparkAdvEnabled: true                  
  jobAutoDeleteEnabled: true             
  kernelCullTime: 30                     
  imagePullCompletions: 20               
  imagePullParallelism: "40"             
  kernelCleanupSchedule: "*/30 * * * *"  
  jobCleanupSchedule: "*/30 * * * *"     
  skipSELinuxRelabeling: false           
  mountCustomizationsFromCCHome: false   
schedulerForQuotaAndQueuing
  Set the scheduler for the resource quota and queuing features.
  Valid values: ibm-cpd-scheduler (the supported scheduler).

analyticsengine_spark_adv_enabled
  Specify whether to display the job UI.
  Default value: true
  Valid values:
    false - Do not display the job UI.
    true - Display the job UI.

analyticsengine_job_auto_delete_enabled
  Specify whether to automatically delete jobs after they reach a terminal state, such as FINISHED or FAILED.
  Default value: true
  Valid values:
    true - Delete jobs after they reach a terminal state.
    false - Retain jobs after they reach a terminal state.

analyticsengine_kernel_cull_time
  The amount of time, in minutes, that idle kernels are kept before they are removed.
  Default value: 30
  Valid values: An integer greater than 0.

analyticsengine_image_pull_parallelism
  The number of pods that are scheduled to pull the Spark image in parallel.
  For example, if you have 100 nodes in the cluster, set:
    • analyticsengine_image_pull_completions: "100"
    • analyticsengine_image_pull_parallelism: "150"
  In this example, at least 100 nodes pull the image successfully, with 150 pods pulling the image in parallel.
  Default value: "40"
  Valid values: An integer greater than or equal to 1. Increase this value only if you have a very large cluster and sufficient network bandwidth and disk I/O to support more pulls in parallel.

analyticsengine_image_pull_completions
  The number of pods that must complete for the image pull job to be considered complete.
  For example, if you have 100 nodes in the cluster, set:
    • analyticsengine_image_pull_completions: "100"
    • analyticsengine_image_pull_parallelism: "150"
  In this example, at least 100 nodes pull the image successfully, with 150 pods pulling the image in parallel.
  Default value: "20"
  Valid values: An integer greater than or equal to 1. Increase this value only if you have a very large cluster and sufficient network bandwidth and disk I/O to support more pulls in parallel.

analyticsengine_kernel_cleanup_schedule
  Override the analyticsengine_kernel_cull_time setting for the kernel cleanup CronJob. By default, the kernel cleanup CronJob runs every 30 minutes.
  Default value: "*/30 * * * *"
  Valid values: A string that uses the CronJob schedule syntax.

analyticsengine_job_cleanup_schedule
  Override the analyticsengine_kernel_cull_time setting for the job cleanup CronJob. By default, the job cleanup CronJob runs every 30 minutes.
  Default value: "*/30 * * * *"
  Valid values: A string that uses the CronJob schedule syntax.

analyticsengine_skip_selinux_relabeling
  Specify whether to skip the SELinux relabeling. To use this feature, you must create the required MachineConfig and RuntimeClass definitions. For more information, see Enabling MachineConfig and RuntimeClass definitions for certain properties.
  Default value: false
  Valid values:
    false - Do not skip the SELinux relabeling.
    true - Skip the SELinux relabeling.

analyticsengine_mount_customizations_from_cchome
  Specify whether you want to enable custom drivers. These drivers must be mounted from the cc-home-pvc directory. This feature is available only when the Cloud Pak for Data common core services are installed.
  Default value: false
  Valid values:
    false - You do not want to use custom drivers.
    true - You want to enable custom drivers.
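
One way to change these service-level settings after installation is to patch the Analytics Engine powered by Apache Spark custom resource (CR). The following is only a sketch: it assumes the CR is named analyticsengine-sample (as in the patch example later on this page), that <cpd_instance_ns> is the project where the service is installed, and it uses the property names from the serviceConfig example at the beginning of this section:

oc patch ae/analyticsengine-sample -n <cpd_instance_ns> --type=merge -p '{"spec": {"serviceConfig": {"kernelCullTime": 60, "jobCleanupSchedule": "0 */1 * * *"}}}'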

Runtime-level configurations

The following specifications are optional and can be altered to change the Analytics Engine powered by Apache Spark runtime-level configurations:

sparkRuntimeConfig:                   
  maxDriverCpuCores: 5                         # If you want to create Spark jobs with more than 5 driver CPUs, set this value accordingly
  maxExecutorCpuCores: 5                       # If you want to create Spark jobs with more than 5 CPUs per executor, set this value accordingly
  maxDriverMemory: "50g"                       # If you want to create Spark jobs with more than 50g of driver memory, set this value accordingly
  maxExecutorMemory: "50g"                     # If you want to create Spark jobs with more than 50g of memory per executor, set this value accordingly
  maxNumWorkers: 50                            # If you want to create Spark jobs with more than 50 workers/executors, set this value accordingly
  localDirScaleFactor: 10                      # If you want to increase local disk space for your Spark jobs, set this value accordingly.
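
For example, to raise these runtime limits after installation, you can patch the same custom resource. A minimal sketch, assuming the CR name analyticsengine-sample and the namespace placeholder <cpd_instance_ns> used elsewhere on this page:

oc patch ae/analyticsengine-sample -n <cpd_instance_ns> --type=merge -p '{"spec": {"sparkRuntimeConfig": {"maxDriverMemory": "100g", "maxNumWorkers": 100}}}'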

Service instance-level configurations

The following specifications are optional and can be altered to change the service instance-level configurations. Each Analytics Engine powered by Apache Spark service instance has a resource quota (CPU/memory) set by default. The quota can be changed for an individual instance through the API, but to change the default values that apply to any new instance, update the following values.

serviceInstanceConfig:                   
  defaultCpuQuota: 20                       # defaultCpuQuota is the cumulative CPU consumption allowed for Spark jobs created under an instance. By default, it can be no more than 20 CPUs.
  defaultMemoryQuota: 80                    # defaultMemoryQuota is the cumulative memory consumption allowed for Spark jobs created under an instance. By default, it can be no more than 80 gigabytes.
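
As with the other sections of the CR, these defaults can be changed by patching the custom resource. A minimal sketch, again assuming the CR name analyticsengine-sample and the namespace placeholder <cpd_instance_ns> used elsewhere on this page:

oc patch ae/analyticsengine-sample -n <cpd_instance_ns> --type=merge -p '{"spec": {"serviceInstanceConfig": {"defaultCpuQuota": 40, "defaultMemoryQuota": 160}}}'
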
The following configurations apply to jobs and kernels at the Analytics Engine powered by Apache Spark service level. The service administrator can specify configurations at this level that apply to all Spark jobs and kernels submitted against the service. To do so, specify the configurations under sparkDefaults in the spec section of the Analytics Engine powered by Apache Spark custom resource (CR):

    sparkDefaults:
      mutableConfigs:
        spark.ui.requestheadersize: 12k
      immutableConfigs:
        ae.kernel.idle_timeout: 1000
mutableConfigs
Configuration parameters under this section are specified by an administrator and can be overridden for a specific job or kernel.
immutableConfigs
Configuration parameters under this section are specified by an administrator and cannot be overridden for a specific job or kernel.
Alternatively, you can patch the CR after it is created by running:
oc patch ae/analyticsengine-sample -p '{"spec": {"sparkDefaults":{"mutableConfigs":{"spark.ui.requestheadersize": "12k"}, "immutableConfigs":{"ae.kernel.idle_timeout":"1000"}}}}' --type=merge -n <cpd_instance_ns>
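To confirm that the settings were applied, you can read them back from the CR. A minimal check, assuming the same CR name and namespace:
oc get ae/analyticsengine-sample -n <cpd_instance_ns> -o jsonpath='{.spec.sparkDefaults}'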
Table 1. Analytics Engine powered by Apache Spark custom resource description

spec.scaleConfig
  Description: The scale of the service. Possible values: Small, Medium, or Large.
  Default: Small
  Type: String (choice parameter)
  Specification for .yml files: N/A

spec.serviceConfig
  Description: Change service-level configurations.
  Type: Object
  Specification for .yml files: N/A

spec.serviceConfig.schedulerForQuotaAndQueuing
  Description: Set the scheduler for the resource quota and queuing features. The supported scheduler is ibm-cpd-scheduler.
  Type: String
  Specification for .yml files: N/A

spec.serviceConfig.sparkAdvEnabled
  Description: This flag enables or disables the job UI capabilities of Analytics Engine powered by Apache Spark.
  Default: False
  Type: Boolean
  Specification for .yml files: analyticsengine_spark_adv_enabled

spec.serviceConfig.jobAutoDeleteEnabled
  Description: Set to false if you do not want to remove jobs after they reach a terminal state, such as FINISHED or FAILED.
  Default: True
  Type: Boolean
  Specification for .yml files: analyticsengine_job_auto_delete_enabled

spec.serviceConfig.kernelCullTime
  Description: The number of minutes after which an idle kernel is removed.
  Default: 30
  Type: Integer
  Specification for .yml files: analyticsengine_kernel_cull_time

spec.serviceConfig.imagePullCompletions
  Description: If you have a large OpenShift cluster, update imagePullCompletions and imagePullParallelism accordingly.
  Default: 20
  Type: Integer
  Specification for .yml files: analyticsengine_image_pull_completions

spec.serviceConfig.imagePullParallelism
  Description: If you have 100 nodes in the cluster, set imagePullCompletions: "100" and imagePullParallelism: "150".
  Default: 40
  Type: Integer
  Specification for .yml files: analyticsengine_image_pull_parallelism

spec.serviceConfig.kernelCleanupSchedule
  Description: By default, the kernel and job cleanup CronJobs look for idle Spark kernels and jobs based on the kernelCullTime parameter. If you want a more or less aggressive cleanup, change the value accordingly. For example, "0 */1 * * *" (Kubernetes CronJob format) runs the cleanup every hour.
  Default: "*/30 * * * *"
  Type: String
  Specification for .yml files: analyticsengine_kernel_cleanup_schedule

spec.serviceConfig.jobCleanupSchedule
  Description: By default, the kernel and job cleanup CronJobs look for idle Spark kernels and jobs based on the kernelCullTime parameter. If you want a more or less aggressive cleanup, change the value accordingly. For example, "0 */1 * * *" (Kubernetes CronJob format) runs the cleanup every hour.
  Default: "*/30 * * * *"
  Type: String
  Specification for .yml files: analyticsengine_job_cleanup_schedule

spec.serviceConfig.skipSELinuxRelabeling
  Description: Set the value to true to skip SELinux relabeling.
  Default: False
  Prerequisite: Create the required MachineConfig and RuntimeClass definitions. See Enabling MachineConfig and RuntimeClass definitions for certain properties.
  Type: Boolean
  Specification for .yml files: analyticsengine_skip_selinux_relabeling

spec.serviceConfig.mountCustomizationsFromCCHome
  Description: Set the value to true to mount customizations from CCHome.
  Default: False
  Prerequisite: The Cloud Pak for Data common core services must be installed.
  Type: Boolean
  Specification for .yml files: analyticsengine_mount_customizations_from_cchome

spec.sparkRuntimeConfig
  Description: Change Spark runtime-level configurations.
  Type: Object
  Specification for .yml files: N/A

spec.sparkRuntimeConfig.maxDriverCpuCores
  Description: Maximum number of driver CPUs.
  Default: 5
  Type: Integer
  Specification for .yml files: analyticsengine_max_driver_cpu_cores

spec.sparkRuntimeConfig.maxExecutorCpuCores
  Description: Maximum number of executor CPUs.
  Default: 5
  Type: Integer
  Specification for .yml files: analyticsengine_max_executor_cpu_cores

spec.sparkRuntimeConfig.maxDriverMemory
  Description: Maximum driver memory, in gigabytes.
  Default: 50g
  Type: String
  Specification for .yml files: analyticsengine_max_driver_memory

spec.sparkRuntimeConfig.maxExecutorMemory
  Description: Maximum executor memory, in gigabytes.
  Default: 50g
  Type: String
  Specification for .yml files: analyticsengine_max_executor_memory

spec.sparkRuntimeConfig.maxNumWorkers
  Description: Maximum number of workers (executors).
  Default: 50
  Type: Integer
  Specification for .yml files: analyticsengine_max_num_workers

spec.sparkRuntimeConfig.localDirScaleFactor
  Description: The temporary disk size in the Spark master and workers is a factor of the number of CPUs: temp_disk_space = numCpu * localDirScaleFactor.
  Default: 10
  Type: Integer
  Specification for .yml files: analyticsengine_local_dir_scale_factor

spec.serviceInstanceConfig
  Description: Service instance-level configurations. Each Analytics Engine powered by Apache Spark service instance has a resource quota (CPU/memory) set by default. The quota can be changed for an existing instance through the API, but to change the default values for any new instance, update serviceInstanceConfig.
  Type: Object
  Specification for .yml files: N/A

spec.serviceInstanceConfig.defaultCpuQuota
  Description: The cumulative CPU consumption allowed for Spark jobs created under an instance.
  Default: 20
  Type: Integer
  Specification for .yml files: analyticsengine_default_cpu_quota

spec.serviceInstanceConfig.defaultMemoryQuota
  Description: The cumulative memory consumption, in gigabytes, allowed for Spark jobs created under an instance.
  Default: 80
  Type: String
  Specification for .yml files: analyticsengine_default_memory_quota

spec.sparkDefaults.mutableConfigs
  Description: Custom configurations for Spark runtimes that can be overridden for a specific job or kernel.
  Type: Object
  Specification for .yml files: N/A

spec.sparkDefaults.immutableConfigs
  Description: Custom configurations for Spark runtimes that cannot be overridden for a specific job or kernel.
  Type: Object
  Specification for .yml files: N/A

Instance-level immutable configurations

These configurations can be set by instance-level administrators and cannot be overridden by a specific job or kernel.

However, configurations at the service-level can override instance-level configurations.

The following cURL commands can be used to GET, PUT and PATCH instance-level immutable configurations:
PUT API
curl -k -X PUT <cpd-route>/v4/analytics_engines/<instance_id>/immutable_configs  -H "Authorization: Bearer $TOKEN" --data-raw '{"ae.kubernetes.spec.nodeSelector":"bXlub2RlOiBzcGFyawo="}'
GET API
curl -k -X GET <cpd-route>/v4/analytics_engines/<instance_id>/immutable_configs  -H "Authorization: Bearer $TOKEN"
Response:
{
  "ae.kubernetes.spec.nodeSelector": "bXlub2RlOiBzcGFyawo="
}
PATCH API
curl -k -X PATCH <cpd-route>/v4/analytics_engines/<instance_id>/immutable_configs  -H "Authorization: Bearer $TOKEN" --data-raw '{"ae.kubernetes.spec.nodeSelector":"bXlub2RlOiBzcGFyawo="}'
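
In these examples, the value of ae.kubernetes.spec.nodeSelector is a base64-encoded YAML snippet: bXlub2RlOiBzcGFyawo= decodes to mynode: spark. A minimal sketch of how you might prepare and verify such a value on the command line (the node label mynode: spark is only an illustration):

echo 'mynode: spark' | base64
echo 'bXlub2RlOiBzcGFyawo=' | base64 --decode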

Enabling MachineConfig and RuntimeClass definitions for certain properties

For some of the properties in Table 1, you must enable SELinux with a MachineConfig definition on the cluster and create a RuntimeClass.
  1. Run the following command to create the MachineConfig (the decoded contents of the base64-encoded file are shown after this procedure):
    cat << EOF | oc apply -f -
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"machineconfiguration.openshift.io/v1","kind":"MachineConfig","metadata":{"annotations":{},"labels":{"machineconfiguration.openshift.io/role":"worker"},"name":"99-worker-selinux-configuration"},"spec":{"config":{"ignition":{"version":"3.2.0"},"storage":{"files":[{"contents":{"source":"data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS5ydW50aW1lcy5zZWxpbnV4XQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsiaW8ua3ViZXJuZXRlcy5jcmktby5UcnlTa2lwVm9sdW1lU0VMaW51eExhYmVsIl0K"},"mode":416,"overwrite":true,"path":"/etc/crio/crio.conf.d/01-selinux.conf"}]}},"osImageURL":""}}
      generation: 1
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-selinux-configuration
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS5ydW50aW1lcy5zZWxpbnV4XQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsiaW8ua3ViZXJuZXRlcy5jcmktby5UcnlTa2lwVm9sdW1lU0VMaW51eExhYmVsIl0K
            mode: 416
            overwrite: true
            path: /etc/crio/crio.conf.d/01-selinux.conf
      osImageURL: ""
    EOF
  2. Create the RuntimeClass definition:
    cat << EOF | oc apply -f -
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: selinux
    handler: selinux
    EOF
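
For reference, the base64-encoded file content in the MachineConfig in step 1 decodes to the following CRI-O drop-in configuration, which registers a selinux runtime that allows the io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel annotation (the file mode 416 is the decimal form of octal 0640):

    [crio.runtime.runtimes.selinux]
    runtime_path = "/usr/bin/runc"
    runtime_root = "/run/runc"
    runtime_type = "oci"
    allowed_annotations = ["io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel"]

The RuntimeClass in step 2 refers to this runtime through its handler field.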

What to do next

Complete the following tasks in order before users can access the service:

  1. An instance administrator can set the scale of the service to adjust the number of available pods. See Scaling services.
  2. Before you can submit Spark jobs by using the Spark jobs API, you must provision a service instance. See Provisioning the service instance.
  3. The service is ready to use. See Spark environments.