Specifying additional configurations for Analytics Engine powered by Apache Spark

An instance administrator can specify additional configurations for Analytics Engine powered by Apache Spark on IBM Cloud Pak for Data.

You can set additional configurations, beyond the default ones, as part of the post-installation steps.

Service-level configurations

The following specifications are optional and can be altered to change the service-level configurations:

serviceConfig:
  schedulerForQuotaAndQueuing: "ibm-cpd-scheduler"
  sparkAdvEnabled: true                  
  jobAutoDeleteEnabled: true             
  kernelCullTime: 30                     
  imagePullCompletions: 20               
  imagePullParallelism: "40"             
  kernelCleanupSchedule: "*/30 * * * *"  
  jobCleanupSchedule: "*/30 * * * *"     
  skipSELinuxRelabeling: false           
  mountCustomizationsFromCCHome: false   
schedulerForQuotaAndQueuing
  Set the scheduler for the resource quota and queuing features.
  Valid values: ibm-cpd-scheduler (the supported scheduler).

analyticsengine_spark_adv_enabled
  Specify whether to display the job UI.
  Default value: true
  Valid values:
    false - Do not display the job UI.
    true - Display the job UI.

analyticsengine_job_auto_delete_enabled
  Specify whether to automatically delete jobs after they reach a terminal state, such as FINISHED or FAILED.
  Default value: true
  Valid values:
    true - Delete jobs after they reach a terminal state.
    false - Retain jobs after they reach a terminal state.

analyticsengine_kernel_cull_time
  The amount of time, in minutes, that idle kernels are kept before they are removed.
  Default value: 30
  Valid values: An integer greater than 0.

analyticsengine_image_pull_parallelism
  The number of pods that are scheduled to pull the Spark image in parallel.
  For example, if you have 100 nodes in the cluster, set:
    • analyticsengine_image_pull_completions: "100"
    • analyticsengine_image_pull_parallelism: "150"
  In this example, at least 100 nodes pull the image successfully, with 150 pods pulling the image in parallel.
  Default value: "40"
  Valid values: An integer greater than or equal to 1. Increase this value only if you have a very large cluster and sufficient network bandwidth and disk I/O to support more pulls in parallel.

analyticsengine_image_pull_completions
  The number of pods that must complete for the image pull job to be considered complete.
  For example, if you have 100 nodes in the cluster, set:
    • analyticsengine_image_pull_completions: "100"
    • analyticsengine_image_pull_parallelism: "150"
  In this example, at least 100 nodes pull the image successfully, with 150 pods pulling the image in parallel.
  Default value: "20"
  Valid values: An integer greater than or equal to 1. Increase this value only if you have a very large cluster and sufficient network bandwidth and disk I/O to support more pulls in parallel.

analyticsengine_kernel_cleanup_schedule
  Override the analyticsengine_kernel_cull_time setting for the kernel cleanup CronJob. By default, the kernel cleanup CronJob runs every 30 minutes.
  Default value: "*/30 * * * *"
  Valid values: A string that uses the CronJob schedule syntax.

analyticsengine_job_cleanup_schedule
  Override the analyticsengine_kernel_cull_time setting for the job cleanup CronJob. By default, the job cleanup CronJob runs every 30 minutes.
  Default value: "*/30 * * * *"
  Valid values: A string that uses the CronJob schedule syntax.

analyticsengine_skip_selinux_relabeling
  Specify whether to skip the SELinux relabeling. To use this feature, you must create the required MachineConfig and RuntimeClass definitions. For more information, see Enabling MachineConfig and RuntimeClass definitions for certain properties.
  Default value: false
  Valid values:
    false - Do not skip the SELinux relabeling.
    true - Skip the SELinux relabeling.

analyticsengine_mount_customizations_from_cchome
  Specify whether you want to enable custom drivers. These drivers must be mounted from the cc-home-pvc directory. This feature is available only when the Cloud Pak for Data common core services are installed.
  Default value: false
  Valid values:
    false - You do not want to use custom drivers.
    true - You want to enable custom drivers.
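
One way to change these service-level settings after installation is to patch the Analytics Engine powered by Apache Spark custom resource (CR). The following is only a sketch: it assumes the CR is named analyticsengine-sample (as in the patch example later on this page), that <cpd_instance_ns> is the project where the service is installed, and it uses the property names from the serviceConfig example at the beginning of this section:

oc patch ae/analyticsengine-sample -n <cpd_instance_ns> --type=merge -p '{"spec": {"serviceConfig": {"kernelCullTime": 60, "jobCleanupSchedule": "0 */1 * * *"}}}'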

Runtime-level configurations

The following specifications are optional and can be altered to change the Analytics Engine powered by Apache Spark runtime-level configurations:

sparkRuntimeConfig:                   
  maxDriverCpuCores: 5                         # If you want to create Spark jobs with more than 5 driver CPUs, set this value accordingly
  maxExecutorCpuCores: 5                       # If you want to create Spark jobs with more than 5 CPUs per executor, set this value accordingly
  maxDriverMemory: "50g"                       # If you want to create Spark jobs with more than 50g of driver memory, set this value accordingly
  maxExecutorMemory: "50g"                     # If you want to create Spark jobs with more than 50g of memory per executor, set this value accordingly
  maxNumWorkers: 50                            # If you want to create Spark jobs with more than 50 workers/executors, set this value accordingly
  localDirScaleFactor: 10                      # If you want to increase local disk space for your Spark jobs, set this value accordingly.
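
For example, to raise these runtime limits after installation, you can patch the same custom resource. A minimal sketch, assuming the CR name analyticsengine-sample and the namespace placeholder <cpd_instance_ns> used elsewhere on this page:

oc patch ae/analyticsengine-sample -n <cpd_instance_ns> --type=merge -p '{"spec": {"sparkRuntimeConfig": {"maxDriverMemory": "100g", "maxNumWorkers": 100}}}'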

Service instance-level configurations

The following specifications are optional and can be altered to change the service instance-level configurations. Each Analytics Engine powered by Apache Spark service instance has a resource quota (CPU/memory) set by default. The quota can be changed for an individual instance through the API, but to change the default values that apply to any new instance, update the following values.

serviceInstanceConfig:                   
  defaultCpuQuota: 20                       # defaultCpuQuota is the cumulative CPU consumption allowed for Spark jobs created under an instance. By default, it can be no more than 20 CPUs.
  defaultMemoryQuota: 80                    # defaultMemoryQuota is the cumulative memory consumption allowed for Spark jobs created under an instance. By default, it can be no more than 80 gigabytes.
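
As with the other sections of the CR, these defaults can be changed by patching the custom resource. A minimal sketch, again assuming the CR name analyticsengine-sample and the namespace placeholder <cpd_instance_ns> used elsewhere on this page:

oc patch ae/analyticsengine-sample -n <cpd_instance_ns> --type=merge -p '{"spec": {"serviceInstanceConfig": {"defaultCpuQuota": 40, "defaultMemoryQuota": 160}}}'
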
The following configurations apply to jobs and kernels at the Analytics Engine powered by Apache Spark service level. The service administrator can specify configurations at this level that apply to all Spark jobs and kernels submitted against the service. To do so, specify the configurations under sparkDefaults in the spec section of the Analytics Engine powered by Apache Spark custom resource (CR):

    sparkDefaults:
      mutableConfigs:
        spark.ui.requestheadersize: 12k
      immutableConfigs:
        ae.kernel.idle_timeout: 1000
mutableConfigs
Configuration parameters under this section are specified by an administrator and can be overridden for a specific job or kernel.
immutableConfigs
Configuration parameters under this section are specified by an administrator and cannot be overridden for a specific job or kernel.
Alternatively, you can patch the CR after it is created by running:
oc patch ae/analyticsengine-sample -p '{"spec": {"sparkDefaults":{"mutableConfigs":{"spark.ui.requestheadersize": "12k"}, "immutableConfigs":{"ae.kernel.idle_timeout":"1000"}}}}' --type=merge -n <cpd_instance_ns>
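To confirm that the settings were applied, you can read them back from the CR. A minimal check, assuming the same CR name and namespace:
oc get ae/analyticsengine-sample -n <cpd_instance_ns> -o jsonpath='{.spec.sparkDefaults}'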
Table 1. Analytics Engine powered by Apache Spark custom resource description

spec.scaleConfig
  Description: The scale of the service. Possible values: Small, Medium, or Large.
  Default: Small
  Type: String (choice parameter)
  Specification for .yml files: N/A

spec.serviceConfig
  Description: Change service-level configurations.
  Type: Object
  Specification for .yml files: N/A

spec.serviceConfig.schedulerForQuotaAndQueuing
  Description: Set the scheduler for the resource quota and queuing features. The supported scheduler is ibm-cpd-scheduler.
  Type: String
  Specification for .yml files: N/A

spec.serviceConfig.sparkAdvEnabled
  Description: This flag enables or disables the job UI capabilities of Analytics Engine powered by Apache Spark.
  Default: False
  Type: Boolean
  Specification for .yml files: analyticsengine_spark_adv_enabled

spec.serviceConfig.jobAutoDeleteEnabled
  Description: Set to false if you do not want to remove jobs after they reach a terminal state, such as FINISHED or FAILED.
  Default: True
  Type: Boolean
  Specification for .yml files: analyticsengine_job_auto_delete_enabled

spec.serviceConfig.kernelCullTime
  Description: The number of minutes after which an idle kernel is removed.
  Default: 30
  Type: Integer
  Specification for .yml files: analyticsengine_kernel_cull_time

spec.serviceConfig.imagePullCompletions
  Description: If you have a large OpenShift cluster, update imagePullCompletions and imagePullParallelism accordingly.
  Default: 20
  Type: Integer
  Specification for .yml files: analyticsengine_image_pull_completions

spec.serviceConfig.imagePullParallelism
  Description: If you have 100 nodes in the cluster, set imagePullCompletions: "100" and imagePullParallelism: "150".
  Default: 40
  Type: Integer
  Specification for .yml files: analyticsengine_image_pull_parallelism

spec.serviceConfig.kernelCleanupSchedule
  Description: By default, the kernel and job cleanup CronJobs look for idle Spark kernels and jobs based on the kernelCullTime parameter. If you want a more or less aggressive cleanup, change the value accordingly. For example, "0 */1 * * *" (Kubernetes CronJob format) runs the cleanup every hour.
  Default: "*/30 * * * *"
  Type: String
  Specification for .yml files: analyticsengine_kernel_cleanup_schedule

spec.serviceConfig.jobCleanupSchedule
  Description: By default, the kernel and job cleanup CronJobs look for idle Spark kernels and jobs based on the kernelCullTime parameter. If you want a more or less aggressive cleanup, change the value accordingly. For example, "0 */1 * * *" (Kubernetes CronJob format) runs the cleanup every hour.
  Default: "*/30 * * * *"
  Type: String
  Specification for .yml files: analyticsengine_job_cleanup_schedule

spec.serviceConfig.skipSELinuxRelabeling
  Description: Set the value to true to skip SELinux relabeling.
  Default: False
  Prerequisite: Create the required MachineConfig and RuntimeClass definitions. See Enabling MachineConfig and RuntimeClass definitions for certain properties.
  Type: Boolean
  Specification for .yml files: analyticsengine_skip_selinux_relabeling

spec.serviceConfig.mountCustomizationsFromCCHome
  Description: Set the value to true to mount customizations from CCHome.
  Default: False
  Prerequisite: The Cloud Pak for Data common core services must be installed.
  Type: Boolean
  Specification for .yml files: analyticsengine_mount_customizations_from_cchome

spec.sparkRuntimeConfig
  Description: Change Spark runtime-level configurations.
  Type: Object
  Specification for .yml files: N/A

spec.sparkRuntimeConfig.maxDriverCpuCores
  Description: Maximum number of driver CPUs.
  Default: 5
  Type: Integer
  Specification for .yml files: analyticsengine_max_driver_cpu_cores

spec.sparkRuntimeConfig.maxExecutorCpuCores
  Description: Maximum number of executor CPUs.
  Default: 5
  Type: Integer
  Specification for .yml files: analyticsengine_max_executor_cpu_cores

spec.sparkRuntimeConfig.maxDriverMemory
  Description: Maximum driver memory, in gigabytes.
  Default: 50g
  Type: String
  Specification for .yml files: analyticsengine_max_driver_memory

spec.sparkRuntimeConfig.maxExecutorMemory
  Description: Maximum executor memory, in gigabytes.
  Default: 50g
  Type: String
  Specification for .yml files: analyticsengine_max_executor_memory

spec.sparkRuntimeConfig.maxNumWorkers
  Description: Maximum number of workers (executors).
  Default: 50
  Type: Integer
  Specification for .yml files: analyticsengine_max_num_workers

spec.sparkRuntimeConfig.localDirScaleFactor
  Description: The temporary disk size in the Spark master and workers is a factor of the number of CPUs: temp_disk_space = numCpu * localDirScaleFactor.
  Default: 10
  Type: Integer
  Specification for .yml files: analyticsengine_local_dir_scale_factor

spec.serviceInstanceConfig
  Description: Service instance-level configurations. Each Analytics Engine powered by Apache Spark service instance has a resource quota (CPU/memory) set by default. The quota can be changed for an existing instance through the API, but to change the default values for any new instance, update serviceInstanceConfig.
  Type: Object
  Specification for .yml files: N/A

spec.serviceInstanceConfig.defaultCpuQuota
  Description: The cumulative CPU consumption allowed for Spark jobs created under an instance.
  Default: 20
  Type: Integer
  Specification for .yml files: analyticsengine_default_cpu_quota

spec.serviceInstanceConfig.defaultMemoryQuota
  Description: The cumulative memory consumption, in gigabytes, allowed for Spark jobs created under an instance.
  Default: 80
  Type: String
  Specification for .yml files: analyticsengine_default_memory_quota

spec.sparkDefaults.mutableConfigs
  Description: Custom configurations for Spark runtimes that can be overridden for a specific job or kernel.
  Type: Object
  Specification for .yml files: N/A

spec.sparkDefaults.immutableConfigs
  Description: Custom configurations for Spark runtimes that cannot be overridden for a specific job or kernel.
  Type: Object
  Specification for .yml files: N/A

Instance-level immutable configurations

These configurations can be set by instance-level administrators and cannot be overridden by a specific job or kernel.

However, configurations at the service-level can override instance-level configurations.

The following cURL commands can be used to GET, PUT and PATCH instance-level immutable configurations:
PUT API
curl -k -X PUT <cpd-route>/v4/analytics_engines/<instance_id>/immutable_configs  -H "Authorization: Bearer $TOKEN" --data-raw '{"ae.kubernetes.spec.nodeSelector":"bXlub2RlOiBzcGFyawo="}'
GET API
curl -k -X GET <cpd-route>/v4/analytics_engines/<instance_id>/immutable_configs  -H "Authorization: Bearer $TOKEN"
Response:
{
  "ae.kubernetes.spec.nodeSelector": "bXlub2RlOiBzcGFyawo="
}
PATCH API
curl -k -X PATCH <cpd-route>/v4/analytics_engines/<instance_id>/immutable_configs  -H "Authorization: Bearer $TOKEN" --data-raw '{"ae.kubernetes.spec.nodeSelector":"bXlub2RlOiBzcGFyawo="}'
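
In these examples, the value of ae.kubernetes.spec.nodeSelector is a base64-encoded YAML snippet: bXlub2RlOiBzcGFyawo= decodes to mynode: spark. A minimal sketch of how you might prepare and verify such a value on the command line (the node label mynode: spark is only an illustration):

echo 'mynode: spark' | base64
echo 'bXlub2RlOiBzcGFyawo=' | base64 --decode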

Enabling MachineConfig and RuntimeClass definitions for certain properties

For some of the properties in Table 1, you must enable SELinux with a MachineConfig definition on the cluster and create a RuntimeClass.
  1. Run the following command to create the MachineConfig (the decoded contents of the base64-encoded file are shown after this procedure):
    cat << EOF | oc apply -f -
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"machineconfiguration.openshift.io/v1","kind":"MachineConfig","metadata":{"annotations":{},"labels":{"machineconfiguration.openshift.io/role":"worker"},"name":"99-worker-selinux-configuration"},"spec":{"config":{"ignition":{"version":"3.2.0"},"storage":{"files":[{"contents":{"source":"data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS5ydW50aW1lcy5zZWxpbnV4XQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsiaW8ua3ViZXJuZXRlcy5jcmktby5UcnlTa2lwVm9sdW1lU0VMaW51eExhYmVsIl0K"},"mode":416,"overwrite":true,"path":"/etc/crio/crio.conf.d/01-selinux.conf"}]}},"osImageURL":""}}
      generation: 1
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-selinux-configuration
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS5ydW50aW1lcy5zZWxpbnV4XQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsiaW8ua3ViZXJuZXRlcy5jcmktby5UcnlTa2lwVm9sdW1lU0VMaW51eExhYmVsIl0K
            mode: 416
            overwrite: true
            path: /etc/crio/crio.conf.d/01-selinux.conf
      osImageURL: ""
    EOF
  2. Create the RuntimeClass definition:
    cat << EOF | oc apply -f -
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: selinux
    handler: selinux
    EOF
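
For reference, the base64-encoded file content in the MachineConfig in step 1 decodes to the following CRI-O drop-in configuration, which registers a selinux runtime that allows the io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel annotation (the file mode 416 is the decimal form of octal 0640):

    [crio.runtime.runtimes.selinux]
    runtime_path = "/usr/bin/runc"
    runtime_root = "/run/runc"
    runtime_type = "oci"
    allowed_annotations = ["io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel"]

The RuntimeClass in step 2 refers to this runtime through its handler field.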

What to do next

Complete the following tasks in order before users can access the service:

  1. An instance administrator can set the scale of the service to adjust the number of available pods. See Scaling services.
  2. Before you can submit Spark jobs by using the Spark jobs API, you must provision a service instance. See Provisioning the service instance.
  3. The service is ready to use. See Spark environments.