Specifying additional configurations for Analytics Engine Powered by Apache Spark
You can set additional configurations, beyond the defaults, as part of the pre-install or post-install steps. The following specifications are optional and can be changed to adjust service-level configurations.
serviceConfig:
  schedulerForQuotaAndQueuing: "ibm-cpd-scheduler" # Scheduler for the resource quota and queuing features. The supported scheduler is ibm-cpd-scheduler.
  sparkAdvEnabled: true # Enables or disables the job UI capabilities of Analytics Engine Powered by Apache Spark.
  jobAutoDeleteEnabled: true # Set this to false if you do not want Analytics Engine Powered by Apache Spark jobs to be removed once they reach a terminal state, for example FINISHED or FAILED.
  kernelCullTime: 30 # Number of minutes after which an idle kernel is removed.
  imagePullCompletions: 20 # If you have a large OpenShift cluster, update imagePullCompletions and imagePullParallelism accordingly.
  imagePullParallelism: "40" # For example, if you have 100 nodes in the cluster, set imagePullCompletions: "100" and imagePullParallelism: "150".
  kernelCleanupSchedule: "*/30 * * * *" # By default, the kernel and job cleanup cron jobs look for idle Spark kernels/jobs based on the kernelCullTime parameter
  jobCleanupSchedule: "*/30 * * * *" # and remove them. For a less or more aggressive cleanup, change these values accordingly, for example "0 */1 * * *" (every hour) in Kubernetes cron format.
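If you change these settings after installation, they are applied through the service's custom resource, where they sit under spec.serviceConfig (matching the property names in the table below). The following is a minimal sketch only; the apiVersion, kind, resource name, and namespace shown here are assumptions for illustration, so substitute the values from your installation:

```yaml
# Hypothetical custom resource excerpt; apiVersion, kind, name, and
# namespace are placeholders -- use the values from your installation.
apiVersion: ae.cpd.ibm.com/v1
kind: AnalyticsEngine
metadata:
  name: analyticsengine-sample
  namespace: cpd-instance
spec:
  serviceConfig:
    sparkAdvEnabled: true   # enable the job UI capabilities
    kernelCullTime: 60      # remove kernels after 60 idle minutes
```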
The following specifications are optional and can be changed to adjust Spark runtime-level configurations.
sparkRuntimeConfig:
  maxDriverCpuCores: 5 # If you want to create Spark jobs with more than 5 driver CPUs, set this value accordingly.
  maxExecutorCpuCores: 5 # If you want to create Spark jobs with more than 5 CPUs per executor, set this value accordingly.
  maxDriverMemory: "50g" # If you want to create Spark jobs with more than 50g of driver memory, set this value accordingly.
  maxExecutorMemory: "50g" # If you want to create Spark jobs with more than 50g of memory per executor, set this value accordingly.
  maxNumWorkers: 50 # If you want to create Spark jobs with more than 50 workers/executors, set this value accordingly.
  localDirScaleFactor: 10 # If you want to increase the local disk space for your Spark jobs, set this value accordingly.
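For example, to allow individual jobs that are larger than the defaults permit, raise the relevant caps under spec.sparkRuntimeConfig. A minimal sketch, with illustrative values only:

```yaml
# Illustrative values only: raise the per-job limits for heavier workloads.
sparkRuntimeConfig:
  maxDriverCpuCores: 10     # allow up to 10 driver CPUs per job
  maxExecutorCpuCores: 10   # allow up to 10 CPUs per executor
  maxDriverMemory: "100g"   # allow up to 100g of driver memory
  maxExecutorMemory: "100g" # allow up to 100g of memory per executor
  maxNumWorkers: 100        # allow up to 100 workers/executors per job
```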
The following specifications are optional and can be changed to adjust service instance-level configurations. Each Analytics Engine Powered by Apache Spark service instance has a resource quota (CPU/memory) set by default. The quota can be changed through the API for an individual instance, but to change the default values used for any new instance, update the following values.
serviceInstanceConfig:
  defaultCpuQuota: 20 # defaultCpuQuota is the cumulative CPU consumption of Spark jobs created under an instance. By default, it can be no more than 20.
  defaultMemoryQuota: 80 # defaultMemoryQuota is the cumulative memory consumption of Spark jobs created under an instance. By default, it can be no more than 80 gigabytes.
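As a worked example of how the quota applies: with defaultCpuQuota: 20, an instance could run, say, four concurrent Spark jobs that each consume 5 CPUs in total across driver and executors before the instance reaches its CPU quota. To have new instances start with larger quotas, raise the defaults; a minimal sketch with illustrative values only:

```yaml
# Illustrative values only: new service instances start with larger quotas.
serviceInstanceConfig:
  defaultCpuQuota: 40      # up to 40 CPUs in total across an instance's jobs
  defaultMemoryQuota: 160  # up to 160 GB in total across an instance's jobs
```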
Property | Description | Type | Specification for .yml files |
---|---|---|---|
spec.scaleConfig | Possible values: Small, Medium, or Large. Default: Small | String (choice parameter) | N/A |
spec.serviceConfig | Change service-level configurations. | Object | N/A |
spec.serviceConfig.schedulerForQuotaAndQueuing | Scheduler for the resource quota and queuing features. The supported scheduler is ibm-cpd-scheduler. | String | N/A |
spec.serviceConfig.sparkAdvEnabled | Enables or disables the job UI capabilities of Analytics Engine Powered by Apache Spark. Default: false | Boolean | analyticsengine_spark_adv_enabled |
spec.serviceConfig.jobAutoDeleteEnabled | Set to false if you do not want jobs to be removed once they reach a terminal state, for example FINISHED or FAILED. Default: true | Boolean | analyticsengine_job_auto_delete_enabled |
spec.serviceConfig.kernelCullTime | Number of minutes after which an idle kernel is removed. Default: 30 | Integer | analyticsengine_kernel_cull_time |
spec.serviceConfig.imagePullCompletions | If you have a large OpenShift cluster, update imagePullCompletions and imagePullParallelism accordingly. Default: 20 | Integer | analyticsengine_image_pull_completions |
spec.serviceConfig.imagePullParallelism | If you have 100 nodes in the cluster, set imagePullCompletions: "100" and imagePullParallelism: "150". Default: 40 | Integer | analyticsengine_image_pull_parallelism |
spec.serviceConfig.kernelCleanupSchedule | By default, the kernel and job cleanup cron jobs look for idle Spark kernels/jobs based on the kernelCullTime parameter and remove them. For a less or more aggressive cleanup, change the value accordingly, for example "0 */1 * * *" (every hour) in Kubernetes cron format. Default: "*/30 * * * *" | String | analyticsengine_kernel_cleanup_schedule |
spec.serviceConfig.jobCleanupSchedule | By default, the kernel and job cleanup cron jobs look for idle Spark kernels/jobs based on the kernelCullTime parameter and remove them. For a less or more aggressive cleanup, change the value accordingly, for example "0 */1 * * *" (every hour) in Kubernetes cron format. Default: "*/30 * * * *" | String | analyticsengine_job_cleanup_schedule |
spec.sparkRuntimeConfig | Change Spark runtime-level configurations. | Object | N/A |
spec.sparkRuntimeConfig.maxDriverCpuCores | Maximum number of driver CPUs. Default: 5 | Integer | analyticsengine_max_driver_cpu_cores |
spec.sparkRuntimeConfig.maxExecutorCpuCores | Maximum number of executor CPUs. Default: 5 | Integer | analyticsengine_max_executor_cpu_cores |
spec.sparkRuntimeConfig.maxDriverMemory | Maximum driver memory in gigabytes. Default: 50g | String | analyticsengine_max_driver_memory |
spec.sparkRuntimeConfig.maxExecutorMemory | Maximum executor memory in gigabytes. Default: 50g | String | analyticsengine_max_executor_memory |
spec.sparkRuntimeConfig.maxNumWorkers | Maximum number of workers/executors. Default: 50 | Integer | analyticsengine_max_num_workers |
spec.sparkRuntimeConfig.localDirScaleFactor | Temporary disk size in the Spark master/worker is a factor of the number of CPUs. Default: 10 | Integer | analyticsengine_local_dir_scale_factor |
spec.serviceInstanceConfig | Service instance-level configurations. Each Analytics Engine Powered by Apache Spark service instance has a resource quota (CPU/memory) set by default. It can be changed through the API for an individual instance, but to change the default values for any new instance, update serviceInstanceConfig. | Object | N/A |
spec.serviceInstanceConfig.defaultCpuQuota | defaultCpuQuota is the cumulative CPU consumption of Spark jobs created under an instance. By default, it can be no more than 20. Default: 20 | Integer | analyticsengine_default_cpu_quota |
spec.serviceInstanceConfig.defaultMemoryQuota | defaultMemoryQuota is the cumulative memory consumption of Spark jobs created under an instance. By default, it can be no more than 80 gigabytes. Default: 80 | String | analyticsengine_default_memory_quota |
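The names in the "Specification for .yml files" column are the keys to use when you pass custom values in an installation parameter file. A minimal sketch of such a file; the file name and exact layout depend on your install tooling, and the values shown are illustrative only:

```yaml
# Hypothetical install parameter file excerpt; keys come from the
# "Specification for .yml files" column above, values are illustrative.
analyticsengine_spark_adv_enabled: true
analyticsengine_kernel_cull_time: 60
analyticsengine_max_driver_memory: "100g"
analyticsengine_max_num_workers: 100
analyticsengine_default_cpu_quota: 40
```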
What to do next
Complete the following tasks in order before users can access the service:
- A project administrator can set the scale of the service to adjust the number of available pods. See Scaling services.
- Before you can submit Spark jobs by using the Spark jobs API, you must provision a service instance. See Provisioning the service instance.
- The service is ready to use. See Spark environments.