Changing cron job configurations in Analytics Engine Powered by Spark

The Analytics Engine Powered by Apache Spark service uses cron jobs to automate repetitive tasks on the Spark cluster.

The Spark assembly cron jobs fall into two categories:

  • Cleanup cron jobs

    The Spark assembly runs auto cleanup to delete unwanted or unused Spark runtimes or Spark jobs. Auto cleanup ensures that stale runtimes and jobs do not continue to occupy cluster resources. There are three cleanup cron jobs that run every 30 minutes by default.

    1. Spark jobs are automatically cleaned up by the cron job spark-hb-job-cleanup-cron based on the following criteria:

      • The Spark jobs completed successfully but were not deleted by the user before the defined idle timeout was exceeded.
      • The Spark jobs ran but failed.
    2. Spark runtimes started in Watson Studio are automatically cleaned up by the cron job spark-hb-kernel-cleanup-cron based on the following criteria:

      • The Spark runtime was created successfully but has been inactive for longer than the defined idle timeout.
      • The Spark runtime creation failed.
    3. spark-hb-terminating-pod-cleanup-cron removes all the Spark runtime pods that are stuck in the terminating state.

    To change the default schedule of the cron cleanup jobs, see Changing the cron job run frequency.

  • Cron jobs that cache Spark runtime images on worker nodes

    • spark-hb-preload-jkg-image ensures that all the Spark runtime images are preloaded on the worker nodes and that the images are not garbage collected. By default, this cron job creates 40 pods every 2 hours, and makes sure that 32 pods reach completion.

      If you have more than 40 nodes in your cluster, you can change the configuration of the cron job to fit your cluster size. See Changing the cron job configuration to preload images on large clusters.
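The defaults described above (40 pods created per run, 32 required completions) correspond to the imagePullParallelism and imagePullCompletions properties used in the large-cluster procedure later in this topic. As a hedged sketch only, the CR fragment might look like this; the spec.serviceConfig nesting is an assumption, so verify the exact structure against your deployed CR:

```yaml
# Illustrative only: assumed default values and assumed nesting;
# confirm against the Analytics Engine CR in your installation.
spec:
  serviceConfig:
    imagePullParallelism: "40"   # pods created per cron run
    imagePullCompletions: "32"   # pods that must reach completion
```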

Releasing Spark runtime resources

Cleanup cron jobs determine whether Spark jobs or runtimes need to be deleted based on the value of kernelCullTime in the Analytics Engine custom resource (CR) YAML file. By default, kernelCullTime is set to 30 minutes. To change the cleanup frequency, change the value of kernelCullTime in the CR YAML file and adjust the schedule of the cleanup cron job.
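As a rough sketch, the relevant property in the CR YAML might look like this; the spec.serviceConfig nesting is an assumption, so confirm the exact structure against Additional installation options:

```yaml
# Illustrative fragment of analyticsengine-cr.yaml; the nesting under
# spec.serviceConfig is an assumption, verify it in your installed CR.
spec:
  serviceConfig:
    kernelCullTime: 60   # idle minutes before cleanup (default: 30)
```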

  1. Change the value of kernelCullTime in the Analytics Engine custom resource (CR) YAML file:

    1. Update the kernelCullTime property in Analytics Engine CR YAML file that was used to set up Analytics Engine Powered by Apache Spark. See Additional installation options. Then apply the changes to an existing deployed CR using the following command:

      oc apply -f analyticsengine-cr.yaml -n ${PROJECT_CPD_INSTANCE}
      
    2. Wait for the Analytics Engine CR to be in Completed state:

      oc get analyticsengine -n ${PROJECT_CPD_INSTANCE}
      
  2. Change the schedule of the cleanup cron job. See Changing the cron job run frequency.

Changing the cron job run frequency

You can change the default run frequency of the cron cleanup jobs. For example, you can change the schedule of the spark-hb-job-cleanup-cron job to run every hour instead of every 30 minutes.

  1. View the default cron cleanup job schedule:

    oc get cronjobs -l release=ibm-analyticsengine-prod -n ${PROJECT_CPD_INSTANCE}
    

    The command returns output similar to the following:

     NAME                                    SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
     spark-hb-job-cleanup-cron               */30 * * * *   False     0        18m             69m
     spark-hb-kernel-cleanup-cron            */30 * * * *   False     0        18m             69m
     spark-hb-preload-jkg-image              0 */2 * * *    False     0        none            69m
     spark-hb-terminating-pod-cleanup-cron   */30 * * * *   False     0        18m             69m
     
  2. Change the schedule to 1 hour by updating the kernelCleanupSchedule and jobCleanupSchedule properties in the Analytics Engine CR YAML file that was used to set up Analytics Engine Powered by Apache Spark. See Additional installation options. Then apply the changes to an existing deployed CR using the following command:

    oc apply -f analyticsengine-cr.yaml -n ${PROJECT_CPD_INSTANCE}
    
  3. Wait for the Analytics Engine CR to be in Completed state:

    oc get analyticsengine -n ${PROJECT_CPD_INSTANCE}
    
  4. View the changed cron job:

    oc get cronjobs -l release=ibm-analyticsengine-prod -n ${PROJECT_CPD_INSTANCE}
    
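In the schedules above, `*/30 * * * *` runs every 30 minutes and `0 * * * *` runs at the top of every hour. For step 2, the two properties might be set like this in the CR YAML; the spec.serviceConfig nesting is an assumption, so check Additional installation options for the exact structure:

```yaml
# Illustrative fragment; property nesting is an assumption.
spec:
  serviceConfig:
    kernelCleanupSchedule: "0 * * * *"   # hourly (default: "*/30 * * * *")
    jobCleanupSchedule: "0 * * * *"
```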

Changing the cron job configuration to preload images on large clusters

You can change the configuration of the cron job spark-hb-preload-jkg-image that preloads the runtime images on cluster nodes. For example, if you have 100 nodes on your cluster, you can change the number of nodes to which to preload the runtime images.

  1. Get the number of nodes and calculate the parallelism:

    nodes=100
    parallelism=$((nodes + nodes / 3))
    
  2. Change the imagePullCompletions property to the number of nodes you have and the imagePullParallelism property to the calculated parallelism value in the Analytics Engine CR YAML file that was used to set up Analytics Engine Powered by Apache Spark. See Additional installation options. Then apply the changes to an existing deployed CR using the following command:

    oc apply -f analyticsengine-cr.yaml -n ${PROJECT_CPD_INSTANCE}
    
  3. Wait for the Analytics Engine CR to be in Completed state:

    oc get analyticsengine -n ${PROJECT_CPD_INSTANCE}
    
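Putting the numbers together: with 100 nodes, the arithmetic in step 1 yields the two values to set in step 2. A small sketch that recomputes them and prints them in the form the CR properties take (the quoted-string form is an assumption):

```shell
# Recompute the step 1 values and print them as CR property lines.
# imagePullCompletions = node count; imagePullParallelism = nodes + nodes/3
# (integer division, so 100 nodes yields 133).
nodes=100
parallelism=$((nodes + nodes / 3))
echo "imagePullCompletions: \"$nodes\""
echo "imagePullParallelism: \"$parallelism\""
```

Because shell arithmetic truncates the division, clusters whose node count is not a multiple of 3 get a slightly smaller parallelism value than the exact nodes + nodes/3 ratio.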

Parent topic: Administering Analytics Engine Powered by Apache Spark