Spark jobs API syntax, parameters and return codes

You typically submit a Spark job by using a cURL command as follows:

curl -k -X POST <V3_JOBS_API_ENDPOINT> -H "Authorization: Bearer <ACCESS_TOKEN>" -d @input.json

Spark jobs cURL options:

  • The -k option skips certificate validation because the service instance uses a self-signed SSL certificate.
  • <V3_JOBS_API_ENDPOINT> is the endpoint to which you submit your Spark job. To get the Spark jobs endpoint for your provisioned Analytics Engine powered by Apache Spark service instance, see Administering the service instance.
  • The -H option is the header parameter, a key-value pair. You must send the bearer token (<ACCESS_TOKEN>) in an authorization header. To get the access token for your service instance, see Administering the service instance.
  • The -d option sends the input data in a POST request to the HTTP server. See the examples of input payloads below.

    An example of an input payload for a Python job:

      {
          "template_id": "<template_id>",
          "application_details": {
                  "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
                  "application_arguments": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
                  "conf": {
                          "spark.app.name": "MyJob",
                          "spark.eventLog.enabled": "true"
                          },
                  "env": {
                          "SAMPLE_ENV_KEY": "SAMPLE_VALUE"
                          },
                  "driver-memory": "4G",
                  "driver-cores": 1,
                  "executor-memory": "4G",
                  "executor-cores": 1,
                  "num-executors": 1
            }
      }
    

    An example of an input payload for a Scala job:

      {
          "template_id": "<template_id>",
          "application_details": {
                  "application": "/opt/ibm/spark/examples/jars/spark-examples*.jar",
                  "application_arguments": ["1"],
                  "class": "org.apache.spark.examples.SparkPi",
                  "conf": {
                          "spark.app.name": "MyJob",
                          "spark.eventLog.enabled": "true"
                          },
                  "env": {
                          "SAMPLE_ENV_KEY": "SAMPLE_VALUE"
                          },
                  "driver-memory": "4G",
                  "driver-cores": 1,
                  "executor-memory": "4G",
                  "executor-cores": 1,
                  "num-executors": 1
                  }
      }
    

    The returned response if your job was successfully submitted:

      {
          "application_id": "<application_id>",
          "state": "RUNNING",
          "start_time": "Monday' 07 June 2021 '14:46:23.237+0000",
          "spark_application_id": "app-20210607144623-0000"
      }
    

Hint: Save the returned application ID (application_id) because you will need it to get the status of a job or to delete a job.
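
If you prefer to script the submission instead of using cURL, the same request can be sent from Python. The following is a minimal sketch, assuming the requests library is installed and that the placeholder endpoint and access token are replaced with the values for your service instance; verify=False mirrors the -k option for the self-signed certificate.

    import json
    import requests

    ENDPOINT = "<V3_JOBS_API_ENDPOINT>"   # Spark jobs endpoint of your service instance
    TOKEN = "<ACCESS_TOKEN>"              # access token for your service instance

    # Load the same payload that the cURL example sends with -d @input.json
    with open("input.json") as f:
        payload = json.load(f)

    response = requests.post(
        ENDPOINT,
        headers={"Authorization": "Bearer " + TOKEN},
        json=payload,
        verify=False,  # equivalent of curl -k: skip validation of the self-signed certificate
    )

    job = response.json()
    print(job["application_id"], job["state"])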

Spark jobs API parameters

These are the parameters you can use in the Spark jobs API:

Name Sub-properties Required/Optional Type Description
application_details   Required Object Specifies the Spark application details
  application Required String Specifies the Spark application file, that is, the file path to the Python, R, or Scala job file
  application_arguments Optional String[] Specifies the application arguments
  conf Optional Key-value JSON object Specifies the Spark configuration values that override the predefined values. See Spark environment variables for a list of the supported variables.
  env Optional Key-value JSON object Specifies Spark environment variables required for the job. See Spark environment variables for a list of the supported variables.
  class Optional String Specifies the entry point (main class) of the application for Java or Scala jobs, for example org.apache.spark.examples.SparkPi
  name Optional String Specifies the name of the Spark application
  executor-memory Optional String Specifies the memory per executor, for example 1000M or 2G. The default is 1G.
  executor-cores Optional Integer Specifies the number of cores per executor or all available cores on the worker in standalone mode
  num-executors Optional Integer Specifies the number of executors to launch. The default is 1. The maximum number of executors supported is 50.
  driver-cores Optional Integer Specifies the number of cores used by the driver, only in cluster mode. The default is 1.
  driver-memory Optional String Specifies the memory for the driver, for example 1000M or 2G. The default is 1024M.
  driver-java-options Optional String Specifies extra Java options to pass to the driver
  driver-library-path Optional String Specifies extra library path entries to pass to the driver
  driver-class-path Optional String Specifies extra class path entries to pass to the driver. Note that jars added with --jars are automatically included in the classpath.
  jars Optional  String Specifies a comma-separated list of jars to include on the driver and executor classpaths
  packages Optional String Specifies a comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. Searches the local Maven repository, then Maven central and finally any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.
  exclude-packages Optional String Specifies a comma-separated list of groupId:artifactId to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts
  repositories Optional  String Specifies a comma-separated list of additional remote repositories to search for the Maven coordinates given with --packages
  py-files Optional String Specifies a comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps
template_id   Optional String Specifies the Spark version and preinstalled system libraries. The default is spark-2.4.0-jaas-v2-cp4d-template. For Spark 3.0 use spark-3.0.0-jaas-v2-cp4d-template.
volumes   Optional list of objects Specifies the volumes to be mounted other than the home volume
  volume_name Required  String Specifies the name of the volume
  source_path Required String Specifies the source path in the volume to be mounted
  mount_path Required String Specifies the location where the volume is to be mounted
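
To show how these parameters fit together, the sketch below builds a payload as a Python dictionary, including a volumes entry, and writes it to input.json for use with the cURL command shown earlier. The volume name, source path, and mount path are placeholder values for illustration only.

    import json

    # Payload for a Python job on Spark 3.0, with an additional data volume mounted.
    payload = {
        "template_id": "spark-3.0.0-jaas-v2-cp4d-template",
        "application_details": {
            "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
            "application_arguments": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
            "conf": {"spark.app.name": "MyJob"},
            "env": {"SAMPLE_ENV_KEY": "SAMPLE_VALUE"},
            "driver-memory": "4G",
            "driver-cores": 1,
            "executor-memory": "4G",
            "executor-cores": 1,
            "num-executors": 2
        },
        "volumes": [
            {
                "volume_name": "my-data-volume",   # placeholder volume name
                "source_path": "datasets",         # placeholder path inside the volume
                "mount_path": "/mnt/datasets"      # placeholder mount location in the job
            }
        ]
    }

    with open("input.json", "w") as f:
        json.dump(payload, f, indent=4)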

Response codes

The Spark jobs API returns the following response codes:

Return code Meaning of the return code Description
201 Job created The Spark job was successfully submitted.
Job response: {"application_id":"<job_id>", "state":"<job_state>", "start_time": "<start_time>", "spark_application_id": "<spark_app_id>"}
Location header: Link to GET
400 Bad request This is returned when the payload is invalid, for example, if the payload format is incorrect or arguments are missing.
404 Not found This is returned when the Spark application is submitted for an instance ID that does not exist.
500 Internal server error This is returned when an unexpected error occurs on the server and the request cannot be completed. Try submitting your job again.
503 Service unavailable This is returned when there are insufficient resources. 
Possible response: Could not complete the request. Reason - FailedScheduling. Detailed error - 0/6  nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.
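
As a sketch of how a client might react to these codes, the snippet below treats 201 as success, retries on 500 and 503 because those conditions can be transient, and raises an error for 400 and 404. The endpoint, token, and retry interval are placeholder assumptions, not values defined by the API.

    import json
    import time
    import requests

    ENDPOINT = "<V3_JOBS_API_ENDPOINT>"
    TOKEN = "<ACCESS_TOKEN>"

    def submit_with_retry(payload, retries=3, wait_seconds=60):
        """Submit a Spark job and handle the documented response codes."""
        for _ in range(retries):
            response = requests.post(
                ENDPOINT,
                headers={"Authorization": "Bearer " + TOKEN},
                json=payload,
                verify=False,  # self-signed certificate, as with curl -k
            )
            if response.status_code == 201:           # job created
                return response.json()
            if response.status_code in (500, 503):    # possibly transient: wait and resubmit
                time.sleep(wait_seconds)
                continue
            response.raise_for_status()               # 400, 404: raise with the error details
        raise RuntimeError("Job was not scheduled after {} attempts".format(retries))

    with open("input.json") as f:
        job = submit_with_retry(json.load(f))
    print(job["application_id"])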