Spark jobs API syntax, parameters and return codes

You typically submit a Spark job by using a cURL command as follows:

curl -k -X POST <JOBS_API_ENDPOINT> -H "Authorization: Bearer <ACCESS_TOKEN>" -d @input.json

Spark jobs cURL options:

  • The -k option skips certificate validation as the service instance website uses a self-signed SSL certificate.
  • <JOBS_API_ENDPOINT> is the endpoint to use to submit your Spark job. To get the Spark jobs endpoint for your provisioned Analytics Engine powered by Apache Spark service instance, see Administering the service instance.
  • The -H option is the header parameter. The header parameter is a key-value pair. You must send the bearer token (<ACCESS_TOKEN>) in an authorization header. To get the access token for your service instance, see Administering the service instance.
  • The -d option sends the input data in a POST request to the HTTP server. See the examples of input payloads below.

    An example of an input payload for a Python job:

      {
          "engine": {
              "type": "spark",
              "template_id": "spark-2.4.0-jaas-v2-cp4d-template",
              "conf": {
                  "spark.app.name": "MyJob",
                  "spark.eventLog.enabled": "false"
              },
              "env": {
                  "SAMPLE_ENV_KEY": "SAMPLE_VALUE"
              },
              "size": {
                  "num_workers": 1,
                  "worker_size": {
                      "cpu": 1,
                      "memory": "1g"
                  }
              }
          },
          "application_arguments": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
          "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py"
      }

    An example of an input payload for a Scala job:

      {
          "engine": {
              "type": "spark",
              "template_id": "spark-2.4.0-jaas-v2-cp4d-template",
              "conf": {
                  "spark.app.name": "MyJob"
              },
              "env": {
                  "SAMPLE_ENV_KEY": "SAMPLE_VALUE"
              },
              "size": {
                  "num_workers": 1,
                  "worker_size": {
                      "cpu": 1,
                      "memory": "1g"
                  }
              }
          },
          "application_jar": "/opt/ibm/spark/examples/jars/spark-examples*.jar",
          "main_class": "org.apache.spark.examples.SparkPi"
      }

If your job is submitted successfully, the following response is returned:

{
    "id": "JOB_ID",
    "job_state": "RUNNING"
}

Hint: Save the returned job ID because you will need it to get the status of a job or to delete a job.
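
If you prefer to submit jobs from a script rather than with cURL, the same request can be made in Python. The following is a minimal sketch, not taken from the product documentation: it assumes the third-party requests library is installed, and the endpoint and token placeholders must be replaced with the values described above.

    # Minimal sketch: submit the job with Python instead of cURL and save the job ID.
    # JOBS_API_ENDPOINT and ACCESS_TOKEN are placeholders for the values described above.
    import json
    import requests

    JOBS_API_ENDPOINT = "<JOBS_API_ENDPOINT>"
    ACCESS_TOKEN = "<ACCESS_TOKEN>"

    with open("input.json") as f:
        payload = json.load(f)            # same payload file as -d @input.json

    response = requests.post(
        JOBS_API_ENDPOINT,
        headers={"Authorization": "Bearer " + ACCESS_TOKEN},
        json=payload,
        verify=False,                     # skip certificate validation, like curl -k
    )

    if response.status_code == 201:       # 201 means the job was created
        job_id = response.json()["id"]    # keep the job ID for status or delete calls
        print("Submitted job:", job_id)
    else:
        print("Submission failed:", response.status_code, response.text)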

Spark jobs API parameters

These are the parameters you can use in the Spark jobs API:

| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| engine | Optional | Key-value pairs | Specifies the runtime (Spark) with the configuration and version information. |
| type | Required if engine is specified | String | Specifies the runtime type. Currently, only "spark" is supported. |
| template_id | Optional | String | Specifies the Spark version and preinstalled system libraries. The default is spark-2.4.0-jaas-v2-cp4d-template. For Spark 3.0, use spark-3.0.0-jaas-v2-cp4d-template. |
| conf | Optional | Key-value JSON object | Specifies the Spark configuration values that override the predefined values. |
| env | Optional | Key-value JSON object | Specifies the Spark environment variables required for the job. |
| size | Optional | | Takes the parameters num_workers, worker_size, and driver_size. |
| num_workers | Required if size is specified | Integer | Specifies the number of worker nodes in the Spark cluster. num_workers is equal to the number of executors you want. The default is 1 executor per worker node. The maximum number of executors supported is 50. |
| worker_size | Required if size is specified | | Takes the parameters cpu and memory. |
| cpu | Required if worker_size is specified | Integer | Specifies the amount of CPU for each worker node. Default is 1 CPU. Maximum is 10 CPU. |
| memory | Required if worker_size is specified | Integer | Specifies the amount of memory for each worker node. Default is 1 GB. Maximum is 40 GB. |
| driver_size | Required if size is specified | | Takes the parameters cpu and memory. |
| cpu | Required if driver_size is specified | Integer | Specifies the amount of CPU for the driver node. Default is 1 CPU. Maximum is 10 CPU. |
| memory | Required if driver_size is specified | Integer | Specifies the amount of memory for the driver node. Default is 1 GB. Maximum is 40 GB. |
| application_arguments | Optional | Array of strings | Specifies the arguments required by the job. If the job doesn't require any arguments, application_arguments can be empty: "application_arguments": []. |
| application | Required for Python and R jobs | String | The file path to the Python or R job file. |
| application_jar | Required for Scala jobs | String | The file path to the Scala job JAR file. |
| main_class | Required for Scala jobs. Optional for Python and R jobs. | String | Specifies the main entry point to the application, for example org.apache.spark.examples.SparkPi for Scala. |
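
Neither of the payload examples earlier in this section sets driver_size. As an illustration of how the size parameters from the table can be combined, a payload might look like the following, shown here as a Python dictionary that mirrors the JSON structure; the values are examples only, not defaults.

    # Illustrative sizing only: 2 workers with 2 CPU / 4 GB each and a 1 CPU / 2 GB driver,
    # all within the documented maximums (50 executors, 10 CPU, 40 GB).
    payload = {
        "engine": {
            "type": "spark",
            "template_id": "spark-3.0.0-jaas-v2-cp4d-template",   # Spark 3.0 template from the table
            "size": {
                "num_workers": 2,
                "worker_size": {"cpu": 2, "memory": "4g"},
                "driver_size": {"cpu": 1, "memory": "2g"}
            }
        },
        "application_arguments": [],                              # empty list when no arguments are needed
        "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py"
    }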

Error return codes and messages

The Spark jobs API returns the following return codes and messages:

| Return code | Meaning of the return code | Description |
| --- | --- | --- |
| 201 | Job created | |