Spark jobs API syntax, parameters and return codes
You typically submit a Spark job with a cURL command.
The Spark job cURL command syntax is:
curl -k -X POST <V4_JOBS_API_ENDPOINT> -H "Authorization: ZenApiKey <TOKEN>" -d @input.json
Replace the variables as follows:
<V4_JOBS_API_ENDPOINT>
: The endpoint for the instance that you want to use to submit your Spark job. Note that multiple Analytics Engine powered by Apache Spark instances can exist on the IBM Cloud Pak for Data server and each instance has its own endpoint for submitting jobs. To get the Spark jobs endpoint for your provisioned instance, see Managing Analytics Engine powered by Apache Spark instances.
<TOKEN>
: The access token for your service instance. To get the access token, see Generating an API authorization token.
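For readability, you can keep the endpoint and token in environment variables. The following sketch assumes the payload is saved as input.json in the current directory; the values are placeholders, not real endpoint or token values:
# Placeholders only; substitute the endpoint and token for your own instance.
export JOBS_ENDPOINT="<V4_JOBS_API_ENDPOINT>"
export TOKEN="<TOKEN>"

# Submit the Spark job described in input.json.
curl -k -X POST "$JOBS_ENDPOINT" \
  -H "Authorization: ZenApiKey $TOKEN" \
  -d @input.json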
The POST method returns after the initial validation of the application. The job request is processed asynchronously; first the SparkContext is created and then the application is executed. The current status of the application can be fetched by using the GET method. See Spark job status.
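For example, a status request might look like the following sketch, which reuses the environment variables from the previous example and assumes that the status endpoint is the jobs endpoint followed by the application ID; see Spark job status for the exact path and response format:
# Assumption: status is fetched from <V4_JOBS_API_ENDPOINT>/<application_id>.
curl -k -X GET "$JOBS_ENDPOINT/<application_id>" \
  -H "Authorization: ZenApiKey $TOKEN"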
An example of an input payload for a Python job:
{
  "application_details": {
    "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
    "arguments": [
      "/opt/ibm/spark/examples/src/main/resources/people.txt"
    ],
    "conf": {
      "spark.app.name": "MyJob",
      "spark.eventLog.enabled": "true",
      "spark.driver.memory": "4G",
      "spark.driver.cores": 1,
      "spark.executor.memory": "4G",
      "spark.executor.cores": 1,
      "ae.spark.executor.count": 1
    },
    "env": {
      "SAMPLE_ENV_KEY": "SAMPLE_VALUE"
    }
  }
}
An example of an input payload for an R job:
{
  "application_details": {
    "application": "/opt/ibm/spark/examples/src/main/r/dataframe.R",
    "conf": {
      "spark.app.name": "MyJob",
      "spark.eventLog.enabled": "true",
      "spark.driver.memory": "4G",
      "spark.driver.cores": 1,
      "spark.executor.memory": "4G",
      "spark.executor.cores": 1,
      "ae.spark.executor.count": 1
    },
    "env": {
      "SAMPLE_ENV_KEY": "SAMPLE_VALUE"
    }
  }
}
An example of an input payload for a Scala job:
{
  "application_details": {
    "application": "/opt/ibm/spark/examples/jars/spark-examples*.jar",
    "arguments": [
      "1"
    ],
    "class": "org.apache.spark.examples.SparkPi",
    "conf": {
      "spark.app.name": "MyJob",
      "spark.eventLog.enabled": "true",
      "spark.driver.memory": "4G",
      "spark.driver.cores": 1,
      "spark.executor.memory": "4G",
      "spark.executor.cores": 1,
      "ae.spark.executor.count": 1
    },
    "env": {
      "SAMPLE_ENV_KEY": "SAMPLE_VALUE"
    }
  }
}
If your job was submitted successfully, the following response is returned:
{
  "application_id": "<application_id>",
  "state": "ACCEPTED"
}
Hint:
- Save the returned value of "application_id" to get the status of the job or to stop the job.
- Save the returned value of "spark_application_id" to monitor and analyze the Spark application on the Spark history server.
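For example, you can capture the application ID directly when submitting the job. This sketch assumes the jq command-line JSON processor is installed; any JSON parser works equally well:
# Submit the job and extract the returned application_id with jq.
APP_ID=$(curl -k -s -X POST "$JOBS_ENDPOINT" \
  -H "Authorization: ZenApiKey $TOKEN" \
  -d @input.json | jq -r '.application_id')
echo "Submitted application: $APP_ID"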
Spark jobs API using a custom Spark runtime
An example of an input payload for changing the Spark runtime version:
{
  "application_details": {
    "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
    "arguments": [
      "/opt/ibm/spark/examples/src/main/resources/people.txt"
    ],
    "runtime": {
      "spark_version": "3.4"
    }
  }
}
Spark jobs API using custom packages
An example of an input payload for using custom packages:
{
  "volumes": [
    {
      "name": "cpd-instance::myapp-vol",
      "mount_path": "/my-app"
    }
  ],
  "application_details": {
    "application": "/my-app/python-spark-pi.py",
    "packages": "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0,org.apache.spark:spark-streaming-kafka-0-10_2.12:3.4.0",
    "conf": {
      "spark.app.name": "MyJob",
      "spark.eventLog.enabled": "true",
      "spark.driver.cores": 4,
      "spark.driver.memory": "8G",
      "spark.executor.memory": "2G",
      "spark.executor.cores": 4,
      "ae.spark.driver.log.level": "ERROR",
      "ae.spark.executor.log.level": "WARN"
    }
  }
}
Spark jobs API parameters
These are the parameters you can use in the Spark jobs API:
Name | Sub-properties | Required/Optional | Type | Description |
---|---|---|---|---|
application_details | | Required | Object | Specifies the Spark application details. |
| application | Required | String | Specifies the Spark application file, that is, the file path to the Python, R, or Scala job file. |
| arguments | Optional | String[] | Specifies the application arguments. |
| conf | Optional | Key-value JSON object | Specifies the Spark configuration values that override the predefined values. See the section Default Spark configuration parameters and environment variables for the default configuration parameters defined by the Spark service. See Apache Spark configurations for the configuration parameters supported by Apache Spark. |
| env | Optional | Key-value JSON object | Specifies the Spark environment variables required for the job. See the section Default Spark configuration parameters and environment variables for the default environment variables defined by the Spark service. See Apache Spark environment variables for the environment variables supported by Apache Spark. |
| class | Optional | String | Specifies the entry point for your Scala application. |
| driver-java-options | Optional | String | Specifies extra Java options to pass to the driver. |
| driver-library-path | Optional | String | Specifies extra library path entries to pass to the driver. |
| driver-class-path | Optional | String | Specifies extra class path entries to pass to the driver. Note that jars added with --jars are automatically included in the classpath. |
| jars | Optional | String | Specifies a comma-separated list of jars to include on the driver and executor classpaths. |
| packages | Optional | String | Specifies a comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. Searches the local Maven repository, then Maven Central, and finally any additional remote repositories given by --repositories. The format for the coordinates is groupId:artifactId:version. |
| exclude-packages | Optional | String | Specifies a comma-separated list of groupId:artifactId coordinates to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts. |
| repositories | Optional | String | Specifies a comma-separated list of additional remote repositories to search for the Maven coordinates given with --packages. |
| py-files | Optional | String | Specifies a comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps. |
| runtime.spark_version | Optional | String | Specifies the Spark runtime version to be used for the job. IBM Cloud Pak for Data supports Spark 3.3 and Spark 3.4. |
volumes | | Optional | List of objects | Specifies the volumes to be mounted other than the Spark instance volume. If volumes are added in the application payload, then the conf section in the payload is mandatory. |
| name | Required | String | Specifies the name of the volume. |
| source_sub_path | Optional | String | Specifies the source path in the volume to be mounted. The source path must be a relative path. |
| mount_path | Required | String | Specifies the location where the volume is to be mounted. Note that a few mount paths are prohibited and are rejected when you enter them, because they can compromise the runtime. |
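For example, a payload that mounts an additional volume by using a source sub-path might look like the following sketch. The volume name, sub-path, and application path are hypothetical, and the conf section is included because it is mandatory whenever volumes are added:
{
  "volumes": [
    {
      "name": "cpd-instance::my-data-vol",
      "source_sub_path": "datasets/current",
      "mount_path": "/my-data"
    }
  ],
  "application_details": {
    "application": "/my-data/my-job.py",
    "conf": {
      "spark.app.name": "MyVolumeJob"
    }
  }
}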
Response codes
The Spark jobs API returns the following response codes:
Return code | Meaning of the return code | Description |
---|---|---|
202 | Job accepted | The Spark job is successfully validated and accepted for submitting the application. |
400 | Bad request | This is returned when the payload is incorrect, for example, if the payload format is incorrect or arguments are missing. |
404 | Not Found | This is returned when the Spark application is submitted for an instance ID that does not exist. |
500 | Internal server error | This is returned when the server cannot process the request because of an unexpected error. Try submitting your job again. |
503 | Service unavailable | This is returned when there are insufficient resources. Possible response: Could not complete the request. Reason - FailedScheduling. |
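To check which of these codes a request returned, you can ask curl to print the HTTP status code while saving the response body, as in this sketch:
# Print only the HTTP status code; the response body is written to response.json.
HTTP_CODE=$(curl -k -s -o response.json -w "%{http_code}" -X POST "$JOBS_ENDPOINT" \
  -H "Authorization: ZenApiKey $TOKEN" \
  -d @input.json)
echo "Jobs API returned HTTP $HTTP_CODE"   # for example, 202 when the job is accepted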
Default Spark configuration parameters and environment variables
The following tables show the Spark configuration parameters and environment variables that are commonly used in Analytics Engine powered by Apache Spark and their default values.
The following table lists the Spark configuration parameters and their defaults:
Spark configuration | Default value |
---|---|
spark.eventLog.enabled | TRUE |
spark.executor.extraClassPath | /home/spark/space/assets/data_asset/*:/home/spark/user_home/dbdrivers/*:/cc-home/_global_/dbdrivers/*:/home/spark/shared/user-libs/spark2/*:/home/spark/user_home/dbdrivers/*:/home/spark/shared/user-libs/common/*:/home/spark/shared/user-libs/connectors/*:/opt/ibm/connectors/parquet-encryption/*:/opt/ibm/third-party/libs/spark2/*:/opt/ibm/third-party/libs/common/*:/opt/ibm/third-party/libs/connectors/*:/opt/ibm/spark/external-jars/* |
spark.executor.memory | 1 G |
spark.executor.cores | 1 |
(custom) ae.spark.executor.count | 1 |
(custom) ae.spark.application.priority | 1 |
spark.driver.extraClassPath | /home/spark/space/assets/data_asset/*:/home/spark/user_home/dbdrivers/*:/cc-home/_global_/dbdrivers/*:/home/spark/shared/user-libs/spark2/*:/home/spark/user_home/dbdrivers/*:/home/spark/shared/user-libs/common/*:/home/spark/shared/user-libs/connectors/*:/opt/ibm/connectors/parquet-encryption/*:/opt/ibm/third-party/libs/spark2/*:/opt/ibm/third-party/libs/common/*:/opt/ibm/third-party/libs/connectors/*:/opt/ibm/spark/external-jars/* |
spark.driver.memory | 1024 M |
spark.driver.cores | 1 |
spark.local.dir | /tmp/spark/scratch. See the spark.local.dir configuration parameter for details. |
spark.master.ui.port | 8080 |
spark.worker.ui.port | 8081 |
spark.ui.port | 4040 |
spark.history.ui.port | 18080 |
spark.ui.enabled | TRUE |
spark.ui.killEnabled | FALSE |
spark.eventLog.dir | file:///home/spark/spark-events |
spark.ui.reverseProxy | TRUE |
spark.ui.showConsoleProgress | TRUE |
spark.shuffle.service.port | 7337 |
spark.r.command | /opt/ibm/conda/R/bin/Rscript |
spark.hadoop.fs.s3a.fast.upload | TRUE |
spark.hadoop.fs.s3a.multipart.size | 33554432 |
spark.hadoop.fs.stocator.scheme.list | cos |
spark.hadoop.fs.stocator.cos.scheme | cos |
spark.hadoop.fs.stocator.glob.bracket.support | TRUE |
spark.hadoop.fs.stocator.cos.impl | com.ibm.stocator.fs.cos.COSAPIClient |
spark.hadoop.fs.cos.impl | com.ibm.stocator.fs.ObjectStoreFileSystem |
spark.hadoop.fs.s3a.impl | org.apache.hadoop.fs.s3a.S3AFileSystem |
spark.authenticate | FALSE |
spark.network.crypto.enabled | FALSE |
spark.network.crypto.keyLength | 256 |
The following table lists the environment variables and their defaults:
Environment variable | Default value |
---|---|
SPARK_DIST_CLASSPATH | /home/spark/space/assets/data_asset/*:/home/spark/user_home/dbdrivers/*:/cc-home/_global_/dbdrivers/*:/opt/ibm/connectors/idax/*:/opt/ibm/connectors/cloudant/*:/opt/ibm/connectors/db2/*:/opt/ibm/connectors/others-db-drivers/*:/opt/ibm/connectors/wdp-connector-driver/*:/opt/ibm/connectors/wdp-connector-jdbc-library/*:/opt/ibm/connectors/stocator/*:/opt/ibm/connectors/s3/*:/opt/ibm/image-libs/common/*:/opt/ibm/image-libs/spark2/*:/opt/ibm/third-party/libs/batch/*:/opt/ibm/spark/external-jars/* |
SPARK_LOCAL_DIRS | /tmp/spark/scratch |
SPARK_MASTER_WEBUI_PORT | 8080 |
SPARK_MASTER_PORT | 7077 |
SPARK_WORKER_WEBUI_PORT | 8081 |
CLASSPATH | /home/spark/user_home/dbdrivers/*:/opt/ibm/connectors/idax/*:/opt/ibm/connectors/cloudant/*:/opt/ibm/connectors/db2/*:/opt/ibm/connectors/others-db-drivers/*:/opt/ibm/connectors/wdp-connector-driver/*:/opt/ibm/connectors/wdp-connector-jdbc-library/*:/opt/ibm/connectors/stocator/*:/opt/ibm/connectors/s3/*:/opt/ibm/image-libs/common/*:/opt/ibm/image-libs/spark2/*:/opt/ibm/third-party/libs/batch/* |
LD_LIBRARY_PATH | /opt/ibm/connectors/dsdriver/dsdriver/lib:/opt/ibm/connectors/others-db-drivers/oracle/lib:/opt/ibm/jdk/jre/lib/architecture/server:/opt/ibm/jdk/jre/lib/architecture/:/usr/local/lib:/lib64 |
RUNTIME_PYTHON_ENV | python310 |
PYTHONPATH | /home/spark/space/assets/data_asset:/home/spark/user_home/python-3:/cc-home/_global_/python-3:/home/spark/shared/user-libs/python:/home/spark/shared/conda/envs/python/lib/python/site-packages:/opt/ibm/conda/miniconda/lib/python/site-packages:/opt/ibm/third-party/libs/python3:/opt/ibm/image-libs/python3:/opt/ibm/image-libs/spark2/xskipper-core.jar:/opt/ibm/image-libs/spark2/spark-extensions.jar:/opt/ibm/image-libs/spark2/metaindexmanager.jar:/opt/ibm/image-libs/spark2/stmetaindexplugin.jar:/opt/ibm/spark/python:/opt/ibm/spark/python/lib/py4j-0.10.7-src.zip |
R_LIBS_USER | /home/spark/space/assets/data_asset:/home/spark/shared/user-libs/R:/opt/ibm/third-party/libs/R:/opt/ibm/conda/R/lib64/R/library/:/opt/ibm/spark/R/lib:/opt/ibm/image-libs/R |
Parent topic: Submitting Spark jobs