Deploying Spark application code and dependencies to Db2 Warehouse

If you want your application code to be stored on the Db2 Warehouse host system rather than on your client system, you can deploy it to the $HOME/spark/apps directory. If the application also requires auxiliary libraries or files (called dependencies) that are not otherwise available, you can deploy them to one of the directories in the Spark search path.

For example, to deploy the Spark application code contained in the file loc_apps/my_app7.jar, issue the following command:

spark-submit.sh --upload-file apps loc_apps/my_app7.jar

This creates the $HOME/spark/apps directory (if it does not already exist) and copies the file my_app7.jar into it.

If your Spark application needs access to dependencies, you or an administrator can deploy them by uploading a compressed file that contains them to one of the following directories:

  • If the file is for use by your applications only, upload it to your $HOME/spark/defaultlibs directory:
    spark-submit.sh --upload-file defaultlibs file_name
  • If the file is to be shared among several Db2 Warehouse users, an administrator can upload it to the /globallibs directory:
    spark-submit.sh --upload-file globallibs file_name

A file that contains dependencies must have one of the following extensions:

  • .jar: For Java dependencies, which can be used by Java, Scala, and Python applications.
  • .py, .zip, or .egg: For Python dependencies, which can be used by Python applications only.

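As an illustrative guard, a script can check a file's extension against this list before attempting an upload. The following sketch is not part of the Db2 Warehouse tooling; the helper name and sample file names are hypothetical.

```shell
#!/bin/sh
# Sketch: accept only dependency files whose extension is one of the
# supported types listed above. The function name and the sample file
# names are illustrative.
is_supported_dep() {
  case "$1" in
    *.jar|*.py|*.zip|*.egg) return 0 ;;   # supported dependency types
    *) return 1 ;;                        # anything else is rejected
  esac
}

for f in extra1.jar helpers.py mylibs.zip notes.txt; do
  if is_supported_dep "$f"; then
    echo "$f: supported"
  else
    echo "$f: unsupported"
  fi
done
```

Running the sketch reports the first three files as supported and notes.txt as unsupported.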
Alternatively, you can make dependencies available using one of the other methods described in Managing dependencies.

Using a REST API call

To deploy Spark application code, use the IBM® Db2 Warehouse API to submit an HTTP POST request that calls the /dashdb-api/home endpoint. For example, to deploy the file my_examples.jar from the /tmp directory on your client system to Db2 Warehouse, issue the following cURL command (replace the user ID, password, and host name):

curl --user "userid:password" \
  -X POST -H "Content-Type: multipart/form-data" \
  -F "data=@/tmp/my_examples.jar" \
  "https://hostname:8443/dashdb-api/home/spark/apps"

This creates the spark/apps directory in your Db2 Warehouse home directory (if it does not already exist) and copies my_examples.jar into that directory.
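The URL pattern in this command generalizes: the path after /dashdb-api/home names the target directory under your home directory. A minimal sketch of composing such URLs, with the host name as a placeholder and the helper name an assumption of this example:

```shell
#!/bin/sh
# Sketch: compose the upload URL for a target directory under the
# Db2 Warehouse home directory. "hostname" is a placeholder; the
# helper function is illustrative, not part of any Db2 Warehouse tool.
build_upload_url() {
  printf 'https://%s:8443/dashdb-api/home/%s\n' "$1" "$2"
}

build_upload_url hostname spark/apps
# prints https://hostname:8443/dashdb-api/home/spark/apps
build_upload_url hostname spark/defaultlibs
# prints https://hostname:8443/dashdb-api/home/spark/defaultlibs
```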

To deploy several files with a single request, specify a separate -F parameter for each file. For example:

-F "data1=@/tmp/my_examples1.jar" -F "data2=@/tmp/my_examples2.jar"
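If you deploy many files, generating the -F parameters in a loop can be less error-prone than typing them by hand. The following sketch builds the parameter string shown above; the file paths are the illustrative ones from the example, and the field names data1, data2, and so on follow the same pattern.

```shell
#!/bin/sh
# Sketch: build one -F parameter per file for a multi-file upload
# request. File paths are illustrative; field names follow the
# data1, data2, ... pattern from the example above.
form_args=""
i=1
for f in /tmp/my_examples1.jar /tmp/my_examples2.jar; do
  form_args="$form_args -F \"data$i=@$f\""
  i=$((i + 1))
done
form_args="${form_args# }"   # drop the leading space

echo "$form_args"
# prints -F "data1=@/tmp/my_examples1.jar" -F "data2=@/tmp/my_examples2.jar"
```

The resulting string can then be placed into the cURL command in place of the single -F parameter.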

To deploy dependencies to your $HOME/spark/defaultlibs directory, submit an HTTP POST request that calls the /dashdb-api/home/spark/defaultlibs endpoint. For example, to deploy the file /tmp/extra1.jar to this directory, issue the following cURL command (replace the user ID, password, and host name):

curl --user "userid:password" \
  -X POST -H "Content-Type: multipart/form-data" \
  -F "data=@/tmp/extra1.jar" \
  "https://hostname:8443/dashdb-api/home/spark/defaultlibs"