Managing dependencies
An application might require auxiliary libraries or files, called dependencies. These can be made available to the application in several ways.
Dependencies for R applications
An R script can use functions that are defined in other R files that are deployed under $HOME/spark/apps. To do so, source those files from within the script. For example:
source(paste(Sys.getenv("HOME"), "/spark/apps", "/example_utilities.R", sep = ""), encoding = "UTF-8", local = TRUE)
In addition, an R script can use libraries from .jar files in the $HOME/spark/defaultlibs or /globallibs directories.
General dependencies for Scala, Java™, and Python applications
Store general dependencies in one of the following directories:
- $HOME/spark/defaultlibs
- /globallibs
(For information about how to deploy files to these directories, see Deploying Spark application code and dependencies to Db2 Warehouse.) The files in these directories are available to the Spark driver and executor processes:
- The .jar files in these directories are included in the Java class path of the Spark driver and executor processes.
- The .py, .zip, and .egg files in these directories are added to the PySpark search path.
In addition to the libraries contained in these directories, application code that is written in Python can also use libraries that are provided by any Python packages that are installed (see Installing Python packages on Db2 Warehouse).
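For example, if a helper module has been deployed as a .py file to one of these directories, a PySpark application can import it directly, because the directory is already on the search path. The following is a minimal sketch; the module name text_utils and its clean_text function are hypothetical and stand for whatever you deploy to $HOME/spark/defaultlibs:
# app.py -- minimal sketch; assumes a hypothetical helper module text_utils.py
# (with a clean_text function) was deployed to $HOME/spark/defaultlibs.
from pyspark.sql import SparkSession

import text_utils  # resolved via the PySpark search path, no --py-files needed

spark = SparkSession.builder.appName("general-deps-example").getOrCreate()

df = spark.createDataFrame([(" Hello ",), ("World  ",)], ["raw"])

# The module can be used on the driver ...
print(text_utils.clean_text(" Hello "))

# ... and inside functions that run on the executors, because the same
# directories are available to the executor processes as well.
cleaned = df.rdd.map(lambda row: text_utils.clean_text(row.raw)).collect()
print(cleaned)

spark.stop()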
Segregated dependencies for Scala, Java, and Python applications
Sometimes using general dependencies is not an option, for example, when two applications require different versions of the same library. If you need to segregate dependencies, put them into separate subdirectories of $HOME/spark/apps and refer to them by paths relative to $HOME/spark/apps when you call each application. How you specify these paths depends on the interface that you use to launch the application (a sketch of the application side follows this list):
- Using spark-submit.sh
- Specify dependencies in a comma-separated list as the value of one of the following options:
- For an application written in Java or Scala, use the --jars option. For example:
spark-submit.sh mySparkApplication.jar --class mycorp.myApplicationMainClass --jars dir1/myJavaDep1.jar,dir1/myJavaDep2.jar
- For an application written in Python, use the --py-files option. For example:
spark-submit.sh myApplication.py --py-files dir5/myPythonDep1.py,dir5/myPythonDep2.py
- Using the IDAX.APP_SUBMIT stored procedure
- Specify dependencies in a comma-separated list as the value of one of the following parameters:
- For an application written in Java or Scala, use sparkJars. For example:
{..., "sparkProperties" : { "sparkJars" : "dir1/myJavaDep1.jar,dir1/myJavaDep2.jar" } }
- For an application written in Python, use sparkSubmitPyFiles. For example:
{..., "sparkProperties" : { "sparkSubmitPyFiles" : "dir5/myPythonDep1.py,dir5/myPythonDep2.py" } }
- Using a REST API call
- Specify dependencies in a comma-separated list as the value of one of the following options:
- For an application written in Java or Scala, use sparkJars. For example:
curl --user "userid:password" -X POST "https://hostname:8443/dashdb-api/analytics/public/apps/submit" --header "Content-Type:application/json;charset=UTF-8" -d '{"appResource":"mySparkApplication.jar","mainClass":"mycorp.myApplicationMainClass","sparkProperties":{"sparkJars":"dir1/myJavaDep1.jar,dir1/myJavaDep2.jar"}}'
- For an application written in Python, use sparkSubmitPyFiles. For example:
curl --user "userid:password" -X POST "https://hostname:8443/dashdb-api/analytics/public/apps/submit" --header "Content-Type:application/json;charset=UTF-8" -d '{"appResource":"myApplication.py","sparkProperties":{"sparkSubmitPyFiles":"dir5/myPythonDep1.py,dir5/myPythonDep2.py"}}'
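Regardless of the interface, the files that you list are shipped with the application and can be imported by their module names. The following is a minimal sketch of what myApplication.py from the Python examples above might look like; the contents of myPythonDep1 and myPythonDep2, including the load_config and transform functions, are assumptions made only for illustration:
# myApplication.py -- minimal sketch; assumes dir5/myPythonDep1.py and
# dir5/myPythonDep2.py were passed via --py-files or sparkSubmitPyFiles.
# The helper functions load_config() and transform() are hypothetical.
from pyspark.sql import SparkSession

import myPythonDep1
import myPythonDep2

spark = SparkSession.builder.appName("segregated-deps-example").getOrCreate()

config = myPythonDep1.load_config()  # runs on the driver

# The listed files are also distributed to the executors, so the modules can
# be used inside functions that Spark ships to the cluster.
result = (spark.sparkContext
          .parallelize(range(10))
          .map(lambda n: myPythonDep2.transform(n, config))
          .collect())
print(result)

spark.stop()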