Managing dependencies

An application might require auxiliary libraries or files, called dependencies. These can be made available to an application in several ways.

Dependencies for R applications

Application code written in R can use a source statement to refer to an external R file that contains additional functions. The external file must be located in the $HOME/spark/apps directory or in one of its subdirectories. For example, the following source statement refers to the file $HOME/spark/apps/example_utilities.R:
source(paste(Sys.getenv("HOME"), "/spark/apps", "/example_utilities.R", sep = ""), encoding = "UTF-8", local = TRUE)

In addition, an R script can use libraries from a .jar file in the $HOME/spark/defaultlibs or /globallibs directory.

General dependencies for Scala, Java™, and Python applications

At runtime, a Spark application looks in the following directories for any dependencies:
  • $HOME/spark/defaultlibs
  • /globallibs
(For information about how to deploy files to these directories, see Deploying Spark application code and dependencies to Db2 Warehouse.) The files in these directories are available to the Spark driver and executor processes:
  • The .jar files in these directories are included in the Java class path of the Spark driver and executor processes.
  • The .py, .zip, and .egg files in these directories are added to the PySpark search path.

In addition to the libraries contained in these directories, application code that is written in Python can also use libraries that are provided by any Python packages that are installed (see Installing Python packages on Db2 Warehouse).
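
For example, a PySpark application can import a module that was deployed to one of these directories in the same way as it imports an installed package. The following sketch uses hypothetical names: example_utilities stands for a module deployed as a .py or .zip file to $HOME/spark/defaultlibs, add_double_column for a helper it provides, and numpy for any installed Python package:

  from pyspark.sql import SparkSession

  import example_utilities   # hypothetical module deployed as a .py or .zip file to $HOME/spark/defaultlibs
  import numpy as np         # provided by an installed Python package

  spark = SparkSession.builder.appName("dependency-demo").getOrCreate()

  # Build a small DataFrame from a numpy array and apply a helper from the deployed module.
  df = spark.createDataFrame([(float(x),) for x in np.arange(3)], ["value"])
  df = example_utilities.add_double_column(df)   # hypothetical helper provided by the dependency
  df.show()

  spark.stop()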

Segregated dependencies for Scala, Java, and Python applications

Sometimes using general dependencies is not an option, for example, when two applications require different versions of the same library. If you need to segregate dependencies, put them into different subdirectories of $HOME/spark/apps and refer to their paths relative to $HOME/spark/apps when calling each application. How you do this depends on which interface you use to launch the application:

Using spark-submit.sh
Specify dependencies in a comma-separated list as the value of one of the following options:
  • For an application written in Java or Scala, --jars. For example:
    spark-submit.sh mySparkApplication.jar --class mycorp.myApplicationMainClass --jars dir1/myJavaDep1.jar,dir1/myJavaDep2.jar
  • For an application written in Python, --py-files. For example:
    spark-submit.sh myApplication.py --py-files dir5/myPythonDep1.py,dir5/myPythonDep2.py
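Within the application itself, files that are passed with --py-files are added to the search path, so they can be imported by module name, regardless of the subdirectory (here dir5/) they were deployed to. A minimal sketch of myApplication.py, assuming the hypothetical dependency names from the example above:

  from pyspark.sql import SparkSession

  # Passed with --py-files, so importable by name even though the files live in dir5/.
  import myPythonDep1
  import myPythonDep2

  spark = SparkSession.builder.appName("segregated-deps-demo").getOrCreate()
  # ... call functions from myPythonDep1 and myPythonDep2 here ...
  spark.stop()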
Using the IDAX.APP_SUBMIT stored procedure
Specify dependencies in a comma-separated list as the value of one of the following properties:
  • For an application written in Java or Scala, sparkJars. For example:
    {..., 
    "sparkProperties" : { "sparkJars" : "dir1/myJavaDep1.jar,dir1/myJavaDep2.jar"} 
    }
  • For an application written in Python, sparkSubmitPyFiles. For example:
    {..., 
    "sparkProperties" : { "sparkSubmitPyFiles" : "dir5/myPythonDep1.py,dir5/myPythonDep2.py"} 
    }
Using a REST API call
Specify dependencies in a comma-separated list as the value of one of the following properties:
  • For an application written in Java or Scala, sparkJars. For example:
    curl --user "userid:password" \
      -X POST "https://hostname:8443/dashdb-api/analytics/public/apps/submit" \
      --header "Content-Type:application/json;charset=UTF-8" \
      -d "{\"appResource\":\"mySparkApplication.jar\",\"mainClass\":\"mycorp.myApplicationMainClass\",
          \"sparkProperties\":{\"sparkJars\":\"dir1/myJavaDep1.jar,dir1/myJavaDep2.jar\"}}"
  • For an application written in Python, sparkSubmitPyFiles. For example:
    curl --user "userid:password" \
      -X POST "https://hostname:8443/dashdb-api/analytics/public/apps/submit" \
      --header "Content-Type:application/json;charset=UTF-8" \
      -d "{\"appResource\":\"myApplication.py\",
          \"sparkProperties\":{\"sparkSubmitPyFiles\":\"dir5/myPythonDep1.py,dir5/myPythonDep2.py\"}}"