Testing a Spark application

Test your Apache Spark application code by launching the Spark applications that it contains. Launching a Spark application creates a Spark cluster for the user (if one does not already exist) and runs the application in that cluster.

Note:
  • To reduce the chance of problems due to insufficient memory, the maximum number of Spark applications that can run concurrently is limited:
    • If your system has less than 120 GB of memory, at most 3 applications can run concurrently.
    • If your system has 120 GB of memory or more, at most 5 applications can run concurrently.
  • To be able to run application code written in R, Db2® Warehouse requires the RJSONIO package. If this package has not already been installed in your R environment, ask your Db2 Warehouse administrator to issue the following command from within the interactive R shell:
    install.packages('RJSONIO')

    If your application depends on auxiliary libraries, satisfy those dependencies as described in Managing dependencies.

Testing Spark applications using the spark-submit.sh script

How you launch an application using the spark-submit.sh script depends on the location of the application code:
  • If the file is located on the Db2 Warehouse host system, specify the --loc host option (or omit it, because host is the default location). Any path information in the file name is interpreted relative to the $HOME/spark/apps directory. For example, this command submits code from the file $HOME/spark/apps/subdir6/cool.jar in the host file system:
    ./spark-submit.sh --class c.myclass subdir6/cool.jar --loc host
  • If the file is located on your client system, specify the --loc client option. Any path information in the file name is interpreted relative to the current directory. The file is automatically deployed to the $HOME/spark/apps/temp directory before it is submitted, and any file with the same name that is already in that directory is overwritten. For example, this command submits code from the file ./subdir6/cool.jar in the client file system:
    ./spark-submit.sh --class c.myclass subdir6/cool.jar --loc client
Db2 Warehouse provides a sample application that demonstrates how the IBM® Idax Data Source can be used to read from a Db2 Warehouse table. Loading the sample Spark application code describes how to load the corresponding application code into your $HOME/spark/apps directory.
  • The version of this application that is written in Scala has the main class com.ibm.idax.spark.examples.ReadExample and is contained in the file idax_examples.jar. To launch this Spark application using spark-submit.sh, issue the following command:
    spark-submit.sh idax_examples.jar --class com.ibm.idax.spark.examples.ReadExample
  • The version of this application that is written in Python is contained in the ReadExample.py file. This file requires additional utilities, which are contained in the example_utilities.egg file. To launch this Spark application using spark-submit.sh, issue the following command:
    spark-submit.sh ReadExample.py --py-files example_utilities.egg
  • The version of this application that is written in R is contained in the ReadExample.R file. This file requires additional utilities, which are contained in the example_utilities.R file. To launch this Spark application using spark-submit.sh, issue the following command:
    spark-submit.sh ReadExample.R

After you launch an application, note the submission ID that is returned, because you will need it later to locate the corresponding log files.

Testing Spark applications using the IDAX.SPARK_SUBMIT stored procedure

From within a database connection, issue a CALL statement that calls the IDAX.SPARK_SUBMIT stored procedure.
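For example, to launch the Scala version of the ReadExample sample application, the call might look like the following sketch. The parameter format shown here is an assumption that mirrors the JSON request body used by the REST API; check the IDAX.SPARK_SUBMIT reference for the authoritative syntax. The question mark is a placeholder for an output parameter that returns the submission information:
CALL IDAX.SPARK_SUBMIT(?, '{"appResource":"idax_examples.jar","mainClass":"com.ibm.idax.spark.examples.ReadExample"}')
As with the other submission methods, note the submission ID that is returned, because you will need it later to locate the corresponding log files.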

Testing Spark applications using a REST API call

You can launch a Spark application by using the IBM Db2 Warehouse Analytics API to submit an HTTP POST request that calls the /dashdb-api/analytics/public/apps/submit endpoint. In the request body:
  • The appResource parameter specifies the .jar, .py, or .R file that contains the application code.
  • The mainClass parameter specifies the name of the main class in the .jar file when you submit Java or Scala application code.
  • If necessary, use the args parameter to pass arguments to the application.
For example, the sample application com.ibm.idax.spark.examples.ReadExample is contained in the idax_examples.jar file and demonstrates how the IBM Idax Data Source can be used to read from a table. To launch this application, issue the following cURL command (replace the user ID, password, and host name):
curl --user "userid:password" \
  -X POST "https://hostname:8443/dashdb-api/analytics/public/apps/submit" \
  --header "Content-Type:application/json;charset=UTF-8" \
  -d '{"appResource":"idax_examples.jar","mainClass":"com.ibm.idax.spark.examples.ReadExample"}'
Note: On a Windows system, you must escape the double quotes in the data string, for example:
-d "{\"appResource\":\"idax_examples.jar\",\"mainClass\":\"com.ibm.idax.spark.examples.ReadExample\"}"
The result returned by this request will look something like this:
{"statusDesc":"The application was submitted.","submissionId":"20160928114702066000","exitInfo":{"code":"","details":[],"message":""},"resultCode":200,"applicationId":"app-20160928114707-0011","username":"user1","status":"submitted"}
Note the submission ID, because you will need it later to locate the corresponding log files.
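If the application expects command-line arguments, pass them through the args parameter in the request body. The application name, class name, and argument values in the following snippet are purely illustrative:
-d '{"appResource":"myapp.jar","mainClass":"com.example.MyApp","args":["--table","MYTABLE"]}'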

Testing Spark applications using a Livy server

You can use a Livy server to launch a Spark application as described in Launching a Spark application through an Apache Livy server.
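For reference, a batch submission to a Livy server goes through Livy's REST API by sending an HTTP POST request to the /batches endpoint. The following sketch assumes a Livy server listening at livy-host:8998 and reuses the ReadExample sample application; adjust the host, port, and file path for your environment:
curl -X POST "http://livy-host:8998/batches" \
  --header "Content-Type:application/json" \
  -d '{"file":"/path/to/idax_examples.jar","className":"com.ibm.idax.spark.examples.ReadExample"}'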