Testing a Spark application
Test your Apache Spark application code by launching the Spark applications that it contains. Launching a Spark application creates a Spark cluster for the user (if one does not already exist) and runs the application in that cluster.
Note:
- To reduce the chance of problems due to insufficient memory, the maximum number of Spark applications that can run concurrently is limited:
  - If your system has less than 120 GB of memory, at most 3 applications can run concurrently.
  - If your system has 120 GB of memory or more, at most 5 applications can run concurrently.
- To be able to run application code written in R, Db2® Warehouse requires the RJSONIO package. If this package has
not already been installed in your R environment, ask your Db2 Warehouse administrator to issue the following command from
within the interactive R shell:
install.packages('RJSONIO')
If your application has dependencies on auxiliary libraries, you must satisfy them as described in Managing dependencies.
Testing Spark applications using the spark-submit.sh script
How you launch an application using the spark-submit.sh script depends on
the location of the application code:
- If the file is located on the Db2 Warehouse host system, specify the --loc host option (this is the default, so you can also omit it). Any path information specified in the file name indicates the path, relative to the $HOME/spark/apps directory, to the file. For example, this command submits code from the file $HOME/spark/apps/subdir6/cool.jar in the host file system:
./spark-submit.sh --class c.myclass subdir6/cool.jar --loc host
- If the file is located on your client system, specify the --loc client option. Any path information specified in the file name indicates the path, relative to the current directory, to the file. The file is automatically deployed to the $HOME/spark/apps/temp directory before being submitted, and any file with the same name that is already in that directory is overwritten. For example, this command submits code from the file ./subdir6/cool.jar in the client file system:
./spark-submit.sh --class c.myclass subdir6/cool.jar --loc client
Db2 Warehouse provides a sample application that
demonstrates how the IBM® Idax Data Source can be used to read
from a Db2 Warehouse table. Loading the sample Spark application code describes how to load the corresponding application code into your
$HOME/spark/apps directory.
- The version of this application that is written in Scala has the main class com.ibm.idax.spark.examples.ReadExample and is contained in the file idax_examples.jar. To launch this Spark application using spark-submit.sh, issue the following command:
spark-submit.sh idax_examples.jar --class com.ibm.idax.spark.examples.ReadExample
- The version of this application that is written in Python is contained in the
ReadExample.py file. This file requires additional utilities, which are
contained in the example_utilities.egg file. To launch this Spark application
using spark-submit.sh, issue the following command:
spark-submit.sh ReadExample.py --py-files example_utilities.egg
- The version of this application that is written in R is contained in the
ReadExample.R file. This file requires additional utilities, which are
contained in the example_utilities.R file. To launch this Spark application
using spark-submit.sh, issue the following command:
spark-submit.sh ReadExample.R
After you launch an application, note the submission ID that is returned, because you will need it later to locate the corresponding log files.
Testing Spark applications using the IDAX.SPARK_SUBMIT stored procedure
From within a database connection, issue a CALL statement that calls the IDAX.SPARK_SUBMIT stored procedure.
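For example, a call along the following lines submits the sample Scala application. This is only a sketch: it assumes that the procedure takes an output parameter for the submission ID followed by a JSON parameter string like the one used with the REST API; see the IDAX.SPARK_SUBMIT reference for the exact signature:
CALL IDAX.SPARK_SUBMIT(?, '{"appResource":"idax_examples.jar","mainClass":"com.ibm.idax.spark.examples.ReadExample"}')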
Testing Spark applications using a REST API call
You can launch a Spark application by using the IBM Db2 Warehouse Analytics API to submit an
HTTP POST request that calls the /dashdb-api/analytics/public/apps/submit
endpoint. In the request body:
- The appResource parameter specifies the .jar, .py, or .R file that contains the application code.
- The mainClass parameter specifies the name of the main class in the .jar file when you submit Java or Scala application code.
- If necessary, use the args parameter to pass arguments to the application.
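As an illustration only, a request body that passes arguments might look like the following sketch; it assumes that args accepts an array of string values, and the file name and argument values here are made up:
{"appResource":"myapp.py","args":["2021-01-01","SALES"]}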
For example, the sample application
com.ibm.idax.spark.examples.ReadExample
is
contained in the idax_examples.jar file and demonstrates how the IBM Idax Data Source can be used to read from a table. To launch
this application, issue the following cURL command (replace the user ID, password, and host name):
curl --user "userid:password"
-X POST "https://hostname:8443/dashdb-api/analytics/public/apps/submit"
--header "Content-Type:application/json;charset=UTF-8"
-d '{"appResource":"idax_examples.jar","mainClass":"com.ibm.idax.spark.examples.ReadExample"}'
Note: On a Windows system, you must escape the double quotes in the data string, for example:
-d "{\"appResource\":\"idax_examples.jar\",\"mainClass\":\"com.ibm.idax.spark.examples.ReadExample\"}"
The result returned by this request will look something like
this:
{"statusDesc":"The application was submitted.","submissionId":"20160928114702066000","exitInfo":{"code":"","details":[],"message":""},"resultCode":200,"applicationId":"app-20160928114707-0011","username":"user1","status":"submitted"}
Note the submission ID, because you will need it later to locate the corresponding log files.
Testing Spark applications using a Livy server
You can use a Livy server to launch a Spark application as described in Launching a Spark application through an Apache Livy server.
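For orientation only, a minimal sketch of such a request: Apache Livy accepts batch submissions as an HTTP POST to its /batches endpoint. The host, port, credentials, and the location of the application file depend on your Livy setup and are assumptions here; see the topic referenced above for the supported procedure:
curl --user "userid:password" -X POST "https://livyhost:8998/batches" --header "Content-Type:application/json" -d '{"file":"idax_examples.jar","className":"com.ibm.idax.spark.examples.ReadExample"}'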