Testing a Spark application

Test your Apache Spark application code by launching the Spark applications that it contains. Launching a Spark application creates a Spark cluster for the user (if one does not already exist) and runs the application in that cluster.

Note:
  • To reduce the chance of problems due to insufficient memory, the maximum number of Spark applications that can run concurrently is limited:
    • If your system has less than 120 GB of memory, at most 3 applications can run concurrently.
    • If your system has 120 GB of memory or more, at most 5 applications can run concurrently.
  • To be able to run application code written in R, Db2® Warehouse requires the RJSONIO package. If this package has not already been installed in your R environment, ask your Db2 Warehouse administrator to issue the following command from within the interactive R shell:
    install.packages('RJSONIO')

    If your application depends on auxiliary libraries, satisfy those dependencies as described in Managing dependencies.

Testing Spark applications using the spark-submit.sh script

How you launch an application using the spark-submit.sh script depends on the location of the application code:
  • If the file is located on the Db2 Warehouse host system, specify the --loc host option (or omit it, because host is the default location). Any path information in the file name is interpreted relative to the $HOME/spark/apps directory. For example, this command submits code from the file $HOME/spark/apps/subdir6/cool.jar in the host file system:
    ./spark-submit.sh --class c.myclass subdir6/cool.jar --loc host
  • If the file is located on your client system, specify the --loc client option. Any path information in the file name is interpreted relative to the current directory. The file is automatically deployed to the $HOME/spark/apps/temp directory before it is submitted, and any file with the same name that is already in that directory is overwritten. For example, this command submits code from the file ./subdir6/cool.jar in the client file system:
    ./spark-submit.sh --class c.myclass subdir6/cool.jar --loc client
Db2 Warehouse provides a sample application that demonstrates how the IBM® Idax Data Source can be used to read from a Db2 Warehouse table. Loading the sample Spark application code describes how to load the corresponding application code into your $HOME/spark/apps directory.
  • The version of this application that is written in Scala has the main class com.ibm.idax.spark.examples.ReadExample and is contained in the file idax_examples.jar. To launch this Spark application using spark-submit.sh, issue the following command:
    spark-submit.sh idax_examples.jar --class com.ibm.idax.spark.examples.ReadExample
  • The version of this application that is written in Python is contained in the ReadExample.py file. This file requires additional utilities, which are contained in the example_utilities.egg file. To launch this Spark application using spark-submit.sh, issue the following command:
    spark-submit.sh ReadExample.py --py-files example_utilities.egg
  • The version of this application that is written in R is contained in the ReadExample.R file. This file requires additional utilities, which are contained in the example_utilities.R file. To launch this Spark application using spark-submit.sh, issue the following command:
    spark-submit.sh ReadExample.R

After you launch an application, note the submission ID that is returned, because you will need it later to locate the corresponding log files.

Testing Spark applications using the IDAX.SPARK_SUBMIT stored procedure

From within a database connection, issue a CALL statement that calls the IDAX.SPARK_SUBMIT stored procedure.
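For example, to launch the Scala version of the ReadExample sample application, the call might look like the following sketch. The parameter format shown here is an assumption that mirrors the JSON request body used by the REST API; check the IDAX.SPARK_SUBMIT reference for the authoritative syntax. The question mark is a placeholder for an output parameter that returns the submission information:
CALL IDAX.SPARK_SUBMIT(?, '{"appResource":"idax_examples.jar","mainClass":"com.ibm.idax.spark.examples.ReadExample"}')
As with the other submission methods, note the submission ID that is returned, because you will need it later to locate the corresponding log files.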

Testing Spark applications using a REST API call

You can launch a Spark application by using the IBM Db2 Warehouse Analytics API to submit an HTTP POST request that calls the /dashdb-api/analytics/public/apps/submit endpoint. In the request body:
  • The appResource parameter specifies the .jar, .py, or .R file that contains the application code.
  • The mainClass parameter specifies the name of the main class in the .jar file when you submit Java or Scala application code.
  • If necessary, use the args parameter to pass arguments to the application.
For example, the sample application com.ibm.idax.spark.examples.ReadExample is contained in the idax_examples.jar file and demonstrates how the IBM Idax Data Source can be used to read from a table. To launch this application, issue the following cURL command (replace the user ID, password, and host name):
curl --user "userid:password" \
  -X POST "https://hostname:8443/dashdb-api/analytics/public/apps/submit" \
  --header "Content-Type:application/json;charset=UTF-8" \
  -d '{"appResource":"idax_examples.jar","mainClass":"com.ibm.idax.spark.examples.ReadExample"}'
Note: On a Windows system, you must escape the double quotes in the data string, for example:
-d "{\"appResource\":\"idax_examples.jar\",\"mainClass\":\"com.ibm.idax.spark.examples.ReadExample\"}"
The result returned by this request will look something like this:
{"statusDesc":"The application was submitted.","submissionId":"20160928114702066000","exitInfo":{"code":"","details":[],"message":""},"resultCode":200,"applicationId":"app-20160928114707-0011","username":"user1","status":"submitted"}
Note the submission ID, because you will need it later to locate the corresponding log files.
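If the application expects command-line arguments, pass them through the args parameter in the request body. The application name, class name, and argument values in the following snippet are purely illustrative:
-d '{"appResource":"myapp.jar","mainClass":"com.example.MyApp","args":["--table","MYTABLE"]}'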

Testing Spark applications using a Livy server

You can use a Livy server to launch a Spark application as described in Launching a Spark application through an Apache Livy server.
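For reference, a batch submission to a Livy server goes through Livy's REST API by sending an HTTP POST request to the /batches endpoint. The following sketch assumes a Livy server listening at livy-host:8998 and reuses the ReadExample sample application; adjust the host, port, and file path for your environment:
curl -X POST "http://livy-host:8998/batches" \
  --header "Content-Type:application/json" \
  -d '{"file":"/path/to/idax_examples.jar","className":"com.ibm.idax.spark.examples.ReadExample"}'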