Examples of Spark applications
Db2® Warehouse provides examples of application code that illustrate how to develop Apache Spark applications.
Load the examples into your home directory as described in Loading the sample Spark application code.
Examples for Python, Scala, and R
Each of the following examples is available in a Python version, a Scala version, and an R version:
- ReadWriteExampleKMeansJson and ReadExampleJson
- These examples use JSON files instead of Db2 Warehouse tables: both read their input from JSON files, and ReadWriteExampleKMeansJson also writes its results to a JSON file. A common way to develop an application is to start with code like this, test it against JSON files, and then modify it so that it uses database tables instead (as in ReadWriteExampleKMeans and ReadExample).
- ReadWriteExampleKMeans and ReadExample
- These examples are based on ReadWriteExampleKMeansJson and ReadExampleJson, but have been modified in the following ways so that they use Db2 Warehouse tables instead of JSON files:
- Change #1
- The setting for the Spark master was removed, because the application is assigned to your cluster automatically when it is submitted. To set the application name explicitly, use the setAppName method.
- Change #2
- The data source was replaced so that the application reads data from a Db2 Warehouse table instead of from a JSON file.
- Change #3 (applies to ReadWriteExampleKMeans only)
- The data source was replaced so that the application writes data to a Db2 Warehouse table instead of to a JSON file.
- ExceptionExample
- This example shows how an application can throw an exception so that the corresponding error is recorded in the file $HOME/spark/log/submission_id/submission.info.
- SqlPredicateExample
- This example shows how an application can push an SQL predicate down into the database, so that rows are filtered by Db2 Warehouse before they are fetched. This improves performance, because only the matching subset of the data is transferred to the application.
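The JSON-first workflow behind ReadWriteExampleKMeansJson and ReadWriteExampleKMeans (Changes #2 and #3) can be sketched in plain Python. This is not the shipped example code: it assumes an in-memory sqlite3 database as a stand-in for Db2 Warehouse, uses hypothetical names such as `read_points_json`, `read_points_table`, `POINTS`, and `CLUSTERED`, and replaces Spark MLlib with a minimal k-means. What it illustrates is that when reading and writing are isolated in small functions, switching from JSON files to tables changes only those functions, while the processing logic stays the same.

```python
import json
import sqlite3
from typing import List, Tuple

Point = Tuple[float, float]


# JSON-based I/O: used while the application logic is developed and tested.
def read_points_json(lines: List[str]) -> List[Point]:
    """Parse one JSON object per line, for example {"x": 1.0, "y": 2.0}."""
    return [(rec["x"], rec["y"]) for rec in map(json.loads, lines)]


def write_labels_json(points: List[Point], labels: List[int]) -> List[str]:
    """Serialize each point with its cluster label as one JSON line."""
    return [json.dumps({"x": x, "y": y, "cluster": c})
            for (x, y), c in zip(points, labels)]


# Table-based I/O: sqlite3 stands in for a Db2 Warehouse table here.
def read_points_table(conn: sqlite3.Connection) -> List[Point]:
    """Read the hypothetical POINTS table in a stable order."""
    return conn.execute("SELECT x, y FROM POINTS ORDER BY id").fetchall()


def write_labels_table(conn: sqlite3.Connection, points: List[Point],
                       labels: List[int]) -> None:
    """Write each point and its label to the hypothetical CLUSTERED table."""
    conn.execute("CREATE TABLE IF NOT EXISTS CLUSTERED "
                 "(x REAL, y REAL, cluster INTEGER)")
    conn.executemany("INSERT INTO CLUSTERED VALUES (?, ?, ?)",
                     [(x, y, c) for (x, y), c in zip(points, labels)])


# Processing logic: unchanged when the data source is swapped.
def kmeans(points: List[Point], k: int, iterations: int = 10) -> List[int]:
    """Minimal Lloyd's k-means; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels: List[int] = []
    for _ in range(iterations):
        # Assign each point to the nearest centroid (squared Euclidean distance).
        labels = [min(range(k), key=lambda i: (px - centroids[i][0]) ** 2
                      + (py - centroids[i][1]) ** 2)
                  for px, py in points]
        # Move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels


if __name__ == "__main__":
    lines = ['{"x": 0.0, "y": 0.0}', '{"x": 10.0, "y": 10.0}',
             '{"x": 0.5, "y": 0.0}', '{"x": 10.0, "y": 9.5}']
    # Development phase: JSON in, JSON out.
    pts = read_points_json(lines)
    print(write_labels_json(pts, kmeans(pts, k=2)))

    # Table phase: only the I/O calls change.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE POINTS (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    conn.executemany("INSERT INTO POINTS (x, y) VALUES (?, ?)", pts)
    table_pts = read_points_table(conn)
    write_labels_table(conn, table_pts, kmeans(table_pts, k=2))
    print(conn.execute("SELECT cluster FROM CLUSTERED ORDER BY rowid").fetchall())
```

The second print shows `[(0,), (1,), (0,), (1,)]`, the same labels that the JSON phase produces. Seeding the centroids with the first k points is chosen only to keep this sketch deterministic; it is not how Spark MLlib initializes k-means.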
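Why pushing a predicate down, as SqlPredicateExample does, reduces the amount of data fetched can be illustrated without Spark. In this sketch, an in-memory sqlite3 database stands in for Db2 Warehouse, and the `SALES` table, its columns, and the function names are hypothetical. Both functions return the same rows, but only the second one lets the database evaluate the `WHERE` clause, so fewer rows leave the database.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, float]


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for a database table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SALES (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    regions = ["EAST", "WEST", "NORTH", "SOUTH"]
    conn.executemany(
        "INSERT INTO SALES (region, amount) VALUES (?, ?)",
        [(regions[i % 4], float(i)) for i in range(1000)],
    )
    return conn


def filter_in_application(conn: sqlite3.Connection, region: str) -> Tuple[List[Row], int]:
    """Fetch every row, then filter in the client: all rows are transferred."""
    rows = conn.execute("SELECT id, region, amount FROM SALES ORDER BY id").fetchall()
    return [r for r in rows if r[1] == region], len(rows)


def filter_in_database(conn: sqlite3.Connection, region: str) -> Tuple[List[Row], int]:
    """Push the predicate into the query: only matching rows are transferred."""
    rows = conn.execute(
        "SELECT id, region, amount FROM SALES WHERE region = ? ORDER BY id", (region,)
    ).fetchall()
    return rows, len(rows)


if __name__ == "__main__":
    conn = make_db()
    client_rows, fetched_all = filter_in_application(conn, "WEST")
    pushed_rows, fetched_pushed = filter_in_database(conn, "WEST")
    assert client_rows == pushed_rows  # same result either way
    print(f"rows fetched without pushdown: {fetched_all}, with pushdown: {fetched_pushed}")
```

With 1,000 rows spread evenly over four regions, the first approach fetches 1,000 rows and the second fetches 250, while both return the same 250 rows.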
Examples for R only
The following examples are available only in an R version:
- idaxTApply.R
- This example shows how to write an R function that applies a user-defined function to each subset (group of rows) of a distributed data frame (ida.data.frame).
- idaxTApplyExample.R
- This example shows how to connect to the database, read tables, open a SparkR session, and call the idaxTApply function.
Note: Both examples contain performance hints regarding the configuration of SparkR and the idax data source.
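The split-apply pattern that idaxTApply implements for an ida.data.frame resembles base R's `tapply`: split the rows by the value of a grouping column, then call a user-defined function once per group. The plain-Python sketch below shows only those semantics on a local list of rows. It does not reproduce the signature of idaxTApply, its distribution across the cluster, or the way it returns results, and the names `grouped_apply` and `mean_amount` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Mapping

Row = Mapping[str, Any]


def grouped_apply(rows: List[Row], key: str,
                  fn: Callable[[List[Row]], Any]) -> Dict[Hashable, Any]:
    """Apply fn to each subset of rows that share the same value in column `key`."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)  # split: collect the rows of each group
    # apply fn once per group and combine the results, keyed by group value
    return {value: fn(subset) for value, subset in groups.items()}


def mean_amount(subset: List[Row]) -> float:
    """Example user-defined function: average of the 'amount' column in one group."""
    return sum(r["amount"] for r in subset) / len(subset)


if __name__ == "__main__":
    data = [
        {"store": "A", "amount": 10.0},
        {"store": "B", "amount": 4.0},
        {"store": "A", "amount": 20.0},
        {"store": "B", "amount": 6.0},
    ]
    print(grouped_apply(data, "store", mean_amount))  # {'A': 15.0, 'B': 5.0}
```

In a distributed setting, each group can be processed independently, which is what makes this pattern a good fit for data that is spread across a cluster.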