Analyzing with Spark
Db2® Warehouse includes an integrated Apache Spark cluster environment that is optimized for use with Db2 Warehouse. You can run Spark applications in this environment to analyze data contained in the database and to write results to the database.
Use one of the following methods to deploy, launch, and manage Spark applications. To launch, test, or monitor Spark applications, you can also use the automatically installed and configured Apache Livy server, as described in Launching a Spark application through an Apache Livy server.
- Use SQL to call stored procedures such as IDAX.SPARK_SUBMIT (see the sketch after this list).
- Use the spark-submit.sh script to issue commands (also sketched after this list).
- Use a REST API call to submit a request to one of several different REST endpoints. These endpoints are described in the topics that explain how to use the REST API to launch and manage Spark applications.
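The following minimal sketch shows what the first two methods might look like from a command line. The application name ReadExample.py, the JSON parameter passed to IDAX.SPARK_SUBMIT, and the way spark-submit.sh is invoked are illustrative assumptions; see the reference topics for the exact parameter formats and options.

  # Method 1 (assumed parameter format): call the stored procedure from the
  # Db2 command line processor. The ? marker stands for an assumed output
  # parameter (for example, a submission ID).
  db2 "CALL IDAX.SPARK_SUBMIT(?, '{ \"appResource\" : \"ReadExample.py\" }')"

  # Method 2 (assumed invocation): launch the same application with the
  # spark-submit.sh script.
  spark-submit.sh ReadExample.py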
You can use cURL (or another REST command-line tool or client) to issue commands directly through the REST API. cURL is an open source command-line tool that can be downloaded from the internet at no cost. The examples in the topics that describe how to use the REST API to launch and manage Spark applications use cURL syntax.
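For example, a REST submission issued with cURL might look like the following sketch. The host name, port, endpoint path, credentials, and JSON payload are placeholders, not the documented API; the actual endpoints and request bodies are described in the REST API topics.

  # Placeholder endpoint and payload: substitute the documented REST endpoint,
  # your host name, and your credentials.
  curl -u <userid>:<password> \
       -H "Content-Type: application/json" \
       -X POST "https://<host>:8443/<spark-app-endpoint>" \
       -d '{ "appResource" : "ReadExample.py" }'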
Note: By default, Db2 Warehouse is set up to use a self-signed SSL certificate for secure connections. Your version of cURL might reject this certificate if cURL cannot validate it. If cURL rejects the certificate, specify the -k option when calling cURL to disable peer verification. For more information, see https://curl.haxx.se/docs/sslcerts.html.
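For example, adding -k to an otherwise unchanged request disables peer verification; the endpoint below is the same placeholder used in the earlier sketch.

  # -k (--insecure) tells cURL to skip verification of the self-signed certificate.
  curl -k -u <userid>:<password> -X POST "https://<host>:8443/<spark-app-endpoint>" \
       -d '{ "appResource" : "ReadExample.py" }'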