Creating a Spark data source

You can create Spark data sources to reliably manage large volumes of structured and unstructured data.

Before you begin

  • Make sure that you have defined the Hive driver library JAR files so that QMF can connect to Spark data sources. Use the Hive driver library JAR files of version 1.2.1 or later.
  • Ensure that the Spark Thrift server is up and running.
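One quick way to check the second prerequisite is to test whether anything accepts TCP connections on the Thrift server's host and port. The following is a minimal sketch, not part of QMF: the class name ThriftPortCheck, the host localhost, and the port 10000 are assumptions for illustration. Replace them with the values for your environment.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ThriftPortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // This only shows that something is listening on the port; it does not
    // verify that the listener is actually a Spark Thrift server.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults (assumptions); pass your own host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
    }
}
```

A result of false means that the wizard cannot reach the server at that address. A result of true means only that the port is open, so a connection attempt from QMF remains the definitive test.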

About this task

A Spark data source is a data warehouse infrastructure that provides data summarization and ad hoc querying. Spark data sources are accessed through special JDBC drivers. The current JDBC interface for Spark supports only running queries and fetching results. For the steps to add drivers and connect as a relational data source, see Creating the JDBC driver configuration file.

To create a Spark data source:


  1. Select File > New > Other > Repository > Spark Data Source.
    The Create New Spark Data Source wizard opens.
  2. On the availability type selection page, select Personal or Shared, and then click Next.
    Note: For more information about types of data source availability, see Data sources.
  3. Type the data source name in the Data Source Name field.
  4. Specify the necessary parameters in the Connection Parameters area. You must set the Host name, Port number, and Database name.
  5. Click Set User Information to specify the necessary user parameters.
  6. Click Advanced to select the advanced parameters that the installed Hive driver supports.
  7. Optional: In the Description field, enter a description of the data source.
  8. Click Finish to create the Spark data source and close the Create New Spark Data Source wizard.
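
For reference, the Host name, Port number, and Database name from step 4 correspond to the three parts of a conventional Hive JDBC URL of the form jdbc:hive2://host:port/database. The sketch below shows that mapping. It is an illustration only: the class name SparkJdbcUrl and the sample values are hypothetical, and the wizard might assemble its connection string differently.

```java
import java.util.Objects;

class SparkJdbcUrl {
    // Builds a HiveServer2-style JDBC URL: jdbc:hive2://<host>:<port>/<database>.
    // Assumption: the installed Hive driver accepts this conventional URL form.
    static String build(String host, int port, String database) {
        Objects.requireNonNull(host, "host");
        Objects.requireNonNull(database, "database");
        if (host.isEmpty() || database.isEmpty()) {
            throw new IllegalArgumentException("host and database must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port must be between 1 and 65535");
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Hypothetical sample values; substitute the ones you typed in the wizard.
        System.out.println(build("spark-host.example.com", 10000, "default"));
    }
}
```

With a Hive driver JAR on the classpath, a URL of this form can be passed to the standard java.sql.DriverManager.getConnection(url, user, password) method to test the same connection parameters outside of QMF.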