JDBC Query
The JDBC Query origin reads data from a database based on the specified query.
Use the JDBC Query origin in batch pipelines only. In a batch pipeline, the origin reads all available data with the specified query, then stops the pipeline. The origin does not support streaming execution mode and does not save offsets or support partitioning.
When you configure the JDBC Query origin, you specify the database connection information and any additional JDBC configuration properties you want to use. You can also use a connection to configure the origin.
You specify the SQL query to use for the read. You can specify a separate query to use when previewing data for pipeline development and testing. You can define a fetch size that is used by both queries.
You can configure the origin to cache the data for reuse throughout the pipeline run. You can also specify the JDBC driver to include with the pipeline.
Before using the JDBC Query origin, verify if you need to install a JDBC driver.
Database Vendors and Drivers
The JDBC Query origin can read database data from multiple database vendors.
Database Vendor | Versions and Drivers |
---|---|
Microsoft SQL Server | SQL Server 2017 with the SQL Server JDBC 8.4.0 JRE8 driver |
MySQL | MySQL 5.7 with the MySQL Connector/J 8.0.12 driver |
Oracle | Oracle 11g with the Oracle 19.3.0 JDBC driver |
PostgreSQL | PostgreSQL 9.6.2 with the PostgreSQL 42.2.5 driver |
Installing the JDBC Driver
- Microsoft SQL Server
- PostgreSQL
When using the stage to connect to any other database, you must install the JDBC driver for the database. Install the driver as an external library for the JDBC stage library.
By default, Transformer bundles a JDBC driver into the launched Spark application so that the driver is available on each node in the cluster. If you prefer to manually install an appropriate JDBC driver on each Spark node, you can configure the stage to skip bundling the driver on the Advanced tab of the stage properties.
- Apache Derby
- IBM DB2
- Microsoft SQL Server
- MySQL
- Oracle
- PostgreSQL
- Teradata
If you install a custom JDBC driver or a driver provided by another vendor, you must specify the JDBC driver class name on the Advanced tab of the stage.
SQL Queries
- Pipeline run query
- The pipeline run query defines the data that is read from the database and passed to the pipeline for processing.
- Preview query
- The preview query defines the data that is read from the database when you preview a pipeline. Use the preview query to provide a smaller data set for pipeline development and testing.
Configuring a JDBC Query Origin
Configure a JDBC Query origin to read data from a database table. Before using the origin, verify if you need to install a JDBC driver.