JDBC
The JDBC destination writes data to a database table.
The destination can write each batch of data to a new or existing database table. If the specified table doesn't exist, the destination creates the new table in the database. If the specified table exists, the destination can either append data to the table or overwrite data in the table.
When you configure the JDBC destination, you specify the database connection information and any additional JDBC configuration properties you want to use. You specify the table to write to, the mode the destination uses to write to an existing table, and the number of partitions used to write to the database table. You can optionally configure advanced properties related to the JDBC driver.
You can also use a connection to configure the destination.
Before using the JDBC destination, verify whether you need to install a JDBC driver.
Database Vendors and Drivers
The JDBC destination can write data to databases from the following vendors:
| Database Vendor | Versions and Drivers |
| --- | --- |
| Microsoft SQL Server | SQL Server 2017 with the SQL Server JDBC 8.4.0 JRE8 driver |
| MySQL | MySQL 5.7 with the MySQL Connector/J 8.0.12 driver |
| Oracle | Oracle 11g with the Oracle 19.3.0 JDBC driver |
| PostgreSQL | PostgreSQL 9.6.2 with the PostgreSQL 42.2.5 driver |
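For reference, JDBC connection strings for these vendors typically follow the patterns sketched below. The host names, ports, and database names are placeholders, and the exact options accepted depend on your driver version:

```java
import java.util.Map;

public class JdbcUrlExamples {
    // Typical JDBC URL patterns for the supported vendors.
    // "dbhost" and "mydb" are placeholders; substitute your own values.
    static final Map<String, String> URL_PATTERNS = Map.of(
        "Microsoft SQL Server", "jdbc:sqlserver://dbhost:1433;databaseName=mydb",
        "MySQL",                "jdbc:mysql://dbhost:3306/mydb",
        "Oracle",               "jdbc:oracle:thin:@dbhost:1521/mydb",
        "PostgreSQL",           "jdbc:postgresql://dbhost:5432/mydb"
    );

    public static void main(String[] args) {
        URL_PATTERNS.forEach((vendor, url) ->
            System.out.println(vendor + " -> " + url));
    }
}
```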
Installing the JDBC Driver
Transformer includes JDBC drivers for the following databases:
- Microsoft SQL Server
- PostgreSQL
When using the stage to connect to any other database, you must install the JDBC driver for that database. Install the driver as an external library for the JDBC stage library.
By default, Transformer bundles a JDBC driver into the launched Spark application so that the driver is available on each node in the cluster. If you prefer to manually install an appropriate JDBC driver on each Spark node, you can configure the stage to skip bundling the driver on the Advanced tab of the stage properties.
The stage can determine the JDBC driver class name from the connection string for the following databases:
- Apache Derby
- IBM DB2
- Microsoft SQL Server
- MySQL
- Oracle
- PostgreSQL
- Teradata
If you install a custom JDBC driver or a driver provided by another vendor, you must specify the JDBC driver class name on the Advanced tab of the stage.
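When a driver class name must be specified, it is the fully qualified name of the driver's `java.sql.Driver` implementation. The class names below are the ones commonly shipped with each vendor's driver; verify against your driver's documentation, since package names can change between driver versions:

```java
import java.util.Map;

public class JdbcDriverClasses {
    // Commonly used JDBC driver class names for the databases listed above.
    // Confirm against your driver vendor's documentation before relying on these.
    static final Map<String, String> DRIVER_CLASSES = Map.of(
        "Apache Derby",         "org.apache.derby.jdbc.ClientDriver",
        "IBM DB2",              "com.ibm.db2.jcc.DB2Driver",
        "Microsoft SQL Server", "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "MySQL",                "com.mysql.cj.jdbc.Driver",
        "Oracle",               "oracle.jdbc.OracleDriver",
        "PostgreSQL",           "org.postgresql.Driver",
        "Teradata",             "com.teradata.jdbc.TeraDriver"
    );

    public static void main(String[] args) {
        DRIVER_CLASSES.forEach((db, cls) ->
            System.out.println(db + ": " + cls));
    }
}
```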
Write Mode
When the JDBC destination is configured to write to a table that doesn't exist, the destination creates the table in the specified database. The table is created based on the first batch of data that the destination processes when you start the pipeline, and that first batch is inserted into the new table.
The write mode defines how the destination writes to an existing table, including the second batch to a table that the destination created.
- Append to table
- The destination appends rows to the table.
- Overwrite table
- The destination removes all rows in the table before writing new rows to the table. When you select overwrite mode, you also configure how the destination removes rows from the table:
  - Truncates all rows in the table before the write.
  - Drops and recreates the table before the write.
- Write to a slowly changing dimension
- The destination inserts and updates rows provided by the Slowly Changing Dimension processor.
By default, the destination appends rows to existing tables.
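Conceptually, each write mode maps to different SQL issued against the target table. The sketch below is a hypothetical helper that illustrates those semantics, not Transformer's actual implementation; the mode names and `VALUES (...)` placeholders are assumptions for illustration:

```java
import java.util.List;

public class WriteModeSketch {
    // Returns the kind of SQL a JDBC writer might issue for each write mode.
    // Illustrative only: statements are placeholders, not executable SQL.
    static List<String> statementsFor(String mode, String table) {
        switch (mode) {
            case "append":
                return List.of("INSERT INTO " + table + " VALUES (...)");
            case "overwrite-truncate":
                return List.of(
                    "TRUNCATE TABLE " + table,
                    "INSERT INTO " + table + " VALUES (...)");
            case "overwrite-drop":
                return List.of(
                    "DROP TABLE " + table,
                    "CREATE TABLE " + table + " (...)",
                    "INSERT INTO " + table + " VALUES (...)");
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(statementsFor("overwrite-truncate", "orders"));
    }
}
```

Truncating is usually cheaper and preserves the table definition, grants, and indexes; dropping and recreating discards them along with the data.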
Partitioning
Spark runs a Transformer pipeline just as it runs any other application, splitting the data into partitions and performing operations on the partitions in parallel.
When the pipeline starts processing a new batch, Spark determines how to split pipeline data into initial partitions based on the origins in the pipeline. Spark uses these partitions for the rest of the pipeline processing, unless a processor causes Spark to shuffle the data.
When writing to a database table, Spark creates one connection to the database for each partition. To change the number of partitions that write to the database table, configure the number of partitions for the JDBC destination. Base the number of partitions on the following factors:
- The size and configuration of the cluster.
- The amount of data being processed.
- The number of concurrent connections that can be made to the database.
If the pipeline fails because the JDBC destination encounters an out-of-memory error, you likely need to increase the number of partitions for the destination.
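One rough way to reason about the trade-off: the partition count is bounded above by the number of concurrent connections the database allows, and bounded below by how much data each partition can comfortably hold. The heuristic below is a hypothetical sizing sketch; the target partition size and connection limit are assumptions, not Transformer defaults:

```java
public class PartitionSizing {
    // Hypothetical heuristic: use enough partitions that each holds at most
    // maxPartitionBytes of data, but never more partitions than the database
    // allows concurrent connections, and always at least 1.
    static int suggestPartitions(long totalBytes, long maxPartitionBytes,
                                 int maxDbConnections) {
        long needed = (totalBytes + maxPartitionBytes - 1) / maxPartitionBytes; // ceiling division
        return (int) Math.max(1, Math.min(needed, maxDbConnections));
    }

    public static void main(String[] args) {
        // 10 GB of data, 256 MB per partition target, 32 connections allowed:
        // 40 partitions would be needed by size, capped at 32 by the connection limit.
        System.out.println(suggestPartitions(10L << 30, 256L << 20, 32));
    }
}
```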
Configuring a JDBC Destination
Configure a JDBC destination to write data to a database table. Before using the destination, verify whether you need to install a JDBC driver.