ELT run mode in DataStage
Use DataStage in the Extract, Load, and Transform (ELT) run mode to more efficiently run SQL queries in target databases.
- What is the ELT process?
- What is the ELT run mode in DataStage?
- Availability
- Supported connectors
- Supported stages
- Limitations
- Enabling ELT run mode
- Setting database connector permissions for ELT run mode
- Compiling with the dsjob command
- Example
What is the ELT process?
The primary process that DataStage® uses is Extract, Transform, and Load (ETL), in which data is read into memory, processed, then written to a target. In DataStage, all jobs run in ETL mode by default.
The Extract, Load, and Transform (ELT) process transfers data from a source to a target database and then prepares the information in the database itself. In certain situations, it is advantageous to use the ELT process to transform the data in the target database. An example might be when you need to transform large data sets that are already in specific data sources or targets.
What is the ELT run mode in DataStage?
- No ELT mode
- When the analysis determines that the DataStage flow cannot be converted to SQL, ETL mode is used, and DataStage compiles the flow with the PX runtime engine.
- ELT mode
- When the analysis determines that the DataStage flow can be converted to SQL, ELT mode is used, and DataStage compiles the flow to SQL.
- Mixed ETL and ELT mode
- When the analysis determines that the DataStage flow can be only partially converted to SQL, both ETL and ELT modes are used as needed.
Availability
ELT run mode is available in both DataStage Enterprise and DataStage Enterprise Plus.
Supported connectors
- Amazon RDS for PostgreSQL
- IBM Cloud® Databases for PostgreSQL
- Google BigQuery
- IBM Db2®
- IBM® Db2 Warehouse
- IBM Db2 on Cloud
- Oracle
- PostgreSQL
- Snowflake
Supported stages
Stage | Mode | Limitations |
---|---|---|
Aggregator | Mixed |
|
Google BigQuery | ELT | Only supports one authentication method: Account key (full JSON snippet) |
Copy | ELT |
|
Filter | Mixed |
|
Funnel | ELT |
|
Join | ELT |
|
Lookup | Mixed |
|
Remove Duplicates | Mixed |
|
Sort | Mixed |
|
Limitations
- Flows with runtime column propagation enabled are not supported.
Enabling ELT run mode
- Open a DataStage flow.
- On the toolbar, click the Settings icon .
- Click the Run tab.
- Click Extract, load, transform (ELT). Then, click Save.
Setting database connector permissions for ELT run mode
- Create views from select statements
- Drop views
- Create tables from select statements
- Drop tables
- Alter tables for adding null constraints and primary keys
If a connector is configured with custom SQL statements, then more permissions are needed for the SQL operations that are used in the custom SQL statements.
The permissions must be set within the corresponding database by a database administrator, which is outside of the scope of DataStage and Cloud Pak for Data.
Compiling with the dsjob command
cpdctl dsjob compile --project <project name> --enable-elt-mode
Example
- A PostgreSQL data source
- A Sort stage
- A Filter stage
- A PostgreSQL data target
You enable ELT mode by opening the run settings and selecting it.
After you compile the job successfully, the Compile log has a single message of "full pushdown". This message indicates that the entire flow was compiled in ELT mode.
You run the job. During the job run, the PostgreSQL source data is converted by using SQL statements that apply the sorting and filtering. The result is persisted as a table that is defined in the PostgreSQL target connector.
The job run finishes and the target database now holds all the transformed data.