Simplifying and Accelerating New Database Applications with Db2 11.5 and External Tables
5 min read
By: Phil Downey
The new Db2 11.5 has introduced External Table capabilities that will change the way Db2 developers and users work with data external to the database.
The External Table capabilities will simplify existing applications and provide new ways for externally stored data to be accessed by applications without the overhead of having to load all data into the database.
What are External Tables?
External Tables are files that are external to the database (but accessible by the database server), and they are mapped and accessed as if they were a standard table. Db2 External Tables currently support files of comma separated (CSV) or fixed file length formats that are stored on either a file system or supported object storage (IBM Cloud Object Storage, S3, AWS) that is directly referenceable by the database server.
Db2 External Tables are accessed more efficiently than standard federation or existing data movement techniques, providing greater performance benefits when working with external data.
For data partitioned environments using Db2 DPF, you can configure External Tables to observe data partitions, giving the user the option to export or consume data at a partition or database level, which makes it worth considering for both ETL and archiving of data operations. As they are entirely implemented through SQL, they can be easily integrated into applications via SQL or into ETL tools.
What can I do with External Tables?
External Tables can be selected from, inserted into, and created within a simple SQL statement. They can either be directly referenced or created from within an SQL statement as a transient table or cataloged just like a standard table for sharing and reuse by others.
External Tables support a wide variety of data types, including the following:
You cannot create Indexes or MDC’s for obvious reasons, but performance is still very fast for directly accessing the file, as the above video demonstrates.
How can I benefit from using External Tables?
Simplify and increase performance of existing applications
Loading and merging data
Many batch or warehousing workloads import and merge data in two steps, with a mixture of commands and SQL statements. This can now be implemented with External Tables in one simple SQL command, providing greater performance and simplicity than before.
MERGE INTO ASSOC_TRANS T_OLD USING (SELECT * FROM EXTERNAL '/database/test.csv' LIKE ASSOC_TRANS USING (DELIMITER ',' SKIPROWS 1) ) as T_NEW ON (T_OLD.RECEIPT_ID = T_NEW.RECEIPT_ID and T_OLD.TX_NO = T_NEW.TX_NO) WHEN NOT MATCHED THEN INSERT (RECEIPT_ID ,TX_DATE , TX_NO ,PRODUCT_ID ,PRODUCT ,QUANTITY, ITEM_COST ) VALUES (T_NEW.RECEIPT_ID ,T_NEW.TX_DATE , T_NEW.TX_NO ,T_NEW.PRODUCT_ID ,T_NEW.PRODUCT , T_NEW.QUANTITY, T_NEW.ITEM_COST ) Else IGNORE
Exporting or dumping data
Data can be dumped into an external file through a Simple SQL Statement, bypassing traditional export commands and providing greater performance and simplicity to the application developer. This is ideal for fast dumping of data from OLTP systems, and if the file system or object storage is shared with a data warehouse server, a simple External Table operation on the warehouse system can have the data loaded for fast reporting without worrying about transporting, unloading, and loading of the data.
CREATE EXTERNAL TABLE '/database/dump.csv' USING (DELIMITER '#') AS SELECT * FROM ASSOC_TRANS
Access to detailed data outside the database
Quite often, atomic-level or archived data needs to be accessible for further follow-up or enquiries. However, keeping this data in the database can slow down backups and restores of the system and other administration operations so the data is typically left out in flat file or in a data lake, requiring some form of federation or NoSQL operation to access it. Now, select data can be accessible via External Tables for drill-down analysis or data lookups without having to put in place federation or expensive archive alternatives.
Data warehouses are often not updated in real-time; instead, refreshes are staged at the end of day so as to maintain consistency across the environment. For those reporting use cases where intra-day data is required, External Tables provide an alternate way of reading data for intra-day reporting without having to worry about complex replication and data integration scenarios. They simply provide a dump of the current day’s data at regular intervals for access by data warehouse reporting queries.
Db2 External Tables can simplify the way we import and export data from Db2, reducing execution time and opening up new ways to address batch workloads, data archive access, and self-service BI. They are easy to implement and get started quickly with, with few changes required to existing import or export code.
External Table functionalities are available with all editions of Db2 11.5. Learn more about Db2 11.5 capabilities.
To try it yourself, you can either download and upgrade off your existing Db2 entitlement, use the Db2 Community Edition that is available on Docker Hub, or you can go to the IBM Db2 Database page to download the standard installation of the product.