Ingestion by adding files to HDFS

Adding files or appending to files already in HDFS is one of the fastest ways to ingest data into Db2® Big SQL and Hive.

You can create Db2 Big SQL tables by specifying the location of files on HDFS. Such tables are referred to as external tables. Use the CREATE TABLE (HADOOP) statement with the LOCATION clause to identify the location of these data files on HDFS.
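
As a minimal sketch of such a statement, the following example defines an external table over a directory of comma-delimited text files. The table name, column names, and HDFS path are hypothetical and chosen only for illustration. The file format clauses assume plain text data, so adjust them to match the actual files.

```sql
-- Hypothetical names and path, for illustration only.
-- Assumes comma-delimited text files already exist in /user/bigsql/sales_ext.
CREATE EXTERNAL HADOOP TABLE sales_ext (
  order_id  INT,
  customer  VARCHAR(50),
  amount    DECIMAL(10,2)
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/bigsql/sales_ext';

-- Query the files in that directory through the new table.
SELECT COUNT(*) FROM sales_ext;
```

Because the table is external, dropping it is expected to remove only the table definition and leave the files in the LOCATION directory in place.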

If a Db2 Big SQL external table already exists and data files are added to or removed from its location, or data is appended to the existing HDFS files, Db2 Big SQL recognizes that these files are associated with the existing table.

For performance reasons, it is good practice to partition large tables. With a partitioned table, data corresponding to each partition resides in a separate HDFS directory. You can use the ALTER TABLE…ADD PARTITION statement to add partition directories if the data is already on HDFS.
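
The following sketch shows one way this might look for a table partitioned by year. The table name, column names, partition value, and directory paths are hypothetical, and the example assumes the data files for 2024 were already copied into the partition directory. Check the exact clause options against the ALTER TABLE reference for your release.

```sql
-- Hypothetical partitioned external table; names and paths are illustrative.
CREATE EXTERNAL HADOOP TABLE sales_by_year (
  order_id  INT,
  amount    DECIMAL(10,2)
)
PARTITIONED BY (sale_year INT)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/bigsql/sales_by_year';

-- Register a partition directory whose data is already on HDFS.
ALTER TABLE sales_by_year
  ADD PARTITION (sale_year = 2024)
  LOCATION '/user/bigsql/sales_by_year/sale_year=2024';

-- A filter on the partitioning column lets the query read only that directory.
SELECT SUM(amount) FROM sales_by_year WHERE sale_year = 2024;
```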

For more information and examples, see Db2 Big SQL Ingest – Adding files directly to HDFS.