Transactional behavior of the LOAD HADOOP statement in Db2® Big SQL
This topic provides details that can help you to avoid problems when using the LOAD HADOOP statement.
Data that is being processed when a LOAD HADOOP…APPEND statement is running is put into a temporary location until processing is complete. If the load operation completes successfully, the data files are moved into a directory for the target table. If the LOAD HADOOP…APPEND statement fails, the data is discarded. If several data files are to be moved, and a failure occurs when those files are being moved, data might only be partially loaded. You might also see a temporary directory of files that should be removed.
The load operation does not impact existing readers. The readers do not see data being processed, nor do they see data that has been added since the load operation began.
You can verify a complete load operation by checking the returned message. When a load operation completes successfully, you receive an informational message with the number of loaded rows, the number of input records, the number of rejected records, and the number of skipped lines. The message also contains the Hadoop job ID, which you can use to retrieve the map task logs for additional information.
When you specify the OVERWRITE clause in a LOAD HADOOP statement, the new data replaces the old data in the table. The new data is written to a temporary directory until the commit stage at end of the load operation. During the commit stage, the old data files are removed, and the new data files are moved from the temporary directory to the final table location. The scheduler cache is then refreshed. If a query is in progress while the LOAD HADOOP…OVERWRITE statement is committing, the query might fail if it tries to read old data files that have been removed.