Detection and compaction of small files in HDFS

Query performance can suffer when table data is fragmented into many small files. A tool is available that detects this condition and recommends corrective action to optimize file size and layout.

The tool identifies problematic small files at the storage level and recommends which HDFS directories to compact. Merging these files improves Db2® Big SQL read performance by reducing the amount of file metadata that must be processed and by aligning file sizes more closely with the HDFS block size. For more information, see Inspect Files tooling for IBM® Db2 Big SQL and Optimizing ORC and Parquet files for Big SQL queries performance.
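
The following Python sketch illustrates the general idea of small-file detection. It is not the Inspect Files tool and does not reproduce its logic. It assumes that a list of (path, size in bytes) pairs has already been collected, for example from a recursive HDFS directory listing. The function name recommend_compaction, the 128 MiB block size, and the "smaller than half a block" threshold are assumptions made for illustration only.

```python
import math
import posixpath
from collections import defaultdict

# Assumed values for illustration; check the dfs.blocksize setting of your cluster.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MiB, a common HDFS default
SMALL_FILE_RATIO = 0.5           # hypothetical threshold: under half a block is "small"


def recommend_compaction(files, block_size=BLOCK_SIZE, ratio=SMALL_FILE_RATIO):
    """Group files by parent directory and flag directories worth compacting.

    files: iterable of (hdfs_path, size_in_bytes) pairs.
    Returns a dict that maps each flagged directory to the number of small
    files, their combined size, and a target file count after merging.
    """
    sizes_by_dir = defaultdict(list)
    for path, size in files:
        sizes_by_dir[posixpath.dirname(path)].append(size)

    report = {}
    for directory, sizes in sizes_by_dir.items():
        small = [s for s in sizes if s < block_size * ratio]
        if len(small) < 2:
            continue  # nothing to merge in this directory
        total = sum(small)
        report[directory] = {
            "small_files": len(small),
            "small_bytes": total,
            # Aim for roughly one output file per block of merged data.
            "target_files": max(1, math.ceil(total / block_size)),
        }
    return report


if __name__ == "__main__":
    mib = 1024 * 1024
    listing = [
        ("/warehouse/t1/a.parquet", 1 * mib),
        ("/warehouse/t1/b.parquet", 2 * mib),
        ("/warehouse/t1/c.parquet", 200 * mib),
        ("/warehouse/t2/x.orc", 300 * mib),
    ]
    for directory, info in recommend_compaction(listing).items():
        print(directory, info)
```

In this sketch, a directory is flagged only when it contains at least two small files, because a single small file has nothing to be merged with. The paths in the sample listing are hypothetical.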