Process small, compressed files in Hadoop using CombineFileInputFormat
From the developerWorks archives
Date archived: April 22, 2019 | First published: February 11, 2014
This article provides detailed examples that show you how to extend
CombineFileInputFormat to read the content of gzip (default codec)
files at runtime. Learn how to use
CombineFileInputFormat within the MapReduce
framework to decouple the amount of data a Mapper consumes from the block size
of the files in HDFS.
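As a rough illustration of the approach the abstract describes, a subclass of Hadoop's CombineFileInputFormat can pack many small gzip files into a single split and hand each file to a delegate record reader. The class and record-reader names below are illustrative placeholders, not code from the archived article, and the delegate reader itself is assumed to be implemented separately:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

// Sketch: an input format that groups many small gzipped files into one
// split, so a single Mapper can consume several files regardless of the
// HDFS block size. "CompressedCombineFileRecordReader" is a hypothetical
// per-file reader you would implement to decompress and emit records.
public class CompressedCombineFileInputFormat
        extends CombineFileInputFormat<Text, Text> {

    public CompressedCombineFileInputFormat() {
        super();
        // Cap the combined split size (bytes) so splits stay mapper-sized;
        // the 128 MB figure here is an illustrative choice, not mandated.
        setMaxSplitSize(134217728L);
    }

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Gzip streams cannot be split mid-file; each file is read whole.
        return false;
    }

    @Override
    public RecordReader<Text, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        // CombineFileRecordReader iterates over the files in the combined
        // split, instantiating the delegate reader once per file.
        return new CombineFileRecordReader<Text, Text>(
                (CombineFileSplit) split, context,
                CompressedCombineFileRecordReader.class);
    }
}
```

A job would then select this format with job.setInputFormatClass(CompressedCombineFileInputFormat.class); the archived PDF walks through the full record-reader implementation.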
This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some content, steps, or illustrations may have changed.