Process small, compressed files in Hadoop using CombineFileInputFormat

From the developerWorks archives

Sujay Som

Date archived: April 22, 2019 | First published: February 11, 2014

This article provides detailed examples that show you how to extend and implement CombineFileInputFormat to read the content of gzip (the default codec) files at runtime. Learn how to use CombineFileInputFormat within the MapReduce framework to decouple the amount of data a Mapper consumes from the block size of the files in HDFS.
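The full worked example is in the archived PDF. As a rough orientation to the pattern the abstract describes, a minimal sketch might look like the following: a CombineFileInputFormat subclass that refuses to split gzip files and hands each file in a combined split to its own record reader, which emits the file path as the key and each decompressed line as the value. This is not the article's own code; the class names CombineGzipInputFormat and GzipFileRecordReader and the 128 MB split cap are illustrative assumptions.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.util.LineReader;

/**
 * Packs many small gzip files into each input split so that one Mapper
 * consumes several files, independent of the HDFS block size.
 * Names and the split-size cap are illustrative, not from the article.
 */
public class CombineGzipInputFormat extends CombineFileInputFormat<Text, Text> {

    public CombineGzipInputFormat() {
        // Hypothetical cap: limit each combined split to 128 MB of input.
        setMaxSplitSize(128L * 1024 * 1024);
    }

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // a gzip stream cannot be split mid-file
    }

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException {
        // CombineFileRecordReader instantiates one delegate reader per file.
        return new CombineFileRecordReader<>((CombineFileSplit) split,
                context, GzipFileRecordReader.class);
    }

    /** Reads one gzip file line by line: key = file path, value = line. */
    public static class GzipFileRecordReader extends RecordReader<Text, Text> {

        private final Text key = new Text();
        private final Text value = new Text();
        private LineReader in;

        // CombineFileRecordReader requires exactly this constructor shape:
        // (CombineFileSplit, TaskAttemptContext, Integer index of the file).
        public GzipFileRecordReader(CombineFileSplit split,
                TaskAttemptContext context, Integer index) throws IOException {
            Path path = split.getPath(index);
            Configuration conf = context.getConfiguration();
            // Resolves the codec from the file extension (.gz -> GzipCodec);
            // this sketch assumes every input file has a matching codec.
            CompressionCodec codec =
                    new CompressionCodecFactory(conf).getCodec(path);
            in = new LineReader(codec.createInputStream(
                    path.getFileSystem(conf).open(path)));
            key.set(path.toString());
        }

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            // All setup happens in the constructor, because the combine
            // framework builds one reader per underlying file.
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            return in.readLine(value) > 0; // readLine returns 0 at EOF
        }

        @Override
        public Text getCurrentKey() { return key; }

        @Override
        public Text getCurrentValue() { return value; }

        @Override
        public float getProgress() {
            return 0.0f; // per-file progress is not tracked in this sketch
        }

        @Override
        public void close() throws IOException {
            if (in != null) {
                in.close();
            }
        }
    }
}

A job would use this by calling job.setInputFormatClass(CombineGzipInputFormat.class), after which each map task receives records from many small files in one split rather than one task per file.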

This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some content, steps, or illustrations may have changed.


