AlanFischereSilva
2 Posts

Compression - DefaultCodec vs GzipCodec

‏2013-10-02T20:11:59Z |

Hi, I am compressing a flat file with JAQL and comparing the results across the following compression codecs:

org.apache.hadoop.io.compress.DefaultCodec

org.apache.hadoop.io.compress.GzipCodec

org.apache.hadoop.io.compress.BZip2Codec

org.apache.hadoop.io.compress.SnappyCodec

com.ibm.biginsights.compress.CmxCodec

The problem is that DefaultCodec and GzipCodec show identical results in terms of time and size, whereas each of the other codecs shows different results.

I would expect DefaultCodec to be the Hadoop DEFLATE codec and therefore to give results that differ from GzipCodec, not match it. Is this behaviour expected?

I got these results on BigInsights 2.1fp1. Any comments are appreciated.

 

Updated on 2013-10-02T20:16:37Z by AlanFischereSilva
  • dvillegas
    18 Posts

    Re: Compression - DefaultCodec vs GzipCodec

    ‏2013-10-07T18:26:12Z  

    Hi Alan,

     

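    You are correct in assuming that DefaultCodec uses DEFLATE. The reason you get almost the same results from that codec and GzipCodec is that both use the same underlying algorithm, DEFLATE, as implemented by the zlib library. Strictly speaking, the outputs should not be byte-for-byte identical, since the gzip format wraps the data in a slightly larger header and trailer, but the compressed payload itself should be exactly the same. On a file of any real size those few extra bytes are negligible, which is why the time and size figures look identical.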
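
The point above can be checked outside Hadoop with a minimal sketch using Python's standard `zlib` module. This is not the BigInsights or Hadoop code path. It assumes that DefaultCodec writes a zlib-wrapped DEFLATE stream and that GzipCodec writes a gzip-wrapped one, both at the same compression level. The helper name `deflate` and the sample data are made up for illustration.

```python
import zlib

def deflate(data: bytes, wbits: int, level: int = 6) -> bytes:
    """Compress data with zlib; wbits selects the container format."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()

# Hypothetical stand-in for a flat file.
data = b"some,flat,file,record\n" * 10_000

raw = deflate(data, -15)      # raw DEFLATE stream, no header or trailer
zlib_out = deflate(data, 15)  # zlib container: 2-byte header, 4-byte Adler-32 trailer
gzip_out = deflate(data, 31)  # gzip container: 10-byte header, 8-byte CRC-32 + length trailer

# Sizes of the three outputs; only the wrapper overhead differs.
print(len(raw), len(zlib_out), len(gzip_out))

# Stripping each wrapper leaves the same DEFLATE payload.
print(zlib_out[2:-4] == raw, gzip_out[10:-8] == raw)
```

Under these assumptions the two wrapped outputs differ only in their fixed-size header and trailer, so the size gap stays constant regardless of input size.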

     

    Best regards,

    David Villegas.
    BigInsights Development.

     

    Updated on 2013-10-07T18:26:40Z by dvillegas