HDFS erasure coding
HDFS erasure coding is an alternative storage strategy to traditional HDFS three-way block replication. The key advantage of using erasure coding is that your files will consume less space.
You can enable erasure coding on a per-directory basis. For more information, see HDFS erasure coding.
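As a minimal sketch, on HDFS 3.x you can manage per-directory policies with the hdfs ec subcommands; the directory path below is illustrative, and the policy name assumes the common RS-6-3-1024k policy. Files subsequently written under the directory inherit its policy.

```
# List the erasure coding policies that the cluster supports
hdfs ec -listPolicies

# Enable a policy, then apply it to a directory
hdfs ec -enablePolicy -policy RS-6-3-1024k
hdfs ec -setPolicy -path /data/ec_tables -policy RS-6-3-1024k

# Confirm which policy is now in effect on the directory
hdfs ec -getPolicy -path /data/ec_tables
```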
Db2® Big SQL is compatible with erasure coding: the HDFS files that underlie Db2 Big SQL tables can use erasure coding, three-way block replication, or a combination of both strategies.
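For example, you can check which strategy the files of a particular table use by inspecting the erasure coding policy on the table's HDFS location. The warehouse path below is an assumption and depends on your deployment.

```
# Show the erasure coding policy (if any) on a table's storage directory
# (the warehouse path is illustrative; directories without a policy report that none is set)
hdfs ec -getPolicy -path /warehouse/tablespace/managed/hive/sales.db/orders

# Alternatively, fsck reports block-level details, including whether files are erasure coded
hdfs fsck /warehouse/tablespace/managed/hive/sales.db/orders -files -blocks
```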
Consider performance before you choose erasure coding. Erasure coding uses only about half the HDFS space of three-way replication (with the default RS-6-3 policy, data occupies roughly 1.5 times its raw size instead of 3 times), but the additional CPU and network overhead can impact Db2 Big SQL performance. When table data is stored by using HDFS erasure coding instead of three-way block replication, Db2 Big SQL query workloads can take 10% or more longer, on average, to complete, and individual queries can take several times longer. This slowdown appears to be associated with a small increase in CPU resource consumption and a significant increase in network traffic. The performance impact of switching to erasure coding is also both workload-specific and cluster-specific.
Recommendations:
- For Intel-based clusters, ensure that the ISA-L native library is enabled. Use the hadoop checknative command to confirm this, as shown in the example after this list.
- Assess the impact of erasure coding on performance by testing a subset of your tables and workloads.
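The following sketch illustrates both recommendations. The paths, policy name, and table data are assumptions, not values from any particular deployment.

```
# Verify that native library support, including ISA-L, is detected (look for "ISA-L: true")
hadoop checknative -a

# Stage an erasure-coded copy of one table's data so that a test workload can be
# compared against the replicated original (paths and policy are illustrative)
hdfs ec -setPolicy -path /data/ec_test -policy RS-6-3-1024k
hadoop distcp /warehouse/tablespace/managed/hive/sales.db/orders /data/ec_test/orders
```

Running the same queries against the replicated and erasure-coded copies gives a workload-specific measurement of the overhead before you convert production directories.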