Tuning S3 performance

You can use the S3a protocol to store data for Db2® Big SQL tables in an object store. With object storage, network performance is critical.

If you plan to use Db2 Big SQL with object storage, try to collocate your cluster and the object store in the same data center to minimize network latency and maximize network throughput.

  • When Db2 Big SQL table data is on HDFS, the ORC, Parquet, or text file format is recommended.
  • When Db2 Big SQL table data is in an object store, use a compact file format such as ORC or Parquet to minimize network traffic.

Be sure to review the Db2 Big SQL S3a connector settings and tune them for your environment. The following table provides some examples:

Table 1. Tuning the S3a file system configuration
S3a setting | Default value | Tuned value | Notes
fs.s3a.buffer.dir | ${hadoop.tmp.dir}/s3a | One directory per HDFS data disk. For example, if HDFS uses N disks, /data1 to /dataN, set fs.s3a.buffer.dir to /data1/tmp/s3a,/data2/tmp/s3a,...,/dataN/tmp/s3a. | Spread the temporary files across the available data disks that are used for HDFS.
fs.s3a.connection.maximum | 15 | 250 | Tune fs.s3a.connection.maximum and fs.s3a.threads.core together.
fs.s3a.threads.core | 15 | 250 | The value of fs.s3a.threads.max is 256.
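
The tuned values in Table 1 can be generated rather than typed by hand, which helps when the number of data disks differs between nodes or clusters. The following Python sketch builds the comma-separated fs.s3a.buffer.dir value for N disks and renders the three settings as Hadoop-style property elements. The function names, the /dataN mount-point pattern, and the idea of pasting the output into a Hadoop configuration file such as core-site.xml are assumptions for illustration; check where your Db2 Big SQL installation reads its S3a settings before you apply anything.

```python
from xml.etree import ElementTree as ET


def buffer_dirs(disk_count, prefix="/data", suffix="/tmp/s3a"):
    """Return a comma-separated fs.s3a.buffer.dir value, one directory per data disk.

    Assumes the disks are mounted as /data1 ... /dataN, as in the Table 1 example.
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return ",".join(f"{prefix}{i}{suffix}" for i in range(1, disk_count + 1))


def s3a_properties(disk_count, connections=250, core_threads=250):
    """Collect the tuned values from Table 1 as a name -> value mapping.

    connections and core_threads are tuned together, so they share a default.
    """
    return {
        "fs.s3a.buffer.dir": buffer_dirs(disk_count),
        "fs.s3a.connection.maximum": str(connections),
        "fs.s3a.threads.core": str(core_threads),
    }


def to_hadoop_xml(properties):
    """Render the mapping as <property> elements inside a <configuration> root."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical node with four HDFS data disks.
    print(to_hadoop_xml(s3a_properties(disk_count=4)))
```

Review the generated fragment and merge it into your configuration manually; the sketch does not create the buffer directories or check that they exist.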