Local caching for object storage

The storage hierarchy for data persistence in the previous generation of IBM Db2® Warehouse in AWS was implemented exclusively using network-attached block storage, and it relied on the buffer pools as a single tier of in-memory caching for table and index data as they are read from persistent storage.

With the introduction of support for object storage as a persistent storage solution, this single tier of in-memory caching is insufficient because of the distinct input/output (I/O) characteristics of object storage compared to network-attached block storage. These differences manifest in both throughput and latency: object storage is designed to be throughput-optimized, while network-attached block storage offers a more balanced profile across the two.

In terms of throughput, assuming sufficient server-side resources for the object storage implementation, the throughput limit for object storage is determined by the network bandwidth available to the compute nodes conducting the I/O access, and by the level of parallelism employed to maximize the utilization of that bandwidth. Conversely, in the case of block storage, this limit is determined by each device attached to the compute nodes, and throughput scalability is achieved by attaching additional devices to each compute node (along with parallel access to each device to maximize throughput).
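
To make the parallelism aspect concrete, the following sketch issues concurrent ranged GET requests against an S3-compatible endpoint so that many in-flight requests keep a compute node's network bandwidth busy. This is a minimal illustration, not the Db2 Warehouse implementation; the part size, worker count, and the use of boto3 are assumptions made for the example.

```python
# Minimal sketch of parallelism-driven reads from object storage.
# Assumes an S3-compatible endpoint reachable via boto3; the part size
# and worker count are illustrative, not Db2 Warehouse internals.
from concurrent.futures import ThreadPoolExecutor

import boto3

PART_SIZE = 16 * 1024 * 1024  # large parts amortize the fixed per-request latency

s3 = boto3.client("s3")

def read_range(bucket: str, key: str, start: int, end: int) -> bytes:
    # Each ranged GET pays the fixed per-request latency once, so many
    # concurrent requests are needed to fill the node's network bandwidth.
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    return resp["Body"].read()

def parallel_read(bucket: str, key: str, size: int, workers: int = 16) -> bytes:
    # Split the object into fixed-size parts and fetch them concurrently.
    ranges = [(off, min(off + PART_SIZE, size) - 1)
              for off in range(0, size, PART_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: read_range(bucket, key, *r), ranges))
    return b"".join(parts)
```

Increasing the number of in-flight requests here plays the same role that attaching additional devices plays for block storage: it is the lever for scaling aggregate throughput.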

In terms of latency, object storage is known to have a significantly higher fixed latency per request compared to block storage. As a result, I/O operations against object storage are typically performed using much larger block sizes (on the order of tens of MB) than those used for network-attached block storage access (on the order of KB), so that the higher latency cost per operation is better amortized.
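
The amortization argument can be checked with simple arithmetic. Assuming, purely for illustration, a fixed latency of 30 ms per request and a sustained transfer rate of 1 GB/s, the effective throughput of a single request approaches the raw bandwidth only once block sizes reach tens of MB:

```python
# Back-of-the-envelope amortization of the fixed per-request latency.
# The 30 ms latency and 1 GB/s bandwidth figures are assumptions for
# illustration, not measured characteristics of any specific service.
LATENCY_S = 0.030        # fixed cost paid once per request
BANDWIDTH_BPS = 1e9      # sustained transfer rate while streaming the body

for size in (8 * 1024, 1 * 1024**2, 32 * 1024**2):
    transfer_s = size / BANDWIDTH_BPS
    effective_bps = size / (LATENCY_S + transfer_s)
    print(f"{size / 1024**2:8.3f} MiB -> {effective_bps / 1e6:7.1f} MB/s effective")
```

With these assumed figures, an 8 KB read achieves well under 1 MB/s of effective throughput, while a 32 MiB read achieves over 500 MB/s, which is why object storage I/O favors large blocks.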

These differences are the main motivation for an additional local caching area on very fast, locally attached storage (SSD or NVMe devices), which offers much better I/O characteristics than network-attached block storage, even when backed by the same disk technology. Combining the buffer pools with this additional local on-disk caching area yields a multi-tiered cache. The local on-disk cache further amortizes the higher latency cost by maintaining a significantly larger working set than could traditionally be kept exclusively in the buffer pools with network-attached block storage. Additionally, the higher throughput of object storage can be better utilized through batching, both to populate this cache during the warm-up phase and to maintain the working set afterward.
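
The following sketch shows the shape of such a multi-tiered read path: an in-memory dictionary stands in for the buffer pool, a directory on locally attached SSD/NVMe storage acts as the second tier, and object storage is consulted only on a miss in both tiers. The class name, the `fetch_from_object_storage` hook, and the absence of eviction are simplifications for illustration, not details of the actual design.

```python
# Minimal sketch of a two-tier read cache: an in-memory tier standing in
# for the buffer pool, backed by a local on-disk tier on fast NVMe/SSD
# storage. Names and the fetch hook are hypothetical.
import os

class TieredCache:
    def __init__(self, cache_dir: str, fetch_from_object_storage):
        self.memory: dict[str, bytes] = {}   # tier 1: buffer-pool analogue
        self.cache_dir = cache_dir           # tier 2: local SSD/NVMe cache
        self.fetch = fetch_from_object_storage
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, block_id: str) -> str:
        return os.path.join(self.cache_dir, block_id)

    def get(self, block_id: str) -> bytes:
        if block_id in self.memory:              # tier-1 hit: RAM
            return self.memory[block_id]
        path = self._path(block_id)
        if os.path.exists(path):                 # tier-2 hit: local disk
            with open(path, "rb") as f:
                data = f.read()
        else:                                    # miss: pay object storage latency once
            data = self.fetch(block_id)
            with open(path, "wb") as f:
                f.write(data)                    # populate tier 2
        self.memory[block_id] = data             # promote into tier 1
        return data
```

A real implementation would add eviction and sizing policies per tier; the point here is only the lookup order and where each tier's latency cost is paid.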

Having a very fast local disk available not only benefits the query workload by significantly increasing the available caching space; it also enables much faster formation of the large blocks to be written to object storage, without relying exclusively on main memory for this purpose. As a result, the high-latency writes to object storage are performed only with fully formed blocks, which also plays a significant role in the ingest performance of the object storage support.
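
As a sketch of that write path, the following hypothetical writer appends pages to a spill file on fast local disk and issues a single upload to object storage only once a fully formed block has accumulated. The 32 MiB target size, the file layout, and the `upload` hook are assumptions made for the example.

```python
# Minimal sketch of forming large write blocks on fast local disk before
# uploading them to object storage, instead of staging them in main memory.
import os
import uuid

TARGET_BLOCK_SIZE = 32 * 1024 * 1024  # illustrative target block size

class BlockWriter:
    def __init__(self, spill_dir: str, upload):
        os.makedirs(spill_dir, exist_ok=True)
        self.spill_path = os.path.join(spill_dir, f"spill-{uuid.uuid4()}.tmp")
        self.spill = open(self.spill_path, "wb")  # local NVMe/SSD, not RAM
        self.upload = upload                      # hypothetical object storage PUT

    def append(self, page: bytes) -> None:
        self.spill.write(page)                    # cheap local write
        if self.spill.tell() >= TARGET_BLOCK_SIZE:
            self._flush()                         # one fully formed block per upload

    def _flush(self) -> None:
        self.spill.close()
        with open(self.spill_path, "rb") as f:
            self.upload(f.read())                 # single high-latency write
        self.spill = open(self.spill_path, "wb")  # reuse the spill file
```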