Over the past 15 years, key-value (KV) stores have emerged as a popular solution for building large-scale data platforms.
KV stores expose a basic data model that maps unique keys to values, much like a hash map. This data model comes with the promise of high flexibility and scalability, which are generally considered pain points of traditional transactional processing systems based on the relational model. Indeed, effectively sharding a data set across several cores or machines and supporting efficient operations such as joins over multiple tables are very challenging tasks. As a result, KV stores have been adopted in many deployments as a replacement for traditional database systems.
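The KV data model described above can be sketched as a thin wrapper around a hash map. The class and method names below are illustrative, not taken from any particular store:

```python
# Minimal sketch of the KV data model: unique keys map to opaque values,
# exposed through put/get/delete operations. Illustrative names only.
from typing import Optional

class KVStore:
    def __init__(self):
        self._data = {}  # backing hash map

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value  # upsert: overwrites any existing value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)  # None if the key is absent

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)

store = KVStore()
store.put(b"user:42", b'{"name": "Ada"}')
print(store.get(b"user:42"))  # b'{"name": "Ada"}'
store.delete(b"user:42")
print(store.get(b"user:42"))  # None
```

The simplicity of this interface is precisely what makes it easy to shard: any key can be routed to a core or machine independently of all others.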
KV stores embedded as storage engines within database systems
With time, KV store designs have become more mature, providing additional functionalities, such as efficient range queries, durability, and atomicity. These are key requirements for databases and are typically implemented from scratch in such systems. The increased set of KV native functionalities has led database designers to reconsider the historical contraposition between KV stores and database systems and to embrace designs where KV stores are embedded as storage engines within database systems. In these designs, the transaction processing logic is decoupled from the data storage and retrieval subsystem, which is taken care of by the KV store.
Examples of database systems that embrace this design are Microsoft's Deuteronomy, which is built around the BW-tree KV store; Apple's FoundationDB, which has employed SQLite and is now shipping with the new RedWood KV store; Google's Spanner, which has been layered on top of a Bigtable-based KV store; MongoDB, which uses WiredTiger; and MariaDB, which uses Facebook's RocksDB.
This decoupled design allows for a clear separation of concerns, making performance improvements to the KV store available for the database system as a whole, with no further integration efforts.
Efficient and high-performance KV store designs
In light of this design shift, proposing efficient and high-performance KV store designs becomes relevant for a very broad set of data platform systems and use cases, and IBM Research is highly involved in projects that pursue this line of research.
Researchers from the Zurich lab have designed and implemented uDepot, a KV store built from the ground up to enable high throughput, low latency, and high efficiency with emerging NVM storage devices, such as Intel 3DXP SSDs, and with NAND flash devices, which are widely available in cloud environments.
uDepot achieves high performance and resource efficiency thanks to the synergistic implementation of two main techniques:
- A lightweight task-based runtime that can handle multiple I/O operations in parallel, thus saturating the available I/O bandwidth.
- A two-level main-memory index that dynamically adjusts its DRAM footprint to the current data-set size, while providing fast lookups and insertions thanks to high cache efficiency and a low-overhead concurrency control scheme.
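The idea behind the second technique can be illustrated with a directory of independently growing hash-table segments: resizing one segment at a time lets the DRAM footprint track the data-set size instead of doubling the whole index at once. The sketch below is a hedged illustration of this general idea, not uDepot's actual layout; all names are hypothetical:

```python
# Hedged sketch of a two-level main-memory index: a fixed first-level
# directory routes each key to a second-level segment, and each segment
# grows independently as it fills. Illustrative only, not uDepot's code.

class Segment:
    def __init__(self, capacity=4):
        self.slots = {}          # stand-in for a compact hash table
        self.capacity = capacity

    def insert(self, key, location):
        if len(self.slots) >= self.capacity:
            self.capacity *= 2   # grow only this segment, not the whole index
        self.slots[key] = location

class TwoLevelIndex:
    def __init__(self, directory_bits=8):
        self.mask = (1 << directory_bits) - 1
        self.directory = [Segment() for _ in range(1 << directory_bits)]

    def _segment(self, key):
        return self.directory[hash(key) & self.mask]

    def insert(self, key, location):
        self._segment(key).insert(key, location)

    def lookup(self, key):
        # Returns the on-storage location of the value, or None.
        return self._segment(key).slots.get(key)

idx = TwoLevelIndex()
idx.insert(b"k1", (0, 4096))   # (device offset, size): illustrative
print(idx.lookup(b"k1"))       # (0, 4096)
```

Keeping only key-to-location mappings in DRAM, with the values themselves on the storage device, is what makes the memory footprint proportional to the number of keys rather than to the data volume.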
Overall, uDepot delivers up to 2x higher throughput than existing systems on the YCSB workloads and is able to fully utilize the available I/O bandwidth, even when deployed on top of 20 storage devices. Additional details on uDepot can be found in the paper published at the USENIX FAST conference.
IBM Research is also active in the development of FoundationDB and has already contributed with two improvements to its KV store component. The first contribution is the design of an in-memory KV data structure that is used by the storage layer of FoundationDB to buffer the incoming writes before committing them to disk. This data structure is based on the Adaptive Radix Tree, and replaces the existing Red-black tree implementation, providing up to 20% higher write throughput.
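The Adaptive Radix Tree belongs to the family of byte-wise radix trees (tries), which avoid the per-node key comparisons and rebalancing of a red-black tree. The sketch below shows a plain radix tree to convey the idea; a full ART additionally uses adaptively sized nodes and path compression, and this is not the FoundationDB implementation:

```python
# Minimal byte-wise radix tree (trie) sketch. Each byte of the key selects
# a child, so lookups cost O(key length) with no comparisons or rebalancing.
# A real ART replaces the per-node dict with adaptively sized node layouts.

class Node:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}  # byte -> child node
        self.value = None

class RadixTree:
    def __init__(self):
        self.root = Node()

    def insert(self, key: bytes, value):
        node = self.root
        for b in key:
            node = node.children.setdefault(b, Node())
        node.value = value

    def get(self, key: bytes):
        node = self.root
        for b in key:
            node = node.children.get(b)
            if node is None:
                return None
        return node.value

buf = RadixTree()
buf.insert(b"apple", 1)  # shares the "appl" prefix path with the next key
buf.insert(b"apply", 2)
print(buf.get(b"apple"))  # 1
print(buf.get(b"apply"))  # 2
```

Because keys with common prefixes share paths, such a structure is also well suited to the sorted-key workloads that a write buffer sees before flushing to disk.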
The second contribution is the implementation of a new page caching scheme that adopts the least recently used (LRU) replacement policy. This policy aims to retain frequently accessed pages in memory and provides up to 10% higher hit rates on skewed workloads compared to the default random replacement policy. Both contributions have been made directly to the public repository and in collaboration with the FoundationDB community.
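An LRU policy can be sketched in a few lines: a hit moves the page to the most-recently-used end of an ordered structure, and a miss on a full cache evicts from the least-recently-used end. This is an illustration of the policy only, not FoundationDB's page cache code:

```python
# Minimal LRU page cache sketch: hits move a page to the MRU end;
# when full, the page at the LRU end is evicted. Illustrative only.
from collections import OrderedDict

class LRUPageCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> page contents, LRU first

    def get(self, page_id):
        page = self.pages.get(page_id)
        if page is not None:
            self.pages.move_to_end(page_id)  # mark as most recently used
        return page

    def put(self, page_id, page):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
        elif len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict the LRU page
        self.pages[page_id] = page

cache = LRUPageCache(capacity=2)
cache.put(1, b"page-1")
cache.put(2, b"page-2")
cache.get(1)             # touch page 1: page 2 is now the LRU entry
cache.put(3, b"page-3")  # evicts page 2
print(cache.get(2))      # None
print(cache.get(1))      # b'page-1'
```

On a skewed workload, the hot pages keep getting touched and therefore stay near the MRU end, which is why LRU can beat random replacement on hit rate.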
Additional details on the involvement of IBM Research in the FoundationDB project can be found in this presentation, given at the 2019 FoundationDB summit.