June 23, 2020 | By Diego Didona and Nikolas Ioannou | 3 min read

Over the past 15 years, key-value (KV) stores have emerged as a popular solution for building large-scale data platforms.

KV stores expose a basic data model that maps unique keys to values, much like a hash map. This data model comes with the promise of high flexibility and scalability, which are generally considered to be pain points of traditional transactional processing systems based on the relational model. In fact, effectively sharding a data set across several cores or machines and supporting efficient operations such as joins over several tables are very challenging tasks in relational systems. As a result, KV stores have been adopted in many deployments as a replacement for traditional database systems.
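The hash-map-like data model described above can be sketched in a few lines. This is an illustrative in-memory stand-in only (the `KVStore` class name and its methods are hypothetical, not any real system's API); production KV stores add persistence, concurrency control, and much more.

```python
# Minimal sketch of the KV data model: unique keys map to opaque values,
# behind a hash-map-like interface. Illustrative only.
class KVStore:
    def __init__(self):
        self._data = {}  # in-memory hash map: key -> value

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None if the key is absent

    def delete(self, key):
        self._data.pop(key, None)

store = KVStore()
store.put(b"user:42", b"Alice")
print(store.get(b"user:42"))  # -> b'Alice'
```

Because the interface makes no assumptions about relationships between keys, each key can live on any shard, which is what makes horizontal scaling straightforward.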

KV stores embedded as storage engines within database systems

With time, KV store designs have become more mature, providing additional functionalities, such as efficient range queries, durability, and atomicity. These are key requirements for databases and are typically implemented from scratch in such systems. The increased set of KV native functionalities has led database designers to reconsider the historical contraposition between KV stores and database systems and to embrace designs where KV stores are embedded as storage engines within database systems. In these designs, the transaction processing logic is decoupled from the data storage and retrieval subsystem, which is taken care of by the KV store.
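Range queries, one of the functionalities mentioned above, require the store to maintain keys in order rather than in an unordered hash map. A hedged sketch of this idea, using a sorted key list as a stand-in for the ordered index structures real engines use (the `OrderedKVStore` name is hypothetical):

```python
import bisect

# Illustrative sketch of range-query support: keys are kept sorted so a
# scan can return all pairs with keys in [start, end) in key order.
class OrderedKVStore:
    def __init__(self):
        self._keys = []   # sorted list of keys
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._data[key] = value

    def range_query(self, start, end):
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

store = OrderedKVStore()
for k in (b"a", b"c", b"b", b"d"):
    store.put(k, k.upper())
print(store.range_query(b"b", b"d"))  # -> [(b'b', b'B'), (b'c', b'C')]
```

Real engines typically use B-trees, LSM-trees, or radix trees for this, which also amortize the cost of keeping data ordered on disk.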

Examples of database systems that embrace this design are Microsoft’s Deuteronomy, which is built around the Bw-tree KV store; Apple’s FoundationDB, which has employed SQLite and is now shipping with the new Redwood KV store; Google’s Spanner, which has been layered on top of a Bigtable-based KV store; MongoDB, which uses WiredTiger; and MariaDB, which uses Facebook’s RocksDB.

This decoupled design allows for a clear separation of concerns, making performance improvements to the KV store available for the database system as a whole, with no further integration efforts.

Efficient and high-performance KV store designs

In light of this design shift, proposing efficient and high-performance KV store designs becomes relevant for a very broad set of data platform systems and use cases, and IBM Research is highly involved in projects that pursue this line of research.

Researchers from the Zurich lab have designed and implemented uDepot, a KV store built from the ground up to enable high throughput, low latency, and high efficiency with emerging NVM storage devices, such as Intel 3DXP SSDs, and with NAND flash devices, which are widely available in Cloud environments. 

uDepot achieves high performance and resource efficiency thanks to the synergistic implementation of two main techniques:

  1. A lightweight task-based runtime that can handle multiple I/O operations in parallel, thus saturating the available I/O bandwidth.
  2. A two-level main-memory index that dynamically adjusts its DRAM footprint to the current data-set size, while providing fast lookups and insertions thanks to high cache efficiency and a low-overhead concurrency control scheme.

Overall, uDepot delivers up to 2x higher throughput than existing systems on the YCSB workloads and is able to fully utilize the available I/O bandwidth, even when deployed on top of 20 storage devices. Additional details on uDepot can be found in the paper published at the USENIX FAST conference.

Other contributions

IBM Research is also active in the development of FoundationDB and has already contributed with two improvements to its KV store component. The first contribution is the design of an in-memory KV data structure that is used by the storage layer of FoundationDB to buffer the incoming writes before committing them to disk. This data structure is based on the Adaptive Radix Tree, and replaces the existing Red-black tree implementation, providing up to 20% higher write throughput. 
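The write-buffering pattern described above can be sketched as follows. This is a hedged stand-in, not FoundationDB's code: a sorted list plays the role of the ordered in-memory structure (the actual contribution uses an Adaptive Radix Tree), and `WriteBuffer` and `flushed` are hypothetical names.

```python
import bisect

# Illustrative sketch: incoming writes accumulate in an ordered in-memory
# buffer and are committed to storage in key order once a threshold is hit.
class WriteBuffer:
    def __init__(self, flush_threshold=3):
        self._keys = []        # sorted keys (stand-in for an ordered tree)
        self._buf = {}         # buffered key -> value
        self._threshold = flush_threshold
        self.flushed = []      # stand-in for durable storage

    def write(self, key, value):
        if key not in self._buf:
            bisect.insort(self._keys, key)
        self._buf[key] = value
        if len(self._buf) >= self._threshold:
            self.flush()

    def flush(self):
        # the ordered structure lets us emit writes in key order, which
        # is what on-disk structures like B-trees want to ingest
        self.flushed.extend((k, self._buf[k]) for k in self._keys)
        self._keys.clear()
        self._buf.clear()
```

The choice of in-memory structure matters here because every write pays its insertion cost; a radix tree avoids the per-node comparisons and rebalancing of a red-black tree, which is consistent with the throughput gain reported above.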

The second contribution is the implementation of a new page caching scheme that adopts a least recently used (LRU) replacement policy. This policy aims to retain frequently accessed pages in memory and provides up to 10% higher hit rates on skewed workloads compared to the default random replacement policy. Both contributions have been made directly to the public repository and in collaboration with the FoundationDB community.
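An LRU page cache of the kind described above can be sketched in a few lines of Python using an ordered dictionary. This is a minimal illustration of the policy, not FoundationDB's implementation; the `LRUPageCache` name is hypothetical.

```python
from collections import OrderedDict

# Minimal LRU page cache sketch: a hit moves the page to the
# most-recently-used position; when the cache is full, the least
# recently used page is evicted to make room.
class LRUPageCache:
    def __init__(self, capacity):
        self._capacity = capacity
        self._pages = OrderedDict()  # page_id -> data, in LRU order

    def get(self, page_id):
        if page_id not in self._pages:
            return None                     # miss: caller reads from disk
        self._pages.move_to_end(page_id)    # hit: mark most recently used
        return self._pages[page_id]

    def put(self, page_id, data):
        if page_id in self._pages:
            self._pages.move_to_end(page_id)
        elif len(self._pages) >= self._capacity:
            self._pages.popitem(last=False)  # evict least recently used
        self._pages[page_id] = data
```

On a skewed workload, hot pages keep getting touched and therefore never drift to the eviction end of the queue, which is exactly why LRU outperforms random replacement there.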

Additional details on the involvement of IBM Research in the FoundationDB project can be found in this presentation, given at the 2019 FoundationDB summit.
