In the latest Database Deep Dives, we had the pleasure of catching up with Liu Tang, the Chief Engineer at PingCAP and a maintainer of TiKV.
We discussed PingCAP's approach to building not only a MySQL-compatible, HTAP database, but the underlying technologies that make it possible, like TiKV, and their broader ecosystem and offerings.
Read on to learn about how TiKV is incubating with the Cloud Native Computing Foundation, the technical trade-offs associated with Google's Percolator model, and how TiKV compares to something like FoundationDB.
What’s your role at PingCAP and how did you get involved there?
Liu Tang (LT): I’m Liu Tang, the Chief Engineer at PingCAP. I’m a senior maintainer on the TiKV project and the author of go-ycsb and LedisDB. I was the first employee to join PingCAP, besides the three founders. I lead and mentor the TiKV team (which now spreads across four countries) at PingCAP as well as the TiKV community worldwide. Along with Shen Li, I also build and foster TiDB and its community.
I work remotely in Guangdong, China and live with my wife and daughter. I’m now an experienced world traveler, but I’m mostly interested in the people I’m meeting, not the traveling itself. I occasionally write on my technical blog and Twitter, but I spend most of my time on GitHub.
For readers that may not be familiar, what are TiDB and TiKV?
(LT): TiDB and TiKV are the two biggest projects for us right now! TiDB, our flagship product, is a highly scalable, distributed, cloud-native NewSQL database. It’s totally open source, written in Go, and supports hybrid transactional/analytical processing (HTAP) workloads. While it supports MySQL well enough to operate products like WordPress and Confluence without patches, it also provides features such as horizontal scalability, strong consistency, and high availability. It has been battle-tested in production by over 500 enterprises across multiple industries.
Behind every TiDB cluster lies a TiKV cluster. TiKV is a distributed transactional key-value database originally created by PingCAP as the underlying storage engine for TiDB. It is now adopted as a Cloud Native Computing Foundation (CNCF) Incubating Project. TiKV is developed with the intention of creating a common cloud-native data substrate. It works alongside PD, our coordinator, to keep the cluster in order and tame the chaos of distributed transactions.
Over the last few years, we’ve started to see this vision come to fruition, and we’re very excited to have such a vibrant community involvement in our projects. From projects like Titan, TiPrometheus, and Tidis to tools like the tikv-browser, we’re seeing more and more people adopt our ideas and technology, and we couldn’t be happier.
TiDB adopts the Percolator model from Google. What are the tradeoffs there that users should keep in mind, like the relationship between scale and latency?
(LT): You’re right, Percolator does have some tradeoffs. In particular, this model is not ideal for workloads with many conflicting transactions or with extremely low latency requirements.
While we have many users reporting sub-50ms query times in production clusters, the Percolator model requires TiKV to contact PD, the timestamp coordinator, twice for each transaction: once at the BEGIN, and once at the COMMIT. These round trips take time, but with careful topology management users can reduce the impact.
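To make the two round trips concrete, here is a minimal Python sketch. It is not TiDB or TiKV code: `TimestampOracle` and `Transaction` are hypothetical names, and the oracle is a single in-process counter standing in for PD. It only illustrates that writes are buffered locally and that one timestamp is fetched at BEGIN and one at COMMIT.

```python
import itertools


class TimestampOracle:
    """Toy stand-in for a central timestamp service (hypothetical)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.requests = 0  # number of simulated round trips

    def get_timestamp(self):
        self.requests += 1
        return next(self._counter)


class Transaction:
    """Buffers writes locally; asks the oracle for exactly two timestamps."""

    def __init__(self, oracle):
        self.oracle = oracle
        self.start_ts = oracle.get_timestamp()  # round trip 1: BEGIN
        self.commit_ts = None
        self.writes = {}

    def put(self, key, value):
        # Buffered in memory, so no oracle call happens here.
        self.writes[key] = value

    def commit(self):
        self.commit_ts = self.oracle.get_timestamp()  # round trip 2: COMMIT
        return self.commit_ts


oracle = TimestampOracle()
txn = Transaction(oracle)
txn.put("a", 1)
txn.put("b", 2)
txn.commit()
print(oracle.requests, txn.start_ts, txn.commit_ts)
```

However many keys the transaction touches, the oracle is contacted twice, which is why the network distance to the coordinator matters more than the transaction size in this simplified picture.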
Percolator has an idea of ‘secondary locks’ that need to be cleaned up on a failed transaction. This means high-conflict workloads can have poor performance characteristics in Percolator.
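The cost of conflicts can be seen in a heavily simplified, single-process sketch of a Percolator-style prewrite and commit. Everything here is an assumption made for illustration: `Store`, `Conflict`, and `run_transaction` are hypothetical names, timestamps are passed in by hand rather than fetched from a coordinator, and crash recovery is left out entirely.

```python
class Conflict(Exception):
    """Raised when a key is locked or was committed after our start_ts."""


class Store:
    def __init__(self):
        self.locks = {}   # key -> (primary_key, start_ts)
        self.writes = {}  # key -> list of (commit_ts, value)

    def prewrite(self, key, primary, start_ts):
        if key in self.locks:
            raise Conflict(key)
        # Write-write conflict: a newer commit exists for this key.
        if any(ts > start_ts for ts, _ in self.writes.get(key, [])):
            raise Conflict(key)
        self.locks[key] = (primary, start_ts)

    def commit(self, key, start_ts, commit_ts, value):
        _, lock_ts = self.locks.pop(key)
        assert lock_ts == start_ts
        self.writes.setdefault(key, []).append((commit_ts, value))

    def rollback(self, key, start_ts):
        lock = self.locks.get(key)
        if lock and lock[1] == start_ts:
            del self.locks[key]


def run_transaction(store, mutations, start_ts, commit_ts):
    keys = list(mutations)
    primary = keys[0]  # the remaining keys hold secondary locks
    locked = []
    try:
        for key in keys:
            store.prewrite(key, primary, start_ts)
            locked.append(key)
    except Conflict:
        # Failed transaction: every lock taken so far must be cleaned up.
        for key in locked:
            store.rollback(key, start_ts)
        return False
    for key in keys:  # primary first, then the secondaries
        store.commit(key, start_ts, commit_ts, mutations[key])
    return True


store = Store()
first = run_transaction(store, {"x": 1, "y": 2}, start_ts=1, commit_ts=3)
second = run_transaction(store, {"z": 9, "y": 5}, start_ts=2, commit_ts=4)
print(first, second, store.locks)
```

The second transaction locks `z`, hits a conflict on `y`, and then has to undo its lock on `z`. Under a high-conflict workload that wasted lock-and-clean-up work is repeated on every retry.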
While these things may sound like a huge bummer (and they are), we think Percolator is a very practical and understandable model. It’s timestamp-based, which means we can try other timestamp-based transaction models easily, and it’s quite decentralized, relying only on a central clock, which can be quite performant.
Even so, we introduced pessimistic locks in version 3.0, allowing us to deal better with the high-conflict workloads Percolator is known to struggle with.
Keen readers of yours might note that all of our current transaction models have consequences, and it’s more a matter of choosing the least-worst for your use case.
What does the future hold for PingCAP over the next 5-10 years? How does TiKV joining the CNCF as an incubation project affect that?
(LT): We’d like to continue working hard to improve and expand our ecosystem. We’ll keep making TiDB and TiKV more cloud-native by improving our operator for Kubernetes and working with users to battle-test our deployments on popular clouds like GKE, Azure, and AWS.
Doing this will help us refine our user experience, performance, and stability, but that’s only the start of the effort. We recognize that most cloud-native technologies are fairly hard to grasp and use, and we’d like to be an exception rather than the rule.
Not content to stand idle when there are exciting engineering problems to be solved, our team is also working on boosting our HTAP (hybrid transactional/analytical) workload performance by designing new technologies like the columnar store TiFlash.
How does TiKV joining the CNCF affect that?
(LT): Well, we built TiKV with the intention of it being a building block for other technologies, not just TiDB and TiSpark. The CNCF’s interest only confirmed our aspiration, and we couldn’t be more excited to have them steward the project and help us foster its growth.
TiKV—and ultimately all of our projects—has benefitted immensely from all of the knowledge and mentoring we’ve received from the foundation. While some companies treat foundations like graveyards, we’re treating this as a long-term investment, and our TiKV team has only grown since the CNCF adopted TiKV. We hope to continue this trend, growing both our team at PingCAP and our community maintainership.
Continuing on the thread of TiKV, how does that compare with something like FoundationDB?
(LT): We think FoundationDB is way cool and we’ve been admiring their work for years now. We have an ongoing discussion about how/where FoundationDB and TiKV differ but, ultimately, we think our users are choosing TiKV for the vibrant ecosystem around it and the rock-solid guarantee of enterprise-level support and our legacy of open source stewardship.
Technically, the transaction models differ: FoundationDB uses Paxos for metadata, replicating logs to all replicas, while TiKV uses Multi-Raft for all its data. While Raft is essentially just a limited form of Paxos, research has shown that Raft is considerably easier for operators to reason about, which means fewer mistakes when the network is in chaos.
TiKV’s coprocessor, when harnessed by query layers like TiDB, offers a distinct advantage for users looking for an infrastructure building block, not just a key-value store.
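As a rough illustration of the Multi-Raft idea, the sketch below splits the key space into ranges, gives each range its own replica group, and treats a write as committed once a majority of that group acknowledges it. This is a toy model rather than TiKV's implementation: `Region`, `region_for`, `can_commit`, the node names, and the split points are all hypothetical, and leader election and log replication are omitted.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Region:
    start_key: str   # inclusive lower bound of this key range
    replicas: tuple  # nodes forming this range's replication group

    def quorum(self):
        # Majority of the group, e.g. 2 out of 3.
        return len(self.replicas) // 2 + 1


# Regions sorted by start key; each one is replicated independently.
regions = [
    Region("", ("n1", "n2", "n3")),
    Region("h", ("n2", "n3", "n4")),
    Region("p", ("n1", "n3", "n4")),
]
starts = [r.start_key for r in regions]


def region_for(key):
    """Find the region whose range contains the key."""
    return regions[bisect.bisect_right(starts, key) - 1]


def can_commit(region, acks):
    """A write commits once a majority of the region's replicas ack it."""
    return len(set(acks) & set(region.replicas)) >= region.quorum()


target = region_for("kiwi")
print(target.start_key, can_commit(target, ["n2", "n4"]))
```

Because each range has its own group, losing one node only affects the ranges that node replicates, and each of those ranges can still commit as long as a majority of its replicas remains reachable.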
Venturing into another layer of PingCAP’s portfolio, what’s the advantage of using TiSpark as a modular addition to TiKV versus using a typical Spark cluster outside of the database?
(LT): TiSpark was the second stateless query layer we wrote for TiKV. While Spark supports SQL, when we started investigating, we realized we could eke out even more performance by leveraging the metadata that PD provides for the Spark Catalyst Engine, meaning that users will see performance improvements using TiSpark over just attaching Spark to TiDB.
TiSpark means TiKV is more than just a data source that gets loaded into the Spark cluster: it actually rewrites the execution plan to leverage the TiKV cluster’s computing power through the coprocessor. This results in much better data locality and reduced network traffic.
For many users, TiSpark is also their second query layer, but users can also find Titan (Redis), TiPrometheus (Prometheus), titea (Redis), and Tidis (Redis).
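The benefit of pushing work down to the storage layer can be shown with a small, generic Python sketch. It does not use the TiSpark or TiKV coprocessor APIs. The `nodes` data, `scan_all`, and `pushed_down_sum` are hypothetical and only contrast shipping every row to the query layer with shipping one partial result per storage node.

```python
# Three hypothetical storage nodes, each holding a slice of an orders table.
nodes = [
    [{"id": 1, "amount": 30}, {"id": 2, "amount": 120}],
    [{"id": 3, "amount": 75}, {"id": 4, "amount": 200}],
    [{"id": 5, "amount": 10}],
]


def scan_all(node):
    """No pushdown: the node returns every row it stores."""
    return list(node)


def pushed_down_sum(node, min_amount):
    """Pushdown: the node filters and aggregates locally."""
    matching = [row["amount"] for row in node if row["amount"] >= min_amount]
    return sum(matching), len(matching)


# Without pushdown, every row crosses the network before being filtered.
rows_shipped = [row for node in nodes for row in scan_all(node)]
total_naive = sum(r["amount"] for r in rows_shipped if r["amount"] >= 50)

# With pushdown, each node ships a single (sum, count) pair.
partials = [pushed_down_sum(node, 50) for node in nodes]
total_pushed = sum(partial_sum for partial_sum, _ in partials)

print(len(rows_shipped), len(partials), total_naive, total_pushed)
```

Both approaches compute the same total, but the pushed-down version moves one small record per node instead of every stored row, which is the data-locality and network-traffic effect described above.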
What should users know if they want to run TiKV and TiDB themselves?
(LT): Users should be prepared for a distributed system. They’ll notice considerably more complexity in setting up a TiDB cluster compared to, say, a MySQL master and replica, but this complexity will pay off when they’re scaling from one to hundreds of nodes.
We’d also love to invite folks to come and chat with our community! We can help you through the entire lifecycle with TiDB, TiKV, TiSpark, or anything else. PingCAP offers comprehensive support for all our products, from evaluation and design, to proof-of-concept and benchmarking, all the way through to deploying them in production, migrating your data, and performing a seamless handoff. We also have managed TiDB clusters you can try out without provisioning any hardware!