Database Deep Dives with Andy Pavlo

12 min read

In this iteration of Database Deep Dives, we had the pleasure of catching up with Professor Andy Pavlo.

Andy Pavlo is an Associate Professor in the Computer Science Department at Carnegie Mellon University (CMU), where his research focuses on database management systems like self-driving databases, transaction processing systems and large-scale data analytics. Andy is also the CEO and co-founder of OtterTune, an automated database configuration optimization product. 

In this interview, we cover a wide range of topics, such as CMU’s new autonomous database project, NoisePage, what his Ph.D. students are up to and predictions for the future of databases. 

Background and experience with database management systems

Josh Mintz (JM): Hi Andy, glad to have you with us today. To get us started, can you introduce yourself and give us some background in the database space and what you've been up to at Carnegie Mellon University?

Andy Pavlo (AP): Thanks Josh, happy to be here. I am an Associate Professor in the Computer Science Department at Carnegie Mellon University (CMU) and have been at CMU since 2013. My area of research is database management systems (DBMSs).

I completed my Ph.D. from Brown University under Stan Zdonik and Mike Stonebraker. When I was in grad school, my core research was on a transaction processing system called H-Store, which was later commercialized as VoltDB. With H-Store and VoltDB, it was all about getting insane transaction performance, as we designed these systems to run transactions as fast as possible.

That is an important problem to solve, but most people do not need a DBMS that can execute 10 million transactions a second. So, when I started at CMU, I became interested in other aspects of databases that could potentially help a larger number of people. I saw a recurring problem theme — one that dates back to the 1970s — that people were struggling to maintain, configure and deploy DBMSs. The labor cost of running databases is high. That is why my research has been on trying to alleviate these pain points.

I divide my research into two tracks. The first track is on black-box methods for optimizing existing database systems (MySQL, PostgreSQL, Oracle). The second track is on white-box methods that take a clean slate approach to building a DBMS from scratch, with the goal that it should be autonomous.

For the first track on tuning existing systems, we built an automated service called OtterTune that uses machine learning to optimize a DBMS when you do not know anything about the system's internals and can only use its existing APIs to tune it. There is a long history of research on other automated tuning methods for database physical design, like picking indexes, sharding keys and materialized views. IBM did a bunch of great work in this space for Db2 in the early 2000s and Microsoft as well with the AutoAdmin project. But these previous methods require access to the application's data and its queries, which brings up security and privacy issues. With Ottertune, we focused on tuning the DBMS's configuration "knobs" because it potentially does not require access to sensitive data. Knobs are also an aspect of DBMSs that have grown in complexity, and there is not a lot of work on automated tuning tools in this area.

For background, these configuration knobs are runtime parameters that control the system's behavior like buffer sizes, caching policies and log file size. But tuning a DBMS's knobs to make it perform well is a difficult task, historically. Every DBMS has its own set of knobs; the names are different, but they control similar things at a high level. And the number of knobs that these systems expose grows over time. For example, when we looked at the last 20 years of releases for MySQL and Postgres, we found that the number of knobs that these systems expose increased by 7x and 5x, respectively.

These DBMSs have so many knobs because their developers add features that require somebody to control some aspect of it (e.g., how much memory to allocate for a hash table). Instead of hardcoding the value, they expose it as a knob and say, "someone else who knows how the application works will use the database will set this value correctly." And, of course, that never happens. DBMSs are complex pieces of software, and few people know how to tune them!

The way OtterTune works is that we first observe the runtime metrics of the system. These metrics are internal performance counters that provide a view into the internal behavior of the system. For example, there are metrics on how much data the system reads and writes from disk. We then train a statistical model that predicts how the DBMS will perform as we change the knobs according to an objective function, such as throughput or latency. OtterTune uses its recommendation algorithm on these models to generate a new value for the knobs. It then installs the new configuration in the DBMS and repeats the process. Over time, the service learns that it makes things better or worse if it tunes the knobs a certain way.

Research track on building self-driving systems

JM: That’s interesting. Let’s now talk about your second research track on building self-driving systems. Your team recently decided to move on from your first attempt in this space, called Peloton, and are now working on a new self-driving DBMS called NoisePage. What architectural mistakes did Peloton make that were fixed in NoisePage?

AP: Peloton was a relational database management system that we were building at CMU for a couple of years. Rather than tuning the database from the outside like with OtterTune, our goal with Peloton was to design a DBMS from the ground-up for autonomous operation. There were several mistakes that we made in Peloton that cover both the software engineering side and architecture side.

With software engineering, the biggest problem was that we were not checking the system's performance carefully over long periods of time. Students would write new code and then test the end-to-end system for 60 seconds. They might say, "It's 1% slower, but who cares?" Over time, we accumulated so much technical debt, and the DBMS's performance was terrible. In the first public demo of Peloton in 2017, it executed 100,000 TPC-C transactions per second over JDBC. The final version was down to six transactions a second! Nobody was checking these things over more extended periods as we added new code to Peloton. 

In 2018, we decided to scrap the entire Peloton codebase and start over with a new DBMS called NoisePage. With this new DBMS, we have a more rigorous CI pipeline, testing pipeline and performance pipeline using a combination of Jenkins, GitHub actions and custom tools. This kind of performance tracking is common knowledge for experienced DBMS developers, but it is tricky in my environment because I am working with students. To be clear, I have phenomenal students, but they always graduate and get hired at top companies!

This constant churn from one semester to the next means that I might have a student only working with us for about one year before they leave. We realized that if we did not have something in place that enforces these constraints and makes sure that we were writing high-quality code, NoisePage would suffer the same fate as Peloton. We now think that we have the proper guardrails and monitoring tools to ensure that we do not repeat the same mistakes as Peloton.

On the architectural side, we realized a couple of things were a mistake with Peloton's design that we did differently in NoisePage. I will give a couple of examples, from a pure performance standpoint and another for the self-driving functionality aspect.

For performance, we built an open-source version of the Bw-Tree lock-free data structure for our database indexes, but it turned out to be slower than I had expected. The Bw-Tree originally came out of the Hekaton project from Microsoft. I am a huge fan of Microsoft's original Bw-Tree paper from ICDE'13. When I started at CMU, I said, "We should build our own Bw-tree in Peloton because that would be fun to do, and Microsoft's performance numbers against a regular B+Tree showed that it was better." But, it turns out not to be the case for pure in-memory workloads. Our SIGMOD'18 paper shows that a well-written B+Tree can crush the Bw-Tree in terms of performance.

Another aspect from the software engineering side is that for the original version of Peloton, we did code generation or just-in-time (JIT) query compilation using LLVM. But the way we did it was the same way that Thomas Neumann did it in his HyPer paper from SIGMOD 2011, where the C++ code generates the LLVM's intermediate representation (IR) directly then compiles it into machine code. This approach is great for performance, but it is difficult for students to maintain and debug the code. If the system crashes when running one of these compiled queries, the only information you have is the query's assembly code; there is no stack trace to step through with a debugger. In NoisePage, we now take the same approach as SingleStore (previously known as MemSQL) from their 2016 High-Scalability article.

The DBMS first converts a query plan into a domain-specific language (DSL) specific to our DBMS. This DSL looks like C, but it describes the query's high-level steps (e.g., scanning tables, building hash tables). The system then compiles the DSL into opcodes. It can then either interpret those opcodes to execute the query using a custom VM (similar to SQLite) or compile those opcodes into machine code with the LLVM. With this new approach, if a student breaks something, we can at least go through the interpreter and debug it. This new approach has been a massive win for us in terms of productivity because students no longer need to be LLVM experts to work on NoisePage's execution engine. 

Reflection on the self-driving support mistakes

JM: And what about the mistakes for the self-driving support?

AP: With Peloton, we wrote many papers on how to get the best performance of different components. That is no longer our overarching design goal with NoisePage. Instead, our focus is on stable performance. The longtail makes it hard to train accurate models if things get noisy at the end. Another mistake is that we did not design the system to be able to make any possible change without requiring a restart before that change takes effect. A self-driving DBMS should never require a restart because whether the database can have downtime is an external cost that the algorithms cannot easily reason about in their calculations. That requires a human to make a value judgment.

The most challenging problem that we have been focused on over the last two years is modeling. The best way to understand this is to use self-driving cars as an example. I do not want to push the self-driving car metaphor too much, but they heavily inspire our database research. The high-level approach is the same. What does a self-driving car do? The car has LIDAR and cameras that it uses to look down the road. Then it uses forecasting to try to predict, "here are the objects around me and here is where they will be in the future." These models guide planning components in choosing how to steer the car.

A self-driving database sort of operates in the same way. It has forecasting models to predict what queries it is going to execute in the future. Then you need behavior models that can predict how much more work the system will have to do to execute those queries, given the database's current state. Then you have a search algorithm or a planning policy that says, “given the forecast and behavior models, here is the action I should take in the future to tell me how to improve my objective function.”

We think we have solved the first two in NoisePage (i.e., forecasting and behavior modeling). We are working on the third — action planning. You must design the system from the ground up to record the necessary information to support self-driving operations. For example, we model individual components of the system at the lowest level, like building a hash table or probing an index. These models predict how much resources the DBMS will use when it executes these components during query execution, including CPU, disk and memory utilization. The planning components can then use these models to estimate which actions (e.g., building an index on a table) will decrease the system's resource utilization and potentially improve performance.

JM: So, with all these machine learning (ML) features, do you need GPUs to run NoisePage or are you doing something that uses CPUs to do the inference?

AP: Future DBMSs may need additional computational resources (e.g., GPUs, TPUs) to accelerate their ML components in self-driving operations. The current version of NoisePage’s self-driving components uses stock PyTorch, so it runs off the CPU. We could use a GPU in the future if it becomes necessary. This highlights another advantage of using fine-grained behavior models for DBMS components because you do not have to use GPUs. Each individual model is not excessively large, and we can easily train it with a CPU, which would not be possible if one used a giant model for the entire system.

The impact of industry trends on students

JM: What are industry players doing that are exciting your students? And what are your students working on right now that excites you?

AP: Dana Van Aken is finishing up her Ph.D. on the OtterTune academic project. We just completed a 10-month field study at a major French bank that shows OtterTune. Our VLDB'21 paper from this study show that OtterTune was able to improve the performance of a real-world Oracle database by up to 50% even after expert DBAs had already tuned it. Given these experiments' success, Dana and I have spun out OtterTune from the university as a new start-up and will be announcing even more results later in 2021. There some other companies working in this space, such as Unravel Data, but they focus on tuning "big data" frameworks, like Hadoop and Spark.

My Ph.D. student Prashanth Menon is working on NoisePage's code generation engine using the LLVM. He studies how the system can avoid taking an all-or-nothing-approach to query compilation and allow the DBMS to make adaptive changes to queries without recompiling. When he started in 2015, there were not many DBMSs out there doing code generation. This has changed in recent years; more open-source and commercial systems are using this technique. Even Postgres added limited JIT compilation in 2018. But everyone is still taking the "set it and forget it" approach to this. He's working on a way that you can adapt and improve things based on what you see in the data.

Another one of my Ph.D. students, Lin Ma, is leading the self-driving research effort in NoisePage. The industry work in this space is limited to the big tech companies. Microsoft is still at the forefront of automated databases with their AutoAdmin project. Oracle's self-driving play is interesting, but they are not too public about their system works yet. To the best of our knowledge, their approach is to simply automate the tools that they built in the previous years for DBAs. 

My other Ph.D. student, Matt Butrovich, investigates how to use eBPF to build a non-intrusive observability framework in the DBMS to extract out the training data we need for our models. Our current approach is to use heavy-weight instrumentation in the DBMS to collect the training data offline. eBPF potentially will allow us to collect this training data while the system is online to improve its accuracy and adaptability to the DBMS's operating environment. We are inspired by Brendan Gregg's work on eBPF for getting the training data we need for our models.

Lastly, my Ph.D. student Wan Shen Lim is looking at how to get training data from online machines without slowing down anything in production. This is related to active learning, where you need to figure out what you do not know about your system in your models and then guide the self-driving components to explore that part of the solution space. This method means the DBMS deploys actions — like building an index on a new table or changing a configuration knob to a value that it has not tried before — to see whether it helps.

Obviously, you do not want the system to be doing this on the production databases. Wan's work seeks to piggyback off database replicas for high availability and use them as guinea pigs to try out new things. If the DBMS learns that an action is beneficial, it can then apply that change to the production machine. In other words, the DBMS tries out all sorts of crazy things on the replicas behind the scenes, so the application don't see the changes, but the system still maintains its availability guarantees through replication.

One advantage to building a system from scratch is you must tackle a bunch of problems that other people have solved. But they solved it in one way, and no one has taken a holistic view. A bunch of challenges come up, like how do we how do we integrate eBPF into the system or how should we keep track of actions that the planning components try out? If we were just hacking on Postgres, I do not think it would be as interesting.

The future of databases

JM: That makes sense. So Andy, where do you see databases going in five years? In 10 years?

AP: In the next five years, I think everything is going to continue moving to the cloud. Every new database startup needs to have a cloud offering out of the gate. We will also see a more significant move into more automated administration for existing systems. It will be something like OtterTune, AutoAdmin from Microsoft or the Db2 Advisor tools from IBM. These will be tools to work with existing systems. I think we are still 5-10 years away before a system like NoisePage becomes commercially available. People will be skeptical about full automation at first, just like self-driving cars where people will still want a steering wheel. But we will see automation acceptance increase gradually.

Over the next 10 years, I also think that the hardware landscape for databases will get more diverse. This diversity will be in both computational resources and storage resources. A significant change because of this new hardware is that it will require DBMS developers to rethink how to design systems. For example, if persistent memory technologies like Intel’s Optane take off, you will have to throw away the textbook on how to build databases from the last 50 years.

Learn more

That’s it for this round of Database Deep Dives.  If you are looking to take a cloud database for a spin, IBM offers a variety of databases technologies, such as Db2, PostgreSQL, and MongoDB, that are purpose built for all your application needs.

Learn how you can get up to $1,500 in credit towards our database service (proceed to Promotions tab under Cloud Database Use Cases)

Be the first to hear about news, product updates, and innovation from IBM Cloud