Welcome back to the latest installment of Database Deep Dives! In this session, we caught up with Evan Weaver, CEO of Fauna.
We cover a great deal of ground in this article, especially around the nuances of consensus protocols, language standards, and serverless data.
Hacker Noon did a great piece with you last year so I’m going to cut to more technical questions.
Why should someone pick Fauna over CockroachDB, Yugabyte, and MemSQL?
Evan Weaver [EW]: Ultimately, these products are in very different markets. For most people who bring an operational database to market, the strategy is to copy the same interface, the same operational profile, and the same goals as existing mature systems and communities.
There are a lot of people that say, “Here is PostgreSQL or MySQL but it doesn’t scale, and I want to provide something more scalable,” but there are fundamental architectural implications to building SQL for a traditional DevOps or containerized environment that constrain the search space you can explore when you are trying to build a more perfect database.
We took a look at this when we said that we wanted to start from the right architecture and then find the right interface for that—to build a general purpose operational data fabric rather than have incremental improvement on the existing state of the art.
So, one of the biggest differences between these systems and Fauna is that they are fundamentally operated systems. You are still talking about managing clusters for specific workloads and some degree of scale. Whereas Fauna has turned it all inside out and is a serverless platform. All the tenancy, all the quality of service management, all the security . . . everything you need to access data securely at scale is inside the database kernel.
That means, in particular, that SQL is not a great fit for us because it’s not expansive enough. It doesn’t let us build the security model and low-overhead connection model we need to build. It’s a great language for analytics and a great language for OLTP, in some contexts, but it doesn’t solve the new class of serverless problems.
In terms of your specific choice, it really comes down to what stack you are coming from. People adopt a database because of the development stack they are choosing to use, not vice versa. So, if you have a traditional full-stack application that runs server-side, with all the business logic as app code in a container somewhere, incorporating a partitioned database for your backend isn’t a heavy lift and you will probably stick with the things you know.
But, if you are coming from the new stack like serverless or JAMstack, where all access is from the embedded client, those legacy tools don’t work for you at all. You are looking at something that is purpose-built for that development model and maintains the productivity benefits that you’re looking for in the serverless context. That’s why you are going to choose Fauna.
One of the things I love about Fauna is the recent focus on GraphQL. What spurred you to invest there?
[EW]: Ultimately, we built FQL, which is Fauna’s native query language that is similar to Linq in that you build object relational patterns and it’s a unification of the document, relational, and graph query paradigms. We built that to be general purpose.
What we are now doing is using it as an intermediate representation, similar to the IR in a compiler like LLVM, to interpret or JIT standard languages into FQL for execution on the database kernel. GraphQL is the first of those standard languages, and we chose it because it’s becoming the lingua franca for JAMstack and serverless development.
In particular, GraphQL solves the coupling problem between the backend and the frontend. Organizations small and large have multiple products that try to interact with the same data, which is everybody now. So, we saw a lot of interest from our community in building GraphQL adaptors, and we decided to internalize it as the first of the standard languages we are rolling out.
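To make the intermediate-representation idea above more concrete, here is a minimal Python sketch. It is not Fauna's implementation and does not use real FQL: the node types (`Scan`, `Filter`, `Project`) and the two front-end functions are hypothetical names invented for illustration. The point it tries to show is only the general compiler pattern, where different query front ends lower to one shared IR and a single executor runs that IR.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


# Hypothetical IR node types. These are illustrative only, not real FQL functions.
@dataclass(frozen=True)
class Scan:
    collection: str


@dataclass(frozen=True)
class Filter:
    source: Any
    field: str
    value: Any


@dataclass(frozen=True)
class Project:
    source: Any
    fields: Tuple[str, ...]


def lower_graphql_like(query: Dict[str, Any]) -> Project:
    """Lower a tiny GraphQL-shaped request (already parsed into a dict) to the IR."""
    node: Any = Scan(query["collection"])
    for field_name, value in query.get("where", {}).items():
        node = Filter(node, field_name, value)
    return Project(node, tuple(query["select"]))


def lower_sql_like(table: str, columns: List[str], where: Dict[str, Any]) -> Project:
    """Lower a tiny SQL-shaped request (SELECT columns FROM table WHERE ...) to the IR."""
    node: Any = Scan(table)
    for field_name, value in where.items():
        node = Filter(node, field_name, value)
    return Project(node, tuple(columns))


def execute(node: Any, data: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Single executor: it only understands IR nodes, never the source languages."""
    if isinstance(node, Scan):
        return list(data[node.collection])
    if isinstance(node, Filter):
        rows = execute(node.source, data)
        return [row for row in rows if row.get(node.field) == node.value]
    if isinstance(node, Project):
        rows = execute(node.source, data)
        return [{f: row.get(f) for f in node.fields} for row in rows]
    raise TypeError(f"unknown IR node: {node!r}")
```

In this sketch, the same request expressed through either front end lowers to an identical IR tree, so both interfaces read the same data without copying it into a second system. A real query compiler would also need parsing, type checking, and optimization, all of which are left out here.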
Fauna positions itself as a database for serverless applications and data, but pay-go metered pricing isn’t new. What functionality and architectural implementations do you provide that you believe distinguish FaunaDB in this camp?
[EW]: To deliver serverless data, you need more than just consumption-based pricing. If you look at the legacy databases—even the legacy cloud databases like Dynamo, for example—the fundamental model is still scaling the physical machine down. You are still thinking about the location of your servers and your tables, provisioning limits, hot spots for shards, you still have to work through this metaphor of a physical computer to build your application.
The real kernel of serverless is that that metaphor is gone. But to make it go away at the data level requires a lot more than granular billing by the minute. It does require consumption-based billing, and Fauna bills by resources consumed. But it also requires things these legacy systems can’t provide, like no cold starts or connection overheads, because you have clients and lambda functions connecting and disconnecting transparently rather than an application server with a long-lived connection to a single database machine.
You need global low latency because you are talking to the database directly from globally diverse clients. So, if you are building an app locally and want to push it to the cloud as an API and just have it work, that means you don’t want to think about the physical location of the data systems behind the scenes.
You also need it to be secure by default, rather than having to add in some middleware or custom functionality (or nothing a lot of the time, which leads to the obvious bad outcomes) to be able to query it from the embedded clients. You don’t want to make consistency tradeoffs by having to tune consistency to get the latency profile you need or vice versa. You need a consistent, ubiquitous experience.
You just don’t want to do operations at all. You want to live in a world where DevOps is just gone—thinking about capacity, provisioning, scale, partitioning, all these things. They are a holdover from the old world. Anything that requires you to enter the old world and start doing DevOps—whether that’s configuring a security profile in Amazon or trying to think about provisioning of your data sets—means that you might as well go back to the old stack because the burden isn’t eliminated.
We saw this happen in the backend-as-a-service era with things like Heroku, where it was a better development experience until you got to the database, and then you were basically dealing with a managed service again with no transformative productivity improvements. There are some trade-offs for price performance vs. ease of use but, fundamentally, it’s that same mental overhead in the development process as the old managed world. The upshot is that people just went back to the old managed world because it didn’t fundamentally transform their development experience.
To make the serverless ecosystem real, we need a serverless database that never makes you think about the physical machine.
What does the future hold for Fauna? What should readers be excited about?
[EW]: We are really excited about continuing to mature GraphQL and continuing to integrate, in particular, with the JAMstack ecosystem (by the way, that’s JavaScript, APIs, and Markup). I think a lot of people misinterpret it as “only React.js” or “only static sites.” It’s not. It’s similar to the full-stack paradigm, except the main twist compared to the full-stack or even the client-side rendering/server-side business logic that we got from the MEAN stack with MongoDB and Node.js is that all business logic is pushed into the client: rendering is on the client and logic is on the client.
You can use Lambdas and compute functions as, essentially, a secure enclave to integrate other services but you don’t have to. The more computation you can push into the data tier, the less you have to worry about that as well.
All these partners are moving into . . . well I don’t want to say full-stack because it’s the old paradigm, but they are moving into complete, dynamic application development. So, we are working closely with Netlify, Zeit, Prisma, Hasura, Apollo, and others in this JAMstack to make sure the serverless development experience is complete.
To me, the dream of JAMstack and serverless development is the same as the dream of essentially PHP in the 90’s, if you’ll bear with me. You can build an application on one computer: the laptop in front of you. If it runs there, you can push it now to the cloud instead of a single physical server and it will work. There is no fundamental distinction between the interface, performance profiles, workflows, and development methodologies you’re using to develop and to go to production.
I think we lost a lot in the industry in the transition to first, cloud, and then, distributed systems, in terms of this ease-of-use and simple development paradigm. But we can bring it back with a new generation of API vendors who are delivering that simplicity in a fully distributed, platform-managed way.
Reading one of your recent white papers, I saw that, “FaunaDB’s query language is designed to increase safety, predictability, and performance . . . These improvements are difficult or impossible in legacy query languages because they require restricting the language, not extending it, damaging standards compatibility.”
How does the business balance the trade-off of demand for open standards vs. the improvements that can be derived from a purpose-built Query Language?
[EW]: That speaks to the language compilation strategy, which I think is unique to the industry, as far as I know.
A pattern we see a lot is that people will begin developing their app with GraphQL; they will get really far in their product development lifecycle but then they will get to a point where they want a little bit more access, a little bit more control, a little bit more flexibility in something like the history of their data, the temporality, or the security model that the GraphQL interface exposes.
What’s completely unique to Fauna is that when you get to that point, you can drop down to FQL because we have this intermediate representation. You can simultaneously continue to develop your primary query patterns on GraphQL, but still drop down to the IR level, which is admittedly more complex and harder to use because it’s a proprietary interface. But, you can get to your data and the full power of the database platform when you need to. That pattern has been extremely successful for us and we are excited to extend it to other standard languages like SQL in the future.
In particular, it also lets us—as long as the data model paradigms are aligned—even expose the same data through multiple interfaces without having to ETL it into a second system.
In the fullness of time, Fauna will give you access to GraphQL, SQL, probably a graph language like Gremlin, and other standard languages on top of the same data sets (all backed by FQL as the high-performance, secure, centralized intermediate representation) and let you basically mix and match your productivity from open standards with the power of using a full-blown proprietary interface.
You went through the Jepsen wringer and emerged in pretty good shape. Can you talk about your decision to implement Calvin as the consensus protocol vs. Raft/Paxos/Spanner? Has it held you back in any areas, like being able to offer an OLAP interface?
[EW]: So, Calvin has not held us back at all. The trade-offs are pretty minor in terms of the solution space you can explore, especially for replication on the WAN, which was our goal. We were looking at replication protocols and consensus protocols for transactions back in the day. Around the time that everyone else hopped on the Spanner train, we took a lonely journey on the Calvin train, in particular because our experience at Twitter had led us to believe that in the future, every product will be global, every product will launch to a global audience and require ubiquitous low latency access from the edge.
It seemed clear to us, although I guess it wasn’t clear to others, that Calvin was the best solution for us. The paper was pretty opaque I think, which led people astray. There is some handwaving around putting locks around, which was scary to people. There are a bunch of statements that partitions are rare, which people took issue with, not understanding that when Abadi and his co-authors write that partitions are rare, they mean they might happen every minute or two. They don’t happen a majority of the time. But people were like, “I get a network partition once a day, they aren’t rare at all!” There was a lot of miscommunication I think, which speaks to the gap between academia and industry, which tends to leave innovation by the wayside.
We were focused on the latency profile in a replicated WAN scenario, particularly across the global internet, and Calvin was clearly better. But, we have had to extend it a lot with things like the OCC mechanism, pipelining, and building lock-free replication. A lot of additional work on top turns it from simply a transactional consensus protocol into a full-blown database.
You mentioned Raft and Paxos; Fauna is built on Raft, and Calvin is a layer above Raft. The big difference from these other systems that are based on the Paxos model is the inversion of the timestamping and the transaction logging.
If you take a system like Spanner, it’s essentially trying to use synchronized clocks to create an emergent order for transaction effects that are being dropped directly into the replica partition leaders, which are backed by Paxos. Now, that leads to all kinds of problems, both in terms of correctness under skew and in long-tail latency: you have to talk to more and more partition leaders as your queries get wider.
We have seen this be very visible in benchmarking of similar systems in the space. But what Calvin does is invert this model completely. Instead of using the time to order the transactions, the transactions order the time. So, they are submitted to random partitions, which together are essentially a distributed write-ahead log with no single point of failure and no central coordination point. And the order in which they get submitted to this log (not dissimilar to the WAL in something like PostgreSQL, except it’s globally distributed and strongly consistent because it’s backed by Raft) determines the timestamping and the progress of the logical clock across the cluster. It roughly corresponds to physical time, but doesn’t depend on it the way Spanner does.
That means a transaction becomes durable before its result is known, so it’s not until the replicas have applied the transaction effects that we can return to the client and give them the result. But it completely obviates the need for dealing with clocks or even having the replicas themselves coordinate with each other beyond tailing from the WAL. For us, it’s been a dramatically simpler model and higher performance across the board.
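The "transactions order the time" idea can be illustrated with a small Python sketch. This is a toy under heavy assumptions, not Calvin as published and not Fauna's implementation: a single in-memory list stands in for the Raft-backed distributed log, there are no partitions, no OCC, and no failures, and every name (`SequencedLog`, `Replica`, `deposit`, `transfer`) is hypothetical. It only shows that when the log position is the logical timestamp and transactions are deterministic, replicas that tail the log independently converge on the same state without consulting clocks or each other.

```python
from typing import Callable, Dict, List, Tuple

# A transaction is a deterministic function that mutates replica state.
Txn = Callable[[Dict[str, int]], None]


class SequencedLog:
    """Toy stand-in for the replicated write-ahead log (assumption: one in-memory list)."""

    def __init__(self) -> None:
        self._entries: List[Txn] = []

    def append(self, txn: Txn) -> int:
        """Make the transaction durable; its logical timestamp is its log position."""
        self._entries.append(txn)
        return len(self._entries)

    def read_from(self, applied: int) -> List[Tuple[int, Txn]]:
        """Return (timestamp, txn) pairs for every entry after position `applied`."""
        return list(enumerate(self._entries[applied:], start=applied + 1))


class Replica:
    """Applies log entries strictly in log order; never talks to other replicas."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.applied = 0  # timestamp of the last applied transaction

    def catch_up(self, log: SequencedLog) -> None:
        for timestamp, txn in log.read_from(self.applied):
            txn(self.state)
            self.applied = timestamp


def deposit(account: str, amount: int) -> Txn:
    def txn(state: Dict[str, int]) -> None:
        state[account] = state.get(account, 0) + amount
    return txn


def transfer(src: str, dst: str, amount: int) -> Txn:
    def txn(state: Dict[str, int]) -> None:
        # The outcome (commit or no-op) is only known once the entry is applied.
        if state.get(src, 0) >= amount:
            state[src] = state.get(src, 0) - amount
            state[dst] = state.get(dst, 0) + amount
    return txn
```

Note how `append` returns a timestamp before anyone knows whether a `transfer` will succeed, which mirrors the point that a transaction becomes durable before its result is known. A replica that starts late simply calls `catch_up` and replays the same entries in the same order.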
Before we wrap up, any last comments to leave the readers with?
[EW]: I think it’s important to emphasize: you talked a lot about the technical depth of the system, and it is very complex in terms of implementation, as the Jepsen paper reveals, but our fundamental attitude is that we don’t want the complexity to affect the experience of the developer. The point of the complexity is to remove a burden from the developer workflow and the experience of interacting with your database.
I don’t want anyone to be put off by the sophistication of the system; the sophistication is there for you. It’s very easy to get started with GraphQL using our web dashboard, the base plans are all free, and you can get going building an app now with a serverless stack like Netlify, Lambda, Fauna, React, or whatever you are used to using, and just try it out.
Our goal is to take the burden off the developer, not expand the operational feature set like some other databases seem to be doing. Some people are put off when they think about FQL, Calvin, the consistency model . . . the whole point of all that is that you don’t have to worry about it.