A look SQL and NoSQL databases, their differences, and which option would be best for your situation.
Application developers in the twenty-first century face a dizzying bevy of database decisions. There are hundreds of different databases available to choose from, and while not all are in the "no one was ever fired for buying it" category, many are solid pieces of general-purpose technology. On the other hand, almost every commercially backed database can claim some important-sounding customers as references, regardless of how niche the database itself is.
In order to make some sense of the landscape, it's helpful to have a taxonomy handy. For better or worse, the most popular taxonomy from the past 10 years divides the landscape into two classes: SQL (relational databases) and NoSQL (everything else).
It's a heavy-handed distinction, akin to dividing a grocery store in to "produce" and "not produce," but it does have important implications for building and maintaining software. Let's take a deeper look at what's intended by these two monikers and see what they really mean for application developers.
What is a SQL database?
In short, SQL databases support SQL—a domain-specific language for querying and manipulating data in a relational database. The "relational" in a relational database refers to the "relational model" of data management devised by IBM researcher E.F. Codd in the early 1970s and popularized in a number of subsequent database systems starting with System R.
The key to the relational model is abstracting data as a set of tuples organized into relations, which allows for abstraction over the physical representation of data and access paths. While SQL is not the only possible language for implementing query over the relational model—in fact it does not conform strictly to Codd's original design—it's by far the most popular.
SQL and relational databases have been the industry standard since the late 1970s, though their so-called "navigational" predecessors like the Apollo-era IMS are, in some cases, still under active development. Most of the popular "enterprise" systems are direct descendants of System R and inherit much of its design constraints.
"NewSQL" and "Distributed SQL"
As a response to the "throwing the baby out with the bathwater" problems with NoSQL (see below) in the early 2010s, several organizations began building relational/SQL-based systems that made different tradeoffs, particularly with regard to horizontal scalability. This led down two largely distinct paths:
- NewSQL: A play on NoSQL, these systems typically take existing relational databases and layer distributed logic on top with varying degrees of transparency to the user. Citus and Vitess are two notable examples of "NewSQL"-style distributed engines.
- Distributed SQL: These systems take a "ground-up" approach in building horizontally scalable relational engines. CockroachDB and Google's Spanner are good examples here. These engines usually aim higher than their NewSQL counterparts. It's important to note, though, that one of the motivations for both NoSQL and NewSQL is the fact that it's very expensive to build a full-fledged relational database, and the maturity of commercially available distributed SQL systems often reflects this.
SQL databases: Pros and cons
- Reduced data storage footprint due to normalization and other optimization opportunities. Often results in better performance and more efficient use of resources.
- Strong and well-understood data integrity semantics through ACID (Atomicity, Consistency, Isolation, Durability).
- Standard access to data via SQL.
- Generally more flexible query support capable of handling a broader range of workloads. SQL abstracts over the underlying implementation and allows the engine to optimize queries to fit their on-disk representation.
- Rigid data models that require careful up-front design to ensure adequate performance and resist evolution—changing a schema will often include downtime
- Scaling horizontally is challenging—either completely unsupported, supported in an ad-hoc way, or only supported on relatively immature technologies
- Non-distributed engines are generally a "single point of failure" that must be mitigated by replication and failover techniques; no illusion of infinite scalability
Examples of SQL databases
What is a NoSQL database?
Unfortunately, it's not very clear! At one point "NoSQL" implied that a database didn't support SQL. That was unhelpful enough—if it doesn't support SQL, what does it support?—but to make matters worse, it eventually evolved to mean "Not only SQL," much to the chagrin of developers actually trying to get work done.
While the movement clearly had roots in graph, document, and key-value stores dating back to the early 1990s, NoSQL really started taking off in the mid-2000s. Inspired by the publication of industry research papers on non-relational systems such as Google's BigTable and Amazon's Dynamo, a cottage industry of startups and open source projects sprang up developing database systems that explored the design space outside of the relational model. This was largely aimed at solving two perceived problems with existing systems:
- Lack of horizontal scalability
- Rigidity of table design in relational systems
Note that neither of these problems have all that much to do with SQL, but instead, reflect design decisions and constraints of popular relational databases. While the relational database community has in part responded to this challenge (see above on "NewSQL"), once the gates were open, so to speak, new databases began appearing at a rapid clip. The result is a proliferation of systems that each addresses the fundamental problem—storing some bits and making them available later—in a slightly different way.
In many ways, this is a boon for developers. It's certainly true that not all applications have relational database-shaped problems or need to make the tradeoffs that relational databases impose on data and availability models. This freedom does not come without a cost, however—in order to make a good technological decision between NoSQL databases a developer must be armed with a thorough understanding of the full design space so that the tradeoffs made by a particular system are clear. In other words, you don't want to accidentally give up consistency isolation when your application actually requires it.
NoSQL databases: Pros and cons
Summarizing the pros and cons of NoSQL is challenging for just this reason. The space has been well-explored and the range of options available is enormous. Some general pros and cons—which may not all apply to all NoSQL stores—include the following:
- Scalable and highly available—many NoSQL databases are generally designed to support seamless, online horizontal scalability without significant single points of failure.
- Flexible data models—most non-relational systems do not require developers to make up-front commitments to data models; what schemas do exist can often be changed on the fly.
- High performance—by limiting the range of what the database can do (for example, by relaxing durability guarantees) many NoSQL systems are able to achieve extremely high levels of performance.
- High-level data abstractions—moving beyond the "value in a cell" data model, NoSQL systems can provide high-level APIs for powerful data structures. Redis, for example, includes a native-sorted set abstraction
- Vague interpretations of ACID constraints—despite widespread claims of ACID support for NoSQL systems, the interpretation of ACID is often made so broad that not much can be gleaned about the semantics of the database in question. For example, what does "isolation" mean without transactions?
- Distributed systems have distributed systems problems. While not unique to NoSQL systems, it's the norm, rather than the exception, for developers programming against NoSQL to deeply understand, e.g., CAP Theorem and its interpretation by the database in question.
- Lack of flexibility in access patterns—the relational/SQL abstraction gives the database engine broad powers to optimize queries for the underlying data; without that abstraction, the on-disk representation of data leaks in to the application's queries and leaves no room for the engine to optimize.
Examples of NoSQL databases
SQL vs. NoSQL: When to use each
When to use SQL
When you have relational data, this is the natural fit, of course. But you may be asking yourself how you identify the "mythical" natural fit. Well, when you look at your data, do you see distinct entities with well-defined relationships with one another that must be strictly enforced and/or navigable? If so, we have match!
When your focus is on data integrity, relying on a tried and true relational databases is a good bet. When you want flexible access to your data, the relational model and SQL allow for much greater support of ad-hoc queries. Additionally, databases like PostgreSQL have added excellent support for NoSQL-style workloads with features like native JSON data types. If you don't need the scale-out capabilities of NoSQL data stores, they can be a good fit for some non-relational workloads as well. This makes them a great Swiss army knife when you have some relational data and some unstructured data, but don't want to buy the complexity of working with different types of data stores.
While many people look toward NoSQL for simplicity, it's important to understand the implications of those data stores when building your application. While it is true that they are easy to get started with, it is critical to understand the implications of write consistency (or lack thereof), eventual consistency, and impacts of sharding on how you plan to access the data in the future. Relational databases can be simpler to build a reliable application on as they free you from worrying about such concerns.
When to use NoSQL
NoSQL is appealing when you have highly flexible data models or very specific needs that don't fit into the relational model. If you are taking in a lot of unstructured data, a document database like MongoDB or CouchDB can be a nice fit. If you need very fast access to key-value data but can live without strong integrity guarantees, Redis is a great fit. Complex or flexible search across a lot of data? Elasticsearch is a great fit.
NoSQL data stores tend to be highly scalable, and scaling out is a core tenet of many of these systems. Built-in sharding makes scaling reads and writes out much easier than doing so with a relational database. Relatedly, NoSQL systems can often meet very high availability requirements. Databases like Cassandra have no single points of failure and your applications can trivially react to underlying failures of individual members.
Conclusion and next steps
Choosing or recommending a database is a nontrivial exercise, even for database experts. The SQL vs. NoSQL division is a useful rubric for helping inform that decision, but ultimately, there's no substitute for thinking hard about the data needs of your application and the tradeoffs you're willing to accept to achieve performance or uptime goals.
If nothing else, the fact that NoSQL exists is a boon for systems—it's an excuse to explore the design space and find the sweet spots that solve real-world application problems. Even so, in 2020, there are innumerable reasons to keep choosing SQL.
IBM Cloud supports cloud-hosted versions of a number of SQL and NoSQL databases through the IBM Cloud Databases offerings. For more guidance on how pick the right option for you, see "A Brief Overview of the Database Landscape" and "How to Choose a Database on IBM Cloud."