Everything you need to know about NoSQL, a type of database design that offers more flexibility than traditional databases.
What is a NoSQL database?
NoSQL, which stands for “not only SQL,” is an approach to database design that provides flexible schemas for the storage and retrieval of data beyond the traditional table structures found in relational databases. While NoSQL databases have existed for many years, they have only recently become more popular in the era of cloud, big data and high-volume web and mobile applications. They are chosen today for their scalability, performance and ease of use. The most common types of NoSQL databases are key-value, document, column and graph databases.
It's important to emphasize that the "No" in "NoSQL" is an abbreviation for "not only" and not the actual word "No." This distinction is important not only because many NoSQL databases support SQL-like queries, but also because in a world of microservices and polyglot persistence, NoSQL and relational databases are now commonly used together in a single application.
NoSQL vs SQL
NoSQL databases do not follow all the rules of a relational database. Specifically, they do not use a traditional row/column/table database design, and they do not use structured query language (SQL) to query data.
To better understand, let’s go back to the advent of the first databases designed for the masses, which appeared around 1960. Those databases included database management systems (DBMS) to allow users to organize large quantities of data.
The original DBMSs were flat-file/comma-delimited, often proprietary to a particular application, and limited in the relationships they could uncover among data. DBMSs were also complex.
This eventually led to the development of relational database management systems (RDBMSs). Relational databases arranged data in tables that could be connected or related by common fields, separated from applications, and queried with SQL. In other words, the relational database placed data into tables, and SQL created an interface for interacting with it.
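As a minimal sketch of the relational idea described above, the following uses Python's built-in sqlite3 module with an in-memory database. The table names, column names and sample rows are made up for illustration; the point is only that two tables are related by a common field (customer_id) and queried with SQL.

```python
import sqlite3

# In-memory database; tables and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# The common field customer_id relates orders to customers.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()

print(rows)  # [('Ada', 2, 65.0), ('Grace', 1, 15.5)]
```

The schema is declared before any data is inserted, which is the up-front design step that the rest of this article contrasts with NoSQL.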
Relational databases and SQL work well for large servers and storage mediums. But as larger sets of frequently evolving, disparate data became more common for things like e-commerce applications, programmers needed something more flexible than SQL. NoSQL is that alternative.
NoSQL databases are built for specific data models and have flexible schemas that allow programmers to create and manage modern applications. NoSQL is also more agile because it’s not built on the concept of tables and does not use SQL to manipulate or analyze data (although some NoSQL databases may have a SQL-inspired query language).
NoSQL encompasses structured data (data that conforms to a pre-defined format or data model), semi-structured data (data that contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data), unstructured data (information that either does not have a pre-defined data model or is not organized in a pre-defined manner), and polymorphic data (data that can be transformed to any distinct data type as required).
NoSQL enables you to be more agile, more flexible, and to iterate more quickly. A NoSQL database enables simpler design, better control over availability and improved scalability.
Today’s cloud providers can support SQL or NoSQL databases. Which database you choose depends on your goals.
To learn more about the state of databases, see “A Brief Overview of the Database Landscape.”
Types and examples
A NoSQL database can manage information using any of four primary data models: key-value, document, column-based (wide-column) and graph.
In the key-value structure, the key is usually a simple string of characters, and the value is a series of uninterrupted bytes that are opaque to the database. The data itself is usually some primitive data type (string, integer, array) or a more complex object that an application needs to persist and access directly.
This replaces the rigidity of relational schemas (schemas are basically a blueprint of how tables work) with a more flexible data model that allows developers to easily modify fields and object structures as their applications evolve. In general, key-value stores have no query language. They simply provide a way to store, retrieve, and update data using simple GET, PUT and DELETE commands. The simplicity of this model makes a key-value store fast, easy to use, scalable, portable, and flexible.
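The GET, PUT and DELETE operations described above can be sketched with a toy in-memory store. This is an illustration only, not the API of any particular product: the class name KeyValueStore and the key "session:42" are hypothetical. The store treats values as opaque bytes, so the application serializes and deserializes its own objects.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque bytes to the store."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Create or overwrite the value stored under key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Return the stored bytes, or None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the key; report whether it existed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
session = {"user": "ada", "cart": [101, 202]}

# The application, not the store, decides how the object is encoded.
store.put("session:42", json.dumps(session).encode("utf-8"))

raw = store.get("session:42")
restored = json.loads(raw.decode("utf-8")) if raw is not None else None

store.delete("session:42")
```

Because the store never inspects the value, the application can change the shape of the session object without touching the store at all.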
A document is an object made up of keys (strings) that have values of recognizable types, including numbers, Booleans, and strings, as well as nested arrays and dictionaries. Document databases are designed for flexibility. They aren’t typically forced to have a schema and are therefore easy to modify. If an application requires the ability to store varying attributes along with large amounts of data, document databases are a good option. MongoDB and Apache CouchDB are examples of popular document-based databases.
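To make the document model concrete, here is a small sketch that uses plain Python dictionaries in place of a real document database. The product records, field names and the find helper are all hypothetical; the sketch only shows that documents in the same collection can carry different attributes and nested values.

```python
from typing import Any, Dict, List

# Two documents in the same collection with different attributes.
products: List[Dict[str, Any]] = [
    {
        "_id": "p1",
        "name": "Laptop",
        "price": 999.0,
        "specs": {"ram_gb": 16, "ports": ["usb-c", "hdmi"]},  # nested dictionary and array
    },
    {
        "_id": "p2",
        "name": "T-shirt",
        "price": 15.0,
        "sizes": ["S", "M", "L"],
        "in_stock": True,
    },
]


def find(docs: List[Dict[str, Any]], field: str, value: Any) -> List[Dict[str, Any]]:
    """Return the documents whose top-level field equals value."""
    return [doc for doc in docs if doc.get(field) == value]


# Queries have to tolerate fields that some documents do not have.
with_ram = [doc["name"] for doc in products if "ram_gb" in doc.get("specs", {})]

# Adding an attribute to one document does not affect the others.
products[1]["discount"] = 0.1

print(find(products, "name", "Laptop")[0]["_id"])  # p1
print(with_ram)  # ['Laptop']
```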
Column-based (also called ‘wide column’) models enable very quick data access using a row key, column name, and cell timestamp. The flexible schema of these types of databases means that the columns don’t have to be consistent across records, and you can add a column to specific rows without having to add it to every single record. The wide-column store data model, like that found in Apache Cassandra, is derived from Google's Bigtable paper.
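The addressing scheme described above (row key, column name, cell timestamp) can be sketched as nested dictionaries. This is a simplified illustration under stated assumptions, not how Cassandra or Bigtable are implemented: the WideColumnTable class, its method names and the integer timestamps are hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Optional, Tuple


class WideColumnTable:
    """Toy wide-column table: row key -> column name -> timestamped cell versions."""

    def __init__(self) -> None:
        self._rows: DefaultDict[str, DefaultDict[str, List[Tuple[int, Any]]]] = (
            defaultdict(lambda: defaultdict(list))
        )

    def put(self, row_key: str, column: str, value: Any, timestamp: int) -> None:
        # Each write adds a new version of the cell.
        self._rows[row_key][column].append((timestamp, value))

    def get(self, row_key: str, column: str) -> Optional[Any]:
        # Return the value with the highest timestamp, or None if the cell is absent.
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def columns(self, row_key: str) -> List[str]:
        # Columns are defined per row, not per table.
        return sorted(self._rows.get(row_key, {}))


table = WideColumnTable()
table.put("user:1", "email", "a@example.com", timestamp=1)
table.put("user:1", "email", "b@example.com", timestamp=2)
table.put("user:2", "phone", "555-0100", timestamp=1)

print(table.get("user:1", "email"))  # b@example.com
print(table.columns("user:1"))       # ['email']
print(table.columns("user:2"))       # ['phone']
```

Note that "user:2" has a phone column that "user:1" lacks, which is the per-row flexibility the paragraph describes.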
The modern graph database is a data storage and processing engine that makes the persistence and exploration of data and relationships more efficient. In graph theory, structures are composed of vertices and edges (data and connections), or what would later be called “data relationships.” Graphs behave similarly to how people think—in specific relationships between discrete units of data. This database type is particularly useful for visualizing, analyzing, or helping you find connections between different pieces of data. As a result, businesses leverage graph technologies for recommendation engines, fraud analytics, and network analysis. Examples of graph-based NoSQL databases include Neo4j and JanusGraph.
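As a rough illustration of finding connections between pieces of data, the sketch below stores vertices and edges in an adjacency map and uses a breadth-first search to find the shortest chain of relationships between two people. The names and edges are invented, and real graph databases such as Neo4j or JanusGraph expose their own query languages rather than this hand-written traversal.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Undirected "knows" relationships; the data is hypothetical.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]

graph: Dict[str, Set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting neighbors keeps the traversal order deterministic.
        for neighbor in sorted(graph[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_path(graph, "alice", "dave"))  # ['alice', 'bob', 'carol', 'dave']
```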
Examples of NoSQL databases
Many NoSQL databases were designed by young technology companies like Google, Amazon, Yahoo, and Facebook to provide more effective ways to store content or process data for huge websites. Some of the most popular NoSQL databases include the following:
- Apache Cassandra, an open source, wide-column store database designed to manage large amounts of data across multiple servers and clustering that spans multiple data centers
- MongoDB, an open source document-based database that uses JSON-like documents and schema, and is the database component of the MEAN stack
- Redis, a powerful in-memory key-value store used for session caching, message queues, and other specific applications
- Elasticsearch, a document-based database that includes a full-text search engine
When to use NoSQL
Relational databases have been around for over 25 years, and technology has changed dramatically since then. A relational database uses SQL to perform tasks such as updating or retrieving data in a database. Some common relational database management systems that use SQL include Oracle, Db2, and Microsoft SQL Server. Maintaining high-end, commercial relational database management systems is expensive because it requires purchasing licenses, trained manpower to manage and tune them, and powerful hardware.
NoSQL enables faster, more agile storage and processing, which means NoSQL databases are generally a better fit for modern, complex applications like e-commerce sites or mobile applications.
The horizontal scaling and flexible data model of NoSQL databases mean they can address large volumes of rapidly changing data, making them great for agile development, quick iterations, and frequent code pushes.
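One simple way to picture horizontal scaling is hash partitioning, where each key is assigned to one of several servers. The sketch below is an assumption-laden illustration, not the scheme of any specific database: the node names and the node_for_key function are hypothetical, and production systems often use consistent hashing or range partitioning so that adding a server moves less data.

```python
import hashlib
from typing import List


def node_for_key(key: str, nodes: List[str]) -> str:
    """Map a key to a node using a stable hash (same key always gives the same node)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]
placement = {key: node_for_key(key, nodes) for key in ("user:1", "user:2", "order:77")}

# Each key lands on exactly one node, so reads and writes for that key
# go to a single server while the data set as a whole is spread out.
print(placement)
```

A stable hash from hashlib is used here rather than Python's built-in hash(), because the built-in hash of a string can differ between interpreter runs.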
In a nutshell, the differences between relational databases and NoSQL databases come down to performance, availability, and scalability.
Some specific cases when NoSQL databases are a better choice than RDBMS include the following:
- When you need to store large amounts of unstructured data with changing schemas. NoSQL databases usually have horizontal scaling properties that allow them to store and process large amounts of data. And NoSQL enables ad-hoc schema changes. (In contrast, with a relational database, an engineer designs the data schema up front, and SQL queries are then run against the database; if subsequent schema changes are required, they’re often difficult and complex to carry out.)
- When you’re using cloud computing and storage. Most NoSQL databases are designed to be scaled across multiple data centers and run as distributed systems, which enables them to take advantage of cloud computing infrastructure—and its higher availability—out of the box. (For more, refer to “How to Choose a Database on the IBM Cloud.”)
- When you need to develop rapidly. NoSQL is often the data store of choice for Agile software development methods, which require very short sprint cycles. With NoSQL, you don’t have to prepare data like you do if you’re using a relational database, and instead of having to migrate structured data every time the application design changes, a dynamic NoSQL schema can evolve with the application.
- When a hybrid data environment makes sense. NoSQL is sometimes taken to mean not only SQL, which means that it can complement or sit alongside a relational database and provide the flexibility to choose the best tool for the job. For example, Craigslist hosts its active listings in a relational database, but manages its archives in a lower-overhead document-based NoSQL store.
Microservices, polyglot persistence and NoSQL
Part of the reason microservices are attractive is that they eliminate the need for a single, shared data store for an entire application. Instead, the application has many loosely coupled and independently deployable services, each with its own data model and database.
The pattern of using multiple databases within a single application, also known as polyglot persistence, has helped to create space in the market for NoSQL databases to thrive. Today, developers can leverage the right database for the right microservice without trying to make everything work in the context of a single, relational database.
Conversely, the constraints associated with using a single, relational database for every component of an application, when better alternatives existed for specific components, helped to create the need for microservices architectures.
In this sense, the rise of microservices and the rise of NoSQL are mutually reinforcing trends, because each has helped to create the market for the other.
NoSQL and IBM Cloud
Today, many applications are delivered as services, and those services must be available 24/7, accessible from a wide range of devices, and scaled to what can potentially be millions of users.
NoSQL was created to manage the scale and agility challenges that face modern applications, but the suitability of a database depends on the problem it must solve. SQL and NoSQL are each suited to different use cases, so which tool to use depends more on what you are trying to accomplish. Further, over the past few years, SQL technologies like PostgreSQL have been bridging the gap between NoSQL and SQL by offering JSON support or scale-out capabilities. With IBM Cloud Databases for PostgreSQL, IBM offers enterprise-ready, fully managed PostgreSQL built with native integration into the IBM Cloud.
IBM Cloudant, in particular, is a scalable JSON document database optimized for web, mobile, IoT, and serverless applications. The service is compatible with an open source ecosystem that includes Apache CouchDB, PouchDB, and libraries for the most popular web and mobile development stacks.
Sign up for an IBMid and create your IBM Cloud account.