In this essential guide, learn about how relational databases work and how they compare to other database options.
What is a relational database?
A relational database organizes data into tables which can be linked—or related—based on data common to each. This capability enables you to retrieve an entirely new table from data in one or more tables with a single query. It also allows you and your business to better understand the relationships among all available data and gain new insights for making better decisions or identifying new opportunities.
For example, imagine your company maintains a customer table that contains company data about each customer account and one or more transaction tables that contain data describing individual transactions.
The columns (or fields) for the customer table might be Customer ID, Company Name, Company Address, etc.; the columns for a transaction table might be Transaction Date, Customer ID, Transaction Amount, Payment Method, etc. The tables can be related based on the common Customer ID field. You can, therefore, query the table to produce valuable reports, such as a consolidated customer statement.
Report generators take these queries and run them on demand to create formal reports. Many of the documents businesses run to track inventory, sales, finance, or even perform financial projections come from a relational database operating behind the scenes.
What is SQL?
You can communicate with relational databases using Structured Query Language (SQL), the standard language for interacting with management systems. SQL allows the joining of tables using a few lines of code, with a structure most nontechnical employees can learn quickly.
With SQL, analysts do not need to know where the order table resides on disk, how to perform the lookup to find a specific order, or how to connect the order and customer tables. The database compiles the query and figures out the correct data points.
The primary benefit of the relational database approach is the ability to create meaningful information by joining the tables. Joining tables allows you to understand the relationships between the data, or how the tables connect. SQL includes the ability to count, add, group, and also combine queries. SQL can perform basic math and subtotal functions and logical transformations. Analysts can order the results by date, name, or any column.
Those features make the relational approach the single most popular query tool in business today.
Relational databases have several advantages compared to other database formats:
SQL has its a built-in language for creating tables called Data Definition Language (DDL). DDL allows you to add new columns, add new tables, rename relations, and make other changes even while the database is running and while queries are happening. This allows you to change the schema or how you model data on the fly.
Relational databases eliminate data redundancy. The information for a single customer appears in one place—a single entry in the customer table. The order table only needs to store a link to the customer table. The practice of separating the data to avoid redundancy is called normalization. Progressional database designers make sure the tables normalize during the design process.
Ease of backup and disaster recovery
Relational databases are transactional—they guarantee the state of the entire system is consistent at any moment. Most relational databases offer easy export and import options, making backup and restore trivial. These exports can happen even while the database is running, making restore on failure easy. Modern, cloud-based relational databases can do continuous mirroring, making the loss of data on restore measured in seconds or less. Most cloud-managed services allow you to create Read Replicas, like in IBM Cloud Databases for PostgreSQL. These Read Replicas enable you to store a read-only copy of your data in a cloud data center. Replicas can be promoted to Read/Write instances for disaster recovery as well.
Examples of relational databases
Many database products implement the relational database model. They either use a SQL database for processing or can at least process SQL statements for requests and database updates. These databases range from small, desktop systems to massive cloud-based systems. They can be either open source and internally supported, open source with commercial support, or commercial closed-source systems. Here are a few of the more popular ones:
MySQL is a common and easy to start a low-memory/disk/CPU database. It supports all the basic SQL commands, along with transactions and Atomicity, Consistency, Isolation, Durability (ACID) performance. MySQL is the most common database integrated with WordPress sites.
PostgreSQL is also open source. It provides enterprise features such as security, scalability, and support for more automation through a command-line interface, as well as direct access over the web. PostgreSQL supports stored procedures, which is a more complex programming language built on top of SQL. Teams can use stored procedures to do data extraction, transform, and load between systems. Examples of this use case include claims processing for insurance processing and order processing for complex orders. Postgres also works with qGIS or Geo Server to store and save global information.
Db2 is a commercially supported relational database from IBM that is available as fully-managed, configured, and optimized cloud offering.
Relational vs. NoSQL
NoSQL databases provide ways of storing, searching and retrieving data other than in tables with relationships. Here are a few of the key distinctions:
The most popular form of NoSQL database is a simple key-value pair. Key-value pairs place all the data in a single table. The key is a unique element. For employees or customers, the key might be an email address. For orders, it would be the order number. As long as every key is unique, a key-value pair can store all the relations in one "table."
Availability over consistency
When computers run over a network, they invariably need to decide to prioritize consistent results (where every answer is always the same) or high uptime, called "availability." This is called the "CAP Theory," which stands for Consistency, Availability, or Partition Tolerance. Relational databases ensure the information is always in-sync and consistent. Some NoSQL databases, like Redis, prefer to always provide a response. That means the information you receive from a query may be incorrect by a few seconds—perhaps up to half a minute. On social media sites, this means seeing an old profile picture when the newest one is only a few moments old. The alternative could be a timeout or error. On the other hand, in banking and financial transactions, an error and resubmit may be better than old, incorrect information.
While relational databases have a predefined set of rows and columns, NoSQL databases are structured more like a document. If you know where the information is in the document, you can retrieve it. If the material is still in the document, you can add other, structured information, without having to change the rules. If done correctly, the original queries will still work, making it easier to extend the database without formal updates.
Other NoSQL database formats include document story, graph, and object databases. A broader term for NoSQL databases is non-relational.
Relational vs. non-relational databases
Some examples of specific non-relational databases include the following:
Redis is a common NoSQL database, frequently used for the sort of data storage we discussed earlier. By doing just one lookup and no joins or relations, then moving processing to the web browser, Redis provides a rapid response. Response time can be critical for highly competitive modern websites.
JanusGraph is a graph database designed to store and process queries for large graphs. These large graphs contain hundreds of billions of edges and vertices, stored in a multi-machine cluster. JanusGraph can support thousands of simultaneous queries and scale up with new hardware. The kind of data JanusGraph stores might be appropriate for animation or aerodynamic modeling.
MongoDB is a document database that stores records as loosely structured documents. That might be perfect for storing web pages as documents, for an e-Commerce product catalog, or social network applications. Like Redis, MongoDB differs from relational models in that it does not strive for ACID compliance.
etcd is another key/value store, typically used for configuration data for distributed systems. The common etcd use case is cloud-based servers that scale up with demand and need a single database for configuration information.
RabbitMQ is an open source messaging queue. A message is typically some amount of text, with the structure defined by the client. Instead of records with a permanent shelf life, RabbitMQ allows one system to push messages into the queue so other applications can then pull messages when they want to. That asynchronous push/pull enables distributed applications with message passing.
Elasticsearch is a full-text search engine that interfaces through web APIs. That means the primary value is getting results back. An entire website can store itself in Elasticsearch. Look up the key-value pair, and you get the text of the website. The real power of Elasticsearch is the search enabling the website to add an accurate, fast search button. Elasticsearch can quickly and powerfully search for any large text structure.
For a full breakdown of the different types of databases available, see “A Brief Overview of the Database Landscape.”
A schema of a database is the structure defined in a formal language. In SQL, there are two formal languages: Data Definition Language (DDL) and Data Manipulation Language (DML).
To select data from a table, update it, or delete it, programmers use DML. To work on the database and to create and define the tables and relations, they use DDL or a tool to create the DDL.
DDL includes commands like
RENAME. The first DDL command is
CREATE TABLE. This command defines the primary key (which must be unique), the fields, how those fields store data, and any other constraints on the table. Once defined, you can establish a database as a combination of its definition language and the DML to
INSERT the rows into the tables.
Here's an example of a DDL command to create an employee table. As you'll see, a schema can include what columns are required, which must be unique (a key), and which must have a reference in other tables.
In this example, the jobs table describes every job title, description, and job level. A separate table, the pay_grades table, shows the salary for each job level. Integers are whole numbers; char(100) reserves up to one hundred characters for text.
CREATE TABLE employees ( employeeID integer UNIQUE NOT NULL, first_name char(100) NOT NULL, last_name char(100) NOT NULL, jobID int, birthdate date NOT NULL, governmentID char(9), FOREIGN KEY(jobID) REFERENCES jobs(jobID) );
What happens when a row is deleted from the jobs table? DDL allows the programmer to specify what to do in the case. The column in employees could become automatically updated to
NULL, as it will in this case, or it could be set to
casecade delete. For example, if a branch library is deleted due to a closure, the books they have not transferred out might be deleted as well.
Some relational databases offer the potential to have multiple schemas, with different tables, along with the ability to grant read and write permissions to them. Multiple schemas mean an HR user might have HR information for employees but not confidential employee information (and vice versa), all in the same database.
From a licensing perspective, relational databases vary in one important way open source databases vs. commercially supported databases:
- Open source databases have no cost to set up, but provide no commercial support "out of the box."
- Commercially supported databases include some level of support, but they cannot fix problems that are legitimate uses of the database, such as accidental deletes caused by bugs introduced by programmers.
You can host relational databases in the cloud, on local servers, on desktop machines, on private clouds, or even on Internet of Things (IoT) devices as long as you’re complying with the license.
How to create a relational database
A relational database needs two things to run—the database software itself and the Data Definition Language (DDL) code to create it.
To support a local installation, administrators need to look into all possibilities. For example, if the underlying operating system the database runs on needs an update, that could take the server down for hours. For high availability, the database will need to be replicated, with a copy staying up while the parent is down. Local databases need to consider how to scale with multiple users, how to take backups, and how to restore them.
The alternative is to run the databases in the cloud. With databases in the cloud, the database is automatically configured for high availability, meaning that your data replicates on multiple members, and each member sits on separate availability zones.
If an entire data center goes down, the database stays up. Data replicates across zones, so you can maintain all the data and continue operations in the event of an outage. Also, the vendor will manage security improvements and operating system and database version upgrades. Users flock to cloud-managed databases for out-of-the-box enterprise security, high availability, online scaling, and automatic backups.
Relational databases and IBM Cloud
IBM supports cloud-hosted versions of a number of relational databases.
- IBM Db2 on Cloud is a premier commercial relational database that is built for robust performance, providing a high-availability option with a 99.99 percent uptime SLA.
- IBM Cloud Databases for PostgreSQL provides enterprise-ready, fully managed PostgreSQL, built with native integration into the IBM Cloud.
- IBM Cloud Hyper Protect DBaaS for PostgreSQL is the next evolution level on how data is stored in a highly secured enterprise cloud service ideally suited for workloads with sensitive data