The benefits of using object storage to store relational data.

There are many different classes of data—voice recordings, text, pictures, and tabular, among others. Tabular data, which is also known as rectangular or structured data, is of critical importance to businesses, but also to individuals. This class of data is pervasive. It is the records the bank keeps of our transactions, a history of purchases in a store, the grades a teacher gave to a class, or the rate of transmission of COVID-19 by location. In all of these cases, we can have a table of records that defines a relationship.

For the example of a teacher’s grades, the table may have columns for the student’s name, the grade on the first exam, the grade on the second exam, and the final grade for the class. Each record in the table defines a relation among a set of values, where the value of each column has a particular type and each row is an n-tuple with a value for each column. For instance, the grades may be whole numbers and the name a string. There is no inherent ordering to the rows in the table, but one can query them based upon various properties. Returning to the example of grades, the teacher might want to find all of the students whose final grade was above 90. Structured Query Language (SQL) is the way these tables are typically queried.

The relational model

This model of organizing data that has relations was first introduced by Edgar Codd, from IBM, fifty years ago. Initially, it was realized in integrated databases that were responsible for everything from the management of the storage up through the processing of the queries. Such a top-to-bottom integrated system provides the maximum opportunities for performance optimization but limits the ability to share data between different tools.

Over the past decade or so—first with open source frameworks such as Map Reduce, then with Spark, and now with more traditional databases, such as IBM Db2 Event Store—engines that are supporting the processing of relational tables are delegating the storage and management of the data to an object store, typically supporting the S3 API. These engines use a range of formats to organize the tabular data, such as Parquet, ORC, or CSV, and then support the same SQL queries that were supported by the original integrated databases.

Object storage and relational data

This use of object storage for storing the relational data brings several benefits (although, at the price of providing less opportunities for optimization):

  • It reduces the cost of developing the overall system by separation of concerns; the query engine no longer needs to address storing the data.
  • It maps naturally to cloud deployments, as all clouds have highly scalable cost-efficient object stores.
  • It enables much higher scalability in data size, as object stores are essentially infinitely scalable.
  • If an open format is used, it allows having different engines and tools to process the same data, allowing you to use the right tool for the job.

IBM has a range of such tools that help data professionals wrangle data and fully harness the power of ANSI SQL over this effectively limitless, yet economical, storage medium. On the IBM Cloud, SQL Query provides serverless execution of SQL for rectangular data stored in IBM Cloud Object Storage. SQL Query is a great tool for data exploration, preparation, transformation, and analytics.

For self-managed capacity hybrid or private cloud workloads, Db2 Big SQL provides a Db2-compatible SQL interface to Object Storage APIs and other data stores through the IBM Cloud Pak for Data on Red Hat OpenShift. And Spark, available with Cloud Pak for Data and via IBM Analytics Engine on the IBM Cloud, has built-in integration with object storage.

To get you started, we have provided free data lake architecture patterns for Big Data Modernization, Logging, and IoT use cases.

Learn more about the 50th anniversary of the relational model and other announcements.


More from Cloud

IBM Cloud inactive identities: Ideas for automated processing

4 min read - Regular cleanup is part of all account administration and security best practices, not just for cloud environments. In our blog post on identifying inactive identities, we looked at the APIs offered by IBM Cloud Identity and Access Management (IAM) and how to utilize them to obtain details on IAM identities and API keys. Some readers provided feedback and asked on how to proceed and act on identified inactive identities. In response, we are going lay out possible steps to take.…

IBM Cloud VMware as a Service introduces multitenant as a new, cost-efficient consumption model

4 min read - Businesses often struggle with ongoing operational needs like monitoring, patching and maintenance of their VMware infrastructure or the added concerns over capacity management. At the same time, cost efficiency and control are very important. Not all workloads have identical needs and different business applications have variable requirements. For example, production applications and regulated workloads may require strong isolation, but development/testing, training environments, disaster recovery sites or other applications may have lower availability requirements or they can be ephemeral in nature,…

IBM accelerates enterprise AI for clients with new capabilities on IBM Z

5 min read - Today, we are excited to unveil a new suite of AI offerings for IBM Z that are designed to help clients improve business outcomes by speeding the implementation of enterprise AI on IBM Z across a wide variety of use cases and industries. We are bringing artificial intelligence (AI) to emerging use cases that our clients (like Swiss insurance provider La Mobilière) have begun exploring, such as enhancing the accuracy of insurance policy recommendations, increasing the accuracy and timeliness of…

IBM NS1 Connect: How IBM is delivering network connectivity with premium DNS offerings

4 min read - For most enterprises, how their users access applications and data is an essential part of doing business, and how they service those application and data responses has a direct correlation to revenue generation.    According to We Are Social’s Digital 2023 Global Overview Report, there are 5.19 billion people around the world using the internet in 2023. There’s an imperative need for businesses to trust their networks to deliver meaningful content to address customer needs.  So how responsive is the…