June 25, 2020 By Michael Factor 3 min read

The benefits of using object storage to store relational data.

There are many different classes of data—voice recordings, text, pictures, and tabular, among others. Tabular data, which is also known as rectangular or structured data, is of critical importance to businesses, but also to individuals. This class of data is pervasive. It is the records the bank keeps of our transactions, a history of purchases in a store, the grades a teacher gave to a class, or the rate of transmission of COVID-19 by location. In all of these cases, we can have a table of records that defines a relationship.

For the example of a teacher’s grades, the table may have columns for the student’s name, the grade on the first exam, the grade on the second exam, and the final grade for the class. Each record in the table defines a relation among a set of values, where the value of each column has a particular type and each row is an n-tuple with a value for each column. For instance, the grades may be whole numbers and the name a string. There is no inherent ordering to the rows in the table, but one can query them based upon various properties. Returning to the example of grades, the teacher might want to find all of the students whose final grade was above 90. Structured Query Language (SQL) is the way these tables are typically queried.

The relational model

This model of organizing data that has relations was first introduced by Edgar Codd, from IBM, fifty years ago. Initially, it was realized in integrated databases that were responsible for everything from the management of the storage up through the processing of the queries. Such a top-to-bottom integrated system provides the maximum opportunities for performance optimization but limits the ability to share data between different tools.

Over the past decade or so—first with open source frameworks such as Map Reduce, then with Spark, and now with more traditional databases, such as IBM Db2 Event Store—engines that are supporting the processing of relational tables are delegating the storage and management of the data to an object store, typically supporting the S3 API. These engines use a range of formats to organize the tabular data, such as Parquet, ORC, or CSV, and then support the same SQL queries that were supported by the original integrated databases.

Object storage and relational data

This use of object storage for storing the relational data brings several benefits (although, at the price of providing less opportunities for optimization):

  • It reduces the cost of developing the overall system by separation of concerns; the query engine no longer needs to address storing the data.
  • It maps naturally to cloud deployments, as all clouds have highly scalable cost-efficient object stores.
  • It enables much higher scalability in data size, as object stores are essentially infinitely scalable.
  • If an open format is used, it allows having different engines and tools to process the same data, allowing you to use the right tool for the job.

IBM has a range of such tools that help data professionals wrangle data and fully harness the power of ANSI SQL over this effectively limitless, yet economical, storage medium. On the IBM Cloud, SQL Query provides serverless execution of SQL for rectangular data stored in IBM Cloud Object Storage. SQL Query is a great tool for data exploration, preparation, transformation, and analytics.

For self-managed capacity hybrid or private cloud workloads, Db2 Big SQL provides a Db2-compatible SQL interface to Object Storage APIs and other data stores through the IBM Cloud Pak for Data on Red Hat OpenShift. And Spark, available with Cloud Pak for Data and via IBM Analytics Engine on the IBM Cloud, has built-in integration with object storage.

To get you started, we have provided free data lake architecture patterns for Big Data Modernization, Logging, and IoT use cases.

Learn more about the 50th anniversary of the relational model and other announcements.

Was this article helpful?
YesNo

More from Cloud

How a US bank modernized its mainframe applications with IBM Consulting and Microsoft Azure

9 min read - As organizations strive to stay ahead of the curve in today's fast-paced digital landscape, mainframe application modernization has emerged as a critical component of any digital transformation strategy. In this blog, we'll discuss the example of a fictional US bank which embarked on a journey to modernize its mainframe applications. This strategic project has helped it to transform into a more modern, flexible and agile business. In looking at the ways in which it approached the problem, you’ll gain insights…

Attention new clients: exciting financial incentives for VMware Cloud Foundation on IBM Cloud

4 min read - New client specials: Get up to 50% off when you commit to a 1- or 3-year term contract on new VCF-as-a-Service offerings, plus an additional value of up to USD 200K in credits through 30 June 2025 when you migrate your VMware workloads to IBM Cloud®.1 Low starting prices: On-demand VCF-as-a-Service deployments begin under USD 200 per month.2 The IBM Cloud benefit: See the potential for a 201%3 return on investment (ROI) over 3 years with reduced downtime, cost and…

24 IBM offerings winning TrustRadius 2024 Top Rated Awards

2 min read - TrustRadius is a buyer intelligence platform for business technology. Comprehensive product information, in-depth customer insights and peer conversations enable buyers to make confident decisions. “Earning a Top Rated Award means the vendor has excellent customer satisfaction and proven credibility. It’s based entirely on reviews and customer sentiment,” said Becky Susko, TrustRadius, Marketing Program Manager of Awards. Top Rated Awards have to be earned: Gain 10+ new reviews in the past 12 months Earn a trScore of 7.5 or higher from…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters