June 25, 2020 By Michael Factor 3 min read

The benefits of using object storage to store relational data.

There are many different classes of data—voice recordings, text, pictures, and tabular, among others. Tabular data, which is also known as rectangular or structured data, is of critical importance to businesses, but also to individuals. This class of data is pervasive. It is the records the bank keeps of our transactions, a history of purchases in a store, the grades a teacher gave to a class, or the rate of transmission of COVID-19 by location. In all of these cases, we can have a table of records that defines a relationship.

For the example of a teacher’s grades, the table may have columns for the student’s name, the grade on the first exam, the grade on the second exam, and the final grade for the class. Each record in the table defines a relation among a set of values, where the value of each column has a particular type and each row is an n-tuple with a value for each column. For instance, the grades may be whole numbers and the name a string. There is no inherent ordering to the rows in the table, but one can query them based upon various properties. Returning to the example of grades, the teacher might want to find all of the students whose final grade was above 90. Structured Query Language (SQL) is the way these tables are typically queried.

The relational model

This model of organizing data that has relations was first introduced by Edgar Codd, from IBM, fifty years ago. Initially, it was realized in integrated databases that were responsible for everything from the management of the storage up through the processing of the queries. Such a top-to-bottom integrated system provides the maximum opportunities for performance optimization but limits the ability to share data between different tools.

Over the past decade or so—first with open source frameworks such as Map Reduce, then with Spark, and now with more traditional databases, such as IBM Db2 Event Store—engines that are supporting the processing of relational tables are delegating the storage and management of the data to an object store, typically supporting the S3 API. These engines use a range of formats to organize the tabular data, such as Parquet, ORC, or CSV, and then support the same SQL queries that were supported by the original integrated databases.

Object storage and relational data

This use of object storage for storing the relational data brings several benefits (although, at the price of providing less opportunities for optimization):

  • It reduces the cost of developing the overall system by separation of concerns; the query engine no longer needs to address storing the data.
  • It maps naturally to cloud deployments, as all clouds have highly scalable cost-efficient object stores.
  • It enables much higher scalability in data size, as object stores are essentially infinitely scalable.
  • If an open format is used, it allows having different engines and tools to process the same data, allowing you to use the right tool for the job.

IBM has a range of such tools that help data professionals wrangle data and fully harness the power of ANSI SQL over this effectively limitless, yet economical, storage medium. On the IBM Cloud, SQL Query provides serverless execution of SQL for rectangular data stored in IBM Cloud Object Storage. SQL Query is a great tool for data exploration, preparation, transformation, and analytics.

For self-managed capacity hybrid or private cloud workloads, Db2 Big SQL provides a Db2-compatible SQL interface to Object Storage APIs and other data stores through the IBM Cloud Pak for Data on Red Hat OpenShift. And Spark, available with Cloud Pak for Data and via IBM Analytics Engine on the IBM Cloud, has built-in integration with object storage.

To get you started, we have provided free data lake architecture patterns for Big Data Modernization, Logging, and IoT use cases.

Learn more about the 50th anniversary of the relational model and other announcements.

Was this article helpful?
YesNo

More from Cloud

IBM Tech Now: April 8, 2024

< 1 min read - ​Welcome IBM Tech Now, our video web series featuring the latest and greatest news and announcements in the world of technology. Make sure you subscribe to our YouTube channel to be notified every time a new IBM Tech Now video is published. IBM Tech Now: Episode 96 On this episode, we're covering the following topics: IBM Cloud Logs A collaboration with IBM watsonx.ai and Anaconda IBM offerings in the G2 Spring Reports Stay plugged in You can check out the…

The advantages and disadvantages of private cloud 

6 min read - The popularity of private cloud is growing, primarily driven by the need for greater data security. Across industries like education, retail and government, organizations are choosing private cloud settings to conduct business use cases involving workloads with sensitive information and to comply with data privacy and compliance needs. In a report from Technavio (link resides outside ibm.com), the private cloud services market size is estimated to grow at a CAGR of 26.71% between 2023 and 2028, and it is forecast to increase by…

Optimize observability with IBM Cloud Logs to help improve infrastructure and app performance

5 min read - There is a dilemma facing infrastructure and app performance—as workloads generate an expanding amount of observability data, it puts increased pressure on collection tool abilities to process it all. The resulting data stress becomes expensive to manage and makes it harder to obtain actionable insights from the data itself, making it harder to have fast, effective, and cost-efficient performance management. A recent IDC study found that 57% of large enterprises are either collecting too much or too little observability data.…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters