VIDEO – Using SQL to Combine Databases and Object Storage

4 min read

What do we mean by a "SQL Sandwich"?

In my latest lightboarding video, I'm going to explain the concept of a "SQL Sandwich" and go through the architecture. Using this framework, I'll cover the many benefits you can gain from using SQL combined with databases and object storage.

Learn more

Video Transcript

What is the "SQL Sandwich"?

Hello, this is Torsten Steinbach, Architect at IBM for Data and Analytics in the Cloud, and I'm going to talk to you today about the "SQL Sandwich."

This is actually about databases and object storage. So, as you can see in the image, we're putting it together as a nice and tasty sandwich.

The SQL Sandwich

Components of the SQL Sandwich

So, at the center of the sandwich, we have our data warehouse.

So, this is a very well-established mechanism of storing and analyzing data—big data—but we’re surrounding it now with more recent technology that it is especially popular in the cloud—and this is object storage.

We have it as the bonds basically for a sandwich at the top and at the bottom.

Data warehouses and object storage

Storing raw data in object storage

Now, what's the purpose of having all of these components? What we have actually at the top is our initial place where we are landing and storing all of our raw data.

Storing raw data in object storage.

So, this might be things like log messages from applications, IoT messages from devices that are just coming in and in a pretty raw format—we are able to just store them in the object storage for very little money in a highly elastic manner.

Storing high-quality data in the data warehouse

And in the data warehouse, however, we want to have high-quality data, because the data warehouse is a much more sophisticated but also much more expensive component to operate, to host, and to purchase.

Storing high-quality data in the data warehouse

Storing archived data in object storage

And at the bottom again, why do we have another object storage at the bottom?

Well, this is where we have our archived data.

Storing archived data in object storage

Exploring the data in object storage

Now, all of these components basically make sense in an end-to-end, big data analytics use case because we have new data arriving and being stored in the object storage, and here in object storage basically we are going to explore the data.

So, we try to find out what's actually in the data. And we prep the data, we prepare the data, cleanse it, make it higher quality, more curated—and we also conduct to some extent batch analytics directly at the data that is stored in the object storage.

Exploring the data in object storage

Interactive analytics in the data warehouse

In a data warehouse, however, we are basically able to do interactive analytics that require certain SLAs for latencies, for performance response times, and so on.

Interactive analytics in the data warehouse

Cost savings with archived data

And why are you having this archived data, again? Well, this is basically because you do not want to keep all of the data for years around in a data warehouse because—as I told you—this is the more expensive thing to run with all of these components.

So, it makes sense for the data that is not hot anymore, and is not required for your daily business, to archive it off again into an object storage.

So, for instance, you can still run things like compliance reports that you're required to do to be compliant with certain regulations.

Compliance

An ETL mechanism to glue the components together

So, these are the different types of analytics and things that you want to do with the data at the different stages of its life cycle basically.

Of course, a real sandwich should not be this dry so you will have some "sauce" in there.

So, what is the sauce that glues these things together? It is basically an ETL mechanism.

An ETL mechanism to glue the components together

It allows you, basically, to read data from here and transform it so that it can be inserted here and same from here to here. So, basically, data is traveling this way. 

An ETL mechanism to glue the components together

Pulling everything together

Now, finally, you see that a sandwich is often something that you do not eat in pieces, right? You do not just eat the patty or just the bun. You eat it as a whole thing.

So, we consume it as a whole thing, and that's also the same analogy that holds here—that while it makes sense to prepare a system that can serve these different types of workloads in the most efficient way and cost-effective way possible, it's, of course, hard if you always have to think about do I have to go here, here, or here for this query.  

It should be automatically figured out by the system. And for that reason we are putting this all into a nice box, putting it all together.

And we are using Federation as a mechanism on top of all of that basically virtualizes the location of the data depending on its age or state of the pipeline that it is in.

The SQL Sandwich?

Now finally it is called a SQL Sandwich—why is it called a SQL Sandwich?

Well, because SQL is the essential thing for all of these things that you can see here. SQL is used for basically doing these analytics; these are SQL queries that we are running here. 

SQL is also used to do detail.

And finally, SQL is also used in order to federate those things together, It's an SQL federation.

Why is it called a SQL Sandwich?

Okay, that's our SQL sandwich and I hope this helps you to put in perspective the roles of object storage and data warehouses and how these two things can be put together into this nice and tasty format so that you can get the most out of the technology in combination.

Be the first to hear about news, product updates, and innovation from IBM Cloud