Exploring the relationship between serverless technology and big data analytics.

Big data has been around for the past few decades in the traditional form factors of big data systems, such as data warehouses. Starting around the year 2000, however, Hadoop brought more openness than these integrated systems in terms of the data and analytics that could be supported.

In this lightboarding video, I trace the evolution of big data and analytics and explain how a new trend around serverless technology has made a big impact.

Video Transcript

Big data analytics and serverless technology

Hello, this is Torsten Steinbach, Architect here at IBM for Data and Analytics in the Cloud, and today, I’m going to talk to you about serverless technology and how it is applied to big data analytics.

Data warehouses

When we look at big data in the past decades, we can see that there is a traditional form factor of big data systems that has been used for many decades already, and this is the form factor of a data warehouse.

So, this is a highly integrated system—highly optimized for handling big data queries, big data analytics in a very efficient manner.


Nevertheless, around the year 2000, we had Hadoop coming up, being adopted very rapidly, and gaining a lot of popularity in the industry.

So, if there was already big data analytics, why is it that Hadoop came up? This is because it brought, in addition to this integrated system, more openness to the table. More openness in terms of the type of data that it could handle, data formats, bring-your-own-data formats, the types of analytics, analytics libraries, and languages that can be supported. And also flexibility in terms of the hardware and the deployment options that you can have. You can bring your custom hardware, even heterogeneous hardware.

So, that’s why Hadoop basically gained a lot of traction and is now widely adopted.

The rise of cloud and big data analytics

Today, however, we are seeing a trend that basically results in yet another form factor of doing big data analytics, and this trend is driven by, actually, one thing that is happening, which is the era of the rise of cloud.

Consumer behavior and the sharing economy

And another thing that actually goes hand in hand a little bit with the rise of cloud is the consumption behavior of many people, of end users, which is more oriented toward the sharing economy. So, people are using more and more ride shares instead of renting a car, not to speak of buying a car, just to get around. Or, they are just going with Airbnb to sleep a night somewhere.

Serverless as the sharing economy for IT

So, this consumer behavior is also applied now to IT. And this term serverless is actually explained as this: serverless is, in fact, the sharing economy for IT. And it is enabled by cloud.

And serverless is, in fact, the most consistent usage model of cloud.

Functions-as-a-Service (FaaS)

Many of you have heard the term serverless, and probably most of you will associate a thing called Functions-as-a-Service with serverless. Many of you may think it’s synonymous, which is not exactly true, but that is basically what many people think of.

Functions-as-a-Service is: I have my code that I need to run, my business logic, but I don’t provision dedicated systems, dedicated hardware, or even dedicated software; I’m just sending it to the server and saying, please run it for me. Run it for me maybe that many times.

So, how to scale out is taken care of for me, and it’s all done ad hoc. It’s basically hiding the fact that there are servers. That’s why it’s called serverless.
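As a rough illustration of this idea, the following Python sketch mimics a Functions-as-a-Service call locally. The names `handler` and `invoke_many` are hypothetical and do not belong to any real platform; the thread pool only simulates the scale-out that a real service would perform on servers the caller never sees.

```python
from concurrent.futures import ThreadPoolExecutor


def handler(event):
    """Business logic only: no server, runtime, or scaling code in here."""
    return {"word_count": len(event["text"].split())}


def invoke_many(function, events, max_parallel=4):
    """Hypothetical stand-in for the platform: run the function once per event.

    A real service would decide where and how widely to run this;
    a local thread pool merely simulates that here.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in the same order as the input events.
        return list(pool.map(function, events))


events = [
    {"text": "big data"},
    {"text": "serverless is the sharing economy for IT"},
]
results = invoke_many(handler, events)
print(results)
```

The point of the sketch is the split of responsibilities: the caller supplies only `handler` and the events, while everything about where and how many times it runs sits behind `invoke_many`.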

Big data and analytics

Now, as I said, this is what many people think of when they hear the term serverless, but serverless is more than just Functions-as-a-Service. Especially when we now look back again at our domain here, which is data: big data and analytics.

The problem with big data analytics is that we are talking about state. State has to be kept: my data has to be kept safely, durably, and reliably. I need to be able to access it anytime I want it. And that’s what these systems provide.

Data storage in the cloud

But now in the cloud, we have new options. We can actually abstract the storage of data itself as a cloud service on its own.

That’s also what’s happening on the cloud, and there is, basically, cloud-native storage in the form of object storage.

Object storage is, basically, serverless storage because you do not provision disk volumes, you do not configure disk volumes—you just bring your data and the system figures out how to store it and how to distribute it to make it highly available and so on.

It’s highly abstracted—you just have a REST API where you upload and download your data. You can come with kilobytes of data, going up to terabytes of data, in the same organizational unit.
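To make the upload-and-download model concrete, here is a minimal in-memory sketch in Python. `ToyObjectStore` and its method names are hypothetical and only imitate the shape of such an interface; a real object storage service exposes comparable operations over a REST API and takes care of distribution and availability behind it.

```python
class ToyObjectStore:
    """In-memory stand-in for an object storage service (hypothetical)."""

    def __init__(self):
        # bucket name -> {object key -> object bytes}
        self._buckets = {}

    def put_object(self, bucket, key, data: bytes):
        """Upload: no volumes to provision, just a bucket, a key, and bytes."""
        self._buckets.setdefault(bucket, {})[key] = bytes(data)

    def get_object(self, bucket, key) -> bytes:
        """Download the object stored under the given bucket and key."""
        return self._buckets[bucket][key]

    def stored_bytes(self, bucket) -> int:
        """Total size of all objects currently in the bucket."""
        return sum(len(v) for v in self._buckets.get(bucket, {}).values())


store = ToyObjectStore()
store.put_object("analytics", "logs/2024-01-01.csv", b"id,value\n1,42\n")
store.put_object("analytics", "tiny.txt", b"hi")
print(store.get_object("analytics", "tiny.txt"))
print(store.stored_bytes("analytics"))
```

Small and large objects sit side by side in the same bucket, which corresponds to the "same organizational unit" mentioned above.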

Pay-as-you-go consumption model

And to think about why it is serverless: it is also that it’s a pay-as-you-go consumption model. You don’t just use it as you go, you also pay as you go, which means you’re just paying for the gigabytes that you’re storing at this point right now. And if you store less, you will be paying less, in a completely seamless, elastic way.
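A small worked example may help here. The Python sketch below assumes a simple pricing model in which each day is billed for whatever is stored on that day; the function name `storage_cost` and the price are hypothetical and are not taken from any real price list.

```python
def storage_cost(gb_stored_per_day, price_per_gb_month, days_in_month=30):
    """Bill each day for what is stored that day (assumed pricing model)."""
    daily_rate = price_per_gb_month / days_in_month
    return sum(gb * daily_rate for gb in gb_stored_per_day)


PRICE_PER_GB_MONTH = 0.03  # hypothetical price, not a real quote

# 100 GB for the first 15 days, then only 40 GB for the remaining 15 days.
elastic_usage = [100] * 15 + [40] * 15
# For comparison: 100 GB kept for the whole month.
constant_usage = [100] * 30

elastic_cost = storage_cost(elastic_usage, PRICE_PER_GB_MONTH)
constant_cost = storage_cost(constant_usage, PRICE_PER_GB_MONTH)
print(round(elastic_cost, 2), round(constant_cost, 2))
```

Under these assumptions, deleting data halfway through the month lowers the bill right away, without resizing or releasing any provisioned volume.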

Analyzing and processing data

Now, when we talk about big data analytics, it’s not just about storage of data but also about how we can analyze and process this data. And that’s exactly what we are now seeing as well, driven by cloud: additional services are being made available around object storage, such as SQL-as-a-Service. It allows you to run SQL directly on the data in object storage and be billed just for this one SQL, depending on how big it was in terms of the data it had to scan. You do not pay for a database that is provisioned and standing around; you pay for a single SQL and that’s it.
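The per-query billing idea can be sketched as simple arithmetic. The Python example below assumes a price per terabyte scanned; `query_cost` and the price are hypothetical and only illustrate the shape of such a model.

```python
def query_cost(bytes_scanned, price_per_tb_scanned):
    """Cost of a single query, billed by data scanned (assumed pricing model)."""
    terabytes = bytes_scanned / 1e12
    return terabytes * price_per_tb_scanned


PRICE_PER_TB = 5.0  # hypothetical price, not a real quote

# Three queries that scan 200 GB, 50 GB, and 2 TB of data in object storage.
scans = [200e9, 50e9, 2e12]
costs = [query_cost(b, PRICE_PER_TB) for b in scans]
total = sum(costs)
print([round(c, 2) for c in costs], round(total, 2))
```

In this model, a month with no queries costs nothing for query processing, which is the contrast with a provisioned database that is billed whether or not it is used.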

And there are other things that basically play in, like, for instance, Messaging-as-a-Service—Kafka-as-a-Service—where you are just paying by the number of messages being processed and then eventually stored to the object storage.

Complementary big data and analytics form factors

So there’s a series of these services basically coming up, and, in combination, they are providing this new form factor of a big data and analytics system that is augmenting and actually complementing the existing form factors because even though they are more established and older, there is still a point for using them. They have their sweet spots in terms of their own performance characteristics and response time guarantees, but, on the other side, there are maybe cost-effectiveness benefits here. 

So, depending on your business model and requirements, you may use this or this or the combination of those things.

So, I hope this helps to put in perspective how serverless plays into big data analytics and how it basically generates a whole new form factor for big data and analytics systems.

