Exploring the relationship between serverless technology and big data and analytics.

Big data has been around for the past few decades in the traditional form factors of big data systems, such as data warehouses. Starting around the year 2000, however, Hadoop helped expand these integrated systems to be more open in terms of the data and analytics that could be supported.

In this lightboarding video, I trace the evolution of big data and analytics and explain how a new trend around serverless technology has made a big impact.

Big Data Explained


Video Transcript

Big data analytics and serverless technology

Hello, this is Torsten Steinbach, Architect here at IBM for Data and Analytics in the Cloud, and today, I’m going to talk to you about serverless technology and how it is applied to big data analytics.

Data warehouses

When we look at big data in the past decades, we can see that there is a traditional form factor of big data systems that has been used for many decades already, and this is the form factor of a data warehouse.

So, this is a highly integrated system—highly optimized for handling big data queries, big data analytics in a very efficient manner.


Nevertheless, around the year 2000, we had Hadoop coming up, being adopted very rapidly, and gaining a lot of popularity in the industry.

Even though there was already big data analytics, why is it that Hadoop came up? This is because it brought, in addition to this integrated system, more openness to the table: more openness in terms of the type of data that it could handle, data formats, bring-your-own data formats, the types of analytics, and the analytics libraries and languages that can be supported. And also flexibility in terms of the hardware and the deployment options that you can have. You can bring your custom hardware, even heterogeneous hardware.

So, that’s why Hadoop basically gained a lot of traction and is now widely adopted.

The rise of cloud and big data analytics

Today, however, we are seeing a trend that basically results in yet another form factor of doing big data analytics, and this trend is driven by, actually, one thing that is happening, which is the rise of cloud.

Consumer behavior and the sharing economy

And another thing that actually goes hand in hand a little bit with the rise of cloud is the consumption behavior of many people, of end users, which is more oriented toward the sharing economy. So, people are using ride shares more and more instead of renting a car, not to speak of buying a car, just to get around. Or, they are just going with Airbnb to sleep a night somewhere.

Serverless as the sharing economy for IT

So, this consumer behavior is now also applied to IT. And the term serverless can actually be explained like this: serverless is, in fact, the sharing economy for IT. And it is enabled by cloud.

And serverless is, in fact, the most consistent usage model of cloud.

Functions-as-a-Service (FaaS)

Many of you have heard the term serverless, and probably most of you will associate a thing called Functions-as-a-Service with serverless. Many of you may think it’s synonymous, which is not exactly true, but that is basically what many people think of.

Functions-as-a-Service is: I have my code that I need to run (my business logic), but I don’t provision dedicated systems, dedicated hardware, or even dedicated software; I’m just sending it to the server and saying, please run it for me. Run it for me maybe that many times.

So, how to scale out is figured out for me, and it’s all done ad hoc. It’s basically hiding the fact that there are servers. That’s why it’s called serverless.
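As a rough illustration of the idea described above, here is a minimal Python sketch. It is not any particular provider's API: `handler` and `invoke_many` are hypothetical names, and a local thread pool merely stands in for the platform that decides where and how many times the code runs.

```python
from concurrent.futures import ThreadPoolExecutor


def handler(event: dict) -> dict:
    """Business logic only: no server setup, no state kept between calls."""
    return {
        "order_id": event["order_id"],
        "total": event["quantity"] * event["unit_price"],
    }


def invoke_many(fn, events, max_workers=4):
    """Hypothetical stand-in for the platform: run fn once per event, in parallel.

    A real service would hide this part entirely; the caller only supplies fn.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input events.
        return list(pool.map(fn, events))


if __name__ == "__main__":
    events = [{"order_id": i, "quantity": i + 1, "unit_price": 2.5} for i in range(3)]
    for result in invoke_many(handler, events):
        print(result)
```

The point of the sketch is the split of responsibilities: the author of `handler` writes only the business logic, and everything about scaling out sits on the other side of the call.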

Big data and analytics

Now, as I said, this is what many people think of when they hear the term serverless, but serverless is more than just Functions-as-a-Service, especially when we now look back again at our domain here, which is data: big data and analytics.

The problem with big data analytics is that we are talking about state. State has to be kept: my data has to be kept safely, durably, and reliably. I need to be able to access it anytime I want it. And that’s what these systems provide.

Data storage in the cloud

But now in the cloud, we have new options. We can actually abstract the storage of data itself as a cloud service on its own.

That’s also what’s happening on the cloud, and there is, basically, cloud-native storage in the form of object storage.

Object storage is, basically, serverless storage because you do not provision disk volumes, you do not configure disk volumes—you just bring your data and the system figures out how to store it and how to distribute it to make it highly available and so on.

It’s highly abstracted—you just have a REST API where you upload and download your data. You can come with kilobytes of data, going up to terabytes of data, in the same organizational unit.

Pay-as-you-go consumption model

And to think about why it is serverless: it is also a pay-as-you-go consumption model. You don’t just use it as you go, you also pay as you go, which means you’re just paying for the gigabytes that you’re storing right now. And if you store less, you will be paying less, in a completely seamless, elastic way.
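The pay-as-you-go arithmetic can be sketched in a few lines of Python. The price below is a made-up illustrative number, not a real rate from any provider, and `monthly_storage_bill` is a hypothetical helper name.

```python
# Hypothetical price for illustration only; real object storage pricing
# differs by provider, region, and storage tier.
PRICE_PER_GB_MONTH = 0.02


def monthly_storage_bill(gb_stored: float) -> float:
    """Bill for one month, based only on the gigabytes stored in that month."""
    return round(gb_stored * PRICE_PER_GB_MONTH, 2)


if __name__ == "__main__":
    # Stored volume grows and then shrinks; the bill follows it month by month.
    for month, gb in [("Jan", 500), ("Feb", 2000), ("Mar", 100)]:
        print(f"{month}: {gb} GB stored -> {monthly_storage_bill(gb):.2f}")
```

Under the assumed price, the three months come to 10.00, 40.00, and 2.00: when less is stored, less is paid, with no capacity reserved up front.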

Analyzing and processing data

Now, when we talk about big data analytics, it’s not just about storage of data but also about how we can analyze and process this data. And that’s exactly what we are now seeing as well, driven by cloud: additional services are being made available around object storage, such as SQL-as-a-Service, which allows you to run SQL, basically, on the data in object storage and just be billed for this one SQL, depending on how big the SQL was in terms of the data it had to scan. And you do not pay for a database that is provisioned and standing around; you pay for just a single SQL, and that’s it.
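Billing per query by data scanned can be sketched the same way. Again, the price per terabyte is an assumption made up for this example, and `query_bill` is a hypothetical helper, not part of any real SQL service.

```python
# Hypothetical price for illustration only; not a real provider rate.
PRICE_PER_TB_SCANNED = 5.00
BYTES_PER_TB = 10**12


def query_bill(bytes_scanned: int) -> float:
    """Charge for a single SQL query, based only on the data it had to scan."""
    return round(bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_SCANNED, 4)


def total_bill(queries_bytes_scanned: list) -> float:
    """Total for a period: the sum over the queries that actually ran."""
    return round(sum(query_bill(b) for b in queries_bytes_scanned), 4)


if __name__ == "__main__":
    busy_day = [200 * 10**9, 50 * 10**9]  # two queries: 200 GB and 50 GB scanned
    idle_day = []                         # no queries at all
    print("busy day:", total_bill(busy_day))
    print("idle day:", total_bill(idle_day))
```

The idle day costs nothing in this model, which is the contrast with a provisioned database that is paid for whether or not anyone queries it.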

And there are other things that basically play in, like, for instance, Messaging-as-a-Service—Kafka-as-a-Service—where you are just paying by the number of messages being processed and then eventually stored to the object storage.

Complementary big data and analytics form factors

So there’s a series of these services basically coming up, and, in combination, they are providing this new form factor of a big data and analytics system that is augmenting and actually complementing the existing form factors because, even though they are more established and older, there is still a case for using them. They have their sweet spots in terms of their own performance characteristics and response time guarantees, but, on the other hand, there are maybe cost-effectiveness benefits here.

So, depending on your business model and requirements, you may use this or this or the combination of those things.

So, I hope this helps to put in perspective how serverless plays into big data analytics and how it basically generates a whole new form factor for big data and analytics systems.
