July 1, 2019 By Torsten Steinbach 6 min read

Exploring the relationship of serverless technology and big data and analytics.

Big data has been around for the past few decades in traditional form factors of big data systems, such as data warehouses. Starting around the year 2000, however, Hadoop helped expand these integrated systems to be more open in terms of the data and analytics that could be supported.

In this lightboarding video, I trace the evolution of big data and analytics and explain how a new trend around serverless technology has made a big impact.

Video Transcript

Big data analytics and serverless technology

Hello, this is Torsten Steinbach, Architect here at IBM for Data and Analytics in the Cloud, and today, I’m going to talk to you about serverless technology and how it is applied to big data analytics.

Data warehouses

When we look at big data in the past decades, we can see that there is a traditional form factor of big data systems that has been used for many decades already, and this is the form factor of a data warehouse.

So, this is a highly integrated system—highly optimized for handling big data queries, big data analytics in a very efficient manner.

Hadoop

Nevertheless, around the year 2000, we had Hadoop coming up, being adopted very rapidly, and gaining a lot of popularity in the industry.

Big data analytics already existed, so why did Hadoop come up? It is because it brought, in addition to this integrated system, more openness to the table: more openness in terms of the type of data that it could handle (data formats, bring-your-own data formats) and the types of analytics, analytics libraries, and languages that can be supported. And also flexibility in terms of the hardware and the deployment options that you can have. You can bring your custom hardware, even heterogeneous hardware.

So, that’s why Hadoop basically gained a lot of traction and is now widely adopted.

The rise of cloud and big data analytics

Today, however, we are seeing a trend that results in yet another form factor for doing big data analytics, and this trend is driven by one thing that is happening, which is the rise of cloud.

Consumer behavior and the sharing economy

Another thing that goes hand in hand a little bit with the rise of cloud is the consumption behavior of many people, of end users, becoming more oriented toward the sharing economy. So, people are using ride shares more and more instead of renting a car, not to speak of buying a car, just to get around. Or, they are just going with Airbnb to sleep a night somewhere.

Serverless as the sharing economy for IT

So, this consumer behavior is also applied now to IT. And the term serverless can be explained like this: serverless is, in fact, the sharing economy for IT. And it is enabled by cloud.

And serverless is, in fact, the most consistent usage model of cloud.

Functions-as-a-Service (FaaS)

Many of you have heard the term serverless, and probably most of you will associate a thing called Functions-as-a-Service with serverless. Many of you may think it’s synonymous, which is not exactly true, but that is basically what many people think of.

Functions-as-a-Service is: I have my code that I need to run, my business logic, but I don’t provision dedicated systems, dedicated hardware, or even dedicated software; I’m just sending it to the server and saying, please run it for me. Run it for me maybe that many times.

So, how to scale out is taken care of for me, and it’s all done ad hoc. It’s basically hiding the fact that there are servers. That’s why it’s called serverless.
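
As a minimal sketch of this model in Python: the `handler` function holds only business logic, and a hypothetical `invoke_many` helper stands in for the platform that decides where and how often to run it. Both names, the event fields, and the use of a local thread pool are assumptions made for illustration; this is not the API of any particular Functions-as-a-Service offering.

```python
from concurrent.futures import ThreadPoolExecutor


def handler(event: dict) -> dict:
    """Business logic only: stateless, with no knowledge of the machine it runs on."""
    return {
        "order_id": event["order_id"],
        "total": event["quantity"] * event["unit_price"],
    }


def invoke_many(function, events, max_parallel=8):
    """Hypothetical stand-in for a FaaS platform: one invocation per event.

    A real platform would choose the servers and the degree of scale-out;
    here a local thread pool plays that role.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in the same order as the input events.
        return list(pool.map(function, events))


# "Run it for me that many times": five events, five invocations.
events = [{"order_id": i, "quantity": i + 1, "unit_price": 2} for i in range(5)]
results = invoke_many(handler, events)
print(results[0])
```

The caller never names a server. It hands over the function and the events, and the scale-out decision stays behind `invoke_many`.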

Big data and analytics

Now, as I said, this is what many people think of when they hear the term serverless, but serverless is more than just Functions-as-a-Service, especially when we look back at our domain here, which is data: big data and analytics.

The problem with big data analytics is that we are talking about state. State has to be kept: my data has to be kept safely, durably, and reliably. I need to be able to access it anytime I want. And that’s what these systems provide.

Data storage in the cloud

But now in the cloud, we have new options. We can actually abstract the storage of data itself as a cloud service on its own.

That’s also what’s happening on the cloud, and there is, basically, cloud-native storage in the form of object storage.

Object storage is, basically, serverless storage because you do not provision disk volumes, you do not configure disk volumes—you just bring your data and the system figures out how to store it and how to distribute it to make it highly available and so on.

It’s highly abstracted—you just have a REST API where you upload and download your data. You can come with kilobytes of data, going up to terabytes of data, in the same organizational unit.
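
To make the upload and download interface concrete, here is a toy in-memory stand-in written in Python. The class `ToyObjectStore`, its methods, and the bucket and key names are all hypothetical; a real object storage service exposes comparable operations over HTTP and handles distribution and availability itself.

```python
class ToyObjectStore:
    """In-memory stand-in for an object storage service (not a real API)."""

    def __init__(self):
        self._buckets = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # Comparable to an upload: no volume is sized or attached beforehand.
        self._buckets.setdefault(bucket, {})[key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        # Comparable to a download from the same bucket/key address.
        return self._buckets[bucket][key]

    def bucket_size_bytes(self, bucket: str) -> int:
        # Total bytes currently held in one bucket.
        return sum(len(v) for v in self._buckets.get(bucket, {}).values())


store = ToyObjectStore()
# A few bytes and a megabyte live side by side in the same bucket.
store.put_object("sales-data", "2019/07/orders.csv", b"id,total\n1,20\n")
store.put_object("sales-data", "2019/07/big.bin", bytes(1_000_000))
print(store.bucket_size_bytes("sales-data"))
```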

Pay-as-you-go consumption model

And to think about why it is serverless: it is also that it’s a pay-as-you-go consumption model. You don’t just use it as you go, you also pay as you go, which means you’re paying only for the gigabytes you’re storing at this point right now. And if you store less, you will be paying less, in a completely seamless, elastic way.
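
A small worked example of that elasticity, as a sketch in Python. The price constant and the rule of billing the average gigabytes held over the month are assumptions chosen for illustration; actual providers publish their own prices and metering rules.

```python
PRICE_PER_GB_MONTH = 0.02  # hypothetical price, for illustration only


def monthly_storage_cost(gb_stored_per_day):
    """Bill the average number of gigabytes held over the month (assumed rule)."""
    average_gb = sum(gb_stored_per_day) / len(gb_stored_per_day)
    return average_gb * PRICE_PER_GB_MONTH


# 500 GB for the first half of a 30-day month, 100 GB for the second half.
usage = [500] * 15 + [100] * 15
cost = monthly_storage_cost(usage)
print(f"{cost:.2f}")
```

Under these assumptions the bill follows the 300 GB average rather than the 500 GB peak, which is the difference from provisioning a fixed volume up front.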

Analyzing and processing data

Now, when we talk about big data analytics, it’s not just about storage of data but also about how we can analyze and process this data. And that’s exactly what we are now seeing as well, driven by cloud: additional services are being made available around object storage, such as SQL-as-a-Service. It allows you to run SQL on the data in object storage and be billed just for this one SQL, depending on how big the SQL was in terms of the data it had to scan. You do not pay for a database that is provisioned and standing around; you pay for just a single SQL and that’s it.
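
The billing difference can be sketched in Python as follows. Both prices and the query sizes are hypothetical numbers picked to show the arithmetic, not the rates of any real service.

```python
PRICE_PER_TB_SCANNED = 5.00     # hypothetical per-query rate
PROVISIONED_DB_PER_HOUR = 2.00  # hypothetical rate for an always-on database

TB = 10**12  # bytes in a (decimal) terabyte


def query_cost(bytes_scanned: int) -> float:
    """Cost of one SQL statement, driven only by the data it had to scan."""
    return bytes_scanned / TB * PRICE_PER_TB_SCANNED


def provisioned_cost(hours: float) -> float:
    """Cost of a database that is billed for every hour it is provisioned."""
    return hours * PROVISIONED_DB_PER_HOUR


# Three queries in a month, scanning 200 GB, 50 GB, and 750 GB.
queries = [200 * 10**9, 50 * 10**9, 750 * 10**9]
serverless_total = sum(query_cost(b) for b in queries)
always_on_total = provisioned_cost(24 * 30)
print(f"{serverless_total:.2f} vs {always_on_total:.2f}")
```

With only a handful of queries the per-query model is far cheaper in this sketch. With a heavy, constant query load the comparison can tip the other way.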

And there are other things that basically play in, like, for instance, Messaging-as-a-Service—Kafka-as-a-Service—where you are just paying by the number of messages being processed and then eventually stored to the object storage.

Complementary big data and analytics form factors

So there’s a series of these services coming up, and, in combination, they provide this new form factor of a big data and analytics system that augments and complements the existing form factors. Even though those are more established and older, there is still a point in using them. They have their sweet spots in terms of their own performance characteristics and response time guarantees, but, on the other hand, there may be cost-effectiveness benefits here.

So, depending on your business model and requirements, you may use this or this or the combination of those things.

So, I hope this helps to put in perspective how serverless plays into big data analytics and how it basically generates a whole new form factor for big data and analytics systems.
