July 1, 2019 By IBM Blog 6 min read

Exploring the relationship between serverless technology and big data and analytics.

Big data has been around for the past few decades in traditional form factors of big data systems, such as data warehouses. Starting around the year 2000, however, Hadoop helped expand these integrated systems to be more open in terms of the data and analytics that could be supported.

In this lightboarding video, I trace the evolution of big data and analytics and explain how a new trend around serverless technology has made a big impact.


Video Transcript

Big data analytics and serverless technology

Hello, this is Torsten Steinbach, Architect here at IBM for Data and Analytics in the Cloud, and today, I’m going to talk to you about serverless technology and how it is applied to big data analytics.


Data warehouses

When we look at big data over the past decades, we can see that there is a traditional form factor of big data systems that has been in use for many decades already: the data warehouse.


So, this is a highly integrated system, highly optimized for handling big data queries and big data analytics in a very efficient manner.

Hadoop

Nevertheless, around the year 2000, we had Hadoop coming up, being adopted very rapidly, and gaining a lot of popularity; it is now widely adopted in the industry.


So why is it that Hadoop came up, even though there was already big data analytics? It is because it brought, in addition to this integrated system, more openness to the table: more openness in terms of the type of data it could handle (data formats, bring-your-own-data formats) and in terms of the types of analytics, analytics libraries, and languages that can be supported. It also brought flexibility in terms of the hardware and the deployment options that you can have. You can bring your custom hardware, even heterogeneous hardware.

So, that’s why Hadoop basically gained a lot of traction and is now widely adopted.

The rise of cloud and big data analytics

Today, however, we are seeing a trend that results in yet another form factor for doing big data analytics, and this trend is driven by one thing that is happening: the rise of cloud.


Consumer behavior and the sharing economy

And another thing that goes hand in hand a little bit with the rise of cloud is the consumption behavior of many people, of end users, becoming more oriented toward the sharing economy. So, people are using ride shares more and more instead of renting a car, not to speak of buying a car, just to get around. Or they are just going with Airbnb to sleep a night somewhere.

Serverless as the sharing economy for IT

So, this consumer behavior is now also applied to IT. And the term serverless can be explained like this: serverless is, in fact, the sharing economy for IT. And it is enabled by cloud.

And serverless is, in fact, the most consistent usage model of cloud.


Functions-as-a-Service (FaaS)

Many of you have heard the term serverless, and probably most of you will associate a thing called Functions-as-a-Service with serverless. Many of you may think the two are synonymous, which is not exactly true, but that is basically what many people think of.


Functions-as-a-Service is: I have my code that I need to run (my business logic), but I don’t provision dedicated systems, dedicated hardware, or even dedicated software; I’m just sending it to the server and saying, please run it for me. Run it for me maybe that many times.

So, how to scale out is taken care of for me, and it’s all done ad hoc. It’s basically hiding the fact that there are servers. That’s why it’s called serverless.
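
To make the idea concrete, here is a minimal sketch in Python. It is not the API of any particular FaaS platform: the `handler` signature and the `invoke_many` helper are hypothetical, and a local thread pool merely stands in for the platform that would decide where and how often to run the code.

```python
from concurrent.futures import ThreadPoolExecutor


def handler(params):
    """Stateless business logic: everything it needs arrives in `params`."""
    words = params.get("text", "").split()
    return {"word_count": len(words)}


def invoke_many(fn, payloads, max_workers=4):
    """Hypothetical stand-in for the platform: run `fn` once per payload.

    A real service would choose the machines and the degree of parallelism;
    the thread pool only imitates that from the caller's point of view.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the payloads in its results.
        return list(pool.map(fn, payloads))


payloads = [
    {"text": "big data"},
    {"text": "serverless is the sharing economy for IT"},
    {},
]
results = invoke_many(handler, payloads)
print(results)
```

The caller only supplies the function and the inputs. Because `handler` keeps no state between calls, it does not matter where each invocation runs, which is what lets a platform scale it out freely.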

Big data and analytics

Now, as I said, this is what many people think of when they hear the term serverless, but serverless is more than just Functions-as-a-Service. That is especially true when we look back again at our domain here, which is data: big data and analytics.

The problem with big data analytics is that we are talking about state. State has to be kept: my data has to be kept safely, durably, and reliably. I need to be able to access it anytime I want it. And that’s what these systems provide.

Data storage in the cloud

But now in the cloud, we have new options. We can actually abstract the storage of data itself as a cloud service on its own.

That’s also what’s happening in the cloud: there is cloud-native storage in the form of object storage.


Object storage is, basically, serverless storage because you do not provision disk volumes, you do not configure disk volumes—you just bring your data and the system figures out how to store it and how to distribute it to make it highly available and so on.

It’s highly abstracted—you just have a REST API where you upload and download your data. You can come with kilobytes of data, going up to terabytes of data, in the same organizational unit.
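
As a rough illustration of that upload/download abstraction, the toy class below models an object store in memory. It is not a client for any real service: the class name and its `put_object`, `get_object`, and `stored_bytes` methods are hypothetical and only mirror the shape of a typical put/get interface.

```python
class InMemoryObjectStore:
    """Toy stand-in for an object storage service (not a real client)."""

    def __init__(self):
        # Flat namespace: (bucket, key) -> object bytes. No volumes, no sizing.
        self._objects = {}

    def put_object(self, bucket, key, data):
        """Upload: store the bytes under bucket/key, replacing any old object."""
        self._objects[(bucket, key)] = bytes(data)

    def get_object(self, bucket, key):
        """Download: return the bytes stored under bucket/key."""
        return self._objects[(bucket, key)]

    def stored_bytes(self, bucket):
        """Total size of all objects currently held in the bucket."""
        return sum(len(v) for (b, _), v in self._objects.items() if b == bucket)


store = InMemoryObjectStore()
store.put_object("analytics", "tiny.csv", b"id,value\n1,42\n")
store.put_object("analytics", "larger.bin", bytes(1_000_000))

print(store.get_object("analytics", "tiny.csv"))
print(store.stored_bytes("analytics"))
```

Nothing in the calling code says how large the bucket may grow or where the bytes live; a small file and a much larger one go into the same bucket through the same two calls.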

Pay-as-you-go consumption model

And another reason why it is serverless is that it’s a pay-as-you-go consumption model. You don’t just use it as you go, you also pay as you go, which means you’re paying only for the gigabytes that you’re storing right now. And if you store less, you will be paying less, in a completely seamless, elastic way.
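
The arithmetic behind such a model can be sketched as follows. The price per gigabyte and the idea of billing on the average of daily usage samples are assumptions made for illustration, not the pricing scheme of any actual provider.

```python
GB = 10**9
PRICE_PER_GB_MONTH = 0.02  # hypothetical price, in dollars


def monthly_storage_cost(daily_bytes, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Bill on the average amount stored over the month (assumed scheme)."""
    if not daily_bytes:
        return 0.0
    avg_gb = sum(daily_bytes) / len(daily_bytes) / GB
    return avg_gb * price_per_gb_month


# 500 GB stored for the first 15 days, then only 100 GB for the next 15.
usage = [500 * GB] * 15 + [100 * GB] * 15
steady = [500 * GB] * 30

print(f"shrinking usage: ${monthly_storage_cost(usage):.2f}")
print(f"steady usage:    ${monthly_storage_cost(steady):.2f}")
```

Under these assumed numbers, deleting data halfway through the month lowers the bill without any resizing step, because there was never a provisioned volume to resize.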


Analyzing and processing data

Now, when we talk about big data analytics, it’s not just about storage of data but also about how we can analyze and process this data. And that’s exactly what we are now seeing as well, driven by cloud: additional services are being made available around object storage, such as SQL-as-a-Service, which allows you to run SQL on the data in object storage and be billed just for this one SQL, depending on how big it was in terms of the data it had to scan. You do not pay for a database that is provisioned and standing around; you pay for just a single SQL, and that’s it.
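
Billing by data scanned can be sketched in the same way. The rate per terabyte below is a made-up figure and the function names are hypothetical; the point is only that the charge follows the work each query does, not the time a system sits provisioned.

```python
TB = 10**12
PRICE_PER_TB_SCANNED = 5.00  # hypothetical rate, in dollars


def query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_SCANNED):
    """Cost of one SQL query, proportional to the data it had to scan."""
    return bytes_scanned / TB * price_per_tb


def total_cost(scans):
    """Sum the per-query charges; with no queries, nothing is owed."""
    return sum(query_cost(b) for b in scans)


# Three queries scanning 50 GB, 200 GB, and 1 TB of data in object storage.
scans = [50 * 10**9, 200 * 10**9, 1 * TB]
for scanned in scans:
    print(f"{scanned / 10**9:8.0f} GB scanned -> ${query_cost(scanned):.2f}")
print(f"total: ${total_cost(scans):.2f}")
```

A month with no queries costs nothing under this scheme, which is the contrast with a provisioned database that is charged for whether or not anyone is using it.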

And there are other things that play in, for instance Messaging-as-a-Service (Kafka-as-a-Service), where you are just paying by the number of messages being processed and then eventually stored to the object storage.


Complementary big data and analytics form factors

So there’s a series of these services coming up, and, in combination, they provide this new form factor of a big data and analytics system that augments and complements the existing form factors. Even though those are more established and older, there is still a point in using them. They have their sweet spots in terms of their own performance characteristics and response time guarantees, but, on the other side, there may be cost-effectiveness benefits here.

So, depending on your business model and requirements, you may use one form factor or another, or a combination of them.

So, I hope this helps to put in perspective how serverless plays into big data analytics and how it basically generates a whole new form factor for big data and analytics systems.
