Serverless Functions vs. Virtual Machines: A Total Cost of Ownership Comparison
By: Michael Behrendt and Manav Gupta
A total cost of ownership comparison of virtual machines and serverless functions
Contrary to popular belief, you can deploy high-throughput workloads on higher-level abstractions like serverless Functions-as-a-Service (FaaS) instead of defaulting to virtual machines.
This post explores the relevant cost, performance, and availability issues for a Total Cost of Ownership (TCO) comparison of virtual machines (VMs) and serverless functions. We also highlight how this analysis can be used to compare serverless function costs with traditional Platform-as-a-Service (PaaS) infrastructures.
We recommend comparing both infrastructure and operational management costs to achieve an apples-to-apples comparison and an accurate TCO analysis.
This comparison uses IBM Cloud Virtual Servers to implement the VM option and IBM Cloud Functions as the serverless platform. We modeled a high-availability application that processes about 2.5 billion request calls per day.
Both the VM and Cloud Functions deployments have the following enterprise characteristics:
There are three environments in use for the application (production, pre-production, and dev/test).
The production and pre-production environments are deployed across three geographical cloud regions to improve resiliency and latency.
Within each region, the deployment is spread across three availability zones to increase availability.
In this scenario, the example application runs on virtual server instances with two processors and 4GB of memory. We used the average price of Red Hat Enterprise Linux (RHEL), Windows, and CentOS instances. Incoming load is distributed among the VMs by a load balancer. For each server, there are costs for automation tooling and for the labor required for operational management tasks (e.g., patching and monitoring). We based the operational management cost estimate on discussions with IBM Subject Matter Experts (SMEs).
The same VM setup is deployed in each region. The pre-production and production environments each span three regions for resiliency; the dev/test environment runs in a single region.
Because load usually fluctuates, the VM scenario requires buffer capacity to accommodate workload spikes. The calculation reflects this as the “peak-to-average ratio.” It addresses sudden peaks, which are very common even for “constant” workloads when the load profile is viewed on a granular time scale. These spikes are so short-lived that any kind of VM-based auto-scaling would be too slow to react. The conservative assumption made here is a 3:1 ratio (i.e., the peak is three times the average load), but more extreme ratios occur; values of up to 80:1 have been observed. Figure 1 illustrates fine-grained workload volatility.
Figure 1: Typical workload volatility
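To make the effect of the peak-to-average buffer concrete, here is a minimal Python sketch of how it inflates the VM count. The per-VM throughput (RPS_PER_VM) is a hypothetical figure chosen only for illustration and does not come from our analysis; the average load of about 29,000 requests per second corresponds to the 2.5 billion requests per day modeled in this post.

```python
import math


def vms_required(avg_rps: float, peak_to_average: float, rps_per_vm: float) -> int:
    """Number of VMs needed to absorb the peak load without relying on auto-scaling."""
    peak_rps = avg_rps * peak_to_average
    return math.ceil(peak_rps / rps_per_vm)


AVG_RPS = 29_000   # roughly 2.5 billion requests per day
RPS_PER_VM = 500   # hypothetical per-VM throughput, NOT a figure from this analysis

sized_for_average = vms_required(AVG_RPS, 1, RPS_PER_VM)
sized_for_peak = vms_required(AVG_RPS, 3, RPS_PER_VM)  # peak is three times the average

print(f"VMs sized for average load: {sized_for_average}")
print(f"VMs sized for peak load:    {sized_for_peak}")
```

Under these assumptions, the buffer triples the number of VMs per environment, and that capacity is paid for around the clock even though it is only needed during spikes.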
Below you’ll find the costs tallied for the VM scenario, including infrastructure, operational management tools, and labor.
Figure 2: Illustrative costs for VM-based deployment
Serverless function platform
The serverless function service—IBM Cloud Functions—executes actions (which are pieces of code written in any language) in response to specific requests or events. An event might be as simple as an HTTP request, a new record appearing in an IBM Cloudant database, or any other event happening within the context of an application.
When creating a new action, the developer assigns the amount of memory available to the action during execution. When the action executes, Cloud Functions measures the time taken to serve the request. For example, it might take 83 milliseconds to serve a request. For each request, multiply the amount of memory assigned by the time taken to serve the request (rounded up to the next 100ms) to arrive at a GB-second value.
This pay-per-request model, compared to buying pre-allocated, coarse-grained VM capacity, is what enables much of the cost savings laid out in this document.
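The GB-second calculation described above can be sketched in a few lines of Python. The 256MB memory setting is an assumption for illustration only, as is the convention that 1GB equals 1,024MB; the 83ms duration and the 100ms rounding come from the text.

```python
import math

BILLING_INCREMENT_MS = 100  # execution time is rounded up to the next 100 ms


def billed_gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """GB-seconds billed for one invocation: assigned memory times rounded-up time."""
    increments = max(1, math.ceil(duration_ms / BILLING_INCREMENT_MS))
    billed_ms = increments * BILLING_INCREMENT_MS
    # Assumption: 1 GB = 1024 MB.
    return (memory_mb / 1024) * (billed_ms / 1000)


# Hypothetical action with 256 MB assigned that serves a request in 83 ms.
# The 83 ms are billed as 100 ms.
print(billed_gb_seconds(256, 83))
```

Multiplying the result by the number of requests and by the GB-second price of the platform gives the serverless infrastructure cost.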
We estimated that Cloud Functions can process about 75 billion requests per month at a cost comparable to the VM-based deployment. This corresponds to 2.5 billion requests per day or about 29,000 requests per second, assuming a steady workload 24×7. This is clearly an extreme case, but it enables a relatively simple TCO analysis. In general, we expect Cloud Functions to run at a fraction of the cost of VM-based deployments for most workloads. Given the on-demand scaling of Cloud Functions, the cost is the same whether the load is constant or changes frequently, as in Figure 1.
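The conversion from monthly volume to a per-second rate is simple arithmetic. The sketch below reproduces it, assuming a 30-day month (our assumption for the illustration).

```python
REQUESTS_PER_MONTH = 75_000_000_000
DAYS_PER_MONTH = 30               # assumption: a 30-day month
SECONDS_PER_DAY = 24 * 60 * 60    # steady workload, 24x7

requests_per_day = REQUESTS_PER_MONTH / DAYS_PER_MONTH
requests_per_second = requests_per_day / SECONDS_PER_DAY

print(f"{requests_per_day:,.0f} requests per day")
print(f"{requests_per_second:,.0f} requests per second")
```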
Figure 3 shows cost calculations for the comparable Cloud Functions solution.
Figure 3: Illustrative costs for Cloud Functions deployment
Another advantage for Cloud Functions users is that GB-second costs are the same regardless of the number of availability zones or regions used. IBM Cloud deploys Cloud Functions across three availability zones by default, at no extra charge.
Moreover, Cloud Functions can also be deployed in multiple regions at no incremental cost. This is a unique property of the serverless cloud function model, and it enables customers to achieve more 9’s of availability without paying more. For example, Cloud Functions has 99.95% availability per region, so a solution running in three regions can expect 1 - (1 - 0.9995)^3 = 99.9999999875% platform availability (assuming regions fail independently), at the same price as if it were deployed in just a single region.
In contrast, a VM-based deployment needs permanent capacity in each individual availability zone and region hosting the application, with the application running 24×7 in three availability zones for resiliency. The assumption in both cases is that a global load balancer offering 100% availability is used—if it has less than 100% availability, that needs to be equally factored in.
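The multi-region availability figure can be reproduced with the short Python sketch below. It assumes that regions fail independently and that the global load balancer sits in series in front of them; the 99.99% load balancer figure in the second call is purely hypothetical and is there only to show how a less-than-perfect load balancer would be factored in.

```python
def combined_unavailability(per_region_availability: float, regions: int) -> float:
    """Probability that all regions are down at once (independent failures assumed)."""
    return (1 - per_region_availability) ** regions


def platform_availability(
    per_region_availability: float,
    regions: int,
    load_balancer_availability: float = 1.0,
) -> float:
    """Availability of N redundant regions behind one global load balancer."""
    regions_up = 1 - combined_unavailability(per_region_availability, regions)
    return load_balancer_availability * regions_up


# 99.95% per region, three regions, load balancer assumed 100% available.
print(f"{platform_availability(0.9995, 3) * 100:.10f}%")

# Same setup behind a hypothetical 99.99% load balancer.
print(f"{platform_availability(0.9995, 3, 0.9999) * 100:.10f}%")
```

Once the load balancer is below 100%, it dominates the result, which is why it has to be factored in equally for both the VM and the Cloud Functions option.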
To further reduce costs on the serverless side, a common approach is to process events or requests in bulk, if that fits the nature of the workload, rather than using one function invocation per event. This is attractive where handling a single event is very lightweight and takes only a fraction of a millisecond. Each Cloud Functions invocation is charged for a minimum of 100ms, so single lightweight requests pay for far more compute time than they use. If the nature of the workload permits, the application can instead send events to Cloud Functions in groups. For sub-millisecond functions, 100 or more events could be processed in bulk within the 100ms window, which in this case lowers the overall Cloud Functions price by a factor of 100. This advantage is not reflected in our analysis but could easily be added where applicable.
Serverless computing has none of the ongoing infrastructure operational costs that the VM-based solution incurs. For customers, there is nothing to manage at the infrastructure level; these responsibilities shift to IBM as the service provider for Cloud Functions. The GB-second cost is the only incremental cost in the TCO analysis for Cloud Functions.
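The effect of bulk processing on billed time can be illustrated with a small Python sketch. The event count and the 0.5ms per-event processing time are hypothetical values chosen for the illustration, and the sketch simplifies by assuming that every invocation processes a full batch.

```python
import math

BILLING_INCREMENT_MS = 100  # each invocation is billed in 100 ms steps, minimum 100 ms


def total_billed_ms(events: int, ms_per_event: float, batch_size: int) -> int:
    """Total billed milliseconds when events are processed batch_size at a time."""
    invocations = math.ceil(events / batch_size)
    # Simplification: every invocation is assumed to process a full batch.
    run_ms = batch_size * ms_per_event
    increments = max(1, math.ceil(run_ms / BILLING_INCREMENT_MS))
    return invocations * increments * BILLING_INCREMENT_MS


EVENTS = 1_000_000    # hypothetical workload size
MS_PER_EVENT = 0.5    # hypothetical sub-millisecond handler

one_by_one = total_billed_ms(EVENTS, MS_PER_EVENT, batch_size=1)
in_batches = total_billed_ms(EVENTS, MS_PER_EVENT, batch_size=100)

print(f"Billed ms, one event per invocation:  {one_by_one:,}")
print(f"Billed ms, 100 events per invocation: {in_batches:,}")
print(f"Reduction factor: {one_by_one / in_batches:.0f}")
```

With the same memory setting, billed GB-seconds scale directly with billed milliseconds, so the reduction factor carries over to the price.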
In addition to potential hard dollar cost advantages of a serverless implementation, we regularly see other benefits, including the following:
Dramatically reduced time-to-market, because the focus is on the app code rather than the infrastructure.
Simplified security management, thanks to the high level of abstraction. Vulnerabilities such as Meltdown or Spectre are addressed transparently to the app developer.
Easier compliance: standards such as GDPR can be addressed with a fraction of the effort it would take to update a typical VM-based solution.
This TCO analysis is just an example. Each situation is different. We recommend using the baseline structure we laid out and adjusting for specific issues.
Comparison to traditional PaaS infrastructures
Comparing traditional platform-as-a-service (PaaS) infrastructures with serverless cloud functions is similar to the VM comparison (although not as extreme) since PaaS represents a higher level of abstraction. The total cost of PaaS can be significant when costs for unused capacity, resiliency, automation, and management are included.
For example, SiteSpirit, a Dutch IT company providing SaaS offerings for travel companies, migrated the core of their application from Cloud Foundry to Cloud Functions, resulting in a 90% cost reduction and 10x faster application performance. You can read the full case study here.
A real-world deployment example
An IBM client had an existing VM-based application used by Business Analysts to generate company investment profiles. The solution sources company data from over 150 sources, including Factiva, LexisNexis, the Google search engine, and social media. It produces a Rapid Assessment Report that summarizes a company’s data in categories such as Key Players, M&A activities, and Legal. Currently, it takes about three days to produce a Rapid Assessment Report, and 10,000 reports are produced annually.
The client asked IBM to compare costs for solutions that could produce 300,000 Rapid Assessment Reports per year and reduce the time to generate reports to less than one day.
We looked at several alternatives to redesign the solution, including Containers, Cloud Foundry, and Cloud Functions. Furthermore, we estimated that the application re-development costs would be lowest using VMs or Cloud Functions, so we explored these two options in detail.
Figure 4 depicts the initial VM-based architecture for the solution as hosted on the public cloud.
Figure 4: Company-Profile analysis architecture: VM-based
In the VM-based application, a company’s profile is generated as follows:
A Business Analyst generates a request via the User Interface (UI) for an area of interest, such as “Stanford Financial Group.”
A VM-based IBM Watson Explorer engine (previously called Watson Data Explorer) pulls data for the source entity from third-parties that license company data.
The Watson Explorer Engine crawls through the source data and aggregates information for key metrics (e.g., Legal, Key Players, etc.). The Content Analytics component analyzes structured and unstructured content in the source documents. Entity analytics is performed to identify entities (such as an individual named “Allen Stanford”) and generate aggregate metrics.
The aggregated data is stored in an IBM Db2 database.
A Rapid Assessment Report is generated and presented in the UI.
Business Analysts are able to view reports and perform searches on the aggregate data.
This solution to produce 10,000 Rapid Assessment Reports per year costs approximately $150,000 per year.
Expansion scenario: VM-based solution
We estimated that scaling the existing VM-based solution to produce 300,000 reports per year would cost about $1.5 million per year.
The VM-based solution requires a high degree of care, which is provided by the client. For example, middleware components—including IBM Business Process Management, Db2, and the integration layer—must be monitored and managed. In the scaled-out solution, the number of administrators required to manage middleware components would increase.
Expansion scenario: IBM Cloud Functions
The IBM team recommended a new solution architecture based on IBM Cloud Functions that would easily scale to support 300,000 Rapid Assessment Reports per year and generate reports in under 15 minutes.
Figure 5 illustrates the Cloud Functions architecture.
Figure 5: Company-Profile analysis using IBM Cloud Functions
The new solution retains the process flow of the old architecture and implements key capabilities as six serverless cloud functions:
Request Management: Present the Request Management page and respond to a request to execute a new Company research activity.
Get Dataset: Request data from Import.IO and store in IBM Cloud Object Storage (COS).
Upload and Tag Objects: As each object is pushed to COS, tag it with the Company each object is associated with.
Build Summary Report: Execute multiple WDS queries, query result post-processing steps, and codify rules against the enriched objects to prepare data for the Rapid Assessment View.
Notifications: Send an email with a URL to the user that initiated the Rapid Assessment Report request.
Quick Scan View Service: Present the Quick Scan View Page from the data stored in the Graph Database.
Figure 6 summarizes the workload and pricing assumptions for the Cloud Functions solution. The cost per report is estimated from the activity that each Rapid Assessment Report request generates.
Figure 6: Comparative pricing for Cloud Functions solution for Company-Profile analysis
Key for Figure 6:
QTY: The number of functions to execute for each Rapid Assessment. A quantity of “7” refers to the seven key areas for which the Rapid Assessment Report is generated. Report areas include “Legal” and “Key Players.”
MS: Milliseconds of execution time
M: Megabytes of RAM allocated to execution
GBS: Gigabyte-Seconds of function execution
The estimated cost for IBM Cloud Functions to generate 300,000 Rapid Assessment Reports is $18,143.83 (300,000 reports at approximately $0.060479 each, before rounding), using the cost calculations in Figure 6.
The recommended solution has other cost components beyond Cloud Functions, including the following:
Watson Knowledge Studio
IBM Cloud Object Storage
We estimated the total cost, including IBM Cloud Functions and other components, would be $360,000 per year. In this scenario, the recommended IBM solution is approximately 75% less per year than the total cost of the scaled-up VM environment.
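The comparison can be checked with a short Python sketch that uses only the figures quoted in this section. The per-report costs and the Cloud Functions share of the total are derived values that we compute here for illustration; they are not separate estimates.

```python
REPORTS_PER_YEAR = 300_000

VM_SCALED_COST = 1_500_000        # scaled-up VM solution, USD per year
SERVERLESS_TOTAL_COST = 360_000   # Cloud Functions plus other components, USD per year
FUNCTIONS_ONLY_COST = 18_143.83   # Cloud Functions portion, USD per year

savings = 1 - SERVERLESS_TOTAL_COST / VM_SCALED_COST
vm_cost_per_report = VM_SCALED_COST / REPORTS_PER_YEAR
serverless_cost_per_report = SERVERLESS_TOTAL_COST / REPORTS_PER_YEAR
functions_cost_per_report = FUNCTIONS_ONLY_COST / REPORTS_PER_YEAR
functions_share = FUNCTIONS_ONLY_COST / SERVERLESS_TOTAL_COST

print(f"Savings versus scaled VM solution: {savings:.0%}")
print(f"Cost per report, VM solution:      ${vm_cost_per_report:.2f}")
print(f"Cost per report, recommended:      ${serverless_cost_per_report:.2f}")
print(f"Cloud Functions cost per report:   ${functions_cost_per_report:.6f}")
print(f"Cloud Functions share of total:    {functions_share:.1%}")
```

The Cloud Functions charge itself is only a small part of the recommended solution's total; most of the remaining cost comes from the other components.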
Prices are current as of (the date) and subject to change without notice.
Results may vary depending on operating environment.
The experiences described in this article draw upon information and opinions provided by the clients. Not all users may get the same results.
The performance data for this article was collected in a controlled, isolated environment. Actual results in other operating environments may vary. While IBM has reviewed each item for accuracy in a specific situation, there is no guarantee that others can achieve the same or similar results elsewhere.
The responsibility for use of this information or implementing any of these techniques is a customer responsibility and depends on the customer’s or user’s ability to test and integrate them into their operating environment. Customers or users trying to adapt these techniques to their own environments do so at their own risk. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity.
 Used the example calculation on this page, which is conservative for our scenario: https://console.bluemix.net/docs/infrastructure/loadbalancer-service/pricing.html#pricing
 Based on discussions with IBM subject matter experts regarding server management costs