A Recap of the Key Advantages Offered by IBM Cloud Functions
By: Michael Behrendt and Jeremias Werner
Recent highlights and accomplishments from IBM Cloud Functions
IBM Cloud Functions is IBM’s serverless/Functions as a Service (FaaS) offering. Over the past several months, the team has worked hard on improving the technology, helping to onboard customers, and simplifying the adoption of serverless for existing applications.
In preparation for the upcoming Think 2019 conference (February 12–15 in San Francisco, CA), we would like to recap some important highlights and accomplishments from the last few months. This recap is certainly not complete and does not do justice to everything that has been done, so we apologize in advance for anything we might have missed.
IBM Cloud Functions being used by ESPN Fantasy Football, The Masters, and The Weather Channel
We’ve been working with customers across various industries (including finance, automotive, insurance, and manufacturing) on running their production workloads on Cloud Functions. In that context, ESPN Fantasy Football, which has more than 10 million daily active users, has adopted Functions for its statistics dashboard. See here to read the full story.
Additionally, the internal web statistics for The Masters are being backed by Cloud Functions. See here for the details: “Digital Analytics powered by Watson and the IBM Cloud”
Finally, The Weather Underground (a part of The Weather Channel) renders all their radar maps using Cloud Functions. You can find these maps here.
IBM Cloud Functions delivers a superior TCO even for high-load scenarios
You often read that serverless is only cost-efficient in scenarios with sporadic load. However, a true apples-to-apples comparison shows that, for some workload profiles, VMs only start to become more cost-efficient at a high double-digit billion number of function invocations per month (75 billion in this example). This blog describes the details of such a TCO comparison: “Serverless Functions vs. Virtual Machines: A Total Cost of Ownership Comparison”
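The break-even reasoning can be sketched as a small calculation. This is a simplified illustration, not the model used in the linked blog post: the function names are hypothetical, and every number in the example call is a placeholder rather than actual IBM pricing or a measured workload. The VM figure is assumed to be the full monthly cost of a fleet sized to serve the same load.

```python
def cost_per_invocation(memory_gb, duration_s, price_per_gb_second):
    """Cost of one function invocation under GB-second billing."""
    return memory_gb * duration_s * price_per_gb_second


def break_even_invocations(vm_fleet_monthly_cost, memory_gb, duration_s,
                           price_per_gb_second):
    """Monthly invocation count at which functions cost as much as the VM fleet.

    Below this count the functions are cheaper; above it the VMs are.
    """
    per_call = cost_per_invocation(memory_gb, duration_s, price_per_gb_second)
    return vm_fleet_monthly_cost / per_call


# All inputs below are illustrative placeholders, not real prices.
example = break_even_invocations(
    vm_fleet_monthly_cost=10_000,   # hypothetical fleet cost per month
    memory_gb=0.128,                # hypothetical memory per invocation
    duration_s=0.1,                 # hypothetical average duration
    price_per_gb_second=0.00002,    # hypothetical price per GB-second
)
print(f"Break-even at roughly {example:,.0f} invocations per month")
```

Plugging in the real fleet cost and the real per-invocation profile of a workload is what turns this into an apples-to-apples comparison.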
Achieve 99.9999999875% (or more) of availability at the same cost of running in a single region
To achieve resiliency against the outage of a single region, the standard approach for any kind of cloud application is to spread it across more than one region. In traditional VM-based runtime models, that means doubling, tripling, etc., the resource footprint (and, therefore, the associated costs) based on the number of regions being used. Cloud Functions delivers a per-region SLA of 99.95% because it is inherently spread across multiple availability zones within each region. Since Cloud Functions only allocates resources when needed, it can be deployed across any number of regions at no incremental cost. For example, assuming regional outages are independent of each other, you can achieve an SLA of 99.9999999875% (=1-(1-0.9995)^3) when running Cloud Functions across 3 regions. See here for details around setting this up: “Deploy serverless apps across multiple regions”
Obviously, this can also be done with a datastore as the backend. This blog post describes the details around that: “Highly Available Serverless Apps With Cloudant’s Cross-Region Replication”
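The availability figure can be reproduced with a few lines of Python. This is a minimal sketch that assumes regional outages are statistically independent; the function name is hypothetical. Decimal arithmetic is used so that the many nines are not blurred by floating-point rounding.

```python
from decimal import Decimal


def combined_availability(per_region_sla: str, regions: int) -> Decimal:
    """Availability of a deployment that is up if at least one region is up.

    Assumes outages in different regions are independent of each other.
    """
    unavailability = Decimal(1) - Decimal(per_region_sla)
    return Decimal(1) - unavailability ** regions


for n in (1, 2, 3):
    print(n, "region(s):", combined_availability("0.9995", n))
```

With a per-region SLA of 0.9995 and three regions, this yields 0.999999999875, which is the 99.9999999875% quoted above.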
Existing applications (e.g., Node Express, Python Flask) can run on Cloud Functions by adding just two lines of code
There is a lot of discussion going on around the implications of serverless and FaaS on application architectures. While that is very important to understand and should be embedded in any IT roadmap, it requires a certain degree of upfront investment before any benefits can be harvested. With frameworks like openwhisk-express or flask, it is possible to wrap a Node Express or Flask app with literally two lines of code and then run that app on Cloud Functions. This might not deliver optimal runtime behavior, but in many cases, it offers a very low barrier-of-entry option to get started with serverless, including scale-to-zero and from-zero, as well as scaling on a per-request granularity.
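To illustrate what such a wrapper does under the hood, here is a hand-rolled sketch in Python. It is not the API of any of the frameworks mentioned above; wsgi_to_action and the stand-in app are hypothetical names. It assumes the action is invoked as a raw HTTP web action, so that the request arrives in the __ow_method, __ow_path, __ow_query, __ow_headers, and __ow_body parameters, and it treats the body as plain text for simplicity. A Flask application object is a WSGI callable, so it could take the place of the stand-in app.

```python
import io
import sys


def app(environ, start_response):
    """Stand-in for an existing WSGI application (for example a Flask app)."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def wsgi_to_action(wsgi_app):
    """Return a main(args) function that forwards a web action call to wsgi_app."""
    def main(args):
        body = (args.get("__ow_body") or "").encode("utf-8")
        environ = {
            "REQUEST_METHOD": args.get("__ow_method", "get").upper(),
            "SCRIPT_NAME": "",
            "PATH_INFO": args.get("__ow_path") or "/",
            "QUERY_STRING": args.get("__ow_query", ""),
            "SERVER_NAME": "localhost",
            "SERVER_PORT": "443",
            "SERVER_PROTOCOL": "HTTP/1.1",
            "CONTENT_LENGTH": str(len(body)),
            "wsgi.version": (1, 0),
            "wsgi.url_scheme": "https",
            "wsgi.input": io.BytesIO(body),
            "wsgi.errors": sys.stderr,
            "wsgi.multithread": False,
            "wsgi.multiprocess": False,
            "wsgi.run_once": True,
        }
        # Translate request headers into their WSGI environ names.
        for key, value in args.get("__ow_headers", {}).items():
            name = key.upper().replace("-", "_")
            if name == "CONTENT_TYPE":
                environ["CONTENT_TYPE"] = value
            else:
                environ["HTTP_" + name] = value

        captured = {}

        def start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = headers

        payload = b"".join(wsgi_app(environ, start_response))
        return {
            "statusCode": int(captured["status"].split()[0]),
            "headers": dict(captured["headers"]),
            "body": payload.decode("utf-8"),
        }
    return main


# The "two lines" from the application's point of view:
# import the wrapper, then expose the wrapped app as the action entry point.
main = wsgi_to_action(app)
```

Calling main({"__ow_method": "get", "__ow_path": "/hello"}) locally returns a dictionary with statusCode, headers, and body, which is the shape a web action uses to describe an HTTP response.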
Try it out by yourself:
Run your existing Python code 1000 times in parallel with just two lines of code
Python is extremely popular amongst data engineers and scientists for developing the processing logic they want to apply to their data or to other computational tasks. At the same time, this group of people is not necessarily familiar with the details of how to use a cloud. A group at RISELab at the University of California, Berkeley, developed a framework called ‘PyWren’ for exactly this use case: it lets data engineers and scientists focus on developing their processing tasks in Python and offers them the ability to wrap that code with basically two lines of code. Our research team in Haifa ported that framework to IBM Cloud Functions and added capabilities for data partitioning and other advanced tasks. This can be applied to data sitting on Cloud Object Storage or on other services, or it can be applied to plain computational tasks.
As an example, this can be applied to running Monte Carlo simulations, as described in this post: “Predicting the future with Monte Carlo simulations over IBM Cloud Functions.” This illustrates that with just a very minor change in the code, processing time can be brought down from 247 minutes on a local machine to 90 seconds when running on the cloud.
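The programming pattern behind this speed-up is a parallel map over independent simulation tasks. The sketch below shows that pattern with a Monte Carlo estimate of pi. It runs locally and uses Python's standard ThreadPoolExecutor purely as a stand-in for a cloud executor; it does not use or reproduce the PyWren API, and the function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(task):
    """Run one independent batch of samples and count points inside the unit circle."""
    seed, samples = task
    rng = random.Random(seed)  # per-task generator keeps batches independent
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks=8, samples_per_task=20_000):
    """Map the batches over an executor and combine the partial results."""
    tasks = [(seed, samples_per_task) for seed in range(num_tasks)]
    with ThreadPoolExecutor(max_workers=num_tasks) as executor:
        hits = sum(executor.map(count_hits, tasks))
    return 4.0 * hits / (num_tasks * samples_per_task)


print(estimate_pi())
```

Because each batch depends only on its own input, the map step is the only part that would change when the work is handed to functions running in the cloud instead of local threads.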
The PyWren project for IBM Cloud Functions can be found here.
This blog post provides additional details: “Process large data sets at massive scale with PyWren over IBM Cloud Functions”
Full freedom of choice regarding runtimes and programming languages
Cloud Functions supports any programming language runtime that runs on x86. Certain mainstream languages (Java, Node.js, Python, Go, server-side Swift, PHP) have native support, which allows specific performance optimizations. As a catch-all mechanism, the logic to be executed as a function can also be packaged into a Docker container image without any size limit.
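As a rough illustration of the container-based option, the sketch below shows the kind of small HTTP server such an image would run. It assumes the Apache OpenWhisk convention that the platform sends POST requests to /init and /run on port 8080 and passes the invocation parameters under the "value" key; the greeting logic and all function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_action(params):
    """Hypothetical business logic: greet the caller by name."""
    name = params.get("name", "world")
    return {"greeting": f"Hello, {name}!"}


def dispatch(path, payload):
    """Map a request path and JSON payload to (status code, response body)."""
    if path == "/init":
        # Nothing to initialize: the code is already baked into the image.
        return 200, {"ok": True}
    if path == "/run":
        return 200, run_action(payload.get("value", {}))
    return 404, {"error": f"unknown path {path}"}


class ActionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        status, body = dispatch(self.path, payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Entry point for the container: listen for /init and /run requests."""
    HTTPServer(("0.0.0.0", port), ActionHandler).serve_forever()
```

Inside the image, serve() would be the process started by the container's entry point; any language that can answer these two HTTP requests can be packaged the same way.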
See here for details: “Large Applications on OpenWhisk”
Composing Functions into more complex control flows
We created an open source project that allows you to compose individual functions into more complex control flows (such as loops and if conditions). The first version, which supports Node.js for articulating control flows, was released a while ago. More recently, we also released a Python-based version that we think is going to be attractive for data-centric use cases.
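The idea of composing functions into control flows can be sketched with a few plain Python combinators. This is a conceptual illustration only: sequence, when, and while_loop are hypothetical helpers defined here, not the API of the open source project, and the composed steps run locally instead of as cloud functions.

```python
def sequence(*steps):
    """Run steps one after another, feeding each result into the next step."""
    def run(params):
        for step in steps:
            params = step(params)
        return params
    return run


def when(condition, consequent, alternate=lambda params: params):
    """Run consequent if condition(params) is true, otherwise alternate."""
    def run(params):
        return consequent(params) if condition(params) else alternate(params)
    return run


def while_loop(condition, body, max_iterations=1000):
    """Repeat body while condition(params) holds, with a safety bound."""
    def run(params):
        for _ in range(max_iterations):
            if not condition(params):
                break
            params = body(params)
        return params
    return run


def increment(params):
    return {**params, "n": params["n"] + 1}


def mark_done(params):
    return {**params, "status": "done"}


def mark_failed(params):
    return {**params, "status": "failed"}


flow = sequence(
    while_loop(lambda p: p["n"] < 3, increment),
    when(lambda p: p["n"] == 3, mark_done, mark_failed),
)
print(flow({"n": 0}))
```

Each step takes a dictionary and returns a dictionary, mirroring how functions exchange JSON parameters, so individual steps stay unaware of the control flow around them.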
Innovative development tools: Command line integrated with graphics
Our research team released the Functions Shell a while ago and has been improving it continuously. It is resonating with a growing community of developers because it raises productivity quite substantially for everyday development tasks. For example, it supports clicking resources such as actions listed via the CLI part, which then displays the corresponding action details. It also provides an overview of the execution duration of actions, visualizes composition flows, and much more.
We’d highly recommend trying this out by starting here.
Cloud Functions supports ISO 27001, 27017, and 27019, GDPR, HIPAA, and more to come soon. This sets the foundation for many production scenarios.
Addressing more customer scenarios
To address more customer scenarios, we have implemented a few additional enhancements.
We increased the maximum amount of memory for an action to 2 GB and allow a maximum execution time (duration) of 10 minutes for an action in order to better support certain machine-learning and other resource-intensive use cases. More info here.
To improve latency for clients that reside in the Asia-Pacific region, we have launched IBM Cloud Functions in Tokyo, with data locality in Tokyo as well (accessible via CLI only).
To provide customers with better insights into their function executions and support troubleshooting use cases, we have integrated with the IBM Monitoring Service. More info here.
Customers who integrate their actions with IBM Cloud Object Storage are now able to connect via the private endpoints, which allows higher throughput and does not incur charges for any outgoing or incoming bandwidth. More info here.