Enjoy the flexibility and cost of IaaS with the convenience of serverless.

Over the last several years, serverless has grown significantly in popularity. The following key attributes attract customers:

  1. Only focus on code, not the infrastructure.
  2. Pay only for what you use, with no charge for idle time. No capacity management needed.
  3. Transparent and granular scaling.

Gap: Any VM config as serverless runtime

However, for load-intensive or more complex workloads, today’s serverless offerings still leave some demand unmet: they are often too constrained in the CPU, memory and disk capacity they make available. This post describes how we think this could be addressed.

The foundational thought is that in established serverless compute offerings, code gets isolated by running in a container, (micro-)VM, etc. As part of that, there is always a dedicated (but relatively limited) amount of CPU, memory, etc. available per invocation. This raises the question of why such a limit has to exist, rather than giving the user the broadest possible choice so that they can pick the configuration that best fits their needs.


Solution: Drive serverless execution from client-side

As shown in the figure above, the user-facing serverless experience would remain the same as what is available today. However, the user would get the option to select the resources for execution from the full spectrum of available VM sizes, not just a more constrained subset of CPU, memory, disk and network capacities.

Benefits

An additional benefit of this model is that it would be available at the price point of regular VMs. On the negative side, this approach also means increased invocation latency, which is acceptable for the many workloads that need many minutes of processing anyway. It also means that the scope of management responsibility for the user is higher (e.g., OS patch management would be up to them). However, the many potential customers we have talked to about this seem very willing to accept this trade-off, given the strong overall value proposition.

Serverless invocation up to 1 TB of memory size

Certain workloads (e.g., in the ML/AI or data processing space) can benefit significantly from running as much as possible on the same machine for low-latency, in-memory data sharing. To be concrete, this means that up to 128 vCPUs, 1 TB of memory and many TBs of local SSD disk are available per invocation, along with, potentially, hours of execution time if needed.

Driven by real-world use cases

Serverless purists might claim that this deviates from the sub-second provisioning times associated with serverless since its earliest days. However, it’s important to acknowledge that there are many real-world use cases where exposing a serverless interface still provides significant customer value. For a workload that runs for many minutes, slightly extended provisioning times are not an issue. VMs can usually be provisioned within seconds, so the delta is not that big.
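
To make this argument concrete, the share of wall-clock time lost to provisioning can be estimated with a few lines of Python. This is a minimal sketch: the 30-second provisioning time and the job durations are illustrative assumptions, not measurements.

```python
def provisioning_overhead(provision_s: float, run_s: float) -> float:
    """Return the fraction of total wall-clock time spent waiting for the VM."""
    if provision_s < 0 or run_s <= 0:
        raise ValueError("provision_s must be >= 0 and run_s must be > 0")
    return provision_s / (provision_s + run_s)


# Assumed provisioning time in seconds (hypothetical, for illustration only).
ASSUMED_PROVISION_S = 30.0

for run_minutes in (1, 10, 60):
    share = provisioning_overhead(ASSUMED_PROVISION_S, run_minutes * 60)
    print(f"{run_minutes:>3} min job: {share:.1%} of total time is provisioning")
```

The longer the job runs, the smaller the provisioning share becomes, which is why this model targets workloads that run for minutes or hours rather than milliseconds.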

Significant price advantages

Another important aspect addressed by this solution is that it offers price competitiveness at scale. While there is value in the higher level of management abstraction provided by in-market serverless offerings, comparing their price per CPU-sec or MB-sec with that of raw VMs shows that buying the raw VM infrastructure can be more cost-attractive, specifically when running at large scale. Obviously, this also comes with management responsibility for the VM shifting to the customer, but this is often accepted.
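
One rough way to run this comparison for your own workload is sketched below. All prices and the 30-second provisioning time are placeholder assumptions chosen only to show the arithmetic, not actual list prices. The sketch also assumes that the VM is billed while it is being provisioned and, for brevity, ignores memory pricing.

```python
def serverless_cost(vcpus: int, run_s: float, price_per_vcpu_s: float) -> float:
    """Cost when only execution time is billed."""
    return vcpus * run_s * price_per_vcpu_s


def vm_cost(vcpus: int, run_s: float, provision_s: float, price_per_vcpu_s: float) -> float:
    """Cost when the VM is billed for provisioning plus execution time."""
    return vcpus * (run_s + provision_s) * price_per_vcpu_s


# Placeholder prices in USD per vCPU-second (hypothetical, not real list prices).
SERVERLESS_PRICE = 0.00003
VM_PRICE = 0.00001
ASSUMED_PROVISION_S = 30.0

for run_minutes in (1, 10, 60):
    run_s = run_minutes * 60
    s_cost = serverless_cost(16, run_s, SERVERLESS_PRICE)
    v_cost = vm_cost(16, run_s, ASSUMED_PROVISION_S, VM_PRICE)
    print(f"{run_minutes:>3} min on 16 vCPUs: serverless ${s_cost:.4f}, VM ${v_cost:.4f}")
```

Substituting the current prices of the offerings you are comparing shows at which job length and scale the raw VM price starts to outweigh the billed provisioning time.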

It’s also very important to note that this solution offers the same “only-pay-for-what-you’re-running” model. That means there is no capacity pool to manage, which is traditionally either always over- or underprovisioned.

Container images continue to be the packaging format

Also, the approach described in this post allows for the acquisition of infrastructure at the regular VM cost level. While a VM is provisioned per invocation at the lowest level, the actual code still runs within a container on top of it. This means the user can continue to customize the underlying container base image by preinstalling libraries and taking other configuration steps.

This approach also fits if you’d like all the infrastructure you’re using to reside in your account and in your VPC, with support for security groups, public gateways, control over the subnets being used, the ability to use various storage options, large and fast local SSDs, and many hundreds of GBs of memory.

Available as a simple library

We’re making the capability that provides this abstraction available as part of the Lithops open source project. The extension to this project offers a very simple serverless map/reduce interface, where capacity management is entirely transparent to the user and where all features mentioned above are delivered. This is made available in a very initial version to give you and us, jointly, the opportunity to learn how such a capability should be shaped going forward.

With this library, you can run any kind of map or map-reduce operation without having to provision a pool of VMs, run apt-get updates, install the right version of Python, fetch all required libraries, copy the respective source code onto these machines, etc. Similar to established serverless services like IBM Cloud Code Engine or IBM Cloud Functions, capacity is always perfectly aligned with the actual demand of running code.
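
As an illustration of the map-reduce style described here, the following minimal sketch counts words across a few documents. The functions `word_count`, `total`, `run_locally` and `run_on_lithops` are hypothetical names introduced for this example. The Lithops call assumes that the library is installed and that a compute backend is already set up in your Lithops configuration, so it is only defined here, not executed; the local variant shows the result the distributed run is expected to reproduce.

```python
def word_count(text):
    """Map step: count the words in one document."""
    return len(text.split())


def total(results):
    """Reduce step: sum the per-document counts produced by the map step."""
    return sum(results)


def run_locally(documents):
    """Reference run on the local machine, without any remote capacity."""
    return total([word_count(doc) for doc in documents])


def run_on_lithops(documents):
    """Same computation through Lithops (assumes a configured backend)."""
    from lithops import FunctionExecutor  # imported lazily: optional dependency

    with FunctionExecutor() as fexec:
        # One map invocation per document, then a single reduce invocation.
        fexec.map_reduce(word_count, documents, total)
        return fexec.get_result()


docs = ["serverless on virtual machines", "map and reduce", "hello world"]
print(run_locally(docs))
```

With a working Lithops setup, `run_on_lithops(docs)` should return the same value as `run_locally(docs)`, with the capacity behind each invocation created and released by the library.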

Example

The code snippet below illustrates how this works in a hello-world kind of scenario. When executing this, instead of a container, a VM gets spun up and executes the respective logic:

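from lithops import FunctionExecutor

# Function to run remotely; its return value is sent back to the client.
def hello(name):
    return 'Hello {}!'.format(name)

# The executor picks up the compute backend from the Lithops configuration.
with FunctionExecutor() as fexec:
    # Submit a single asynchronous invocation and wait for its result.
    fut = fexec.call_async(hello, 'World')
    print(fut.result())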

Use IBM Cloud Code Engine wherever possible

How is this related to IBM Cloud Code Engine, our latest GA-level serverless offering?

Wherever IBM Cloud Code Engine is applicable, it should be used as the first choice since it provides the most convenience and quality of service. However, in situations where one or more of the aspects described above are required, this approach should be used as the default fallback solution.

We’d like to express very special thanks to Josep Sampe from URV, who was absolutely instrumental in delivering this capability.

Check out the setup instructions and more detailed documentation.

We’re very much looking forward to your feedback.
