January 11, 2023, by Scott Trent

This post describes how we used Knative Quarkus Bench to explore the cold-start times of serverless functions running on IBM Cloud Code Engine.

Our previous blog posts showed how to run serverless functions on IBM Cloud Code Engine and tune their performance, using the IBM Cloud port of the Knative Quarkus Benchmark.

A primary advantage of serverless functions is automatic and nearly transparent scaling of the underlying computational resources, including scale-to-zero, which fully releases unused resources. Deploying and using resources only when they are needed reduces the total resources consumed, and therefore cost. It can also be considered environmentally friendly, because it reduces energy consumption. However, one would intuitively expect a serverless function that has not been used recently, and whose resources have been scaled to zero, to take longer to respond to a request than one that was used recently and whose resources are still available.

Base experiment

The following pseudocode describes the experiment our team used to understand the actual impact of scale-to-zero on the response time of serverless functions in IBM Cloud Code Engine. (Details on deploying and deleting serverless applications, and on accessing the benchmark with curl, can be found in our first blog post.)

Experiment pseudocode:

For pauseTime in 0, 15, 30, 45, 60, 120, 180, 240, 300:
    Deploy sleep benchmark on IBM Cloud Code Engine
    Repeat five times:
        Pause pauseTime seconds, then access benchmark with curl command
    Delete sleep benchmark from IBM Cloud Code Engine
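The pseudocode above can be scripted roughly as follows. This is a minimal Bash sketch, not necessarily the script behind our measurements: it assumes the ibmcloud CLI with the Code Engine plugin is installed and a project is already selected, reuses the create command and URL lookup shown below, and assumes `ibmcloud ce application delete --force` to skip the confirmation prompt. The helper names `run_experiment` and `time_request` are hypothetical, and timing uses curl's `%{time_total}` write-out variable instead of `/usr/bin/time` so that each sample prints as a single number.

```shell
#!/usr/bin/env bash
# Minimal sketch of the cold-start experiment; helper names are hypothetical.
# Assumes the ibmcloud CLI with the Code Engine plugin and a selected project.
set -eu

APP=sleep
IMAGE=ghcr.io/ibm/knative-quarkus-bench/graph-sleep:jvm

# POST one request to the sleep benchmark and print curl's total time (seconds).
time_request() {
  curl -s -o /dev/null -w '%{time_total}\n' \
    -H 'Content-Type:application/json' -d '"0"' -X POST "$1/sleep"
}

# For each pause time given as an argument: deploy the app, take five timed
# samples (pausing before each one), then delete the app.
# Prints one "<pauseTime> <seconds>" line per sample.
run_experiment() {
  local pause url i
  for pause in "$@"; do
    ibmcloud ce application create --name "$APP" --image "$IMAGE" >/dev/null
    # Same URL lookup as the sample command in this post.
    url=$(ibmcloud ce app list | grep "$APP" | tr -s ' ' | cut -d ' ' -f 3)
    for i in 1 2 3 4 5; do
      sleep "$pause"
      echo "$pause $(time_request "$url")"
    done
    ibmcloud ce application delete --name "$APP" --force >/dev/null
  done
}
```

Running `run_experiment 0 15 30 45 60 120 180 240 300 > results.txt` would then record one `<pauseTime> <seconds>` line per request.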

Running this experiment, we learned that, without tuning, a warm serverless function that has been accessed within the past 60 seconds responds in 0.22 seconds on average, whereas a cold serverless function that has not been accessed for over 60 seconds responds in 17.2 seconds on average. This seems reasonable: in the first case, the pod is already running and available to reply to requests, while in the second case, a pod and other networking services must first be deployed. There are certainly many use cases in which the resource and cost savings offered by scale-to-zero outweigh the disadvantage of a potentially slower response time for cold requests.

Sample command used to deploy the Knative Quarkus Bench sleep benchmark on IBM Cloud Code Engine for this experiment:

$ ibmcloud code-engine application create --name sleep --image ghcr.io/ibm/knative-quarkus-bench/graph-sleep:jvm

Sample commands used to look up the application URL and measure the sleep benchmark's response time:

$ URL=$(ibmcloud ce app list | grep sleep | tr -s ' ' | cut -d ' ' -f 3)
$ /usr/bin/time curl -s -w "\n" -H 'Content-Type:application/json' -d '"0"' -X POST ${URL}/sleep
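Reducing repeated samples to per-pause averages, like those reported above, is a small awk job. The sketch below assumes the timings have been collected as one `<pauseTime> <seconds>` pair per line (a hypothetical format; `/usr/bin/time` output would first need to be reduced to it), and `average_by_pause` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average response time per pause time.
# Input: lines of "<pauseTime> <seconds>" from the named files, or stdin.
# Output: "<pauseTime> <average seconds>" lines, sorted by pause time.
average_by_pause() {
  awk '{ sum[$1] += $2; n[$1]++ }
       END { for (p in sum) printf "%s %.2f\n", p, sum[p] / n[p] }' "$@" |
    sort -n
}
```

For example, `average_by_pause results.txt` prints one averaged line per pause time. Sorting happens after awk because the iteration order of `for (p in sum)` is unspecified.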

Experimental verification of tuning

Next, we experimented with tuning for use cases in which slow cold requests are not acceptable. The --min-scale option can be specified when creating applications in IBM Cloud Code Engine. Its default value is zero, which allows the number of pods to scale down to zero and thus enables scale-to-zero.

In theory, if --min-scale is set to 1, at least one pod is always available to service requests promptly, even if there has been no activity for longer than 60 seconds. We verified this behavior by modifying the previous experiment to use --min-scale 1 when creating the serverless function application, and then measuring the response time after the same varying pause times. We observed that, regardless of pause time, the average response time was 0.22 seconds. Hence, setting --min-scale to at least 1 significantly improves cold-request performance, at the cost of keeping at least one pod running, and consuming resources, even when the function is idle.
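The tuned deployment differs from the earlier create command only in the --min-scale option. The wrapper below is a hypothetical convenience, not part of the benchmark: it defaults to 0, the scale-to-zero default described above, and otherwise passes the requested minimum number of pods through to the CLI.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: deploy the sleep benchmark with a minimum pod count.
# min_scale=0 keeps the default scale-to-zero behavior; 1 keeps one pod warm.
deploy_sleep() {
  local min_scale=${1:-0}
  ibmcloud ce application create --name sleep \
    --image ghcr.io/ibm/knative-quarkus-bench/graph-sleep:jvm \
    --min-scale "$min_scale"
}
```

Here `deploy_sleep 1` creates the tuned application. For an application that already exists, `ibmcloud ce application update --name sleep --min-scale 1` can apply the same setting.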

Figure 1: Default vs. tuned average serverless function response time after varying periods of inactivity


This post has demonstrated how to use Knative Quarkus Bench to measure the performance difference between cold and warm requests to serverless functions. Furthermore, we demonstrated how to use the --min-scale option to avoid the potential performance impact of scale-to-zero.

We encourage those currently running serverless functions on IBM Cloud Code Engine to consider whether their workloads could benefit from the --min-scale option to improve cold-request performance. If you have not yet tried serverless function applications, check out the step-by-step instructions in our first blog post to deploy and access a serverless benchmark application.
