November 14, 2022 By Scott Trent 4 min read

This post describes the major policy tuning options available for applications running on IBM Cloud Code Engine, using the Knative Serverless Benchmark as an example.

Our team has provided an IBM Cloud port of the Knative Serverless Benchmark that can be used to do performance experiments with serverless computing on IBM Cloud. As shown in our previous blog post, deploying a serverless application like this on IBM Cloud Code Engine can be as simple as running a command like the following:

ibmcloud code-engine application create --name graph-pagerank --image

The next step after learning to deploy a workload is to learn about policy tuning to improve performance, efficiency, cost control and more. The following are two major categories of policies that can be requested for applications on Code Engine:

  1. Pod resources, such as CPU and memory.
  2. Concurrency regarding the number of requests processed per pod.

Let’s look at each in more detail.

Pod resource allocation

The number of CPUs and the amount of memory desired for a Code Engine application pod can be specified initially when the ibmcloud code-engine application create command is run and can be modified after creation with the ibmcloud code-engine application update command.

The number of virtual CPUs desired can be specified with the --cpu <# of vCPUs> option to either the create or the update command. The default vCPU value is 1 and valid values range from 0.125 to 12.

The amount of memory desired can be specified with the --memory option to either the create or the update command. The default memory value is 4 GB and valid values range from 0.25 GB to 48 GB. Since only specific combinations of CPU and memory are supported, it is best to consult this chart when requesting these resources.
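As a sketch, a deployment could request one of the supported CPU/memory pairings at creation time and adjust it later with update (the application name, image reference and the 2 vCPU / 8 GB pairing here are illustrative, not taken from the benchmark runs above):

```shell
# Create an application with an explicit CPU/memory combination
# (image reference is a placeholder).
ibmcloud code-engine application create \
  --name graph-pagerank \
  --image <registry>/<image>:<tag> \
  --cpu 2 \
  --memory 8G

# Later, shrink the allocation back to the defaults without redeploying.
ibmcloud code-engine application update \
  --name graph-pagerank \
  --cpu 1 \
  --memory 4G
```

Because only certain CPU/memory pairings are accepted, an update that requests an unsupported combination will be rejected, so it is worth checking the supported-combinations chart before scripting changes like this.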

Concurrency control

One of the strengths of the serverless computing paradigm is that pods are automatically created and deleted in response to the number of ongoing requests. It is not surprising that there are several options to influence this behavior. The easiest two are --max-scale and --min-scale, which specify the maximum and minimum number of pods that can be running at the same time.

These options can be specified at either application creation time with the create command or at application modification time with the update command. The default minimum is 0 and the default maximum is 10. Current information on the maximum number of pods that can be specified is documented here.

Increasing the max-scale value can allow for greater throughput. Increasing the min-scale value from 0 to 1 could reduce latency caused by having to wait for a pod to be deployed after a period of low use.
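Putting those two options together, a latency-sensitive service might keep one pod warm while capping scale-out for cost control (the application name and the value 20 are illustrative):

```shell
# Keep at least one pod running to avoid cold-start latency after idle
# periods, and cap scale-out at 20 pods to bound cost under load.
ibmcloud code-engine application update \
  --name graph-pagerank \
  --min-scale 1 \
  --max-scale 20
```

Note that raising --min-scale above 0 trades away the scale-to-zero cost savings in exchange for faster first-request response times.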

Slightly more interesting (yet more complicated) are the options that control how many requests can be processed per pod. The --concurrency option specifies the maximum number of requests that can be processed concurrently per pod. The default value is 100. The --concurrency-target option is the threshold of concurrent requests per instance at which additional pods are created. This can be used to scale up instances based on the number of concurrent requests. Its default value is 0, which means that if --concurrency-target is not specified, scaling is based on the value of the --concurrency option instead. These options can be specified at either application creation time with the create command or at application modification time with the update command.
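For example, scale-out can be made to begin well before each pod reaches its hard request limit (the application name and the values 50 and 25 are illustrative):

```shell
# Start creating additional pods once a pod is handling ~25 concurrent
# requests, while still allowing each pod to accept up to 50 at a time.
ibmcloud code-engine application update \
  --name graph-pagerank \
  --concurrency 50 \
  --concurrency-target 25
```

Setting the target below the hard limit gives the autoscaler headroom: new pods come online before existing pods are saturated.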

Theoretically, setting the --concurrency option to a low value would result in more pods being created under load, allowing each request to have access to more pod resources. This can be demonstrated by the following chart where we used the h2load command to send 50,000 requests to each of four benchmark tests in knative-quarkus-bench. The key point is that when the concurrency target is set to 25, all benchmarks create more pods, and as the concurrency target is increased fewer pods are created:
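A load test along these lines can be driven with the h2load benchmarking tool from the nghttp2 project; the URL and client count below are placeholders rather than the exact parameters used in our experiment:

```shell
# Send 50,000 total requests to the deployed benchmark application
# (application URL is a placeholder; -c sets the number of concurrent
# clients, which determines the offered load).
h2load -n 50000 -c 100 \
  https://graph-pagerank.<project>.<region>.codeengine.appdomain.cloud/
```

Watching the pod count (for example, with ibmcloud code-engine application get) while such a test runs makes the scaling behavior described above directly observable.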

Pod creation chart.

The following chart demonstrates the effect that changing the concurrency target has on the same four benchmarks. In general, higher throughput (in terms of requests per second) can be seen with lower concurrency targets since more pods are created and fewer requests are running simultaneously on the same pod. The exact impact on throughput, however, depends on the workload and the resources that are required. For example, the throughput for the sleep benchmark is nearly flat. This benchmark simply calls the sleep function for one second for each request. Thus, there is very little competition for pod resources and modifying the concurrency target has little effect in this case. Other benchmarks like dynamic-html and graph-pagerank require both memory and CPU to run, and therefore see a more significant impact to changing the concurrency target than sleep (which uses nearly no pod resources) and uploader (which mostly waits on relatively slow remote data transfer):

Throughput chart.

Summary

Specifying resource policy options for an IBM Cloud Code Engine application can have a clear impact on both the resources consumed and on performance in terms of throughput and latency.

We encourage you to review your IBM Cloud Code Engine application requirements and experiment to see if your workload would benefit from modifying pod CPU, pod memory, pod scale and concurrency.
