Policy Tuning on IBM Cloud Code Engine Demonstrated with the Knative Quarkus Bench


This post describes the major policy tuning options available for applications running on IBM Cloud Code Engine, using the Knative Serverless Benchmark as an example.

Our team has provided an IBM Cloud port of the Knative Serverless Benchmark that can be used to do performance experiments with serverless computing on IBM Cloud. As shown in our previous blog post, deploying a serverless application like this on IBM Cloud Code Engine can be as simple as running a command like the following:

ibmcloud code-engine application create --name graph-pagerank --image ghcr.io/ibm/knative-quarkus-bench/graph-pagerank:jvm

The next step after learning to deploy a workload is to learn about policy tuning to improve performance, efficiency, and cost control. The following are the two major categories of policies that can be requested for applications on Code Engine:

  1. Pod resources, such as CPU and memory.
  2. Concurrency regarding the number of requests processed per pod.

Let’s look at each in more detail.

Pod resource allocation

The number of CPUs and the amount of memory desired for a Code Engine application pod can be specified initially when the ibmcloud code-engine application create command is run and can be modified after creation with the ibmcloud code-engine application update command.

The number of virtual CPUs desired can be specified with the --cpu <# of vCPUs> option to either the create or the update command. The default vCPU value is 1 and valid values range from 0.125 to 12.

The amount of memory desired can be specified with the --memory option to either the create or the update command. The default memory value is 4 GB and valid values range from 0.25 GB to 48 GB. Since only specific combinations of CPU and memory are supported, it is best to consult this chart when requesting these resources.
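For example, requesting 2 vCPUs and 8 GB of memory (values chosen here for illustration; confirm they are a supported combination in the chart above) might look like the following:

```shell
# Create the application with 2 vCPUs and 8 GB of memory
# (check the supported CPU/memory combinations chart first)
ibmcloud code-engine application create \
  --name graph-pagerank \
  --image ghcr.io/ibm/knative-quarkus-bench/graph-pagerank:jvm \
  --cpu 2 \
  --memory 8G

# Resources can also be adjusted on an existing application
ibmcloud code-engine application update \
  --name graph-pagerank \
  --cpu 4 \
  --memory 16G
```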

Concurrency control

One of the strengths of the serverless computing paradigm is that pods will be automatically created and deleted in response to the number of ongoing requests. It is not surprising that there are several options to influence this behavior. The easiest two are --max-scale and --min-scale, which specify the maximum and minimum number of pods that can be running at the same time.

These options can be specified at either application creation time with the create command or at application modification time with the update command. The default minimum is 0 and the default maximum is 10. Current information on the maximum number of pods that can be specified is documented here.

Increasing the max-scale value can allow for greater throughput. Increasing the min-scale value from 0 to 1 could reduce latency caused by having to wait for a pod to be deployed after a period of low use.
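For instance, to keep one pod warm and allow scaling out further under load (the values below are illustrative), the scale bounds can be updated like this:

```shell
# Keep at least one pod running to avoid cold-start latency,
# and allow up to 20 pods under heavy load
ibmcloud code-engine application update \
  --name graph-pagerank \
  --min-scale 1 \
  --max-scale 20
```

Note that a min-scale of 1 trades some idle cost for lower latency on the first request after a quiet period.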

Slightly more interesting (yet more complicated) are the options that control how many requests can be processed per pod. The --concurrency option specifies the maximum number of requests that can be processed concurrently per pod; the default value is 100. The --concurrency-target option is the threshold of concurrent requests per pod at which additional pods are created, and can be used to scale up instances based on the number of concurrent requests. Its default value is 0, which means it falls back to the value of the --concurrency option. Both options can be specified at either application creation time with the create command or at application modification time with the update command.
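Putting the two options together, a configuration that caps each pod at 50 concurrent requests but begins scaling out at around 25 (example values, not a recommendation) would look like:

```shell
# Hard limit of 50 concurrent requests per pod; the autoscaler
# adds pods once a pod is handling roughly 25 concurrent requests
ibmcloud code-engine application update \
  --name graph-pagerank \
  --concurrency 50 \
  --concurrency-target 25
```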

Theoretically, setting the --concurrency option to a low value would result in more pods being created under load, allowing each request to have access to more pod resources. This can be demonstrated by the following chart where we used the h2load command to send 50,000 requests to each of four benchmark tests in knative-quarkus-bench. The key point is that when the concurrency target is set to 25, all benchmarks create more pods, and as the concurrency target is increased fewer pods are created:

Pod creation chart.
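The load for these measurements was generated with h2load. The exact parameters we used are not reproduced here, but a typical invocation for 50,000 requests against a benchmark endpoint (the URL below is a placeholder) looks like:

```shell
# Send 50,000 total requests using 100 concurrent clients
# (-n = total requests, -c = concurrent clients; URL is illustrative)
h2load -n 50000 -c 100 \
  https://graph-pagerank.example.codeengine.appdomain.cloud/graph-pagerank
```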


The following chart demonstrates the effect that changing the concurrency target has on the same four benchmarks. In general, higher throughput (in terms of requests per second) can be seen with lower concurrency targets, since more pods are created and fewer requests run simultaneously on the same pod. The exact impact on throughput, however, depends on the workload and the resources it requires. For example, the throughput for the sleep benchmark is nearly flat. This benchmark simply calls the sleep function for one second for each request, so there is very little competition for pod resources and modifying the concurrency target has little effect. Other benchmarks like dynamic-html and graph-pagerank require both memory and CPU to run, and therefore see a more significant impact from changing the concurrency target than sleep (which uses nearly no pod resources) or uploader (which mostly waits on relatively slow remote data transfer):

Throughput chart.


Conclusion

Specifying resource policy options for an IBM Cloud Code Engine application can have a clear impact both on the resources consumed and on performance in terms of throughput and latency.

We encourage you to review your IBM Cloud Code Engine application requirements and experiment to see if your workload would benefit from modifying pod CPU, pod memory, pod scale and concurrency.
