This post describes the major policy tuning options available for applications running on IBM Cloud Code Engine, using the Knative Serverless Benchmark as an example.
Our team has provided an IBM Cloud port of the Knative Serverless Benchmark that can be used to do performance experiments with serverless computing on IBM Cloud. As shown in our previous blog post, deploying a serverless application like this on IBM Cloud Code Engine can be as simple as running a command like the following:
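For example, assuming the benchmark image has been published to a container registry (the application name and image below are placeholders; substitute your own), a deployment might look like this:

```shell
# Deploy a knative-quarkus-bench application on Code Engine.
# --name and --image are placeholder values for illustration.
ibmcloud code-engine application create \
  --name quarkus-bench \
  --image icr.io/mynamespace/knative-quarkus-bench:latest
```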
The next step after learning to deploy a workload is to learn about policy tuning to improve performance, efficiency, and cost control. There are two major categories of policies that can be requested for applications on Code Engine:
- Pod resources, such as CPU and memory.
- Concurrency, which governs the number of requests processed per pod.
Let’s look at each in more detail.
Pod resource allocation
The number of CPUs and the amount of memory desired for a Code Engine application pod can be specified initially when the ibmcloud code-engine application create command is run and can be modified after creation with the ibmcloud code-engine application update command.
The number of virtual CPUs desired can be specified with the --cpu <# of vCPUs> option to either the create or the update command. The default vCPU value is 1, and valid values range from 0.125 to 12.
The amount of memory desired can be specified with the --memory option to either the create or the update command. The default memory value is 4 GB, and valid values range from 0.25 GB to 48 GB. Since only specific combinations of CPU and memory are supported, it is best to consult this chart when requesting these resources.
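Putting the two options together, here is a sketch of requesting one of the supported CPU/memory combinations (the application name is a placeholder, and the chart mentioned above should be checked for valid pairings):

```shell
# Create an application with 2 vCPUs and 8 GB of memory,
# one of the supported CPU/memory combinations.
ibmcloud code-engine application create \
  --name quarkus-bench \
  --image icr.io/mynamespace/knative-quarkus-bench:latest \
  --cpu 2 \
  --memory 8G

# Resources can be changed later without redeploying the image:
ibmcloud code-engine application update \
  --name quarkus-bench \
  --cpu 0.5 \
  --memory 2G
```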
Concurrency control
One of the strengths of the serverless computing paradigm is that pods are automatically created and deleted in response to the number of ongoing requests. It is not surprising that there are several options to influence this behavior. The easiest two are --max-scale and --min-scale, which specify the maximum and minimum number of pods that can be running at the same time.
These options can be specified at either application creation time with the create command or at application modification time with the update command. The default minimum is 0, and the default maximum is 10. Current information on the maximum number of pods that can be specified is documented here.
Increasing the max-scale value can allow for greater throughput. Increasing the min-scale value from 0 to 1 could reduce latency caused by having to wait for a pod to be deployed after a period of low use.
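As a sketch, the scale bounds could be adjusted on an existing application like this (the application name and the particular values are illustrative):

```shell
# Keep at least one pod warm to avoid cold-start latency after idle
# periods, and cap scale-out at 20 pods to bound resource consumption.
ibmcloud code-engine application update \
  --name quarkus-bench \
  --min-scale 1 \
  --max-scale 20
```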
Slightly more interesting (yet more complicated) are the options that control how many requests can be processed per pod. The --concurrency option specifies the maximum number of requests that can be processed concurrently per pod; its default value is 100. The --concurrency-target option is the threshold of concurrent requests per instance at which additional pods are created, and it can be used to scale up instances based on the concurrent number of requests. Its default value is 0, which means that when --concurrency-target is not specified it falls back to the value of the --concurrency option. These options can be specified at either application creation time with the create command or at application modification time with the update command.
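The two concurrency options can be combined, for example (application name and values are placeholders for illustration):

```shell
# Limit each pod to 50 concurrent requests, and ask the autoscaler to
# add pods once a pod is handling about 25 concurrent requests.
ibmcloud code-engine application update \
  --name quarkus-bench \
  --concurrency 50 \
  --concurrency-target 25
```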
Theoretically, setting the --concurrency option to a low value would result in more pods being created under load, allowing each request to have access to more pod resources. This can be demonstrated by the following chart, where we used the h2load command to send 50,000 requests to each of four benchmark tests in knative-quarkus-bench. The key point is that when the concurrency target is set to 25, all benchmarks create more pods, and as the concurrency target is increased fewer pods are created:
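A load test along these lines can be sketched with h2load as follows (the application hostname, benchmark path, and client count are placeholders; only the total of 50,000 requests comes from the experiment described above):

```shell
# Send 50,000 requests over 64 concurrent clients to one benchmark
# endpoint; -n is the total request count and -c the client count.
h2load -n 50000 -c 64 \
  https://quarkus-bench.example.codeengine.appdomain.cloud/sleep
```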
The following chart demonstrates the effect that changing the concurrency target has on the same four benchmarks. In general, higher throughput (in terms of requests per second) can be seen with lower concurrency targets, since more pods are created and fewer requests are running simultaneously on the same pod. The exact impact on throughput, however, depends on the workload and the resources it requires. For example, the throughput for the sleep benchmark is nearly flat. This benchmark simply calls the sleep function for one second for each request, so there is very little competition for pod resources, and modifying the concurrency target has little effect in this case. Other benchmarks, like dynamic-html and graph-pagerank, require both memory and CPU to run, and therefore see a more significant impact from changing the concurrency target than sleep (which uses nearly no pod resources) and uploader (which mostly waits on relatively slow remote data transfer):
Conclusion
Specifying resource policy options for an IBM Cloud Code Engine application can have a clear impact on both the resources consumed and performance in terms of throughput and latency.
We encourage you to review your IBM Cloud Code Engine application requirements and experiment to see if your workload would benefit from modifying pod CPU, pod memory, pod scale and concurrency.