Part 3 of the blog series on Kubernetes operators.

In Part 1 of this blog series, we introduced the idea that Kubernetes operators, when deployed at significant scale, can consume substantial resources, both in terms of actual resource consumption and of the schedulable capacity they reserve. We also introduced the idea that serverless technology can help reduce this impact on a Kubernetes cluster by scaling active controller deployments down when they go idle.

In Part 2, we introduced a technology capable of reducing the resource overhead of existing controllers without source modification, based on the simple idea of scaling the number of pod instances down to zero when a controller is idle.

In this final post of the three-part series, we will show how to adapt existing operators to leverage the built-in scale-to-zero capability provided by Knative Serving.

For a review of operators, see “Kubernetes Operators Explained.”

Operator architecture

At a low level, the main task of a typical operator is to watch for changes occurring in the Kubernetes backing store (etcd) and to react to them (e.g., by installing and managing Kafka clusters). The Informer object watches for events and puts them into a Workqueue, which ensures that only one reconciler (Handle Object in the figure below) is active at a given time for a given object. The Informer constantly watches for events, whereas the reconciler only runs when an item is inserted into the workqueue, which makes the reconciler a prime candidate for the scale-to-zero capability of Knative Serving.
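
To make this concrete, here is a minimal sketch of the informer/workqueue pattern using client-go. The wiring below is illustrative (the names setupQueue, runWorker, and fooInformer are not taken from any particular operator):

package sample

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// setupQueue wires an informer to a workqueue: the informer watches the API
// server for changes and enqueues object keys; the workqueue guarantees that
// a given key is processed by only one worker at a time.
func setupQueue(fooInformer cache.SharedIndexInformer) workqueue.RateLimitingInterface {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	fooInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
	})
	return queue
}

// runWorker pops keys off the queue and hands them to the reconciler
// ("Handle Object"); it only does work when items arrive.
func runWorker(queue workqueue.RateLimitingInterface, reconcile func(key string) error) {
	for {
		key, shutdown := queue.Get()
		if shutdown {
			return
		}
		if err := reconcile(key.(string)); err != nil {
			queue.AddRateLimited(key) // retry later with backoff
		} else {
			queue.Forget(key)
		}
		queue.Done(key)
	}
}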

Since version 0.6, Knative Eventing provides a Cloud Event importer (or source) for Kubernetes API server events. By combining this importer with the scale-to-zero capability provided by Knative Serving, we can achieve our goal of scaling the reconciler to zero. In this new architecture, the informer does not scale to zero, but it is now shared across multiple operators, significantly reducing overall resource consumption.

Serverless sample controller

Let’s show how to adapt an existing controller to run in Knative. Consider the Kubernetes sample controller project, which demonstrates how to implement operators directly on top of the Go client library. This project defines a new CRD called Foo and provides a controller to create a deployment for Foo objects. 

apiVersion: samplecontroller.k8s.io/v1alpha1
kind: Foo
metadata:
  name: example-foo
spec:
  deploymentName: example-foo
  replicas: 1

generates:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: example-foo
  ownerReferences:
  - apiVersion: samplecontroller.k8s.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Foo
    name: example-foo
spec:
  ...
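
Under the hood, the sample controller's reconcile loop is conceptually simple: fetch the Foo, create the named Deployment if it is missing, and keep the replica count in sync. Here is a heavily simplified sketch of that logic (the real controller uses generated clientsets and listers, sets the ownerReference shown above, and updates the Foo status; the Foo struct and function names below are illustrative):

package sample

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Foo mirrors the fields of the Foo custom resource that matter for this sketch.
type Foo struct {
	Namespace      string
	Name           string
	DeploymentName string
	Replicas       *int32
}

// reconcileFoo ensures the Deployment named in the Foo spec exists and has the
// requested number of replicas.
func reconcileFoo(ctx context.Context, kube kubernetes.Interface, foo Foo) error {
	deployments := kube.AppsV1().Deployments(foo.Namespace)
	existing, err := deployments.Get(ctx, foo.DeploymentName, metav1.GetOptions{})
	if errors.IsNotFound(err) {
		_, err = deployments.Create(ctx, newDeployment(foo), metav1.CreateOptions{})
		return err
	}
	if err != nil {
		return err
	}
	// Drift detection: realign the replica count with the Foo spec.
	if foo.Replicas != nil && (existing.Spec.Replicas == nil || *existing.Spec.Replicas != *foo.Replicas) {
		existing.Spec.Replicas = foo.Replicas
		_, err = deployments.Update(ctx, existing, metav1.UpdateOptions{})
	}
	return err
}

// newDeployment builds the Deployment managed on behalf of the Foo (the sample
// controller deploys nginx).
func newDeployment(foo Foo) *appsv1.Deployment {
	labels := map[string]string{"app": "nginx", "controller": foo.Name}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      foo.DeploymentName,
			Namespace: foo.Namespace,
			// The real controller also sets an ownerReference pointing back to the Foo.
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: foo.Replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{Name: "nginx", Image: "nginx:latest"}},
				},
			},
		},
	}
}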

Modifying sample code to make Foo operator serverless

We modified the original sample code to make the Foo operator serverless (the new code is available on GitHub).  Here’s what we did: 

We removed the creation and configuration of all informers: The informers watch for changes occurring in the Kubernetes backing store. This is now done by the API server event source (see below).

We added a generic informer that listens for incoming Cloud Events and enqueues them in the workqueue: This informer decouples Cloud Event consumption from processing, which allows for vertical scaling and (most importantly) guarantees that a given object is reconciled by only one worker at a time.

All calls to the indexer (an internal Informer component) are now made directly against the API server: Since there is no long-running informer keeping a local cache, the reconciler reads the current state of objects from the API server itself. A sketch of the resulting event-driven reconciler is shown below.
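
For illustration, here is a minimal sketch of what such an event-driven front end could look like, using the CloudEvents Go SDK. The payload shape (objectRef) and the function names are assumptions for this sketch; the actual implementation is in the GitHub repository linked above:

package sample

import (
	"context"
	"log"

	cloudevents "github.com/cloudevents/sdk-go/v2"
	"k8s.io/client-go/util/workqueue"
)

// objectRef captures the object reference we expect in the event payload
// produced by the API server source (an assumption for this sketch).
type objectRef struct {
	Kind      string `json:"kind"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// startReceiver listens for incoming Cloud Events over HTTP and enqueues one
// key per referenced object; the workqueue still guarantees that a given
// object is reconciled by only one worker at a time. The reconciler then
// fetches the current state of the object from the API server rather than
// from a local indexer.
func startReceiver(ctx context.Context, queue workqueue.RateLimitingInterface) error {
	c, err := cloudevents.NewClientHTTP()
	if err != nil {
		return err
	}
	return c.StartReceiver(ctx, func(ctx context.Context, event cloudevents.Event) {
		var ref objectRef
		if err := event.DataAs(&ref); err != nil {
			log.Printf("dropping malformed event: %v", err)
			return
		}
		queue.Add(ref.Namespace + "/" + ref.Name)
	})
}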

We added a configuration file to watch for events on Foo and Deployment objects:

apiVersion: sources.eventing.knative.dev/v1alpha1
kind: ApiServerSource
metadata:
  name: example-foo
spec:
  serviceAccountName: example-foo
  resources:
    - apiVersion: samplecontroller.k8s.io/v1alpha1
      kind: Foo
    - apiVersion: apps/v1
      kind: Deployment
      controller: true
  sink:
    apiVersion: serving.knative.dev/v1alpha1
    kind: Service
    name: example-foo-reconcile

The resources section specifies which kinds of objects to watch for (Foo and Deployment). controller: true tells the API server source controller to watch for Deployment objects and to send a Cloud Event containing a reference to the object controlling them (here, the owning Foo).

We added a configuration file to deploy the reconciler as a Knative service:

apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: example-foo-reconcile
spec:
  runLatest:
    configuration:
      revisionTemplate: 
        metadata:
          annotations:
            autoscaling.knative.dev/maxScale: "1"
            autoscaling.knative.dev/window: "30s"
        spec:
          container:
            image: $DOCKER_USER/knative-sample-controller
          serviceAccountName: example-foo-reconcile

We added two annotations that control the Knative pod autoscaler behavior. The first sets the maximum number of pods to one so that concurrent reconcilers don't interfere with each other. The second extends the stable window to give the reconciler enough time to complete before being scaled down.

You can try it yourself by following these instructions and observe the reconciler scaling down to zero.
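
For example, a command like the following should show the reconciler pods disappearing roughly 30 seconds after the last event is processed (it assumes the serving.knative.dev/service label that Knative Serving adds to the pods it manages):

kubectl get pods -l serving.knative.dev/service=example-foo-reconcile --watch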

Compared to the original sample controller example, the Knative variant does have some limitations:

  • It does not watch for Deployments, due to limited event filtering in Knative Eventing 0.6.
  • If the reconciler pod crashes, the event importer does not replay events, potentially leaving the system in an inconsistent state.
  • There is no periodic event synchronization.

All these limitations will be addressed in future Knative Eventing releases.

A serverless future

This post concludes our series on serverless operators. We have shown two approaches to scaling operators down to zero: the first is suitable for existing operator deployments, and the second leverages Knative's built-in serverless capability. You choose!
