Being Frugal with Kubernetes Operators

By: Kyle Schlosser

What is an operator?

The Kubernetes platform is built around the software design pattern of a controller, which is a software component that manages the flow of data between two entities. In Kubernetes, controllers watch for changes to the declarative state found in one resource and then react to requests for state changes by creating or changing other downstream resources. This process is known as “active reconciliation” since the controller reconciliation process occurs continuously. This is depicted in Figure 1.

Figure 1
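
To make the reconciliation idea concrete, here is a minimal sketch in Python. It is not how Kubernetes implements controllers: the Cluster class is a hypothetical in-memory stand-in for the API server, and the names reconcile and run_until_settled are invented for illustration. It only shows the shape of the loop, which compares declared state with observed state and takes one corrective step at a time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Toy in-memory stand-in for the Kubernetes API (hypothetical)."""
    desired_replicas: int = 0                       # declared state on the watched resource
    pods: List[str] = field(default_factory=list)   # downstream resources
    next_id: int = 0                                # counter used to name new pods


def reconcile(cluster: Cluster) -> bool:
    """Take one step toward the declared state; return True if anything changed."""
    if len(cluster.pods) < cluster.desired_replicas:
        cluster.pods.append(f"pod-{cluster.next_id}")
        cluster.next_id += 1
        return True
    if len(cluster.pods) > cluster.desired_replicas:
        cluster.pods.pop()
        return True
    return False  # observed state already matches declared state


def run_until_settled(cluster: Cluster) -> int:
    """Call reconcile repeatedly until it reports no change; return the step count."""
    steps = 0
    while reconcile(cluster):
        steps += 1
    return steps


if __name__ == "__main__":
    cluster = Cluster(desired_replicas=3)
    run_until_settled(cluster)       # creates three pods
    cluster.desired_replicas = 1     # a user edits the declared state
    run_until_settled(cluster)       # the loop removes the surplus pods
    print(cluster.pods)
```

A real controller is driven by watch events rather than a tight loop, but the comparison of declared and observed state is the same.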

An example of this behavior can be observed when creating a deployment. After you create a new deployment resource, the deployment controller is notified of the change and reacts by creating a new replica set. The replica set controller, in turn, reacts to the replica set resource and causes one or more pods to be created. Later, if you modify the deployment’s image attribute, the deployment controller creates a new replica set that uses the new image while phasing out the old replica set. Other controllers behave similarly, though the action taken on the downstream resource varies according to the resource.
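
The chain from deployment to replica set to pods can be sketched as two small functions. This is a simplification under stated assumptions: the function names and the dictionary that maps an image to a replica count are hypothetical, and the old replica set is scaled to zero in a single step, whereas the real deployment controller phases it out gradually.

```python
from typing import Dict, List


def reconcile_deployment(image: str, replicas: int,
                         replica_sets: Dict[str, int]) -> Dict[str, int]:
    """Return replica counts per image after one deployment reconcile pass."""
    updated = {img: 0 for img in replica_sets}  # phase out replica sets for old images
    updated[image] = replicas                   # replica set for the declared image
    return updated


def reconcile_replica_set(image: str, replicas: int) -> List[str]:
    """Return the pod names a replica set controller would create (hypothetical naming)."""
    return [f"{image}-pod-{i}" for i in range(replicas)]


if __name__ == "__main__":
    sets = reconcile_deployment("app:v1", 2, {})    # first rollout
    sets = reconcile_deployment("app:v2", 2, sets)  # image attribute changed
    for image, count in sets.items():
        print(image, reconcile_replica_set(image, count))
```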

Operators, like other controllers, watch for Kubernetes resource modifications. However, unlike Kubernetes platform concepts such as deployments, stateful sets, and services (which are generic across many types of software), operators encode software-specific knowledge in a controller. Consider a complex workload like a clustered database, where common operational activities need to be orchestrated in precise sequences that are unique to that software.

Operators in practice

Let’s consider an example. Perhaps upgrading a database requires upgrading the data format before starting the latest version of the container software, and all pods need to be stopped before the data migration. Or maybe the pods need to be started in a specific sequence to ensure that a consensus algorithm recognizes all the cluster members. The operator is charged with orchestrating these activities, working from a declarative (desired) state in the resource model that an end user can edit.
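
As a rough illustration of that kind of sequencing, the sketch below turns a change in the declared version into an ordered list of steps. Everything here is hypothetical: plan_upgrade, the step strings, and the pod names are invented for the example and do not correspond to any particular operator.

```python
from typing import List


def plan_upgrade(current: str, desired: str, pods: List[str]) -> List[str]:
    """Return the ordered steps a hypothetical database operator might run."""
    if current == desired:
        return []  # declared state already satisfied, nothing to do
    steps = [f"stop {pod}" for pod in pods]                       # all pods stopped first
    steps.append(f"migrate data format {current} -> {desired}")   # prerequisite migration
    steps += [f"start {pod} at {desired}" for pod in pods]        # fixed start order
    return steps


if __name__ == "__main__":
    for step in plan_upgrade("1.0", "2.0", ["db-0", "db-1", "db-2"]):
        print(step)
```

The user only edits the desired version; the ordering rules live in the operator.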

Separating the declared state from the implementation-specific activities enables the user to control instances of the software without software-specific knowledge; that knowledge is encoded into the controller provided by the operator. Meanwhile, the operational characteristics of another piece of software are unique in their own way, so that software has its own operator.

Operators at scale

When operators are deployed individually, they consume very few resources. To put this in real terms, we generated a controller using the Kubebuilder SDK and the Go language. We then measured the actual CPU and memory use of the generated controller and inspected the generated resource requests and limits. This information is summarized in the following table:

Operators Table 1

These numbers are for a single controller pod. The total number of controller pods in the cluster is determined by the following:

  • The number of software-specific operators (one operator for Redis, one for Postgres).
  • The number of instances of each operator. The Redis operator might be installed in one namespace while another instance of the same operator runs in a second namespace for the sake of isolation.
  • The number of replicas of each operator instance. The metrics above are per pod, but each operator deployment might run three replicas for the sake of redundancy.

If we project 10 operators, each installed in 10 namespaces with a redundancy of 3, we get 10 × 10 × 3 = 300 controller pods, which would result in the following resource consumption:

Operators Table 2

We can make a few important observations about this data:

  1. At the mentioned scale, over one core will be dedicated just to keeping idle operators running.
  2. In addition to the real resource consumption, operators also count against the resource quotas of the cluster.
  3. Which operators you choose to install, and at what acting scope (such as namespace or cluster scope), does matter at scale.
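
The projection above is a straightforward multiplication, sketched below. The per-pod figures in the usage section are placeholder values chosen only to make the arithmetic visible; they are not the measurements from the tables in this post, and the names PodFootprint and project are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PodFootprint:
    cpu_millicores: float
    memory_mib: float


def project(per_pod: PodFootprint, operators: int, namespaces: int,
            replicas: int) -> Tuple[int, PodFootprint]:
    """Scale a single controller pod's footprint up to the whole cluster."""
    pods = operators * namespaces * replicas
    total = PodFootprint(per_pod.cpu_millicores * pods, per_pod.memory_mib * pods)
    return pods, total


if __name__ == "__main__":
    # Placeholder per-pod values (assumptions, not the article's data).
    per_pod = PodFootprint(cpu_millicores=4.0, memory_mib=20.0)
    pods, total = project(per_pod, operators=10, namespaces=10, replicas=3)
    print(pods, total.cpu_millicores / 1000, "cores", total.memory_mib, "MiB")
```

Because the pod count is a product of three factors, reducing any one of them (for example, by using cluster-scoped operators instead of one per namespace) shrinks the total proportionally.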

Can we go serverless?

Certainly, the resource utilization of many operator instances can have an impact on cluster resource demands, but are operators a good fit for serverless? The reality is that many controllers are not under constant demand, especially when the scope of an individual operator instance has been limited to a particular namespace.

Kubernetes resource modification events commonly originate both from users modifying individual resources and from machine-driven or batch jobs. Individual users tend to manipulate resources intensely for a time and then leave them alone for a while. For example, you might create a Redis cluster and then edit its individual parameters as you fine-tune that cluster to your particular needs, but after that, you move on to editing other parts of your application. Among machine-driven jobs, some run on a schedule while others are driven by source change events, which tend to be clustered around the business day.

The tendency toward clusters of activity on a single resource or resource kind favors a serverless model. In this model, container processes remain active only as long as work is arriving, and they can be stopped during periods when activity ceases.
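
To see why bursty activity suits this model, the sketch below estimates how long a scale-to-zero container would stay up for a given series of events. It is an illustrative model only: the idle_timeout parameter (how long the container lingers after the last event) and the function name active_seconds are assumptions, and startup latency is ignored.

```python
from typing import Iterable, Optional


def active_seconds(event_times: Iterable[float], idle_timeout: float) -> float:
    """Total time a scale-to-zero container stays up for the given event times."""
    total = 0.0
    window_start: Optional[float] = None
    window_end: Optional[float] = None
    for t in sorted(event_times):
        if window_end is not None and t <= window_end:
            window_end = t + idle_timeout           # event arrived while still running
        else:
            if window_end is not None:
                total += window_end - window_start  # close out the previous burst
            window_start, window_end = t, t + idle_timeout
    if window_end is not None:
        total += window_end - window_start          # close out the final burst
    return total


if __name__ == "__main__":
    # Two bursts of edits an hour apart, with a 60-second idle timeout.
    events = [0, 10, 20, 3600, 3610]
    print(active_seconds(events, idle_timeout=60))
```

In this made-up trace the container runs for a few minutes in total, while an always-on controller would be running for the entire hour and beyond.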

Stay tuned for more posts on existing operator deployments and new design patterns 

As operators continue to gain traction within the Kubernetes ecosystem and custom controllers become more prevalent, the resource demands of these container processes are worth noting. In Part 2 of this series, we will consider some specific technology approaches suitable for both existing operator deployments and new design patterns that leverage Knative to provide serverless capability.
