What is an operator?

The Kubernetes platform is built around the software design pattern of a controller, which is a software component that manages the flow of data between two entities. In Kubernetes, controllers watch for changes to the declarative state found in one resource and then react to requests for state changes by creating or changing other downstream resources. This process is known as “active reconciliation” since the controller reconciliation process occurs continuously. This is depicted in Figure 1.

Figure 1

An example of this behavior can be observed when creating a deployment. After a new deployment resource is created, the deployment controller is notified of the resource change and reacts by creating a new replica set. The replica set controller, in turn, reacts to the replica set resource and causes one or more pods to be created. Later, if you modify the deployment’s image attribute, the deployment controller creates a new replica set using the new image attribute while phasing out the old replica set. Other controllers behave similarly, though the action taken on the downstream resource varies according to the resource.
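The deployment example can be sketched as a toy reconciliation loop. This is a minimal in-memory illustration, not the real Kubernetes controllers: the `Cluster` class, the dictionary layouts, and the naming scheme are all hypothetical, and the old replica set is scaled to zero in a single pass rather than phased out gradually.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the resources held by the API server."""
    deployments: dict = field(default_factory=dict)   # name -> {"image", "replicas"}
    replica_sets: dict = field(default_factory=dict)  # name -> {"owner", "image", "replicas"}
    pods: dict = field(default_factory=dict)          # name -> {"owner", "image"}


def reconcile_deployments(cluster):
    """Give each deployment a replica set for its current image; scale down the others."""
    for name, spec in cluster.deployments.items():
        current = f"{name}-{spec['image']}"
        cluster.replica_sets[current] = {
            "owner": name,
            "image": spec["image"],
            "replicas": spec["replicas"],
        }
        for rs_name, rs in cluster.replica_sets.items():
            if rs["owner"] == name and rs_name != current:
                rs["replicas"] = 0  # simplified: the old replica set is retired at once


def reconcile_replica_sets(cluster):
    """Create or delete pods until each replica set owns exactly `replicas` pods."""
    for rs_name, rs in cluster.replica_sets.items():
        owned = sorted(p for p, pod in cluster.pods.items() if pod["owner"] == rs_name)
        index = 0
        while len(owned) < rs["replicas"]:
            candidate = f"{rs_name}-{index}"
            if candidate not in cluster.pods:
                cluster.pods[candidate] = {"owner": rs_name, "image": rs["image"]}
                owned.append(candidate)
            index += 1
        for pod_name in owned[rs["replicas"]:]:
            del cluster.pods[pod_name]


def reconcile(cluster):
    """One pass of both controllers; a real controller repeats this on every change."""
    reconcile_deployments(cluster)
    reconcile_replica_sets(cluster)
```

Creating a deployment and calling `reconcile` yields the requested pods. Changing the deployment's image and calling `reconcile` again replaces them with pods that run the new image, while the old replica set remains with zero replicas.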

Operators, like other controllers, watch for Kubernetes resource modifications. However, unlike Kubernetes platform concepts such as deployments, stateful sets, and services (which are generic across many types of software), operators encode software-specific knowledge in a controller. Consider a complex workload like a clustered database, where common operational activities need to be orchestrated in precise sequences that are unique to that software.

For a closer look, see our video “Kubernetes Operators Explained”:

Operators in practice

Let’s consider an example. Perhaps upgrading a database requires a prerequisite step of upgrading the data format prior to starting the latest version of the container software, and all pods need to be stopped prior to the data migration. Or maybe the pods need to be started in a specific sequence in order to assure that a consensus algorithm recognizes all the cluster members. The operator is charged with orchestrating these activities while leveraging a declarative or desired state within the resource model that an end user can edit.
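The upgrade scenario above can be sketched as a reconcile function that turns one declared field into an ordered series of actions. This is an illustrative sketch only: the `DatabaseCluster` resource, its field names, and the action strings are hypothetical and do not correspond to any particular operator.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseCluster:
    """Hypothetical custom resource; the end user edits only desired_version."""
    desired_version: str
    data_format: str                           # version of the on-disk data format
    members: tuple = ("db-0", "db-1", "db-2")  # required start order
    running: set = field(default_factory=set)  # members that are currently up


def reconcile(db):
    """Drive the observed state toward desired_version and return the actions taken."""
    actions = []
    if db.data_format != db.desired_version:
        # Every pod must be stopped before the data format is migrated.
        for member in sorted(db.running, reverse=True):
            actions.append(f"stop {member}")
        db.running.clear()
        actions.append(f"migrate data {db.data_format} -> {db.desired_version}")
        db.data_format = db.desired_version
    # Members start in a fixed order so that the consensus algorithm sees all of them.
    for member in db.members:
        if member not in db.running:
            actions.append(f"start {member} at {db.desired_version}")
            db.running.add(member)
    return actions
```

The user states only the version they want. The ordering rules live in `reconcile`, and a second call after the upgrade returns an empty list because the observed state already matches the declared state.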

Separating the declared state from the implementation-specific activities enables the user to control instances of the software without software-specific knowledge; that knowledge is encoded into the controller provided by the operator. Meanwhile, the operational characteristics of another piece of software are unique in their own way, and, therefore, that software has its own operator.

Operators at scale

When operators are deployed individually, they consume very few resources. To put this in real terms, we generated a controller using the Kubebuilder SDK and the Go language. We then measured the real CPU and memory use of the generated controller and inspected its generated resource requests and limits. This information is summarized in the following table:

These numbers are for a single controller pod. The total number of pods in the cluster is determined by the following:

  • The number of software-specific operators, one per software package (for example, one operator for Redis and one for Postgres).
  • The number of instances of each operator. The Redis operator might be installed in one namespace while another instance of it runs in a different namespace for the sake of isolation.
  • The number of replicas of each operator deployment. For the sake of redundancy, each operator deployment might run as three pods.

If we were to project 10 operators, isolated by 10 namespaces, with a redundancy of 3, this would result in the following resource consumption:

We can make a few important observations about this data:

  1. At the mentioned scale, over one core will be dedicated just to keeping idle operators running.
  2. In addition to the real resource consumption, operators also count against the resource quotas of the cluster.
  3. Which operators you choose to install, and at what acting scope (such as namespace or cluster scope), does matter at scale.
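The pod count behind this projection is a simple product, sketched below. The per-pod CPU figure passed to `total_millicores` is a placeholder chosen for illustration, not one of the measured values.

```python
def project_pods(operators, namespaces, redundancy):
    """Controller pods: one per operator, per namespace, per redundant replica."""
    return operators * namespaces * redundancy


def total_millicores(pods, per_pod_millicores):
    """Aggregate CPU for `pods` controller pods, in millicores (1000 = one core)."""
    return pods * per_pod_millicores


pods = project_pods(operators=10, namespaces=10, redundancy=3)
print(pods)  # 300

# Placeholder assumption: 5 millicores per idle pod (hypothetical, not measured).
print(total_millicores(pods, per_pod_millicores=5))  # 1500
```

With 300 pods, any average idle usage above roughly 3.4 millicores per pod adds up to more than one full core.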

Can we go serverless?

Certainly, the resource utilization of many operator instances can have an impact on cluster resource demands, but are operators a good fit for serverless? The reality is that many controllers are not under constant demand, especially when the scope of an individual operator instance has been limited to a particular namespace.

Kubernetes resource modification events commonly originate both from users modifying individual resources and from machine-driven or batch jobs. Individual users tend to manipulate resources intensely for a time and then leave them alone for a while. For example, you might create a Redis cluster and then edit the individual parameters as you fine-tune that cluster to your particular needs, but after that, you move on to editing other parts of your application. In the case of machine-driven jobs, some run on a schedule while others are driven by source change events, which tend to be clustered around the business day.

The tendency towards clusters of activity on a single resource or resource kind favors a serverless model. In this model, container processes remain active only as long as work is arriving, and they can be stopped during periods when activity ceases.
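To make the effect of clustered activity concrete, the sketch below estimates how long a scale-to-zero container would stay up for a given set of event timestamps. The `idle_timeout` parameter is a hypothetical setting, and the model ignores cold-start time.

```python
def active_seconds(event_times, idle_timeout):
    """Total seconds a container is up if it stops idle_timeout seconds after the last event."""
    total = 0
    window_start = None
    window_end = None
    for t in sorted(event_times):
        if window_end is not None and t <= window_end:
            # The event arrives while the container is still up: extend the window.
            window_end = t + idle_timeout
        else:
            # The container had already stopped (or never started): close the old window.
            if window_end is not None:
                total += window_end - window_start
            window_start, window_end = t, t + idle_timeout
    if window_end is not None:
        total += window_end - window_start
    return total


# A burst of three edits, then two more an hour later, with a 60-second idle timeout.
print(active_seconds([0, 10, 20, 3600, 3610], idle_timeout=60))  # 150
```

In this example the container runs for 150 seconds in total, whereas an always-on controller would have been running for the whole hour.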

Stay tuned for more posts on existing operator deployments and new design patterns 

As operators continue to gain traction within the Kubernetes ecosystem, and custom controllers become more prevalent, the resource demands from these container processes are worth noting. In Part 2 of this series, we will consider some specific technology approaches suitable for both existing operator deployments and new design patterns that leverage Knative to provide serverless capability.
