etcd is an open source distributed key-value store used to hold and manage the critical information that distributed systems need to keep running. Most notably, it manages the configuration data, state data, and metadata for Kubernetes, the popular container orchestration platform.
Like all distributed workloads, containerized workloads have complex management requirements that become more complex as the workload scales. Kubernetes simplifies the process of managing these workloads by coordinating tasks such as configuration, deployment, service discovery, load balancing, job scheduling, and health monitoring across the across all clusters, which can run on multiple machines in multiple locations.
But to achieve this coordination, Kubernetes needs a data store that provides a single, consistent source of the truth about the status of the system—all its clusters and pods and the application instances within them—at any given point in time. etcd is the data store used to create and maintain this version of the truth.
etcd serves a similar role for Cloud Foundry (link resides outside ibm.com)—the open source, multicloud Platform-as-a-Service (PaaS)—and is a viable option for coordinating critical system and metadata across clusters of any distributed application. The name “etcd” comes from a naming convention within the Linux directory structure: In UNIX, all system configuration files for a single system are contained in a folder called “/etc;” “d” stands for “distributed.”
See the video "What is etcd?" for a deeper dive (6:09):
It’s no small task to serve as the data backbone that keeps a distributed workload running. But etcd is built for the task, designed from the ground up for the following qualities:
Note that because etcd’s performance is heavily dependent upon storage disk speed, it’s highly recommended to use SSDs in etcd environments. For more information this and other etcd storage requirements, check out “Using Fio to Tell Whether Your Storage is Fast Enough for etcd.”
etcd is built on the Raft consensus algorithm to ensure data store consistency across all nodes in a cluster—table stakes for a fault-tolerant distributed system.
Raft achieves this consistency via an elected leader node that manages replication for the other nodes in the cluster, called followers. The leader accepts requests from the clients, which it then forwards to follower nodes. Once the leader has ascertained that a majority of follower nodes have stored each new request as a log entry, it applies the entry to its local state machine and returns the result of that execution—a ‘write’—to the client. If followers crash or network packets are lost, the leader retries until all followers have stored all log entries consistently.
If a follower node fails to receive a message from the leader within a specified time interval, an election is held to choose a new leader. The follower declares itself a candidate, and the other followers vote for it or any other node based on its availability. Once the new leader is elected, it begins managing replication, and the process repeats itself. This process enables all etcd nodes to maintain highly available, consistently replicated copes of the data store.
etcd is included among the core Kubernetes components and serves as the primary key-value store for creating a functioning, fault-tolerant Kubernetes cluster. The Kubernetes API server stores each cluster’s state data in etcd. Kubernetes uses etcd’s “watch” function to monitor this data and to reconfigure itself when changes occur. The “watch” function stores values representing the actual and ideal state of the cluster and can initiate a response when they diverge.
For a high-level overview of how Kubernetes manages clusters, services, worker nodes, please see our video “Kubernetes Explained”:
etcd was created by the same team responsible for designing CoreOS Container Linux, a widely used container operating system that can be run and managed efficiently on a massive scale. They originally built etcd on Raft to coordinate multiple copies of Container Linux simultaneously, to ensure uninterrupted application uptime.
In December 2018, the team donated etcd to the Cloud Native Computing Foundation (CNCF), a neutral nonprofit organization that maintains etcd’s source code, domains, hosted services, cloud infrastructure, and other project property as open source resources for the container-based cloud development community. CoreOS has merged with Red Hat.
Other databases have been developed to manage coordinate information between across distributed application clusters. The two most commonly compared to etcd are ZooKeeper and Consul.
ZooKeeper was originally created to coordinate configuration data and metadata across Apache Hadoop clusters. (Apache Hadoop (link resides outside of ibm.com) is an open source framework, or collection of applications, for storing and processing large volumes of data on clusters of commodity hardware.) ZooKeeper is older than etcd, and lessons learned from working with ZooKeeper influenced etcd’s design.
As a result, etcd has some important capabilities that ZooKeeper does not. For example, unlike ZooKeeper, etcd can do the following:
Consul is a service networking solution for distributed systems, the capabilities of which sit somewhere between those of etcd and the Istio service mesh for Kubernetes. Like etcd, Consul (link resides outside ibm.com) includes a distributed key-value store based on the Raft algorithm and supports HTTP/JSON application programming interfaces (APIs). Both offer dynamic cluster membership configuration, but Consul doesn’t control as strongly against multiple concurrent versions of configuration data, and the maximum database size with which it will reliably work is smaller.
Like etcd, Redis is an open source tool, but their basic functionalities are different.
Redis is an in-memory data store and can function as a database, cache, or message broker. Redis supports a wider variety of data types and structures than etcd and has much faster read/write performance.
But etcd has superior fault tolerance, stronger failover and continuous data availability capabilities, and, most importantly, etcd persists all stored data to disk, essentially sacrificing speed for greater reliability and guaranteed consistency. For these reasons, Redis is better suited for serving as a distributed memory caching system than for storing and distributed system configuration information.
Enterprise-ready, fully managed etcd, built with native integration into the IBM Cloud®.
IBM Cloud® Databases for etcd is a fully-managed distributed system configuration management solution that’s enterprise-ready, highly available, and secure. IBM was an early contributor to the etcd project and has supported the project and continued to contribute since it was brought on board by the CNCF.