Learn more about etcd, the fault-tolerant open source key-value database that serves as the primary data backbone for Kubernetes and other distributed platforms.
What is etcd?
etcd is an open source distributed key-value store used to hold and manage the critical information that distributed systems need to keep running. Most notably, it manages the configuration data, state data, and metadata for Kubernetes, the popular container orchestration platform.
Like all distributed workloads, containerized workloads have complex management requirements that become more complex as the workload scales. Kubernetes simplifies the process of managing these workloads by coordinating tasks such as configuration, deployment, service discovery, load balancing, job scheduling, and health monitoring across the across all clusters, which can run on multiple machines in multiple locations.
But to achieve this coordination, Kubernetes needs a data store that provides a single, consistent source of the truth about the status of the system—all its clusters and pods and the application instances within them—at any given point in time. etcd is the data store used to create and maintain this version of the truth.
etcd serves a similar role for Cloud Foundry—the open source, multicloud Platform-as-a-Service (PaaS)—and is a viable option for coordinating critical system and metadata across clusters of any distributed application. The name “etcd” comes from a naming convention within the Linux directory structure: In UNIX, all system configuration files for a single system are contained in a folder called “/etc;” “d” stands for “distributed.”
See the video "What is etcd?" for a deeper dive (6:09):
It’s no small task to serve as the data backbone that keeps a distributed workload running. But etcd is built for the task, designed from the ground up for the following qualities:
- Fully replicated: Every node in an etcd cluster has access the full data store.
- Highly available: etcd is designed to have no single point of failure and gracefully tolerate hardware failures and network partitions.
- Reliably consistent: Every data ‘read’ returns the latest data ‘write’ across all clusters.
- Fast: etcd has been benchmarked at 10,000 writes per second.
- Secure: etcd supports automatic Transport Layer Security (TLS) and optional secure socket layer (SSL) client certificate authentication. Because etcd stores vital and highly sensitive configuration data, administrators should implement role-based access controls within the deployment and ensure that team members interacting with etcd are limited to the least-privileged level of access necessary to perform their jobs.
- Simple: Any application, from simple web apps to highly complex container orchestration engines such as Kubernetes, can read or write data to etcd using standard HTTP/JSON tools.
Note that because etcd’s performance is heavily dependent upon storage disk speed, it’s highly recommended to use SSDs in etcd environments. For more information this and other etcd storage requirements, check out “Using Fio to Tell Whether Your Storage is Fast Enough for etcd.”
Raft consensus algorithm
etcd is built on the Raft consensus algorithm to ensure data store consistency across all nodes in a cluster—table stakes for a fault-tolerant distributed system.
Raft achieves this consistency via an elected leader node that manages replication for the other nodes in the cluster, called followers. The leader accepts requests from the clients, which it then forwards to follower nodes. Once the leader has ascertained that a majority of follower nodes have stored each new request as a log entry, it applies the entry to its local state machine and returns the result of that execution—a ‘write’—to the client. If followers crash or network packets are lost, the leader retries until all followers have stored all log entries consistently.
If a follower node fails to receive a message from the leader within a specified time interval, an election is held to choose a new leader. The follower declares itself a candidate, and the other followers vote for it or any other node based on its availability. Once the new leader is elected, it begins managing replication, and the process repeats itself. This process enables all etcd nodes to maintain highly available, consistently replicated copes of the data store.
etcd and Kubernetes
etcd is included among the core Kubernetes components and serves as the primary key-value store for creating a functioning, fault-tolerant Kubernetes cluster. The Kubernetes API server stores each cluster’s state data in etcd. Kubernetes uses etcd’s “watch” function to monitor this data and to reconfigure itself when changes occur. The “watch” function stores values representing the actual and ideal state of the cluster and can initiate a response when they diverge.
For a high-level overview of how Kubernetes manages clusters, services, worker nodes, please see our video “Kubernetes Explained”:
To learn how to connect a Kubernetes services application using etcd, review this tutorial.
CoreOS and the history and maintenance of etcd
etcd was created by the same team responsible for designing CoreOS Container Linux, a widely used container operating system that can be run and managed efficiently on a massive scale. They originally built etcd on Raft to coordinate multiple copies of Container Linux simultaneously, to ensure uninterrupted application uptime.
In December 2018, the team donated etcd to the Cloud Native Computing Foundation (CNCF), a neutral nonprofit organization that maintains etcd’s source code, domains, hosted services, cloud infrastructure, and other project property as open source resources for the container-based cloud development community. CoreOS has merged with Red Hat.
etcd vs. ZooKeeper vs. Consul
Other databases have been developed to manage coordinate information between across distributed application clusters. The two most commonly compared to etcd are ZooKeeper and Consul.
ZooKeeper was originally created to coordinate configuration data and metadata across Apache Hadoop clusters. (Apache Hadoop is an open source framework, or collection of applications, for storing and processing large volumes of data on clusters of commodity hardware.) ZooKeeper is older than etcd, and lessons learned from working with ZooKeeper influenced etcd’s design.
As a result, etcd has some important capabilities that ZooKeeper does not. For example, unlike ZooKeeper, etcd can do the following:
- Allow for dynamic reconfiguration of cluster membership.
- Remain stable while performing read/write operations under high loads.
- Maintain a multi-version concurrency control data model.
- Offer reliable key monitoring that never drops events without giving a notification.
- Use concurrency primitives that decouple connections from sessions.
- Support a wide range of languages and frameworks (ZooKeeper has its own custom Jute RPC protocol that supports limited language bindings).
Consul is a service networking solution for distributed systems, the capabilities of which sit somewhere between those of etcd and the Istio service mesh for Kubernetes. Like etcd, Consul includes a distributed key-value store based on the Raft algorithm and supports HTTP/JSON application programming interfaces (APIs). Both offer dynamic cluster membership configuration, but Consul doesn’t control as strongly against multiple concurrent versions of configuration data, and the maximum database size with which it will reliably work is smaller.
etcd vs. Redis
Like etcd, Redis is an open source tool, but their basic functionalities are different.
Redis is an in-memory data store and can function as a database, cache, or message broker. Redis supports a wider variety of data types and structures than etcd and has much faster read/write performance.
But etcd has superior fault tolerance, stronger failover and continuous data availability capabilities, and, most importantly, etcd persists all stored data to disk, essentially sacrificing speed for greater reliability and guaranteed consistency. For these reasons, Redis is better suited for serving as a distributed memory caching system than for storing and distributed system configuration information.
etcd and IBM Cloud
IBM was an early contributor to the etcd project and has supported the project and continued to contribute since it was brought on board by the CNCF. One of the project’s nine maintainers is also an IBM employee.
IBM offers IBM Cloud Databases for etcd, a fully-managed distributed system configuration management solution that’s enterprise-ready, highly available, and secure.
Start by signing up for a free IBM Cloud account, and you can set up an etcd cluster today.