What Is etcd?

What is etcd?

etcd is an open source distributed key-value store used to hold and manage the critical information that distributed systems need to keep running. Most notably, it manages the configuration data, state data, and metadata for Kubernetes, the popular container orchestration platform.

Like all distributed workloads, containerized workloads have complex management requirements that become more complex as the workload scales. Kubernetes simplifies the process of managing these workloads by coordinating tasks such as configuration, deployment, service discovery, load balancing, job scheduling, and health monitoring across the across all clusters, which can run on multiple machines in multiple locations.

But to achieve this coordination, Kubernetes needs a data store that provides a single, consistent source of the truth about the status of the system—all its clusters and pods and the application instances within them—at any given point in time. etcd is the data store used to create and maintain this version of the truth.

etcd serves a similar role for Cloud Foundry (link resides outside ibm.com)—the open source, multicloud Platform-as-a-Service (PaaS)—and is a viable option for coordinating critical system and metadata across clusters of any distributed application. The name “etcd” comes from a naming convention within the Linux directory structure: In UNIX, all system configuration files for a single system are contained in a folder called “/etc;” “d” stands for “distributed.”

Build responsible AI workflows with AI governance

Learn the building blocks and best practices to help your teams accelerate responsible AI.

Related content

Why etcd?

It’s no small task to serve as the data backbone that keeps a distributed workload running. But etcd is built for the task, designed from the ground up for the following qualities:

Fully replicated: Every node in an etcd cluster has access the full data store.
Highly available: etcd is designed to have no single point of failure and gracefully tolerate hardware failures and network partitions.
Reliably consistent: Every data ‘read’ returns the latest data ‘write’ across all clusters.
Fast: etcd has been benchmarked at 10,000 writes per second.
Secure: etcd supports automatic Transport Layer Security (TLS) and optional secure socket layer (SSL) client certificate authentication. Because etcd stores vital and highly sensitive configuration data, administrators should implement role-based access controls within the deployment and ensure that team members interacting with etcd are limited to the least-privileged level of access necessary to perform their jobs.
Simple: Any application, from simple web apps to highly complex container orchestration engines such as Kubernetes, can read or write data to etcd using standard HTTP/JSON tools.

Note that because etcd’s performance is heavily dependent upon storage disk speed, it’s highly recommended to use SSDs in etcd environments.

Raft consensus algorithm

etcd is built on the Raft consensus algorithm to ensure data store consistency across all nodes in a cluster—table stakes for a fault-tolerant distributed system.

Raft achieves this consistency via an elected leader node that manages replication for the other nodes in the cluster, called followers. The leader accepts requests from the clients, which it then forwards to follower nodes. Once the leader has ascertained that a majority of follower nodes have stored each new request as a log entry, it applies the entry to its local state machine and returns the result of that execution—a ‘write’—to the client. If followers crash or network packets are lost, the leader retries until all followers have stored all log entries consistently.

If a follower node fails to receive a message from the leader within a specified time interval, an election is held to choose a new leader. The follower declares itself a candidate, and the other followers vote for it or any other node based on its availability. Once the new leader is elected, it begins managing replication, and the process repeats itself. This process enables all etcd nodes to maintain highly available, consistently replicated copes of the data store.

etcd and Kubernetes

etcd is included among the core Kubernetes components and serves as the primary key-value store for creating a functioning, fault-tolerant Kubernetes cluster. The Kubernetes API server stores each cluster’s state data in etcd. Kubernetes uses etcd’s “watch” function to monitor this data and to reconfigure itself when changes occur. The “watch” function stores values representing the actual and ideal state of the cluster and can initiate a response when they diverge.

For a high-level overview of how Kubernetes manages clusters, services, worker nodes, please see our video “Kubernetes Explained”:

CoreOS and the history and maintenance of etcd

etcd was created by the same team responsible for designing CoreOS Container Linux, a widely used container operating system that can be run and managed efficiently on a massive scale. They originally built etcd on Raft to coordinate multiple copies of Container Linux simultaneously, to ensure uninterrupted application uptime.

In December 2018, the team donated etcd to the Cloud Native Computing Foundation (CNCF), a neutral nonprofit organization that maintains etcd’s source code, domains, hosted services, cloud infrastructure, and other project property as open source resources for the container-based cloud development community. CoreOS has merged with Red Hat.

etcd vs. ZooKeeper vs. Consul

Other databases have been developed to manage coordinate information between across distributed application clusters. The two most commonly compared to etcd are ZooKeeper and Consul.

ZooKeeper

ZooKeeper was originally created to coordinate configuration data and metadata across Apache Hadoop clusters. (Apache Hadoop (link resides outside ibm.com) is an open source framework, or collection of applications, for storing and processing large volumes of data on clusters of commodity hardware.) ZooKeeper is older than etcd, and lessons learned from working with ZooKeeper influenced etcd’s design.

As a result, etcd has some important capabilities that ZooKeeper does not. For example, unlike ZooKeeper, etcd can do the following:

Allow for dynamic reconfiguration of cluster membership.
Remain stable while performing read/write operations under high loads.
Maintain a multi-version concurrency control data model.
Offer reliable key monitoring that never drops events without giving a notification.
Use concurrency primitives that decouple connections from sessions.
Support a wide range of languages and frameworks (ZooKeeper has its own custom Jute RPC protocol that supports limited language bindings).

Consul

Consul is a service networking solution for distributed systems, the capabilities of which sit somewhere between those of etcd and the Istio service mesh for Kubernetes. Like etcd, Consul (link resides outside ibm.com) includes a distributed key-value store based on the Raft algorithm and supports HTTP/JSON application programming interfaces (APIs). Both offer dynamic cluster membership configuration, but Consul doesn’t control as strongly against multiple concurrent versions of configuration data, and the maximum database size with which it will reliably work is smaller.

etcd vs. Redis

Like etcd, Redis is an open source tool, but their basic functionalities are different.

Redis is an in-memory data store and can function as a database, cache, or message broker. Redis supports a wider variety of data types and structures than etcd and has much faster read/write performance.

But etcd has superior fault tolerance, stronger failover and continuous data availability capabilities, and, most importantly, etcd persists all stored data to disk, essentially sacrificing speed for greater reliability and guaranteed consistency. For these reasons, Redis is better suited for serving as a distributed memory caching system than for storing and distributed system configuration information.