February 7, 2019 | Written by: Etai Lev Ran
Categorized: Compute Services
Istio 1.1 multicluster functionality
There is a growing interest in running workloads across clusters. Multicluster deployments can result in better scaling, failure isolation, and application agility. Towards that end, Istio v1.1 builds on and enhances the Istio 1.0 multicluster support.
This blog post highlights the multicluster functionality in Istio. We describe what capabilities exist and how to use them.
Before jumping into implementation details, it’s important to agree on terminology. When we started implementing multicluster support in mid-2018, it was not clear what we needed to support. The term “multicluster” meant different things to different people, in different contexts.
The terminology used in this blog will be as follows:
- Cluster: A collection of Kubernetes nodes with shared API masters. While Istio supports other cluster types, the focus here is on Kubernetes clusters.
- Network: A set of connected endpoints or service instances. That is, barring any security devices and policies, any two endpoints can communicate. The network may use a Virtual Private Cloud (VPC), a Virtual Private Network (VPN), or any kind of overlay.
- Mesh: A set of workloads under a common administrative control.
It is important to note that the order of terms above does not define a specific relation, such as containment. We’ve seen a variety of combinations between meshes, clusters, and networks. Istio will likely support many of these multicluster combinations in the future. We’ll add support based on commonality and prevalence of the use case.
At a high level, two common patterns or use cases emerged: single mesh and mesh federation. Single mesh combines clusters into one unit that is managed by a single Istio control plane. Single mesh can be one “physical” control plane or a set of control planes with replicated configuration. Replicated configuration is often kept in sync with tooling driven by shared CI/CD pipelines or GitOps practices.
The mesh federation pattern keeps clusters separate as independent management domains. There is no assumption on access permission, uniformity of namespaces, or service names. Operators control which services in their cluster to expose to other clusters.
Multicluster support in Istio 1.0
Istio 1.0 supports the single mesh multicluster pattern and assumes a single network. That is, addresses for pods and services in all clusters are routable and do not conflict. Furthermore, operators must define namespaces, services, and service accounts in all clusters. This is required for name resolution and identities to work across clusters.
For cases that meet the one network requirement, Istio 1.0 provides basic support that works “out of the box.” With some additional configuration, operators can define multi-network and mesh federation designs. For example, combine clusters from different networks by adding VPNs and NATs or enable mesh federation by creating the relevant service entries in clusters.
The following diagram (from istio.io) shows the call sequence using the multicluster support in 1.0:
In this architecture:
- Cluster 1 runs the Istio control plane. It is often called the “local” cluster, with all other clusters referred to as “remote” clusters. You can substitute “hub,” “master,” or “control plane” cluster for “local” if it makes things clearer.
- Other clusters, such as Cluster 2, run a smaller Istio installation. They run only Citadel, the admission controller, and sidecar proxies for workloads.
- Pilot has access to all Kubernetes API masters in all clusters, so it has a global mesh view. Citadel and auto-injection run with cluster local scope.
- Each cluster has a unique Pod and Service CIDR and is connected by a shared “flat” network to other clusters. This allows direct routes to any workload, including the Istio control plane (e.g., remote Envoys need to get configuration from Pilot, check and report to Mixer, etc.).
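The single-network requirement (unique, non-conflicting Pod and Service CIDRs) can be checked before joining clusters. The following is a minimal sketch, not part of Istio; the function name, cluster names, and CIDR values are hypothetical and chosen only for illustration.

```python
import ipaddress
from itertools import combinations


def find_cidr_conflicts(cluster_cidrs):
    """Return pairs of (cluster, cidr) whose address ranges overlap.

    cluster_cidrs maps a cluster name to its Pod and Service CIDRs.
    Only overlaps between different clusters are reported.
    """
    entries = [
        (cluster, ipaddress.ip_network(cidr))
        for cluster, cidrs in cluster_cidrs.items()
        for cidr in cidrs
    ]
    conflicts = []
    for (c1, n1), (c2, n2) in combinations(entries, 2):
        if c1 != c2 and n1.overlaps(n2):
            conflicts.append(((c1, str(n1)), (c2, str(n2))))
    return conflicts


# Hypothetical clusters: the second CIDR of cluster-2 falls inside
# the second CIDR of cluster-1, so the flat-network assumption breaks.
clusters = {
    "cluster-1": ["10.0.0.0/16", "10.1.0.0/20"],
    "cluster-2": ["10.2.0.0/16", "10.1.8.0/24"],
}
conflicts = find_cidr_conflicts(clusters)
```

An empty result suggests the address plans are compatible with the Istio 1.0 design; any reported pair would need renumbering, or one of the gateway-based patterns described later.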
Istio 1.1 features
Istio 1.1 introduces two features for multicluster scenarios: Split Horizon EDS and SNI aware routing. EDS is short for Endpoint Discovery Service, a part of Envoy’s API. Pilot uses EDS to configure the Envoys’ data plane with service and endpoint information. With Split Horizon EDS, Pilot returns endpoint information based on the calling sidecar. SNI aware routing leverages the “Server Name Indication” TLS extension to make routing decisions. Istio Gateways intercept and parse the TLS handshake and use the SNI data to decide on the destination service endpoints.
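To make SNI aware routing more concrete, here is a minimal sketch of how a gateway could map an SNI value to backend endpoints. It assumes an SNI encoding of the form `outbound_.<port>_.<subset>_.<hostname>`; treat that format, and the names `SniTarget`, `parse_sni`, and `select_backends`, as assumptions made for illustration rather than a description of the gateway's actual code.

```python
from typing import NamedTuple


class SniTarget(NamedTuple):
    direction: str
    port: int
    subset: str
    hostname: str


def parse_sni(sni: str) -> SniTarget:
    """Split an SNI value of the assumed form
    outbound_.<port>_.<subset>_.<hostname> into its parts."""
    parts = sni.split("_.", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected SNI format: {sni!r}")
    direction, port, subset, hostname = parts
    return SniTarget(direction, int(port), subset, hostname)


def select_backends(sni, backends_by_service):
    """Look up the endpoints for the service named in the SNI value."""
    target = parse_sni(sni)
    return backends_by_service.get((target.hostname, target.port), [])


# Hypothetical routing table held by a cluster's gateway.
backends = {
    ("reviews.default.svc.cluster.local", 9080): ["10.0.1.4:9080", "10.0.1.9:9080"],
}
selected = select_backends(
    "outbound_.9080_._.reviews.default.svc.cluster.local", backends
)
```

Because the decision is made from the handshake alone, the gateway does not need to terminate TLS, which is what allows mTLS to pass through between the endpoints.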
Single Mesh using Split Horizon EDS implementation
Each cluster has a “network” label associated with it. The label indicates the network that the cluster belongs to. Each cluster has an associated ingress gateway, separate from the cluster ingress and not exposed to end users. The gateway has the same network label value as other workloads in its cluster. The label creates an association of the in-cluster service endpoints with the ingress gateway.
Pilot collects the list of services and their endpoints along with the network label of each. Endpoints under the same service name are considered part of the same service. That is, a client can call any endpoint, in any cluster, if they share a service name.
When connecting to Pilot, sidecar proxies offer their own network label. Pilot responds to EDS calls with endpoints of two types. It returns IP addresses of in-network instances and gateway addresses for instances with different label values. Pilot assigns gateway endpoints a weight proportional to the number of instances behind it. Envoy can then load-balance requests between local and remote endpoints. Due to the use of SNI-based routing, routing configuration is minimal. SNI propagates arbitrary information so existing Istio functionality continues to work.
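The endpoint selection described above can be sketched as follows. This is a simplified model, not Pilot's implementation: the `Endpoint` type, the function name, and all addresses are hypothetical, and giving each in-network endpoint a weight of 1 is an assumption made so that gateway weights have something to be proportional to.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    address: str
    network: str  # value of the cluster's "network" label


def split_horizon_endpoints(endpoints, gateways, caller_network):
    """Return (address, weight) pairs as seen from caller_network.

    In-network endpoints are returned directly with weight 1.
    Endpoints on other networks are replaced by that network's gateway,
    weighted by the number of instances behind it.
    """
    local = [(ep.address, 1) for ep in endpoints if ep.network == caller_network]
    remote_counts = Counter(
        ep.network for ep in endpoints if ep.network != caller_network
    )
    remote = [(gateways[net], count) for net, count in sorted(remote_counts.items())]
    return local + remote


# Hypothetical service with two instances on net1 and three on net2.
# Pod addresses may repeat across networks because they are never
# exposed outside their own network.
service_endpoints = [
    Endpoint("10.0.0.1", "net1"),
    Endpoint("10.0.0.2", "net1"),
    Endpoint("10.0.0.1", "net2"),
    Endpoint("10.0.0.5", "net2"),
    Endpoint("10.0.0.6", "net2"),
]
gateways = {"net1": "192.0.2.10", "net2": "198.51.100.20"}

view_from_net1 = split_horizon_endpoints(service_endpoints, gateways, "net1")
view_from_net2 = split_horizon_endpoints(service_endpoints, gateways, "net2")
```

In this sketch a sidecar on net1 sees its two local pods plus the net2 gateway with weight 3, so a weighted load balancer would spread requests roughly evenly over all five instances.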
From a management perspective, the mesh functions as a single logical domain:
- Pilot maintains a list of remote clusters in one place. Therefore, the design provides a mesh-wide view into participating clusters.
- Service names and instances are shared between clusters. Access control requires Istio RBAC policies.
- The “control plane” cluster has access to remote API masters, etc.
In this architectural pattern:
- Only gateways need to be routable from remote clusters. Internal network CIDRs are not exposed.
- We support pass-through mTLS between endpoints (via gateways, with SNI based routing).
- Users set up and manage root CA configuration across clusters.
Single or federated mesh using cluster aware connectivity
Cluster aware connectivity is another feature introduced in Istio 1.1. Like Split Horizon EDS, it also uses gateways and SNI for inter-cluster communications. Cluster aware connectivity relies on DNS resolution to get access to remote instances. The client first tries local cluster resolution and uses remote instances as a fallback.
In this pattern, each cluster runs the full Istio control plane. If all control planes are configured identically, it functions as a single mesh. However, since each control plane could be configured separately, it also supports a mesh federation. For example, services in the local cluster and remote clusters are not merged. The scope of administrative sharing is lower, but we’re still assuming some shared cluster management:
- Service entries for remote services are manually defined in every cluster.
- Any change (such as Gateway IP change) is synchronized across all possible clusters.
Manual management does not scale beyond a few clusters and would benefit from automation. Note that this pattern trades access control for ease of configuration: anyone who knows the remote gateway IP and the SNI encoding could reach any remote service. So, keep gateway IP addresses private and apply RBAC policies to limit service access.
In this architectural pattern:
- Users are responsible for setting up and managing a shared root CA (same as in previous patterns).
- Pod and service CIDRs may overlap, because only gateways are exposed to remote clusters.
- Clients use DNS resolution to resolve local or remote services.
- mTLS passes through (via gateways) to the remote service.
Cluster aware implementation
Istio auto-injection adds a “.global” search suffix to the pod’s DNS configuration. This acts as a fallback to the suffixes used by Kubernetes (e.g., “<namespace>.svc.cluster.local” and “cluster.local”).
Operators configure remote services using a Service Entry. The service entry defines a “.global” service name, a remote cluster gateway, and a host local address. These addresses are not routable outside the pod. They are only used to allow name resolution to complete. The administrator must assign a unique host local address to each remote service.
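As a rough illustration of the bookkeeping involved, the sketch below generates one service entry per remote service and hands each a unique host local address. The field names are modeled on Istio's `networking.istio.io/v1alpha3` ServiceEntry resource, but the helper itself, the `127.255.0.0/16` address range, the gateway port `15443`, the HTTP port `8000`, and the example host names are all assumptions for illustration; check them against the Istio reference before relying on them.

```python
import ipaddress


def build_global_service_entries(
    remote_hosts, gateway_address, gateway_port=15443, base="127.255.0.0/16"
):
    """Build one ServiceEntry-like dict per remote ".global" host.

    Each entry gets a distinct host local address from the base range.
    The address is only used so that name resolution can complete.
    """
    addresses = iter(ipaddress.ip_network(base).hosts())
    entries = []
    for host in remote_hosts:
        entries.append({
            "apiVersion": "networking.istio.io/v1alpha3",
            "kind": "ServiceEntry",
            "metadata": {"name": host.replace(".", "-")},
            "spec": {
                "hosts": [host],
                "location": "MESH_INTERNAL",
                "resolution": "DNS",
                # Unique host local address for this remote service.
                "addresses": [str(next(addresses))],
                # Assumed service port; real entries list the service's ports.
                "ports": [{"name": "http", "number": 8000, "protocol": "http"}],
                # Traffic is sent to the remote cluster's gateway.
                "endpoints": [
                    {"address": gateway_address, "ports": {"http": gateway_port}}
                ],
            },
        })
    return entries


# Hypothetical remote services and gateway address.
entries = build_global_service_entries(
    ["httpbin.bar.global", "reviews.prod.global"], "192.0.2.10"
)
assigned = [e["spec"]["addresses"][0] for e in entries]
```

Generating the entries from one list keeps the host local addresses unique by construction, which is the part that is easy to get wrong when entries are written by hand in every cluster.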
When a workload attempts to connect to service name “foo,” its DNS client must first resolve the name. The DNS treats “foo” as a partial name and starts iterating through the list of suffixes. For local services, Kubernetes resolves the name to the in-cluster service VIP. If resolution fails, the DNS client will continue processing through the suffix list. Eventually, it will attempt to resolve the “.global” name. Istio configures the Kubernetes cluster’s DNS server to forward the “.global” names to an Istio provided DNS server. The Istio DNS server uses the service entry definition to return a host local address to the client. The client then creates a new connection, which Envoy intercepts and routes using SNI.
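The suffix iteration above can be modeled in a few lines. This is a simplified sketch of a stub resolver's search-list behavior: it ignores details such as the `ndots` option and caching, and the suffix order, record table, and addresses are hypothetical.

```python
def resolve(name, search_suffixes, records):
    """Try name under each search suffix in order.

    Returns (fully_qualified_name, address) for the first match,
    or None if no suffix yields a known record.
    """
    for suffix in search_suffixes:
        candidate = f"{name}.{suffix}"
        if candidate in records:
            return candidate, records[candidate]
    return None


# Kubernetes suffixes first, with "global" appended as the fallback.
search_suffixes = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "global",
]

# Hypothetical records: a local service with an in-cluster VIP, and a
# remote service that only resolves to its host local address.
records = {
    "foo.default.svc.cluster.local": "10.96.0.12",
    "httpbin.bar.global": "127.255.0.2",
}

local_result = resolve("foo", search_suffixes, records)
remote_result = resolve("httpbin.bar", search_suffixes, records)
```

Because the “.global” suffix comes last, a service that exists locally always wins, and the remote instance is reached only when local resolution fails.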
In general, we expect Istio’s features to continue working. Routing, security policy, metric collection, etc., should all work as expected. Operator configuration might be a bit more complicated than in the single cluster case.
The current implementation focuses on solving networking and connectivity. Other multicluster concerns, such as providing local and global observability, are out of scope. Users should resolve these based on their configuration and needs. Even on the networking side, there is work left to do. Some tasks are minor, others are larger areas for improvement. For example, cross-cluster load balancing ignores network latency or bandwidth to remote clusters.
Although it is constantly improving, usability might still be somewhat of a concern. Operators need better documentation and automation to efficiently manage multicluster scenarios. As noted, both designs call for configurations across many clusters. This could be cumbersome and error-prone as clusters and services are added or removed.
In closing, we’d like to suggest a possible view on when single mesh and mesh federation patterns make sense.
Single-mesh scenarios seem better aligned to use cases where clusters are configured identically (e.g., sharing namespaces, services, service accounts, etc.). Teams deploy applications to any cluster, based on reliability, availability, or locality requirements. The service mesh then combines service instances from many clusters into one unit. You should make sure that clusters in the same mesh are close (latency-wise) to each other so that cross-cluster calls are efficient.
Mesh federation scenarios seem better aligned with use cases where clusters are units of isolation. Teams manage their assigned clusters independently, and there are no assumptions on naming. Namespaces and service names in clusters belonging to different teams are independent. With mesh federation, users only expose some services to remote workloads. Most services are kept private. This may be useful in cases where cross-cluster calls are expensive.
Lastly, nothing prevents you from mixing the two patterns in your deployment. For example, an operator uses single mesh for clusters in the same availability zone (AZ) to provide high availability (HA). Mesh federation is then used for selective sharing between different geographical regions or environments.
We would like to better understand multicluster use cases in the community; this would help us in requirement gathering and prioritization. Please consider taking a few minutes to answer the multicluster questionnaire: https://goo.gl/forms/GFMQ6AL0tQFbGCYx1.