What is a service mesh?

Authors

Mesh Flinders

Staff Writer

IBM Think

Ian Smalley

Staff Editor

IBM Think

Service mesh, defined

A service mesh is a software layer in a modern application architecture that manages connectivity between microservices, enabling applications to function. Service meshes provide many critical capabilities, such as service-to-service communication, service discovery, load balancing and authentication.

One of the biggest challenges that app developers in the modern business landscape face is scalability. As the number of application users increases, it becomes difficult for DevOps engineers (DevOps being the software development methodology that accelerates app delivery through automation) to monitor service performance. A service mesh provides key features, such as logging, tracing and traffic control, that help monitor and manage critical services.

As applications have become fundamental to digital transformation, the importance of service meshes has increased. Today, they are the key enablers of some of the most advanced application technology available, including cloud-native applications, microservices and containers.

According to Forbes, in 2022, 70% of organizations already ran a service mesh and 19% were evaluating one.1


What are microservices?

Microservices, also known as microservices architecture, is a cloud-native architectural approach where applications are built out of many independent, smaller components or services. This approach allows developers to update code more easily and add or remove features and functionality without impacting the rest of the application, leading to high scalability.

Service meshes are critical to microservices architecture. They provide a highly configurable infrastructure layer where all the services in the microservices application can connect and exchange information. Beyond a service mesh, microservices architecture is typically used along with container technology and its most popular orchestration platform, Kubernetes.

What are containers and Kubernetes?

The technology known as containers, along with one of its most popular container orchestration platforms, Kubernetes, has become indispensable to service mesh functionality because of how it enables developers to manage complex, microservices-based applications.

Containers are executable units of software that package application code along with its libraries and dependencies, allowing it to run in any computing environment. With the proliferation of containers in modern application architecture, managing large container groups quickly became a challenge.

Enter Kubernetes (also known as k8s or Kube), a container orchestration platform that has become one of the most popular orchestration solutions available on the market today. Kubernetes clusters—collections of nodes that represent physical and virtual machines (VMs)—are managed on the control plane.

Another important aspect of service mesh functionality is the way external traffic is routed to services within a specific cluster, a process known as ingress.

How does a service mesh work?

Today’s most popular applications demand that many workloads or computing tasks and processes be deployed at once. A microservices architecture allows developers to build each application as a collection of small, independent services that are easier to manage.

However, for application code to function, microservices need to communicate quickly and accurately, and this is where service mesh architecture is critical. Service meshes are designed in a way that gives developers more control over service-to-service communication within an application.

At their most fundamental level, service meshes rely on proxy-based communication to enhance the manageability and control of microservices-based applications. In this model, proxy servers (also known as proxies) function as intermediaries between each microservice and the organization’s network, so that traffic to and from a service is routed through its proxy. This capability is critical to maintaining the manageability, observability and security of many applications.
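To make this concrete, here is a minimal Python sketch of the sidecar pattern described above. Every name here (the `SidecarProxy` class, the toy `network` mapping) is an illustrative assumption rather than any real mesh's API; the point is that routing and observation live in the proxy, not in the service code.

```python
# Illustrative sketch of proxy-based communication (not a real mesh API).
# Each service talks to the network only through its sidecar proxy, so the
# proxy can observe and control every service-to-service call.

class SidecarProxy:
    """Intercepts requests between a local service and the wider network."""

    def __init__(self, service_name, network):
        self.service_name = service_name
        self.network = network   # maps service name -> request handler
        self.log = []            # the proxy sees (and can record) every call

    def call(self, target_service, request):
        # The proxy, not the application code, decides how the request
        # is routed and records it for observability.
        self.log.append((self.service_name, target_service, request))
        handler = self.network[target_service]
        return handler(request)

# Two toy services communicating only through a proxy.
network = {"orders": lambda req: f"order created for {req}"}
proxy = SidecarProxy("checkout", network)
response = proxy.call("orders", "cart-42")
```

Because all traffic flows through the proxy, features like logging, retries or encryption can be added there without touching the services themselves.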

In addition to proxies, a service mesh architecture relies on two primary components that enable it to function: the control plane and the data plane.

Control plane

The control plane is the part of the computer network that controls how data is routed between users and devices (also known as nodes). Control planes follow routing rules or protocols that are informed by algorithms that determine the best route for data to take over a network.

In a service mesh, the control plane configures and manages specially designed proxies called sidecar proxies, which run alongside each service and abstract certain functionalities, such as monitoring and security, to make them more efficient. When one service needs to communicate with another in a service mesh, its sidecar proxy intercepts the request and creates a secure, encrypted channel for it to travel on.

Data plane

The data plane, also known as the forwarding plane, moves data around the network through devices like routers and switches. In a service mesh, the data plane consists of the sidecar proxies themselves, which handle messaging between services along with important functionalities like circuit breaking and request retries. The data plane is also where key capabilities such as load balancing, service discovery and routing are carried out.
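The circuit-breaking and retry behavior mentioned above can be sketched as follows. This is an illustrative toy, not a real data-plane implementation: the proxy retries transient failures, and after a set number of consecutive failures the breaker "opens" so that calls fail fast instead of hammering an unhealthy service.

```python
# Toy sketch of two data-plane behaviors: request retries and circuit
# breaking (all names are illustrative, not any real proxy's API).

class CircuitBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0            # consecutive failures seen so far

    @property
    def open(self):
        return self.failures >= self.max_failures

    def call(self, func, retries=2):
        if self.open:
            # Fail fast: don't send more traffic to an unhealthy service.
            raise RuntimeError("circuit open: failing fast")
        for attempt in range(retries + 1):
            try:
                result = func()
                self.failures = 0    # a success resets the failure count
                return result
            except ConnectionError:
                self.failures += 1   # retry transient failures
        raise RuntimeError("service unavailable after retries")

breaker = CircuitBreaker(max_failures=3)

def unreachable():
    raise ConnectionError("connection refused")

try:
    breaker.call(unreachable)        # 3 attempts fail, so the circuit opens
except RuntimeError as err:
    outcome = str(err)
```

A real sidecar would also close the circuit again after a cool-down period; that detail is omitted here for brevity.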

Service mesh versus API gateway

In addition to the data plane and the control plane, the application programming interface (API) gateway is another important part of microservices architecture that is closely related to service mesh functionality.

APIs are protocols that enable software applications to communicate and exchange data. API gateways are tools that act as intermediaries between API clients—for example, an application calling a popular REST API—and the backend services located on a server.

API gateways and service meshes are similar in that they both enable more efficient application development. However, while an API gateway controls access to APIs, a service mesh connects microservices within the application. Service meshes and API gateways are frequently deployed together to increase flexibility and observability in an application development ecosystem.

Benefits of a service mesh

Service meshes, and the microservices architecture they make possible, deliver many critical benefits to an organization. Here are a few of the most common.

Observability

A service mesh provides built-in observability, a deeper understanding of the condition of a complex system, for an entire microservices architecture. This allows developers to monitor important metrics such as dependencies, latency and error rates that are important to understanding how an app is functioning.

Observability also helps with troubleshooting, performance optimization, telemetry (recording of system behavior) and debugging by giving developers a full and unobstructed view into the internal workings of a microservices ecosystem.
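Because every request passes through a sidecar, the mesh can derive these metrics from per-request telemetry. The sketch below (with made-up sample records, not real mesh output) shows how latency and error-rate figures fall out of that data.

```python
# Hypothetical per-request records of the kind a sidecar proxy collects.
# The values are invented sample data for illustration.
records = [
    {"service": "cart", "latency_ms": 12, "status": 200},
    {"service": "cart", "latency_ms": 48, "status": 200},
    {"service": "cart", "latency_ms": 31, "status": 500},
    {"service": "cart", "latency_ms": 20, "status": 200},
]

# Average latency across all observed requests.
latencies = [r["latency_ms"] for r in records]
avg_latency = sum(latencies) / len(latencies)

# Error rate: fraction of requests that returned a server error (5xx).
error_rate = sum(r["status"] >= 500 for r in records) / len(records)
```

Real meshes export this kind of data to monitoring systems so that dashboards and alerts can be built on it without instrumenting the application code.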

Control

For apps to function properly, developers and application administrators need to control how services communicate with each other within the app. A service mesh increases governance capabilities for organizations deploying microservices architectures—for example, the way teams enforce security and compliance requirements in heavily regulated sectors.

Also, service meshes provide dedicated infrastructure layers specifically to handle the demands of service-to-service communication in distributed applications, which run across multiple connected computers at once.

Security

Service meshes help ensure secure communication between services through features like mutual Transport Layer Security (mTLS) encryption and authentication. mTLS authentication helps ensure that traffic in an application is secure and trusted in both directions between client and server.

mTLS also provides data confidentiality by encrypting the information that’s sent over service-to-service communication. It allows administrators to enforce authorization policies such as access to specific endpoints, a process known as endpoint security.
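The difference between ordinary TLS and mTLS can be shown with Python's standard ssl module. This is a configuration sketch only: in a mesh, the sidecar proxies perform this handshake and the control plane issues and rotates the certificates, so application code stays unchanged. Calls like `load_cert_chain()` (with certificate files, omitted here) would complete the setup.

```python
import ssl

# One-way TLS: only the server presents a certificate.
one_way = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# Server-side default: clients are NOT asked for a certificate.

# Mutual TLS: the server also demands and verifies a client certificate,
# so traffic is authenticated in both directions.
mutual = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
mutual.verify_mode = ssl.CERT_REQUIRED
# In practice you would also call load_cert_chain() and
# load_verify_locations() with certificates issued by the mesh's CA.
```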

Service discovery

Service meshes have a capability known as automated service discovery that helps reduce the workload of managing service endpoints, the location in the service mesh where a specific microservice can be reached. A service registry allows services to find and communicate with each other automatically, regardless of where they are located, enabling developers to deploy new services quickly and easily.
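A minimal illustration of the registry idea follows. The `ServiceRegistry` class and its methods are invented for this sketch, not any real registry's API: services register their endpoints, and callers look them up by name instead of hard-coding addresses.

```python
# Toy service registry (illustrative only): the core of automated
# service discovery is a name -> endpoints mapping kept up to date
# as service instances come and go.

class ServiceRegistry:
    def __init__(self):
        self._endpoints = {}   # service name -> list of endpoint addresses

    def register(self, name, endpoint):
        # Called when a new service instance starts up.
        self._endpoints.setdefault(name, []).append(endpoint)

    def discover(self, name):
        # Callers ask for a service by name, not by address.
        endpoints = self._endpoints.get(name)
        if not endpoints:
            raise LookupError(f"no endpoints registered for {name!r}")
        return endpoints

registry = ServiceRegistry()
registry.register("payments", "10.0.0.7:8080")
registry.register("payments", "10.0.0.9:8080")   # a second replica
```

Because callers resolve names at request time, new replicas become reachable as soon as they register, with no configuration changes on the calling side.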

Load balancing

Load balancing, the distribution of network traffic among multiple servers to optimize app performance, is a key capability of service meshes. Using algorithms, the service mesh helps balance workloads between nodes, optimize compute resources and generally ensure the high availability of an application.
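Two of the most common balancing strategies can be sketched in a few lines. Both functions below are illustrative toys: round robin cycles through replicas in order, while least-connections picks the replica currently handling the fewest in-flight requests.

```python
import itertools

# Round robin: hand out endpoints in a repeating cycle.
class RoundRobin:
    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)

    def pick(self):
        return next(self._cycle)

# Least connections: pick the endpoint with the fewest active requests.
def least_connections(active):
    # `active` maps endpoint -> number of in-flight requests.
    return min(active, key=active.get)

rr = RoundRobin(["a", "b", "c"])
order = [rr.pick() for _ in range(4)]                     # cycles a, b, c, a
least_busy = least_connections({"a": 5, "b": 1, "c": 3})  # picks "b"
```

Production meshes layer health checks and weighting on top of strategies like these, but the core selection logic is this simple.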

Traffic management and traffic splitting

Service meshes provide advanced traffic management and traffic splitting features to help optimize the flow of information and resources over a network. While both traffic management and traffic splitting are used to control information flow on a network, they have one important difference worth noting. Traffic management focuses on long-term, systemic changes to infrastructure to improve information traffic flow, while traffic splitting involves the weight-based distribution of traffic across backends or service versions.

Service meshes provide fine-grained, highly specific control over routing and traffic behavior, allowing for smoother transitions when an application is updated to a newer software version. For example, in the widely popular canary deployment, a new version of an app is only released to a small group of users to test features and performance before it is made available to everyone else.
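The weight-based splitting behind a canary deployment can be sketched as follows. This is an illustrative assumption of how a proxy might sample per request, not a real mesh configuration: 90% of requests go to the stable version and 10% to the canary.

```python
import random

def split_traffic(weights, rand=random.random):
    """Pick a version according to its weight (weights sum to 1.0)."""
    r = rand()
    cumulative = 0.0
    for version, weight in weights.items():
        cumulative += weight
        if r < cumulative:
            return version
    return version   # guard against floating-point rounding at 1.0

# Canary rollout: 90% stable, 10% canary.
weights = {"v1-stable": 0.9, "v2-canary": 0.1}

# Deterministic draws for illustration: 0.5 lands in the stable 90%,
# 0.95 lands in the canary's 10%.
picked_low = split_traffic(weights, rand=lambda: 0.5)
picked_high = split_traffic(weights, rand=lambda: 0.95)
```

Shifting the weights gradually (10%, then 25%, then 100%) is what turns this sampling into a controlled rollout.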

Choosing the right service mesh solution

The global market for service mesh providers is already strong and growing rapidly. According to one market report, it was valued at USD 0.22 billion in 2023 and is projected to grow to USD 5.05 billion by 2032, a compound annual growth rate (CAGR) of 41.3%.2

In an industry growing at such a rapid pace, selecting the right service mesh solution can be a challenge. Some key factors to consider include cost, ease of implementation, compatibility with existing technologies, security, performance and support.

Here are five of the most popular service mesh solutions available and a brief overview of what makes them unique.

Istio

The most popular service mesh available is Istio. Its large set of features means it is both highly adaptive and well-suited to enterprise-level workloads. Istio is well-known for its advanced traffic management capabilities, security features and extensibility, which is enhanced by its large ecosystem of contributors.

Like some other service mesh offerings, Istio is an open source project, meaning that it was developed and maintained through open collaboration and made available for anyone to use. It works well with Kubernetes and many other service-mesh-adjacent technologies.

Linkerd

More lightweight and straightforward than Istio, Linkerd is a simple service mesh solution that enhances performance while maintaining low latency. Linkerd has all the basic functionalities that enterprises have come to expect from a service mesh, including load balancing, service discovery, encryption and more. Like Istio, Linkerd is also open source.  

NGINX Service Mesh

NGINX is more comprehensive than most other service meshes. It functions both as a web server and a reverse proxy, which means it can offer a wider range of protocols than other service meshes. NGINX was designed for maximum performance and stability and is used by many high-traffic websites. Like Linkerd, Istio and other service mesh solutions, it is open source.

Consul

Created by the popular, cloud-based infrastructure company HashiCorp, Consul is a multicloud service mesh that offers many of the same features as other popular service meshes. Consul is one of the most flexible service mesh solutions available and can be used on many different operating systems (OSs), including Windows, Linux, macOS, FreeBSD and Solaris. Consul is also open source and is best known for its popular Prometheus plug-in that enhances monitoring capabilities.

AWS App Mesh

Designed specifically for use with Amazon Web Services (AWS) cloud deployments, AWS App Mesh is a service mesh with many of the same features as Linkerd, Istio and others, including enhanced security, traffic management and observability. However, AWS App Mesh is less flexible than the alternatives and is generally considered a good fit only for organizations that are already integrated into the AWS architecture.

Footnotes

1 Service Mesh As The Bridge To App Modernization, Forbes, March 2023

2 Service Mesh Market Report, Business Research Insights, October 2024