A data mesh is a decentralized data architecture that organizes data by business domain—such as marketing, sales or customer service. Domain data producers treat their data as a product, enabling business users to easily find, understand and use data from across the organization.
This domain-driven design addresses many of the operational bottlenecks found in centralized, monolithic data systems. However, adopting a data mesh does not make traditional data storage systems (such as data lakes or data warehouses) obsolete. Instead, their roles shift from serving as single, centralized data platforms to supporting multiple decentralized data repositories.
The concept of data mesh was introduced and popularized by Zhamak Dehghani, a director of emerging technology for IT consultancy firm ThoughtWorks. She proposed this distributed data architecture as a solution to the inherent challenges of centralized data architectures, such as limited accessibility and organizational silos.
Data mesh is commonly compared to a microservices architecture—where a single application is composed of many smaller, loosely coupled services—because both emphasize decentralization, autonomy and scalability.
Every day, organizations create and collect massive amounts of data. Each department or business unit generates datasets that are often stored in disparate repositories and typically managed by a centralized data team.
This separation creates data silos—isolated collections of operational and analytical data that impede data sharing, reduce data quality and weaken data-driven decision-making. Data silos also limit the effectiveness of big data, machine learning (ML) and artificial intelligence (AI) initiatives.
In fact, according to the IBM Data Differentiator, 82% of enterprises report that data silos disrupt critical workflows, and 68% of enterprise data goes unanalyzed.
Distributed data mesh architectures address these challenges by decentralizing data ownership and management. Rather than relying on a centralized data team and traditional pipelines, data ownership is transferred to domain teams. These teams manage their own data and provide it as a product to the rest of the organization via self-service data infrastructure.
This data-as-a-product approach emphasizes accessibility, governance and utility. It is grounded in the principle that data, just like any high-quality consumer product, should be managed and organized to meet the specific data needs of its users.
A data product is a reusable, self-contained asset that includes data, metadata, semantics and templates. It is designed for specific use cases and to serve a broad range of users across the enterprise, helping them extract meaningful business value from data that might otherwise be siloed.
Data products are developed with a product-thinking approach and by applying traditional product development principles. This approach involves understanding users’ data needs, prioritizing high-value features and iterating based on feedback.
Effective data products should be discoverable, understandable, interoperable, shareable, secure and reusable.
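To make those qualities concrete, the sketch below models a data product descriptor as a small Python class. The field names (`owner`, `schema`, `tags`) and the `matches` search helper are illustrative assumptions, not part of any data mesh standard; real catalogs expose richer metadata and search.

```python
from dataclasses import dataclass, field

# Hypothetical data product descriptor. Field names are illustrative only.
@dataclass
class DataProduct:
    name: str          # unique identifier (discoverability)
    domain: str        # owning business domain
    owner: str         # accountable domain team contact
    description: str   # plain-language meaning (understandability)
    schema: dict       # field name -> type (interoperability)
    tags: list = field(default_factory=list)  # aids catalog search

    def matches(self, keyword: str) -> bool:
        """Naive discoverability check against name, description and tags."""
        k = keyword.lower()
        return (k in self.name.lower()
                or k in self.description.lower()
                or any(k in t.lower() for t in self.tags))

orders = DataProduct(
    name="sales.orders_daily",
    domain="sales",
    owner="sales-data-team@example.com",
    description="Daily order totals per region",
    schema={"order_date": "date", "region": "string", "total": "decimal"},
    tags=["orders", "revenue"],
)
print(orders.matches("revenue"))  # True
```

Attaching this metadata to the dataset itself, rather than keeping it in a central team's heads, is what lets consumers find and trust the product on their own.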
The data mesh paradigm is more than just a technical implementation. It involves a cultural shift in how organizations think about data ownership and access. Traditionally, organizations treated domain data as a byproduct of a process or system. However, since data mesh treats data as a product, domain teams become data product owners.
According to Zhamak Dehghani, there are four core principles of data mesh:1
Traditionally, a centralized infrastructure or data engineering team would maintain data ownership across domains. In a data mesh model, this ownership is decentralized and shifts to domain teams—those closest to the data and most familiar with how it’s used. These data owners are responsible for producing data products tailored to these specific uses.
Domain teams also manage their own extract, transform, load (ETL) or extract, load, transform (ELT) pipelines within a data mesh architecture. However, this responsibility does not eliminate the need for a centralized data engineering team. Instead, its role shifts to providing and maintaining the data infrastructure that stores and delivers data products.
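A domain-owned pipeline can be as simple as the extract/transform/load steps sketched below. This is a minimal illustration under assumed, in-memory stand-ins; in practice the source would be the domain's operational system and the target the product's storage (warehouse, lake or similar).

```python
# Minimal sketch of a domain-owned ETL pipeline. All names and data
# are illustrative stand-ins, not a reference implementation.

def extract() -> list[dict]:
    # Stand-in for reading raw records from the domain's operational source
    return [
        {"customer_id": "1", "amount": "19.99"},
        {"customer_id": "2", "amount": "5.00"},
    ]

def transform(rows: list[dict]) -> list[dict]:
    # Normalize types so downstream consumers see a consistent schema
    return [{"customer_id": int(r["customer_id"]),
             "amount": float(r["amount"])} for r in rows]

def load(rows: list[dict], target: list) -> None:
    # Stand-in for writing to the data product's storage layer
    target.extend(rows)

product_table: list[dict] = []
load(transform(extract()), product_table)
print(product_table[0])  # {'customer_id': 1, 'amount': 19.99}
```

The key shift is ownership: the domain team that understands the data writes and runs this pipeline, while the platform team supplies the infrastructure it runs on.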
A data-as-a-product (DaaP) approach treats datasets as marketable products that can be served to various users inside and outside an organization. Domain data products are made accessible to users across the organization through application programming interfaces (APIs) or data sharing platforms.
In this way, a data mesh approach enables more flexible data integration and interoperable data products. Data from multiple domains can be readily consumed for data analytics, data science, machine learning and other use cases.
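The consumption side of data as a product can be pictured as a catalog lookup. The sketch below uses a hypothetical in-process catalog and a `get_data_product` function as a stand-in for what would normally be a REST or GraphQL endpoint on a data sharing platform; the product identifiers and data are invented for illustration.

```python
# Hedged sketch of a consumer-facing "data product API".
# CATALOG and get_data_product are hypothetical stand-ins for a
# real sharing platform or API gateway.

CATALOG = {
    "marketing.campaign_results": [
        {"campaign": "spring", "clicks": 1200},
    ],
    "sales.orders_daily": [
        {"order_date": "2024-01-01", "region": "emea", "total": 99.0},
    ],
}

def get_data_product(product_id: str) -> list[dict]:
    """Consumers pull a product by its catalog identifier."""
    if product_id not in CATALOG:
        raise KeyError(f"Unknown data product: {product_id}")
    return CATALOG[product_id]

rows = get_data_product("sales.orders_daily")
print(rows[0]["total"])  # 99.0
```

Because every domain publishes through the same interface, a data scientist can combine marketing and sales products without knowing how either domain stores its data internally.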
A self-serve data platform provides tooling that helps domain teams, even those without deep platform engineering expertise, create, maintain and share new data products. The data platform team may provide data services such as scalable data storage, data pipeline orchestration, data lineage and more.
The self-serve platform can also have different planes, or layers, to serve different users. Dehghani lists three examples: a data infrastructure provisioning plane, a data product developer experience plane and a data mesh supervision plane.
In a data mesh ecosystem, domain teams are responsible for defining data governance policies related to documentation, quality and access. This includes maintaining semantic definitions, cataloging metadata and setting permissions and usage policies.
This standardization supports self-service data access across an organization, while a centralized data governance team establishes and maintains organizational standards.
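One way to picture domain-defined governance is a per-product access policy evaluated before any data is served. The policy structure and role names below are hypothetical assumptions for a simple role-based model; production systems typically enforce this through an access-management service, not application code.

```python
# Illustrative sketch of domain-defined access policies.
# The policy schema and roles are invented for this example.

POLICIES = {
    # HR restricts compensation data to its own analysts
    "hr.compensation": {"allowed_roles": {"hr_analyst"}, "pii": True},
    # Sales shares daily orders broadly across analytics roles
    "sales.orders_daily": {"allowed_roles": {"analyst", "data_scientist"},
                           "pii": False},
}

def can_access(product_id: str, role: str) -> bool:
    """Deny by default: access requires an explicit domain policy."""
    policy = POLICIES.get(product_id)
    return policy is not None and role in policy["allowed_roles"]

print(can_access("sales.orders_daily", "analyst"))  # True
print(can_access("hr.compensation", "analyst"))     # False
```

The domain sets the rules for its own data; the central governance team standardizes the policy format and the deny-by-default posture across the organization.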
Data fabric and data mesh are complementary data architectures. In fact, data fabrics often enhance the functionality and enable the implementation of a data mesh.
A data fabric uses intelligent and automated systems to break down silos, manage data assets and optimize data management at scale. It focuses on the automation of data ingestion, data integration, data engineering and governance. For example, a data fabric can automate key parts of data mesh such as creating data products and managing their lifecycle.
Organizations that adopt data mesh architectures can experience a range of benefits, including:
Data mesh architectures can facilitate self-service data access by making datasets discoverable and usable. This democratization broadens data access beyond technical teams—such as data scientists, data engineers and developers. With proper governance, this approach can also reduce data silos and operational bottlenecks, enabling faster, more agile decision-making.
The distributed architecture of data mesh can encourage the adoption of cloud data platforms and pipelines for real-time data streaming. These tools can improve visibility into storage and processing costs, enabling better budget and resource allocation for engineering teams.
When organizations implement data mesh on cloud infrastructure, data teams can scale storage and compute resources as needed. For instance, if additional compute power is required to complete a job in hours instead of days, the business can easily provision temporary, additional compute nodes.
Distributing data pipeline responsibility by domain reduces the complexity and cross-team coordination required to maintain a centralized data system. This decentralized approach eases technical strain, limits technical debt and accelerates delivery to data consumers.
Data mesh encourages domain teams to agree on standardized, domain-agnostic data fields and formats (such as field types, metadata conventions and schemas). These shared rules facilitate integration and reuse by making it quick and easy to apply relevant rules across domains.
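A lightweight way to enforce such shared rules is to validate records against an agreed schema before publishing. The schema format below (field name to Python type) is an illustrative assumption, not a data mesh standard; real implementations often use schema registries or formats such as JSON Schema or Avro.

```python
# Sketch of validating records against a shared, domain-agnostic schema.
# The schema representation here is invented for illustration.

SHARED_SCHEMA = {
    "customer_id": int,
    "event_date": str,   # ISO 8601 date string, by convention
    "amount": float,
}

def conforms(record: dict, schema: dict) -> bool:
    """True when a record has exactly the shared fields with the agreed types."""
    return (set(record) == set(schema)
            and all(isinstance(record[f], t) for f, t in schema.items()))

good = {"customer_id": 7, "event_date": "2024-06-01", "amount": 12.5}
bad = {"customer_id": "7", "event_date": "2024-06-01"}  # wrong type, missing field
print(conforms(good, SHARED_SCHEMA))  # True
print(conforms(bad, SHARED_SCHEMA))   # False
```

Running a check like this in every domain's pipeline is what makes products from different domains safe to join and reuse.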
Data mesh architectures help enforce data rules and access controls at the domain level through standardized rules and embedded observability. This strong governance posture helps ensure that organizations are following regulations pertaining to sensitive data, such as the US Health Insurance Portability and Accountability Act (HIPAA).
Through domain ownership and a decentralized data ecosystem, data mesh architectures help organizations improve data accessibility and usability across a variety of use cases, including:
Discoverable, domain-owned and curated datasets support business intelligence (BI) initiatives. Teams can easily add these datasets to BI dashboards and data visualizations without the technical assistance of a central data engineering team.
Chatbots and virtual agents perform best when they have access to quality, relevant data. A data mesh architecture helps make more high-quality data sources from across domains available to these systems.
Organizations can gain a more unified view of their customers by combining standardized customer data from across domains. This view can improve overall customer experience, including personalization and targeting efforts.
Standardized data reduces the time data scientists need to spend combining data from various domains. This time savings accelerates data processing and increases the number of models that can move into a production environment.
1 Zhamak Dehghani, “Data Mesh Principles and Logical Architecture,” martinfowler.com, 3 December 2020.