What is a data mesh?

Authors

Staff Editor

IBM Think

Staff Writer

IBM Think

What is a data mesh?

A data mesh is a decentralized data architecture that organizes data by business domain—such as marketing, sales or customer service. Domain data producers treat their data as a product, enabling business users to easily find, understand and use data from across the organization.

This domain-driven design addresses many of the operational bottlenecks found in centralized, monolithic data systems. However, adopting a data mesh does not make traditional data storage systems (such as data lakes or data warehouses) obsolete. Instead, their roles shift from serving as single, centralized data platforms to supporting multiple decentralized data repositories.

The concept of data mesh was introduced and popularized by Zhamak Dehghani, a director of emerging technology for IT consultancy firm ThoughtWorks. She proposed this distributed data architecture as a solution to the inherent challenges of centralized data architectures, such as limited accessibility and organizational silos.

Data mesh is commonly compared to a microservices architecture—where a single application is composed of many smaller, loosely coupled services—because both emphasize decentralization, autonomy and scalability.

Why use a data mesh?

Every day, organizations create and collect massive amounts of data. Each department or business unit generates datasets that are often stored in disparate repositories and typically managed by a centralized data team.

This separation creates data silos—isolated collections of operational and analytical data that impede data sharing, reduce data quality and weaken data-driven decision-making. Data silos also limit the effectiveness of big data, machine learning (ML) and artificial intelligence (AI) initiatives.

According to the IBM Data Differentiator, 82% of enterprises report that data silos disrupt critical workflows, and 68% of enterprise data remains unanalyzed.

Distributed data mesh architectures address these challenges by decentralizing data ownership and management. Rather than relying on a centralized data team and traditional pipelines, data ownership is transferred to domain teams. These teams manage their own data and provide it as a product to the rest of the organization via self-service data infrastructure.

This data-as-a-product approach emphasizes accessibility, governance and utility. It is grounded in the principle that data, just like any high-quality consumer product, should be managed and organized to meet the specific data needs of its users.

What is a data product?

A data product is a reusable, self-contained asset that includes data, metadata, semantics and templates. It is designed for specific use cases and to serve a broad range of users across the enterprise, helping them extract meaningful business value from data that might otherwise be siloed.

Data products are developed with a product-thinking approach and by applying traditional product development principles. This approach involves understanding users’ data needs, prioritizing high-value features and iterating based on feedback.

Effective data products should be discoverable, understandable, interoperable, shareable, secure and reusable.
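
To make these qualities concrete, here is a minimal Python sketch of a data product descriptor and a small catalog that makes products discoverable. It is an illustration under stated assumptions, not a standard: the `DataProduct` and `Catalog` classes, their field names and the example values used with them are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor bundling data location, metadata and semantics."""
    name: str
    domain: str          # owning business domain, e.g. "marketing"
    owner: str           # contact for the domain team
    description: str     # human-readable semantics
    schema: dict         # column name -> type name
    location: str        # where consumers read the data
    tags: tuple = ()


class Catalog:
    """Hypothetical in-memory registry that makes data products discoverable."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"{product.domain}/{product.name} is already registered")
        self._products[key] = product

    def search(self, term):
        """Return products whose name, description or tags mention the term."""
        term = term.lower()
        return [
            p for p in self._products.values()
            if term in p.name.lower()
            or term in p.description.lower()
            or any(term in tag.lower() for tag in p.tags)
        ]
```

A domain team would register each product once, and consumers in other domains would call `search` to find it without contacting the producing team directly.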

How does a data mesh work?

The data mesh paradigm is more than just a technical implementation. It involves a cultural shift in how organizations think about data ownership and access. Traditionally, organizations treated domain data as a byproduct of a process or system. However, since data mesh treats data as a product, domain teams become data product owners.

According to Zhamak Dehghani, there are four core principles of data mesh:¹

Domain-oriented decentralized data ownership and architecture
Data as a product
Self-serve data infrastructure as a platform
Federated computational governance

Domain-oriented decentralized data ownership and architecture

Traditionally, a centralized infrastructure or data engineering team would maintain data ownership across domains. In a data mesh model, this ownership is decentralized and shifts to domain teams—those closest to the data and most familiar with how it’s used. These data owners are responsible for producing data products tailored to these specific uses.

Domain teams also manage their own extract, transform, load (ETL) and extract, load, transform (ELT) pipelines within a data mesh architecture. However, this responsibility does not eliminate the need for a centralized data engineering team. Instead, that team's role shifts to providing and maintaining the best data infrastructure solutions for storing and delivering data products.

Data as a product

A data-as-a-product (DaaP) approach treats datasets as marketable products that can be served to various users inside and outside an organization. Domain data products are made accessible to users across the organization through application programming interfaces (APIs) or data sharing platforms.

In this way, a data mesh approach enables more flexible data integration and interoperable data products. Data from multiple domains can be readily consumed for data analytics, data science, machine learning and other use cases.

Self-serve data infrastructure as a platform

A self-serve data platform provides tooling that helps domain teams, who often have less specialized knowledge of building data products, create, maintain and share new data products. The data platform team may provide data services such as scalable data storage, data pipeline orchestration and data lineage.

The self-serve platform can also have different planes, or layers, to serve different users. Dehghani lists three examples: a data infrastructure provisioning plane, a data product developer experience plane and a data mesh supervision plane.

Federated computational governance

In a data mesh ecosystem, domain teams are responsible for defining data governance policies related to documentation, quality and access. This includes maintaining semantic definitions, cataloging metadata and setting permissions and usage policies.

Domain teams apply these policies within organization-wide standards that a centralized data governance team establishes and maintains. This combination of shared standards and domain-level ownership supports self-service data access across the organization.
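
One way to picture this split between organization-wide standards and domain-level policies is as policy checks expressed in code. The sketch below is an assumption-laden illustration: the policy functions, the `GLOBAL_POLICIES` list and the dictionary layout of a product are hypothetical, and real platforms typically enforce such rules through their own tooling.

```python
from typing import Callable, Iterable, List, Optional

# A policy inspects a product description and returns a violation message or None.
Policy = Callable[[dict], Optional[str]]


def require_owner(product: dict) -> Optional[str]:
    """Organization-wide rule (hypothetical): every product names an owner."""
    return None if product.get("owner") else "missing owner"


def require_pii_classification(product: dict) -> Optional[str]:
    """Organization-wide rule (hypothetical): every column declares a PII flag."""
    columns = product.get("columns", {})
    missing = sorted(name for name, meta in columns.items() if "pii" not in meta)
    if missing:
        return "columns without PII classification: " + ", ".join(missing)
    return None


# Maintained by the central governance team in this sketch.
GLOBAL_POLICIES: List[Policy] = [require_owner, require_pii_classification]


def evaluate(product: dict, domain_policies: Iterable[Policy] = ()) -> List[str]:
    """Run global policies first, then any policies the domain team adds."""
    violations = []
    for policy in [*GLOBAL_POLICIES, *domain_policies]:
        message = policy(product)
        if message:
            violations.append(message)
    return violations
```

In this arrangement the central team owns the global list, while each domain passes its own additional checks to `evaluate`, so both layers are applied automatically whenever a product is published.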

Data mesh vs. data fabric

Data fabric and data mesh are complementary data architectures. In fact, data fabrics often enhance the functionality and enable the implementation of a data mesh.

A data fabric uses intelligent and automated systems to break down silos, manage data assets and optimize data management at scale. It focuses on the automation of data ingestion, data integration, data engineering and governance. For example, a data fabric can automate key parts of data mesh such as creating data products and managing their lifecycle.

Benefits of a data mesh

Organizations that adopt data mesh architectures can experience a range of benefits, including:

Data democratization and discoverability
Cost efficiencies
Flexibility to scale
Reduced technical debt
Improved interoperability
Stronger security and compliance

Data democratization and discoverability

Data mesh architectures can facilitate self-service data access by making datasets discoverable and usable. This democratization broadens data access beyond technical teams—such as data scientists, data engineers and developers. With proper governance, this approach can also reduce data silos and operational bottlenecks, enabling faster, more agile decision-making.

Cost efficiencies

The distributed architecture of data mesh can encourage the adoption of cloud data platforms and pipelines for real-time data streaming. These tools can improve visibility into storage and processing costs, enabling better budget and resource allocation for engineering teams.

Flexibility to scale

When organizations implement data mesh on cloud infrastructure, data teams can scale storage and compute resources as needed. For instance, if additional compute power is required to complete a job in hours instead of days, the business can easily provision temporary, additional compute nodes.

Reduced technical debt

Distributing data pipeline responsibility by domain reduces the complexity and cross-team coordination required to maintain a centralized data system. This decentralized approach lowers technical strain and debt, and accelerates delivery to data consumers.

Improved interoperability

Data mesh encourages domain teams to agree on standardized, domain-agnostic data fields and formats (such as field type, metadata and schema flags). These shared conventions make it quicker and easier to integrate and reuse data products across domains.
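
As a rough illustration of agreed, domain-agnostic fields, the following Python sketch checks a record against a shared set of field names and types. The `SHARED_FIELDS` mapping and the `conforms` function are hypothetical names chosen for this example, and the three fields are assumptions rather than a recommended standard.

```python
# Hypothetical cross-domain conventions: field name -> expected Python type.
SHARED_FIELDS = {
    "customer_id": str,
    "event_time": str,    # assumed to hold an ISO 8601 timestamp
    "country_code": str,  # assumed to hold a two-letter country code
}


def conforms(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, expected_type in SHARED_FIELDS.items():
        if name not in record:
            problems.append(f"missing {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    return problems
```

Any domain can run the same check before publishing, which is what lets consumers combine records from different domains without per-domain conversion logic.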

Stronger security and compliance

Data mesh architectures help enforce data rules and access controls at the domain level through standardized rules and embedded observability. This strong governance posture helps ensure that organizations are following regulations pertaining to sensitive data, such as the US Health Insurance Portability and Accountability Act (HIPAA).

Use cases of a data mesh

Through domain ownership and a decentralized data ecosystem, data mesh architectures help organizations improve data accessibility and usability across a variety of use cases, including:

Business intelligence (BI) dashboards

Discoverable, domain-owned and curated datasets support BI initiatives. Teams can easily add these datasets to BI dashboards and data visualizations without the technical assistance of a central data engineering team.

Automated virtual assistants

Chatbots and virtual agents perform best when they have access to quality, relevant data. A data mesh architecture helps make more high-quality data sources from across domains available to these systems.

Customer experience

Organizations can gain a more unified view of their customers by combining standardized customer data from across domains. This view can improve overall customer experience, including personalization and targeting efforts.
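
The idea of combining standardized customer data can be sketched in a few lines of Python. This example assumes every domain keys its rows on the same `customer_id` field; the `unified_view` function and the domain names used with it are hypothetical.

```python
from collections import defaultdict


def unified_view(*domain_datasets):
    """Merge per-domain rows into one record per customer.

    Each argument is a (domain_name, rows) pair, and each row is a dict
    that contains the shared "customer_id" key. Other fields are prefixed
    with the domain name so values from different domains do not collide.
    """
    view = defaultdict(dict)
    for domain, rows in domain_datasets:
        for row in rows:
            customer_id = row["customer_id"]
            for field_name, value in row.items():
                if field_name != "customer_id":
                    view[customer_id][f"{domain}.{field_name}"] = value
    return dict(view)
```

Because the key is standardized across domains, the merge needs no mapping tables; without that agreement, each pair of domains would need its own matching logic.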

Machine learning and AI projects

Standardized data reduces the time data scientists need to spend combining data from various domains. These time savings accelerate data processing and increase the number of models that can move into a production environment.

Footnotes

¹“Data Mesh Principles and Logical Architecture,” Zhamak Dehghani, martinfowler.com, 3 December 2020.
