Data fabric marketplace: The heart of the data economy

By and Sandipan Sarkar | July 25, 2022

In older civilizations, where transportation and communication were primitive, the marketplace was where people came to buy and sell products. It was the only way to know what was on offer and who needed it. Modern-day enterprises face a similar situation with their data assets. On one side, there is a need for data. Businesses ask: “Do we have this kind of data in the enterprise?” “How do we get that data?” “Can I trust that data?” On the other side, enterprises and organizations sit on piles of data with no idea that others need it and are ready to pay for it. Like a medieval marketplace, a data marketplace can bring these two sides together to trade.

This discussion has become more relevant with the advent of data fabric. Data fabric is a distributed, heterogeneous architecture that makes data available in the right shape, at the right time and in the right place. A data marketplace is often the first step toward an enterprise's data fabric vision. It tops our major clients’ wish lists, a trend that industry analysts have also observed. For example, Deloitte identified “data sharing made easy” as one of its top seven technology trends, and Gartner predicts that by 2023, organizations that promote data sharing will outperform their peers on most business metrics.

Why is the marketplace the centerpiece of data fabric?

The main purpose of data fabric is to make data sharing easier. Today, data sharing roughly equates to data copying, so enterprises spend heavily to move data from one place to another and to curate it until it is fit for purpose. This long journey of data discovery and processing can lengthen the application development lifecycle or delay the delivery of insights. Enterprises must address these inefficiencies to remain competitive. Through self-service capabilities, business users and decision makers should be able to discover and explore data on their own to do their jobs. Enterprises want a platform where data providers and consumers can exchange data as a commodity using a common and consistent set of metadata. Doesn’t that sound very similar to the marketplace model?

How does a marketplace make it happen?

To make data sharing an integral part of the culture, an organization's data governance practice must attach measurable KPIs to it. Those KPIs can be met through incentive schemes. So, the marketplace must define a monetization policy for its data, even for internal sharing. (The currency need not always be money; reward points can also serve the purpose.)
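To make the reward-point idea concrete, here is a minimal sketch of an internal points ledger, assuming a scheme in which the consuming unit pays a fixed price in points and the providing unit earns them, so sharing shows up directly in the provider's KPI. The class and method names (PointsLedger, deposit, charge) are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class PointsLedger:
    """Hypothetical reward-point ledger for internal data sharing."""
    balances: dict[str, int] = field(default_factory=dict)

    def deposit(self, unit: str, points: int) -> None:
        # Allocate a budget of points to an organizational unit.
        self.balances[unit] = self.balances.get(unit, 0) + points

    def charge(self, consumer: str, provider: str, price: int) -> None:
        # Move `price` points from the consuming unit to the providing unit,
        # so the provider is rewarded every time its data product is used.
        if self.balances.get(consumer, 0) < price:
            raise ValueError(f"{consumer} has insufficient points")
        self.balances[consumer] -= price
        self.balances[provider] = self.balances.get(provider, 0) + price


ledger = PointsLedger()
ledger.deposit("marketing", 100)
ledger.charge("marketing", "risk", 30)  # marketing consumes a product owned by risk
print(ledger.balances)  # {'marketing': 70, 'risk': 30}
```

A real scheme would likely price products by usage or SLA tier rather than a flat fee; the flat price simply keeps the sketch small.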

From the technical perspective, a marketplace depends on two capabilities: a strong metadata foundation and the ability to virtualize or materialize data. The metadata creates a data catalog, similar to the product catalog of a typical e-commerce platform. Data providers publish their data products to the catalog with the appropriate level of detail (including functional and non-functional SLAs), and data consumers can discover them easily. Data virtualization or materialization capabilities also help reduce the cost of data movement.
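The publish-and-discover flow can be sketched as a tiny in-memory catalog. This is an illustration under simple assumptions: a data product is described by a name, owner, description, tags and a free-form SLA dictionary, and discovery is plain keyword matching. DataProduct and DataCatalog are hypothetical names, not the API of any catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)
    # Functional and non-functional SLAs, e.g. {"freshness": "monthly"}.
    sla: dict[str, str] = field(default_factory=dict)


class DataCatalog:
    """Hypothetical catalog: providers publish, consumers search."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        if product.name in self._products:
            raise ValueError(f"{product.name} is already published")
        self._products[product.name] = product

    def search(self, keyword: str) -> list[DataProduct]:
        # Case-insensitive match against the description or any tag.
        kw = keyword.lower()
        return [
            p for p in self._products.values()
            if kw in p.description.lower() or kw in {t.lower() for t in p.tags}
        ]


catalog = DataCatalog()
catalog.publish(DataProduct(
    "loan_applications", "loans", "Monthly loan application records",
    {"credit", "loans"}, {"freshness": "monthly", "availability": "99.5%"},
))
print([p.name for p in catalog.search("credit")])  # ['loan_applications']
```

Production catalogs add classification, lineage and semantic search on top of this, but the shape is the same: rich metadata in, discoverable products out.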

Data marketplace vs. e-commerce platform

A data marketplace differs from an e-commerce platform in a few ways. Most e-commerce platforms are either marketplace-based or inventory-based, but a data marketplace needs a hybrid approach. Individual organizational units of the enterprise will offer their own data products, but some data products should be owned and offered at the enterprise level. This is because some datasets (e.g., master or reference data) may need to be cleansed, standardized and de-duplicated across multiple sources to offer a single version of the truth across the enterprise.

For example, a financial institution may have several lines of business (LoBs), such as banking, wealth management, loans and deposits. A single customer may be present in all four LoBs, leaving a separate footprint in each. When the analytics department wants a 360-degree view of the customer to run an integrated campaign, the customer data must be integrated in one place to produce a single version of the truth. The marketplace can be this place of consolidation. In such cases, the marketplace may have to maintain its own inventory of data, thereby adopting a hybrid approach.
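The consolidation step in this example can be sketched as a simple merge of per-LoB customer records into one golden record per customer. The sketch assumes, hypothetically, that every LoB stores a national ID that can serve as the match key once separators are stripped, and it uses a naive "first non-empty value wins" rule; real master data management tools use fuzzy matching and configurable survivorship rules instead.

```python
from collections import defaultdict


def normalize_id(raw: str) -> str:
    # Hypothetical match key: strip separators so "123-45-678" matches "12345678".
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def build_customer_360(records: list[dict]) -> dict[str, dict]:
    """Merge per-LoB customer records into one golden record per customer."""
    merged: dict[str, dict] = defaultdict(lambda: {"lobs": set()})
    for rec in records:
        golden = merged[normalize_id(rec["national_id"])]
        golden["lobs"].add(rec["lob"])  # remember every LoB footprint
        for field_name in ("name", "email", "phone"):
            # Naive survivorship: keep the first non-empty value seen.
            if rec.get(field_name) and not golden.get(field_name):
                golden[field_name] = rec[field_name]
    return dict(merged)


records = [
    {"lob": "banking", "national_id": "123-45-678", "name": "A. Rao", "email": ""},
    {"lob": "loans", "national_id": "12345678", "name": "Anita Rao",
     "email": "anita@example.com"},
    {"lob": "wealth", "national_id": "987-65-432", "name": "B. Chen", "phone": "555-0100"},
]
customers = build_customer_360(records)
print(len(customers))  # 2
```

The consolidated records are exactly the kind of enterprise-owned inventory the marketplace would hold alongside the LoB-owned products.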

The second difference between a data marketplace and a typical e-commerce platform is the nature of the product. Unlike a typical e-commerce product, a data product is non-rival: the same product can be provisioned to multiple consumers at once. Data is provisioned according to the rules defined in the policy of the dataset concerned. So, offering data as a service involves the hidden complexity of creating dynamic subsets (on the fly or cached) in a way that is transparent to the consumer.
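One way to picture a policy-driven dynamic subset is a function that builds a consumer-specific view from the same source rows. The sketch assumes a hypothetical policy made of a row-level filter (allowed regions) and column-level masking; AccessPolicy and provision are illustrative names. Because each consumer gets a freshly derived view and the source is left untouched, the same product can be served to many consumers, which is the non-rival property described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    # Hypothetical policy: which rows a consumer may see and which columns are masked.
    allowed_regions: frozenset[str]
    masked_columns: frozenset[str]


def provision(rows: list[dict], policy: AccessPolicy) -> list[dict]:
    """Build a consumer-specific view on the fly; source rows are left unchanged."""
    view = []
    for row in rows:
        if row["region"] not in policy.allowed_regions:
            continue  # row-level filter
        view.append({
            col: ("***" if col in policy.masked_columns else val)  # column masking
            for col, val in row.items()
        })
    return view


rows = [
    {"customer": "c1", "region": "EU", "balance": 1200},
    {"customer": "c2", "region": "US", "balance": 800},
]
eu_analyst = AccessPolicy(frozenset({"EU"}), frozenset({"balance"}))
auditor = AccessPolicy(frozenset({"EU", "US"}), frozenset())
print(provision(rows, eu_analyst))  # [{'customer': 'c1', 'region': 'EU', 'balance': '***'}]
```

Computing the view on each request corresponds to the "on the fly" option; a cached variant would store views keyed by dataset version and policy.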

The third difference is the desired marketplace experience. Consumers of a data marketplace want to explore the available datasets before procurement, and this exploration goes much deeper than the product “preview” typically available on e-commerce platforms. So the marketplace should integrate with a development environment (such as Jupyter Notebook) for richer data exploration.

The fourth and final difference, which may be available only in a mature data fabric, is the ability to aggregate data. Data marketplace consumers should be able to make intuitive queries that are resolved by synthesizing multiple data sources. This requires richly described business and technical metadata that together form a knowledge graph capable of resolving such intuitive or semantic queries.
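A toy version of this idea maps business terms to datasets and uses technical metadata (which datasets can be joined, and on which key) as graph edges. A breadth-first search then finds the shortest chain of joins that answers a question spanning several terms. All dataset names, terms and join keys below are hypothetical, and a real knowledge graph would carry far richer semantics than this adjacency list.

```python
from collections import deque

# Hypothetical business metadata: which dataset carries each business term.
TERM_TO_DATASET = {
    "customer segment": "crm.customers",
    "loan balance": "loans.accounts",
    "branch": "banking.branches",
}
# Hypothetical technical metadata: joinable dataset pairs and their join key.
JOIN_EDGES = {
    ("crm.customers", "loans.accounts"): "customer_id",
    ("loans.accounts", "banking.branches"): "branch_id",
}


def neighbors(dataset):
    # Treat join edges as undirected.
    for (a, b), key in JOIN_EDGES.items():
        if a == dataset:
            yield b, key
        elif b == dataset:
            yield a, key


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins between two datasets."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, key in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, key)]))
    return None  # no join path exists


def resolve(terms):
    """Turn a list of business terms into a join plan across datasets."""
    datasets = [TERM_TO_DATASET[t] for t in terms]
    plan = []
    for src, dst in zip(datasets, datasets[1:]):
        step = join_path(src, dst)
        if step is None:
            raise ValueError(f"cannot join {src} to {dst}")
        plan.extend(step)
    return plan


print(resolve(["customer segment", "branch"]))
```

Chaining consecutive terms is a simplification; a fuller resolver would look for a minimal subgraph connecting all requested datasets.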

At present, no single product on the market provides all the typical e-commerce platform features and also fulfills all of these requirements. But some players provide subsets of these features. For example, most vendors have improved their data cataloging capabilities, and clients show growing interest in properly defining their enterprise datasets to ease classification, discovery, collaboration, quality management and more. We have seen interest in data cataloging grow from 53% to 66% within a year.

IBM Watson Knowledge Catalog, available within the IBM Cloud Pak for Data suite, is one of the most powerful products in the cataloging space. Meanwhile, Snowflake’s Data Marketplace and Exchange and Google’s Dataplex are ahead of the curve in providing access to external data in a pure marketplace model. Today's data marketplace is therefore likely to be a combination of several products.

Where can you start?

A data marketplace is likely to go through several maturity stages within each organization. It can begin as a catalog of data products drawn from multiple data sources. The marketplace owner can then build a few foundational capabilities. For example, clients need business and technical metadata to define, describe, classify and categorize the data products. Catalog users should also be able to associate governance policies and rules that restrict data access to the intended recipients; these rules can be reused once an appropriate data provisioning workflow is in place. At a later stage, marketplace features can be added to the catalog to publish internal and external data that consumers provision through a self-service channel. Once that is done, the marketplace can mature further into a full-scale platform that facilitates data exploration, contract negotiation, governance and monitoring.
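The point about reusing governance rules in a later provisioning workflow can be illustrated with a small request lifecycle. The sketch assumes a hypothetical rule table, defined at the catalog stage, that lists the roles permitted to receive each data product; the self-service workflow then approves or rejects requests against that same table rather than through a manual decision. The states, names and roles are illustrative only.

```python
from enum import Enum


class Status(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    REJECTED = "rejected"
    PROVISIONED = "provisioned"


# Hypothetical governance rule captured in the catalog stage:
# data product -> roles that count as intended recipients.
INTENDED_RECIPIENTS = {
    "loan_applications": {"risk_analyst", "credit_officer"},
}


class AccessRequest:
    """Hypothetical self-service access request for one data product."""

    def __init__(self, product: str, requester_role: str) -> None:
        self.product = product
        self.requester_role = requester_role
        self.status = Status.REQUESTED

    def review(self) -> Status:
        # Reuse the catalog's governance rule to decide the request.
        if self.status is not Status.REQUESTED:
            raise RuntimeError(f"request is already {self.status.value}")
        allowed = INTENDED_RECIPIENTS.get(self.product, set())
        self.status = Status.APPROVED if self.requester_role in allowed else Status.REJECTED
        return self.status

    def provision(self) -> None:
        # Only approved requests may be provisioned.
        if self.status is not Status.APPROVED:
            raise RuntimeError(f"cannot provision a request that is {self.status.value}")
        self.status = Status.PROVISIONED


req = AccessRequest("loan_applications", "risk_analyst")
req.review()
req.provision()
print(req.status)  # Status.PROVISIONED
```

Later maturity stages would extend this lifecycle with steps such as contract negotiation and usage monitoring.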

To learn more, visit IBM Consulting.