Data Fabric for the Hybrid Multi Cloud

3 minute read | December 16, 2021

Enterprise data resides on-premises as well as in private and public cloud infrastructures. In this complex landscape, effective management and governance are difficult and often require manual processes to classify data, assess quality, and remediate data sovereignty issues.

An Enterprise Data Lake was the go-to solution for such challenges. We observed the benefits of faster and easier access to governed data through our Enterprise Data Lake and expected these benefits to compound over time. However, as regulatory requirements grew more complex and the need for self-service increased, our centralized data lake was becoming insufficient and infeasible.

IBM’s Global Chief Data Office (GCDO) focuses on delivering trusted, enterprise-wide data and services to IBM. We have experienced the many challenges of this mission first-hand, and in response we have embraced an enterprise-wide Data Fabric strategy.

Data Fabric: the new paradigm

“Data Fabric is an emerging architectural paradigm that enables organizations to centrally monitor, manage, orchestrate, and govern data regardless of where they reside across multiple clouds, on prem databases, data lakes, data warehouses or at the edge. It supports provisioning quality data in the right form, at the right time, to the right consumer for trusted insights and accelerates discovering, cataloging, integrating and sharing data across the hybrid multicloud.”

– Ranjan Sinha, CTO for IBM GCDO

The following capabilities form the core of a Data Fabric architecture:

  1. An orchestration layer that integrates data and analytics tools and sits in the data path.
  2. A business and technical metadata asset catalog capable of representing federated data from third-party catalogs, as well as governance policies and rules.
  3. A knowledge graph that holds contextual knowledge of the relationships among assets.
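
As a rough illustration of how the catalog and the knowledge graph fit together, the Python sketch below models assets as catalog entries and relationships as labeled edges. Every name in it (Asset, KnowledgeGraph, the sample assets, and the "derived_from" relation) is hypothetical and chosen only for illustration; none of it reflects Cloud Pak for Data or any other IBM API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """A hypothetical catalog entry: location plus business metadata."""
    name: str
    location: str  # where the data physically resides
    business_terms: frozenset = field(default_factory=frozenset)


class KnowledgeGraph:
    """Stores labeled, directed relationships between cataloged assets."""

    def __init__(self):
        self.assets = {}
        self.edges = defaultdict(set)  # (source, relation) -> set of targets

    def add_asset(self, asset):
        self.assets[asset.name] = asset

    def relate(self, source, relation, target):
        if source not in self.assets or target not in self.assets:
            raise KeyError("both assets must be cataloged first")
        self.edges[(source, relation)].add(target)

    def upstream(self, name, relation="derived_from"):
        """Return every asset reachable by following `relation` edges."""
        seen, stack = set(), [name]
        while stack:
            current = stack.pop()
            for parent in self.edges.get((current, relation), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative assets spread across on-premises and cloud locations.
graph = KnowledgeGraph()
graph.add_asset(Asset("crm_accounts", "on-premises", frozenset({"client"})))
graph.add_asset(Asset("client_master", "private cloud", frozenset({"client"})))
graph.add_asset(Asset("sales_insights", "public cloud", frozenset({"opportunity"})))
graph.relate("client_master", "derived_from", "crm_accounts")
graph.relate("sales_insights", "derived_from", "client_master")

print(sorted(graph.upstream("sales_insights")))
# prints ['client_master', 'crm_accounts']
```

Keeping relationships separate from the catalog entries means a question such as "what does this asset depend on?" can be answered without touching the underlying data, wherever it resides.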

A few key use cases for a Data Fabric include:

  1. Discovery, search and access to relevant data based on user needs
  2. Delivery of high-quality data in the right form and in a timely manner
  3. Hybrid, multi-cloud data integration
  4. Consistent Data & AI governance
  5. 360-degree view of data

Our Data Fabric Strategy

IBM’s GCDO embraces the vision of Data Fabric and relies heavily on Cloud Pak for Data. Several pieces of our Data Fabric solution have been developed and are in production, yet there is much more to do.

Key capabilities that have been developed and deployed include:

  1. Collect, catalog, analyze, and understand enterprise data where it resides, and measure data-related KPIs, including quality, compliance with enterprise data standards, and how data flows between systems. Watson Knowledge Catalog enables users to self-serve data with appropriate access privileges.
  2. Modernize toward a hybrid multi-cloud data architecture with efficient and trusted data movement.
  3. Streamline data pipelines using DataOps principles to provide quality data with lineage tracking for better insight generation.
  4. Enable data discovery and semantic search using ontologies, so that knowledge workers can find data using intuitive business language.
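
To make the idea of ontology-driven semantic search more concrete, here is a minimal Python sketch. It assumes a toy ontology that maps a business term to related terms and a toy catalog that maps dataset names to tags. ONTOLOGY, CATALOG, expand, and search are all hypothetical names invented for this example and are not part of Watson Knowledge Catalog.

```python
# Hypothetical ontology: a business term maps to related or narrower terms.
ONTOLOGY = {
    "customer": {"client", "account"},
    "revenue": {"sales", "bookings"},
}

# Hypothetical catalog: dataset name -> tags attached at cataloging time.
CATALOG = {
    "crm_accounts": {"account", "contact"},
    "client_master": {"client"},
    "quarterly_bookings": {"bookings", "quarter"},
}


def expand(term):
    """Return the term plus every term related to it in the ontology."""
    term = term.lower()
    return {term} | ONTOLOGY.get(term, set())


def search(query):
    """Return sorted dataset names whose tags match any expanded query term."""
    wanted = set()
    for word in query.split():
        wanted |= expand(word)
    return sorted(name for name, tags in CATALOG.items() if tags & wanted)


print(search("customer"))         # prints ['client_master', 'crm_accounts']
print(search("Revenue quarter"))  # prints ['quarterly_bookings']
```

A search for "customer" finds datasets tagged "client" or "account" even though neither carries the word "customer", which is the kind of business-language lookup the ontology is meant to support.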

Value to the organization

“Data Fabric enables data integration across heterogeneous and distributed data landscapes at scale. At IBM, we use this data to power sales and marketing insights, providing meaningful benefit to our company.”

– Brian Donohue, VP Master Data for IBM GCDO

The GCDO Client360 and MasterData360 (MD360) solutions rely on Data Fabric approaches and tooling to integrate data efficiently and at scale across IBM’s heterogeneous data landscape. Our integration of IBM client and product data with third-party competitive install information allows us to create impactful insights and recommendations for our sellers.

In one case, a sales team asked GCDO to develop a program supporting 50 opportunities worth more than $5M. Leveraging Data Fabric and our MD360 solution, we supported this request easily, saving many hours for marketing staff and sellers and decreasing overall cycle time. Because a typical campaign represents an investment of more than $275,000 across upfront analysis, program development, and seller enablement, these efficiency gains and cycle-time improvements are meaningful.

Equally important, our MD360 capabilities provide competitive insights that our sellers could not have developed on their own. MD360 has delivered targeted, accurate insights to sellers, leading to more than $20M in incremental sales over three quarters of the year.

Data Fabric-enabled seller solutions improve both efficiency and effectiveness. What makes this particularly exciting at IBM is that these improvements are leveraged across dozens of markets and hundreds of sellers representing a broad array of IBM’s products and brands.

Conclusion

Data Fabric is providing real business value. Because hyper-automation of the end-to-end data and AI lifecycle is the essence of Data Fabric, the right tooling is key to success. We leverage elements of IBM Cloud Pak for Data in combination with architectural principles and world-class data governance processes to build and expand our Data Fabric. Data Fabric gives us an opportunity to fully leverage the value of data across our enterprise.