Four use cases defining the new wave of data management
See which topics are most pressing and how a data fabric can help
A confluence of events in the data management and AI landscape is bearing down on companies, no matter their size, industry or geographical location. Some of these, such as the continued sprawl of data across multicloud environments, have been looming for years, if not decades. Others have come into sharper focus relatively recently: a global effort to create new data privacy laws, a post-pandemic expectation by customers that businesses will know them individually across all touchpoints, and increased attention on any racial, gender-based, or socioeconomic bias in AI models.
While individual point solutions have been able to address some of these concerns in the past, it is rapidly becoming clear that a more robust solution is needed – one that can address a business's most pressing data and AI needs while providing an easy path to solving additional challenges. That solution is the data fabric.
A data fabric is an architectural approach that simplifies data access in an organization to facilitate self-service data consumption. This architecture is agnostic to data environments, processes, utility and geography, all while integrating end-to-end data-management capabilities. A data fabric automates data discovery, governance and consumption, enabling enterprises to use data to maximize their value chain. With a data fabric, enterprises elevate the value of their data by providing the right data at the right time, regardless of where it resides.

We've identified four of the top use cases for the data fabric below, along with a brief overview and links to a more in-depth eBook and trial. Together, these use cases provide a foundation that delivers a rich and intuitive data shopping experience. This data marketplace capability enables organizations to efficiently deliver high-quality, governed data products at scale across the enterprise.
Multicloud data integration
The rapid growth of data continues unabated and is now accompanied not only by siloed data but by a plethora of different repositories spread across numerous clouds. The reasoning behind this growth is simple and, with the exception of data silos, well justified: more data creates the opportunity for more accurate insights, while using multiple clouds helps avoid vendor lock-in and allows data to be stored where it fits best. The challenge, of course, is the added complexity, which hinders the actual use of that data for analytics and AI.
As part of a data fabric, multicloud data integration aims to ensure that the right data can be delivered to the right person at the right time. The availability of integration strategies – including ETL and ELT, data replication, change data capture and data virtualization – is key so that the widest possible range of data integration can be enacted. Similarly, data cataloging and governance help establish what the "right data" is in any given situation and who the "right people" are who should have access. As for data delivery at the "right time," automated data engineering tasks, workload balancing and elastic scaling should provide the needed responsiveness for all businesses.
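As a toy illustration of one of the integration styles mentioned above, the sketch below walks through a minimal, in-memory ETL step. The source records, field names and target store are all hypothetical; a real pipeline would use connectors to cloud warehouses, object stores and the like.

```python
# Minimal ETL sketch: extract raw records, transform (normalize) them,
# and load them into a keyed target store. All names are illustrative.

def extract(source):
    """Pull raw records from a source system (here: a list of dicts)."""
    return list(source)

def transform(records):
    """Normalize field names and types before loading (the 'T' in ETL)."""
    return [
        {"customer_id": int(r["id"]), "email": r["email"].strip().lower()}
        for r in records
    ]

def load(records, target):
    """Write cleaned records to the target store, keyed by customer_id."""
    for r in records:
        target[r["customer_id"]] = r
    return target

source = [{"id": "101", "email": " Alice@Example.com "},
          {"id": "102", "email": "bob@example.com"}]
warehouse = load(transform(extract(source)), {})
```

An ELT pipeline would reorder the same steps, loading raw records first and transforming them inside the target system.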
Data governance and privacy
Data privacy laws such as the GDPR in the EU, CCPA in California and PIPEDA in Canada have been enacted at the same time businesses are revitalizing efforts to establish data quality, not just data volume. The cost of ignoring these imperatives is staggering. Poor data quality costs organizations an average of $12.9 million each year and $1.2 billion in fines have been issued due to GDPR non-compliance since Jan. 28, 2021.
The governance and privacy component of the data fabric focuses on organization and automation. As discussed in the previous section, data virtualization and data cataloging help get the right data to the right people by making it easier to find and access the data that best fits their needs. Automated metadata generation is essential for turning a manual process into a better-controlled one: it helps avoid human error and tags data so that policy enforcement can happen at the point of access rather than in each individual repository. Automated governance of data access and lineage, alongside reporting and auditing, also contributes to a company culture that understands the regulatory landscape, adheres to it, and stays on top of how each piece of data has been used. The end result is more useful data with less hassle and better compliance. We are excited to announce that in June, we will release a new capability: MANTA Automated Data Lineage for IBM Cloud Pak for Data. This capability will provide data users with visibility into the origin, transformations, and destination of data as it is used to build products.
Customer 360
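Policy enforcement at the point of access can be pictured as a lookup against catalog tags. The sketch below is a simplified, hypothetical illustration – the tags, roles and catalog entries are invented for the example – not a description of any specific product's implementation.

```python
# Tag-based access policy sketch: columns carry metadata tags (normally
# produced by automated metadata generation), and policies map tags to
# the roles allowed to see them. All names here are hypothetical.

POLICIES = {"pii": {"data_steward", "compliance"}}  # tag -> allowed roles

CATALOG = {  # column -> metadata tags attached by the catalog
    "customers.email": {"pii"},
    "customers.country": set(),
}

def can_access(role, column):
    """Allow access unless a tag on the column restricts it to other roles."""
    for tag in CATALOG.get(column, set()):
        allowed = POLICIES.get(tag)
        if allowed is not None and role not in allowed:
            return False
    return True
```

Because the check runs at access time, adding a tag to a column in the catalog immediately changes who can see it, with no per-repository configuration.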
The global pandemic accelerated adoption of digital interactions with businesses out of necessity and highlighted for customers the benefits of a business that was attuned to their specific needs online, in person, and in hybrid situations (such as curbside pickup). As we settle back into a more normal day-to-day experience, the customer expectation of convenience and personalized service remains. High-performing organizations have realized this and list improving the customer experience as their top priority over the next two to three years.
The data fabric addresses this need with a series of capabilities designed to give a more complete, 360° view of each customer. Self-service data preparation tools provide a useful first step to get the data ready for matching across data sets. Customer attributes can then be automapped for a trainable, intelligent matching algorithm. Once matched, entity resolution helps ensure that identity records are of high quality and reveals relationships between entities. Data is then cataloged to enrich it with metadata, virtualized for access no matter where it resides, and visualized to make data quality and distribution easier to assess and to enable quicker transformation of data for analysis.
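A greatly simplified sketch of attribute matching follows, using Python's standard-library string similarity in place of the trainable matching algorithm described above. The records, attribute weights and threshold are illustrative only.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1] via stdlib SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_same_customer(rec_a, rec_b, threshold=0.85):
    """Match on a weighted blend of name and email similarity.
    Weights and threshold are arbitrary choices for this example."""
    score = (0.5 * similarity(rec_a["name"], rec_b["name"])
             + 0.5 * similarity(rec_a["email"], rec_b["email"]))
    return score >= threshold

# Hypothetical records for the same person from two source systems:
crm = {"name": "Jon Smith", "email": "jon.smith@example.com"}
web = {"name": "Jonathan Smith", "email": "jon.smith@example.com"}
other = {"name": "Mary Jones", "email": "mjones@example.org"}
```

In production, entity resolution would compare many more attributes, learn the weights from labeled pairs, and link matched records into a single resolved entity.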
MLOps and Trustworthy AI
As the public becomes more aware of how AI is used within organizations, greater scrutiny is being placed upon models. Any semblance of bias – particularly as it relates to race, gender or socioeconomic status – has the potential to erase years of goodwill. Yet, even beyond public optics and moral imperatives, being able to trust AI implementations and easily explain why models arrived at certain results leads to better business decisions.
The data fabric helps enable MLOps and trustworthy AI by establishing trust in data, trust in models and trust in processes. Trust in data is created with the help of many capabilities noted earlier that deliver high-quality data ready for self-service consumption by those who should have access. Trust in models relies upon MLOps-automated data science tools with built-in transparency and accountability at each stage of the model lifecycle. Finally, trust in processes through AI governance delivers consistent, repeatable processes that assist not only with model transparency and traceability but also with time to production and scalability.
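As one concrete illustration of the kind of check model transparency enables, the sketch below computes the disparate impact ratio – a common fairness metric under which a favorable-outcome rate below 80% of the reference group's rate (the "four-fifths rule") flags a model for review. The loan outcomes are hypothetical, and this is a toy check, not a description of any specific product capability.

```python
# Toy fairness check on hypothetical model outcomes: ratio of the lowest
# group's favorable-outcome rate to the highest group's rate.

def disparate_impact(outcomes, group_key):
    """Return min/max ratio of favorable-outcome rates across groups."""
    rates = {}
    for g in set(r[group_key] for r in outcomes):
        members = [r for r in outcomes if r[group_key] == g]
        rates[g] = sum(r["approved"] for r in members) / len(members)
    lo, hi = min(rates.values()), max(rates.values())
    return lo / hi if hi else 1.0

loans = [  # invented data: group "a" approved 3/4, group "b" approved 1/4
    {"group": "a", "approved": 1}, {"group": "a", "approved": 1},
    {"group": "a", "approved": 1}, {"group": "a", "approved": 0},
    {"group": "b", "approved": 1}, {"group": "b", "approved": 0},
    {"group": "b", "approved": 0}, {"group": "b", "approved": 0},
]
ratio = disparate_impact(loans, "group")  # 0.25 / 0.75
```

A ratio this far below 0.8 would trigger a review of the model under the four-fifths rule.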
If you are interested in any of these four use cases or the data fabric in general, reach out to your IBM representative or business partner or schedule time to speak with one of our experts. They would be happy to answer any questions you may have as you explore these topics further.
Gartner, "How to Improve Your Data Quality," July 14, 2021. https://www.gartner.com/smarterwithgartner/how-to-improve-your-data-quality