Data Fabric and the search for the single source of truth
About once a year, usually around this time and for unknown reasons, I find myself watching the movie National Treasure. I guess there’s just something about watching Nicolas Cage connect the dots to find hidden treasure that I can’t pass up when I see it on TV. I know I’m not alone in this; whether it’s Nic stealing the Declaration of Independence or Indiana Jones seeking out the Holy Grail, putting together clues to complete a quest is a time-honored tale. For decades, businesses, and Chief Data Officers once the position was created, have been seeking their own holy grail: a single source of truth for data that’s easily accessible, responsibly governed, works with current systems, integrates across a disparate data estate, and isn’t too costly. As they learn more about the data fabric, it appears to be the perfect protagonist. And, unsurprisingly, connecting data plays a crucial role in obtaining the treasure being sought.
The story so far
During Think 2021, IBM announced the launch of a modular and composable data fabric that enables dynamic and intelligent data orchestration across a distributed landscape, creating a network of instantly available information for data consumers. Its self-service consumption capabilities give users a complete view of their data, connecting it as one no matter where it resides or how siloed it had previously been. This optimized data access means businesses can immediately reduce the amount of data duplication and the number of migration processes required. Moreover, the data can be queried where it resides. As a result, businesses can get results faster and work with fresher data. The addition of Watson Query (formerly AutoSQL) provides access to databases, data warehouses, data lakes, and streaming data, and lets queries run without additional manual changes or data movement. And because all of the data is visible and accessible from a single point, automated data cataloging and the enforcement of data governance policies are much easier. Businesses no longer need to apply these crucial components across myriad individual silos.
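To make the idea of querying data where it resides a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an analogy and not Watson Query or any IBM API: two separate in-memory databases stand in for a warehouse and a data lake, and the names warehouse, lake, customers, orders, build_federated_connection, and total_spend_by_customer are all hypothetical. The point is that a single connection can join across both sources without first copying either table into the other.

```python
import sqlite3


def build_federated_connection() -> sqlite3.Connection:
    """Open one connection with two independent databases attached.

    The attached databases are stand-ins (assumptions for illustration)
    for separate systems such as a warehouse and a data lake.
    """
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a distinct, separate database.
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute("ATTACH DATABASE ':memory:' AS lake")

    conn.execute(
        "CREATE TABLE warehouse.customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE lake.orders (customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO warehouse.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO lake.orders VALUES (?, ?)",
        [(1, 20.0), (1, 5.0), (2, 12.5)],
    )
    conn.commit()
    return conn


def total_spend_by_customer(conn: sqlite3.Connection) -> list:
    """Join across both sources in place, with no intermediate copy."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM warehouse.customers AS c
        JOIN lake.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(total_spend_by_customer(build_federated_connection()))
```

A real data fabric works across different engines and locations, so treat this purely as a picture of the single-point-of-access idea rather than a description of how the product is implemented.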
Are you ready for the data fabric sequel?
Today, just over six months from that initial announcement, IBM is unveiling new data fabric capabilities that further connect data and make it readily available for use even in the most stringent regulatory environments. Foremost is the inclusion of distributed data processing. Clients can now execute cloud runtimes remotely using IBM Cloud Satellite, which means workloads can be executed wherever the data resides. Because runtimes execute in place, data movement needs are further reduced, helping clients save up to 47% by minimizing data egress costs, eliminate the need to use different tools on different workloads, and maintain data sovereignty by allowing data to remain in the geographic area where it was created. Co-locating the workload with the data also delivers a 195% performance improvement.
Advanced Data Privacy features are also being introduced into the data fabric. With this capability, in addition to dynamic masking of structured data, masking of unstructured data can now be applied automatically and consistently, replacing the typical manual process. Statically masked copies of structured or unstructured data can be sent to clients’ desired target data sources. This capability is particularly important for producing anonymized training data and creating test data sets. In other words, it provides one more way in which the data fabric allows businesses to take full advantage of their data while respecting their customers’ privacy and local regulations.
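For readers who want a feel for what consistent masking of structured and unstructured data can look like, here is a small Python sketch. It is not IBM's implementation: the function names (mask_value, mask_record, mask_text), the salt, and the email-only pattern are assumptions chosen for illustration, and a salted hash is used as a simple stand-in for whatever masking method a real product applies. The idea it shows is that the same sensitive value receives the same masked token whether it appears in a table column or inside free text.

```python
import hashlib
import re

# Hypothetical pattern: only email addresses are treated as sensitive here.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_value(value: str, salt: str = "demo-salt") -> str:
    """Return a deterministic token for a sensitive value.

    The salt is a placeholder; a real deployment would manage it securely.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "MASKED_" + digest[:8]


def mask_record(record: dict, sensitive_fields: set) -> dict:
    """Mask selected fields of a structured record, leaving others as-is."""
    return {
        key: mask_value(str(val)) if key in sensitive_fields else val
        for key, val in record.items()
    }


def mask_text(text: str) -> str:
    """Mask every email address found in unstructured text."""
    return EMAIL_PATTERN.sub(lambda m: mask_value(m.group(0)), text)


if __name__ == "__main__":
    row = {"name": "Ada", "email": "ada@example.com", "plan": "pro"}
    note = "Customer ada@example.com asked about upgrading."
    print(mask_record(row, {"email"}))
    print(mask_text(note))
```

Because the token is deterministic, masked copies can still be joined or used for testing and model training without exposing the original value. Hashing alone is not a guarantee of anonymity, so this should be read as a sketch of the consistency property only.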
Time for your close-up
While we believe in the value of a robust data infrastructure for every business, we also believe that each organization has unique challenges that differentiate their implementation from anyone else’s. While you’re considering how a data fabric can help you obtain the data-driven utopia you’ve been pursuing, let us help by sharing our expertise. You can schedule a call with one of our experts for free or learn more about the data fabric at your own pace with this helpful smart paper.
 Runtimes available on IBM Cloud and AWS, with Azure and GCP coming in 2022.
 Cost-savings and performance figures are based on internal testing.