Don’t let your data pipeline slow to a trickle of low-quality data

Adopt a proactive approach to data reliability with data observability

3 minute read | July 6, 2022

Businesses of all sizes, in all industries, are facing a data quality problem: 73% of business executives are unhappy with data quality, and 61% of organizations are unable to harness data to create a sustained competitive advantage [1]. With the average cost of bad data reaching $15M, ignoring the problem is a significant pitfall.

To help companies avoid that pitfall, IBM has recently announced the acquisition of a leading provider of data observability solutions. Data observability takes traditional data operations to the next level: it uses historical trends to compute statistics about data workloads and data pipelines directly at the source, determines whether they are working, and pinpoints where any problems exist.

 The data observability difference 

With traditional approaches, data issues are reported by data users as they try to access and use the data, and may take weeks to fix, if they’re found at all. Data observability instead starts at the data source, collecting data pipeline metadata across key solutions in the modern data stack such as Airflow, dbt, Databricks and many more. It then builds historical baselines of data pipeline behavior so it can detect and alert on anomalies while the pipelines run. Automated workflows can then resolve those anomalies without impacting delivery SLAs.
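To make the idea of a historical baseline concrete, here is a minimal sketch in Python. It is not the product's implementation or API: the function name `is_anomalous`, the three-standard-deviation threshold, and the sample run durations are all assumptions chosen for illustration. It flags a pipeline run whose metric falls far outside what earlier runs have established as normal.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` deviates from the historical baseline
    by more than `threshold` sample standard deviations.

    Hypothetical helper for illustration only.
    """
    if len(history) < 2:
        # Not enough past runs to form a baseline, so raise no alert.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Every past run was identical: any difference is a deviation.
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


# Made-up run durations (in seconds) for one pipeline task.
durations = [118.0, 122.0, 120.0, 119.0, 121.0]

print(is_anomalous(durations, 121.5))  # a run close to the baseline
print(is_anomalous(durations, 300.0))  # a run far from the baseline
```

The same check could apply to any metric collected per run, such as row counts or null rates. A production system would likely also account for seasonality and trend instead of relying on a single global mean.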

Catching data quality problems at the source helps enable the delivery of more reliable data. Mean time to discovery (MTTD) improves because issues are detected in real time, as pipelines execute, instead of being reacted to afterward. Mean time to repair (MTTR) also improves because contextual metadata points data engineers to the source of the problem, so they spend less time working out where it stems from. In this way, monitoring both static and in-motion pipelines while delivering high-quality metadata enables a faster time to value than would otherwise be possible.
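As a rough illustration of how these two metrics are computed, the following Python sketch averages them over a small set of incidents. The incident timestamps are invented, the helper `mean_delta` is hypothetical, and MTTR is assumed here to run from detection to resolution (some teams measure it from the moment the issue occurred).

```python
from datetime import datetime, timedelta


def mean_delta(pairs):
    """Average of (end - start) over a list of (start, end) datetime pairs."""
    total = sum((end - start for start, end in pairs), timedelta())
    return total / len(pairs)


# Made-up incidents: (occurred, detected, resolved).
incidents = [
    (datetime(2022, 7, 1, 8, 0), datetime(2022, 7, 1, 8, 10), datetime(2022, 7, 1, 9, 10)),
    (datetime(2022, 7, 2, 14, 0), datetime(2022, 7, 2, 14, 30), datetime(2022, 7, 2, 15, 0)),
]

# MTTD: average time between an issue occurring and being detected.
mttd = mean_delta([(occurred, detected) for occurred, detected, _ in incidents])
# MTTR (assumed definition): average time between detection and resolution.
mttr = mean_delta([(detected, resolved) for _, detected, resolved in incidents])

print("MTTD:", mttd)
print("MTTR:", mttr)
```

Detecting issues while pipelines run shortens the occurred-to-detected interval, and better context shortens the detected-to-resolved interval, which is how each metric moves in this calculation.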

Data observability as part of a data fabric

Data observability will be available to IBM clients through our data fabric architecture. While the data observability capability may be used independently, we recommend pairing it with a more complete data fabric architecture to help automate the data lifecycle. In addition to data observability, IBM clients can take advantage of use cases such as multicloud data integration, data governance and privacy, customer 360, and MLOps and trustworthy AI. Data observability will also integrate with these other use cases, improving results where both are applied. For example, multicloud data integration benefits from the ability to resolve data anomalies in real time so that the delivery of reliable data isn’t interrupted, no matter where the data resides. Data governance and privacy is helped by having more reliable data under constant management instead of a snapshot approach. Customer 360 is improved when fewer data quality issues make their way to applications and skew customer views. And MLOps and trustworthy AI benefit from a more holistic look at the lifecycle, from data to the implementation of AI. The net result is a virtuous cycle in which each data fabric use case strengthens the others.

The acquisition expands IBM’s end-to-end enterprise observability capability. The acquired data observability technology will be a core component of the observability use case alongside IBM Observability by Instana APM and Watson® Studio on IBM Cloud Pak® for Data. Instana delivers end-to-end observability across applications, data and machine learning; the acquired technology provides data observability for dynamic and static pipelines; and Watson Studio provides model observability for reliable, trusted AI across its lifecycle. Together, they deliver an end-to-end enterprise observability and reliability solution.

Looking toward the future 

We’re pleased to be bringing data observability into our suite of data and AI solutions. Not only does it signify the continued evolution and improvement of our data fabric approach, but it also brings additional value to clients across the data lifecycle, from end to end.

You can learn more about IBM’s data fabric solution by visiting our website, starting a trial, or kick-starting your project with the IBM data and AI elite team.

 Check out the Data Differentiator to learn more about Data Fabric.