Data lineage is a hot topic for data platform teams. In this blog post, we’ll walk you through how IBM® Databand® provides automated data lineage so you can diagnose pipeline failures and analyze downstream impacts.

Watch the video to see Databand in action or continue reading for detailed information.

Analyze alerts

Using automated data lineage typically begins with an alert. You can jump right into a lineage graph, but it’s important to first know why the graph is relevant.

For example, on the Databand alert screen, you can see all data incidents and their alerts in a consolidated view.

This particular alert is a critical alert that fired in our “daily_sales_ingestion” pipeline, a vital business pipeline that processes our daily sales from SAP, applies transformations for different regions, and then sends the results to a BI layer.

Needless to say, this pipeline is critical for our business since it processes sales from around the world and eventually shows the results to the business.

To diagnose the alert, select ‘View Details’ to open the alert overview screen.

Understand impacted datasets

Before seeing the lineage graph, you can see the impact analysis across your affected datasets, pipelines and operations.

View data lineage

Once you’ve seen what has been impacted, you can now visualize these impacts by selecting the data lineage tab. This graph shows all the dependent relationships between the initial pipeline that failed and any other dependencies that are impacted.

For example, we’re looking at tasks that are writing to a particular dataset and that same dataset being read by a subsequent task. All the red text in each pipeline represents anything that was impacted by the initial failed task.
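The write-then-read relationship described above can be modeled as edges in a directed graph. The following is a minimal Python sketch of that idea, not Databand's implementation: the operation records, the `build_lineage_edges` helper, and the task name `load_na_sales` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

NA_EXTRACT = "S3 – North America Daily SAP Sales Extract"

# Hypothetical operation records: (task, operation, dataset).
# Illustrative only; this is not Databand's data model.
OPERATIONS = [
    ("extract_regional_sales_to_S3", "write", NA_EXTRACT),
    ("load_na_sales", "read", NA_EXTRACT),  # hypothetical reader task
]


def build_lineage_edges(operations):
    """Return a mapping of node -> set of directly downstream nodes.

    A write adds an edge task -> dataset; a read adds dataset -> task.
    """
    edges = defaultdict(set)
    for task, operation, dataset in operations:
        if operation == "write":
            edges[task].add(dataset)
        elif operation == "read":
            edges[dataset].add(task)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return dict(edges)


lineage = build_lineage_edges(OPERATIONS)
```

Here `lineage` links the writing task to the dataset and the dataset to the task that reads it, which is the kind of dependency a lineage graph draws as connected boxes.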

Let’s zoom in on the specific pipeline that failed. Here you can see that the task named “extract_regional_sales_to_S3” caused the pipeline to fail.

By selecting the failed task, you can see which specific downstream datasets or tasks are impacted; each one is highlighted with a red box.

Each time you select a different task or dataset, the graph updates which boxes are highlighted.

For example, if you select the dataset named “S3 – North America Daily SAP Sales Extract”, a lot of red text still remains, but the red boxes have changed.

This indicates that the “S3 – North America Daily SAP Sales Extract” dataset only impacts the highlighted red boxes downstream.

You’ll notice that no downstream pipeline in the EU or Asia depends on this dataset, but two pipelines do: the North America pipeline labeled “na_sentiment_impact_analysis” and the “serve_sales_results_to_bi” pipeline that serves our BI layer.
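Conceptually, highlighting what a selected node impacts amounts to finding everything reachable from it in the lineage graph. The sketch below illustrates this with a breadth-first search in Python. It is an assumption-laden illustration rather than how Databand computes impact: the graph edges, the `downstream_of` helper, and the EU node names marked "hypothetical" are invented for the example, while the other names come from the walkthrough.

```python
from collections import deque

NA_EXTRACT = "S3 – North America Daily SAP Sales Extract"
EU_EXTRACT = "EU extract (hypothetical)"

# Hypothetical lineage graph: node -> directly downstream nodes.
# The edges are assumed for illustration only.
LINEAGE = {
    "extract_regional_sales_to_S3": [NA_EXTRACT, EU_EXTRACT],
    NA_EXTRACT: ["na_sentiment_impact_analysis", "serve_sales_results_to_bi"],
    EU_EXTRACT: ["eu_pipeline (hypothetical)"],
}


def downstream_of(graph, start):
    """Return every node reachable from start, using breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


impacted_by_task = downstream_of(LINEAGE, "extract_regional_sales_to_S3")
impacted_by_dataset = downstream_of(LINEAGE, NA_EXTRACT)
```

In this toy graph, selecting the failed task reaches every downstream node, while selecting the North America dataset reaches only the two pipelines that depend on it, mirroring how the red boxes narrow when you change the selection.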

Quickly debug data incidents

To make debugging easier, you can jump directly to a task from the data lineage graph and see the error that caused the pipeline to fail.

This allows you to quickly debug errors and resolve them before any downstream impacts occur.

Achieving automated data lineage

See how Databand helps break communication silos and get the whole data story with end-to-end data lineage. If you’re ready to take a deeper look, book a demo today.
