
Data engineers are often inundated with alerts about data issues.

The last thing an engineer wants is to be woken up at night for a minor issue or, worse, to miss a critical one that requires immediate attention.

IBM® Databand® helps solve this problem by cutting through the noise with focused alerting and routing when data pipeline and data quality issues occur.

This blog walks you through how to set up alert notifications for data pipelines and datasets so your team can quickly detect and resolve incidents like missed data SLAs, column changes, null records and much more.

Data pipeline alerts

Data pipeline alerts cover issues such as failed runs, longer-than-expected durations and missing data operations.

Watch the video to see it in action, or continue reading for more information.

Create a pipeline alert

To create a data pipeline alert, select the “Create Alert” button and then select “Pipeline.”

Now, all you have to do is select the pipeline to which you want to assign an alert and pick the type of alert. We’ll walk through each of the alerts below.

Run state alert

A run state alert notifies you when a pipeline hits a certain status like running, success, failed, shutdown and canceled.

You then select a severity value that corresponds to the status and the criticality of the pipeline.

In the “Recent Values” section, you can see all the recent instances in which this pipeline produced different values. This helps you understand how often you or your team might get this alert once it’s activated.
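
To make the rule concrete, here’s a minimal sketch of how a run state alert could be evaluated. The Run class and the status-to-severity mapping are illustrative assumptions, not Databand’s actual API.

```python
from dataclasses import dataclass

# Assumed mapping from run status (the ones named above) to a severity level.
SEVERITY_BY_STATUS = {
    "failed": "critical",
    "shutdown": "high",
    "canceled": "medium",
    "success": "info",
}

@dataclass
class Run:
    pipeline: str
    status: str

def evaluate_run_state(run: Run, watched_statuses: set[str]) -> str | None:
    """Return a severity if the run's status matches the alert rule, else None."""
    if run.status in watched_statuses:
        return SEVERITY_BY_STATUS.get(run.status, "medium")
    return None

# Example: alert only on failed or shutdown runs.
severity = evaluate_run_state(Run("daily_ingest", "failed"), {"failed", "shutdown"})
if severity:
    print(f"Trigger {severity} alert for daily_ingest")  # -> critical
```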

Run duration alert

Next is a run duration alert based on a metric value measured in seconds.

For example, if you’re expecting the pipeline to complete within a certain time frame, you can set an alert to trigger if it’s outside that time window.

Anomaly detection might be the coolest run duration option because you might not know how long the pipeline should take to execute.

By selecting anomaly detection, Databand creates a baseline of run durations and tells you when the run deviates from what’s expected.

You can adjust the sensitivity level so Databand knows how readily the alert should trigger when it’s live in production. The lookback range gives you more granularity by specifying how many runs back you want Databand to look at to build the anomaly detection baseline.
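
For intuition, here’s a rough sketch of how a lookback-and-sensitivity baseline could work: take the last N run durations and flag a new run that falls outside mean ± sensitivity × standard deviation. Databand’s actual anomaly model is internal; this only illustrates what the two knobs control.

```python
from statistics import mean, stdev

def is_duration_anomaly(durations_s: list[float], new_duration_s: float,
                        lookback: int = 20, sensitivity: float = 3.0) -> bool:
    """Flag a run whose duration deviates too far from the recent baseline."""
    window = durations_s[-lookback:]          # only look this many runs back
    if len(window) < 2:
        return False                          # not enough history for a baseline
    baseline, spread = mean(window), stdev(window)
    return abs(new_duration_s - baseline) > sensitivity * spread

history = [310, 295, 320, 305, 300, 315, 298, 307]  # recent durations, seconds
print(is_duration_anomaly(history, 900))  # True: far outside the baseline
print(is_duration_anomaly(history, 312))  # False: within the expected band
```

Lowering the sensitivity or shortening the lookback makes the alert touchier; raising them makes it tolerate more variation.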

Missing data operations alerts

Missing data operations alerts tell you when an operation your pipeline depends on doesn’t run.

In this example, four datasets relate to this pipeline, meaning that the tasks in the pipeline depend on reading from and writing to these datasets. The alert shows you which read or write operations didn’t happen.
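
Conceptually, the check boils down to a set difference between the operations a pipeline is expected to perform and the ones a run actually reported. A minimal sketch, with illustrative dataset names:

```python
def missing_operations(expected: set[tuple[str, str]],
                       observed: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Each operation is a (dataset, direction) pair, e.g. ('orders', 'read')."""
    return expected - observed

# Hypothetical pipeline with two reads and two writes expected per run.
expected_ops = {("orders", "read"), ("customers", "read"),
                ("daily_report", "write"), ("audit_log", "write")}
observed_ops = {("orders", "read"), ("customers", "read"),
                ("daily_report", "write")}

for dataset, direction in missing_operations(expected_ops, observed_ops):
    print(f"Alert: expected {direction} on '{dataset}' never happened")
```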

Schema change alerts

Schema change alerts notify you about any changes in the schema (column type change, new or removed columns, and more).
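
Under the hood, a schema change check amounts to diffing two column maps. Here’s a minimal sketch, assuming schemas are simple name-to-type mappings (real warehouse schemas carry more metadata):

```python
def schema_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List new columns, removed columns, and column type changes."""
    changes = []
    for col in new.keys() - old.keys():
        changes.append(f"new column: {col}")
    for col in old.keys() - new.keys():
        changes.append(f"removed column: {col}")
    for col in old.keys() & new.keys():
        if old[col] != new[col]:
            changes.append(f"type change on {col}: {old[col]} -> {new[col]}")
    return changes

yesterday = {"id": "int", "amount": "float", "created_at": "timestamp"}
today = {"id": "int", "amount": "string", "updated_at": "timestamp"}
print(schema_changes(yesterday, today))
# ['new column: updated_at', 'removed column: created_at',
#  'type change on amount: float -> string']
```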

Dataset alerts

Starting off, you have two options for dataset alerts. 

  • Data delay: Alerts that let you know if the data arrived on time and as expected. 
  • Data quality check: Alerts to check the quality of each dataset.

We’re going to go through both options so you can see which ones you would likely use with Databand.

Watch the video to see it in action or continue reading for detailed information.

Data delay alerts

Let’s look at data delay alerts first. 

This type of SLA alert triggers when a dataset isn’t updated on time.

For example, if you expect this dataset to be updated each day at 3 PM Eastern, you can create an alert that will tell you if the dataset wasn’t updated. 

And you can apply this alert to one or multiple datasets.

You can further customize the alert by isolating it to a certain pipeline. If you want to receive all alerts regardless of the pipeline, you can just leave this field blank. Select your alert severity to complete the setup.
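
For the 3 PM Eastern example above, the delay check itself could look like the following sketch, which uses Python’s standard zoneinfo module. The function name and rule details are illustrative, not Databand’s implementation.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")

def sla_breached(last_updated: datetime, deadline: time, now: datetime) -> bool:
    """True if today's deadline has passed and no update has landed today."""
    local_now = now.astimezone(EASTERN)
    today_deadline = local_now.replace(hour=deadline.hour, minute=deadline.minute,
                                       second=0, microsecond=0)
    start_of_day = local_now.replace(hour=0, minute=0, second=0, microsecond=0)
    return (local_now >= today_deadline
            and last_updated.astimezone(EASTERN) < start_of_day)

now = datetime(2024, 5, 1, 15, 30, tzinfo=EASTERN)
last_update = datetime(2024, 4, 30, 14, 50, tzinfo=EASTERN)
print(sla_breached(last_update, time(15, 0), now))  # True: no update landed today
```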

Data quality alerts

Here, you’re setting up alerts on the quality of the data columns within the dataset. You first define where in the pipeline Databand checks the data while it’s being processed.

Then apply the validations to the dataset columns. For example, a popular validation is to select one or more columns that you want Databand to check for null percentages or counts.

This way, you’ll know immediately if null records are about to be sent downstream to a data consumer.
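
As an illustration, here’s what a null-percentage validation looks like in plain pandas; the column names and the 5% threshold are placeholder assumptions.

```python
import pandas as pd

def null_violations(df: pd.DataFrame, columns: list[str],
                    max_null_pct: float = 5.0) -> dict[str, float]:
    """Return the columns whose null percentage exceeds the threshold."""
    violations = {}
    for col in columns:
        pct = df[col].isna().mean() * 100  # fraction of nulls -> percentage
        if pct > max_null_pct:
            violations[col] = round(pct, 2)
    return violations

df = pd.DataFrame({"customer_id": [1, 2, None, 4],
                   "email": [None, None, None, "a@b.com"]})
print(null_violations(df, ["customer_id", "email"]))
# {'customer_id': 25.0, 'email': 75.0}
```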

Connect to a receiver

The last page for both alert types lets you hook them up to a receiver like Slack, PagerDuty or email. This helps data engineers focus only on the alerts that pertain to them.
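
As a sketch of the receiver side, the snippet below routes high-severity alerts to a Slack incoming webhook (Slack webhooks do accept a JSON body with a "text" field) and only logs everything else. The webhook URL and the routing rule are placeholders, not Databand configuration.

```python
import json
import urllib.request

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder URL

def route_alert(severity: str, message: str) -> None:
    """Send critical/high alerts to the on-call Slack channel; log the rest."""
    if severity in ("critical", "high"):
        body = json.dumps({"text": f"[{severity.upper()}] {message}"}).encode()
        req = urllib.request.Request(SLACK_WEBHOOK, data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)  # POST the alert to Slack
    else:
        print(f"logged only: [{severity}] {message}")

route_alert("critical", "daily_ingest failed: missing write on 'daily_report'")
```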

In this example, we have details about the things data engineers care about, such as:

  • Missing operations
  • Schema changes
  • Alert trigger time
  • Run name
  • Source
  • Affected datasets

Implementing data SLA alerting

See how IBM Databand provides data pipeline monitoring to quickly detect data incidents like failed jobs and runs so you can handle pipeline growth. If you’re ready to take a deeper look, book a demo today.
