Data engineers are often inundated with alerts about data issues.
The last thing an engineer wants is to be woken up at night for a minor issue, or worse, to miss a critical one that requires immediate attention.
IBM® Databand® helps fix this problem by breaking through noisy alerts with focused alerting and routing when data pipeline and quality issues occur.
This blog will walk you through how to set up alert notifications for data pipelines and datasets so that your team can detect and quickly resolve incidents like missed data SLAs, column changes, null records and much more.
Data pipeline alerts
Data pipeline alerts include errors such as failed runs, longer than expected durations, or missing data operations.
Watch the video to see it in action, or continue reading for more information.
Create a pipeline alert
To create a data pipeline alert, select the “Create Alert” button and then select pipeline.
Now, all you have to do is select the pipeline to which you want to assign an alert and pick the type of alert. We’ll walk through each of the alerts below.
Run state alert
A run state alert notifies you when a pipeline hits a certain status like running, success, failed, shutdown and canceled.
You select the severity value corresponding to the pipeline’s status and criticality.
In the “Recent Values” section, you can see all the recent instances when this pipeline produced different values. This helps you understand how often you or your team might get this alert once it’s activated.
Run duration alert
Next is a run duration alert based on a metric value measured in seconds.
For example, if you’re expecting the pipeline to complete within a certain time frame, you can set an alert to trigger if it’s outside that time window.
Anomaly detection might be the coolest run duration metric because you might not know how long this pipeline should take to run.
By selecting anomaly detection, Databand creates a baseline of run durations and tells you when the run deviates from what’s expected.
You can adjust the sensitivity levels so Databand knows how sensitive the alert trigger should be once it’s live in production. The lookback range gives you more granularity by specifying how many past runs Databand should look at to build the anomaly detection baseline.
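Conceptually, this kind of check can be pictured as a simple deviation test over a lookback window of past durations. The sketch below is illustrative only, not Databand's actual anomaly detection algorithm; the function name and the three-standard-deviation default are assumptions for the example.

```python
from statistics import mean, stdev

def is_duration_anomaly(history, latest, sensitivity=3.0):
    """Flag `latest` (run duration in seconds) as anomalous if it deviates
    from the baseline of past durations by more than `sensitivity`
    standard deviations. `history` plays the role of the lookback range."""
    if len(history) < 2:
        return False  # not enough data to build a baseline
    baseline, spread = mean(history), stdev(history)
    return abs(latest - baseline) > sensitivity * spread

# Typical runs take around 300 seconds; a 900-second run stands out.
print(is_duration_anomaly([290, 310, 305, 295, 300], 900))  # True
```

Lowering `sensitivity` makes the trigger fire on smaller deviations, which is the trade-off the sensitivity setting controls in the UI.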
Missing data operations alerts
Missing data operations alerts tell you when a data operation your pipeline depends on doesn’t occur.
In this case, four datasets relate to this pipeline, meaning that the tasks in this pipeline depend on reading and writing from these datasets. The alert will show you which operations didn’t read or write.
Schema change alerts
Schema change alerts notify you about any changes in the schema (column type change, new or removed columns, and more).
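To make the idea concrete, a schema comparison boils down to diffing two column-to-type mappings. This is a minimal sketch of that logic, not Databand's implementation; the function and the example schemas are hypothetical.

```python
def schema_changes(old, new):
    """Compare two schemas ({column: type}) and report added columns,
    removed columns, and columns whose type changed."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "type_changed": sorted(
            c for c in set(old) & set(new) if old[c] != new[c]
        ),
    }

old = {"id": "int", "email": "str", "age": "int"}
new = {"id": "int", "email": "str", "age": "float", "signup_date": "date"}
print(schema_changes(old, new))
# {'added': ['signup_date'], 'removed': [], 'type_changed': ['age']}
```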
Dataset alerts
Starting off, you have two options for dataset alerts.
Data delay: Alerts that let you know if the data arrived on time and as expected.
Data quality check: Alerts to check the quality of each dataset.
We’re going to go through both options so you can see which ones you would likely use with Databand.
Watch the video to see it in action or continue reading for detailed information.
Data delay alerts
Let’s look at data delay alerts first.
This type of SLA alert triggers when a dataset isn’t updated on time.
For example, if you expect this dataset to be updated each day at 3 PM Eastern, you can create an alert that will tell you if the dataset wasn’t updated.
And you can apply this alert to one or multiple datasets.
You can further customize the alert by isolating it to a certain pipeline. If you want to receive all alerts regardless of the pipeline, you can just leave this field blank. Select your alert severity to complete the setup.
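The underlying check is simple: once the expected update time has passed, compare it against the dataset's last write. This sketch is illustrative only (the function name and timestamps are made up for the example), not Databand's API.

```python
from datetime import datetime

def is_dataset_late(last_updated, deadline, now):
    """Return True if the dataset missed its expected update deadline.
    `deadline` is the expected update time for the day (e.g. 3 PM),
    `last_updated` is the dataset's most recent write."""
    return now >= deadline and last_updated < deadline

deadline = datetime(2024, 5, 1, 15, 0)       # expected by 3 PM
last_updated = datetime(2024, 5, 1, 9, 30)   # last write was 9:30 AM
now = datetime(2024, 5, 1, 15, 5)
print(is_dataset_late(last_updated, deadline, now))  # True
```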
Data quality alerts
Here, you set up alerts on the quality of the data columns within a dataset. You first define where Databand checks the data while it’s being processed.
Then apply the validations to the dataset columns. For example, a popular validation would be to select one or multiple columns that you want Databand to check for null percentages or counts.
This way, you’ll know immediately if null records are about to be sent downstream to a data consumer.
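A null-percentage validation like the one described above can be sketched in a few lines. This is an illustrative stand-in for the column check, not Databand's code; the data and function name are hypothetical.

```python
def null_percentage(rows, column):
    """Percentage of null (None) values in `column` across `rows`,
    where `rows` is a list of dicts (one per record)."""
    values = [row.get(column) for row in rows]
    nulls = sum(v is None for v in values)
    return 100.0 * nulls / len(values)

rows = [
    {"user_id": 1, "email": "a@x.com"},
    {"user_id": 2, "email": None},
    {"user_id": 3, "email": None},
    {"user_id": 4, "email": "d@x.com"},
]
print(null_percentage(rows, "email"))  # 50.0
```

An alert would then fire when this percentage crosses the threshold you set for the column.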
Connect to a receiver
On the last page of both alert setups, you hook the alert up to a receiver like Slack, PagerDuty or email. This helps data engineers focus only on the alerts that pertain to them.
In this example, we have details about the things data engineers care about, such as:
Missing operations
Schema changes
Alert trigger time
Run name
Source
Affected datasets
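Routing alerts by severity is what keeps the noise down: critical issues page someone, while low-severity ones wait for working hours. The routing table below is purely illustrative (the receiver names and severity levels are assumptions), showing the kind of mapping a receiver setup expresses.

```python
# Hypothetical routing table: which receiver handles which severity.
ROUTES = {
    "critical": "pagerduty",  # wake someone up immediately
    "high": "slack",          # post to the on-call channel
    "medium": "slack",
    "low": "email",           # review during working hours
}

def route_alert(alert):
    """Pick a receiver for an alert based on its severity, so only
    actionable alerts interrupt the team."""
    return ROUTES.get(alert["severity"], "email")

print(route_alert({"name": "schema change on orders", "severity": "critical"}))
# pagerduty
```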
Implementing data SLA alerting
See how IBM Databand provides data pipeline monitoring to quickly detect data incidents like failed jobs and runs so you can handle pipeline growth. If you’re ready to take a deeper look, book a demo today.