Creating alerts
You can create alerts directly on the Data Observability page. Because alerts are tied to specific assets within their respective projects or spaces, you don't need to create a separate project.
Required permissions
For more information about permissions, see Data Observability roles and permissions.
Creating alerts - selecting assets and conditions
The overall process of creating an alert is the same for all alert types, but individual steps vary depending on the condition that triggers the alert. Note that alerts are only available for jobs that have a completed run.
Selecting assets
You can create alerts for both job runs and any metrics that are created by DataStage.
Selecting workspaces and jobs
To create an alert for a job, follow these steps:
- Select a workspace that contains the jobs you want to create an alert for. You can select multiple workspaces.
- From the selected workspace, choose the job for which you want to create an alert. You can select multiple jobs.
Selecting metrics
To create an alert for a specific metric, first choose the job that the metric belongs to, and then select the metric.
Defining a condition that triggers an alert
The condition is directly linked to the selected type of alert.
State alerts
A state alert is triggered when a job's state changes to the state defined in the alert condition. You can select one of the following states: Running, Success, Failed, or Canceled.
Example use case
If you're monitoring a critical ETL pipeline, you might want to get an alert the moment it fails so you can jump in and troubleshoot before it affects downstream processes.
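The matching logic behind a state alert can be sketched in Python as follows. This is an illustration only, not the product's API: `JobState` and `should_trigger_state_alert` are hypothetical names, and the sketch assumes that an alert fires only when a job transitions into the configured state.

```python
from enum import Enum


class JobState(Enum):
    # The four selectable states named in this section.
    RUNNING = "Running"
    SUCCESS = "Success"
    FAILED = "Failed"
    CANCELED = "Canceled"


def should_trigger_state_alert(previous: JobState,
                               current: JobState,
                               target: JobState) -> bool:
    """Return True when the job has just changed into the target state.

    Assumption: a job that was already in the target state does not
    trigger the alert again.
    """
    return current is target and previous is not target
```

For the ETL pipeline example, the condition would use `JobState.FAILED` as the target, so a change from Running to Failed matches and a change from Running to Success does not.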
Threshold-type alerts
Threshold-type alerts are triggered based on static values or dynamic ranges. For this type of alert, you can create duration and metric alerts. The difference between duration and metric alerts is the metric that you track. For duration alerts, you observe how long a job run takes to complete. For metric alerts, you can observe any other metric on an operation, such as:
- Rows written
- Rows read
- Average throughput
- Elapsed time
When to use threshold-type alerts based on static values
Use fixed value alerts when you want to trigger an alert based on a specific, static value. This type of alert is useful in situations where you need to monitor a metric that must not exceed or fall below a certain threshold.
Example use case
Monitoring system uptime: Set a fixed value alert to trigger when a system is down for more than 5 minutes.
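A static-value check can be sketched in Python as follows. The function name `breaches_static_threshold` and its `direction` parameter are hypothetical and are not part of the product. The sketch assumes strict comparisons, which matches the "more than 5 minutes" wording above.

```python
def breaches_static_threshold(value: float,
                              threshold: float,
                              direction: str = "above") -> bool:
    """Return True when value crosses a fixed threshold.

    direction="above" flags values strictly greater than the threshold;
    direction="below" flags values strictly less than it.
    """
    if direction == "above":
        return value > threshold
    if direction == "below":
        return value < threshold
    raise ValueError("direction must be 'above' or 'below'")
```

In the uptime example, a downtime of 7 minutes checked against a threshold of 5 breaches the threshold, while a downtime of exactly 5 minutes does not.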
When to use threshold-type alerts based on dynamic ranges
Use threshold-type alerts with percent deviation to monitor changes in a metric's value relative to its baseline or expected value. This type of alert is useful when you want to detect anomalies or unusual patterns in your data.
Example use cases
- Detecting sudden spikes or drops in a metric's value
- Identifying trends or shifts in a metric's behavior
- Monitoring changes in a metric's value over time
How to define the percent deviation for the selected metric
- Set a baseline ratio value: Establish a standard value for your metric.
- Specify the acceptable percentage variation: Define the percentage range within which the actual value can deviate from the baseline.
- Calculate the acceptable range: The acceptable range is the baseline value plus or minus the specified percentage of the baseline. Values within this range will not trigger an alert.
Example
- Baseline ratio: 6 (for example, job run duration in minutes)
- Percent deviation: 30%
- Acceptable range: 4.2 - 7.8 (6 ± 1.8, where 1.8 is 30% of 6)
- Alerts will be triggered for values outside this range (less than 4.2 or greater than 7.8)
Note that the units of the baseline ratio match the units of the selected metric. For instance, if the metric is duration in minutes, the baseline ratio will also be in minutes.
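The percent deviation calculation described above can be sketched in Python as follows. The function names `acceptable_range` and `is_outside_range` are hypothetical and only illustrate the arithmetic. They do not represent how the product evaluates alerts internally.

```python
from typing import Tuple


def acceptable_range(baseline: float,
                     percent_deviation: float) -> Tuple[float, float]:
    """Return (low, high) as baseline plus or minus a percentage of baseline."""
    delta = abs(baseline) * percent_deviation / 100.0
    return baseline - delta, baseline + delta


def is_outside_range(value: float,
                     baseline: float,
                     percent_deviation: float) -> bool:
    """Return True when value falls outside the acceptable range.

    Values exactly on a boundary are treated as inside the range
    (an assumption based on the "less than" / "greater than" wording).
    """
    low, high = acceptable_range(baseline, percent_deviation)
    return value < low or value > high
```

With a baseline of 6 minutes and a deviation of 30%, a 5-minute run falls inside the range and would not trigger an alert, while a 4-minute or 8-minute run falls outside it.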