Observe your data with Data Observability

With IBM Data Observability, you can observe your data, set up alerts and notifications for any problems that occur, and use AI-powered issue remediation to troubleshoot the issues with your data. You can investigate issues in data quality and access.

Use Data Observability to:

  • Make sure that your job runs behave as expected.
  • Detect and resolve issues with your data before they lead to Service Level Agreement (SLA) misses.
  • Identify problems in data quality.
  • Quickly assess your observability coverage, including the number of detected issues, triggered alerts, open alerts, and their trends over the selected time frame.

With Data Observability, you can quickly identify common data issues, investigate their likely root causes, observe alerted incidents, and detect emerging problems that may not yet be covered by alerts. Additionally, you can use AI to troubleshoot the issues with your failed DataStage jobs and get recommendations on how to solve them. You can also configure custom alerts to notify your team in real time, equipping them with the insights and tools that are needed to quickly diagnose and resolve issues.

In addition, you can create and assign alert receivers to your alerts. Alert receivers are notification endpoints, such as email or Slack, to which alert payloads are sent when an alert is triggered. Alert receivers are paired with alert definitions so that each triggered alert goes directly to the individuals or teams who need to see it, without involving others in your organization.
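The pairing of alert definitions and alert receivers can be pictured with a minimal Python sketch. This is an illustration only and not the product's API: the `AlertReceiver` and `AlertDefinition` classes, the `route_alert` function, the payload fields, and the sample names and addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AlertReceiver:
    """Hypothetical notification endpoint, for example an email address or a Slack channel."""
    kind: str    # "email" or "slack" (illustrative values)
    target: str  # address or channel name


@dataclass
class AlertDefinition:
    """Hypothetical alert definition with the receivers that are assigned to it."""
    name: str
    receivers: list[AlertReceiver] = field(default_factory=list)


def route_alert(definition: AlertDefinition, payload: dict) -> list[tuple[AlertReceiver, dict]]:
    """Return one (receiver, payload) pair for each receiver assigned to the definition."""
    return [(receiver, payload) for receiver in definition.receivers]


# Two definitions, each paired with its own receiver (sample data, not real endpoints).
duration_alert = AlertDefinition(
    name="Nightly load duration",
    receivers=[AlertReceiver("email", "data-eng@example.com")],
)
export_alert = AlertDefinition(
    name="Weekly export failure",
    receivers=[AlertReceiver("slack", "#exports")],
)

# Only the receivers assigned to the triggered definition get the payload.
deliveries = route_alert(duration_alert, {"alert": duration_alert.name, "status": "Triggered"})
for receiver, payload in deliveries:
    print(f"{receiver.kind} -> {receiver.target}: {payload}")
```

In this sketch, the Slack channel that is assigned to the second definition receives nothing when the first definition triggers, which mirrors the idea of notifying only the people who need to see an alert.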

Requirements

The following requirements exist for Data Observability:

Cloud platforms
Required service
IBM watsonx.data integration
Supported tools
You can use Data Observability to observe DataStage jobs.
Data size
Data Observability works with data of any size.
Required permissions and roles
Your role and permissions determine which data observability tasks you can perform. Working with alerts and working with alert receivers require different sets of roles and permissions.
Workspaces
You can create alerts for DataStage jobs in projects.

Sample flow: Observing your data with predefined alerts

The following graphic shows the relationship between alerts, alert receivers, and jobs. In this example, the alert is a Job run duration alert with a static value condition.

Fig. 1 Observing your data - a sample flow with predefined alerts.

A graphic with 9 boxes that describe the flow of an alert through the watsonx.data integration system. The boxes contain the following content: 1: Create an alert. 2: Create and assign an alert receiver. 3: Run a DataStage job. 4: The job run falls outside the defined limit. 5: The system detects the failed job and triggers an alert. 6: An alert is sent to the assigned alert receiver. 7: Acknowledge the triggered alert. 8: Go to the data source and fix the issue that caused the alert. 9: Mark the alert Resolved in the Data Observability UI.

A data observability flow might have the following tasks:

  1. Create an alert definition.
    In the context of a project, a data engineer creates an alert definition within Data Observability for the DataStage job. They decide to create a Job run duration alert definition, which is used to track and report any anomalies in the job run duration based on the set condition. In this example, an alert is triggered when the job run duration differs from the defined value.
  2. Create and assign an alert receiver.
    A data engineer creates and assigns an email alert receiver to the created alert definition to get an email notification with the details of the triggered alert.
  3. Run a DataStage job in a project.
    For an alert to be triggered, a data engineer first needs to run a DataStage job in a project.
  4. The job run falls outside the defined limit.
  5. The system detects the time anomaly and triggers an alert.
    The system detects a difference between the job's actual run time and the expected time. As a result, an alert is triggered.
  6. The alert is sent to the assigned alert receiver.
    Because the data engineer assigned an email alert receiver to the created alert definition, they receive an email with the triggered alert and its details.
  7. Acknowledge the triggered alert.
    A data engineer opens the Triggered alerts tab to display triggered alerts and their logs. The data engineer can also open the job run details section and, from there, go directly to the DataStage project where the issue was found. To let other users know that the issue is being worked on, the data engineer can mark the alert Acknowledged.
  8. Go to the data source and fix the issue that caused the alert.
    From the Triggered alerts tab, a data engineer can go directly to the DataStage canvas with the job where the issue was found.
  9. Mark the alert Resolved.
    When the problem is solved, a data engineer can mark the alert Resolved to let others know that the issue is fixed.
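The duration check at the center of this flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the `DurationCondition` class, the `evaluate_run` function, the operator names, the record fields, and the 600-second threshold are all hypothetical, and the comparisons that the product actually offers might differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DurationCondition:
    """Hypothetical static value condition for a job run duration alert."""
    operator: str             # "greater_than" or "less_than" (illustrative names)
    threshold_seconds: float  # the static value that runs are compared against

    def is_violated(self, run_duration_seconds: float) -> bool:
        """Return True when the run duration falls outside the defined limit."""
        if self.operator == "greater_than":
            return run_duration_seconds > self.threshold_seconds
        if self.operator == "less_than":
            return run_duration_seconds < self.threshold_seconds
        raise ValueError(f"Unsupported operator: {self.operator}")


def evaluate_run(condition: DurationCondition, job_name: str,
                 run_duration_seconds: float) -> Optional[dict]:
    """Return a triggered-alert record if the condition is violated, otherwise None."""
    if not condition.is_violated(run_duration_seconds):
        return None
    return {
        "job": job_name,
        "duration_seconds": run_duration_seconds,
        "threshold_seconds": condition.threshold_seconds,
        "status": "Triggered",
    }


# Assumed limit: alert when a run takes longer than 10 minutes.
condition = DurationCondition(operator="greater_than", threshold_seconds=600)
on_time = evaluate_run(condition, "nightly_load", 540)   # within the limit, no alert
too_slow = evaluate_run(condition, "nightly_load", 900)  # outside the limit, alert record
print(on_time)
print(too_slow)
```

Keeping the condition separate from the evaluation function reflects the split in the flow: the alert definition is created once, and every later job run is checked against it.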

Sample flow: Observing your data with alerts based on the issue detection powered by AI

The following graphic shows the relationship between detected issues, alerts, alert receivers, and jobs.

Fig. 2 Observing your data - a sample flow based on issue detection.

A graphic with 13 boxes that describe the flow of an alert through the watsonx.data integration system. The boxes contain the following content: 1: Run DataStage job. 2: AI identifies an issue with your data. 3: Investigate the issue. 4: Create an alert definition. 5: Create and assign an alert receiver. 6: An alert is sent to the assigned alert receiver. 7: Run a DataStage job. 8: The job fails. 9: The system detects the failed job and triggers an alert. 10: An alert is sent to the assigned alert receiver. 11: Acknowledge the triggered alert. 12: Go to the data source and fix the issue that caused the alert. 13: Mark the alert Resolved in the Data Observability UI.

A data observability flow might have the following tasks:

  1. Run a DataStage job.
  2. The AI identifies an issue with your data.
    The AI detects a problem in one of your completed DataStage jobs and provides key information about the issue, including a description of the problem, the time it occurred, the specific job affected, the frequency of the occurrence, and the datasets that were impacted.
  3. Investigate the issue.
    The data engineer initiates an AI-powered analysis to pinpoint the likely root cause of the problem. After reviewing the logs to inform their decision, they determine the best course of action. The engineer decides to set up an alert and configures a Slack alert receiver to be notified on a designated Slack channel when the issue happens again.
  4. Create an alert definition.
    A data engineer creates an alert definition within Data Observability for the DataStage job to assess the scale of the problem. Because the alert definition is based on the issue that is detected by the system, the condition for the alert is automatically set to trigger the alert when the job fails.
  5. Create and assign an alert receiver.
    The data engineer creates and assigns a Slack alert receiver to get notified in the Slack channel when an alert is triggered.
  6. The alert is sent to the assigned alert receiver.
  7. Run a DataStage job in a project.
    A data engineer runs the DataStage job in the project.
  8. The job fails.
    The system detects the failed job and triggers an alert.
  9. The alert is sent to the assigned alert receiver.
    Because the data engineer assigned a Slack alert receiver to the created alert definition, they receive a notification about the triggered alert in the Slack channel.
  10. Acknowledge the triggered alert.
    A data engineer opens the Triggered alerts tab to display triggered alerts and their logs. The data engineer can also open the job run details section and, from there, go directly to the DataStage project where the issue was found. To let other users know that the issue is being worked on, the data engineer can mark the alert Acknowledged.
  11. Go to the data source and fix the issue that caused the alert.
    From the Triggered alerts tab, a data engineer can go directly to the DataStage canvas with the job where the issue was found.
  12. Mark the alert Resolved.
    When the problem is solved, a data engineer can mark the alert Resolved to let others know that the issue is fixed.
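The second half of this flow, from a failed job run to a resolved alert, can be sketched as a small state progression in Python. This is an illustration only: the `AlertStatus` enum, the `check_job_run` and `advance` functions, the `"failed"` status string, and the strict Triggered, Acknowledged, Resolved order are assumptions drawn from the steps above, not a description of how the product stores or enforces alert states.

```python
from enum import Enum
from typing import Optional


class AlertStatus(Enum):
    TRIGGERED = "Triggered"
    ACKNOWLEDGED = "Acknowledged"
    RESOLVED = "Resolved"


# Assumed order of states, following the steps in the sample flow.
NEXT_STATUS = {
    AlertStatus.TRIGGERED: AlertStatus.ACKNOWLEDGED,
    AlertStatus.ACKNOWLEDGED: AlertStatus.RESOLVED,
}


def check_job_run(job_name: str, run_status: str) -> Optional[dict]:
    """Return a triggered-alert record when the run failed, otherwise None."""
    if run_status != "failed":  # "failed" is an assumed status string
        return None
    return {"job": job_name, "status": AlertStatus.TRIGGERED}


def advance(alert: dict) -> dict:
    """Return a copy of the alert moved to the next state in the assumed order."""
    current = alert["status"]
    if current not in NEXT_STATUS:
        raise ValueError(f"Alert is already {current.value}")
    return {**alert, "status": NEXT_STATUS[current]}


alert = check_job_run("customer_merge", "failed")  # the job fails, an alert is triggered
acknowledged = advance(alert)                      # the engineer marks it Acknowledged
resolved = advance(acknowledged)                   # the issue is fixed, marked Resolved
print(alert["status"].value, "->", acknowledged["status"].value, "->", resolved["status"].value)
```

Returning a new record from `advance` instead of changing the alert in place keeps each stage of the flow available for inspection, which is convenient in a sketch but is a design choice of this example only.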
