Data sets

The Data sets page lists every data set that Databand monitors. Data sets let Databand evaluate the quality of your data at multiple points in the data lifecycle. Because Databand also evaluates intermediate data sets, it can detect problems earlier, before your stakeholders receive bad data.

Filters

Data sets are grouped into Active and Archived. To filter them, click Add filters. The following filters are available:

Data set type

A list of data set types for which you want to display data sets.

Reported from

A list of integrations for which you want to display a data set.

Databases

A list of databases for which you want to display a data set.

You can also search for data sets by using the search box.

Data sets list

The list shows the name and the total number of records of each data set. It also shows the data set path and the time of the last modification and of the last operation performed on the data set.

Data set details

Click the name of a data set to display more details, such as its type, its path, when it was first and last synced, and the number of rows in it.

Additionally, the page displays the following tabs:

Overview

The tab displays the trends for both:

  • Daily rows (both written and read)
  • Daily data operations

The graphs present the trend over a specified time frame. As a result, you can check transaction sizes while the preliminary data is being ingested and transformed, instead of looking at the data only after it has reached its final destination. In this way, you can spot and resolve problems during the data set lifecycle, not only after the lifecycle is completed.

To display more detailed information for a specific date, hover your cursor over the points on the graph.
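
To illustrate what the two trend graphs summarize, the following minimal Python sketch groups a few operation records by day, sums the rows read and written, and counts the operations. The Operation record, its field names, and the sample values are hypothetical and are not part of Databand; the product performs its own aggregation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Operation:
    """Hypothetical record of one data set operation (not a Databand type)."""
    day: date
    op_type: str  # expected to be "read" or "write"
    rows: int


def daily_rows(operations):
    """Sum the rows per day, split into rows read and rows written."""
    totals = defaultdict(lambda: {"read": 0, "write": 0})
    for op in operations:
        totals[op.day][op.op_type] += op.rows
    # Sort by day so the result reads like a trend line.
    return dict(sorted(totals.items()))


def daily_operations(operations):
    """Count how many operations were performed on each day."""
    counts = Counter(op.day for op in operations)
    return dict(sorted(counts.items()))


# Made-up sample data: the write volume drops sharply on the second day.
SAMPLE_OPERATIONS = [
    Operation(date(2024, 1, 1), "write", 1000),
    Operation(date(2024, 1, 1), "read", 400),
    Operation(date(2024, 1, 2), "write", 50),
]

if __name__ == "__main__":
    print(daily_rows(SAMPLE_OPERATIONS))
    print(daily_operations(SAMPLE_OPERATIONS))
```

In this made-up sample, the rows written fall from 1000 to 50 between two consecutive days, which is the kind of change that a daily trend makes visible before the data reaches its final destination.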

In the Overview tab, you can also check the Issue summary table. It displays the problems that were discovered for a selected data set, if any.

To filter the data on this page, use the Seen during field, which offers predefined and custom periods.

History

The tab shows the historical data operations, which you can filter by period, operation type, and issue type. For more information, go to Data sets history.

Operations

In this tab, you can display column statistics on a per-operation basis. To do so:

  1. From the Operation dropdown, select the operation that you want to review.
  2. In the Operations schema table, click a column name in the Column name column. The system displays the historical data quality metrics for that column over the last 15 runs.

Column statistics are available only when you log dataframes by using the Databand Python or Java SDK.
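
To make the idea of per-column metrics across runs more concrete, the following minimal Python sketch computes a few simple statistics for one column in each run and keeps only the most recent 15 runs. It uses only the Python standard library and does not call the Databand SDK. The function names, the list-of-dictionaries row format, and the choice of metrics are assumptions made for illustration and might differ from the metrics that Databand collects.

```python
from statistics import mean

MAX_RUNS = 15  # matches the 15-run window described for the Operations tab


def column_stats(rows, column):
    """Compute simple statistics for one column of a list-of-dicts table."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    # Treat only int and float values as numeric; bool is excluded on purpose.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "mean": mean(numeric) if numeric else None,
    }


def stats_history(runs, column, max_runs=MAX_RUNS):
    """Return per-run statistics for the most recent runs, oldest first.

    `runs` is assumed to be ordered from the oldest run to the newest.
    """
    return [column_stats(rows, column) for rows in runs[-max_runs:]]


if __name__ == "__main__":
    # Made-up data: 20 runs of a two-row table with one missing price.
    example_runs = [[{"price": 10.0}, {"price": None}] for _ in range(20)]
    for stats in stats_history(example_runs, "price"):
        print(stats)
```

A rising null count or a shifting mean from one run to the next is the kind of change that a per-column history helps you notice.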