Data set logging

By logging your data sets with Databand, you can capture metadata about the operations your code or pipeline performs. As a result, you gain insight into your data and can see whether each data set operation succeeded or failed.

Data in motion

With Databand, you can monitor your data in motion by making minor changes to your existing code or by setting up monitors for your ETL tool. In most cases, integrating our SDK requires only a few extra lines of code. Logged data set operations are accessible through the Data Interactions tab of the Run Overview page for a pipeline. Additionally, you can use the Data sets page to view historical operations for each of your data sets.

Logged metadata

The metadata captured as part of logged data sets can include:

Data set path

The URI that is associated with the logged data set.

Operation type

Whether the operation was a read or a write.

Schema

The column names and data types of your data set.

Operation volume

The number of records that were read or written as part of the operation.

Data preview

A sample of rows taken from the head of the data set.

Column statistics

Aggregated metrics for each column in the logged data set.
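To make the fields listed above more concrete, the following is a minimal, standard-library-only sketch of the kind of record that one logged operation might produce. It is an illustration under stated assumptions, not the Databand SDK: the helper `describe_operation`, its parameters, and the field names in the returned dictionary are all hypothetical, and the column statistics shown (null count, distinct count, and min/max/mean for numeric columns) are an assumed subset chosen for the example.

```python
from statistics import mean


def describe_operation(path, op_type, rows, preview_size=3):
    """Build a metadata record for one read or write of tabular data.

    Hypothetical helper for illustration only: it is not part of the
    Databand SDK, and the record's field names are assumptions.
    `rows` is a list of dicts that all share the same keys.
    """
    if op_type not in ("read", "write"):
        raise ValueError("op_type must be 'read' or 'write'")

    columns = list(rows[0].keys()) if rows else []
    schema = {}
    column_stats = {}
    for col in columns:
        # Ignore missing values when inferring types and computing stats.
        values = [row[col] for row in rows if row.get(col) is not None]
        # Infer the column type from the first non-null value.
        schema[col] = type(values[0]).__name__ if values else "unknown"
        stats = {
            "null_count": len(rows) - len(values),
            "distinct_count": len(set(values)),
        }
        # Numeric aggregates only make sense for int/float columns (not bool).
        if values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        ):
            stats.update(min=min(values), max=max(values), mean=mean(values))
        column_stats[col] = stats

    return {
        "path": path,                   # data set path (URI)
        "op_type": op_type,             # operation type: read or write
        "schema": schema,               # column names and inferred types
        "volume": len(rows),            # number of records in the operation
        "preview": rows[:preview_size], # sample rows from the head
        "column_stats": column_stats,   # per-column aggregates
    }


# Usage with a small in-memory data set (the path is a made-up example).
orders = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 80.5},
    {"order_id": 3, "customer": "acme", "amount": None},
]
record = describe_operation("s3://example-bucket/orders.csv", "read", orders)
print(record["volume"])  # 3
print(record["schema"])  # {'order_id': 'int', 'customer': 'str', 'amount': 'float'}
```

Computing the schema and volume is cheap, while column statistics require a pass over every value. That difference is one plausible reason why, as noted below, advanced metadata such as column statistics is not available for every integration.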

Instructions and examples

The data set path, operation type, schema, and operation volume are captured in most cases of data set logging. Advanced metadata (for example, column statistics) is not available for all integrations. Review the following pages for information about what metadata is collected for each integration type:

ETL inputs and outputs

Dataframe objects

Cursor functions in Python

Data at rest