Dataset logging

By logging your datasets with Databand, you can capture metadata about the operations your code or pipeline is performing. As a result, you get key insights into your data and the success or failure of your dataset operations.

Data in motion

With Databand, you can monitor your data in motion by making minor changes to your existing code or by setting up monitors for your ETL tool. In most cases, integrating our SDK requires only a few extra lines of code. Logged dataset operations are accessible through the Data Interactions tab of the Run overview page for a pipeline. Additionally, you can use the Datasets page to view historical operations for each of your datasets.

Logged metadata

The metadata captured as part of logged datasets can include:

Dataset path

The URI of the logged dataset.

Operation type

Whether the operation was a read or a write.

Schema

The column names and data types of your dataset.

Operation volume

The number of records that were read or written as part of the operation.

Data preview

Sample rows taken from the head of the dataset.

Column statistics

Aggregated metrics for each column in the logged dataset.
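The following Python sketch computes each of the fields above for a small in-memory dataset. It is meant to show what the fields contain. It is not Databand's SDK: `OperationType`, `DatasetOperationMetadata`, `describe_operation`, and the `s3://example-bucket/orders.csv` path are hypothetical names chosen for illustration. The real SDK might infer schemas, select previews, and compute statistics differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import mean
from typing import Any, Dict, List


class OperationType(Enum):
    # Hypothetical enum mirroring the read/write distinction described above.
    READ = "read"
    WRITE = "write"


@dataclass
class DatasetOperationMetadata:
    # Hypothetical record; field names are illustrative, not Databand's schema.
    path: str                                   # Dataset path (URI)
    op_type: OperationType                      # Operation type
    schema: Dict[str, str]                      # Column name -> type name
    volume: int                                 # Number of records read or written
    preview: List[Dict[str, Any]]               # Rows from the head of the dataset
    column_stats: Dict[str, Dict[str, float]] = field(default_factory=dict)


def describe_operation(path: str, op_type: OperationType,
                       rows: List[Dict[str, Any]],
                       preview_size: int = 2) -> DatasetOperationMetadata:
    """Collect dataset-operation metadata from a list of dict rows (sketch)."""
    columns = list(rows[0].keys()) if rows else []

    # Schema: use the Python type of the first non-None value in each column.
    schema = {}
    for col in columns:
        values = [r[col] for r in rows if r.get(col) is not None]
        schema[col] = type(values[0]).__name__ if values else "unknown"

    # Column statistics: computed only for numeric columns (bools excluded).
    stats = {}
    for col in columns:
        nums = [r[col] for r in rows
                if isinstance(r.get(col), (int, float))
                and not isinstance(r.get(col), bool)]
        if nums:
            stats[col] = {"min": min(nums), "max": max(nums), "mean": mean(nums)}

    return DatasetOperationMetadata(
        path=path,
        op_type=op_type,
        schema=schema,
        volume=len(rows),
        preview=rows[:preview_size],
        column_stats=stats,
    )


# Usage with a hypothetical three-row dataset.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10.0},
    {"id": 2, "city": "Lima", "amount": 20.0},
    {"id": 3, "city": None, "amount": 30.0},
]
meta = describe_operation("s3://example-bucket/orders.csv", OperationType.READ, rows)
print(meta.volume)  # 3
print(meta.schema)  # {'id': 'int', 'city': 'str', 'amount': 'float'}
```

This sketch restricts column statistics to numeric columns to keep it short. Richer statistics, such as null counts or distinct counts, would be straightforward extensions.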

Instructions and examples

The dataset path, operation type, schema, and operation volume are captured in most cases of dataset logging. Advanced metadata (for example, column statistics) is not available for all integrations. Review the following pages for information about what metadata is collected for each integration type:

ETL inputs and outputs

Dataframe objects

Cursor functions in Python

Data at rest