Dataset logging
By logging your datasets with Databand, you can capture metadata about the operations your code or pipeline is performing. As a result, you get key insights into your data and the success or failure of your dataset operations.
Data in motion
With Databand, you can monitor your data in motion by making minor changes to your existing code or by setting up monitors for your ETL tool. In most cases, integrating the SDK takes only a few extra lines of code. Logged dataset operations are accessible through the Data Interactions tab of a pipeline's Run overview page. You can also use the Datasets page to view the historical operations for each of your datasets.
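One way to picture the integration pattern is to wrap each read or write in a small context that records metadata when the operation finishes, whether it succeeds or fails. The following Python sketch is a hypothetical stand-in and is not Databand's SDK API. `track_dataset_op`, `DatasetOp`, and `LOGGED_OPS` are illustrative names. A real integration would send the record to Databand instead of appending it to an in-memory list.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class DatasetOp:
    """Hypothetical record of one logged dataset operation."""
    path: str                      # URI of the dataset
    op_type: str                   # "read" or "write"
    records: Optional[int] = None  # operation volume, if reported
    success: bool = False
    error: Optional[str] = None


# Hypothetical in-memory sink; a real SDK would report to a server instead.
LOGGED_OPS: List[DatasetOp] = []


class _OpHandle:
    def __init__(self, op: DatasetOp) -> None:
        self._op = op

    def set(self, data: list) -> None:
        # Record the operation volume as the number of rows handled.
        self._op.records = len(data)


@contextmanager
def track_dataset_op(path: str, op_type: str) -> Iterator[_OpHandle]:
    if op_type not in ("read", "write"):
        raise ValueError("op_type must be 'read' or 'write'")
    op = DatasetOp(path=path, op_type=op_type)
    try:
        yield _OpHandle(op)
        op.success = True          # the wrapped block finished without error
    except Exception as exc:
        op.error = repr(exc)       # keep the failure reason, then re-raise
        raise
    finally:
        LOGGED_OPS.append(op)      # log the operation in both cases


# Usage: only the `with` line and `op.set(...)` are added to existing code.
rows = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]
with track_dataset_op("s3://bucket/orders.csv", "write") as op:
    # ... existing code that writes `rows` to storage goes here ...
    op.set(data=rows)
```

The context-manager shape keeps the added code to a couple of lines and still captures failed operations, because the `finally` clause runs even when the wrapped block raises.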
Logged metadata
The metadata captured as part of logged datasets can include:
- Dataset path: The URI associated with the logged dataset.
- Operation type: Whether the operation was a read or a write.
- Schema: The column names and data types of your dataset.
- Operation volume: The number of records that were read or written as part of the operation.
- Data preview: Rows taken from the head of the dataset to provide sample data.
- Column statistics: Aggregated metrics for each column in the logged dataset.
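To make these fields concrete, the following Python sketch derives the schema, operation volume, data preview, and simple column statistics from an in-memory list of rows. It illustrates the kind of metadata described above and is not how Databand computes it. The function name `describe_dataset`, the `preview_rows` parameter, and the chosen statistics (null counts, plus min, max, and mean for numeric columns) are assumptions for illustration.

```python
from statistics import mean
from typing import Any, Dict, List


def describe_dataset(rows: List[Dict[str, Any]], preview_rows: int = 3) -> Dict[str, Any]:
    """Hypothetical helper: summarize rows the way dataset logging might."""
    # Union of column names across all rows, in first-seen order.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    schema: Dict[str, str] = {}
    stats: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        # Assumption: a column's type is taken from its first non-null value.
        schema[col] = type(values[0]).__name__ if values else "unknown"
        col_stats: Dict[str, Any] = {"non_null": len(values), "null": len(rows) - len(values)}
        numeric = values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            col_stats.update(min=min(values), max=max(values), mean=mean(values))
        stats[col] = col_stats
    return {
        "schema": schema,
        "records": len(rows),          # operation volume
        "preview": rows[:preview_rows],  # rows from the head of the dataset
        "column_stats": stats,
    }


rows = [
    {"id": 1, "city": "Oslo", "amount": 9.5},
    {"id": 2, "city": "Lima", "amount": None},
    {"id": 3, "city": "Pune", "amount": 3.5},
]
meta = describe_dataset(rows, preview_rows=2)
```

Here `meta["schema"]` is `{"id": "int", "city": "str", "amount": "float"}`, and the `amount` column reports one null value alongside its min, max, and mean over the two non-null values.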
Instructions and examples
The dataset path, operation type, schema, and operation volume are captured in most cases of dataset logging. Advanced metadata, such as column statistics, is not available for all integrations. See the following pages for information about what metadata is collected for each integration type:
- ETL inputs and outputs
- Dataframe objects
- Cursor functions in Python
- Data at rest