Data set logging
By logging your data sets with Databand, you can capture metadata about the operations your code or pipeline is performing. As a result, you get key insights into your data and the success or failure of your data set operations.
Data in motion
With Databand, you can monitor your data in motion through minor changes to your existing code or by setting up monitors for your ETL tool. In most cases, only a few extra lines of code are required to integrate our SDK. Logged data set operations are accessible through the Data Interactions tab of the Run Overview page for a pipeline. Additionally, you can use the Data sets page to view historical operations for each of your data sets.
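As a rough illustration of how few lines an SDK integration can take, the sketch below logs a single read operation. It assumes the `dbnd` package and its `log_dataset_op` helper; the bucket path and sample rows are hypothetical, and exact parameter names may differ between SDK versions.

```python
# Minimal sketch of SDK-based data set logging (assumes the dbnd package).
logged_ops = []  # records calls when the local stub fallback is used

try:
    from dbnd import log_dataset_op  # real SDK call, if installed
except ImportError:
    # Fallback stub so the sketch runs without the SDK installed.
    def log_dataset_op(op_path, op_type, **kwargs):
        logged_ops.append((op_path, op_type))

# Two rows standing in for data read from the (hypothetical) source below.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# One extra call reports the operation: the data set path (URI), the
# operation type ("read" or "write"), and optional schema/preview metadata.
log_dataset_op(
    "s3://my-bucket/customers.csv",  # hypothetical data set path
    "read",
    data=rows,
    with_schema=True,
    with_preview=True,
)
```

The rest of your pipeline code stays unchanged; Databand captures the reported operation and surfaces it in the pages described above.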
Logged metadata
The metadata captured as part of logged data sets can include:
- Data set path: The URI that is associated with the logged data set.
- Operation type: Whether the operation was a read or a write.
- Schema: The column names and data types of your data set.
- Operation volume: The number of records that were read or written as part of the operation.
- Data preview: Rows taken from the head of the data set to provide sample data.
- Column statistics: Aggregated metrics for each column in the logged data set.
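To make these fields concrete, the plain-Python sketch below computes each kind of metadata for a small in-memory data set. This is an illustration of the concepts only, not Databand code; the sample rows are hypothetical.

```python
# Illustration: computing the metadata fields described above for a small
# in-memory data set represented as a list of dicts.
rows = [
    {"id": 1, "amount": 10.5},
    {"id": 2, "amount": 7.0},
    {"id": 3, "amount": 12.25},
]

# Schema: column names and data types, inferred here from the first row.
schema = {col: type(val).__name__ for col, val in rows[0].items()}

# Operation volume: the number of records involved in the operation.
volume = len(rows)

# Data preview: rows taken from the head of the data set.
preview = rows[:2]

# Column statistics: simple per-column aggregates for numeric columns.
stats = {
    col: {"min": min(r[col] for r in rows), "max": max(r[col] for r in rows)}
    for col, t in schema.items()
    if t in ("int", "float")
}

print(schema)  # {'id': 'int', 'amount': 'float'}
print(volume)  # 3
```

A real integration collects the equivalent information automatically from your dataframes or ETL tool, so you rarely compute it by hand.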
Instructions and examples
The data set path, operation type, schema, and operation volume are captured by most data set logging integrations. Advanced metadata (for example, column statistics) is not available for all integrations. Review the following pages for information about what metadata is collected for each integration type:
ETL inputs and outputs
Dataframe objects
- Python
- Java/Scala
- Spark and PySpark