Collected metadata
You can configure which metadata Databand tracks through the SDK
configuration, the dbnd_config Airflow connection, or the Databand UI.
The following table lists the pipeline and dataset metadata that
Databand collects and explains how you can control what is
collected.
| Metadata | Default | dbnd config | Airflow integration via UI |
|---|---|---|---|
| Source code | disabled | [tracking] track_source_code = true | Select or clear the Include source code box in the Airflow integration wizard |
| Logs | disabled | [log] preview_head_bytes = 8192 preview_tail_bytes = 8192 | Select or clear the Collect logs box in the Airflow integration wizard, provide number of KB from head and tail if logs were enabled (maximum 8096 KB for each) |
| Errors | enabled | ask the Databand team to switch it on or off | ask the Databand team to switch it on or off |
| Airflow XCOM values | disabled | [airflow_tracking] track_xcom_values = true | not supported in the UI; can be done through dbnd_config: "airflow_tracking": { "track_xcom_values": true } |
| return value of Airflow Python task | disabled | [airflow_tracking] track_airflow_execute_result = true | not supported in the UI; can be done through dbnd_config: "airflow_tracking": { "track_airflow_execute_result": true } |
Note: You can track data operations metadata manually by using
log_metric and log_dataset_op.
Example of enabling code and logs tracking
To enable source code and log tracking by the Databand service, provide the following configuration:
[tracking]
track_source_code=True
[log]
preview_head_bytes=15360
preview_tail_bytes=15360
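
The byte values in this example correspond to 15 KB of log preview from each end of the log (15 × 1024 = 15360). As a minimal sketch, the same configuration text can be rendered from kilobyte values with Python's standard configparser module. The function name render_tracking_config and its parameters are hypothetical and are not part of the Databand SDK; only the section and key names come from the configuration above.

```python
import configparser
import io

BYTES_PER_KB = 1024


def render_tracking_config(head_kb=15, tail_kb=15, track_source_code=True):
    """Render the [tracking] and [log] sections as configuration text.

    Hypothetical helper for illustration; not a Databand API.
    """
    parser = configparser.ConfigParser()
    parser["tracking"] = {"track_source_code": str(track_source_code)}
    parser["log"] = {
        # The preview sizes are expressed in bytes, so convert from KB.
        "preview_head_bytes": str(head_kb * BYTES_PER_KB),
        "preview_tail_bytes": str(tail_kb * BYTES_PER_KB),
    }
    buffer = io.StringIO()
    # Write key=value without spaces, matching the example configuration.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_tracking_config())
```

Where the rendered text is saved depends on how your dbnd configuration files are set up, which this sketch does not assume.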
For the Airflow tracker, you can edit the Airflow tracking configuration. For more information, see Editing an Airflow integration.
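
The two settings that the table lists as not supported in the UI (XCom values and the return value of an Airflow Python task) are enabled through dbnd_config. The following sketch builds the "airflow_tracking" JSON fragment shown in the table by using only the Python standard library. The function name build_airflow_tracking_fragment is hypothetical, and how the fragment is merged into the rest of your dbnd_config connection settings is an assumption left to your environment.

```python
import json


def build_airflow_tracking_fragment(track_xcom_values=True,
                                    track_airflow_execute_result=True):
    """Return the airflow_tracking fragment for dbnd_config as JSON text.

    Hypothetical helper for illustration; the key names are taken
    from the table above.
    """
    fragment = {
        "airflow_tracking": {
            "track_xcom_values": track_xcom_values,
            "track_airflow_execute_result": track_airflow_execute_result,
        }
    }
    # json.dumps converts Python booleans to JSON true/false.
    return json.dumps(fragment, indent=2)


if __name__ == "__main__":
    print(build_airflow_tracking_fragment())
```

Generating the fragment with json.dumps rather than typing it by hand avoids the common mistake of writing Python-style True instead of JSON true.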