Collected metadata

You can configure which metadata Databand tracks through the SDK configuration, the dbnd_config Airflow connection, or the Databand UI. The following table lists the pipeline and dataset metadata that Databand collects and explains how you can control what is collected.

Table 1. Metadata that Databand collects, and how the user can control it.
| Metadata | Default | dbnd config | Airflow integration via UI |
| --- | --- | --- | --- |
| Source code | disabled | [tracking] track_source_code = true | Select or clear the Include source code box in the Airflow integration wizard. |
| Logs | disabled | [log] preview_head_bytes = 8192 and preview_tail_bytes = 8192 | Select or clear the Collect logs box in the Airflow integration wizard. If logs are enabled, provide the number of KB to collect from the head and tail (maximum 8096 KB for each). |
| Errors | enabled | Ask the Databand team to switch it on or off. | Ask the Databand team to switch it on or off. |
| Airflow XCom values | disabled | [airflow_tracking] track_xcom_values = true | Not supported in the UI; can be done through dbnd_config: "airflow_tracking": { "track_xcom_values": true } |
| Return value of Airflow Python task | disabled | [airflow_tracking] track_airflow_execute_result = true | Not supported in the UI; can be done through dbnd_config: "airflow_tracking": { "track_airflow_execute_result": true } |
Note: You can track data operations metadata manually by using log_metric and log_dataset_op.
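
The two dbnd_config fragments in the table can be combined into a single "airflow_tracking" object. The following is a minimal sketch, assuming the dbnd_config connection stores its settings as one JSON object, as the fragments suggest. The variable names are illustrative only, and the result should be merged into your existing connection settings rather than replacing them.

```python
import json

# Keys are taken from Table 1. The enclosing JSON layout of the dbnd_config
# connection is an assumption; merge this fragment into your existing settings.
airflow_tracking_overrides = {
    "airflow_tracking": {
        "track_xcom_values": True,             # track Airflow XCom values
        "track_airflow_execute_result": True,  # track Python task return values
    }
}

# Serialize to JSON so Python's True becomes the JSON literal true.
dbnd_config_fragment = json.dumps(airflow_tracking_overrides, indent=2)
print(dbnd_config_fragment)
```

Building the fragment in Python and serializing it avoids hand-typing the JSON booleans, which must be lowercase true rather than Python's True.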

Example of enabling code and log tracking

To enable source code and log tracking by the Databand service, provide the following configuration:

[tracking]
track_source_code=True

[log]
preview_head_bytes=15360
preview_tail_bytes=15360
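
The log preview sizes are given in bytes, while the Airflow integration wizard asks for KB. As a quick sanity check, the following sketch parses the example with Python's standard configparser module and converts the sizes to KB. It does not call Databand; it only confirms that the text is well-formed INI, and the EXAMPLE_CONFIG name is illustrative.

```python
import configparser

# The example configuration from this section, inlined as a string.
EXAMPLE_CONFIG = """
[tracking]
track_source_code=True

[log]
preview_head_bytes=15360
preview_tail_bytes=15360
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONFIG)

# getboolean accepts true/false in any letter case.
source_code_enabled = parser.getboolean("tracking", "track_source_code")

# Convert the byte counts to KB (1 KB = 1024 bytes).
head_kb = parser.getint("log", "preview_head_bytes") / 1024
tail_kb = parser.getint("log", "preview_tail_bytes") / 1024

print(f"source code tracking: {source_code_enabled}")
print(f"log preview: first {head_kb:g} KB, last {tail_kb:g} KB")
```

With these values, each preview is 15 KB (15360 / 1024).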

For the Airflow tracker, you can edit the Airflow tracking configuration. For more information, see Editing an Airflow integration.