Python

Enable Python tracking in your Databand environment to gain visibility into your data operations, code errors, metrics, and logging information within the context of your broader pipeline or orchestration system.

Enabling Python tracking

Before you enable Python tracking, make sure that the Python SDK is installed and integrated with Databand (see Installing the Python SDK) and that you have the necessary credentials to connect (see Connecting to Databand service).

  1. To enable tracking, call your Python code within the dbnd_tracking() context, as shown in the following code snippet.
  2. Optional: For better visibility, annotate your functions with the @task decorator.
from dbnd import dbnd_tracking, log_metric, task, log_dataframe
import pandas

@task
def calculate_counts(str_param, int_param, dataframe_input: pandas.DataFrame):
    ...  # your business logic
    log_dataframe("my_dataframe", dataframe_input)
    log_metric("counts", dataframe_input.count())

if __name__ == "__main__":
    # Run the tracked function inside the dbnd_tracking() context
    with dbnd_tracking():
        calculate_counts("example", 1, pandas.DataFrame({"a": [1, 2, 3]}))

For certain objects that are passed to your functions, such as Pandas DataFrames and Spark DataFrames, Databand automatically collects schema information and stats. This automation makes it simpler to track data lineage and report on data quality issues.
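The kind of schema and stats information that Databand collects can be illustrated with plain pandas. This is a minimal sketch of what such metadata looks like; the exact fields and names that Databand reports may differ:

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 2, 3], "amount": [10.5, 20.0, None]})

# Schema information: column names and dtypes, similar to what
# log_dataframe() captures automatically for Pandas DataFrames.
schema = {col: str(dtype) for col, dtype in df.dtypes.items()}
print(schema)  # {'user_id': 'int64', 'amount': 'float64'}

# Basic stats: row count and per-column null counts, the kind of
# signal that helps surface data quality issues such as missing values.
print(len(df))                    # 3
print(int(df["amount"].isna().sum()))  # 1
```

Because this metadata is collected automatically for supported objects, you get schema and quality signals without writing extra logging code.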

Enabling implicit tracking

If you don't want to change your code by adding the dbnd_tracking() context, but you still want to use the log_metric() and log_dataframe() functions, you can enable implicit tracking instead, see Tracking Python functions.
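Implicit tracking is typically driven by configuration rather than code changes. The following sketch uses environment variables following dbnd's DBND__&lt;section&gt;__&lt;key&gt; convention; the exact variable names and values are assumptions to verify against your SDK version:

```shell
# Point the SDK at your Databand environment (URL and token are placeholders)
export DBND__CORE__DATABAND_URL="https://your-databand-instance"
export DBND__CORE__DATABAND_ACCESS_TOKEN="<access token>"

# Enable tracking without wrapping your code in dbnd_tracking()
export DBND__TRACKING="True"
```

With this configuration in place, calls to log_metric() and log_dataframe() in your functions are reported to Databand without any dbnd_tracking() wrapper.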

If you have Airflow DAG tracking enabled (see Apache Airflow), all Airflow operator code is tracked automatically. As a result, you don't need to enable tracking for your Python code, because it is enabled automatically by airflow-auto-tracking.

Tracking metrics and datasets

When you enable Python tracking, you can also track: