Controlling tracked DAGs

By default, all Airflow DAGs are synced. To sync only some of them, provide an explicit list of DAG IDs to monitor: in the Apache Airflow integration configuration wizard, enter the DAG IDs as a comma-separated list.
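As a rough illustration of what a comma-separated filter means, the following sketch parses such a list and keeps only the matching DAG IDs. The helper names parse_dag_ids and select_dags and the sample DAG IDs are hypothetical. The wizard applies the filter for you, and this is not Databand's implementation.

```python
def parse_dag_ids(raw):
    """Split a comma-separated string into a set of DAG IDs, ignoring blanks."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def select_dags(all_dag_ids, raw_filter):
    """Return the DAG IDs to sync: all of them if the filter is empty."""
    wanted = parse_dag_ids(raw_filter)
    if not wanted:
        return list(all_dag_ids)
    return [dag_id for dag_id in all_dag_ids if dag_id in wanted]


# Hypothetical DAG IDs, for illustration only.
dags = ["etl_daily", "ml_training", "reporting"]
synced = select_dags(dags, "etl_daily, reporting")
```

In this sketch, an empty filter keeps every DAG, which mirrors the default behavior of syncing all DAGs.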

If you do not want to track specific DAGs, operators, or functions, you can exclude them from automatic tracking by calling the dont_track function on them:

  • dont_track(dag)
  • dont_track(operator)

Alternatively, you can use the @dont_track decorator, as shown in the following example:

from dbnd import dont_track

# Exclude this function from automatic tracking.
@dont_track
def f():
    pass
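
The function-call form and the decorator form are interchangeable because a Python decorator is only shorthand for passing the function to the decorator and rebinding the name. The sketch below shows this with a stand-in named mark_untracked, which is hypothetical and sets a made-up attribute. It does not reproduce how dbnd implements dont_track.

```python
def mark_untracked(obj):
    """Hypothetical stand-in for dont_track: flag the object and return it unchanged."""
    obj._skip_tracking = True  # made-up attribute name, for illustration only
    return obj


@mark_untracked
def f():
    pass


def g():
    pass


# Equivalent to decorating g with @mark_untracked.
g = mark_untracked(g)
```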

Tracking specific DAGs

If you don't want to use automatic tracking, install the dbnd-airflow package instead of dbnd-airflow-auto-tracking. Then, for each DAG that you want to track, call the track_dag function in your DAG definition:

from dbnd_airflow import track_dag

# dag is the DAG object that is defined earlier in the same file.
track_dag(dag)

[airflow_tracking] Configuration section parameter reference

spark_submit_dbnd_java_agent

Sets the DBND Java agent `.jar` file to track a Java application that is on the local system.

databricks_dbnd_java_agent

Sets the DBND Java agent `.jar` file to track a Java application that is on the remote system.

track_airflow_execute_result

Enables saving the results of tracked Airflow operators.

track_xcom_values

Logs the values of XCom variables from Airflow.

max_xcom_length

Sets the number of XCom values to track per operator.

af_with_monitor

Activates when the Airflow monitor is not in use.

sql_reporting

Enables reporting targets from SQL queries.
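
For orientation, the sketch below shows how these parameters might look in an [airflow_tracking] section and reads them with Python's standard configparser module. All values, including the agent path, are placeholders chosen for illustration, and the value types (boolean, integer) are assumptions rather than documented defaults.

```python
import configparser

# Placeholder values only; they are not Databand defaults.
SAMPLE = """
[airflow_tracking]
spark_submit_dbnd_java_agent = /path/to/dbnd-agent.jar
track_airflow_execute_result = true
track_xcom_values = true
max_xcom_length = 10
sql_reporting = false
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
section = parser["airflow_tracking"]

# Read each option with the type assumed for it in this sketch.
agent_jar = section.get("spark_submit_dbnd_java_agent")
track_xcom = section.getboolean("track_xcom_values")
max_xcom = section.getint("max_xcom_length")
sql_reporting = section.getboolean("sql_reporting")
```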

Databand monitor DAG memory guard

While a DAG monitor is running, the memory guard automatically limits the amount of memory that the monitor can consume. The default limit is 8 GB. If the monitor exceeds the limit, it stops.

To change the limit, pass the guard_memory parameter to the get_monitor_dag function and set it to the maximum number of bytes that the monitor can consume. For example, the following call limits memory consumption to 5 GB:

from airflow_monitor.monitor_as_dag import get_monitor_dag

dag = get_monitor_dag(guard_memory=5 * 1024 * 1024 * 1024)
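
Because guard_memory is expressed in bytes, a small helper can make the intended limit easier to read. The following sketch assumes that 1 GB means 1024 ** 3 bytes, which matches the multiplication used in the call to get_monitor_dag. The names gb_to_bytes and exceeds_guard are hypothetical and are not part of airflow_monitor, and exceeds_guard only illustrates the comparison, not how the monitor measures its own memory.

```python
BYTES_PER_GB = 1024 ** 3  # same convention as 5 * 1024 * 1024 * 1024


def gb_to_bytes(gigabytes):
    """Convert a limit in GB to the byte count expected by guard_memory."""
    return int(gigabytes * BYTES_PER_GB)


def exceeds_guard(used_bytes, guard_bytes):
    """Return True if memory usage is above the configured limit."""
    return used_bytes > guard_bytes


five_gb = gb_to_bytes(5)      # 5368709120
default_gb = gb_to_bytes(8)   # 8589934592
```

With such a helper, the call could be written as get_monitor_dag(guard_memory=gb_to_bytes(5)).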

The main source of memory consumption by the Databand monitor DAG is the Airflow DAGBag, which holds the in-memory representation of all DAGs. A DAGBag is a collection of DAGs that Airflow loads into memory by running the user code that defines them; it is the official way of loading DAG information. Because the Airflow database in old Airflow versions doesn't have the full context of a DAG (for example, the DAG structure), Databand loads the DAGs from disk into a DAGBag and syncs the DAG structure. Although the DAGBag parses every DAG in the DAGs folder, Databand currently sends only the relevant DAGs to the server, that is, the DAGs that match your filter if you defined one.