Controlling tracked DAGs
By default, all Airflow DAGs are synced. You can optionally limit syncing to specific DAGs by providing an explicit list of DAG IDs to monitor.
In the wizard for Apache Airflow integration configuration, you can provide a comma-separated list of DAGs to sync.
If you do not want to track specific DAGs, operators, or functions, you can exclude them from automatic tracking by using the following function:
- dont_track(dag)
- dont_track(operator)
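For example, you can call dont_track on an operator or on the whole DAG after it is defined. The following is a minimal sketch; the DAG ID, the task, and the Airflow 2.x import paths are illustrative:
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
from dbnd import dont_track

with DAG(dag_id="example_dag", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
    untracked_task = PythonOperator(task_id="untracked_task", python_callable=lambda: None)

# Exclude a single operator from automatic tracking
dont_track(untracked_task)
# Or exclude the whole DAG
dont_track(dag)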
Alternatively, you can use the @dont_track decorator that is shown in the
following example:
from dbnd import dont_track

@dont_track
def f():
    pass
Tracking specific DAGs
If you don't want to use automatic tracking, install the dbnd-airflow package
instead of dbnd-airflow-auto-tracking. For each DAG that you want to track,
call the track_dag function on your DAG definition.
from dbnd_airflow import track_dag
track_dag(dag)
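For example, the following sketch shows a DAG file that registers a single DAG for tracking; the DAG ID, task, schedule, and Airflow 2.x import path are illustrative:
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from dbnd_airflow import track_dag

with DAG(dag_id="tracked_dag", start_date=datetime(2024, 1, 1), schedule_interval="@daily") as dag:
    BashOperator(task_id="say_hello", bash_command="echo hello")

# Report this DAG's runs to Databand
track_dag(dag)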
[airflow_tracking] configuration section parameter reference
You can add the following parameters to the [airflow_tracking]
configuration:
- spark_submit_dbnd_java_agent - Sets the DBND Java agent .jar file to track a Java application that is on the local system.
- databricks_dbnd_java_agent - Sets the DBND Java agent .jar file to track a Java application that is on the remote system.
- track_airflow_execute_result - Enables saving the results of tracked Airflow operators.
- track_xcom_values - Logs the values of xcom variables from Airflow.
- max_xcom_length - Sets the number of xcom values to track per operator.
- af_with_monitor - Activate this option when the Airflow monitor is not in use.
- sql_reporting - Enables reporting targets from SQL queries.
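For example, assuming the standard INI-style Databand configuration file, an [airflow_tracking] section could look like the following sketch (the values shown are illustrative, not defaults):
[airflow_tracking]
track_airflow_execute_result = true
track_xcom_values = true
max_xcom_length = 10
sql_reporting = false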
Databand monitor DAG memory guard
When the monitor DAG is running, the memory guard automatically limits the amount of memory the monitor can consume. The default limit is 8 GB. If the monitor consumes more memory, it stops.
To limit the number of bytes that the monitor can consume:
- Add the guard_memory parameter to the get_monitor_dag function.
- Set the parameter to the maximum number of bytes the monitor can consume. For example, the following parameter limits memory consumption to 5 GB:
from airflow_monitor.monitor_as_dag import get_monitor_dag
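# guard_memory is specified in bytes; 5 * 1024 * 1024 * 1024 bytes = 5 GB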
dag = get_monitor_dag(guard_memory=5 * 1024 * 1024 * 1024)
The main source of memory consumption by the Databand monitor DAG is the Airflow DAGBag, the in-memory representation of all DAGs. A DAGBag is a collection of DAGs that is loaded into memory by running the user code that defines the DAGs (the Airflow DAGBag is the official way of loading DAG information). Because the Airflow database in older Airflow versions does not have the full context of a DAG (for example, the DAG structure), Databand loads DAGs from disk into a DAGBag and syncs the DAG structure. Although the Airflow DAGBag parses all DAGs in the DAGs folder, Databand currently sends only the relevant DAGs to the server (in your case, the DAGs that are defined by the filter).
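As an illustration of what this load involves (the folder path below is a placeholder), Airflow builds the DAGBag by parsing every file in the DAGs folder:
from airflow.models import DagBag

# Parsing the DAGs folder executes the user code that defines the DAGs,
# which is why the in-memory DAGBag can be a significant source of memory use.
dag_bag = DagBag(dag_folder="/path/to/dags", include_examples=False)
print(list(dag_bag.dags))  # DAG IDs discovered during parsing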