Controlling tracked DAGs
By default, all Airflow DAGs are synced. To sync only some of them, provide an explicit list of DAG IDs to monitor: in the Apache Airflow integration configuration wizard, enter a comma-separated list of the DAGs to sync.
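The following is a minimal sketch of how a comma-separated DAG ID filter of this kind can be interpreted. It is not Databand's implementation: the function names `parse_dag_ids` and `select_dags` and the DAG IDs are hypothetical, and the sketch assumes that an empty filter means "sync everything", matching the default described above.

```python
def parse_dag_ids(raw):
    """Split a comma-separated string into a set of DAG IDs, ignoring blanks."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def select_dags(all_dag_ids, raw_filter=None):
    """Return the DAG IDs to sync, preserving their original order.

    Assumption: a missing or empty filter selects every DAG.
    """
    wanted = parse_dag_ids(raw_filter or "")
    if not wanted:
        return list(all_dag_ids)
    return [dag_id for dag_id in all_dag_ids if dag_id in wanted]


# Hypothetical DAG IDs, for illustration only.
available = ["etl_daily", "ml_training", "reporting"]

print(select_dags(available))
print(select_dags(available, "etl_daily, reporting"))
```

Whitespace around each ID is stripped, so `"etl_daily, reporting"` and `"etl_daily,reporting"` select the same DAGs in this sketch.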
If you do not want to track specific DAGs, operators, or functions, you can exclude them from automatic tracking with the dont_track function:
dont_track(dag)
dont_track(operator)
Alternatively, you can use the @dont_track decorator, as shown in the following example:
from dbnd import dont_track
@dont_track
def f():
    pass
Tracking specific DAGs
If you don't want to use automatic tracking, install the dbnd-airflow package instead of dbnd-airflow-auto-tracking. For each DAG that you want to track, add a call to the track_dag function to your DAG definition:
from dbnd_airflow import track_dag
track_dag(dag)
[airflow_tracking] configuration section parameter reference
- spark_submit_dbnd_java_agent: Sets the DBND Java agent `.jar` file to track a Java application that is on the local system.
- databricks_dbnd_java_agent: Sets the DBND Java agent `.jar` file to track a Java application that is on the remote system.
- track_airflow_execute_result: Enables saving the results of tracked Airflow operators.
- track_xcom_values: Logs the values of XCom variables from Airflow.
- max_xcom_length: Sets the number of XCom values to track per operator.
- af_with_monitor: Activates when the Airflow monitor is not in use.
- sql_reporting: Enables reporting targets from SQL queries.
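As an illustration of how these parameters might sit together, the sketch below parses a sample `[airflow_tracking]` section with Python's standard `configparser`. It assumes an INI-style configuration file. The values are examples only, not documented defaults, and the sketch does not show how Databand itself reads its configuration.

```python
import configparser

# Illustrative values only; these are NOT documented defaults.
EXAMPLE_CONFIG = """
[airflow_tracking]
track_airflow_execute_result = True
track_xcom_values = True
max_xcom_length = 10
sql_reporting = False
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONFIG)
section = parser["airflow_tracking"]

# Convert each raw string to the type suggested by its description above.
settings = {
    "track_airflow_execute_result": section.getboolean("track_airflow_execute_result"),
    "track_xcom_values": section.getboolean("track_xcom_values"),
    "max_xcom_length": section.getint("max_xcom_length"),
    "sql_reporting": section.getboolean("sql_reporting"),
}

for name, value in settings.items():
    print(f"{name} = {value!r}")
```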
Databand monitor DAG memory guard
When a DAG monitor is running, the memory guard automatically limits the amount of memory that the monitor can consume. The default limit is 8 GB. If the monitor exceeds the limit, it stops.
To change the limit, add the guard_memory parameter to the get_monitor_dag function and set it to the maximum number of bytes that the monitor can consume. For example, the following call limits memory consumption to 5 GB:
from airflow_monitor.monitor_as_dag import get_monitor_dag
dag = get_monitor_dag(guard_memory=5 * 1024 * 1024 * 1024)
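Because guard_memory is expressed in bytes, a small conversion helper can make the intended limit easier to read. The sketch below is only a convenience: `gb_to_bytes` is a hypothetical helper, not part of Databand, and it uses the same 1024-based arithmetic as the example above.

```python
BYTES_PER_GB = 1024 ** 3  # 1024-based, matching 5 * 1024 * 1024 * 1024 above


def gb_to_bytes(gigabytes):
    """Convert a size in gigabytes to a whole number of bytes."""
    return int(gigabytes * BYTES_PER_GB)


guard_memory = gb_to_bytes(5)
print(guard_memory)  # 5368709120

# The value could then be passed on as in the example above:
# dag = get_monitor_dag(guard_memory=guard_memory)
```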
The main source of memory consumption in the Databand monitor DAG is the Airflow DagBag, which holds the in-memory representation of all DAGs. A DagBag is a collection of DAGs that Airflow loads into memory by running the user code that defines them; it is the official way to load DAG information. Because the Airflow database in older Airflow versions does not hold the full context of a DAG (for example, the DAG structure), Databand loads DAGs from disk into a DagBag and syncs the DAG structure. Although the DagBag parses every DAG in the DAGs folder, Databand currently sends only the relevant DAGs to the server, that is, the DAGs that match your filter.