Tracking Python functions
For better visibility, annotate your function with the @task decorator:
from dbnd import task
import pandas as pd
@task
def user_function(pandas_df: pd.DataFrame, counter: int, random: int):
    return "OK"
For certain objects that are passed to your functions, such as Pandas DataFrames and Spark DataFrames, Databand automatically collects stats and schema information. This automation makes it simpler to track data lineage and report on data quality issues.
To enable implicit tracking, set the environment variable
DBND__TRACKING to True. This variable
enables tracking regardless of whether the dbnd_tracking() context
is applied.
export DBND__TRACKING=True
For more information, see Context configuration.
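If the script is launched from another Python process rather than from a shell, the variable can be passed through the child's environment instead of an export statement. The following is a minimal sketch that uses only the standard library. The child command is a stand-in that echoes the variable; in practice it would be your own script (the name my_pipeline.py in the comment is hypothetical).

```python
import os
import subprocess
import sys

# Copy the current environment and switch implicit tracking on for the child only.
child_env = {**os.environ, "DBND__TRACKING": "True"}

# Stand-in for a real script, for example: [sys.executable, "my_pipeline.py"].
# Here the child only prints the variable so that the effect is visible.
command = [sys.executable, "-c", "import os; print(os.environ['DBND__TRACKING'])"]

result = subprocess.run(
    command, env=child_env, capture_output=True, text=True, check=True
)
seen_by_child = result.stdout.strip()
print(seen_by_child)
```

Passing env to the child keeps the setting scoped to that one run and leaves the parent process's environment unchanged.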
Customizing task names by using the @task decorator
With the @task decorator, you can also customize the name under which a Python function is tracked. The Databand UI then displays the name that you provided instead of the default name, which is taken from the function definition.
To do so, set the task_family parameter.
The following code snippet uses the @task decorator with the
task_family parameter. In this example, the function is displayed as
"my_custom_name":
from dbnd import task
@task(task_family="my_custom_name")
def user_function():
    return "OK"
Tracking specific functions without changing the module code
To track functions from a module, you can use track_functions instead of
decorating each function with @task.
In the following example, module1 contains f1 and
f2 functions:
from module1 import f1, f2
from dbnd import track_functions
track_functions(f1, f2)
track_functions takes functions as arguments and automatically decorates them, so
you can track any function without changing your existing function code or manually adding
decorators.
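The general idea of decorating functions from the outside can be illustrated in plain Python. The sketch below is not Databand's implementation: fake_task is a hypothetical stand-in for @task that only records calls, track_functions_sketch is a hypothetical helper, and module1 is built in memory so that the example runs on its own.

```python
import sys
import types
from functools import wraps

calls = []  # names of wrapped functions, appended each time one is called


def fake_task(func):
    """Hypothetical stand-in for dbnd's @task; it only records the call."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


def track_functions_sketch(*functions):
    """Rebind each function's name in its defining module to a wrapped version."""
    for func in functions:
        module = sys.modules[func.__module__]
        setattr(module, func.__name__, fake_task(func))


# Build a throwaway module1 in memory so the example is self-contained.
module1 = types.ModuleType("module1")
exec(
    "def f1():\n    return 'one'\n\ndef f2():\n    return 'two'\n",
    module1.__dict__,
)
sys.modules["module1"] = module1

track_functions_sketch(module1.f1, module1.f2)

# Calls made through the module now go through the wrapper.
result = (module1.f1(), module1.f2())
print(result, calls)
```

In this simple sketch, a name that was bound earlier with from module1 import f1 still refers to the original, unwrapped function; only lookups through the module see the wrapped version.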
Tracking all functions from a specific module
Use track_module_functions to track all functions within a named module. For
instance, module2.py from the previous example would look like this:
import module1
from dbnd import track_module_functions
track_module_functions(module1)
Tracking all functions from multiple modules
To track all functions from multiple modules, you can use track_modules, which
receives modules as arguments and tracks all functions that are contained within those modules. For
example:
from dbnd import track_modules
import module1
import module2
track_modules(module1, module2)
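To preview which functions a module-level call might pick up, you can list the functions that each module defines. The sketch below uses only the standard library. The helpers functions_defined_in and functions_in_modules are hypothetical, and the filter that skips imported functions is an assumption of this sketch; whether Databand applies the same filter is not stated here.

```python
import inspect
import types


def functions_defined_in(module):
    """Return names of plain functions defined in `module`, skipping imported ones."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if obj.__module__ == module.__name__
    )


def functions_in_modules(*modules):
    """Map each module name to the functions that the module itself defines."""
    return {module.__name__: functions_defined_in(module) for module in modules}


# In-memory stand-ins for module1 and module2 so the example runs on its own.
module1 = types.ModuleType("module1")
exec(
    "from os.path import join\n"
    "def f1():\n    return 1\n\n"
    "def f2():\n    return 2\n",
    module1.__dict__,
)
module2 = types.ModuleType("module2")
exec("def g():\n    return 3\n", module2.__dict__)

found = functions_in_modules(module1, module2)
print(found)
```

The imported join function is left out because its __module__ attribute does not match module1.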
Tracking external resources within a specific task
If the value that you want to track is a URL, you can use
set_external_resource_urls, which logs the URL in the specific task context. The
set_external_resource_urls(links: dict) function accepts one parameter: a
dictionary of {"key": "URL"} pairs.
from dbnd._core.tracking.commands import set_external_resource_urls
set_external_resource_urls(
    {"my_resource": "http://some_resource_name.com/path/to/resource/123456789"}
)
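Because the function expects a plain dictionary of names and URLs, it can help to check the entries before you log them. The following sketch assumes a hypothetical helper, build_resource_links, that is not part of dbnd. It accepts only absolute http or https URLs, which is a choice made for this example and not a documented Databand requirement.

```python
from urllib.parse import urlparse


def build_resource_links(**named_urls):
    """Hypothetical helper: return a {"key": "URL"} dict, rejecting malformed URLs."""
    links = {}
    for name, url in named_urls.items():
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{name!r} is not an absolute http(s) URL: {url!r}")
        links[name] = url
    return links


links = build_resource_links(
    my_resource="http://some_resource_name.com/path/to/resource/123456789",
)
print(links)
# The resulting dict can then be passed to set_external_resource_urls(links)
# from inside a tracked task.
```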