Tracking Python functions

For better visibility, you can annotate your function with a decorator (@task). The following example shows a Python function. Decorators for Java and Scala functions are also supported.
from dbnd import task
import pandas as pd

@task
def user_function(pandas_df: pd.DataFrame, counter: int, random: int):
    return "OK"

For certain objects that are passed to your functions, such as Pandas DataFrames and Spark DataFrames, Databand automatically collects stats and schema information. This automation makes it simpler to track data lineage and report on data quality issues.
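
The general pattern behind a tracking decorator can be sketched with the standard library alone. The following example is only an illustration of how a decorator can capture parameter names, value types, and the result of each call. It is not Databand's implementation, and record_call and CALL_LOG are hypothetical names.

```python
import functools
import inspect

# Hypothetical in-memory log; a real tracker would report to a backend.
CALL_LOG = []


def record_call(func):
    """Record parameter names, value types, and the result of each call."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        result = func(*args, **kwargs)
        CALL_LOG.append(
            {
                "name": func.__name__,
                "params": {k: type(v).__name__ for k, v in bound.arguments.items()},
                "result": result,
            }
        )
        return result

    return wrapper


@record_call
def user_function(rows: list, counter: int, random: int):
    return "OK"


user_function([1, 2, 3], counter=1, random=7)
```

After the call, CALL_LOG holds one entry that describes the call, and the decorated function still returns its original value.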

To enable implicit tracking, set the environment variable DBND__TRACKING to True. This parameter enables tracking regardless of whether the dbnd_tracking() context is applied.

export DBND__TRACKING=True

For more information, see Context configuration.
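
If you cannot export the variable in the shell, you can set it from Python instead. The following sketch assumes that the variable is read from the process environment when tracking is initialized, so it must run before the tracked code starts.

```python
import os

# Assumption: DBND__TRACKING is read from the process environment when
# tracking is initialized, so set it before the tracked code runs.
os.environ["DBND__TRACKING"] = "True"
```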

Customizing task names by using the @task decorator

With the @task decorator, you can also customize the name under which a Python function is tracked. The function is then displayed in the Databand UI with the name that you provide instead of the default name, which is taken from the function definition.

To do so, set the task_family parameter.

The following code snippet uses the @task decorator with the task_family parameter. In this example, the function is displayed as "my_custom_name":

from dbnd import task

@task(task_family="my_custom_name")
def user_function():
    return "OK"
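
A decorator that works both bare and with a keyword argument can be sketched as follows. This is a standard-library illustration of the pattern only, not Databand's code. The names named_task and display_name are hypothetical.

```python
import functools


def named_task(func=None, *, task_family=None):
    """Hypothetical decorator usable as @named_task or @named_task(task_family=...)."""

    def decorate(target):
        @functools.wraps(target)
        def wrapper(*args, **kwargs):
            return target(*args, **kwargs)

        # Fall back to the function's own name when no custom name is given.
        wrapper.display_name = task_family or target.__name__
        return wrapper

    # Bare usage passes the function directly; parameterized usage passes None.
    return decorate(func) if func is not None else decorate


@named_task
def default_named():
    return "OK"


@named_task(task_family="my_custom_name")
def user_function():
    return "OK"
```

In this sketch, default_named keeps the name from its definition, while user_function carries the custom name.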

Tracking specific functions without changing the module code

To track functions from a module, you can use track_functions instead of decorating each function with @task.

In the following example, module1 contains f1 and f2 functions:

from module1 import f1, f2

from dbnd import track_functions
track_functions(f1, f2)

track_functions takes functions as arguments and automatically decorates them, so you can track any function without changing its existing code or manually adding decorators.
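
One way that a helper can decorate functions without touching their source is to wrap each function and rebind the wrapper on the module that defines it. The following sketch shows that technique with the standard library. It is an assumption about the general approach, not a description of how track_functions is implemented. The names track_functions_sketch and TRACKED_CALLS are hypothetical, and module1 is built in memory as a stand-in.

```python
import functools
import sys
import types

TRACKED_CALLS = []  # hypothetical log of tracked function names


def _wrap(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACKED_CALLS.append(func.__name__)
        return func(*args, **kwargs)

    return wrapper


def track_functions_sketch(*functions):
    """Wrap each function and rebind the wrapper on its defining module."""
    for func in functions:
        module = sys.modules[func.__module__]
        setattr(module, func.__name__, _wrap(func))


# Stand-in for module1 so that the sketch is self-contained.
module1 = types.ModuleType("module1")
exec("def f1():\n    return 1\n\ndef f2():\n    return 2\n", module1.__dict__)
sys.modules["module1"] = module1

track_functions_sketch(module1.f1, module1.f2)
module1.f1()
module1.f2()
```

In this sketch, only lookups through the module attribute (module1.f1) reach the wrapper. Names that were bound earlier with from module1 import f1 still point to the original function.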

Tracking all functions from a specific module

Use track_module_functions to track all functions within a named module. For instance, module2.py from the previous example would look like this:

import module1
from dbnd import track_module_functions

track_module_functions(module1)

Tracking all functions from multiple modules

To track all functions from multiple modules, you can use track_modules, which receives modules as arguments and tracks all functions that are contained within those modules, as in the following example:

from dbnd import track_modules

import module1
import module2

track_modules(module1, module2)
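
Module-level tracking relies on discovering the functions that a module contains. The following standard-library sketch shows one way to do that with inspect.getmembers. It is an illustration under stated assumptions, not Databand's implementation. In particular, skipping functions that were only imported into the module is a design choice of this sketch. The names track_modules_sketch, CALLS, and the in-memory modules are hypothetical.

```python
import functools
import inspect
import types

CALLS = []  # hypothetical log of "module.function" names


def _wrap(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALLS.append(f"{func.__module__}.{func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def track_modules_sketch(*modules):
    """Wrap every function that is defined in each of the given modules."""
    for module in modules:
        for name, func in inspect.getmembers(module, inspect.isfunction):
            # Skip functions that were merely imported into the module.
            if func.__module__ == module.__name__:
                setattr(module, name, _wrap(func))


def _make_module(name, source):
    # Build an in-memory module so that the sketch is self-contained.
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


module1 = _make_module("module1", "def f1():\n    return 1\n")
module2 = _make_module(
    "module2", "from os.path import join\n\ndef g1():\n    return 2\n"
)

track_modules_sketch(module1, module2)
module1.f1()
module2.g1()
```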

Tracking external resources within a specific task

If the value that you want to track is a URL, you can use set_external_resource_urls, which logs the URL in the context of the specific task. The set_external_resource_urls(links: dict) function accepts a single parameter, a dictionary in the form {"key": "URL"}.

from dbnd._core.tracking.commands import set_external_resource_urls

set_external_resource_urls(
    {"my_resource": "http://some_resource_name.com/path/to/resource/123456789"}
)
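
Before you pass the dictionary, you might want to check that every value is an absolute URL. The following sketch uses only the standard library. The helper build_resource_links is hypothetical and not part of dbnd, and the restriction to http and https is an assumption of this example.

```python
from urllib.parse import urlparse


def build_resource_links(**resources):
    """Return a {"key": "URL"} dict, rejecting values that are not http(s) URLs."""
    links = {}
    for key, url in resources.items():
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{key!r} is not an absolute http(s) URL: {url!r}")
        links[key] = url
    return links


links = build_resource_links(
    my_resource="http://some_resource_name.com/path/to/resource/123456789"
)
```

The resulting links dictionary has the {"key": "URL"} shape that set_external_resource_urls accepts.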