Python SDK configuration
The Python Databand SDK uses its own configuration system that lets you set and update your configuration in the way that best suits your needs.
The configuration system draws on the following sources:
- Environment variables
- Configuration files
- Code
- External configs, such as an Airflow connection
Environment variables
For example, you can override the databand_url parameter under the core section by setting a value for the DBND__CORE__DATABAND_URL environment variable:

```
export DBND__CORE__DATABAND_URL="https://yourdataband-service.databand.ai"
```

Similarly, you can override any other configuration parameter with an environment variable in the DBND__<SECTION>__<KEY> format.
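The DBND__<SECTION>__<KEY> naming convention can be sketched in plain Python. This is only an illustration of how a variable name maps to a section and key; parse_dbnd_env_var is a hypothetical helper, not part of the SDK:

```python
import os

def parse_dbnd_env_var(name: str) -> tuple:
    """Split a DBND__<SECTION>__<KEY> variable name into (section, key)."""
    prefix, section, key = name.split("__", 2)
    assert prefix == "DBND"
    return section.lower(), key.lower()

# Illustrative override, as if set with `export` in the shell:
os.environ["DBND__CORE__DATABAND_URL"] = "https://yourdataband-service.databand.ai"

overrides = {
    parse_dbnd_env_var(name): value
    for name, value in os.environ.items()
    if name.startswith("DBND__")
}
print(overrides[("core", "databand_url")])
```

Every DBND__-prefixed variable resolves to one (section, key) pair, so the environment layer can be merged over file-based configuration.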
Configuration files in Databand
Databand loads configuration information sequentially from the following configuration files:
| File loading priority | File location | File description |
|---|---|---|
| 1 | $DBND_LIB/databand-core.cfg | Provides the default core configuration of the system that cannot be changed. |
| 2 | $DBND_SYSTEM/databand-system.cfg | Provides middle layer configuration. Use this file to configure project infrastructure. |
| 3 | $DBND_HOME/project.cfg | Provides a project configuration. Use for configuring user-facing parts of the project. |
| 4 | $USER_HOME/.dbnd/databand.cfg | Provides system user configuration. |
You can also load configuration files from a custom location. Use the DBND__CONF__FILE environment variable to point to such a file.
Configuration values that are specified in later-loaded files override values that are specified in earlier-loaded files. For example, suppose that you specify config key A in $DBND_SYSTEM/databand-system.cfg. If the configuration of A is also specified in $DBND_HOME/project.cfg, then Databand uses the value from the file that is loaded later, in this case $DBND_HOME/project.cfg. For more information, see ConfigParser.read.
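The layering follows the standard ConfigParser.read semantics, which you can verify with plain Python (the file names here are stand-ins for the layered Databand files):

```python
import configparser
import os
import tempfile

# Two layered files: the later one overrides shared keys, like
# $DBND_SYSTEM/databand-system.cfg followed by $DBND_HOME/project.cfg.
base = tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False)
base.write("[core]\na = from-system\nb = only-in-system\n")
base.close()

project = tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False)
project.write("[core]\na = from-project\n")
project.close()

config = configparser.ConfigParser()
# read() processes files in order; later files win for duplicate keys.
config.read([base.name, project.name])

print(config["core"]["a"])  # from-project
print(config["core"]["b"])  # only-in-system

os.unlink(base.name)
os.unlink(project.name)
```

Keys that appear only in an earlier layer remain visible; only the overlapping keys are replaced.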
Environment variables in configuration files
You can use $DBND_HOME, $DBND_LIB, $DBND_SYSTEM, or any other environment variable in your configuration file, as shown in the following example:

```
[core]
databand_url="${YOUR_ENV_VARIABLE}"
```
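The SDK expands these references when it loads the file. Outside the SDK, you can emulate the expansion semantics with os.path.expandvars (a sketch of the behavior, not Databand's actual loader):

```python
import configparser
import os

os.environ["YOUR_ENV_VARIABLE"] = "https://yourdataband-service.databand.ai"

raw = """[core]
databand_url="${YOUR_ENV_VARIABLE}"
"""

# Substitute ${VAR} references before handing the text to ConfigParser.
expanded = os.path.expandvars(raw)

config = configparser.ConfigParser()
config.read_string(expanded)
# The ${...} reference is resolved; the surrounding quotes stay as written.
print(config["core"]["databand_url"])
```

Note that ConfigParser keeps the literal quotes from the file, so the stored value includes them unless you strip them.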
Changing the configuration for a specific section of the code
To change configuration within a limited scope, use the config context manager:

```python
from dbnd import config

with config({"section": {"key": "value"}}):
    pass
```

You can also load the values from a configuration file:

```python
from dbnd import config
from dbnd._core.configuration.config_readers import read_from_config_file

with config(read_from_config_file("/path/to/config.cfg")):
    pass
```

Python SDK advanced configuration
With the Python Databand SDK, you can also perform more advanced configuration:
- Passing list parameters in a .cfg file
- Passing a dictionary parameter in a .cfg file
- Using configuration files for different use cases, such as files with default variables overridden in production or test
- Using multiple extra configuration files
- Controlling the output type of dbnd_config
Passing list parameters in a .cfg file
To pass list parameters in a .cfg file, use the following syntax:
```
[some_section]
list_param = [1, 2, 3, "str1"]
```
Passing a dictionary parameter in a .cfg file
To pass dictionary parameters in a .cfg file, use the following syntax:
```
[some_section]
dict_param = {"key1": "value1", 255: 3}
```
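Assuming these values are parsed as Python literals (an assumption for illustration; the SDK's own parser may differ in details), you can check that the syntax is valid with ast.literal_eval:

```python
import ast
import configparser

config = configparser.ConfigParser()
config.read_string("""
[some_section]
list_param = [1, 2, 3, "str1"]
dict_param = {"key1": "value1", 255: 3}
""")

# ConfigParser returns raw strings; literal_eval turns them into
# Python lists and dicts without executing arbitrary code.
list_param = ast.literal_eval(config["some_section"]["list_param"])
dict_param = ast.literal_eval(config["some_section"]["dict_param"])
print(list_param)   # [1, 2, 3, 'str1']
print(dict_param)   # {'key1': 'value1', 255: 3}
```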
Using configuration files for different use cases
You can also use extra configuration files, for example files for different use cases or files where some default variables are overridden in production or test. To specify such a file, set an environment variable:

```
export DBND_CONFIG=<extra_file_path>
```
Using multiple extra configuration files
With the --conf option and the DBND__DATABAND__CONF environment variable, you can add multiple files as a comma-separated list.
Controlling the output type of dbnd_config
The dbnd_config object is a dict-like object that stores only the mapping from section and key to value.
To control the output type of dbnd_config.get("section", "key"), you can use getboolean, getint, or getfloat for values that can be parsed as those types.
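These getters mirror the standard library's configparser interface, which behaves as follows (a plain-Python illustration, not dbnd_config itself):

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[some_section]
flag = true
count = 42
ratio = 0.5
""")

# get() always returns a string; the typed getters convert it.
print(config.get("some_section", "flag"))          # 'true'
print(config.getboolean("some_section", "flag"))   # True
print(config.getint("some_section", "count"))      # 42
print(config.getfloat("some_section", "ratio"))    # 0.5
```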
Changing the configuration to control the tracking store behavior when errors occur
The Python Databand SDK uses the tracking system to report the state of runs and tasks to the Databand web server. Errors can occur while important information is being reported, and these errors can leave the runs that you see in the Databand webapp in invalid states.
The tracking system uses several tracking stores, each of which reports information to a different location, for example:
- Web tracking store - reports to the Databand web server.
- Console tracking store - writes the events to the console.
To control the behavior of the tracking system when errors occur, use the
following configurations under the core section:
```
[core]
remove_failed_store=true
tracker_raise_on_error=true
```

- remove_failed_store - Removes a tracking store if multiple failures occur. Default value = false.
- max_tracking_store_retries - Defines the maximum number of retries allowed for a single tracking store call if it fails. Default value = 2.
- tracker_raise_on_error - Stops the run with an error if a critical tracking error occurs, such as failing to connect to the web server. Default value = true.
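The intended semantics of these three settings can be sketched as a retry loop. This is a hypothetical illustration of what the parameters mean, not the SDK's implementation; the report function, the stores list, and the constant values are all assumptions:

```python
MAX_TRACKING_STORE_RETRIES = 2   # max_tracking_store_retries
REMOVE_FAILED_STORE = True       # remove_failed_store
TRACKER_RAISE_ON_ERROR = False   # tracker_raise_on_error

def report(stores, event):
    """Send one event to every tracking store, honoring the error settings."""
    for store in list(stores):
        for attempt in range(MAX_TRACKING_STORE_RETRIES):
            try:
                store(event)
                break
            except Exception:
                if attempt + 1 == MAX_TRACKING_STORE_RETRIES:
                    if REMOVE_FAILED_STORE:
                        stores.remove(store)   # stop using the broken store
                    if TRACKER_RAISE_ON_ERROR:
                        raise

def broken_store(event):
    raise ConnectionError("web server unreachable")

seen = []
stores = [seen.append, broken_store]
report(stores, "run-started")
print(seen)            # ['run-started']
print(len(stores))     # 1 - the failing store was removed
```

With tracker_raise_on_error enabled instead, the final failure would propagate and stop the run with an error.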
Changing the configuration to control logging of parameters within decorated functions
The logging configuration options help you save computational resources and protect against reporting of sensitive data, such as full data previews.
Logging data processes and full data quality reports in Databand can be resource-intensive. However, explicitly turning off all calculations for log_value_size, log_value_schema, log_value_stats, log_value_preview, log_value_preview_max_len, and log_value_meta results in valuable metrics not being tracked at all. To help you balance logging performance against visibility needs, you can selectively calculate and log metadata through a zero-computational-cost approach.
With the value_reporting_strategy configuration, you can decide whether certain information is logged. value_reporting_strategy changes nothing in your code, but acts as a guard (or fuse) before the value calculation code runs:

```
[tracking]
value_reporting_strategy=SMART
```
The following options are available:
- ALL - No restrictions on logging. All the log_value_ options are on: log_value_size, log_value_schema, log_value_stats, log_value_preview, log_value_preview_max_len, log_value_meta.
- SMART - Restricts lazy evaluation types. For types like Spark, values are calculated only when they are needed. Even if log_value_preview is set to True, when the SMART strategy is on, Spark previews are not logged.
- NONE - No logging of anything expensive or potentially problematic. This option can be useful if some of your values constitute private and sensitive information that you don't want logged.
Most users can benefit from using the SMART option for logging.
The list of available [tracking] configuration parameters
You can add the following parameters to the [tracking] configuration:
- project - Sets the project to which the run is assigned. If you don't set this value, the default project is used; the tracking server selects the project with is_default == True.
- databand_external_url - Sets a tracker URL to be used for tracking from external systems.
- log_value_size - Calculates and logs the value's size. Enabling this parameter causes a full scan on non-indexable distributed memory objects.
- log_value_schema - Calculates and logs the value's schema.
- log_value_stats - Calculates and logs the value's stats. This parameter is expensive to calculate, so it can be better to use log_stats on the parameter level.
- log_value_preview - Calculates and logs the value's preview. This parameter can be expensive to calculate on Spark.
- log_value_preview_max_len - Sets the maximum size of the value's preview to be saved at the service. The maximum value of this parameter is 50000.
- log_value_meta - Calculates and logs the value's meta.
- log_histograms - Enables calculation and tracking of histograms. This parameter can be expensive.
- value_reporting_strategy - Sets the strategy used for reporting values. You have multiple strategy options, each with different limitations on potentially expensive calculations for value_meta. ALL removes all limitations. SMART limits lazy evaluation types. NONE, which is the default value, limits everything.
- track_source_code - Enables tracking of function, module, and file source code.
- auto_disable_slow_size - Enables automatically disabling slow previews for Spark DataFrames with text formats.
- flatten_operator_fields - Controls which of the operator's fields are flattened when tracked.
- capture_tracking_log - Enables log capturing for tracking tasks.
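Putting a few of these parameters together, a [tracking] section might look like the following sketch. The project name and the specific on/off choices are illustrative values only:

```
[tracking]
project=my-project
log_value_schema=true
log_value_preview=false
value_reporting_strategy=SMART
```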
[core] configuration section parameter reference
You can add the following parameters to the tracking context by passing configuration through the conf parameter of the dbnd_tracking function. This is not recommended for production usage.
- databand_url - Sets the tracker URL to be used for creating links in the console logs.
- databand_access_token - Sets the personal access token used to connect to the Databand web server.
- extra_default_headers - Specifies extra headers to be used as defaults for databand_api_client.
- tracker - Sets the tracking stores to be used.
- tracker_api - Sets the tracker channels to be used by the 'api' store.
- debug_webserver - Enables collecting the web server's logs for each API call on the local machine. This needs to be supported by the web server.
- silence_tracking_mode - Enables silencing the console when in tracking mode.
- tracker_raise_on_error - Enables raising an error when tracking data fails.
- remove_failed_store - Enables removal of a tracking store if it fails.
- max_tracking_store_retries - Sets the maximum number of retries allowed for a single tracking store call if it fails.
- client_session_timeout - Sets the number of minutes after which the API client's session is re-created.
- client_max_retry - Sets the maximum number of retries on a failed connection for the API client.
- client_retry_sleep - Sets the amount of sleep time between retries of the API client.
- user_configs - Sets the config used for creating tasks from the user code.
- user_init - Runs in every dbnd process with the system configuration in place. This is called in DatabandContext after the SDK enters its initialization steps.
- user_driver_init - Runs in a driver after configuration initialization. This is called from DatabandContext when the Python runtime enters a new context.
- user_code_on_fork - Runs in a subprocess, in parallel, Kubernetes, or external modes.
- plugins - Specifies which plug-ins to load on Databand context creation.
- allow_vendored_package - Enables adding the dbnd/_vendor_package module to your system path.
- fix_env_on_osx - Enables adding no_proxy=* to environment variables, fixing issues with multiprocessing on OSX.
- environments - Sets a list of enabled environments.
- dbnd_user - Sets which user connects to the Databand web server. This parameter is deprecated.
- dbnd_password - Sets the password used to connect to the Databand web server. This parameter is deprecated.
- tracker_url - Sets the tracker URL to be used for creating links in console logs. This parameter is deprecated.
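The introduction above mentions passing these values through the conf parameter of dbnd_tracking. Building the nested section/key dict needs no SDK, so the shape can be shown directly; the commented-out call is a sketch, and the token value is a placeholder:

```python
# Configuration to pass through dbnd_tracking(conf=...); the dict mirrors
# the [section] -> key structure of the .cfg files described above.
conf = {
    "core": {
        "databand_url": "https://yourdataband-service.databand.ai",
        "databand_access_token": "<your-access-token>",  # placeholder
    }
}

# With the SDK installed, it would be used along these lines (not verified here):
# from dbnd import dbnd_tracking
# with dbnd_tracking(conf=conf):
#     ...  # tracked code

print(conf["core"]["databand_url"])
```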