Managing data quality queries
Add, edit, delete, and add alerts for data quality queries.
On the Data quality page of the Databand UI, you can:
- Add queries
- Edit and delete queries
- Display the query's metric in the form of a chart, see Data quality
- Add an alert
Adding a data quality query
The following use case shows an end-to-end process of creating a data quality validation rule, including a sample SQL query, its testing, the query results, and the results analysis.
You can create queries only for the warehouses that you integrated with Databand, which are monitored by a data at rest monitor (Snowflake and BigQuery). Regardless of the selected warehouse, you add and run a query in the same way.
In the use case, we are going to count the number of customers in the CUSTOMERS table who meet the following two conditions:
- Their
phone_numberfield does not have exactly 10 characters. - The
last_update_timefield is between a specific date (which is one day before theexecution_date) and theexecution_date.
To create the data quality validation rule:
-
Go to the Datasets > Data quality page and click Add data quality query.
-
In the opened window select the warehouse.
-
Provide an SQL query. When you create your query, follow the guidelines that are provided in the wizard. Make sure that your SQL includes time-based partitioning by using the {{ execution_date }} Jinja template. It is replaced with the actual execution time during the query run. Because the system provides the result of the query as a numerical value, you need to create a query that returns a number. To create this type of query, you can use the
SQL expression example for a Snowflake warehouse:count()function.
SQL expression example for a BigQuery warehouse:SELECT COUNT(*) FROM CUSTOMERS WHERE LENGTH(phone_number) <> 10 AND last_update_time BETWEEN dateadd(day, -1, '{{ execution_date }}' ) and '{{ execution_date }}'SELECT COUNT(*) FROM CUSTOMERS WHERE LENGTH(phone_number) <> 10 AND last_update_time BETWEEN DATE_SUB('{{ execution_date }}', INTERVAL 1 DAY) AND '{{ execution_date }}' -
Click Test query to make sure that your query is valid.
Each time that you run the query you are billed by your database provider.
-
After you test the query, you can check:
- The query result - in our case it was completed successfully.
- The value returned - in our case the number of customers who meet the conditions that are specified in the query.
- The query duration.
Important: If your query failed, you need to change it and retest it. The positive outcome of the test is a prerequisite to proceed to the next steps. - If the run was successful, you can continue to the next step.
-
Optional: Schedule runs by using CRON expressions. We want to schedule our query to run at the top of every hour, hence we create the following expression: 0 * * * *.
The CRON expression is based on the UTC time zone.
- Optional: You can also make sure that the query will stop automatically after a time frame specified by you. To do so, select and complete the checkbox.
Note: You can set the time frame up to 60 minutes.
- Click Next.
- You can provide the name of the query. Click Save query.
- The Data quality page displays the details of the saved query and the results of its run, such as:
- Name
- Data warehouse on which the query was run
- Returned values (presented in a form of a bar chart)
- Schedule of the query runs
- Time of the last run
- Query run metrics (with the deviation from the average returned value)
Disabling and enabling data quality queries
By default the data queries are enabled. To temporarily disable them, move the toggle in the Active column next to the quality query that you don't want to run. To enable the data query, move the toggle in the Disabled column.
Editing and deleting data quality queries
To edit or delete a query:
- Click the More options icon
.
- Select the relevant option (Edit or Delete) from the list.
You can edit a query in the same way that you created it.
Creating an alert
To create an alert for a query:
- Hover your mouse over the query, next to the Last run column.
- Click the Create alert icon
. - The wizard for Data quality metric alert is displayed. For more information, see Managing alerts.