Managing data quality queries

Add, edit, delete, and add alerts for data quality queries.

On the Data quality page of the Databand UI, you can:

Adding a data quality query

The following use case shows an end-to-end process of creating a data quality validation rule, including a sample SQL query, its testing, the query results, and the results analysis.

You can create queries only for the warehouses that you integrated with Databand, which are monitored by a data at rest monitor (Snowflake and BigQuery). Regardless of the selected warehouse, you add and run a query in the same way.

In the use case, we are going to count the number of customers in the CUSTOMERS table who meet the following two conditions:

  • Their phone_number field does not have exactly 10 characters.
  • The last_update_time field is between a specific date (which is one day before the execution_date) and the execution_date.

To create the data quality validation rule:

  1. Go to the Datasets > Data quality page and click Add data quality query.

  2. In the opened window select the warehouse.

  3. Provide an SQL query. When you create your query, follow the guidelines that are provided in the wizard. Make sure that your SQL includes time-based partitioning by using the {{ execution_date }} Jinja template. It is replaced with the actual execution time during the query run. Because the system provides the result of the query as a numerical value, you need to create a query that returns a number. To create this type of query, you can use the count() function.

    SQL expression example for a Snowflake warehouse:
    SELECT
      COUNT(*)
    FROM
      CUSTOMERS
    WHERE
      LENGTH(phone_number) <> 10
    AND
      last_update_time BETWEEN dateadd(day, -1, '{{ execution_date }}' ) and '{{ execution_date }}'
    SQL expression example for a BigQuery warehouse:
    SELECT
      COUNT(*)
    FROM
      CUSTOMERS
    WHERE
      LENGTH(phone_number) <> 10
    AND
      last_update_time BETWEEN DATE_SUB('{{ execution_date }}', INTERVAL 1 DAY) AND '{{ execution_date }}'
  4. Click Test query to make sure that your query is valid.

    Each time that you run the query you are billed by your database provider.

  5. After you test the query, you can check:

    • The query result - in our case it was completed successfully.
    • The value returned - in our case the number of customers who meet the conditions that are specified in the query.
    • The query duration.
    Important: If your query failed, you need to change it and retest it. The positive outcome of the test is a prerequisite to proceed to the next steps.
  6. If the run was successful, you can continue to the next step.
  7. Optional: Schedule runs by using CRON expressions. We want to schedule our query to run at the top of every hour, hence we create the following expression: 0 * * * *.

    The CRON expression is based on the UTC time zone.

  8. Optional: You can also make sure that the query will stop automatically after a time frame specified by you. To do so, select and complete the checkbox.
    Note: You can set the time frame up to 60 minutes.
  9. Click Next.
  10. You can provide the name of the query. Click Save query.
  11. The Data quality page displays the details of the saved query and the results of its run, such as:
    • Name
    • Data warehouse on which the query was run
    • Returned values (presented in a form of a bar chart)
    • Schedule of the query runs
    • Time of the last run
    • Query run metrics (with the deviation from the average returned value)

Disabling and enabling data quality queries

By default the data queries are enabled. To temporarily disable them, move the toggle in the Active column next to the quality query that you don't want to run. To enable the data query, move the toggle in the Disabled column.

Note: Only Owners can disable and enable data quality queries.

Editing and deleting data quality queries

To edit or delete a query:

  1. Click the More options icon More options.
  2. Select the relevant option (Edit or Delete) from the list.

You can edit a query in the same way that you created it.

Note: Only Owners can edit data quality queries.

Creating an alert

To create an alert for a query:

  1. Hover your mouse over the query, next to the Last run column.
  2. Click the Create alert icon Create alert.
  3. The wizard for Data quality metric alert is displayed. For more information, see Managing alerts.