dcount() (aggregation function)

See KQL Compatibility for differences in the implementation.

Returns an estimate of the number of distinct values that a scalar expression takes in the summary group.

The dcount() aggregation function is primarily useful for estimating the cardinality of huge sets. It trades accuracy for performance, and may return a result that varies between executions. The order of inputs may also affect its output.

Syntax

dcount(Expr [, Accuracy])

Arguments

Name      Type    Required  Description
Expr      string  Yes       A scalar expression whose distinct values are to be counted.
Accuracy  int     No        An int literal that defines the requested estimation accuracy. The supported values are listed below. If unspecified, the default value 1 is used.

Accuracy, if specified, controls the balance between speed and accuracy.

Value Description
0 The least accurate and fastest calculation; about 1.6% error.
1 The default, which balances accuracy and calculation time; about 0.8% error.
2 Accurate and slow calculation; about 0.4% error.
3 Extra accurate and slow calculation; about 0.28% error.
4 Super accurate and slowest calculation; about 0.2% error.

Example

This example returns an estimate of the number of distinct values of name in events from the last five minutes.

events
    | project name, original_time
    | where original_time > ago(5m)
    | summarize NameCount=dcount(name)

Results

NameCount
2032

Example

Get an exact count of distinct values of V grouped by G.

T | summarize by V, G | summarize count() by G

This calculation requires a large amount of internal memory, since the number of distinct (V, G) pairs can be as large as the number of distinct values of V multiplied by the number of distinct values of G. It may result in memory errors or long execution times. dcount() provides a fast, memory-bounded alternative that estimates the same per-group counts:

T | summarize dcount(V) by G
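To make the memory cost of the exact query concrete, here is a small Python sketch of what the two-step query computes: it first deduplicates (V, G) pairs, then counts the surviving pairs per group. The function name exact_distinct_by_group and the (v, g) tuple input are illustrative assumptions, not part of KQL.

```python
from collections import defaultdict


def exact_distinct_by_group(rows):
    """Exact distinct count of V per G (hypothetical helper).

    Mirrors: T | summarize by V, G | summarize count() by G
    rows: iterable of (v, g) tuples.
    """
    # Step 1: deduplicate (V, G) pairs. This set can grow to
    # (distinct V) x (distinct G) entries, which is the memory cost.
    pairs = set(rows)
    # Step 2: count the surviving pairs in each group.
    counts = defaultdict(int)
    for _v, g in pairs:
        counts[g] += 1
    return dict(counts)


rows = [("a", 1), ("b", 1), ("a", 1), ("a", 2), ("c", 2), ("d", 2)]
result = exact_distinct_by_group(rows)
# result == {1: 2, 2: 3}
```

dcount() avoids holding the pair set by keeping a fixed-size sketch per group instead, at the price of an approximate answer.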

Estimation accuracy

The dcount() aggregate function uses a variant of the HyperLogLog (HLL) algorithm, which performs a stochastic estimation of set cardinality. The algorithm provides a "knob" that balances accuracy against execution time and memory size:

Accuracy Error (%) Entry count
0 1.6 2^12 (4,096)
1 0.8 2^14 (16,384)
2 0.4 2^16 (65,536)
3 0.28 2^17 (131,072)
4 0.2 2^18 (262,144)

The "entry count" column is the number of 1-byte counters in the HLL implementation.
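The error column closely tracks the classic HyperLogLog relative standard error of about 1.04/√m for m registers. The short Python check below computes that value, plus the register memory (m bytes, given 1-byte counters), for each accuracy level. Reading the entry counts as powers of two (2^12 through 2^18) is an interpretation of the table, and dcount()'s HLL variant may differ in detail.

```python
import math

# Entry counts per accuracy level, read from the table as powers of two.
ENTRY_COUNT = {0: 2**12, 1: 2**14, 2: 2**16, 3: 2**17, 4: 2**18}


def hll_standard_error(m):
    """Classic HyperLogLog relative standard error, about 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(m)


for level, m in ENTRY_COUNT.items():
    sigma_pct = 100 * hll_standard_error(m)
    kib = m // 1024  # 1-byte counters, so m bytes of registers
    print(f"accuracy {level}: m = {m:6d} ({kib} KiB), sigma ~ {sigma_pct:.2f}%")
```

Each step up in accuracy multiplies the register count by 4 (or 2 for the last two levels), and because the error scales with 1/√m, quadrupling memory only halves the error.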

The algorithm includes some provisions for performing a perfect count (zero error) if the set cardinality is small enough:

  • When the accuracy level is 1, the count is exact for up to 1000 distinct values.
  • When the accuracy level is 2, the count is exact for up to 8000 distinct values.
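One common way to get exact answers for small sets is to keep an exact set until it passes a size limit, and only then switch to a fixed-size HyperLogLog sketch. The Python sketch below is a hypothetical illustration of that idea, not dcount()'s implementation: the class name HybridDistinctCounter, the blake2b hash, and the switch-over rule are assumptions, with p=14 (2^14 one-byte registers) and exact_limit=1000 chosen to mirror accuracy level 1.

```python
import hashlib
import math


class HybridDistinctCounter:
    """Hypothetical sketch: exact set for small inputs, HyperLogLog beyond
    a threshold. Illustrative only; not the dcount() implementation."""

    def __init__(self, p=14, exact_limit=1000):
        self.p = p                      # precision bits; m = 2**p registers
        self.m = 1 << p
        self.exact_limit = exact_limit  # assumed switch-over point
        self.exact = set()              # storage while in exact mode
        self.registers = None           # bytearray of m 1-byte counters in HLL mode

    @staticmethod
    def _hash64(value):
        # Stable 64-bit hash of the value's string form.
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _add_to_registers(self, value):
        h = self._hash64(value)
        w = 64 - self.p
        idx = h >> w                    # top p bits select a register
        rest = h & ((1 << w) - 1)       # remaining w bits
        # Rank = 1-based position of the leftmost 1-bit within the w bits.
        rank = w - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def add(self, value):
        if self.registers is None:
            self.exact.add(value)
            if len(self.exact) > self.exact_limit:
                # Too many values: replay them into a fixed-size sketch.
                self.registers = bytearray(self.m)
                for v in self.exact:
                    self._add_to_registers(v)
                self.exact = None
        else:
            self._add_to_registers(value)

    def count(self):
        if self.registers is None:
            return len(self.exact)      # exact (zero error) for small sets
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return round(m * math.log(m / zeros))
        return round(raw)


counter = HybridDistinctCounter(p=14, exact_limit=1000)
for i in range(500):
    counter.add(f"user-{i}")
small = counter.count()   # 500: still in exact mode
for i in range(500, 100_000):
    counter.add(f"user-{i}")
large = counter.count()   # an estimate close to 100000
```

Once in HLL mode, each register only keeps a running maximum, so memory stays at m bytes no matter how many values arrive; that fixed size is what the entry count column describes.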

The error bound is probabilistic, not a theoretical bound. The listed value is the standard deviation of the error distribution (the sigma), and 99.7% of the estimations have a relative error of less than 3 × sigma.
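As a worked example of the 3 × sigma rule: at the default accuracy (sigma about 0.8%), a set whose true cardinality is 1,000,000 should yield an estimate within about ±24,000 of that value roughly 99.7% of the time. The Python helper below does this arithmetic for any accuracy level; its name is illustrative, and it treats the tabulated error as a relative standard deviation, as described above.

```python
# Documented sigma (relative standard error) per accuracy level, in percent.
SIGMA_PERCENT = {0: 1.6, 1: 0.8, 2: 0.4, 3: 0.28, 4: 0.2}


def three_sigma_bounds(true_count, accuracy=1):
    """Range that about 99.7% of estimates should fall within
    for a set whose true cardinality is true_count (hypothetical helper)."""
    spread = 3 * SIGMA_PERCENT[accuracy] / 100 * true_count
    return true_count - spread, true_count + spread


low, high = three_sigma_bounds(1_000_000, accuracy=1)
# low and high are about 976,000 and 1,024,000
```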