dcount() (aggregation function)

See KQL Compatibility for differences in the implementation.

Returns an estimate for the number of distinct values that are taken by a scalar expression in the summary group.

The dcount() aggregation function is primarily useful for estimating the cardinality of huge sets. It trades accuracy for performance, and may return a result that varies between executions. The order of inputs may affect its output.

Syntax

dcount(Expr[, Accuracy])

Arguments

Name	Type	Required	Description
Expr	string	✓	A scalar expression whose distinct values are to be counted.
Accuracy	int		An optional `int` literal that defines the requested estimation accuracy. See below for supported values. If unspecified, the default value `1` is used.

Accuracy, if specified, controls the balance between speed and accuracy.

Value	Description
0	The least accurate and fastest calculation; about 1.6% error.
1	The default, which balances accuracy and calculation time; about 0.8% error.
2	Accurate and slow calculation; about 0.4% error.
3	Extra accurate and slow calculation; about 0.28% error.
4	Super accurate and slowest calculation; about 0.2% error.
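
To see the trade-off in practice, the Accuracy argument can be passed as an `int` literal. The following is a minimal sketch that reuses the events table and the name and original_time columns from the example on this page. The output column names FastCount and PreciseCount are illustrative only, and the sketch assumes that several aggregations can be combined in one summarize operator:

```kql
// Estimate the same cardinality at the lowest and highest accuracy levels.
events
    | where original_time > ago(5m)
    | summarize FastCount=dcount(name, 0), PreciseCount=dcount(name, 4)
```

The two estimates would be expected to differ slightly, with PreciseCount usually closer to the true number of distinct values at the cost of a slower calculation.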

Example

Returns an estimate of the number of distinct values of name in events from the last five minutes.

events
    | project name, original_time
    | where original_time > ago(5m)
    | summarize NameCount=dcount(name)

Results

NameCount
2032

Example

Get an exact count of distinct values of V grouped by G.

T | summarize by V, G | summarize count() by G

This calculation requires a large amount of internal memory, because the number of distinct values of V is multiplied by the number of distinct values of G. It may result in memory errors or long execution times. dcount() provides a fast and reliable alternative that returns an estimate instead of an exact count:

T | summarize dcount(V) by G

Estimation accuracy

The dcount() aggregation function uses a variant of the HyperLogLog (HLL) algorithm, which does a stochastic estimation of set cardinality. The algorithm provides a "knob" that can be used to balance accuracy against execution time and memory size:

Accuracy	Error (%)	Entry count
0	1.6	2¹²
1	0.8	2¹⁴
2	0.4	2¹⁶
3	0.28	2¹⁷
4	0.2	2¹⁸

The "entry count" column is the number of 1-byte counters in the HLL implementation.
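
As a rough cross-check, the error column is close to the textbook figure for plain HyperLogLog, which has a relative standard error of about 1.04/√m for m counters. This formula is an assumption here, not a statement about the implementation: the variant that dcount() uses may differ in detail.

```latex
\sigma \approx \frac{1.04}{\sqrt{m}}
\qquad
\begin{aligned}
m = 2^{12}: &\quad 1.04 / 64 \approx 1.6\% \\
m = 2^{14}: &\quad 1.04 / 128 \approx 0.8\% \\
m = 2^{16}: &\quad 1.04 / 256 \approx 0.4\% \\
m = 2^{17}: &\quad 1.04 / 362.04 \approx 0.29\% \\
m = 2^{18}: &\quad 1.04 / 512 \approx 0.2\%
\end{aligned}
```

Because each counter takes 1 byte, the same column also gives the approximate size of the counter array: from 2¹² bytes (4 KiB) at accuracy 0 to 2¹⁸ bytes (256 KiB) at accuracy 4.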

The algorithm includes some provisions for doing a perfect count (zero error), if the set cardinality is small enough:

When the accuracy level is 1, the count is exact for sets of up to 1000 distinct values.
When the accuracy level is 2, the count is exact for sets of up to 8000 distinct values.

The error bound is probabilistic, not a theoretical bound. The listed value is the standard deviation of the error distribution (the sigma), and 99.7% of the estimations will have a relative error of under 3 × sigma.
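
As a worked illustration, take the NameCount result of 2032 from the first example, which used the default accuracy level 1 (sigma of about 0.8%). This is only a sketch: it treats the estimate as a stand-in for the true count when it converts the relative error into an absolute one.

```latex
3\sigma = 3 \times 0.8\% = 2.4\%
\qquad
2032 \times 0.024 \approx 49
```

Under these assumptions, about 99.7% of runs would be expected to land within roughly 49 of the true number of distinct names.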