dcount() (aggregation function)
See KQL Compatibility for differences in the implementation.
Returns an estimate for the number of distinct values that are taken by a scalar expression in the summary group.
The dcount() aggregation function is primarily useful for estimating the cardinality of huge sets. It trades accuracy for performance, and may return a result that varies between executions. The order of inputs may have an effect on its output.
Syntax
dcount (Expr[, Accuracy])
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| Expr | string | ✓ | A scalar expression whose distinct values are to be counted. |
| Accuracy | int | | An optional int literal that defines the requested estimation accuracy. See below for supported values. If unspecified, the default value 1 is used. |
Accuracy, if specified, controls the balance between speed and accuracy.
| Value | Description |
|---|---|
| 0 | The least accurate and fastest calculation; about 1.6% error. |
| 1 | The default, which balances accuracy and calculation time; about 0.8% error. |
| 2 | Accurate and slow calculation; about 0.4% error. |
| 3 | Extra accurate and slow calculation; about 0.28% error. |
| 4 | Super accurate and slowest calculation; about 0.2% error. |
Example
Returns an estimate of the number of distinct name values in events from the last five minutes.
events
| project name, original_time
| where original_time > ago(5m)
| summarize NameCount=dcount(name)
Results
| NameCount |
|---|
| 2032 |
Example
Get an exact count of distinct values of V grouped by G.
T | summarize by V, G | summarize count() by G
This calculation requires a great amount of internal memory, since distinct values of V are multiplied by the number of distinct values of G. It may result in memory errors or large execution times.
dcount() provides a fast and reliable alternative:
T | summarize dcount(V) by G
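To see why the exact form is memory-hungry, here is a minimal Python sketch of what an exact per-group distinct count has to keep in memory. It is only an illustration of the idea, not how the query engine works; the function name and the `(v, g)` row shape are assumptions made for this example.

```python
from collections import defaultdict


def exact_distinct_per_group(rows):
    """Exact distinct count of v per group g.

    `rows` is an iterable of (v, g) pairs (hypothetical input shape).
    Every distinct (v, g) combination is stored, so memory grows with
    the number of distinct pairs rather than with the number of groups.
    """
    seen = defaultdict(set)
    for v, g in rows:
        seen[g].add(v)  # duplicates of v within a group are absorbed by the set
    return {g: len(values) for g, values in seen.items()}


if __name__ == "__main__":
    sample = [("a", 1), ("b", 1), ("a", 1), ("a", 2)]
    print(exact_distinct_per_group(sample))
```

An estimator such as dcount() instead keeps a fixed-size summary per group, which is why its memory use does not grow with the number of distinct values.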
Estimation accuracy
The dcount() aggregate function uses a variant of the HyperLogLog (HLL) algorithm, which does a stochastic estimation of set cardinality. The algorithm provides a "knob" that can be used to balance accuracy against execution time and memory size:
| Accuracy | Error (%) | Entry count |
|---|---|---|
| 0 | 1.6 | 2^12 |
| 1 | 0.8 | 2^14 |
| 2 | 0.4 | 2^16 |
| 3 | 0.28 | 2^17 |
| 4 | 0.2 | 2^18 |
The "entry count" column is the number of 1-byte counters in the HLL implementation.
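The error figures in the table are consistent with the textbook HyperLogLog relative standard error of roughly 1.04 / sqrt(m) for m counters. The short Python check below assumes that the entry counts are the powers of two 2^12 through 2^18; the helper names are illustrative and not part of any KQL API.

```python
import math

# Assumption: the "Entry count" column lists powers of two, one per accuracy level.
ENTRY_COUNT = {0: 2**12, 1: 2**14, 2: 2**16, 3: 2**17, 4: 2**18}


def hll_standard_error_percent(entries):
    """Textbook HyperLogLog relative standard error, 1.04 / sqrt(m), in percent."""
    return 100 * 1.04 / math.sqrt(entries)


def counter_memory_kib(entries):
    """Memory taken by the counters alone, at 1 byte per counter, in KiB."""
    return entries / 1024


if __name__ == "__main__":
    for accuracy, entries in ENTRY_COUNT.items():
        error = hll_standard_error_percent(entries)
        memory = counter_memory_kib(entries)
        print(f"accuracy={accuracy} entries={entries} error~{error:.2f}% memory={memory:.0f} KiB")
```

Each step up in accuracy buys a smaller error at the cost of more counters to update and merge, which is the trade-off the Accuracy argument exposes.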
The algorithm includes some provisions for doing a perfect count (zero error), if the set cardinality is small enough:
- When the accuracy level is 1, sets of up to 1000 distinct values are counted exactly.
- When the accuracy level is 2, sets of up to 8000 distinct values are counted exactly.
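The combination of an exact count for small sets and an HLL estimate for larger ones can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual dcount() implementation: the class name, the BLAKE2b hash, and the point at which it switches from an exact set to HLL counters are all choices made for this example. The defaults (2^14 counters, exact up to 1000 values) mirror accuracy level 1 as described above.

```python
import hashlib
import math


class SmallSetThenHLL:
    """Exact set up to `exact_limit` distinct values, HyperLogLog counters beyond it."""

    def __init__(self, precision_bits=14, exact_limit=1000):
        self.p = precision_bits
        self.m = 1 << precision_bits      # number of 1-byte counters
        self.exact_limit = exact_limit
        self.exact = set()                # holds raw values while the set is small
        self.registers = None             # allocated only after the limit is exceeded

    @staticmethod
    def _hash64(value):
        # Assumption: any well-mixed 64-bit hash works; BLAKE2b is used for convenience.
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _add_hash(self, h):
        width = 64 - self.p
        index = h >> width                       # top p bits select a counter
        rest = h & ((1 << width) - 1)            # remaining bits
        rank = width - rest.bit_length() + 1     # leading zeros in `rest`, plus one
        if rank > self.registers[index]:
            self.registers[index] = rank

    def add(self, value):
        if self.registers is not None:
            self._add_hash(self._hash64(value))
            return
        self.exact.add(value)
        if len(self.exact) > self.exact_limit:
            # Too many distinct values for the exact path: switch to HLL counters.
            self.registers = bytearray(self.m)
            for stored in self.exact:
                self._add_hash(self._hash64(stored))
            self.exact = None

    def count(self):
        if self.registers is None:
            return len(self.exact)               # zero-error path for small sets
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # standard HLL constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return round(m * math.log(m / zeros))  # small-range (linear counting) correction
        return round(raw)


if __name__ == "__main__":
    sketch = SmallSetThenHLL()
    for i in range(100_000):
        sketch.add(f"user-{i}")
    print(sketch.count())
```

Once the counters take over, the result is an estimate whose spread follows the error figures in the table above, which is why repeated or reordered inputs may produce slightly different answers in a real engine.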
The error bound is probabilistic, not a theoretical bound. The value is the standard deviation of error distribution (the sigma), and 99.7% of the estimations will have a relative error of under 3 x sigma.
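As a worked example of the 3 x sigma rule, the Python sketch below turns an estimate into an approximate range that should contain the true count about 99.7% of the time. The sigma values are taken from the accuracy table; the function name is hypothetical, and applying the relative error to the estimate rather than to the unknown true value is a small approximation.

```python
# Standard deviation of the relative error per accuracy level, in percent (from the table).
SIGMA_PERCENT = {0: 1.6, 1: 0.8, 2: 0.4, 3: 0.28, 4: 0.2}


def three_sigma_range(estimate, accuracy=1):
    """Approximate (low, high) range around a dcount() estimate at 3 x sigma."""
    margin = estimate * 3 * SIGMA_PERCENT[accuracy] / 100
    return estimate - margin, estimate + margin


if __name__ == "__main__":
    low, high = three_sigma_range(1_000_000, accuracy=1)
    print(f"true count is very likely between {low:.0f} and {high:.0f}")
```

At the default accuracy this means a relative error of under 2.4% for nearly all estimates, while accuracy level 4 narrows it to under 0.6%.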