Example: Top 10 data sources
This example demonstrates how to find the top 10 data sources which are generating the most events.
In the query below, we use the following columns, on the events view. See Domain Model for more information on views.
Columns | Type | Description |
---|---|---|
data_source_id | string | The unique identifier assigned to each configured datasource |
original_time | long | The time in milliseconds the event was generated by the reporting device |
Query
events_all
| project
event_uuid,
data_source_id,
original_time
| where original_time > ago(24h) and isnotempty(data_source_id)
| lookup datasources on $left.data_source_id == $right.data_source_id
| summarize EventCount=count_distinct(event_uuid) by DataSourceName=data_source_name
| order by EventCount desc
| take 10
Results
DataSourceName | EventCount |
---|---|
CheckPointSource1 | 102394 |
CheckPointSource2 | 101391 |
paloAltoPASeriesSource2 | 82714 |
paloAltoPASeriesSource7 | 81714 |
microsoftWindowsSource2 | 75766 |
microsoftWindowsSource1 | 73766 |
microsoftWindowsSource9 | 72170 |
microsoftWindowsSource301 | 21170 |
ciscoASASource2 | 20172 |
ciscoASASource18 | 19272 |
Query explanation
The beginning of the query is pretty straight forward. We project the required columns and perform any convenience conversions to make working with columns easier.
| project event_uuid, low_level_categories, original_time
Next, ensure the where clause(s) reflects exactly what you are looking for to avoid unnecessary cost. In this case we are looking for events in the last 24 hours, and data_source_id's which are not empty. isnotempty is used to check for both nulls, and empty string's.
| where original_time > ago(24h) and isnotempty(data_source_id)
In the results, we want to display the name of the data source, and not the id which is stored with the event itself. To do this, we use the lookup operator to fetch the data_source_name based on the data_source_id. After the lookup operator is called, data_source_name will be added to the projection in addition to the data_source_id.
| lookup datasources on $left.data_source_id == $right.data_source_id
Once the name has been resolved, we summarize using count_distinct() on the event_uuid column to only count the number of events, by the resolved data_source_name. The resulting projection from summarize will only include DataSourceName and EventCount. All prior columns are removed by the operator.
| summarize EventCount=count_distinct(event_uuid)
by DataSourceName=data_source_name
The remaining query is simply call to sort on the EventCount column, followed by take 10 resulting in the top 10 data sources.
| sort by EventCount desc
| take 10
Alternative approach
An alternative approach to using summarize is to use top-hitters. With this operator, it performs the same operation as summarize, but is typically more efficient as it is an approximation. See KQL Compatibility for differences in the top-hitters implementation.
Query
events_all
| project
event_uuid,
data_source_id,
original_time
| where original_time > ago(24h) and isnotempty(data_source_id)
| lookup datasources on $left.data_source_id == $right.data_source_id
| top-hitters 10 of data_source_name
| project DataSourceName=data_source_name, EventCount=approximate_count_data_source_name
Results
The results here are the same as the easlier query with summarize. The EventCount is aggregated by the DataSourceName.
DataSourceName | EventCount |
---|---|
CheckPointSource1 | 102394 |
CheckPointSource2 | 101391 |
paloAltoPASeriesSource2 | 82714 |
paloAltoPASeriesSource7 | 81714 |
microsoftWindowsSource2 | 75766 |
microsoftWindowsSource1 | 73766 |
microsoftWindowsSource9 | 72170 |
microsoftWindowsSource301 | 21170 |
ciscoASASource2 | 20172 |
ciscoASASource18 | 19272 |
The only significant difference here is that we added an additional project to change the names and order of the columns. By default, top-hitters generates a column named approximate_count_data_source_name so renaming it to EventCount just cleans it up.