GitHubContribute in GitHub: Edit online

Example: Top 10 data sources

This example demonstrates how to find the top 10 data sources which are generating the most events.

In the query below, we use the following columns, on the events view. See Domain Model for more information on views.

Columns Type Description
data_source_id string The unique identifier assigned to each configured datasource
original_time long The time in milliseconds the event was generated by the reporting device

Query

events_all
    | project
        event_uuid,
        data_source_id,
        original_time
    | where original_time > ago(24h) and isnotempty(data_source_id)
    | lookup datasources on $left.data_source_id == $right.data_source_id
    | summarize EventCount=count_distinct(event_uuid) by DataSourceName=data_source_name
    | order by EventCount desc
    | take 10

Results

DataSourceName EventCount
CheckPointSource1 102394
CheckPointSource2 101391
paloAltoPASeriesSource2 82714
paloAltoPASeriesSource7 81714
microsoftWindowsSource2 75766
microsoftWindowsSource1 73766
microsoftWindowsSource9 72170
microsoftWindowsSource301 21170
ciscoASASource2 20172
ciscoASASource18 19272

Query explanation

The beginning of the query is pretty straight forward. We project the required columns and perform any convenience conversions to make working with columns easier.

 | project event_uuid, low_level_categories, original_time

Next, ensure the where clause(s) reflects exactly what you are looking for to avoid unnecessary cost. In this case we are looking for events in the last 24 hours, and data_source_id's which are not empty. isnotempty is used to check for both nulls, and empty string's.

  | where original_time > ago(24h) and isnotempty(data_source_id)

In the results, we want to display the name of the data source, and not the id which is stored with the event itself. To do this, we use the lookup operator to fetch the data_source_name based on the data_source_id. After the lookup operator is called, data_source_name will be added to the projection in addition to the data_source_id.

  | lookup datasources on $left.data_source_id == $right.data_source_id

Once the name has been resolved, we summarize using count_distinct() on the event_uuid column to only count the number of events, by the resolved data_source_name. The resulting projection from summarize will only include DataSourceName and EventCount. All prior columns are removed by the operator.

  | summarize EventCount=count_distinct(event_uuid) 
        by DataSourceName=data_source_name

The remaining query is simply call to sort on the EventCount column, followed by take 10 resulting in the top 10 data sources.

  | sort by EventCount desc
  | take 10


Alternative approach

An alternative approach to using summarize is to use top-hitters. With this operator, it performs the same operation as summarize, but is typically more efficient as it is an approximation. See KQL Compatibility for differences in the top-hitters implementation.

Query

    events_all
      | project
          event_uuid,
          data_source_id,
          original_time
      | where original_time > ago(24h) and isnotempty(data_source_id)
      | lookup datasources on $left.data_source_id == $right.data_source_id
      | top-hitters 10 of data_source_name
      | project DataSourceName=data_source_name, EventCount=approximate_count_data_source_name

Results

The results here are the same as the easlier query with summarize. The EventCount is aggregated by the DataSourceName.

DataSourceName EventCount
CheckPointSource1 102394
CheckPointSource2 101391
paloAltoPASeriesSource2 82714
paloAltoPASeriesSource7 81714
microsoftWindowsSource2 75766
microsoftWindowsSource1 73766
microsoftWindowsSource9 72170
microsoftWindowsSource301 21170
ciscoASASource2 20172
ciscoASASource18 19272

The only significant difference here is that we added an additional project to change the names and order of the columns. By default, top-hitters generates a column named approximate_count_data_source_name so renaming it to EventCount just cleans it up.