GitHubContribute in GitHub: Edit online

Example: Unparsed events

This example demonstrates how to find a count of the number of a events ingested which were unable to be parsed within the last 24 hours. The value of this query is to see gain insight into how much of your data is being normalized.

In the query below, we use the following columns, on the events view. See Domain Model for more information on views.

Columns Type Description
event_uuid string A GUID assigned to each event ingested
low_level_categories dynamic: Array of string The categories associated with event at parse time
original_time datetime The datetime the event was generated by the reporting device

Query

events
    | project event_uuid, low_level_categories, 
        original_time
    | where original_time > ago(24h)
    | mv-expand category=low_level_categories to typeof(int)
    | summarize TotalEvents=count_distinct(event_uuid), 
        UnparsedEvents=count_distinctif(event_uuid, category > 10000 and category < 11000)
    | project TotalEvents, UnparsedEvents, 
        PctUnparsed=(UnparsedEvents/TotalEvents)*100

Results

The results show there is 8.25% of events ingested in the last 24h were unparsed.

TotalEvents UnparsedEvents PctUnparsed
31125023 2568919 8.25

Query explanation

The beginning of the query is pretty straight forward. We project the required columns and perform any convenience conversions to make working with columns easier.

| project event_uuid, low_level_categories, original_time

Next, ensure the where clause(s) reflects exactly what you are looking for to avoid unnecessary cost. In this case we are looking for events in the last 24 hours.

| where original_time > ago(24h)

Next, we need to be able to perform range comparisons on the values of the low_level_categories array. If any of the array elements are between the range of 10000 and 11000, they are considered to be unparsed. Before we can do this, we need to unroll the array and cast the type from string to int. This is done with mv-expand.

| mv-expand category=low_level_categories to typeof(int)

Once the array is unrolled and the individual low_level_categories are now available as a column, we can now simply summarize the rows using the count_distinctif() function on the event_uuid, specifying a predicate expressing the appropriate range.

At the same time, we will use count_distinct() to count the TotalEvents so we can later calculate the percentage of UnparsedEvents.

| summarize TotalEvents=count_distinct(event_uuid), 
    UnparsedEvents=count_distinctif(event_uuid, category > 10000 and category < 11000)
        

In the final step, we are restating the projection and including a new calculated column PctUnparsed which will calculate the percentage of unparsed events overall.

| project TotalEvents, UnparsedEvents, 
    PctUnparsed=(UnparsedEvents/TotalEvents)*100        


Understanding mv-expand

mv-expand can be a little confusing. For each element of the input array, the original row is replicated with the addition of the new column containing exactly one array element.

Without mv-expand

To demonstrate using the example above, if we query a single record without mv-expand.

events
  | project event_uuid, low_level_categories,
      original_time
  | where original_time > ago(24h)
  | take 1

Results

Notice there is a single row, and low_level_categories contains an array of three elements

event_uuid low_level_categories original_time
0e851e58-db76-4c84-b64c-ccc58ce84e4b [8055,8001,9033] 2023-01-31 18:13:37.883

With mv-expand

Furthering the example, returning mv-expand to the end of the query to show how applying it transforms a single row into three.

events
  | project event_uuid, low_level_categories,
      original_time
  | where original_time > ago(24h)
  | take 1
  | mv-expand category=low_level_categories to typeof(int)

Results

Notice there are three rows and each are identical except for the category column. There is a single row for each and every element in the low_level_categories array.

event_uuid low_level_categories original_time category
0e851e58-db76-4c84-b64c-ccc58ce84e4b [8055,8001,9033] 2023-01-31 18:13:37.883 8055
0e851e58-db76-4c84-b64c-ccc58ce84e4b [8055,8001,9033] 2023-01-31 18:13:37.883 8001
0e851e58-db76-4c84-b64c-ccc58ce84e4b [8055,8001,9033] 2023-01-31 18:13:37.883 9033