Example: Unparsed events
This example demonstrates how to find a count of the number of a events ingested which were unable to be parsed within the last 24 hours. The value of this query is to see gain insight into how much of your data is being normalized.
In the query below, we use the following columns, on the events view. See Domain Model for more information on views.
Columns | Type | Description |
---|---|---|
event_uuid | string | A GUID assigned to each event ingested |
low_level_categories | dynamic: Array of string | The categories associated with event at parse time |
original_time | datetime | The datetime the event was generated by the reporting device |
Query
events
| project event_uuid, low_level_categories,
original_time
| where original_time > ago(24h)
| mv-expand category=low_level_categories to typeof(int)
| summarize TotalEvents=count_distinct(event_uuid),
UnparsedEvents=count_distinctif(event_uuid, category > 10000 and category < 11000)
| project TotalEvents, UnparsedEvents,
PctUnparsed=(UnparsedEvents/TotalEvents)*100
Results
The results show there is 8.25% of events ingested in the last 24h were unparsed.
TotalEvents | UnparsedEvents | PctUnparsed |
---|---|---|
31125023 | 2568919 | 8.25 |
Query explanation
The beginning of the query is pretty straight forward. We project the required columns and perform any convenience conversions to make working with columns easier.
| project event_uuid, low_level_categories, original_time
Next, ensure the where clause(s) reflects exactly what you are looking for to avoid unnecessary cost. In this case we are looking for events in the last 24 hours.
| where original_time > ago(24h)
Next, we need to be able to perform range comparisons on the values of the low_level_categories array. If any of the array elements are between the range of 10000 and 11000, they are considered to be unparsed. Before we can do this, we need to unroll the array and cast the type from string to int. This is done with mv-expand.
| mv-expand category=low_level_categories to typeof(int)
Once the array is unrolled and the individual low_level_categories are now available as a column, we can now simply summarize the rows using the count_distinctif() function on the event_uuid, specifying a predicate expressing the appropriate range.
At the same time, we will use count_distinct() to count the TotalEvents so we can later calculate the percentage of UnparsedEvents.
| summarize TotalEvents=count_distinct(event_uuid),
UnparsedEvents=count_distinctif(event_uuid, category > 10000 and category < 11000)
In the final step, we are restating the projection and including a new calculated column PctUnparsed which will calculate the percentage of unparsed events overall.
| project TotalEvents, UnparsedEvents,
PctUnparsed=(UnparsedEvents/TotalEvents)*100
Understanding mv-expand
mv-expand can be a little confusing. For each element of the input array, the original row is replicated with the addition of the new column containing exactly one array element.
Without mv-expand
To demonstrate using the example above, if we query a single record without mv-expand.
events
| project event_uuid, low_level_categories,
original_time
| where original_time > ago(24h)
| take 1
Results
Notice there is a single row, and low_level_categories contains an array of three elements
event_uuid | low_level_categories | original_time |
---|---|---|
0e851e58-db76-4c84-b64c-ccc58ce84e4b | [8055,8001,9033] | 2023-01-31 18:13:37.883 |
With mv-expand
Furthering the example, returning mv-expand to the end of the query to show how applying it transforms a single row into three.
events
| project event_uuid, low_level_categories,
original_time
| where original_time > ago(24h)
| take 1
| mv-expand category=low_level_categories to typeof(int)
Results
Notice there are three rows and each are identical except for the category column. There is a single row for each and every element in the low_level_categories array.
event_uuid | low_level_categories | original_time | category |
---|---|---|---|
0e851e58-db76-4c84-b64c-ccc58ce84e4b | [8055,8001,9033] | 2023-01-31 18:13:37.883 | 8055 |
0e851e58-db76-4c84-b64c-ccc58ce84e4b | [8055,8001,9033] | 2023-01-31 18:13:37.883 | 8001 |
0e851e58-db76-4c84-b64c-ccc58ce84e4b | [8055,8001,9033] | 2023-01-31 18:13:37.883 | 9033 |