Kusto Query Language (KQL) overview
Kusto Query Language (KQL) is a powerful tool for exploring your data: discovering patterns, identifying anomalies and outliers, building statistical models, and more.
What is a Kusto query?
A Kusto query is a read-only request to process data and return results. The request is stated in plain text, using a data-flow model that is easy to read, author, and automate. Kusto queries are made of one or more query statements.
What is a query statement?
There are two kinds of user query statements:
- A tabular expression statement
- A let statement (currently not supported; see Support Difference)
The most common kind of query statement is a tabular expression statement, which means both its input and output consist of tables or tabular datasets. Tabular statements contain zero or more operators, each of which starts with a tabular input and returns a tabular output. Operators are sequenced by a | (pipe). Data flows, or is piped, from one operator to the next. The data is filtered or manipulated at each step and then fed into the following step.
It's like a funnel, where you start out with an entire data table. Each time the data passes through another operator, it is filtered, rearranged, or summarized. Because the piping of information from one operator to another is sequential, the query operator order is important and can affect both results and performance. At the end of the funnel, you're left with a refined output.
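As a minimal sketch of why operator order matters, consider the events_all table and the original_time and user_id columns used in the example that follows. Once project drops a column, later operators can no longer see it, so the first query below should be rejected while the second runs. The // comments are annotations added for this sketch and assume your query editor accepts them.

```kusto
// Expected to fail: project keeps only user_id,
// so the later where cannot resolve original_time.
events_all
| project user_id
| where original_time > ago(5m)
| count

// Works: original_time is still part of the projection when where runs.
events_all
| project original_time, user_id
| where original_time > ago(5m)
| count
```

The only difference between the two queries is which columns survive the project step, which is enough to change whether the where step can run at all.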
Let's look at an example query. Note that KQL is case-sensitive for everything: table names, table column names, operators, functions, and so on.
events_all | project original_time, user_id | where original_time > ago(5m) and user_id contains "user" | count
This query has a single tabular expression statement. The statement begins with a reference to a table called events_all. On its own, this table reference means: fetch all data for all time, unconstrained. That can be helpful, but it is very costly, so always constrain it with additional operators.
The second step applies the project operator to select only the columns we are interested in. The events_all table has 400+ columns and this query needs only 2 of them, so including the others would incur a huge cost with no benefit at all. Always include a project operator:
| project original_time, user_id
The third step applies the where operator to constrain the projected data with a list of predicates. Here, we keep only records that were generated by data sources in the last five minutes and whose user_id contains the characters "user".
| where original_time > ago(5m) and user_id contains "user"
This query can also be written with the logical and split across two where clauses. This is useful for clarity and plays to the strength of KQL's highly iterative nature. Keep in mind that consecutive where clauses are combined with an implied and, so these two forms of where are equivalent.
| where original_time > ago(5m)
| where user_id contains "user"
The final step applies the count operator. It takes all the records that matched the where clause and performs a simple aggregation: it counts them. The output is a table with a single row holding a single long value.
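Putting the steps together, the whole example could be laid out one operator per line, as in the sketch below. It uses the same events_all table and columns as above and the two-clause form of where. The // comments are annotations added here, assuming your query editor accepts them.

```kusto
events_all                          // start from the table: all data, all time
| project original_time, user_id    // keep only the two columns the query needs
| where original_time > ago(5m)     // records from the last five minutes
| where user_id contains "user"     // implied and with the previous where
| count                             // aggregate the matches into one long value
```

Writing one operator per line makes it easy to add, remove, or reorder a step while iterating on a query.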