How you can unlock true insights into your data using KQL and Azure Data Explorer

    By Adrian Harris, Platform Engineer


Getting a good handle on your data, logs, and metrics is an essential part of running any good digital service. With a handful of tools and services from Microsoft, the data already at our fingertips can be turned into valuable, actionable business insights. In this blog, we’ll walk through how we leveraged the power of Azure Data Explorer and the Kusto Query Language, and how you could unlock key business value from the data in your own digital enterprise.

    Putting Together a Hook and a Line

Let’s start with the tools. Azure Data Explorer (ADX) is an exploratory analytics platform for querying and visualising data from multiple sources, making it well suited to gaining insights, spotting trends, and forecasting. Think of it as a platform that specialises in collating vast amounts of data in one location, making it easy to analyse high volumes in near real time. ADX also sits behind many services you may already know from working within Azure, such as Azure Monitor Logs, Application Insights, Time Series Insights, and Microsoft Defender for Endpoint. This makes it an excellent starting point for collating all our data in one unified place.

ADX runs as a highly scalable cluster comprising a collection of databases and tables. The image below illustrates how ADX can ingest data from across an environment with a broad collection of security services, along with possible use cases for visualising the insights that can be extracted from those data sets.

[Image: ADX ingesting data from security services across an environment, with example visualisation use cases]
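To make this concrete, here is a minimal, hedged sketch of the kind of management commands used to define a table in an ADX database and ingest a few rows into it. The table name ‘SampleRequests’ and its schema are hypothetical, purely for illustration (each command is run separately):

// Hypothetical table, for illustration only
.create table SampleRequests (TimeGenerated: datetime, Name: string, ClientIP: string, ResultCode: string)

// Ingest two sample rows inline (CSV by default)
.ingest inline into table SampleRequests <|
2024-01-01T00:00:00Z,ValidateDetails,203.0.113.10,200
2024-01-01T00:00:05Z,ValidateDetails,203.0.113.10,429

In practice, data would arrive through the connectors and ingestion pipelines shown above rather than inline commands, but the underlying database and table model is the same.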

Now that we have identified all our data sources and plugged them into our new ADX solution, we’ll move on to the Kusto Query Language (KQL).

Developed by Microsoft, and since made open source, KQL is a powerful language baked into ADX by default. It is used to explore data, discover patterns, identify anomalies, and build statistical models, among other things. Queries operate over schema entities that are organised into databases, tables, and columns.

If you are familiar with SQL, you will have no issues using it, and even those who are not will pick it up very quickly thanks to its approachable syntax. If you have experience with Graylog, Splunk, Elastic, or visualisation tools such as Grafana, KQL provides all that power in one easy-to-use interface, allowing you to deep dive into the data without traversing different interfaces.
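As a quick taster of that syntax, here is an illustrative side-by-side with SQL; the ‘Requests’ table and its columns are hypothetical:

// T-SQL:  SELECT ClientIP, COUNT(*) FROM Requests
//         WHERE TimeGenerated > DATEADD(hour, -1, GETUTCDATE())
//         GROUP BY ClientIP
// The equivalent KQL reads top to bottom as a pipeline:
Requests
| where TimeGenerated > ago(1h)
| summarize count() by ClientIP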

Many businesses across many sectors run one or more of the systems above for log aggregation, so to put this into perspective, let’s look at an example of how ADX and KQL could be used to unlock the potential of your own data.

A user wants to understand why, at certain times of the day, they are unable to perform actions on the website that would otherwise take a couple of seconds.

    To triage this issue, a typical workflow would be:

[Image: typical triage workflow diagram]

Based on this scenario, investigating the issue drew on a variety of data points from across the platform, pooled into a central log source. The information came from services such as Azure Front Door, APIM, and application logs, each siloed in its own respective log location. We identified that all of these data sources could be ingested into ADX, which let us see the journey from start to finish and work out the problem. Leveraging the power of KQL and ADX together, we could start to get insights across all of our data sets.
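As a sketch of what that unification enables, a single KQL query can traverse several ingested sources at once. The Front Door and APIM table names below are hypothetical stand-ins for whatever your ingestion pipeline creates:

// Table names are illustrative; your ingestion setup defines the real ones
union withsource=SourceTable FrontDoorAccessLogs, ApimGatewayLogs, AppRequests
| where TimeGenerated >= ago(1h)
| summarize Events = count() by SourceTable, bin(TimeGenerated, 5m)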

We had also noticed that over the previous three days the platform had been quite noisy (HTTP 429 responses on certain APIs, and functions scaling out more than normal), so we would need to look further back to find trends in the data we could use for the investigation.
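A quick sanity check of that noise might look like the sketch below, assuming throttled responses land in the AppRequests table with a ResultCode of 429:

// Throttled (HTTP 429) responses per hour over the last three days
AppRequests
| where TimeGenerated >= ago(3d)
| where ResultCode == "429"
| summarize ThrottledRequests = count() by bin(TimeGenerated, 1h)
| render timechart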

Let’s explore how this could be achieved once all the data is plugged into the ADX platform, and how KQL could be used to extract insights into what happened.

Structurally, a Kusto query has one or more query statements and returns data in a tabular or graph format. The most common kind of query statement is a tabular expression statement: both its input and its output consist of tables or tabular data sets. Our first query is a single tabular expression statement. It begins with a reference to the table ‘AppRequests’ and uses the ‘where’ operator to filter on ‘TimeGenerated’:

AppRequests
| where TimeGenerated >= ago(3d)

Operators are separated by a pipe, so we can progressively extend and filter our query. In our case, we will restrict it to a specific web page, one that validates some details.

| where Name contains "<webPage event name>"

We can then aggregate the output using the ‘summarize’ operator, counting the number of requests per one-minute bin, per client IP, across the three-day window.

| summarize count() by bin(TimeGenerated, 1m), ClientIP

This is a great start, but the query pulled back too many requests to make sense of the event, so we will now use a join to remove the ones we don't need.

Let’s extend this to dig further into our data. We need to join against a search for the same IP addresses that have appeared more than once, and to do that we use a ‘leftsemi’ join, which returns all the records from the left side that have a match on the right. Using this join, we can then remove any IP addresses that appear only in small volumes over a short period of time.
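To make those semantics concrete, here is a minimal, self-contained sketch using made-up values:

// Only rows from Left whose ClientIP also appears in Right are returned;
// no columns from Right are added to the output
let Left = datatable(ClientIP: string) ["198.51.100.1", "198.51.100.2", "198.51.100.3"];
let Right = datatable(ClientIP: string) ["198.51.100.2", "198.51.100.3"];
Left
| join kind=leftsemi Right on ClientIP
// Returns the rows for 198.51.100.2 and 198.51.100.3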

Let’s wrap that up, this time using a larger one-day window, and count how many times each IP appears in that list. This keeps only those IPs that have, firstly, appeared more than 200 times in a day (request count) and, secondly, appeared on at least two separate days within the three-day period.

| join kind=leftsemi (
    AppRequests
    | where TimeGenerated >= ago(3d)
    | where Name contains "<webPage event name>"
    | summarize count() by bin(TimeGenerated, 1d), ClientIP
    | where count_ > 200
    | summarize Appearances = count() by ClientIP
    | where Appearances >= 2
) on ClientIP

Perfect: KQL saved the day, and we have the IP we have been looking for. Let’s go one step further and render the query in a form we can visualise by appending the line below:

    | render columnchart

[Image: column chart of request counts per minute, by client IP]

It’s worth noting here that this site always has many people accessing it, so we aim for a fair-usage policy to provide a good user experience.

Now that we had identified the specific IPs showing excessive and frequent usage, we were able to further isolate them using session data (a sketch of this follows the list below), which allowed us to apply a variety of remediations, such as:

• Improving our WAF rules
• Finding accounts used by bots, which helped us understand the market for such tools and led to further improvements in our applications.
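As a hedged sketch of that isolation step, assuming the AppRequests table also carries a SessionId column alongside ClientIP:

// Collect the IPs flagged by the earlier query, then pull their sessions
let FlaggedIPs = AppRequests
    | where TimeGenerated >= ago(3d)
    | summarize count() by ClientIP
    | where count_ > 200
    | project ClientIP;
AppRequests
| where TimeGenerated >= ago(3d)
| join kind=leftsemi FlaggedIPs on ClientIP
| summarize Requests = count() by SessionId, ClientIP
| order by Requests desc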

This is one of many examples of how the power of ADX and KQL has had a big positive impact for our clients. Using these tools, we've been able to introduce much greater visibility and flexibility when gathering insights from data that businesses already have, leading to substantial improvements in security and observability.

Site Reliability Engineering embodies a unique set of principles and practices that takes key, proven engineering practices and applies them to the world of infrastructure and operations. Teams can utilise the power of KQL to build rich monitoring and alerting solutions, using the language to draw more advanced insights from their data and so achieve faster triage and remediation times. This is not limited to technical teams, however: a product owner might use it to visualise how their service is being used, how efficient it is, and whether it is being used in the way it was originally designed. All of this is accessible from a single, powerful exploratory data analytics platform.
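As one hedged example of such an alerting query, assuming the AppRequests Success column reflects request outcomes, a rule like the following could fire whenever the recent failure rate crosses a threshold:

// Alert if more than 5% of requests failed in the last five minutes
AppRequests
| where TimeGenerated >= ago(5m)
| summarize Total = count(), Failed = countif(Success == false)
| extend FailureRatePercent = 100.0 * Failed / Total
| where FailureRatePercent > 5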

    Interested to learn more?

    Find out more about our Technology and Engineering service here, or get in touch.
