Domain 4: Secure, monitor, and troubleshoot Azure solutions

KQL for AI Apps: Querying Logs + Metrics

The query language that holds it all together. Kusto basics, the Application Insights and Container Apps tables you'll use most, and the queries that turn 'something's slow' into a fixable problem in five minutes.

Why KQL is non-negotiable

Simple explanation

Kusto Query Language (KQL) is how you read everything Azure observes. Application Insights traces, Container Apps logs, KubeEvents, Service Bus diagnostics, AKS Container Insights, Log Analytics: they all answer to KQL.

Three rules of thumb to memorise:

  • Filter early with where on time and identifier columns: less data scanned means faster queries, and lower cost on table plans that charge per GB scanned
  • Project to what you need with project after filters
  • Summarise with summarize for aggregations: sum, count, percentile, avg

The exam tests reading KQL: given a query, what does it tell you? It also tests writing KQL for common scenarios such as p95 latency, error rates, and dependency analysis.
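
The three rules of thumb can be tried without access to a real workspace. The sketch below uses a hypothetical in-memory table (`sample_requests`, built with `datatable`) in place of the real `requests` table; the rows and timestamps are invented for illustration.

```kusto
// Hypothetical sample rows standing in for the real `requests` table.
let sample_requests = datatable(timestamp: datetime, name: string, duration: real)
[
    datetime(2024-05-01 10:00:05), "POST /embed", 120.0,
    datetime(2024-05-01 10:00:40), "POST /embed", 340.0,
    datetime(2024-05-01 10:01:10), "POST /chat",  900.0,
    datetime(2024-05-01 10:01:30), "POST /embed", 95.0
];
sample_requests
// 1. Filter first: time range, then an identifier column.
| where timestamp between (datetime(2024-05-01 10:00:00) .. datetime(2024-05-01 10:02:00))
| where name == "POST /embed"
// 2. Keep only the columns the aggregation needs.
| project timestamp, duration
// 3. Aggregate per one-minute bucket.
| summarize calls = count(), avg_ms = avg(duration) by bin(timestamp, 1m)
```

Against a live workspace you would swap `sample_requests` for `requests` and the fixed time range for something like `timestamp > ago(1h)`.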

Tables you must know

| Table | Where | Holds |
| --- | --- | --- |
| requests | Application Insights | Inbound HTTP requests handled by your app |
| dependencies | Application Insights | Outbound calls (HTTP, SQL, Service Bus, Cosmos, etc.), including span data |
| traces | Application Insights | Application logs (info / warn / error) emitted via the OTel SDK |
| exceptions | Application Insights | Captured exceptions with stack traces |
| customMetrics | Application Insights | Metrics emitted via the OTel meter API |
| ContainerAppConsoleLogs_CL | Log Analytics | Container app stdout/stderr |
| ContainerAppSystemLogs_CL | Log Analytics | Container app platform events (image pulls, scale, restarts) |
| ContainerLog (AKS) | Log Analytics | AKS pod stdout/stderr (newer schema: ContainerLogV2) |
| KubeEvents (AKS) | Log Analytics | AKS pod events (scheduling, restarts, OOMKilled) |
| AzureDiagnostics | Log Analytics | Diagnostic logs for many Azure services |

Five queries that solve real problems

1. P95 latency over time

requests
| where timestamp > ago(1h)
| where name == "POST /embed"
| summarize p50 = percentile(duration, 50),
            p95 = percentile(duration, 95),
            p99 = percentile(duration, 99)
            by bin(timestamp, 1m)
| render timechart

A line chart of latency percentiles per minute. The first thing to look at when "the API feels slow."
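
The `bin(timestamp, 1m)` grouping in this query works because `bin` floors each timestamp to the start of its bucket rather than rounding to the nearest boundary. A minimal check you can paste into any query window; the sample datetime is an arbitrary value chosen for illustration.

```kusto
// bin() rounds down: 10:07:45 falls in the 10:07 one-minute bucket
// and in the 10:05 five-minute bucket (not 10:08 or 10:10).
print ts = datetime(2024-05-01 10:07:45)
| extend bucket_1m = bin(ts, 1m),
         bucket_5m = bin(ts, 5m)
```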

2. Error rate per route

requests
| where timestamp > ago(1d)
| summarize total = count(), errors = countif(success == false) by name
| extend error_rate_pct = round(100.0 * errors / total, 2)
| where total > 100
| order by error_rate_pct desc

Routes ranked by error rate, with a minimum-volume gate so noisy low-traffic routes don't dominate.

3. Dependency breakdown for slow requests

requests
| where timestamp > ago(1h) and duration > 5000   // slow ones
| project rid = id, parent = operation_Id, total_ms = duration
| join kind=inner (
    dependencies
    | where timestamp > ago(1h)   // filter the right side too, so the join scans less
    | project parent = operation_Id, dep_target = target,
              dep_type = type, dep_ms = duration
) on parent
| summarize total_dep_ms = sum(dep_ms), n = count() by dep_target, dep_type
| order by total_dep_ms desc

For requests over 5 seconds, where did the time go? Which downstream service ate the budget?

4. AI-specific custom dimension query

dependencies
| where timestamp > ago(1h)
| where name == "rag.generate"
| extend model = tostring(customDimensions['gen_ai.request.model'])
| extend tokens_out = toint(customDimensions['gen_ai.usage.output_tokens'])
| summarize calls = count(), total_tokens = sum(tokens_out),
            p95_ms = percentile(duration, 95) by model
| order by total_tokens desc

Token usage and latency per model, pulled from custom dimensions you set on your span attributes.
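
Custom dimension values typically arrive as strings inside a dynamic property bag, which is why the query wraps them in `tostring` and `toint` before grouping or summing. The sketch below shows the extraction on a hypothetical in-memory table; `sample_dependencies`, the model names `model-a` / `model-b`, and the token counts are invented for illustration.

```kusto
// Hypothetical rows shaped like `dependencies`, with attribute values stored as strings.
let sample_dependencies = datatable(name: string, duration: real, customDimensions: dynamic)
[
    "rag.generate", 820.0,  dynamic({"gen_ai.request.model": "model-a", "gen_ai.usage.output_tokens": "150"}),
    "rag.generate", 1240.0, dynamic({"gen_ai.request.model": "model-a", "gen_ai.usage.output_tokens": "310"}),
    "rag.generate", 430.0,  dynamic({"gen_ai.request.model": "model-b", "gen_ai.usage.output_tokens": "90"})
];
sample_dependencies
// Bracket notation is needed because the keys contain dots.
| extend model = tostring(customDimensions['gen_ai.request.model'])
// toint() converts the string "150" into the number 150 so it can be summed.
| extend tokens_out = toint(customDimensions['gen_ai.usage.output_tokens'])
| summarize calls = count(), avg_ms = avg(duration), total_tokens = sum(tokens_out) by model
```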

5. Container Apps system events

ContainerAppSystemLogs_CL
| where TimeGenerated > ago(30m)
| where ContainerAppName_s == "roo-vision"
| where Reason_s in ("Failed", "BackOff", "Killing", "Unhealthy")
| project TimeGenerated, Reason_s, Log_s, RevisionName_s
| order by TimeGenerated desc

Anything alarming the platform reported about a specific container app: image pull failures, probe failures, throttle events.
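
When an app restarts repeatedly, a per-hour count of event reasons can show whether the problem is steady or clustered. This is a sketch that reuses the same columns as the query above; the 24-hour window and 1-hour bucket are assumptions to adjust for your own incident.

```kusto
// Count platform events per reason per hour for one container app.
ContainerAppSystemLogs_CL
| where TimeGenerated > ago(24h)                  // assumed lookback window
| where ContainerAppName_s == "roo-vision"
| summarize events = count() by Reason_s, bin(TimeGenerated, 1h)
| order by TimeGenerated desc, events desc
```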

Exam tip: 'where' before 'summarize'

Filter before you aggregate. A query that runs summarize ... by name and only THEN filters with where makes the engine aggregate rows it is about to throw away. Filtering first (by time, by app name, by route) keeps latency low, and keeps cost low on table plans that bill per GB scanned.

Order: where TimeGenerated > ago(...) → other where filters → extend (computed columns) → summarize → order by → render.

The most useful operators in one place

// Filter
| where col == "value" and othercol > 100

// Pick columns
| project a, b, c

// Add computed columns without dropping
| extend duration_s = duration / 1000.0

// Aggregate
| summarize count(), sum(x), avg(x), percentile(x, 95) by groupCol, bin(timestamp, 5m)

// Join
| join kind=inner (otherTable | project key, val) on key

// Take top N by some metric
| top 10 by duration desc

// Render hint
| render timechart        // or barchart, piechart, columnchart

Joins: when correlation matters

// Find the dependency call chains for the slowest 20 requests
requests
| where timestamp > ago(1h)
| top 20 by duration desc
| project oid = operation_Id, request_dur = duration, request_name = name
| join kind=inner (
    dependencies
    | where timestamp > ago(1h)   // same window as the left side
    | project oid = operation_Id, dep_dur = duration,
              dep_target = target, dep_name = name
) on oid

operation_Id is the trace ID: all spans in a single trace share it. That's how you reconstruct an end-to-end story.
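
To see what the join produces, here is a small sketch on hypothetical in-memory tables. The trace IDs, host names, and durations are invented; `dep_share_pct` is an illustrative extra column showing how much of each request's time a dependency accounts for.

```kusto
// Hypothetical request and dependency rows sharing operation_Id values.
let sample_requests = datatable(operation_Id: string, name: string, duration: real)
[
    "trace-1", "POST /chat", 5200.0,
    "trace-2", "POST /chat", 180.0
];
let sample_dependencies = datatable(operation_Id: string, target: string, duration: real)
[
    "trace-1", "search.example.net", 700.0,
    "trace-1", "llm.example.net",    4300.0,
    "trace-2", "llm.example.net",    150.0
];
sample_requests
| project oid = operation_Id, request_name = name, request_ms = duration
| join kind=inner (
    sample_dependencies
    | project oid = operation_Id, dep_target = target, dep_ms = duration
) on oid
// One output row per (request, dependency) pair that shares a trace ID.
| extend dep_share_pct = round(100.0 * dep_ms / request_ms, 1)
| project oid, request_name, request_ms, dep_target, dep_ms, dep_share_pct
| order by oid asc, dep_ms desc
```

The join yields three rows here: two for `trace-1` and one for `trace-2`. The final `project` also drops the duplicate key column that the join adds.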

Functions and parsing

// String functions
| extend route_short = substring(name, 0, 30)
| where url contains "openai"
| extend host = tostring(parse_url(url).Host)

// Numeric / time
| extend dt = todatetime(customDimensions["created_at"])
| extend tier = case(duration < 100, "fast",
                     duration < 1000, "ok",
                     "slow")

// JSON parsing
| extend parsed = parse_json(customDimensions)
| extend model = tostring(parsed.model)

case, iff, parse_json are the everyday kitchen tools.
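
A short sketch that combines the three on a hypothetical in-memory table. The `payload` column, its JSON fields, and the `hit` / `miss` labels are invented for illustration; the `tier` thresholds mirror the snippet above.

```kusto
// Hypothetical rows with a JSON string column to parse.
let sample = datatable(name: string, duration: real, payload: string)
[
    "POST /embed", 80.0,   '{"model": "model-a", "cached": true}',
    "POST /chat",  650.0,  '{"model": "model-b", "cached": false}',
    "POST /chat",  2400.0, '{"model": "model-b", "cached": false}'
];
sample
| extend parsed = parse_json(payload)                              // string -> dynamic
| extend model = tostring(parsed.model)                            // dynamic -> string
| extend cache_label = iff(tobool(parsed.cached), "hit", "miss")   // two-way branch
| extend tier = case(duration < 100, "fast",                       // multi-way branch,
                     duration < 1000, "ok",                        // first match wins,
                     "slow")                                       // last value is the default
| project name, model, cache_label, tier
```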

Workbooks and dashboards

KQL queries become reusable through:

| Surface | What it does |
| --- | --- |
| Workbooks | Interactive parameterised reports: pick a time range, an app, an environment; the query reruns |
| Dashboards | Pinned charts on the Azure portal home |
| Alerts | Run a KQL query on a schedule; trigger if rows match (e.g., error rate > 5%) |

Most teams build a small Workbook per service that answers the standard "is everything OK" questions in one place.
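
As a sketch of the kind of query an alert might run for the "error rate > 5%" case: the 15-minute window and the minimum volume of 20 requests are assumptions, not values from this module, so tune them to your traffic.

```kusto
// Returns one row only when the error rate breaches the threshold,
// so an alert set to fire when the query returns rows would trigger.
requests
| where timestamp > ago(15m)                        // assumed evaluation window
| summarize total = count(), errors = countif(success == false)
| where total >= 20                                 // assumed minimum volume, avoids noisy alerts
| extend error_rate_pct = round(100.0 * errors / total, 2)
| where error_rate_pct > 5
```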

Key terms

Question

What's the operator order in a typical KQL query?

Answer

`where` (filter) → `extend` (add computed columns) → `summarize` (aggregate) → `order by` → `render`. Filtering first is critical: an unfiltered summary has to process the whole table, which is slow everywhere and costly on table plans billed per GB scanned.

Question

What does `summarize ... by bin(timestamp, 1m)` do?

Answer

Groups the rows into 1-minute time buckets and summarises within each bucket. Used for time series: counts, percentiles, sums per minute. The `bin` function rounds each timestamp down to the start of its interval.

Question

Where do OpenTelemetry span attributes appear in Application Insights?

Answer

In `customDimensions` on the `dependencies` (and sometimes `requests` / `traces`) records. Pull them out in queries with `tostring(customDimensions['gen_ai.request.model'])` and similar. They're queryable, sortable, group-byable.

Question

What's the difference between `requests` and `dependencies` in Application Insights?

Answer

`requests`: inbound calls served by your app (your HTTP API handling a POST). `dependencies`: outbound calls your app makes to other services (HTTP, SQL, Cosmos, OpenAI). Both show duration, success, custom dimensions.

Question

What does `operation_Id` represent across Application Insights tables?

Answer

The trace ID of a single end-to-end operation. All `requests`, `dependencies`, `exceptions`, `traces`, and `customEvents` for one user request share the same `operation_Id`. Joining on it reconstructs the full trace.

Knowledge check

Theo asks: 'What was the p95 latency of POST /chat over the last hour, in 5-minute buckets?' Which KQL is correct?

Mira instruments LLM calls with span attribute `gen_ai.request.model`. She wants the average latency and call count grouped by model name. Which clause extracts the model name from custom dimensions?

Lin's Container App is restarting frequently. Which table holds the platform's view of why?