KQL for AI Apps: Querying Logs + Metrics
The query language that holds it all together. Kusto basics, the Application Insights and Container Apps tables you'll use most, and the queries that turn 'something's slow' into a fixable problem in five minutes.
Why KQL is non-negotiable
Kusto Query Language (KQL) is how you read everything Azure observes. Application Insights traces, Container Apps logs, KubeEvents, Service Bus diagnostics, AKS Container Insights, Log Analytics: they all answer to KQL.
Three rules of thumb to memorise:
- Filter early with `where` on time and identifier columns. The less data a query scans, the faster it runs, and on table plans billed per GB scanned, the less it costs
- Project to what you need with `project` after filters
- Summarise with `summarize` for aggregations: sum, count, percentile, average
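The three rules can be sketched on an inline table, so the query runs in any KQL environment without touching real data. The `datatable` rows, URLs, and values are invented for illustration. Against Application Insights you would start from `requests` and filter on `timestamp > ago(...)` instead.

```kql
// Illustrative rows only; stands in for the Application Insights `requests` table.
datatable(timestamp: datetime, name: string, duration: real, success: bool, url: string)
[
    datetime(2024-05-01 10:00:00), "POST /embed", 120.0, true,  "https://example.invalid/embed",
    datetime(2024-05-01 10:00:30), "POST /embed", 480.0, true,  "https://example.invalid/embed",
    datetime(2024-05-01 10:01:10), "POST /chat",  950.0, false, "https://example.invalid/chat",
    datetime(2024-04-30 09:00:00), "POST /embed", 300.0, true,  "https://example.invalid/embed"
]
| where timestamp >= datetime(2024-05-01)   // 1. filter early on time...
| where name == "POST /embed"               //    ...and on an identifier column
| project timestamp, duration               // 2. keep only the columns the aggregation needs
| summarize calls = count(), avg_ms = avg(duration) by bin(timestamp, 1m)   // 3. aggregate
```

Only the two `POST /embed` rows from 1 May survive the filters, so the aggregation works on two narrow rows instead of the full table.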
The exam tests reading KQL: given a query, what does it tell you? It also tests writing KQL for common scenarios such as p95 latency, error rates, and dependency analysis.
Tables you must know
| Table | Where | Holds |
|---|---|---|
| `requests` | Application Insights | Inbound HTTP requests handled by your app |
| `dependencies` | Application Insights | Outbound calls (HTTP, SQL, Service Bus, Cosmos, etc.), including span data |
| `traces` | Application Insights | Application logs (info / warn / error) emitted via the OTel SDK |
| `exceptions` | Application Insights | Captured exceptions with stack traces |
| `customMetrics` | Application Insights | Metrics emitted via the OTel meter API |
| `ContainerAppConsoleLogs_CL` | Log Analytics | Container app stdout/stderr |
| `ContainerAppSystemLogs_CL` | Log Analytics | Container app platform events (image pulls, scale, restarts) |
| `ContainerLog` (AKS) | Log Analytics | AKS pod stdout/stderr (newer schema: `ContainerLogV2`) |
| `KubeEvents` (AKS) | Log Analytics | AKS pod events (scheduling, restarts, OOMKilled) |
| `AzureDiagnostics` | Log Analytics | Diagnostic logs for many Azure services |
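As a quick orientation, here is a sketch of how `traces` and `exceptions` are typically read. The column names `severityLevel`, `message`, `type`, and `outerMessage` follow the classic Application Insights schema. Treat them as assumptions to check against your own workspace.

```kql
// Warnings and worse logged by the app in the last hour
// (severityLevel: 0 verbose, 1 information, 2 warning, 3 error, 4 critical)
traces
| where timestamp > ago(1h)
| where severityLevel >= 2
| project timestamp, severityLevel, message, operation_Id
| order by timestamp desc

// Most frequent exception types in the same window (run as a separate query)
exceptions
| where timestamp > ago(1h)
| summarize n = count() by type, outerMessage
| top 10 by n desc
```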
Five queries that solve real problems
1. P95 latency over time
requests
| where timestamp > ago(1h)
| where name == "POST /embed"
| summarize p50 = percentile(duration, 50),
            p95 = percentile(duration, 95),
            p99 = percentile(duration, 99)
            by bin(timestamp, 1m)
| render timechart
A line chart of latency percentiles per minute. The first thing to look at when "the API feels slow."
2. Error rate per route
requests
| where timestamp > ago(1d)
| summarize total = count(), errors = countif(success == false) by name
| extend error_rate_pct = round(100.0 * errors / total, 2)
| where total > 100
| order by error_rate_pct desc
Routes ranked by error rate, with a minimum-volume gate so noisy low-traffic routes don't dominate.
3. Dependency breakdown for slow requests
requests
| where timestamp > ago(1h) and duration > 5000 // slow ones
| project rid = id, parent = operation_Id, total_ms = duration
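| join kind=inner (
    dependencies
    | where timestamp > ago(1h) // same window as the requests, so the join does not read the whole table
    | project parent = operation_Id, dep_target = target,
              dep_type = type, dep_ms = duration
) on parent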
| summarize total_dep_ms = sum(dep_ms), n = count() by dep_target, dep_type
| order by total_dep_ms desc
For requests over 5 seconds, where did the time go? Which downstream service ate the budget?
4. AI-specific custom dimension query
dependencies
| where timestamp > ago(1h)
| where name == "rag.generate"
| extend model = tostring(customDimensions['gen_ai.request.model'])
| extend tokens_out = toint(customDimensions['gen_ai.usage.output_tokens'])
| summarize calls = count(), total_tokens = sum(tokens_out),
            p95_ms = percentile(duration, 95) by model
| order by total_tokens desc
Token usage and latency per model, pulled from custom dimensions you set on your span attributes.
5. Container Apps system events
ContainerAppSystemLogs_CL
| where TimeGenerated > ago(30m)
| where ContainerAppName_s == "roo-vision"
| where Reason_s in ("Failed", "BackOff", "Killing", "Unhealthy")
| project TimeGenerated, Reason_s, Log_s, RevisionName_s
| order by TimeGenerated desc
Anything alarming the platform reported about a specific container app: image pull failures, probe failures, throttle events.
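System events say what the platform did. The container's own stdout/stderr usually says why. A companion sketch against `ContainerAppConsoleLogs_CL` follows. The search terms are assumptions about what your app logs, and `ContainerName_s` should be checked against your workspace schema.

```kql
ContainerAppConsoleLogs_CL
| where TimeGenerated > ago(30m)
| where ContainerAppName_s == "roo-vision"
| where Log_s has_any ("error", "exception", "traceback")   // assumed keywords; adjust to your log format
| project TimeGenerated, RevisionName_s, ContainerName_s, Log_s
| order by TimeGenerated desc
| take 100
```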
Exam tip: 'where' before 'summarize'
The less data a query scans, the faster it returns, and on table plans that bill per GB scanned (Basic and Auxiliary logs), the less it costs. A query that runs summarize ... by name and only THEN filters with where has to aggregate the entire table first. Filtering first (by time, by app name, by route) keeps cost and latency low.
Order: where TimeGenerated > ago(...) → other where filters → extend (computed columns) → summarize → order by → render.
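A side-by-side sketch of the two shapes, assuming the Application Insights `requests` table. The engine may optimise some of this on its own, but putting the filters first makes the intent explicit and does not rely on that.

```kql
// Discouraged: aggregates every route over the table's full retention, then throws most groups away
requests
| summarize calls = count(), p95_ms = percentile(duration, 95) by name, bin(timestamp, 5m)
| where timestamp > ago(1h) and name == "POST /embed"

// Preferred (run as a separate query): time filter, identifier filter, extend, summarize, order by, render
requests
| where timestamp > ago(1h)
| where name == "POST /embed"
| extend duration_s = duration / 1000.0
| summarize calls = count(), p95_s = percentile(duration_s, 95) by bin(timestamp, 5m)
| order by timestamp asc
| render timechart
```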
The most useful operators in one place
// Filter
| where col == "value" and othercol > 100
// Pick columns
| project a, b, c
// Add computed columns without dropping
| extend duration_s = duration / 1000.0
// Aggregate
| summarize count(), sum(x), avg(x), percentile(x, 95) by groupCol, bin(timestamp, 5m)
// Join
| join kind=inner (otherTable | project key, val) on key
// Take top N by some metric
| top 10 by duration desc
// Render hint
| render timechart // or barchart, piechart, columnchart
Joins: when correlation matters
// Find the dependency call chains for the slowest 20 requests
requests
| where timestamp > ago(1h)
| top 20 by duration desc
| project oid = operation_Id, request_dur = duration, request_name = name
| join kind=inner (
    dependencies
    | where timestamp > ago(1h) // bound the right side too, so the join does not read the whole table
    | project oid = operation_Id, dep_dur = duration,
              dep_target = target, dep_name = name
) on oid
operation_Id is the trace ID: all spans in a single trace share it. That's how you reconstruct an end-to-end story.
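Once you have one interesting `operation_Id`, a `union` across both tables lays the whole trace out in time order. The trace ID below is a made-up placeholder. `itemType` distinguishes request rows from dependency rows, and `target` is empty for requests.

```kql
// Hypothetical trace ID; substitute an operation_Id taken from a slow request
let trace_id = "0123456789abcdef0123456789abcdef";
union requests, dependencies
| where timestamp > ago(1h)
| where operation_Id == trace_id
| project timestamp, itemType, name, target, duration, success
| order by timestamp asc
```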
Functions and parsing
// String functions
| extend route_short = substring(name, 0, 30)
| where url contains "openai"
| extend host = tostring(parse_url(url).Host)
// Numeric / time
| extend dt = todatetime(customDimensions["created_at"])
| extend tier = case(duration < 100, "fast",
                     duration < 1000, "ok",
                     "slow")
// JSON parsing
| extend parsed = parse_json(customDimensions)
| extend model = tostring(parsed.model)
case, iff, parse_json are the everyday kitchen tools.
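A self-contained sketch that exercises `iff` and `case` together on an inline table. The rows, the `cached` attribute, and the model names are invented for illustration.

```kql
// Illustrative rows only; stands in for spans in the `dependencies` table.
datatable(name: string, duration: real, customDimensions: dynamic)
[
    "rag.generate", 80.0,   dynamic({"model": "model-a", "cached": "true"}),
    "rag.generate", 640.0,  dynamic({"model": "model-a", "cached": "false"}),
    "rag.generate", 2300.0, dynamic({"model": "model-b"})
]
| extend model  = tostring(customDimensions.model)
| extend cached = iff(tostring(customDimensions.cached) == "true", "hit", "miss")   // two-way branch
| extend tier   = case(duration < 100, "fast", duration < 1000, "ok", "slow")       // multi-way branch
| project model, cached, tier, duration
```

`iff` suits a single yes/no decision, and `case` reads better once there are three or more outcomes. A missing `cached` key converts to an empty string, so the third row falls through to "miss".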
Workbooks and dashboards
KQL queries become reusable through:
| Surface | What it does |
|---|---|
| Workbooks | Interactive parameterised reports: pick a time range, an app, an environment; the query reruns |
| Dashboards | Pinned charts on the Azure portal home |
| Alerts | Run a KQL query on a schedule; trigger if rows match (e.g., error rate > 5%) |
Most teams build a small Workbook per service that answers the standard "is everything OK" questions in one place.
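For the Alerts row, a scheduled query might look like the sketch below: it returns a row only for routes whose error rate exceeds 5%, so "rows returned > 0" becomes the trigger. The 15-minute window and the minimum volume of 50 requests are assumed values, and in practice the alert rule's own evaluation period usually sets the time range.

```kql
requests
| where timestamp > ago(15m)
| summarize total = count(), errors = countif(success == false) by name
| where total >= 50                                     // assumed volume gate to avoid alerting on a handful of calls
| extend error_rate_pct = round(100.0 * errors / total, 2)
| where error_rate_pct > 5
```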
Knowledge check
Theo asks: 'What was the p95 latency of POST /chat over the last hour, in 5-minute buckets?' Which KQL is correct?
Mira instruments LLM calls with span attribute `gen_ai.request.model`. She wants the average latency and call count grouped by model name. Which clause extracts the model name from custom dimensions?
Lin's Container App is restarting frequently. Which table holds the platform's view of why?