End-to-End Observability: Putting It All Together

Stitching Key Vault, App Configuration, OpenTelemetry, and KQL into a production observability story: the trace, the dashboard, the alert, the runbook, the post-incident review, and Mira's pager.

What “production observability” actually means

Simple explanation

Observability is the property of a system that lets you ask any question of it without changing the code. For an AI back-end, that means answering: “is it up?”, “is it fast?”, “is it accurate?”, “where did the cost come from?”, “did it leak a secret?”, “what did this user see?”

The four Azure pieces fit together like this:

  • Key Vault: secrets out of code, audited every time they’re read
  • App Configuration: feature flags + environment-specific config + audit of changes
  • OpenTelemetry: traces, logs, metrics flowing to Application Insights
  • KQL: the query language that ties it all back together

Golden signals for AI back-ends

The classic four golden signals (latency, traffic, errors, saturation) translate cleanly. Add a fifth for AI:

Signal      What                                                                Where
Latency     p50/p95/p99 of request duration per route                           requests
Traffic     requests/sec per route                                              requests | summarize count() by bin(...)
Errors      success=false rate per route                                        requests | countif(success == false)
Saturation  replica count, CPU%, memory%                                        Container Insights, Container Apps system logs
AI quality  token cost, retrieval quality, refusal rate, hallucination markers  Custom metrics + spans

A reasonable starter dashboard renders these five plus a few drill-downs.
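
To make the latency row concrete, here is a minimal Python sketch of what p50/p95/p99 mean for a batch of request durations. It uses the nearest-rank definition; KQL's percentile() is an approximation and may return slightly different values. The function names and the sample durations are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Return the three latency percentiles a dashboard tile would typically show."""
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 95, 99)}


# Hypothetical request durations in milliseconds, with one slow outlier
sample = [120, 130, 150, 180, 200, 220, 250, 300, 900, 4000]
summary = latency_summary(sample)
print(summary)
```

The median stays low while the tail percentiles are dominated by the single 4-second request, which is why dashboards track p95/p99 rather than the average.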

SLOs: what counts as “broken”

A service-level objective is a numeric target that defines the contract:

- 99.5% of POST /chat requests complete in under 2 seconds (rolling 30 days)
- Error rate < 1% rolling 1 hour
- Token-cost per request < $0.05 p95 rolling 24 hours

Translate each SLO to a KQL query. Translate each query to an alert. The alert + the runbook = an actionable response.

// Error-rate SLO (rolling 1 hour, alert if > 1%)
requests
| where timestamp > ago(1h)
| where name startswith "POST /chat"
| summarize errors = countif(success == false), total = count()
| extend error_rate = todouble(errors) / total
| where error_rate > 0.01

Schedule that as an Azure Monitor alert; route to PagerDuty/Teams/email; on fire, the on-call follows the runbook (next section).
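
An SLO also implies an error budget: the number of requests that are allowed to miss the target before the objective is breached. The sketch below works through that arithmetic for the 99.5% latency SLO listed earlier. The request counts are made-up inputs and the function name is hypothetical.

```python
def error_budget(slo_target, total_requests, bad_requests):
    """Return (allowed_bad, remaining, fraction_consumed) for a request-based SLO.

    slo_target is a fraction such as 0.995; bad_requests counts requests
    that missed the objective (for example, took 2 seconds or longer).
    """
    allowed_bad = (1 - slo_target) * total_requests
    remaining = allowed_bad - bad_requests
    consumed = bad_requests / allowed_bad if allowed_bad else float("inf")
    return allowed_bad, remaining, consumed


# Hypothetical 30-day totals for POST /chat
allowed, remaining, consumed = error_budget(0.995, 1_000_000, 2_000)
print(f"allowed={allowed:.0f} remaining={remaining:.0f} consumed={consumed:.0%}")
```

With these inputs roughly 5,000 slow requests are allowed over the window, so 2,000 slow requests have used about 40% of the budget.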

Alerts: the operational primitive

Azure Monitor alerts evaluate a condition and notify an action group when it is met. Three flavours:

Type                What it does
Metric alert        Threshold on an Azure-native metric (CPU, requests/sec)
Log alert           Run KQL on a schedule; alert if rows match (or count exceeds threshold)
Activity log alert  Trigger on Azure resource events (resource deleted, role assignment changed)

For AI back-ends, log alerts on KQL are the workhorse: they cover everything you instrumented, including custom AI metrics.

# Log alert: fires when the query returns any row, i.e. the error rate exceeds 1%.
# The 'rows' placeholder in --condition is bound to the query via --condition-query rows=...
az monitor scheduled-query create \
  --name "chat-error-rate" \
  --resource-group roo-prod \
  --scopes "$APP_INSIGHTS_RESOURCE_ID" \
  --action-groups "$AG_RESOURCE_ID" \
  --condition "count 'rows' > 0" \
  --condition-query rows='requests | where timestamp > ago(15m) | where name startswith "POST /chat" | summarize errors = countif(success == false), total = count() | where todouble(errors) / total > 0.01' \
  --description "Chat error rate > 1% rolling 15 min" \
  --evaluation-frequency 5m \
  --window-size 15m \
  --severity 2
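
The interaction between the 5-minute evaluation frequency and the 15-minute window is easier to see in a small simulation. The Python sketch below is only a model of the idea (evaluate a trailing window on a fixed schedule); it does not reproduce Azure Monitor's actual scheduler, which also has to deal with ingestion delay. All names and the event data are hypothetical.

```python
from datetime import datetime, timedelta


def evaluation_windows(start, end, frequency, window):
    """Yield (window_start, window_end) for each scheduled evaluation between start and end."""
    tick = start
    while tick <= end:
        yield tick - window, tick
        tick += frequency


def error_rate(events, window_start, window_end):
    """events is a list of (timestamp, success). Returns the failure fraction, or None if no traffic."""
    in_window = [ok for ts, ok in events if window_start < ts <= window_end]
    if not in_window:
        return None
    return in_window.count(False) / len(in_window)


t0 = datetime(2024, 1, 1, 8, 0)
# One request per minute from 07:46 to 08:10; only the 08:01 and 08:02 requests fail
events = [(t0 + timedelta(minutes=m), m not in (1, 2)) for m in range(-14, 11)]

fired = []
for w_start, w_end in evaluation_windows(
    t0, t0 + timedelta(minutes=10), timedelta(minutes=5), timedelta(minutes=15)
):
    rate = error_rate(events, w_start, w_end)
    if rate is not None and rate > 0.01:
        fired.append(w_end)

print(fired)
```

Because the windows overlap, the short burst of failures is still inside the trailing window at both the 08:05 and 08:10 evaluations, so the condition is met twice even though the failures stopped at 08:02.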

Runbooks: what to do when an alert fires

A runbook is the human-readable answer to “the alert just paged me at 2am, now what?” Three sections:

  1. Triage: how to tell what’s actually wrong (specific KQL queries, App Insights links, dashboard URLs)
  2. Mitigate: actions that buy time (scale up, kill the bad feature flag, swap deployment slot back, redirect traffic)
  3. Investigate: once mitigated, the deeper diagnosis (full trace inspection, logs, customer impact assessment)

Real-world example: Mira's chat-error-rate runbook

Alert: chat error rate > 1% rolling 15 min

Triage queries:

requests | where timestamp > ago(15m) and name startswith "POST /chat"
         | summarize errors = countif(success==false), total = count() by bin(timestamp, 1m)

exceptions | where timestamp > ago(15m) | summarize count() by problemId | top 5 by count_

Mitigate (in this order):

  1. Set feature flag EnableNewRagPipeline to 0% (App Configuration → save → takes effect within 30 s)
  2. If still failing: swap deployment slots (production ↔ staging) for an instant rollback
  3. If Service Bus queue is backed up: scale max replicas up

Investigate:

  • Check App Insights end-to-end transactions for the slowest 10 requests
  • Search exceptions for new error types in the past hour
  • Check Cosmos / Postgres metrics: RU exhaustion or connection storms?
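
The first mitigation step works because a percentage rollout flag decides per user whether the new code path runs, so setting it to 0% turns the path off for everyone. The Python sketch below shows one common way to implement such a flag with a stable hash bucket. It is an illustration only and is not App Configuration's targeting algorithm; the function names and user ids are hypothetical.

```python
import hashlib


def bucket(user_id, flag_name):
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name, user_id, rollout_percent):
    """A user is in the rollout when their bucket falls below the configured percentage."""
    return bucket(user_id, flag_name) < rollout_percent


users = [f"user-{i}" for i in range(1000)]

# At 0% nobody takes the new path; at 100% everybody does
off_for_all = not any(is_enabled("EnableNewRagPipeline", u, 0) for u in users)
on_for_all = all(is_enabled("EnableNewRagPipeline", u, 100) for u in users)
print(off_for_all, on_for_all)
```

Hashing the flag name together with the user id keeps each user's assignment stable between requests while avoiding the same users always landing in every flag's early rollout.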

A worked example: Mira’s morning

At 08:00 the alert fires: error rate on POST /chat is 4% (normal: 0.2%).

exceptions | where timestamp > ago(15m) | summarize count() by problemId | top 5 by count_

Top exception: httpx.TimeoutException at openai_client.embed. The embedding API is slow. Mira looks at:

dependencies
| where timestamp > ago(15m) and name == "POST /openai/embeddings"
| summarize p95 = percentile(duration, 95), errors = countif(success==false) by bin(timestamp, 1m)
| render timechart

p95 latency on embeddings has gone from 300 ms to 8 s. Azure OpenAI is throttling.

Mitigate: flip the App Configuration feature flag EmbeddingFallback from false to true. The worker now uses the cached embedding for cache hits, and skips re-embedding edits if the most recent embedding is less than 24 hours old.
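
A minimal Python sketch of how a worker might honour such a flag is shown below. It is an assumption about the shape of the logic, not Mira's actual code: the cache is a plain dict keyed by document id, embed_fn stands in for the real embedding call, and get_embedding and fake_embed are hypothetical names.

```python
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # the 24-hour freshness limit from the mitigation


def get_embedding(doc_id, text, cache, embed_fn, fallback_enabled, now=None):
    """Return an embedding, reusing a recent cached one when the fallback flag is on.

    cache maps doc_id -> (vector, created_at). With the flag on, an edited
    document keeps its previous vector until that vector is 24 hours old.
    """
    now = time.time() if now is None else now
    cached = cache.get(doc_id)
    if fallback_enabled and cached is not None:
        vector, created_at = cached
        if now - created_at < MAX_AGE_SECONDS:
            return vector  # skip the slow embedding call
    vector = embed_fn(text)
    cache[doc_id] = (vector, now)
    return vector


calls = []


def fake_embed(text):
    """Stand-in for the real embedding API; records each call."""
    calls.append(text)
    return [float(len(text))]


cache = {}
v1 = get_embedding("doc-1", "hello", cache, fake_embed, fallback_enabled=False, now=0)
v2 = get_embedding("doc-1", "hello world", cache, fake_embed, fallback_enabled=True, now=3600)
v3 = get_embedding("doc-1", "hello world", cache, fake_embed, fallback_enabled=True, now=25 * 3600)
print(len(calls))
```

The second call is served from the cache even though the text changed, which trades slightly stale vectors for fewer calls to the slow dependency. The third call re-embeds because the cached vector is older than 24 hours.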

Ten minutes later, the error rate is back to 0.2%. Mira files a ticket with Azure OpenAI to raise the quota, schedules a retro for tomorrow, and goes back to her coffee. The whole incident is recorded in App Insights with full trace evidence; the rollback was a single config change.

Tying the four pieces together

Service            What it covered in Mira’s morning
Key Vault          Held the OpenAI key; nothing exposed in the alert workflow; rotation possible without downtime
App Configuration  The EmbeddingFallback feature flag was the mitigation: runtime change, no redeploy
OpenTelemetry      Traces and dependency records made the breakdown visible
KQL                The investigation queries that pinned the problem in 90 seconds

This is the integrated production story AI-200 expects you to internalise.

Common AI-specific alerts to ship from day one

Alert                            Query shape
Error rate spike                 requests | where timestamp > ago(15m) | summarize rate = todouble(countif(success == false)) / count() | where rate > 0.01
Cosmos throttling                dependencies | where target endswith "documents.azure.com" | summarize throttled = countif(resultCode == "429")
OpenAI throttling                dependencies | where target endswith "openai.azure.com" | summarize throttled = countif(resultCode == "429")
Probe failures (Container Apps)  ContainerAppSystemLogs_CL | where Reason_s == "Unhealthy"
Replica restarts                 ContainerAppSystemLogs_CL | where Reason_s in ("Killing", "BackOff")
Token cost runaway               dependencies | where name == "rag.generate" | summarize sum(toint(customDimensions['gen_ai.usage.input_tokens']))
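
The token-cost query sums input tokens only; turning token counts into money needs a price per token for both input and output. The Python sketch below shows that arithmetic and checks each request against the $0.05 per-request target from the SLO list. The prices and token counts are placeholder assumptions, not real Azure OpenAI pricing, and the function name is hypothetical.

```python
# Placeholder prices in USD per 1,000 tokens; substitute the real rates for your deployment
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03
COST_TARGET_USD = 0.05


def request_cost_usd(input_tokens, output_tokens):
    """Cost of one request given its input and output token counts."""
    return (
        input_tokens / 1000 * INPUT_PRICE_PER_1K
        + output_tokens / 1000 * OUTPUT_PRICE_PER_1K
    )


# Hypothetical (input_tokens, output_tokens) pairs taken from span attributes
usage = [(2000, 500), (6000, 1000), (800, 200)]
costs = [request_cost_usd(i, o) for i, o in usage]
over_target = [pair for pair, cost in zip(usage, costs) if cost > COST_TARGET_USD]
print(over_target)
```

Only the request with 6,000 input tokens exceeds the target here, which points at prompt or retrieval size rather than response length as the cost driver.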

Final exam framing: the next steps

You’ve now covered every domain on AI-200:

  • Domain 1: Containers (ACR, App Service containers, Container Apps + KEDA, AKS, troubleshooting)
  • Domain 2: Data services (Cosmos NoSQL incl. vectors + change feed, PostgreSQL + pgvector, Managed Redis)
  • Domain 3: Connect (Service Bus, Event Grid, Functions, choosing between them)
  • Domain 4: Secure / monitor / troubleshoot (Key Vault, App Configuration, OpenTelemetry, KQL, end-to-end observability)

The exam pattern: every question maps a real-world scenario (often featuring developers like Mira, Theo, Priya, or Lin) to the right Azure primitive at the right scope with the right configuration. There’s a short list of “common pivot points” (managed identity vs passwords, scale-to-zero vs always-warm, single-partition vs cross-partition, push vs pull, secret in Key Vault vs config in App Configuration), and most questions test which side of the pivot fits.

Trust your intuition. The AI-200 exam rewards the simplest service that fits; over-engineered answers are usually wrong, even when they technically work.

Good luck.

Key terms

Question

What are the four golden signals for a back-end service?

Answer

Latency, traffic, errors, saturation. For AI back-ends add a fifth: AI quality (token cost, retrieval quality, hallucination markers, refusal rate). Each signal maps to an SLO, a KQL query, and an alert.

Question

What is an SLO?

Answer

Service-Level Objective: a numeric target for a service-level indicator over a time window. Example: '99.5% of POST /chat requests complete in under 2 seconds, rolling 30 days'. Translate each SLO into a KQL query, then into an alert.

Question

What's the difference between a metric alert and a log alert in Azure Monitor?

Answer

Metric alerts fire on Azure-native metrics with thresholds (CPU, requests/sec). Log alerts run a KQL query on a schedule and fire if the result matches a condition. For instrumented application data, including OpenTelemetry custom dimensions, log alerts are the workhorse.

Question

What's the operational role of an App Configuration feature flag in incident response?

Answer

A feature flag is a runtime kill switch: flip it to 0% to disable a misbehaving feature without a redeploy. Combined with a clear runbook ('flip flag X if the alert fires'), feature flags become the fastest mitigation path for many AI-back-end incidents.

Question

Why is the simplest service that fits usually the right answer on AI-200?

Answer

Microsoft has overlapping services and the exam rewards 'right-sized' choices. AKS works for a single inference endpoint but Container Apps fits better; pgvector works for everything but Managed Redis is faster for the hot tier; Service Bus works for telemetry but Event Hubs is purpose-built. Over-engineering is almost always the wrong answer.

Knowledge check

Mira's pager goes off: chat error rate spiked to 4% from 0.2%. What's the first thing she should run?

Theo's runbook calls for a feature-flag mitigation. The flag lives in App Configuration. He needs the change to take effect across 12 Container App replicas within 30 seconds. Which mechanism makes that timing realistic?

Lin's team wants automated monitoring for Cosmos throttling. Which approach combines OTel data with Azure Monitor alerts most cleanly?