KEDA: Event-Driven Scaling in Container Apps
How KEDA scales your container app from 0 to many based on real events: Service Bus queue depth, Event Hubs lag, custom Prometheus metrics. Plus the exact YAML the exam expects you to read.
What KEDA does
KEDA is the autoscaler that watches things outside your container and decides how many replicas to run. A traditional autoscaler watches CPU. KEDA can watch almost any event source, such as a Service Bus queue, Event Hubs partition lag, a Cosmos DB change feed, a Kafka topic, or even a custom Prometheus metric.
For AI inference, this is exactly what you need. Inference cost is roughly replicas × time. If you can scale down to zero when nothing's queued and out to dozens of replicas when work piles up, you pay only for the work that actually happened.
In Container Apps, KEDA is the autoscaler. You don't install it; you just write scale rules. The exam tests whether you can read a scale rule and predict what will happen.
The mental model
```text
Event source              KEDA scaler              Container App
(Service Bus queue)  ───▶ Service Bus scaler  ───▶ scale 0→N replicas
(HTTP requests)      ───▶ HTTP scaler         ───▶ scale 0→N replicas
(Cosmos change feed) ───▶ Cosmos DB scaler    ───▶ scale 0→N replicas
(Custom Prom metric) ───▶ Prometheus scaler   ───▶ scale 0→N replicas
```
Each scaler has a threshold: a per-replica target (messages per replica, concurrent requests per replica, and so on). KEDA computes:
desiredReplicas = ceil(metricValue / threshold)
The result is then clamped between your --min-replicas and --max-replicas settings (minReplicas and maxReplicas in YAML).
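To make the arithmetic concrete, here is a minimal Python sketch of that calculation. It assumes a single rule and deliberately ignores the polling interval, the cool-down, and the Kubernetes HPA's own tolerance and stabilization behaviour; the function name `desired_replicas` is illustrative, not a KEDA or Container Apps API.

```python
import math


def desired_replicas(metric_value: float, threshold: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified KEDA-style target: ceil(metric / per-replica threshold), clamped."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    raw = math.ceil(metric_value / threshold)          # round UP, never down
    return max(min_replicas, min(raw, max_replicas))   # apply min/max bounds


print(desired_replicas(600, 20, 0, 50))    # 30
print(desired_replicas(0, 20, 0, 50))      # 0
print(desired_replicas(5000, 20, 0, 50))   # 50 (raw target 250, capped)
```

The third call shows why maxReplicas matters: the raw target of 250 is silently capped at 50.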
Configuring scale rules in Container Apps
```yaml
# Service Bus queue depth scaler
properties:
  template:
    scale:
      minReplicas: 0
      maxReplicas: 50
      rules:
        - name: queue-scale
          custom:
            type: azure-servicebus
            metadata:
              queueName: warehouse-images
              messageCount: "20"   # Each replica handles 20 messages
              namespace: roo-prod
            identity: system
```
That rule says: "Look at the warehouse-images queue. For every 20 messages waiting, run one replica. Maximum 50."
If the queue jumps to 600 messages, KEDA targets 30 replicas (600 / 20). When the queue drains and stays empty for the cool-down period (default 5 minutes), replicas scale down, eventually to zero, because minReplicas is 0.
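The scale-down half of that story can be modelled with a small loop. This is a hypothetical, simplified sketch, not KEDA's actual algorithm: it assumes one queue-depth sample per 30-second poll, targets ceil(depth / 20) while messages are waiting, holds the current count while the queue is empty, and drops to minReplicas only after 300 seconds of consecutive empty polls (the Container Apps defaults for polling interval and cool-down). The function `simulate` and the sample data are illustrative.

```python
import math

POLL_SECONDS = 30        # Container Apps default polling interval
COOLDOWN_SECONDS = 300   # Container Apps default cool-down period


def simulate(depths, message_count=20, min_replicas=0, max_replicas=50):
    """Return the replica count after each poll for a series of queue depths.

    Simplified model (assumption): while the queue has messages, target
    ceil(depth / message_count); once it is empty, hold the current count
    until it has stayed empty for COOLDOWN_SECONDS, then drop to min_replicas.
    """
    replicas = min_replicas
    empty_for = 0
    history = []
    for depth in depths:
        if depth > 0:
            empty_for = 0
            replicas = max(min_replicas,
                           min(math.ceil(depth / message_count), max_replicas))
        else:
            empty_for += POLL_SECONDS
            if empty_for >= COOLDOWN_SECONDS:
                replicas = min_replicas
        history.append(replicas)
    return history


# Hypothetical samples: a 600-message spike, then 12 empty polls (6 minutes).
print(simulate([600] + [0] * 12))
# [30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 0, 0, 0]
```

In this model the app holds 30 replicas through nine empty polls and reaches zero on the tenth, once the full 300-second cool-down has elapsed.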
Exam tip: messageCount is per-replica, not total
This is the most common gotcha when reading scale rules in exam questions. messageCount is the per-replica target, not a global trigger. So messageCount: 20 with 600 messages means 30 replicas, NOT one replica that processes 20 messages.
If the question gives you a queue depth and a messageCount, divide and round up. That's the target replica count (capped by maxReplicas).
Common scale-rule types
| Feature | HTTP | Service Bus queue | Event Hubs | Custom (Prometheus, Redis, ...) |
|---|---|---|---|---|
| Scaler type | `http` | `azure-servicebus` | `azure-eventhub` | `prometheus`, `redis`, etc. |
| Key metadata | `concurrentRequests` | `queueName`, `messageCount`, `namespace` | `eventHubName`, `unprocessedEventThreshold`, `consumerGroup` | Scaler-specific (query, threshold) |
| Auth | None; Container Apps wires it up | Managed identity (preferred), connection string secret | Managed identity, connection string | Per-scaler; often managed identity or secret |
| Best for | Web APIs, public endpoints | Background workers, AI inference behind a queue | Telemetry pipelines, real-time AI on streams | Anything else KEDA supports |
HTTP scale rule
```yaml
rules:
  - name: http-scale
    http:
      metadata:
        concurrentRequests: "100"   # 1 replica per 100 concurrent in-flight requests
```
concurrentRequests here is also per-replica. With 850 concurrent requests, ceil(850/100) = 9 replicas.
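The same rounding applies to HTTP rules. The sketch below uses integer ceiling division and also shows what happens when the result hits maxReplicas. The helper name `http_replicas` is hypothetical, and it assumes the Container Apps defaults of minReplicas 0 and maxReplicas 10 unless you pass other values.

```python
def http_replicas(concurrent_requests: int, concurrent_target: int,
                  min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Hypothetical helper: replicas an HTTP rule asks for, clamped to bounds.

    Defaults mirror the Container Apps defaults (min 0, max 10).
    """
    raw = -(-concurrent_requests // concurrent_target)  # integer ceiling division
    return max(min_replicas, min(raw, max_replicas))


for load in (850, 1500):
    n = http_replicas(load, 100)
    # in-flight requests, replicas, average in-flight requests per replica
    print(load, n, round(load / n, 1))
# 850 9 94.4
# 1500 10 150.0
```

At 850 requests, 9 replicas keep each one under its 100-request target (about 94 each). At 1,500 requests the default cap of 10 leaves each replica carrying about 150, which is the cue to raise maxReplicas.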
Service Bus queue with managed identity (recommended)
```yaml
rules:
  - name: queue-scale
    custom:
      type: azure-servicebus
      metadata:
        queueName: image-jobs
        messageCount: "10"
        namespace: roo-sb
      identity: <user-assigned-identity-resource-id>
```
The identity must have Azure Service Bus Data Receiver (or higher) on the namespace.
Cosmos DB change feed
```yaml
rules:
  - name: changefeed-scale
    custom:
      type: azure-cosmosdb
      metadata:
        connectionFromEnv: COSMOS_CONNECTION_STRING
        databaseName: orders
        containerName: events
        leaseContainerName: leases
        eventCount: "100"
```
Useful for AI workflows that react to data changes: every order update triggers an embedding refresh, an enrichment, or a notification.
Scale-rule constraints in Container Apps
Container Apps imposes some sensible defaults you should know:
| Constraint | Default | Notes |
|---|---|---|
| Min replicas | 0 | Set higher to avoid cold-start latency |
| Max replicas | 10 | Increase as needed; Consumption profile supports up to 1,000 |
| Polling interval | 30 s | KEDA polls the scaler every 30 s by default |
| Cool-down period | 300 s (5 min) | How long below threshold before scaling down |
| Number of rules | Up to 10 per app | KEDA picks the highest desired replica count from any rule |
When multiple scale rules apply, KEDA evaluates each and runs the maximum desired replica count across them. This is "scale up if any source needs us; scale down only when all sources agree."
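Here is a minimal sketch of that combination rule, assuming each scale rule reduces to a (metric value, per-replica target) pair. The names `rule_target` and `app_target` are illustrative only, and the numbers in the snapshot are hypothetical.

```python
import math


def rule_target(metric_value: float, per_replica_target: float) -> int:
    """Replicas a single scale rule asks for: ceil(metric / target)."""
    return math.ceil(metric_value / per_replica_target)


def app_target(rules, min_replicas: int, max_replicas: int) -> int:
    """Highest target across all rules, clamped to the app's replica bounds.

    `rules` is a list of (metric_value, per_replica_target) pairs, one per rule.
    """
    highest = max((rule_target(m, t) for m, t in rules), default=0)
    return max(min_replicas, min(highest, max_replicas))


# Hypothetical snapshot: HTTP rule sees 250 in-flight requests (target 100 -> 3),
# queue rule sees 140 messages (target 20 -> 7).
print(app_target([(250, 100), (140, 20)], 0, 50))   # 7
# Both sources idle: every rule targets 0, so the app can scale to zero.
print(app_target([(0, 100), (0, 20)], 0, 50))       # 0
```

Taking the maximum is what produces the asymmetry: one busy source is enough to keep replicas up, and the count only falls when every rule's target falls.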
When KEDA isn't enough: KEDA + Dapr
For a few patterns, KEDA alone undercounts the work. Container Apps integrates Dapr for these cases:
| Scenario | Without Dapr | With Dapr |
|---|---|---|
| Pub/sub on Service Bus where Dapr handles the subscription | KEDA can't see Dapr's internal queue | Dapr exposes its queue depth to KEDA via the dapr-pubsub-keda scaler pattern |
| State store backed by Cosmos DB | KEDA doesn't know which container Dapr writes to | Dapr's pluggable component model surfaces the right metric |
You won't write raw Dapr code on AI-200, but recognise that Dapr is part of the platform and integrates with KEDA.
Authentication: managed identity beats connection strings
Two ways to give a scaler access to its source:
- Managed identity (recommended): assign a system- or user-assigned identity to the container app, grant it the right Azure RBAC role on Service Bus / Event Hubs / Cosmos DB, and reference the identity in the scale rule's identity field.
- Connection string in a secret: store the connection string in a Container Apps secret and reference it via connectionFromEnv. This works, but you own the secret rotation.
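```bash
# Look up the user-assigned identity's principal ID
# (the identity name "roo-scaler-id" is a placeholder; substitute your own)
UAI_PRINCIPAL_ID=$(az identity show -n roo-scaler-id -g roo-prod --query principalId -o tsv)

# Grant the user-assigned identity Service Bus Data Receiver on the namespace
az role assignment create \
  --assignee "$UAI_PRINCIPAL_ID" \
  --role "Azure Service Bus Data Receiver" \
  --scope "$(az servicebus namespace show -n roo-sb -g roo-prod --query id -o tsv)"
```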
Real-world example: Priya's loyalty event pipeline
BeanCraft Coffee's loyalty event pipeline is one of Priya's wins. Customer transactions flow into a Service Bus queue (tx-events). A Container App enriches each transaction with AI-generated personalisation tags and writes the result to Cosmos DB.
- Quiet hours (overnight): queue is empty, container app sits at 0 replicas, cost is $0.
- Morning rush (7-9 am): queue jumps to 12,000 messages, KEDA scales to 50 replicas, drains in 90 seconds.
- Mid-morning: queue settles at ~50 messages/min, KEDA holds 2-3 replicas.
- Lunch rush: another 8,000-message spike, scale up again.
The whole pipeline costs Priya about $10 a day. The same workload on always-on App Service plans would cost over $200 a day.
Knowledge check
Mira's KEDA rule reads `messageCount: 25, queueName: image-jobs, maxReplicas: 40`. The queue spikes to 1500 messages. How many replicas will Container Apps target?
Theo's container app has both an HTTP scale rule (target 4 replicas right now) and a Service Bus scale rule (target 12 replicas right now). The app's `maxReplicas` is 30. How many replicas does Container Apps run?
Lin wants a Container App to scale on a custom Prometheus requests-per-second metric that the app itself exposes. Which scale-rule type does Lin need?