KEDA: Event-Driven Scaling in Container Apps

What KEDA does

Simple explanation

KEDA is the autoscaler that watches things outside your container and decides how many replicas to run. A traditional autoscaler watches CPU. KEDA watches almost any event source it has a scaler for: a Service Bus queue, Event Hubs partition lag, a Cosmos DB change feed, a Kafka topic, even a custom Prometheus metric.

For AI inference, this is exactly what you need. Inference cost ≈ replicas × time. If you can scale down to zero when nothing’s queued and out to dozens when work piles up, you pay only for the work that actually happened.

In Container Apps, KEDA is the autoscaler. You don’t install it; you just write scale rules. The exam tests whether you can read a scale rule and predict what will happen.

The mental model

Event source              KEDA scaler             Container App
(Service Bus queue)  ───▶ Service Bus scaler ───▶ scale 0→N replicas
(HTTP requests)      ───▶ HTTP scaler        ───▶ scale 0→N replicas
(Cosmos change feed) ───▶ Cosmos DB scaler   ───▶ scale 0→N replicas
(Custom Prom metric) ───▶ Prometheus scaler  ───▶ scale 0→N replicas

Each scaler has a per-replica threshold (messages per replica, concurrent requests per replica, etc.). KEDA computes:

desiredReplicas = ceil(metricValue / threshold)

The result is then clamped to the range set by --min-replicas and --max-replicas.
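The formula and the min/max clamp can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name desired_replicas and its parameters are made up for this example and are not part of KEDA or the Container Apps API.

```python
import math


def desired_replicas(metric_value: float, threshold: float,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Illustrative only: ceil(metric / threshold), clamped to [min, max]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    raw = math.ceil(metric_value / threshold)
    return max(min_replicas, min(max_replicas, raw))


# 600 queued messages, target of 20 per replica, cap of 50 replicas
print(desired_replicas(600, 20, min_replicas=0, max_replicas=50))  # 30
```

The clamp is applied last, so a very deep queue can never push the target past the maximum, and a non-zero minimum keeps replicas running even when the metric is zero.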

Configuring scale rules in Container Apps

# Service Bus queue depth scaler
properties:
  template:
    scale:
      minReplicas: 0
      maxReplicas: 50
      rules:
        - name: queue-scale
          custom:
            type: azure-servicebus
            metadata:
              queueName: warehouse-images
              messageCount: "20"     # Target: one replica per 20 queued messages
              namespace: roo-prod
            identity: system

That rule says: “look at the warehouse-images queue. For every 20 messages waiting, run one replica. Maximum 50.”

If the queue jumps to 600 messages, KEDA targets ceil(600 / 20) = 30 replicas. When the queue drains and stays empty for the cool-down period (default 5 minutes), replicas scale down, eventually to zero because minReplicas is 0.

Exam tip: messageCount is per-replica, not total

This is the most common gotcha when reading scale rules in exam questions. messageCount is the per-replica target, not a global trigger. So messageCount: 20 with 600 messages → 30 replicas, NOT one replica that processes 20 messages.

If the question gives you a queue depth and a messageCount, divide and round up. That’s the target replica count (capped by maxReplicas).

Common scale-rule types

The four scale-rule families you'll see most on AI-200.
Feature	HTTP	Service Bus queue	Event Hubs	Custom (Prometheus, Redis, ...)
Scaler type	`http`	`azure-servicebus`	`azure-eventhub`	`prometheus`, `redis`, etc.
Key metadata	`concurrentRequests`	`queueName`, `messageCount`, `namespace`	`eventHubName`, `unprocessedEventThreshold`, `consumerGroup`	Scaler-specific (query, threshold)
Auth	None — Container Apps wires it up	Managed identity (preferred), connection string secret	Managed identity, connection string	Per-scaler — often managed identity or secret
Best for	Web APIs, public endpoints	Background workers, AI inference behind a queue	Telemetry pipelines, real-time AI on streams	Anything else KEDA supports

HTTP scale rule

rules:
  - name: http-scale
    http:
      metadata:
        concurrentRequests: "100"   # 1 replica per 100 concurrent in-flight requests

concurrentRequests here is also per-replica. With 850 concurrent requests, ceil(850/100) = 9 replicas.
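Since both rule shapes shown so far (http and custom) carry a per-replica threshold in their metadata, predicting the outcome of a rule comes down to finding that key and dividing. The sketch below assumes the rule has been loaded into a plain Python dict that mirrors the YAML. The helper predict_replicas and the THRESHOLD_KEYS lookup are hypothetical names for this example, with key names copied from this section.

```python
import math

# Which metadata key holds the per-replica threshold, by scaler type.
# Key names are copied from this section; the lookup itself is a made-up helper.
THRESHOLD_KEYS = {
    "http": "concurrentRequests",
    "azure-servicebus": "messageCount",
    "azure-eventhub": "unprocessedEventThreshold",
}


def predict_replicas(rule: dict, scale: dict, metric_value: float) -> int:
    """Predict the replica target for one rule (hypothetical helper)."""
    if "http" in rule:
        scaler_type, metadata = "http", rule["http"]["metadata"]
    else:
        scaler_type, metadata = rule["custom"]["type"], rule["custom"]["metadata"]
    # Metadata values are strings in the rule definition, so convert first.
    threshold = int(metadata[THRESHOLD_KEYS[scaler_type]])
    raw = math.ceil(metric_value / threshold)
    return max(scale["minReplicas"], min(scale["maxReplicas"], raw))


scale = {"minReplicas": 0, "maxReplicas": 50}
http_rule = {"name": "http-scale", "http": {"metadata": {"concurrentRequests": "100"}}}
queue_rule = {
    "name": "queue-scale",
    "custom": {
        "type": "azure-servicebus",
        "metadata": {"queueName": "warehouse-images", "messageCount": "20", "namespace": "roo-prod"},
    },
}

print(predict_replicas(http_rule, scale, 850))   # 9
print(predict_replicas(queue_rule, scale, 600))  # 30
```

The string-to-int conversion is deliberate: the rules in this section quote their thresholds ("100", "20"), so the values arrive as strings.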

Service Bus queue with managed identity (recommended)

rules:
  - name: queue-scale
    custom:
      type: azure-servicebus
      metadata:
        queueName: image-jobs
        messageCount: "10"
        namespace: roo-sb
      identity: <user-assigned-identity-resource-id>

The identity must have Azure Service Bus Data Receiver (or higher) on the namespace.

Cosmos DB change feed

rules:
  - name: changefeed-scale
    custom:
      type: azure-cosmosdb
      metadata:
        connectionFromEnv: COSMOS_CONNECTION_STRING
        databaseName: orders
        containerName: events
        leaseContainerName: leases
        eventCount: "100"

Useful for AI workflows that react to data changes — every order update triggers an embedding refresh, an enrichment, or a notification.

Scale-rule constraints in Container Apps

Container Apps imposes some sensible defaults you should know:

Constraint	Default	Notes
Min replicas	0	Set higher to avoid cold-start latency
Max replicas	10	Increase as needed; Consumption profile supports up to 1,000
Polling interval	30 s	KEDA polls the scaler every 30 s by default
Cool-down period	300 s (5 min)	How long below threshold before scaling down
Number of rules	Up to 10 per app	KEDA picks the highest desired replica count from any rule

When multiple scale rules apply, KEDA evaluates each rule independently and targets the highest replica count that any of them asks for. In effect: “scale up if any source needs us; scale down only when all sources agree.”
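The "highest count wins" behaviour can be sketched as follows. The function replicas_for_rules and the (metric, threshold) pair representation are assumptions made for this example, not a KEDA interface, and the metric values are invented.

```python
import math


def replicas_for_rules(observations, min_replicas=0, max_replicas=10):
    """observations: one (metric_value, threshold) pair per scale rule."""
    per_rule = [math.ceil(value / threshold) for value, threshold in observations]
    wanted = max(per_rule, default=0)  # the most demanding rule wins
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical app with an HTTP rule (100 concurrent requests per replica)
# and a queue rule (20 messages per replica).
print(replicas_for_rules([(850, 100), (40, 20)], max_replicas=50))  # 9
print(replicas_for_rules([(0, 100), (0, 20)], max_replicas=50))     # 0
```

In the first call the HTTP rule asks for 9 replicas and the queue rule for 2, so the target is 9. Only when every rule reports zero does the target drop to the minimum.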

When KEDA isn’t enough — KEDA + Dapr

For a few patterns, KEDA alone undercounts the work. Container Apps integrates Dapr for these cases:

Scenario	Without Dapr	With Dapr
Pub/sub on Service Bus where Dapr handles the subscription	KEDA can’t see Dapr’s internal queue	Dapr exposes its queue depth to KEDA via the `dapr-pubsub-keda` scaler pattern
State store backed by Cosmos DB	KEDA doesn’t know which container Dapr writes to	Dapr’s pluggable component model surfaces the right metric

You won’t write raw Dapr code on AI-200, but recognise that Dapr is part of the platform and integrates with KEDA.

Authentication — managed identity beats connection strings

Two ways to give a scaler access to its source:

Managed identity (recommended): assign a system- or user-assigned identity to the container app, grant it the right Azure RBAC role on Service Bus / Event Hubs / Cosmos DB, then reference the identity in the scale rule’s identity field.
Connection string in a secret: store the connection string in a Container Apps secret, reference it via connectionFromEnv. Works, but you own the secret rotation.

# Grant the user-assigned identity Service Bus Data Receiver
# UAI_PRINCIPAL_ID must already hold the identity's principal (object) ID
SB_SCOPE=$(az servicebus namespace show -n roo-sb -g roo-prod --query id -o tsv)
az role assignment create \
  --assignee "$UAI_PRINCIPAL_ID" \
  --role "Azure Service Bus Data Receiver" \
  --scope "$SB_SCOPE"

Real-world example: Priya's loyalty event pipeline

BeanCraft Coffee’s loyalty event pipeline is one of Priya’s wins. Customer transactions flow into a Service Bus queue (tx-events). A Container App enriches each transaction with AI-generated personalisation tags and writes the result to Cosmos DB.

Quiet hours (overnight): queue is empty, container app sits at 0 replicas, cost is $0.
Morning rush (7-9 am): queue jumps to 12,000 messages, KEDA scales to 50 replicas, drains in 90 seconds.
Mid-morning: queue settles at ~50 messages/min, KEDA holds 2-3 replicas.
Lunch rush: another 8,000-message spike, scale up again.

The whole pipeline costs Priya about ten dollars a day. The same workload on always-on App Service plans would be over $200 a day.
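As a back-of-envelope check on the morning-rush figures in the timeline above, the per-replica throughput they imply can be worked out directly. This sketch assumes all 50 replicas are busy for the full 90 seconds, which ignores scale-up ramp time, and that the lunch spike also reaches 50 replicas, which the example does not state. The function names are made up for illustration.

```python
def implied_rate(messages: int, replicas: int, seconds: float) -> float:
    """Messages per second per replica, assuming all replicas work the whole time."""
    return messages / (replicas * seconds)


def drain_time(messages: int, replicas: int, rate: float) -> float:
    """Seconds to drain a backlog at a given per-replica rate."""
    return messages / (replicas * rate)


rate = implied_rate(12_000, 50, 90)
print(f"{rate:.2f} messages/s per replica")    # 2.67 messages/s per replica
print(f"{drain_time(8_000, 50, rate):.0f} s")  # 60 s
```

Under those assumptions each replica processes a little under three messages per second, and the 8,000-message lunch spike would drain in about a minute.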