Domain 1: Develop containerized solutions on Azure

KEDA: Event-Driven Scaling in Container Apps

How KEDA scales your container app from 0 to many based on real events: Service Bus queue depth, Event Hubs lag, custom Prometheus metrics. The exact YAML the exam expects you to read.

What KEDA does

Simple explanation

KEDA is the autoscaler that watches things outside your container and decides how many replicas to run. A traditional autoscaler watches CPU. KEDA watches anything: a Service Bus queue, an Event Hubs partition lag, a Cosmos DB change feed, a Kafka topic, even a custom Prometheus metric.

For AI inference, this is exactly what you need. Inference cost ≈ replicas × time. If you can scale down to zero when nothing’s queued and out to dozens when work piles up, you pay only for the work that actually happened.
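To make "cost ≈ replicas × time" concrete, here is a minimal sketch that compares replica-hours for an always-on deployment against an event-driven one. The two schedules are hypothetical numbers chosen for illustration, and real billing depends on the plan and the resources each replica requests, so treat the result as a proportion rather than a price.

```python
def replica_hours(schedule):
    """Sum replicas * hours over a list of (replicas, hours) segments."""
    return sum(replicas * hours for replicas, hours in schedule)

# Hypothetical 24-hour day: (replica count, hours spent at that count)
always_on = [(10, 24)]                      # 10 replicas around the clock
event_driven = [(0, 20), (10, 3), (30, 1)]  # idle most of the day, two busy periods

print(replica_hours(always_on))     # 240
print(replica_hours(event_driven))  # 60
```

Under these assumed schedules the event-driven app consumes a quarter of the replica-hours, even though it peaks at three times the replica count.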

In Container Apps, KEDA is the autoscaler. You don’t install it; you just write scale rules. The exam tests whether you can read a scale rule and predict what will happen.

The mental model

Event source             KEDA scaler             Container App
(Service Bus queue) ───▶ Service Bus scaler ───▶ scale 0→N replicas
(HTTP requests)     ───▶ HTTP scaler        ───▶ scale 0→N replicas
(Cosmos change feed)───▶ Cosmos DB scaler   ───▶ scale 0→N replicas
(Custom Prom metric)───▶ Prometheus scaler  ───▶ scale 0→N replicas

Each scaler has a threshold (messages-per-replica, requests-per-second, etc.). KEDA computes:

desiredReplicas = ceil(metricValue / threshold)

Constrained by your --min-replicas and --max-replicas.
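The formula and its bounds can be sketched as a small Python function. The name `desired_replicas` and its parameters are illustrative, not part of any Azure or KEDA API; the function only mirrors the arithmetic stated above.

```python
import math

def desired_replicas(metric_value, threshold, min_replicas=0, max_replicas=10):
    """ceil(metric / threshold), clamped to the range [min_replicas, max_replicas]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    raw = math.ceil(metric_value / threshold)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(600, 20, max_replicas=50))   # 30
print(desired_replicas(2000, 20, max_replicas=50))  # 50 (capped by max_replicas)
print(desired_replicas(0, 20, max_replicas=50))     # 0  (scale to zero)
```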

Configuring scale rules in Container Apps

# Service Bus queue depth scaler
properties:
  template:
    scale:
      minReplicas: 0
      maxReplicas: 50
      rules:
        - name: queue-scale
          custom:
            type: azure-servicebus
            metadata:
              queueName: warehouse-images
              messageCount: "20"     # Each replica handles 20 messages
              namespace: roo-prod
            identity: system

That rule says: “Look at the warehouse-images queue. For every 20 messages waiting, run one replica. Maximum 50.”

If the queue jumps to 600 messages, KEDA targets 30 replicas. When the queue drains and stays empty for the cool-down (default 5 minutes), replicas scale down, eventually to zero, since minReplicas: 0.
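The cool-down behavior can be pictured with a simplified timeline model. The sketch below assumes one queue-depth sample per polling interval (30 s) and a 300 s cool-down, and reports when the queue has been continuously empty for the full cool-down. It is a teaching model with a hypothetical function name, not a reproduction of the platform's actual scaling logic.

```python
def scale_to_zero_time(queue_depths, polling_interval=30, cooldown=300):
    """Return the offset in seconds (from the first sample) at which the queue
    has been continuously empty for `cooldown` seconds, or None if it never is."""
    empty_since = None
    for i, depth in enumerate(queue_depths):
        t = i * polling_interval
        if depth > 0:
            empty_since = None          # any pending work resets the timer
        else:
            if empty_since is None:
                empty_since = t         # first empty sample in this run
            if t - empty_since >= cooldown:
                return t
    return None

# Queue drains over three polls, then stays empty for eleven more.
samples = [600, 300, 40] + [0] * 11
print(scale_to_zero_time(samples))  # 390
```

In this model the queue is first seen empty at 90 s, so scale-to-zero becomes eligible 300 s later, at 390 s. A single non-empty sample in between would restart the timer.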

Exam tip: messageCount is per-replica, not total

When you read scale rules in exam questions, this is the most common gotcha. messageCount is the per-replica target, not a global trigger. So messageCount: 20 with 600 messages → 30 replicas, NOT one replica that processes 20 messages.

If the question gives you a queue depth and a messageCount, divide and round up. That’s the target replica count (capped by maxReplicas).

Common scale-rule types

The four scale-rule families you'll see most on AI-200.
| Feature | HTTP | Service Bus queue | Event Hubs | Custom (Prometheus, Redis, ...) |
|---|---|---|---|---|
| Scaler type | `http` | `azure-servicebus` | `azure-eventhub` | `prometheus`, `redis`, etc. |
| Key metadata | `concurrentRequests` | `queueName`, `messageCount`, `namespace` | `eventHubName`, `unprocessedEventThreshold`, `consumerGroup` | Scaler-specific (query, threshold) |
| Auth | None; Container Apps wires it up | Managed identity (preferred), connection string secret | Managed identity, connection string | Per-scaler, often managed identity or secret |
| Best for | Web APIs, public endpoints | Background workers, AI inference behind a queue | Telemetry pipelines, real-time AI on streams | Anything else KEDA supports |

HTTP scale rule

rules:
  - name: http-scale
    http:
      metadata:
        concurrentRequests: "100"   # 1 replica per 100 concurrent in-flight requests

concurrentRequests here is also per-replica. With 850 concurrent requests, ceil(850/100) = 9 replicas.

Service Bus scale rule with a user-assigned identity

rules:
  - name: queue-scale
    custom:
      type: azure-servicebus
      metadata:
        queueName: image-jobs
        messageCount: "10"
        namespace: roo-sb
      identity: <user-assigned-identity-resource-id>

The identity must have Azure Service Bus Data Receiver (or higher) on the namespace.

Cosmos DB change feed

rules:
  - name: changefeed-scale
    custom:
      type: azure-cosmosdb
      metadata:
        connectionFromEnv: COSMOS_CONNECTION_STRING
        databaseName: orders
        containerName: events
        leaseContainerName: leases
        eventCount: "100"

Useful for AI workflows that react to data changes: every order update triggers an embedding refresh, an enrichment, or a notification.

Scale-rule constraints in Container Apps

Container Apps imposes some sensible defaults you should know:

| Constraint | Default | Notes |
|---|---|---|
| Min replicas | 0 | Set higher to avoid cold-start latency |
| Max replicas | 10 | Increase as needed; Consumption profile supports up to 1,000 |
| Polling interval | 30 s | KEDA polls the scaler every 30 s by default |
| Cool-down period | 300 s (5 min) | How long below threshold before scaling down |
| Number of rules | Up to 10 per app | KEDA picks the highest desired replica count from any rule |

When multiple scale rules apply, KEDA evaluates each and runs the maximum desired replicas across them. This is “scale up if any source needs us; scale down only when all sources agree.”
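The "maximum across rules" behavior can be sketched in a few lines. The function name and the (metric value, per-replica threshold) pairs are illustrative assumptions; the example reuses the HTTP and Service Bus figures from earlier in this module.

```python
import math

def combined_replicas(rules, min_replicas=0, max_replicas=10):
    """rules: list of (metric_value, per_replica_threshold) pairs.
    Each rule proposes ceil(value / threshold); the highest proposal wins,
    then the result is clamped to [min_replicas, max_replicas]."""
    proposals = [math.ceil(value / threshold) for value, threshold in rules]
    wanted = max(proposals, default=0)
    return max(min_replicas, min(max_replicas, wanted))

# HTTP rule: 850 concurrent requests at 100 per replica -> 9
# Service Bus rule: 600 messages at 20 per replica      -> 30
print(combined_replicas([(850, 100), (600, 20)], max_replicas=50))  # 30

# Queue drained, HTTP traffic unchanged: the HTTP rule now sets the count.
print(combined_replicas([(850, 100), (0, 20)], max_replicas=50))    # 9
```

The second call shows the "scale down only when all sources agree" half of the rule: the app cannot drop below 9 replicas while the HTTP rule still asks for them.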

When KEDA isn’t enough: KEDA + Dapr

For a few patterns, KEDA alone undercounts the work. Container Apps integrates Dapr for these cases:

| Scenario | Without Dapr | With Dapr |
|---|---|---|
| Pub/sub on Service Bus where Dapr handles the subscription | KEDA can’t see Dapr’s internal queue | Dapr exposes its queue depth to KEDA via the dapr-pubsub-keda scaler pattern |
| State store backed by Cosmos DB | KEDA doesn’t know which container Dapr writes to | Dapr’s pluggable component model surfaces the right metric |

You won’t write raw Dapr code on AI-200, but recognise that Dapr is part of the platform and integrates with KEDA.

Authentication: managed identity beats connection strings

Two ways to give a scaler access to its source:

  1. Managed identity (recommended): assign a system- or user-assigned identity to the container app, grant it the right Azure RBAC role on Service Bus / Event Hubs / Cosmos. Reference the identity in the scale-rule identity field.
  2. Connection string in a secret: store the connection string in a Container Apps secret, reference it via connectionFromEnv. Works, but you own the secret rotation.

# Grant the user-assigned identity Service Bus Data Receiver
az role assignment create \
  --assignee $UAI_PRINCIPAL_ID \
  --role "Azure Service Bus Data Receiver" \
  --scope $(az servicebus namespace show -n roo-sb -g roo-prod --query id -o tsv)

Real-world example: Priya's loyalty event pipeline

BeanCraft Coffee’s loyalty event pipeline is one of Priya’s wins. Customer transactions flow into a Service Bus queue (tx-events). A Container App enriches each transaction with AI-generated personalisation tags and writes the result to Cosmos DB.

  • Quiet hours (overnight): queue is empty, container app sits at 0 replicas, cost is $0.
  • Morning rush (7-9 am): queue jumps to 12,000 messages, KEDA scales to 50 replicas, drains in 90 seconds.
  • Mid-morning: queue settles at ~50 messages/min, KEDA holds 2-3 replicas.
  • Lunch rush: another 8,000-message spike, scale up again.

The whole pipeline costs Priya about ten dollars a day. The same workload on always-on App Service plans would be over $200 a day.
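As a sanity check on the morning-rush figures, the sketch below works out the per-replica throughput they imply. It assumes all 50 replicas are running for the full 90 seconds and that no new messages arrive during the drain; scale-up time is ignored, so the real per-replica rate would need to be somewhat higher.

```python
def per_replica_throughput(messages, replicas, seconds):
    """Messages per second each replica must sustain to drain the queue in time."""
    return messages / replicas / seconds

rate = per_replica_throughput(12_000, 50, 90)
print(12_000 / 50)     # 240.0 messages per replica
print(round(rate, 2))  # 2.67 messages per second per replica
```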

Key terms

Question

What is KEDA?

Answer

Kubernetes Event-driven Autoscaling: an open-source autoscaler that scales Kubernetes pods (or Container Apps replicas) based on external event sources like queues, streams, and metrics. It is the autoscaler underneath Azure Container Apps.

Question

What does messageCount mean in a Service Bus scale rule?

Answer

The target number of messages PER REPLICA. KEDA divides queue depth by messageCount and rounds up to compute desired replicas. queueDepth=600, messageCount=20 → 30 replicas (capped by maxReplicas).

Question

What's the recommended way to authenticate a KEDA Service Bus scale rule to its namespace?

Answer

Use a managed identity assigned to the container app, granted Azure Service Bus Data Receiver on the namespace. Reference the identity in the scale rule's `identity` field. No connection strings, no secrets to rotate.

Question

If a Container App has two scale rules, how does KEDA decide replica count?

Answer

KEDA evaluates each scale rule's desired replicas independently and runs the MAXIMUM across all rules. The app scales up when any rule wants more replicas; it scales down only when all rules agree fewer replicas are enough.

Question

Can a Container App with HTTP ingress have minReplicas: 0?

Answer

Yes. HTTP ingress with minReplicas 0 is the canonical scale-to-zero pattern. The first request after idle hits a brief cold start while a replica spins up. Set minReplicas to 1 if cold-start latency is unacceptable.

Knowledge check

Mira's KEDA rule reads `messageCount: 25, queueName: image-jobs, maxReplicas: 40`. The queue spikes to 1500 messages. How many replicas will Container Apps target?

Theo's container app has both an HTTP scale rule (target 4 replicas right now) and a Service Bus scale rule (target 12 replicas right now). The app's `maxReplicas` is 30. How many replicas does Container Apps run?

Lin wants a Container App to scale based on requests per second on a custom Prometheus metric exposed by the app itself. Which scale-rule type does Lin need?