Domain 3: Connect to and consume Azure services

Choosing the Right Messaging Service

Service Bus, Event Grid, Event Hubs, Storage Queues: four very similar-sounding services with different jobs. This module gives you the decision framework that turns 'I need messaging' into the right choice.

Why this module exists

Simple explanation

Azure has four overlapping messaging services and they look interchangeable until you’ve used them. The exam loves to give you a scenario where two of them would technically work but only one is the textbook fit.

Four quick mental hooks:

  • Service Bus = work queue. “I need a worker to do this thing reliably.”
  • Event Grid = router. “Something happened: fan out the news.”
  • Event Hubs = telemetry firehose. “Millions of small events per second from many producers.”
  • Storage Queue = the cheap one. “I just need a queue, no fancy features.”

The matrix

Use this matrix as the first pass; pick the row that matches the scenario.
| Feature | Service Bus | Event Grid | Event Hubs | Storage Queues |
| --- | --- | --- | --- | --- |
| Pattern | Queue or pub/sub topic | Push event routing | Partitioned event log | Simple queue |
| Direction | Pull (receivers poll) | Push (Event Grid pushes to subscribers) | Pull (consumers read offsets) | Pull (receivers poll) |
| Throughput | Up to ~thousands of msgs/sec per Premium MU | Millions of events/sec across topics | Millions of events/sec on Event Hubs Premium / Dedicated | Modest (scalability target of roughly 2,000 msgs/sec per queue for 1 KiB messages) |
| Order | FIFO with sessions | No order guarantee | Per-partition order | No ordering guarantee |
| Retention | TTL per message; configurable up to effectively unlimited on Standard/Premium (14-day max applies on Basic only) | Retries for up to 24 hours by default, then dead-letters (if configured) or drops the event | Days of replay (configurable, up to 90 days on Premium) | 7-day default TTL; on API ≥ 2017-07-29 you can set any positive value or -1 (no expiry) |
| Best for AI | Reliable async work, RAG ingest, agent backplane | Reactive flows: blob created → embed; Cosmos updated → audit | Telemetry from inference, click streams, training pipelines | Cheap small queues, dev/test |

The decision flow

Are you reacting to a state change in another Azure service or your own app?
  └── Yes → Event Grid (or Event Grid → Service Bus for durability)

Are you ingesting telemetry / events at very high volume from many producers?
  └── Yes → Event Hubs

Do you need reliable, ordered, transactional async work?
  └── Yes → Service Bus

Just need a basic queue, cost-sensitive, small scale?
  └── Yes → Storage Queues
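The flow above can be read as an ordered chain of questions where the first "yes" wins. A minimal sketch of that ordering follows; the `Scenario` class and its field names are hypothetical and exist only for this illustration, not in any Azure SDK.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical scenario flags; field names are illustrative only."""
    reacts_to_state_change: bool = False
    needs_durable_handoff: bool = False   # only consulted for state-change triggers
    high_volume_telemetry: bool = False
    needs_reliable_ordered_work: bool = False


def choose_service(s: Scenario) -> str:
    """Ask the questions in the same order as the decision flow."""
    if s.reacts_to_state_change:
        if s.needs_durable_handoff:
            return "Event Grid -> Service Bus"
        return "Event Grid"
    if s.high_volume_telemetry:
        return "Event Hubs"
    if s.needs_reliable_ordered_work:
        return "Service Bus"
    # Nothing above matched: a basic, cost-sensitive, small-scale queue.
    return "Storage Queues"
```

Because the questions are checked in order, a scenario that is both a state-change reaction and needs durable work lands on the chained Event Grid and Service Bus option rather than on Service Bus alone.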

Common architectural patterns

Pattern 1: Event Grid → Service Bus → Container Apps

The classic “reactive AI worker” pipeline:

Blob created (Event Grid system topic)
  → Service Bus queue (subscriber)
  → Container App + KEDA (scales on Service Bus depth)

Event Grid handles the routing/filtering, Service Bus provides durability + DLQ for the worker, Container Apps + KEDA scales workers.
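To make "scales on Service Bus depth" concrete, here is a rough sketch of the arithmetic behind queue-depth scaling: divide the backlog by a per-replica target and clamp the result. The function and parameter names are assumptions for illustration; the real KEDA scaler also applies polling intervals and cooldown periods that this sketch ignores.

```python
import math


def desired_replicas(queue_depth: int, messages_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Approximate replica count for a given backlog (illustrative only)."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    # One replica per `messages_per_replica` waiting messages, rounded up.
    wanted = math.ceil(queue_depth / messages_per_replica)
    # Never go below the floor or above the ceiling configured for the app.
    return max(min_replicas, min(max_replicas, wanted))
```

With a target of 5 messages per replica, a backlog of 42 asks for 9 replicas, and a burst of 500 is capped at the configured maximum.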

Pattern 2: Event Hubs → Stream processor → Cosmos

High-throughput ingestion + processing:

Many producers (devices, ad servers, sensors) → Event Hubs
  → Azure Functions (or Stream Analytics, or Spark on AKS)
  → Cosmos DB for state, vector store for embeddings, telemetry to a data lake

Event Hubs absorbs the firehose; downstream services do the work.
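Per-partition ordering is what lets this pattern keep related events in sequence: events that share a partition key land on the same partition, in the order they were sent. The sketch below simulates that idea with a stand-in hash; Event Hubs uses its own internal hashing, and the helper names here are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition with a stable stand-in hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events, partition_count):
    """Append each (key, payload) event to the log of its partition."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append((key, payload))
    return partitions


# Interleaved events from two hypothetical devices.
events = [("device-a", 1), ("device-b", 1), ("device-a", 2),
          ("device-b", 2), ("device-a", 3)]
partitions = route(events, partition_count=4)
```

Order is guaranteed only within a partition, so anything that must stay in sequence (one device, one patient, one session) should share a key.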

Pattern 3: Service Bus topic with multiple subscribers

"order-events" topic
  → "fulfilment" subscription (filter on type=NewOrder)  → fulfilment Container App
  → "loyalty" subscription (no filter)                    → loyalty Container App
  → "audit" subscription (no filter)                       → audit Function App

One publisher, three downstream concerns, broker-side filtering, durable delivery.
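Broker-side filtering means the topic decides which subscriptions get a copy of each message, so the publisher never needs to know who is listening. The following in-memory simulation is only a sketch of that behaviour; the `Topic` class is hypothetical and is not the Service Bus SDK, and the `OrderCancelled` type is invented for the example.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
Rule = Optional[Callable[[Message], bool]]


class Topic:
    """Toy topic: every matching subscription receives its own copy."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}
        self.delivered: Dict[str, List[Message]] = {}

    def subscribe(self, name: str, rule: Rule = None) -> None:
        # A rule of None means "no filter": the subscription gets everything.
        self._rules[name] = rule
        self.delivered[name] = []

    def publish(self, message: Message) -> None:
        for name, rule in self._rules.items():
            if rule is None or rule(message):
                self.delivered[name].append(dict(message))


topic = Topic()
topic.subscribe("fulfilment", lambda m: m.get("type") == "NewOrder")
topic.subscribe("loyalty")
topic.subscribe("audit")

topic.publish({"type": "NewOrder", "id": "1"})
topic.publish({"type": "OrderCancelled", "id": "2"})  # hypothetical event type
```

After the two publishes, the filtered subscription holds only the `NewOrder` message while the unfiltered ones hold both.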

Pattern 4: Cosmos DB change feed → Function App

When the trigger source IS Cosmos:

Cosmos write → Cosmos change feed → Function App

Skip messaging entirely: the change feed itself acts as the queue. Cheaper and simpler than tee-ing every write to Service Bus or Event Grid first.

Real-world example: Mira's choice between two patterns

Mira’s robots upload shipping label photos to Blob storage. Two downstream concerns: an OCR worker extracting label text, and an audit log of every upload.

  • Naive choice: put all logic in a Function App with a Blob trigger. Works, but the Blob trigger has reliability gotchas at scale (long-poll model, missed events under heavy load).
  • Better choice: Blob created event → Event Grid → Service Bus topic with two subscriptions (ocr and audit). Container Apps run the OCR worker, scaled by KEDA on Service Bus depth.

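The second pattern handles the burst at shift change, dead-letters bad images, and audits everything. Service Bus delivery is at-least-once, so pair it with duplicate detection or idempotent handlers to avoid duplicated work. This is exactly what Service Bus + Event Grid was designed for.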
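Dead-lettering of bad images follows from a delivery-count limit: a message that keeps failing is retried until the limit is reached and is then moved aside instead of blocking the queue. Below is a simplified in-memory sketch of that loop. The `drain` and `ocr_handler` names are hypothetical, and the real broker tracks locks and timeouts that are omitted here; Service Bus defaults to a maximum delivery count of 10, and 3 is used below only to keep the example short.

```python
from collections import deque


def drain(items, handler, max_delivery_count=10):
    """Process items, retrying failures until the delivery limit is hit."""
    queue = deque((item, 0) for item in items)
    completed, dead_lettered = [], []
    while queue:
        item, deliveries = queue.popleft()
        deliveries += 1
        try:
            handler(item)
            completed.append(item)          # success: settle the message
        except Exception:
            if deliveries >= max_delivery_count:
                dead_lettered.append(item)  # give up: move to the DLQ
            else:
                queue.append((item, deliveries))  # abandon: retry later
    return completed, dead_lettered


attempts = []


def ocr_handler(name):
    """Hypothetical OCR worker that cannot read one corrupt image."""
    attempts.append(name)
    if name == "corrupt.jpg":
        raise ValueError("unreadable image")


completed, dead_lettered = drain(
    ["a.jpg", "corrupt.jpg", "b.jpg"], ocr_handler, max_delivery_count=3)
```

The good images complete on the first attempt, while the corrupt one is tried three times and then ends up in the dead-letter list for later inspection.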

Anti-patterns

| Anti-pattern | Why it’s wrong | Better choice |
| --- | --- | --- |
| Event Grid for durable work assignment | At-least-once delivery with a 24-hour retry window is fine for events, but you can’t store work in Event Grid | Service Bus queue |
| Service Bus for high-throughput telemetry | Designed for thousands of msgs/sec, not millions; per-message overhead | Event Hubs |
| Storage Queue for production AI workloads | No built-in DLQ, no sessions, no transactions, no duplicate detection, no topics | Service Bus |
| Two messaging services chained “just in case” | Doubles failure modes and operational surface | Pick one and lean on its features |

When in doubt: Microsoft’s own decision tree

Microsoft documents this exact decision flow at “Compare Azure messaging services”. Two big rules of thumb from there:

  1. Discrete events that report state change → Event Grid. “Blob created”, “Cosmos changed”, “Resource Health changed”.
  2. Series of related events that need to stay in order → Event Hubs (telemetry/log) or Service Bus (work).

Don’t memorise feature tables. Internalise the intent of each service, and the right answer reveals itself from the scenario.

Key terms

Question

When should you use Service Bus instead of Storage Queues?

Answer

When you need any of: dead-lettering, sessions, transactions, duplicate detection, large messages, or topics. Storage Queues are a basic queue inside Azure Storage: cheap and simple, but without those advanced features and without a strict FIFO guarantee.

Question

When does Event Hubs beat Service Bus?

Answer

When throughput is in millions of events per second from many producers, and per-event durability semantics aren't critical. Event Hubs is a partitioned log; consumers read with offsets and replay history. Service Bus tops out far below Event Hubs throughput.

Question

What's the canonical 'reactive AI worker' pattern in Azure?

Answer

Event Grid (system or custom topic) → Service Bus queue/topic → Container App with KEDA scaling. Event Grid does routing/filtering, Service Bus adds durability + DLQ, Container Apps + KEDA scales workers. The most common Domain 3 question architecture.

Question

Why isn't Event Grid the right choice for back-end work assignment?

Answer

Event Grid is push-based event routing. Subscribers must accept events when delivered (within the retry window); after 24 hours the event is dropped, or dead-lettered if that is configured. Service Bus is pull-based and durable: workers consume on their own schedule, messages persist for days, and dead-lettering captures failures. Use Event Grid for routing, Service Bus for work.

Question

When should you skip Azure messaging entirely?

Answer

When the source itself emits a usable change feed, such as the Cosmos DB change feed. The change feed is its own ordered, durable, partitioned event source; tee-ing every write to Service Bus first adds cost and operational surface for no benefit.

Knowledge check

Theo's clinical assistant ingests 850k clinical events per minute from monitoring devices across hospitals. Each event is small (200 bytes); all events for a given patient must stay in order. Which Azure messaging service fits best?

Mira wants to react when a Blob is created in a Storage container: embed the image, write the result to Cosmos, and alert if the embedding worker is down. Which architecture fits?

Lin's POC needs the cheapest possible queue for a hobby AI demo handling 10 messages/minute. No DLQ needed; if a message is lost, it doesn't matter. Which service?