Domain 3: Connect to and consume Azure services

Azure Service Bus: Queues + Dead-Letter Handling

Reliable async messaging for AI back-ends. Queue mechanics, sessions, dead-letter queues, message lock and complete, and how to scale a Container App on Service Bus depth.

Why Service Bus is the workhorse for async AI

Simple explanation

Azure Service Bus is a durable, ordered message broker for back-end work. When an HTTP request arrives but the work is too slow to do synchronously (embedding an article, generating a summary, calling an external API), drop a message on a Service Bus queue and let a worker pick it up. The user gets an instant response; the worker chews through messages on its own schedule.

For AI workloads, the killer features are:

  • Reliable delivery: once a message is in the queue, it stays there until processed
  • Dead-letter queue (DLQ): messages that fail too many times go to a side queue you can inspect
  • Sessions: guarantee FIFO order within a session (e.g., per-user message order)
  • KEDA scaling: Container Apps scales replicas based on queue depth
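
To make the KEDA bullet concrete, here is a rough sketch of how a queue-depth scale rule turns backlog into a replica count. It assumes the rule targets a fixed number of messages per replica and that the result is clamped to configured minimum and maximum replica counts. The function name, parameter names, and default values are illustrative, and the real autoscaler also applies polling intervals and cooldowns that are not modelled here.

```python
import math


def desired_replicas(queue_depth: int, messages_per_replica: int = 5,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Approximate replica count for a queue-depth scale rule (illustrative only)."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    # One replica per `messages_per_replica` queued messages, rounded up.
    wanted = math.ceil(queue_depth / messages_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))
```

With these assumed defaults, an empty queue allows scale-to-zero, while a backlog of 12 messages asks for 3 replicas.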

The lifecycle of a message

1. Sender places a message on the queue
2. Receiver calls Receive(): message is locked (default 30 s, max 5 min)
3. Receiver processes the message
4. Receiver calls Complete() → message is removed from the queue
   OR Abandon() → lock released, message returns to the queue (delivery count++)
   OR DeadLetter() → message moved to the DLQ with a reason
5. If the lock expires before any of those, message returns to the queue (delivery count++)
6. After MaxDeliveryCount attempts, the message auto-dead-letters

The "lock" is critical: while a receiver holds the lock on a message, no other receiver gets it. If the receiver crashes, the lock expires and another receiver picks up the message. This is at-least-once delivery.
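
The lifecycle above can be sketched as a small in-memory model. This is a teaching toy, not the Service Bus API: `ToyQueue`, `Message`, and every method name are made up for illustration, locks are implied rather than timed, and an expired lock is treated exactly like an abandon.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    body: str
    delivery_count: int = 0


@dataclass
class ToyQueue:
    """In-memory model of peek-lock settlement (illustrative only)."""
    max_delivery_count: int = 10
    active: deque = field(default_factory=deque)
    dead_letters: list = field(default_factory=list)  # (message, reason) pairs

    def send(self, body: str) -> None:
        self.active.append(Message(body))

    def receive(self) -> Optional[Message]:
        # "Lock" the head message by taking it out of sight of other receivers.
        return self.active.popleft() if self.active else None

    def complete(self, msg: Message) -> None:
        # Success: the locked message is never put back.
        pass

    def abandon(self, msg: Message) -> None:
        # Failed attempt: bump the delivery count, then requeue or dead-letter.
        msg.delivery_count += 1
        if msg.delivery_count >= self.max_delivery_count:
            self.dead_letters.append((msg, "MaxDeliveryCountExceeded"))
        else:
            self.active.appendleft(msg)

    # A lock that expires has the same effect as an abandon in this model.
    lock_expired = abandon

    def dead_letter(self, msg: Message, reason: str) -> None:
        self.dead_letters.append((msg, reason))
```

Abandoning the same message `max_delivery_count` times in a row leaves the active queue empty and one entry in `dead_letters`, which is the auto-dead-letter path in step 6.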

The five outcomes of a Receive()

The receiver disposition decides what happens next. Always Complete on success; Abandon for transient failures.

| Feature | Complete | Abandon | DeadLetter | Defer | Lock expired |
| --- | --- | --- | --- | --- | --- |
| When | Successful processing | Transient failure: retry later | Poison message: give up | Process out of order: defer for later | Crash, slow processing, network drop |
| Effect | Removed from queue | Returns to queue, delivery count++ | Moves to DLQ with reason + description | Set aside in the queue; retrievable only by SequenceNumber | Returns to queue, delivery count++ |
| Counts toward MaxDeliveryCount | No | Yes | Yes (last attempt) | No | Yes |

import asyncio

from azure.identity.aio import DefaultAzureCredential
from azure.servicebus.aio import ServiceBusClient

# process(), TransientError, and PoisonError are application-defined placeholders.

async def main():
    credential = DefaultAzureCredential()
    # Close both the credential and the client when the worker exits.
    async with credential, ServiceBusClient(
        fully_qualified_namespace="roo-sb.servicebus.windows.net",
        credential=credential,
    ) as client:
        receiver = client.get_queue_receiver(queue_name="image-jobs", max_wait_time=30)
        async with receiver:
            async for msg in receiver:
                try:
                    await process(msg)
                    await receiver.complete_message(msg)   # success: remove from queue
                except TransientError:
                    await receiver.abandon_message(msg)    # retry later
                except PoisonError as e:
                    await receiver.dead_letter_message(
                        msg, reason="Poison", error_description=str(e)
                    )

asyncio.run(main())

Dead-letter queues: the safety net

Every queue has an automatic sub-queue for dead messages. Two paths in:

| Path | When |
| --- | --- |
| Auto dead-letter on max delivery | A message exceeds MaxDeliveryCount (default 10), usually because it keeps being abandoned |
| Explicit dead-letter | Receiver calls DeadLetter() with a reason; the clean way to handle "this will never succeed" |

Read DLQ messages by appending /$DeadLetterQueue to the queue path:

dlq_receiver = client.get_queue_receiver(
    queue_name="image-jobs/$DeadLetterQueue",
    max_wait_time=10,
)

The dead-letter queue is a queue too: you can inspect, reprocess, fix, and resubmit. Most production systems have a small admin tool that reads the DLQ, lets a human triage, and either re-sends or discards.
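
A triage loop of that kind might look like the sketch below. It assumes `dlq_receiver` behaves like the async SDK receiver (it awaits `receive_messages` and `complete_message`). The `resubmit` and `is_fixable` callables are hypothetical hooks you would supply, for example a function that sends a repaired copy to the main queue and a check on the message's dead-letter reason.

```python
from typing import Awaitable, Callable


async def triage_dead_letters(
    dlq_receiver,
    resubmit: Callable[[object], Awaitable[None]],
    is_fixable: Callable[[object], bool],
    batch_size: int = 10,
) -> dict:
    """Drain the DLQ once: resubmit fixable messages, discard the rest."""
    counts = {"resubmitted": 0, "discarded": 0}
    while True:
        batch = await dlq_receiver.receive_messages(
            max_message_count=batch_size, max_wait_time=5
        )
        if not batch:
            return counts  # nothing left to triage
        for msg in batch:
            if is_fixable(msg):
                # Send the repaired copy before settling the original.
                await resubmit(msg)
                counts["resubmitted"] += 1
            else:
                counts["discarded"] += 1
            # Completing removes the message from the DLQ in both cases.
            await dlq_receiver.complete_message(msg)
```

Resubmitting before completing means a crash between the two steps can duplicate a message but never lose one, which matches the at-least-once model of the main queue.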

Real-world example: Mira's poison-image handling

Mira's worker embeds product images. About 0.1% of incoming images are corrupted (broken JPEGs, network truncation). Without DLQ handling, those messages would loop forever, blocking other work.

Mira's pattern:

  1. Worker tries to process the image
  2. On TransientError (network blip, OpenAI quota): Abandon, so Service Bus retries
  3. On CorruptImageError (file is broken): DeadLetter with reason "BadPayload"
  4. After 3 failed deliveries (this assumes the queue's MaxDeliveryCount is set to 3 rather than the default 10), Service Bus auto-dead-letters with reason "MaxDeliveryCountExceeded"
  5. A nightly job reads the DLQ, re-encodes JPEGs that look fixable, re-submits them, deletes the rest

Result: the main queue never blocks on broken data. The DLQ is a small worklist, not a black hole.

Sessions: FIFO within a key

Service Bus sessions group related messages and guarantee FIFO order within the session. Useful when message order matters per user / conversation / order-id.

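import json

from azure.servicebus import NEXT_AVAILABLE_SESSION, ServiceBusMessage

# Send: tag each message with its session (the queue must have sessions enabled)
sender = client.get_queue_sender(queue_name="orders")
await sender.send_messages(
    ServiceBusMessage(body=json.dumps(payload), session_id=user_id)
)

# Receive: get messages from one session at a time
session_receiver = client.get_queue_receiver(
    queue_name="orders", session_id=NEXT_AVAILABLE_SESSION,
)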

The session lock lasts as long as one receiver holds it; other receivers can get other sessions in parallel. So sessions give you per-key FIFO without serialising the whole queue.
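
The per-key FIFO guarantee can be illustrated with a small asyncio simulation. It is a toy model rather than SDK code: `process_by_session` is a made-up name, a "session lock" is modelled by handing each session ID to exactly one worker, and `asyncio.sleep(0)` stands in for real processing so the workers interleave.

```python
import asyncio
from collections import defaultdict


async def process_by_session(messages, worker_count=2):
    """messages: iterable of (session_id, body). Returns the processing order per session."""
    # Group by session, preserving send order within each session.
    sessions = defaultdict(list)
    for session_id, body in messages:
        sessions[session_id].append(body)

    pending = asyncio.Queue()
    for session_id in sessions:
        pending.put_nowait(session_id)

    processed = defaultdict(list)

    async def worker():
        while True:
            try:
                session_id = pending.get_nowait()  # "lock" the next free session
            except asyncio.QueueEmpty:
                return
            for body in sessions[session_id]:
                await asyncio.sleep(0)  # yield so other workers can run
                processed[session_id].append(body)

    await asyncio.gather(*(worker() for _ in range(worker_count)))
    return dict(processed)
```

Two workers drain two sessions concurrently, yet each session's messages come out in the order they were sent, because only one worker ever holds a given session.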

# Grant the Container App's managed identity the right RBAC
az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Azure Service Bus Data Receiver" \
  --scope $(az servicebus namespace show -n roo-sb -g roo-prod --query id -o tsv)

Three built-in roles (data-plane):

| Role | Permissions | Use for |
| --- | --- | --- |
| Azure Service Bus Data Owner | Full data-plane | Admin tools |
| Azure Service Bus Data Sender | Send only | Producers |
| Azure Service Bus Data Receiver | Receive (peek, complete, abandon, dead-letter) | Consumers |

Receive modes: peek-lock vs receive-and-delete

| Mode | What happens | Use for |
| --- | --- | --- |
| PeekLock (default) | Message is locked but stays in queue; you Complete or Abandon | Reliable processing: recover from failures |
| ReceiveAndDelete | Message is removed from queue immediately on receive | Telemetry where it's OK to lose a message on crash |

ReceiveAndDelete is faster but risky for AI workloads: a worker crash drops the message permanently. Default to PeekLock unless you have a clear reason.
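
The difference shows up as soon as a handler fails. The sketch below is a simplified in-memory model, not SDK code: `deliver_once` and the mode strings are invented for illustration, and an exception from the handler stands in for a worker crash.

```python
from collections import deque


def deliver_once(queue: deque, mode: str, handler) -> bool:
    """Deliver the head message under the given receive mode; True if handled."""
    if mode not in ("peek_lock", "receive_and_delete"):
        raise ValueError(f"unknown mode: {mode}")
    if not queue:
        return False
    if mode == "receive_and_delete":
        msg = queue.popleft()  # gone from the queue before processing starts
        try:
            handler(msg)
        except Exception:
            return False  # simulated crash: the message is lost
        return True
    msg = queue[0]  # peek-lock: the message stays in the queue while locked
    try:
        handler(msg)
    except Exception:
        return False  # simulated crash: lock lapses, message is still queued
    queue.popleft()  # Complete: remove only after successful processing
    return True
```

Running both modes against a handler that always raises leaves the peek-lock queue intact and the receive-and-delete queue empty.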

Scheduled and TTL messages

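from datetime import datetime, timedelta, timezone

from azure.servicebus import ServiceBusMessage

# Scheduled: make this message available for processing in 5 minutes
message = ServiceBusMessage(body=payload)
scheduled_time = datetime.now(timezone.utc) + timedelta(minutes=5)
sequence_numbers = await sender.schedule_messages(message, scheduled_time)  # list of sequence numbers

# TTL: message expires if not consumed within 1 hour
message = ServiceBusMessage(body=payload, time_to_live=timedelta(hours=1))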

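TTL is a per-message override of the queue default; a per-message value longer than the queue default is capped to that default. Expired messages are either moved to the dead-letter queue (if the queue enables dead-lettering on expiration) or silently discarded.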
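
As a worked example of how the two TTL settings combine, the sketch below computes the effective TTL and expiry time. The helper names are hypothetical, and the model assumes the shorter of the two values wins and that the clock starts when the message is enqueued.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_ttl(queue_default: timedelta,
                  message_ttl: Optional[timedelta] = None) -> timedelta:
    """Per-message TTL wins when set, but never exceeds the queue default."""
    if message_ttl is None:
        return queue_default
    return min(message_ttl, queue_default)


def expires_at(enqueued_at: datetime, queue_default: timedelta,
               message_ttl: Optional[timedelta] = None) -> datetime:
    """Expiry instant, counted from the moment the message is enqueued."""
    return enqueued_at + effective_ttl(queue_default, message_ttl)
```

For instance, a message with a 1-hour TTL on a queue whose default is 14 days expires one hour after it is enqueued, while a 1-day TTL on a queue with a 1-hour default still expires after one hour.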

Key terms

Question

What's the default receive mode in Service Bus, and why?

Answer

PeekLock. The receiver gets the message locked (others can't see it) for a configurable lock duration. The receiver must call Complete (success), Abandon (retry), or DeadLetter (give up). If the lock expires, the message returns to the queue. Lets you recover from worker crashes.

Question

What is the Service Bus dead-letter queue?

Answer

An automatic sub-queue for messages that can't be processed, either because they exceeded MaxDeliveryCount or were explicitly DeadLettered. Read by appending `/$DeadLetterQueue` to the queue path. Inspect, fix, and resubmit or discard.

Question

When should you use Service Bus sessions?

Answer

When messages must be processed in FIFO order within a key: per user, per order, per session ID. Sessions give you per-key ordering without serialising the whole queue (parallel receivers can each hold different sessions).

Question

What's the difference between Abandon and DeadLetter?

Answer

Abandon: release the lock; the message returns to the queue and another attempt happens. Counts toward MaxDeliveryCount. DeadLetter: explicitly route the message to the DLQ with a reason. Use Abandon for transient failures, DeadLetter for poison messages you know won't succeed.

Question

Which Service Bus tier supports dead-letter queues, sessions, and topics?

Answer

Standard and Premium. Basic supports queues only (each with its own dead-letter sub-queue) and lacks sessions and topics. Pick Premium when you need dedicated capacity, large messages (up to 100 MB), or VNet integration.

Knowledge check

Mira's worker is processing image-embedding messages. A message contains a corrupted JPEG and will never succeed. What's the cleanest disposition to use?

Theo's clinical-message processing must guarantee that all messages for the same patient are processed in the order they were sent, even with multiple worker replicas. What feature does Theo need?

Lin's Container App connects to Service Bus with a connection string in app settings. The security team wants the connection string removed. What's the recommended replacement?