Azure Event Grid: Filters, Custom Events, Retries
The push-based event routing service for AI integration. CloudEvents, custom topics, system topics, advanced filters, and the retry policies that protect downstream subscribers.
What Event Grid is, and isn't
Event Grid is an event routing service. Many publishers send small JSON events; Event Grid pushes each event to one or more subscribers based on filters. Subscribers can be Functions, Logic Apps, webhooks, Service Bus queues, Storage queues, even Container Apps.
It's NOT a message broker like Service Bus. Event Grid is push-based, optimised for small (≤1 MB) event metadata, and excellent for "react to a state change". Service Bus is pull-based, durable, and excellent for "process this work in order".
For AI workloads, Event Grid is great when an Azure resource emits an event you want to react to: Storage blob created, Cosmos DB doc updated, Container Registry image pushed, custom event from your app.
CloudEvents schema
{
"specversion": "1.0",
"id": "evt-7f3...",
"source": "/subscriptions/<sub>/resourceGroups/roo-prod/providers/Microsoft.Storage/storageAccounts/rooblobs",
"type": "Microsoft.Storage.BlobCreated",
"time": "2026-05-08T03:14:15Z",
"subject": "/blobServices/default/containers/uploads/blobs/img-789.jpg",
"datacontenttype": "application/json",
"data": {
"url": "https://rooblobs.blob.core.windows.net/uploads/img-789.jpg",
"contentType": "image/jpeg",
"contentLength": 482910
}
}
The five envelope fields used by filters: id, source, type, subject, time. Plus data for the resource-specific payload.
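As a rough illustration, the envelope fields can be pulled out of a delivered event with the standard library alone. The helper name `envelope_fields` is hypothetical, and the sample is a trimmed copy of the event above with a made-up `id`.

```python
import json

# Envelope attributes named above; `data` carries the resource-specific payload.
ENVELOPE_KEYS = ("id", "source", "type", "subject", "time")


def envelope_fields(raw: str) -> dict:
    """Return the envelope attributes of a CloudEvents JSON document."""
    event = json.loads(raw)
    # .get() tolerates attributes that a publisher may leave out.
    return {key: event.get(key) for key in ENVELOPE_KEYS}


# Trimmed sample; the id value is a placeholder, not a real event id.
sample = json.dumps({
    "specversion": "1.0",
    "id": "evt-example",
    "source": "/subscriptions/<sub>/resourceGroups/roo-prod/providers/Microsoft.Storage/storageAccounts/rooblobs",
    "type": "Microsoft.Storage.BlobCreated",
    "time": "2026-05-08T03:14:15Z",
    "subject": "/blobServices/default/containers/uploads/blobs/img-789.jpg",
    "data": {"contentType": "image/jpeg", "contentLength": 482910},
})

fields = envelope_fields(sample)
```

The returned dictionary holds only the routing metadata, which is usually what a handler logs or filters on before it looks inside `data`.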
Filter dimensions: pick once, narrow forever
| Feature | Subject filter | Event type filter | Advanced filter (data fields) |
|---|---|---|---|
| Filters on | Begins with / ends with on the `subject` field | Exact match against allow-listed event types | Any field including `data.*`: equals, in, contains, range |
| Performance | Fast: broker-side path filter | Fast | Slightly higher per-match cost |
| Use for | Storage events scoped to a container or prefix | Subscribing only to specific event types | Filtering by content (e.g., contentType='image/jpeg') |
# Subscribe to only blob-created events for a specific container, only JPEGs
az eventgrid event-subscription create \
--name jpeg-uploads \
--source-resource-id $STORAGE_RESOURCE_ID \
--endpoint $FUNCTION_APP_URL \
--included-event-types Microsoft.Storage.BlobCreated \
--subject-begins-with "/blobServices/default/containers/uploads/blobs/" \
--advanced-filter data.contentType StringContains "image/jpeg"
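To make the three filter dimensions concrete, here is a minimal local sketch of how they combine: an event must pass the event-type allow-list, the subject prefix/suffix check, and every advanced filter. The function `matches` and its parameters are hypothetical and only approximate the service. It compares strings case-insensitively, which is my understanding of the default behaviour, and it models only a "string contains" advanced filter.

```python
def matches(event, *, event_types=None, subject_begins_with="",
            subject_ends_with="", data_contains=None):
    """Approximate a subscription's filters locally (not Event Grid's own code)."""
    # Event type filter: exact match against an allow-list, if one is set.
    if event_types is not None and event.get("type") not in event_types:
        return False

    # Subject filter: prefix and suffix checks on the subject string.
    subject = (event.get("subject") or "").lower()
    if not subject.startswith(subject_begins_with.lower()):
        return False
    if not subject.endswith(subject_ends_with.lower()):
        return False

    # Advanced filter: every listed data field must contain its substring.
    data = event.get("data") or {}
    for key, needle in (data_contains or {}).items():
        value = data.get(key)
        if not isinstance(value, str) or needle.lower() not in value.lower():
            return False
    return True


# The same rules as the CLI subscription above, expressed as keyword arguments.
RULES = dict(
    event_types={"Microsoft.Storage.BlobCreated"},
    subject_begins_with="/blobServices/default/containers/uploads/blobs/",
    data_contains={"contentType": "image/jpeg"},
)

jpeg_upload = {
    "type": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/uploads/blobs/img-789.jpg",
    "data": {"contentType": "image/jpeg"},
}
png_upload = dict(jpeg_upload, data={"contentType": "image/png"})
other_container = dict(
    jpeg_upload,
    subject="/blobServices/default/containers/thumbnails/blobs/img-789.jpg",
)
```

A helper like this can be handy in unit tests for checking that a planned filter set admits the events you expect before the subscription is created.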
Custom topics: your app emits events
# Publishing a custom event from your app
# In azure-eventgrid 4.x, CloudEvent is imported from azure-core rather than azure.eventgrid
from azure.core.messaging import CloudEvent
from azure.eventgrid import EventGridPublisherClient
from azure.identity import DefaultAzureCredential

client = EventGridPublisherClient(
    "https://roo-events.<region>-1.eventgrid.azure.net/api/events",
    DefaultAzureCredential(),
)
event = CloudEvent(
    type="com.roorobotics.shipment.scanned",
    source="/warehouses/auckland",
    data={"shipment_id": "S-7012", "weight_kg": 42.3},
    subject="warehouse/auckland/shipment/S-7012",
)
client.send(event)
Subscribers register against your custom topic and choose endpoints: Functions, webhooks, Service Bus, Container Apps, Logic Apps.
Delivery semantics, and what to do about retries
Event Grid delivers each event to each subscriber at least once. If a delivery fails (the endpoint times out or returns a non-success status), Event Grid retries with exponential back-off:
| Phase | Schedule (default) |
|---|---|
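| Initial retries | 10 s, 30 s, 1 min, 5 min, 10 min, 30 min, 1 hour, 3 hours, 6 hours, then every 12 hours up to 24 hours (best effort) |
| Total retry window | 24 hours by default (event TTL of 1,440 minutes, configurable down to 1 minute); at most 30 delivery attempts by default |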
| After max retries | Drops the event, OR routes to a configured dead-letter Storage destination |
# Set a custom retry policy + dead-letter destination
# --event-ttl is in minutes; --deadletter-endpoint takes the resource ID of a blob container, not a URL
az eventgrid event-subscription update \
  --name jpeg-uploads \
  --source-resource-id $STORAGE_RESOURCE_ID \
  --max-delivery-attempts 5 \
  --event-ttl 60 \
  --deadletter-endpoint $DLQ_CONTAINER_RESOURCE_ID
Dead-lettering writes the failed event as a JSON blob to your specified Storage container. Most production subscribers configure dead-lettering; the alternative is silent data loss once retries are exhausted.
Exam tip: when events stop arriving, check the subscription, not the publisher
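The interaction between the attempt limit and the event TTL is easier to see with numbers. The sketch below is a simplified model, not the service's algorithm. It assumes the first delivery counts as attempt 1, that each schedule entry is the wait before the next attempt, and that the final 12-hour step repeats. Real deliveries are best effort, so actual timings will differ. The function name `planned_attempts` is hypothetical.

```python
# Default back-off steps from the table above, in seconds (simplified model).
SCHEDULE_S = [10, 30, 60, 300, 600, 1800, 3600, 10800, 21600, 43200]


def planned_attempts(max_attempts: int, ttl_minutes: int) -> list:
    """Return each attempt's offset in seconds from the first delivery.

    Retrying stops at whichever limit is reached first: the attempt count
    or the event's time-to-live.
    """
    ttl_s = ttl_minutes * 60
    offsets = [0]  # the first delivery happens immediately
    step = 0
    while len(offsets) < max_attempts:
        # Reuse the last schedule entry once the list is exhausted.
        wait = SCHEDULE_S[min(step, len(SCHEDULE_S) - 1)]
        next_offset = offsets[-1] + wait
        if next_offset > ttl_s:
            break  # the event would expire before this retry
        offsets.append(next_offset)
        step += 1
    return offsets


# The policy from the CLI example above: 5 attempts, 60-minute TTL.
short_policy = planned_attempts(max_attempts=5, ttl_minutes=60)

# Same TTL with the default attempt limit of 30: the TTL becomes the binding limit.
ttl_bound_policy = planned_attempts(max_attempts=30, ttl_minutes=60)
```

Under these assumptions the 5-attempt policy gives up within a few minutes, so an outage longer than that relies entirely on the dead-letter destination.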
When events stop appearing at a subscriber, the most common causes (in order):
1. The subscription's filter changed and now matches nothing
2. The subscriber endpoint returns non-2xx and Event Grid is in back-off
3. The subscriber endpoint is unreachable (DNS, network)
4. The publisher actually stopped emitting
Check the subscription first: Event Grid metrics (delivered, failed, dead-lettered) show all of (1)-(3) clearly.
Idempotency: the at-least-once consequence
Because Event Grid retries, your subscriber will occasionally see the same event more than once. Make handlers idempotent:
# Pattern: use the event's `id` to deduplicate
# already_processed, process and mark_processed are helpers your app provides
async def handle_event(event: CloudEvent):
    event_id = event.id
    if await already_processed(event_id):
        return  # safe re-delivery: skip
    await process(event)
    await mark_processed(event_id)
Common dedup stores: Cosmos DB (partition by something stable, document id = event id), Redis (SET NX with TTL), PostgreSQL (UNIQUE constraint on event_id).
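The UNIQUE-constraint option can be sketched with the standard library, using SQLite as a stand-in for PostgreSQL (a swapped-in store, chosen only so the example runs anywhere). The names `open_store`, `claim` and `handle` are hypothetical. Unlike the check-then-mark pattern above, this variant inserts the id first, so two concurrent deliveries cannot both pass the check.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the dedup table; the primary key enforces one row per event id."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (event_id TEXT PRIMARY KEY)")
    return conn


def claim(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True if this call recorded the id first, False for a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO processed (event_id) VALUES (?)", (event_id,))
        return True
    except sqlite3.IntegrityError:
        return False


def handle(conn: sqlite3.Connection, event: dict, process) -> bool:
    """Run `process` at most once per event id; return True if it ran."""
    if not claim(conn, event["id"]):
        return False  # duplicate delivery: skip
    try:
        process(event)
    except Exception:
        # Release the claim so a later retry can process the event.
        with conn:
            conn.execute("DELETE FROM processed WHERE event_id = ?", (event["id"],))
        raise
    return True


store = open_store()
seen = []
first = handle(store, {"id": "evt-1"}, seen.append)
second = handle(store, {"id": "evt-1"}, seen.append)  # simulated re-delivery
```

The trade-off is that a crash between the insert and the end of processing leaves the id claimed, so a real handler would also need a way to expire or reconcile stale claims.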
Endpoints: where can events be delivered?
| Endpoint | Notes |
|---|---|
| Webhook (HTTPS) | Anywhere: Container Apps, App Service, your data centre. Must return a success status (such as 200 or 202) within 30 s |
| Azure Functions | Native trigger: EventGridTrigger binding |
| Logic Apps | Native trigger |
| Service Bus queue / topic | Push events into a Service Bus destination for durable processing |
| Storage queue | Simple, cheap, lightweight |
| Event Hubs | High-throughput buffering |
| Hybrid Connections | Reach back-end services through hybrid connections without exposing them publicly |
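Because a webhook has only 30 s to answer, one common approach is to acknowledge quickly and do the slow work elsewhere. The sketch below shows that idea as a plain function that any web framework could call. `receive_delivery` and `work_queue` are hypothetical names, and the in-memory queue is an assumption for illustration only (a durable queue would be the safer choice in production). It accepts either a single event object or a batch array, and answers 400 for malformed bodies, which to my understanding Event Grid does not retry.

```python
import json
import queue

# Stand-in for a durable queue; events here are lost if the process dies.
work_queue = queue.Queue()


def receive_delivery(body: bytes) -> int:
    """Validate and enqueue a delivery, returning the HTTP status to send back."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400

    # A delivery may carry one event or a batch of events.
    events = payload if isinstance(payload, list) else [payload]
    if not all(isinstance(event, dict) and "id" in event for event in events):
        return 400

    for event in events:
        work_queue.put(event)  # a background worker drains this later
    return 200  # acknowledge before any slow processing starts


single_status = receive_delivery(b'{"id": "evt-1", "type": "demo"}')
batch_status = receive_delivery(b'[{"id": "evt-2"}, {"id": "evt-3"}]')
bad_status = receive_delivery(b"not json")
```

Acknowledging early pairs naturally with the idempotency pattern, since the worker that drains the queue may still see the same event id twice.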
Two patterns recur in AI workloads:
- Event Grid → Service Bus → Container Apps: Event Grid handles the routing/filtering, Service Bus provides durability + DLQ, Container Apps + KEDA scales the workers
- Event Grid → Function App: simple event-driven serverless, fine for low-volume reactive work
Event Grid Namespace topics + MQTT
Event Grid Namespaces add:
- Pull-style subscribers (HTTP long-poll instead of push) for back-end services that don't expose endpoints
- MQTT broker for IoT-style scenarios where devices publish/subscribe with MQTT v3.1.1 / v5
This isn't core to AI-200, but recognise the keywords if they appear in a question.
Knowledge check
Mira wants a Container App to react when a new image lands in a specific Storage container. Multiple Storage events fire for various containers, but only `uploads/` matters and only for JPEGs. Which Event Grid configuration fits?
Theo's webhook subscriber returns 503s during a deployment. Events from the previous hour are critical. What should be true of the subscription?
Lin notices the subscriber occasionally processes the same event twice. What's the right fix?