Containers for AI Workloads: Why and When
Why AI apps live in containers in 2026, and what containers, registries, and orchestrators have to do with passing AI-200. Meet the four characters whose stories run through this whole course.
Why containers are the default for AI on Azure
A container is a sealed lunchbox for your code. It packs your app, every library it needs, the right Python version, and the right CUDA version into one self-contained box. You can hand the same lunchbox to a laptop, a test server, or a giant AI cluster and it tastes the same.
AI apps especially love containers because AI dependencies are fussy. One wrong NumPy version, wrong tokenizer, or wrong CUDA toolkit, and your model breaks. Containers freeze "what works on my machine" so it works everywhere.
On Azure, you'll meet four container services: App Service (a managed web host), Container Apps (serverless containers that scale to zero), AKS (full Kubernetes for control freaks), and Azure Container Registry (the warehouse where your container images live). The first three run containers; the registry stores the images they run. The exam tests when to pick which.
The four hosting tiers at a glance
| Feature | Container Registry | App Service | Container Apps | AKS |
|---|---|---|---|---|
| What it is | Image storage + build service | Managed web app host (with container support) | Serverless containers, scale to zero | Full managed Kubernetes |
| You manage | Images, tags, replication | App settings, slots, scaling rules | Container revisions, ingress, scale rules | Nodes, pods, networking, RBAC |
| Best for | Storing every other tier's images | Web APIs, simple AI inference endpoints | Event-driven AI pipelines, microservices, scale-to-zero | Complex multi-service AI platforms, GPU pools |
| Scale model | N/A (it's storage) | Manual + autoscale on metrics | KEDA event-driven (queues, HTTP, custom) | HPA, KEDA, cluster autoscaler (your choice) |
| Operational cost | Tiny (storage + bandwidth) | Low (fully managed) | Medium (pay per active container) | High (you run a cluster) |
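As a study aid, the "Best for" row of the table can be turned into a small decision helper. This is only a sketch of the table's rules of thumb, not official Azure guidance; the `Workload` fields and the `suggest_hosting_tier` function are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a containerised AI workload."""
    needs_scale_to_zero: bool = False
    event_driven: bool = False
    needs_gpu_pools: bool = False
    multi_service_platform: bool = False


def suggest_hosting_tier(w: Workload) -> str:
    """Return the hosting tier that the table's 'Best for' row points to."""
    # AKS: complex multi-service AI platforms and GPU pools.
    if w.needs_gpu_pools or w.multi_service_platform:
        return "AKS"
    # Container Apps: event-driven pipelines and scale-to-zero.
    if w.event_driven or w.needs_scale_to_zero:
        return "Container Apps"
    # App Service: web APIs and simple AI inference endpoints.
    return "App Service"


if __name__ == "__main__":
    queue_worker = Workload(event_driven=True, needs_scale_to_zero=True)
    print(suggest_hosting_tier(queue_worker))  # Container Apps
```

Whichever tier the helper suggests, the image itself still lives in Azure Container Registry; the registry is storage, so it never appears as a return value.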
Meet the four characters
This course follows four people building real AI workloads on Azure. You'll see them across all 27 modules and again in the practice questions.
| Character | Who they are | AI use case |
|---|---|---|
| Mira at Roo Robotics | Backend engineer, 18-person robotics startup | Vision models for warehouse inventory robots; needs GPU containers + event scaling |
| Theo at Tidewater Health | Senior platform engineer, 4500-staff hospital network | Clinical AI assistant over patient records; RAG, secrets, audit logs are non-negotiable |
| Priya at BeanCraft Coffee | Tech lead, 240-store coffee chain | Loyalty app personalisation, real-time order recs, menu Q&A bot |
| Lin | Freelance Azure consultant | Builds AI POCs for SMB clients; values simplicity and time-to-deploy |
Real-world example: how the four tiers fit together
Mira ships a warehouse robot. Here's how all four container tiers cooperate:
- Azure Container Registry holds the inference image (roo-vision:v3.4.1): built nightly, signed, geo-replicated to two regions.
- Container Apps runs the image as a serverless inference endpoint. KEDA scales it from zero up to 50 replicas based on Service Bus queue depth.
- App Service hosts the operator dashboard β a Node.js admin UI in a container, on a Linux App Service plan.
- AKS runs the model training cluster β GPU node pools, scheduled jobs, MLflow tracking.
The three hosting tiers all pull from the same ACR. That single registry is the spine of the whole system.
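The queue-driven scaling in the Container Apps step can be sketched with a little arithmetic. The function below is a simplified model, not KEDA's actual implementation: it assumes each replica should work through a target number of queued messages (`messages_per_replica`, a hypothetical value of 20) and clamps the result to the 0 to 50 replica range from Mira's setup. Real scalers also apply polling intervals and cooldown periods, which are left out here.

```python
import math


def desired_replicas(queue_depth: int,
                     messages_per_replica: int = 20,  # assumed target, not from the course
                     min_replicas: int = 0,
                     max_replicas: int = 50) -> int:
    """Estimate how many replicas a queue-depth rule would ask for."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    # An empty queue lets the app drop to its minimum (zero here).
    if queue_depth <= 0:
        return min_replicas
    # One replica per `messages_per_replica` queued messages, rounded up.
    wanted = math.ceil(queue_depth / messages_per_replica)
    # Never go below the floor or above the ceiling.
    return max(min_replicas, min(wanted, max_replicas))


if __name__ == "__main__":
    for depth in (0, 1, 800, 5000):
        print(depth, desired_replicas(depth))
    # 0 0
    # 1 1
    # 800 40
    # 5000 50
```

The useful point for the exam is the shape of the curve: nothing runs while the queue is empty, and the replica count is capped no matter how deep the queue gets.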
What "developing AI" means on this exam
The AI-200 exam title is Developing AI Cloud Solutions on Azure. The audience profile makes the focus clear: you are responsible for the back-end services and components of an AI solution. That distinction matters.
| AI-102 / AI-103 | AI-200 |
|---|---|
| Build the AI itself: pick the model, train, fine-tune, build the agent | Wire the AI into a production app: host it, scale it, secure it, observe it |
| Heavy on Foundry portal, model selection, RAG design | Heavy on containers, Cosmos DB, pgvector, Service Bus, Key Vault, OpenTelemetry |
| Python + Foundry SDK | Python + Azure SDKs + container tooling + KQL |
If AI-103 is "be the AI engineer", AI-200 is "be the cloud developer who ships the AI engineer's work to production."
Exam tip: read the question for 'where does the data live?'
Many AI-200 questions hinge on whether the answer needs container hosting, a vector database, a message bus, or a secret store. Build a habit: as you read each scenario, jot down the data path. Where is the input coming from, where is state stored, where do events flow?
The right Azure service almost always falls out of that data-path sketch. Wrong answers are usually services that work, but don't fit the data path.
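The data-path habit can be written down as a tiny lookup. This is a hedged study sketch: the service names come from this course's domain list, but the `DataPath` class, the `SERVICE_FOR_NEED` mapping, and the `candidate_services` function are hypothetical and deliberately incomplete.

```python
from dataclasses import dataclass

# Need -> candidate services, using names from this course's domain list.
# The grouping is an illustrative assumption, not an official mapping.
SERVICE_FOR_NEED = {
    "container hosting": ["App Service", "Container Apps", "AKS"],
    "vector database": ["Cosmos DB NoSQL (vector search)", "PostgreSQL with pgvector"],
    "message bus": ["Service Bus"],
    "secret store": ["Key Vault"],
}


@dataclass
class DataPath:
    """The notes you jot down while reading a scenario."""
    input_from: str                    # where the input comes from
    state_in: str                      # where state is stored
    secrets_in: str = "secret store"   # where credentials are kept


def candidate_services(path: DataPath) -> dict[str, list[str]]:
    """Look up candidate services for each step of the data path."""
    needs = [path.input_from, path.state_in, path.secrets_in]
    # Unknown needs map to an empty list instead of raising an error.
    return {need: SERVICE_FOR_NEED.get(need, []) for need in needs}


if __name__ == "__main__":
    scenario = DataPath(input_from="message bus", state_in="vector database")
    for need, services in candidate_services(scenario).items():
        print(f"{need}: {', '.join(services)}")
```

A distractor answer typically names a service that sits in a different row of this mapping from the one the scenario's data path needs.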
How this course is organised
The exam's four domains map directly to four parts of this course:
- Domain 1 - Containers on Azure (you are here): ACR, App Service, Container Apps, KEDA, AKS, troubleshooting.
- Domain 2 - AI data services: Cosmos DB NoSQL (incl. vector search + change feed), PostgreSQL with pgvector + RAG, Azure Managed Redis.
- Domain 3 - Connect to Azure services: Service Bus, Event Grid, Azure Functions.
- Domain 4 - Secure, monitor, troubleshoot: Key Vault, App Configuration, OpenTelemetry, KQL.
You'll see the same four characters in each domain. Their problems get more specific as you go: Mira's container hits Domain 1, her Cosmos vector index hits Domain 2, her Service Bus queue hits Domain 3, her OpenTelemetry traces hit Domain 4. The more you sit with the cast, the more "which Azure service?" answers itself.
Knowledge check
Mira at Roo Robotics is choosing between App Service and Container Apps to host an inference endpoint that processes images from a Service Bus queue. Most of the day the queue is empty, but during warehouse shift changes it spikes to 800 messages per minute. Which is the better fit?
Theo at Tidewater Health is mapping the back-end architecture for the new clinical AI assistant. Which Azure service should hold the container images that ALL the other tiers (Container Apps, App Service, AKS) pull from?
Lin is deciding how to host a small RAG demo for a client: one Python container that exposes an HTTP endpoint, used 5-10 times per hour by the client's three pilot users. The client cares about cost above all. Which is the best fit?