AKS for AI Apps: Deploy with Manifests
When App Service and Container Apps aren't enough, Azure Kubernetes Service gives you full control. Deployments, Services, Ingress, ConfigMaps, Secrets, GPU node pools, and the manifest patterns the exam loves.
When AKS, and not Container Apps
Pick AKS when you need real Kubernetes. If the scenario reads "we have Helm charts, custom CRDs, GPU node pools, multi-tenant namespaces, and the ML team already runs Kubeflow", that's AKS territory. App Service and Container Apps deliberately hide Kubernetes; AKS hands it to you.
The trade-off is operational: AKS gives you everything Kubernetes can do, but you maintain version upgrades, node pool sizing, networking, and the cluster's health.
For AI-200, you don't need to be a Kubernetes expert. You need to read and write basic manifests (Deployments, Services, Ingress, ConfigMaps, Secrets) and know how to plug AKS into ACR, Key Vault, and Microsoft Entra workload identity.
The five manifests every AI-200 candidate must read
| Manifest kind | What it does | Mental model |
|---|---|---|
| Deployment | Runs N replicas of a container, manages rollouts | "I want 3 of these running" |
| Service | Stable cluster IP / DNS name in front of pods | "How other things in the cluster reach my pods" |
| Ingress | HTTP routing into the cluster from outside | "How the internet reaches my Service" |
| ConfigMap | Non-secret config data, injected as env vars or files | "Settings, environment-specific values" |
| Secret | Sensitive values, base64-encoded, mounted similarly to ConfigMaps | "Passwords, keys, tokens" |
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: roo-vision
  labels: { app: roo-vision }
spec:
  replicas: 3
  selector:
    matchLabels: { app: roo-vision }
  template:
    metadata:
      labels: { app: roo-vision }
    spec:
      containers:
        - name: vision
          image: roo.azurecr.io/roo-vision:v3.4.1
          ports:
            - containerPort: 8000
          env:
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: roo-config
                  key: LOG_LEVEL
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: roo-secrets
                  key: openai-key
          resources:
            requests: { cpu: "500m", memory: "1Gi" }
            limits: { cpu: "1", memory: "2Gi" }
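Read this manifest as: "run 3 replicas of roo-vision:v3.4.1. Each gets LOG_LEVEL from a ConfigMap and OPENAI_API_KEY from a Secret. Each is guaranteed 500 millicores of CPU and 1 GiB of RAM (the requests, which the scheduler reserves on a node) and is capped at 1 core and 2 GiB (the limits)." A container that hits its CPU limit is throttled; one that exceeds its memory limit is killed and restarted.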
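The table above says a Deployment also "manages rollouts". A minimal sketch of how that is usually tuned, assuming the same roo-vision Deployment; the maxSurge and maxUnavailable values are illustrative choices, not taken from the original manifest:

```yaml
# Fragment of the roo-vision Deployment spec (merge into the manifest above).
# The strategy values are illustrative assumptions, not exam-mandated settings.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod above replicas during a rollout
      maxUnavailable: 0  # never drop below 3 available pods while updating
```

With these values, changing the image tag replaces pods one at a time, and an old pod is removed only after its replacement becomes ready.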
Service
apiVersion: v1
kind: Service
metadata:
  name: roo-vision-svc
spec:
  selector: { app: roo-vision }
  ports:
    - port: 80
      targetPort: 8000
  type: ClusterIP
The Service gives the Deployment a stable internal address, roo-vision-svc.default.svc.cluster.local, that load-balances across all healthy pods.
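"Healthy" here means pods that report Ready: a Service only routes traffic to pods whose readiness probe passes. A hedged sketch of a probe for the vision container follows; the /healthz path and the timing values are hypothetical and depend on what the application actually exposes:

```yaml
# Fragment of the pod template in the roo-vision Deployment.
# /healthz is a hypothetical endpoint; substitute the app's real health route.
containers:
  - name: vision
    image: roo.azurecr.io/roo-vision:v3.4.1
    ports:
      - containerPort: 8000
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8000
      initialDelaySeconds: 10  # assumed warm-up time for model loading
      periodSeconds: 5         # re-check every 5 seconds
```

While the probe fails, the pod stays out of the Service's endpoints, so a replica that is still loading a model receives no requests.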
Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: roo-vision-ing
spec:
  # spec.ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: webapprouting.kubernetes.azure.com
  rules:
    - host: vision.roo-robotics.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: roo-vision-svc
                port: { number: 80 }
  tls:
    - hosts: [vision.roo-robotics.com]
      secretName: roo-vision-tls
The Ingress publishes the Service to the public internet via the Application Routing add-on (the AKS-managed nginx ingress). For enterprise scenarios with WAF, swap to AGIC (Application Gateway Ingress Controller).
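As a rough sketch of the AGIC swap, the Ingress stays the same except for its ingress class. This assumes the AGIC add-on is installed and registers the class name azure-application-gateway; verify the class name in your cluster with kubectl get ingressclass. WAF itself is configured on the Application Gateway resource, not in this manifest.

```yaml
# Same routing rules as above, handled by Application Gateway instead of NGINX.
# Assumption: AGIC is enabled and exposes the azure-application-gateway class.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: roo-vision-ing
spec:
  ingressClassName: azure-application-gateway
  rules:
    - host: vision.roo-robotics.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: roo-vision-svc
                port: { number: 80 }
  tls:
    - hosts: [vision.roo-robotics.com]
      secretName: roo-vision-tls
```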
ConfigMap and Secret
apiVersion: v1
kind: ConfigMap
metadata: { name: roo-config }
data:
  LOG_LEVEL: info
  MODEL_NAME: phi-4-mini
---
apiVersion: v1
kind: Secret
metadata: { name: roo-secrets }
type: Opaque
data:
  openai-key: c2stMTIzNDU2Nzg=  # base64
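Secrets in this raw form are only base64-encoded: anyone who can read the Secret object can decode the value, and upstream Kubernetes does not encrypt Secrets at rest in etcd by default. For real secrets, use the Secrets Store CSI driver to project Key Vault values directly into pods (next section).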
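The Deployment earlier injects a ConfigMap key as an env var. The other option named in the table, injecting it as files, could look like the sketch below; the /etc/roo mount path and the volume name are hypothetical:

```yaml
# Fragment of a pod spec: mounts every key in roo-config as a file.
# /etc/roo and the volume name "config" are illustrative assumptions.
spec:
  containers:
    - name: vision
      image: roo.azurecr.io/roo-vision:v3.4.1
      volumeMounts:
        - name: config
          mountPath: /etc/roo
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: roo-config
```

Each key becomes a file named after the key, so the container would see /etc/roo/LOG_LEVEL and /etc/roo/MODEL_NAME with the values as file contents.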
Pulling images from ACR: Workload Identity or cluster identity
Two patterns:
| Pattern | How | When |
|---|---|---|
| Cluster-level integration | az aks update --attach-acr <registry> | Default; the kubelet identity gets AcrPull on the registry |
| Workload Identity | Federate a service account to a User-Assigned Managed Identity, grant AcrPull | When different namespaces / apps need different ACR access |
For the exam, the default cluster-level integration is the most common scenario:
az aks update -n roo-aks -g roo-prod --attach-acr roo
That single command grants the AKS kubelet identity AcrPull on the registry, so every pod in the cluster can pull from that ACR.
Secrets: the Secrets Store CSI driver
Native Kubernetes Secrets are base64-encoded blobs, not real secrets. The recommended pattern for Azure is the Secrets Store CSI driver, which mounts Key Vault secrets as files inside pods and can optionally sync them into a Kubernetes Secret when the application needs env vars.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata: { name: roo-kv-secrets }
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "false"
    clientID: "<workload-identity-client-id>"
    keyvaultName: roo-kv
    objects: |
      array:
        - |
          objectName: OpenAIKey
          objectType: secret
    tenantId: "<tenant-id>"
In the pod spec:
# Pod level (spec.volumes):
volumes:
  - name: secrets-store
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes: { secretProviderClass: roo-kv-secrets }
# Container level (spec.containers[].volumeMounts):
volumeMounts:
  - { name: secrets-store, mountPath: /mnt/secrets, readOnly: true }
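The pod sees /mnt/secrets/OpenAIKey containing the Key Vault value fetched when the volume was mounted. After a rotation in Key Vault, the mounted file is refreshed only if the driver's secret auto-rotation is enabled; it then updates at the next polling interval, and the application must re-read the file to pick up the new value.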
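The clientID in the SecretProviderClass only works if the pod runs under a service account federated to that managed identity. A minimal sketch of the Kubernetes side of that wiring follows. The service account name roo-vision-sa is hypothetical, and the federated credential linking it to the identity must be created separately in Azure:

```yaml
# ServiceAccount tied to the user-assigned managed identity.
# roo-vision-sa is a hypothetical name; the client ID placeholder is the same
# value used in the SecretProviderClass.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: roo-vision-sa
  annotations:
    azure.workload.identity/client-id: "<workload-identity-client-id>"
---
# Fragment of the Deployment's pod template (not a complete object):
# opt the pod in to workload identity and run it under the service account.
metadata:
  labels:
    app: roo-vision
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: roo-vision-sa
```

The managed identity also needs permission to read secrets in roo-kv, for example through a Key Vault RBAC role assignment.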
GPU node pools for inference
For larger models or training:
az aks nodepool add \
  --cluster-name roo-aks --resource-group roo-prod \
  --name gpupool \
  --node-vm-size Standard_NC6s_v3 \
  --node-count 1 \
  --node-taints sku=gpu:NoSchedule \
  --labels accelerator=nvidia-tesla-v100
Pods that need a GPU schedule onto this pool with a matching toleration and node selector:
spec:
  tolerations:
    - key: sku
      operator: Equal
      value: gpu
      effect: NoSchedule
  nodeSelector: { accelerator: nvidia-tesla-v100 }
  containers:
    - name: vision
      image: roo.azurecr.io/roo-vision:v3.4.1-cuda
      resources:
        limits:
          nvidia.com/gpu: 1
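The taint keeps non-GPU workloads off expensive GPU nodes: only pods that explicitly tolerate it can schedule there. The toleration alone does not force a pod onto the pool; the nodeSelector and the nvidia.com/gpu request are what pin the inference pods to GPU nodes.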
Knowledge check
Theo's AKS cluster cannot pull a new image from ACR. Pods stay in `ImagePullBackOff`. The cluster previously pulled fine; only the registry has changed (a new ACR for production). What's the simplest fix?
Mira needs the inference pods to run only on GPU nodes (Standard_NC6s_v3). Other workloads must NOT land on those nodes. Which combination of mechanisms achieves this?
Lin's AKS deployment reads `OPENAI_API_KEY` from a native Kubernetes Secret. The security team wants the actual key stored only in Key Vault, with rotation visible in Key Vault audit logs. What's the recommended pattern?