Domain 1: Develop containerized solutions on Azure

AKS for AI Apps: Deploy with Manifests

When App Service and Container Apps aren't enough, Azure Kubernetes Service (AKS) gives you full control. This module covers Deployments, Services, Ingress, ConfigMaps, Secrets, GPU node pools, and the manifest patterns the exam loves.

When to pick AKS over Container Apps

Simple explanation

Pick AKS when you need real Kubernetes. If the brief reads “we have Helm charts, custom CRDs, GPU node pools, multi-tenant namespaces, and the ML team already runs Kubeflow”, that’s AKS territory. App Service and Container Apps deliberately hide Kubernetes; AKS hands it to you.

The trade-off is operational: AKS gives you everything Kubernetes can do, but you maintain version upgrades, node pool sizing, networking, and the cluster’s health.

For AI-200, you don’t need to be a Kubernetes expert. You need to read and write basic manifests (Deployments, Services, Ingress, ConfigMaps, Secrets) and know how to plug AKS into ACR, Key Vault, and Microsoft Entra workload identity.

The five manifests every AI-200 candidate must read

| Manifest kind | What it does | Mental model |
| --- | --- | --- |
| Deployment | Runs N replicas of a container, manages rollouts | “I want 3 of these running” |
| Service | Stable cluster IP / DNS name in front of pods | “How other things in the cluster reach my pods” |
| Ingress | HTTP routing into the cluster from outside | “How the internet reaches my Service” |
| ConfigMap | Non-secret config data, injected as env vars or files | “Settings, environment-specific values” |
| Secret | Sensitive values, base64-encoded, mounted similarly to ConfigMaps | “Passwords, keys, tokens” |

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: roo-vision
  labels: { app: roo-vision }
spec:
  replicas: 3
  selector:
    matchLabels: { app: roo-vision }
  template:
    metadata:
      labels: { app: roo-vision }
    spec:
      containers:
        - name: vision
          image: roo.azurecr.io/roo-vision:v3.4.1
          ports:
            - containerPort: 8000
          env:
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: roo-config
                  key: LOG_LEVEL
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: roo-secrets
                  key: openai-key
          resources:
            requests: { cpu: "500m", memory: "1Gi" }
            limits: { cpu: "1", memory: "2Gi" }

Read this manifest as: “run 3 replicas of roo-vision:v3.4.1. Each gets LOG_LEVEL from a ConfigMap and OPENAI_API_KEY from a Secret. Each requests 500 millicores of CPU and 1 GiB of memory (the amount the scheduler reserves for it), and is capped by its limits at 1 core and 2 GiB.”
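To make the requests and limits concrete, the following minimal Python sketch converts the quantity strings from the manifest into numbers and totals them across the three replicas. The helper names `parse_cpu` and `parse_memory` are hypothetical, and they handle only the suffixes used in this module, not the full Kubernetes quantity grammar.

```python
# Hypothetical helpers: they cover only the "m" CPU suffix and the
# Mi/Gi memory suffixes, not every Kubernetes quantity format.

def parse_cpu(quantity: str) -> float:
    """Return CPU cores; '500m' means 500 millicores, i.e. 0.5 core."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


_MEMORY_UNITS = {"Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Return bytes for quantities such as '1Gi' or '512Mi'."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


# Values copied from the Deployment manifest.
replicas = 3
requests = {"cpu": "500m", "memory": "1Gi"}
limits = {"cpu": "1", "memory": "2Gi"}

# Requests are what the scheduler reserves; limits are the hard ceiling.
reserved_cpu = replicas * parse_cpu(requests["cpu"])
reserved_mem_gib = replicas * parse_memory(requests["memory"]) / 2**30
max_cpu = replicas * parse_cpu(limits["cpu"])
max_mem_gib = replicas * parse_memory(limits["memory"]) / 2**30

print(f"reserved: {reserved_cpu} cores, {reserved_mem_gib} GiB")
print(f"ceiling:  {max_cpu} cores, {max_mem_gib} GiB")
```

The reserved totals are what the node pool must have free for all three pods to be scheduled; the ceiling is the most the Deployment can consume if every replica runs at its limit.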

Service

apiVersion: v1
kind: Service
metadata:
  name: roo-vision-svc
spec:
  selector: { app: roo-vision }
  ports:
    - port: 80
      targetPort: 8000
  type: ClusterIP

The Service gives the Deployment's pods a stable internal address, roo-vision-svc.default.svc.cluster.local (assuming the default namespace), and load-balances across all ready pods.
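The internal address follows a fixed pattern, which this small Python sketch spells out. The function name `service_dns` is hypothetical; `cluster.local` is the default cluster domain and is an assumption if your cluster was configured differently.

```python
def service_dns(name: str, namespace: str = "default",
                cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster DNS name of a Service."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


# The Service above listens on port 80 and forwards to containerPort 8000.
fqdn = service_dns("roo-vision-svc")
url = f"http://{fqdn}:80/"
print(url)
```

Callers in the same namespace can also use the short name `roo-vision-svc`; the fully qualified form is needed when calling across namespaces.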

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: roo-vision-ing
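spec:
  # Selects the Application Routing add-on's managed NGINX controller;
  # ingressClassName supersedes the deprecated kubernetes.io/ingress.class annotation.
  ingressClassName: webapprouting.kubernetes.azure.com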
  rules:
    - host: vision.roo-robotics.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: roo-vision-svc
                port: { number: 80 }
  tls:
    - hosts: [vision.roo-robotics.com]
      secretName: roo-vision-tls

The Ingress publishes the Service to the public internet via the Application Routing add-on (the AKS-managed nginx ingress). For enterprise scenarios with WAF, swap to AGIC (Application Gateway Ingress Controller).
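The `pathType: Prefix` rule in the manifest matches on whole path elements rather than raw string prefixes. The Python sketch below illustrates that rule as the Kubernetes Ingress specification describes it; `prefix_matches` is a hypothetical helper for illustration, and a real ingress controller may differ in edge cases.

```python
def _elements(path: str) -> list[str]:
    """Split a URL path into its non-empty '/'-separated elements."""
    return [part for part in path.split("/") if part]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """True if every element of rule_path leads request_path, element by element."""
    rule = _elements(rule_path)
    request = _elements(request_path)
    return request[: len(rule)] == rule


# path: / (as in the manifest) has no elements, so it matches everything.
print(prefix_matches("/", "/v1/predict"))
# Matching is per element: /foo covers /foo/bar but not /foobar.
print(prefix_matches("/foo", "/foo/bar"))
print(prefix_matches("/foo", "/foobar"))
```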

ConfigMap and Secret

apiVersion: v1
kind: ConfigMap
metadata: { name: roo-config }
data:
  LOG_LEVEL: info
  MODEL_NAME: phi-4-mini
---
apiVersion: v1
kind: Secret
metadata: { name: roo-secrets }
type: Opaque
data:
  openai-key: c2stMTIzNDU2Nzg=  # base64

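Secrets in this raw form are only base64-encoded, not encrypted: anyone who can read the Secret object can decode it. Upstream Kubernetes does not encrypt Secrets in etcd by default, and even where the platform encrypts etcd at rest, the value is still readable through the Kubernetes API. For real secrets, use the Secrets Store CSI driver to project Key Vault values directly into pods (next section).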
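A short Python sketch shows how little protection base64 gives. It decodes the exact value from the Secret manifest using only the standard library; no key or password is involved.

```python
import base64

# Value copied from the Secret manifest's data.openai-key field.
encoded = "c2stMTIzNDU2Nzg="

# Decoding needs no key: base64 is an encoding, not encryption.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)

# Round trip: this is how the manifest value would have been produced.
re_encoded = base64.b64encode(decoded.encode("utf-8")).decode("ascii")
print(re_encoded == encoded)
```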

Pulling images from ACR: Workload Identity or cluster identity

Two patterns:

| Pattern | How | When |
| --- | --- | --- |
| Cluster-level integration | az aks update --attach-acr <registry> | Default: kubelet identity gets AcrPull on the registry |
| Workload Identity | Federate a service account to a User-Assigned Managed Identity, grant AcrPull | When different namespaces / apps need different ACR access |

For the exam, the default cluster-level integration is the most common scenario:

az aks update -n roo-aks -g roo-prod --attach-acr roo

That single command grants the AKS kubelet identity the AcrPull role on the registry, so every pod in the cluster can pull from that ACR.

Secrets: the Secrets Store CSI driver

Native Kubernetes Secrets are base64-encoded blobs, not real secrets. The recommended pattern for Azure is the Secrets Store CSI driver, which projects Key Vault secrets as files inside pods (or as env vars, if the driver is also configured to sync them into a Kubernetes Secret).

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata: { name: roo-kv-secrets }
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "false"
    clientID: "<workload-identity-client-id>"
    keyvaultName: roo-kv
    objects: |
      array:
        - |
          objectName: OpenAIKey
          objectType: secret
    tenantId: "<tenant-id>"

In the pod spec:

# Pod level: goes under spec.volumes
volumes:
  - name: secrets-store
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes: { secretProviderClass: roo-kv-secrets }
# Container level: goes under spec.containers[].volumeMounts
volumeMounts:
  - { name: secrets-store, mountPath: /mnt/secrets, readOnly: true }

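The pod sees /mnt/secrets/OpenAIKey containing the live Key Vault value. When the secret is rotated in Key Vault, the driver refreshes the mounted file on its next poll, provided secret auto-rotation is enabled, and the pod picks up the new value the next time it reads the file.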
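On the application side, picking up a rotated value depends on reading the file at use time instead of caching it at startup. The Python sketch below illustrates that idea; `read_secret` is a hypothetical helper, and a temporary directory stands in for the /mnt/secrets mount so the example runs outside a cluster.

```python
import tempfile
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/mnt/secrets") -> str:
    """Read the secret file on every call so a rotated value is seen without a restart."""
    return (Path(secrets_dir) / name).read_text(encoding="utf-8").strip()


# Local simulation: a temp directory plays the role of the CSI mount,
# and rewriting the file plays the role of the driver's rotation.
with tempfile.TemporaryDirectory() as fake_mount:
    secret_file = Path(fake_mount) / "OpenAIKey"

    secret_file.write_text("sk-old-value\n", encoding="utf-8")
    before = read_secret("OpenAIKey", fake_mount)

    secret_file.write_text("sk-new-value\n", encoding="utf-8")
    after = read_secret("OpenAIKey", fake_mount)

print(before, after)
```

A value injected as an environment variable behaves differently: it is fixed when the container starts, so the pod must be restarted to see a rotated secret.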

GPU node pools for inference

For larger models or training:

az aks nodepool add \
  --cluster-name roo-aks --resource-group roo-prod \
  --name gpupool \
  --node-vm-size Standard_NC6s_v3 \
  --node-count 1 \
  --node-taints sku=gpu:NoSchedule \
  --labels accelerator=nvidia-tesla-v100

Pods that need a GPU are scheduled onto this pool by adding a matching toleration, a node selector, and a GPU resource request:

spec:
  tolerations:
    - key: sku
      operator: Equal
      value: gpu
      effect: NoSchedule
  nodeSelector: { accelerator: nvidia-tesla-v100 }
  containers:
    - name: vision
      image: roo.azurecr.io/roo-vision:v3.4.1-cuda
      resources:
        limits:
          nvidia.com/gpu: 1

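The taint keeps non-GPU workloads off expensive GPU nodes: only pods that explicitly tolerate it can be scheduled there. The toleration alone does not force a pod onto the GPU pool, so the nodeSelector and the nvidia.com/gpu request are what steer inference pods to it.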
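The interplay of taint, toleration, and node selector can be checked with a simplified Python model of the scheduler's filtering step. This is a sketch under stated assumptions: `tolerates` and `can_schedule` are hypothetical helpers, only the NoSchedule effect is modeled, and resource fit (such as free nvidia.com/gpu capacity) is ignored.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if this toleration matches the taint (Equal and Exists operators only)."""
    if toleration.get("effect") not in (None, taint["effect"]):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return toleration.get("key") in (None, taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))


def can_schedule(pod: dict, node: dict) -> bool:
    """Simplified filter: every NoSchedule taint tolerated, every nodeSelector label present."""
    for taint in node.get("taints", []):
        if taint["effect"] == "NoSchedule" and not any(
            tolerates(t, taint) for t in pod.get("tolerations", [])
        ):
            return False
    return all(node.get("labels", {}).get(k) == v
               for k, v in pod.get("nodeSelector", {}).items())


gpu_node = {
    "taints": [{"key": "sku", "value": "gpu", "effect": "NoSchedule"}],
    "labels": {"accelerator": "nvidia-tesla-v100"},
}
cpu_node = {"taints": [], "labels": {}}

gpu_toleration = {"key": "sku", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
inference_pod = {
    "tolerations": [gpu_toleration],
    "nodeSelector": {"accelerator": "nvidia-tesla-v100"},
}
web_pod = {}  # no toleration, no selector
toleration_only_pod = {"tolerations": [gpu_toleration]}

print(can_schedule(inference_pod, gpu_node), can_schedule(inference_pod, cpu_node))
print(can_schedule(web_pod, gpu_node), can_schedule(web_pod, cpu_node))
print(can_schedule(toleration_only_pod, cpu_node))
```

The last case is the one to remember: a pod with only the toleration is still allowed on ordinary nodes, which is why the GPU manifest pairs the toleration with a nodeSelector.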

Key terms

Question

What does a Kubernetes Deployment do?

Answer

It runs N replicas of a Pod template, manages rolling updates, and self-heals (replaces pods that crash or get evicted). The standard wrapper around any long-running service. Specify replicas, selector labels, container image, env vars, resources, and probes.

Question

What's the difference between a Service and an Ingress?

Answer

A Service exposes a Deployment internally to the cluster (ClusterIP) or externally as a load-balanced TCP/UDP endpoint (LoadBalancer / NodePort). An Ingress is a layer-7 HTTP/HTTPS router that sits in front of one or more Services and handles host/path routing and TLS termination.

Question

What is `--attach-acr` in AKS?

Answer

A one-line command that grants the AKS cluster's kubelet identity the AcrPull role on a specified ACR. After running it, every pod in the cluster can pull images from that registry without imagePullSecrets. Most common ACR-AKS integration pattern.

Question

What is the Secrets Store CSI driver?

Answer

A Kubernetes CSI driver that projects external secret stores (Azure Key Vault, AWS Secrets Manager, HashiCorp Vault) into pods as files. Combined with Workload Identity, it lets AKS pods read Key Vault secrets without storing them in native Kubernetes Secrets.

Question

What are taints and tolerations?

Answer

A taint is a marker on a node that repels pods; a toleration is a marker on a pod that lets it tolerate (i.e., be scheduled on) tainted nodes. Used to keep expensive nodes (like GPU SKUs) reserved for specific workloads: only pods that tolerate the taint can land there, and a nodeSelector plus a resource request steer the intended pods onto them.

Knowledge check

Theo's AKS cluster cannot pull a new image from ACR. Pods stay in `ImagePullBackOff`. The cluster previously pulled fine; only the registry has changed (a new ACR for production). What's the simplest fix?

Mira needs the inference pods to run only on GPU nodes (Standard_NC6s_v3). Other workloads must NOT land on those nodes. Which combination of mechanisms achieves this?

Lin's AKS deployment reads `OPENAI_API_KEY` from a native Kubernetes Secret. The security team wants the actual key stored only in Key Vault, with rotation visible in Key Vault audit logs. What's the recommended pattern?