AKS for AI Apps: Deploy with Manifests

When AKS — and not Container Apps

Simple explanation

Pick AKS when you need real Kubernetes. If “we have Helm charts, custom CRDs, GPU node pools, multi-tenant namespaces, and the ML team already runs Kubeflow” — that’s AKS territory. App Service and Container Apps deliberately hide Kubernetes; AKS hands it to you.

The trade-off is operational: AKS gives you everything Kubernetes can do, but you maintain version upgrades, node pool sizing, networking, and the cluster’s health.

For AI-200, you don’t need to be a Kubernetes expert. You need to read and write basic manifests — Deployments, Services, Ingress, ConfigMaps, Secrets — and know how to plug AKS into ACR, Key Vault, and Microsoft Entra workload identity.

The five manifests every AI-200 candidate must read

Manifest kind	What it does	Mental model
`Deployment`	Runs N replicas of a container, manages rollouts	“I want 3 of these running”
`Service`	Stable cluster IP / DNS name in front of pods	“How other things in the cluster reach my pods”
`Ingress`	HTTP routing into the cluster from outside	“How the internet reaches my Service”
`ConfigMap`	Non-secret config data, injected as env vars or files	“Settings, environment-specific values”
`Secret`	Sensitive values, base64-encoded, mounted similarly to ConfigMaps	“Passwords, keys, tokens”

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: roo-vision
  labels: { app: roo-vision }
spec:
  replicas: 3
  selector:
    matchLabels: { app: roo-vision }
  template:
    metadata:
      labels: { app: roo-vision }
    spec:
      containers:
        - name: vision
          image: roo.azurecr.io/roo-vision:v3.4.1
          ports:
            - containerPort: 8000
          env:
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: roo-config
                  key: LOG_LEVEL
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: roo-secrets
                  key: openai-key
          resources:
            requests: { cpu: "500m", memory: "1Gi" }
            limits: { cpu: "1", memory: "2Gi" }

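Read this manifest as: “run 3 replicas of roo-vision:v3.4.1. Each gets LOG_LEVEL from a ConfigMap and OPENAI_API_KEY from a Secret. Each reserves 500 millicores of CPU and 1 GiB of RAM for scheduling (requests) and is capped at 1 core and 2 GiB (limits): CPU above the limit is throttled, and a container that exceeds its memory limit is OOM-killed.”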
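
The Deployment above relies on the default rolling-update behaviour. As a minimal sketch of how a rollout can be tuned, the fragment below sets the strategy explicitly; the `maxUnavailable` and `maxSurge` values are illustrative assumptions, not settings taken from the original manifest.

```yaml
# Fragment of the roo-vision Deployment spec (values are illustrative).
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below 3 available replicas during a rollout
      maxSurge: 1         # bring up at most 1 extra pod at a time
```

With these values, changing the image tag replaces pods one at a time: a new pod must become ready before an old one is removed.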

Service

apiVersion: v1
kind: Service
metadata:
  name: roo-vision-svc
spec:
  selector: { app: roo-vision }
  ports:
    - port: 80
      targetPort: 8000
  type: ClusterIP

The Service gives the Deployment's pods a stable internal address, roo-vision-svc.default.svc.cluster.local (assuming the default namespace), and load-balances across all ready pods.
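
Which pods receive traffic is decided by readiness: a Service only routes to pods whose containers report ready. A minimal sketch of a readiness probe for the vision container follows; the `/healthz` path and the timing values are assumptions, since the original manifest defines no probe.

```yaml
# Fragment of the Deployment's container spec (hypothetical endpoint and timings).
containers:
  - name: vision
    image: roo.azurecr.io/roo-vision:v3.4.1
    ports:
      - containerPort: 8000
    readinessProbe:
      httpGet:
        path: /healthz          # assumed health endpoint exposed by the app
        port: 8000
      initialDelaySeconds: 10   # give the model time to load before the first check
      periodSeconds: 5
```

Without a readiness probe, a pod counts as ready as soon as its containers have started, even if the model is still loading.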

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: roo-vision-ing
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  rules:
    - host: vision.roo-robotics.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: roo-vision-svc
                port: { number: 80 }
  tls:
    - hosts: [vision.roo-robotics.com]
      secretName: roo-vision-tls

The Ingress publishes the Service to the public internet via the Application Routing add-on (the AKS-managed nginx ingress). For enterprise scenarios with WAF, swap to AGIC (Application Gateway Ingress Controller).
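
The `tls` block refers to a Secret named roo-vision-tls, which must live in the same namespace as the Ingress. A sketch of its shape, with placeholders in place of real certificate data:

```yaml
apiVersion: v1
kind: Secret
metadata: { name: roo-vision-tls }
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded PEM certificate>   # placeholder
  tls.key: <base64-encoded PEM private key>   # placeholder
```

In practice this Secret is usually generated (for example with `kubectl create secret tls`) rather than written by hand.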

ConfigMap and Secret

apiVersion: v1
kind: ConfigMap
metadata: { name: roo-config }
data:
  LOG_LEVEL: info
  MODEL_NAME: phi-4-mini
---
apiVersion: v1
kind: Secret
metadata: { name: roo-secrets }
type: Opaque
data:
  openai-key: c2stMTIzNDU2Nzg=  # base64 of "sk-12345678"

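Secrets in this raw form are only base64-encoded: anyone who can read the manifest or the Secret object can decode the value. Upstream Kubernetes stores Secrets unencrypted in etcd by default; AKS encrypts etcd at rest with platform-managed keys, but that does not protect a value committed to source control or read through the API. For real secrets, use the Secrets Store CSI driver to project Key Vault values directly into pods (next section).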
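
The ConfigMap can also be consumed wholesale rather than key by key. As a sketch under assumed names (the `/etc/roo` mount path and the `config` volume name are hypothetical), this pod template fragment imports every key as an env var and also mounts the same keys as files:

```yaml
# Pod template fragment; mount path and volume name are illustrative.
spec:
  containers:
    - name: vision
      image: roo.azurecr.io/roo-vision:v3.4.1
      envFrom:
        - configMapRef: { name: roo-config }    # LOG_LEVEL and MODEL_NAME become env vars
      volumeMounts:
        - { name: config, mountPath: /etc/roo, readOnly: true }
  volumes:
    - name: config
      configMap: { name: roo-config }           # one file per key, e.g. /etc/roo/LOG_LEVEL
```

Mounted files are eventually refreshed when the ConfigMap changes, whereas env vars are fixed when the container starts.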

Pulling images from ACR: cluster identity or image pull secret

Images are pulled by the kubelet, not by the pod, so the choice is about which credentials the kubelet uses. Two patterns:

Pattern	How	When
Cluster-level integration	`az aks update --attach-acr <registry>`	Default: kubelet identity gets AcrPull on the registry
Image pull secret	Create a `kubernetes.io/dockerconfigjson` Secret from scoped ACR credentials and reference it in `imagePullSecrets`	When different namespaces / apps need different ACR access

For the exam, the default cluster-level integration is the most common scenario:

az aks update -n roo-aks -g roo-prod --attach-acr roo

That single command grants the AKS kubelet identity AcrPull on the registry — every pod in the cluster can pull from that ACR.
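
Where cluster-wide pull access is too broad, one alternative is a per-namespace image pull secret. The fragment below is a sketch: `roo-acr-pull` is a hypothetical Secret of type `kubernetes.io/dockerconfigjson`, created separately, that would hold scoped ACR credentials.

```yaml
# Pod template fragment; roo-acr-pull is a hypothetical, pre-created Secret.
spec:
  imagePullSecrets:
    - name: roo-acr-pull
  containers:
    - name: vision
      image: roo.azurecr.io/roo-vision:v3.4.1
```

The kubelet uses those credentials only for pods that reference the Secret, so other namespaces gain no access through it.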

Secrets — the Secrets Store CSI driver

Native Kubernetes Secrets are base64-encoded blobs, not real secrets. The recommended pattern on Azure is the Secrets Store CSI driver with the Azure Key Vault provider, which mounts Key Vault secrets as files inside pods and can optionally sync them into a Kubernetes Secret for use as env vars.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata: { name: roo-kv-secrets }
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "false"
    clientID: "<workload-identity-client-id>"
    keyvaultName: roo-kv
    objects: |
      array:
        - |
          objectName: OpenAIKey
          objectType: secret
    tenantId: "<tenant-id>"
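
The `clientID` above belongs to a user-assigned managed identity that the pod reaches through a federated Kubernetes service account. A minimal sketch of that service account follows; the name `roo-vision-sa` is hypothetical, and the sketch assumes the cluster has the OIDC issuer and workload identity enabled and that the identity can read secrets in roo-kv.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: roo-vision-sa    # hypothetical name
  annotations:
    # Same client ID as in the SecretProviderClass.
    azure.workload.identity/client-id: "<workload-identity-client-id>"
```

The pod template would then set `serviceAccountName: roo-vision-sa`, and the managed identity needs a federated credential whose subject is `system:serviceaccount:<namespace>:roo-vision-sa`.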

In the pod spec:

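spec:
  containers:
    - name: vision
      image: roo.azurecr.io/roo-vision:v3.4.1
      # volumeMounts belongs to the container; volumes belongs to the pod.
      volumeMounts:
        - { name: secrets-store, mountPath: /mnt/secrets, readOnly: true }
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes: { secretProviderClass: roo-kv-secrets }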

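The pod sees /mnt/secrets/OpenAIKey containing the Key Vault value fetched at mount time. With auto-rotation enabled on the driver, it polls Key Vault on an interval and updates the file, so the app picks up a rotated value the next time it reads the file. Values already loaded into env vars need a pod restart.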
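
The earlier Deployment reads OPENAI_API_KEY from a Secret named roo-secrets. As a sketch of how the driver could populate that Secret from Key Vault instead of a hand-written manifest, `secretObjects` can be added to the SecretProviderClass. The mapping below reuses the names from this section; wiring them together this way is an assumption, not something the original manifests state.

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata: { name: roo-kv-secrets }
spec:
  provider: azure
  secretObjects:
    - secretName: roo-secrets        # Kubernetes Secret the driver creates
      type: Opaque
      data:
        - objectName: OpenAIKey      # Key Vault object listed under parameters.objects
          key: openai-key            # key that the Deployment's secretKeyRef reads
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "false"
    clientID: "<workload-identity-client-id>"
    keyvaultName: roo-kv
    objects: |
      array:
        - |
          objectName: OpenAIKey
          objectType: secret
    tenantId: "<tenant-id>"
```

The synced Secret exists only while at least one pod mounts the CSI volume, so the volume mount is still required even if the app reads the value from an env var.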

GPU node pools for inference

For larger models or training, add a dedicated GPU node pool:

az aks nodepool add \
  --cluster-name roo-aks --resource-group roo-prod \
  --name gpupool \
  --node-vm-size Standard_NC6s_v3 \
  --node-count 1 \
  --node-taints sku=gpu:NoSchedule \
  --labels accelerator=nvidia-tesla-v100

Pods that need a GPU land on this pool by tolerating the taint and selecting the pool's label:

spec:
  tolerations:
    - key: sku
      operator: Equal
      value: gpu
      effect: NoSchedule
  nodeSelector: { accelerator: nvidia-tesla-v100 }
  containers:
    - name: vision
      image: roo.azurecr.io/roo-vision:v3.4.1-cuda
      resources:
        limits:
          nvidia.com/gpu: 1

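The taint keeps non-GPU workloads off expensive GPU nodes: only pods that explicitly tolerate it can be scheduled there. The toleration alone only permits placement; the nodeSelector steers the pod to the pool, and the nvidia.com/gpu limit reserves a GPU, which requires the NVIDIA device plugin to be running on those nodes.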
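
To check that the taint, label, and GPU resource line up, a one-off pod can be scheduled onto the pool. This is a sketch under assumptions: the pod name is hypothetical, and it assumes the NVIDIA runtime makes `nvidia-smi` available inside the container.

```yaml
apiVersion: v1
kind: Pod
metadata: { name: gpu-smoke-test }   # hypothetical name
spec:
  restartPolicy: Never
  tolerations:
    - { key: sku, operator: Equal, value: gpu, effect: NoSchedule }
  nodeSelector: { accelerator: nvidia-tesla-v100 }
  containers:
    - name: smoke
      image: roo.azurecr.io/roo-vision:v3.4.1-cuda
      command: ["nvidia-smi"]        # assumed to be exposed by the NVIDIA runtime
      resources:
        limits:
          nvidia.com/gpu: 1
```

If the pod stays Pending, the usual causes are a missing toleration, a label mismatch, or nodes that do not advertise `nvidia.com/gpu`.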