Domain 1: Plan and Manage an Azure AI Solution (Module 5 of 8)

Deploying Models & CI/CD

Models don't deploy themselves. Learn how to configure model and agent deployments in Foundry, and integrate your AI projects into CI/CD pipelines for repeatable, reliable releases.

Deploying models and agents

Simple explanation

Deploying a model is like installing an app on a server: you pick the version, configure the settings, and make it available to users.

In Foundry, you choose a model from the catalog, give it a deployment name, and set capacity limits; the deployment then gets an API endpoint. The same applies to agents: you define the agent, deploy it, and it gets an endpoint your app can call.

CI/CD means automating this process so every code change is tested and deployed automatically, with no manual clicking in the portal.

Model deployment configuration

When deploying a model in Foundry, you configure:

| Setting | What It Controls | Example |
|---|---|---|
| Deployment name | The identifier your app uses to call this model | "gpt4o-prod", "phi4-staging" |
| Model version | Which version of the model to use | GPT-4o 2024-11-20 |
| Deployment type | Serverless or provisioned throughput | Provisioned for production |
| Rate limit / TPM | Maximum tokens per minute | 80,000 TPM for prod |
| Content filter | Which safety filters to apply | Default or custom configuration |
| Region | Where the model runs | East US 2 |

Exam tip: Deployment names matter

Your application code references the deployment name, not the model name. This means you can swap model versions (GPT-4o to GPT-4.1) without changing application code: just update the deployment to point to the new model version.

The exam may test this pattern: "How can you upgrade a model version without modifying application code?" Answer: Update the model version on the existing deployment name.
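
The pattern above can be illustrated with a small, self-contained Python sketch. It does not call any Azure or Foundry API. The `DeploymentRegistry` class, its methods, and the `example-version` string are hypothetical stand-ins, used only to show why application code that knows just the deployment name is unaffected when the model behind it changes.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical record of what a deployment name currently points to."""
    name: str
    model: str
    version: str


class DeploymentRegistry:
    """In-memory stand-in for the platform's deployment list (an assumption)."""

    def __init__(self) -> None:
        self._deployments: dict[str, Deployment] = {}

    def create(self, name: str, model: str, version: str) -> None:
        self._deployments[name] = Deployment(name, model, version)

    def update_model(self, name: str, model: str, version: str) -> None:
        # The deployment name stays fixed; only its target changes.
        deployment = self._deployments[name]
        deployment.model = model
        deployment.version = version

    def resolve(self, name: str) -> Deployment:
        return self._deployments[name]


def generate(registry: DeploymentRegistry, deployment_name: str, prompt: str) -> str:
    """Application code: it knows only the deployment name, never the model."""
    target = registry.resolve(deployment_name)
    return f"[{target.model} {target.version}] {prompt}"


registry = DeploymentRegistry()
registry.create("gpt4o-prod", model="gpt-4o", version="2024-11-20")
before = generate(registry, "gpt4o-prod", "Summarize this shipment report.")

# Upgrade: same deployment name, new model behind it. generate() is unchanged.
registry.update_model("gpt4o-prod", model="gpt-4.1", version="example-version")
after = generate(registry, "gpt4o-prod", "Summarize this shipment report.")
```

In a real application the deployment name would be passed to the model client instead of a local registry, but the design point is the same: the name is the stable contract, and the model version is a deployment-side setting.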

Agent deployment

Agents deploy differently from raw models. An agent deployment includes:

| Component | What Gets Deployed |
|---|---|
| Agent definition | Instructions, model reference, tool schemas |
| Tool connections | API endpoints, function definitions, knowledge sources |
| Configuration | Temperature, max tokens, safety settings |
| Version | Agent version for rollback capability |
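
One way to picture these components is as a versioned record, where each published version is an immutable snapshot and rollback simply reactivates an earlier snapshot. The following is a minimal sketch under that assumption. `AgentVersion`, `AgentDeployment`, and all field names and values are hypothetical and are not the Foundry SDK's types.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class AgentVersion:
    """One immutable, deployable snapshot of an agent (hypothetical shape)."""
    version: int
    instructions: str
    model_deployment: str            # a deployment name, not a model name
    tools: Tuple[str, ...] = ()      # tool or knowledge-source identifiers
    temperature: float = 0.2
    max_tokens: int = 1024


@dataclass
class AgentDeployment:
    """Keeps every published version so an earlier one can be restored."""
    name: str
    history: List[AgentVersion] = field(default_factory=list)
    active_index: int = -1

    def publish(self, instructions, model_deployment, tools=(), **config) -> AgentVersion:
        snapshot = AgentVersion(
            len(self.history) + 1, instructions, model_deployment, tuple(tools), **config
        )
        self.history.append(snapshot)
        self.active_index = len(self.history) - 1
        return snapshot

    def rollback(self) -> AgentVersion:
        """Reactivate the version published before the active one."""
        if self.active_index <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.active_index -= 1
        return self.history[self.active_index]

    @property
    def active(self) -> AgentVersion:
        return self.history[self.active_index]


agent = AgentDeployment("dispatch-assistant")
agent.publish("Answer routing questions.", "gpt4o-prod", tools=["shipment_lookup"])
agent.publish(
    "Answer routing questions concisely.",
    "gpt4o-prod",
    tools=["shipment_lookup", "weather_api"],
    temperature=0.0,
)
restored = agent.rollback()  # version 1 is active again; version 2 stays in history
```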

CI/CD for AI solutions

AI development with and without CI/CD

| Feature | Without CI/CD | With CI/CD |
|---|---|---|
| Deploy process | Manual portal clicks | Automated pipeline triggered by git push |
| Testing | Hope it works | Automated evaluation (quality, safety, groundedness) |
| Consistency | Different every time | Identical across environments |
| Rollback | Manually redeploy old version | One-click or automatic on failure |
| Audit trail | Who changed what? Good luck | Full git history + pipeline logs |

CI/CD pipeline stages for AI

| Stage | What Happens | Tools |
|---|---|---|
| Build | Package application code and agent definitions | GitHub Actions, Azure DevOps |
| Evaluate | Run automated quality, safety, and groundedness tests | Foundry Evaluation SDK |
| Deploy to staging | Push to staging Foundry Project | Azure CLI, Foundry SDK |
| Integration test | Verify end-to-end with real API calls | pytest, custom test suites |
| Promote to production | Deploy to prod after approval | Manual gate or auto-promote |
| Monitor | Watch for drift, errors, safety events | Azure Monitor, Foundry tracing |

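
The Evaluate stage usually acts as a gate: later stages run only if the scores clear agreed minimums. The sketch below shows that gating logic in plain Python. The metric names follow the table, but the thresholds, the 0.0 to 1.0 scale, and the `evaluation_gate` function are illustrative assumptions, not Foundry defaults. In a real pipeline the scores would come from the evaluation tooling.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative minimums on an assumed 0.0-1.0 scale; not Foundry defaults.
THRESHOLDS = {"quality": 0.80, "safety": 0.95, "groundedness": 0.85}


@dataclass
class GateResult:
    passed: bool
    failures: List[str]


def evaluation_gate(
    scores: Dict[str, float], thresholds: Dict[str, float] = THRESHOLDS
) -> GateResult:
    """Pass only if every required metric is present and meets its minimum."""
    failures = []
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        if score is None:
            failures.append(f"{metric}: missing")
        elif score < minimum:
            failures.append(f"{metric}: {score:.2f} < {minimum:.2f}")
    return GateResult(passed=not failures, failures=failures)


good = evaluation_gate({"quality": 0.91, "safety": 0.99, "groundedness": 0.88})
bad = evaluation_gate({"quality": 0.91, "safety": 0.90})  # low safety, no groundedness
```

A pipeline step wrapping this check would typically exit with a non-zero status when `passed` is false, which stops the staging deployment from running.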
Real-world example: Kai's CI/CD pipeline

Kai sets up a GitHub Actions pipeline for the logistics AI platform:

  1. On pull request: Run Foundry evaluations against test scenarios (20 predefined questions + expected answers)
  2. On merge to main: Deploy to staging Foundry Project, run integration tests
  3. Manual approval: Team lead reviews evaluation scores before production
  4. On approval: Deploy to production, update model deployment, run smoke tests
  5. On failure: Auto-rollback to previous deployment version

The whole process runs in under 15 minutes. No portal clicking required.
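
Steps 4 and 5 can be sketched as a single release function that deploys, runs smoke tests, and redeploys the previous version if either fails. This is a minimal illustration, not Kai's actual pipeline. The `deploy` and `smoke_test` callables are hypothetical hooks that, in practice, would wrap CLI or SDK commands.

```python
from typing import Callable, List


def release(
    new_version: str,
    previous_version: str,
    deploy: Callable[[str], None],
    smoke_test: Callable[[], bool],
) -> str:
    """Deploy new_version; on a failed deploy or smoke test, redeploy previous_version.

    Returns the version that is live when the function finishes.
    """
    try:
        deploy(new_version)
        if smoke_test():
            return new_version
    except Exception as error:  # treat a failed deployment like a failed test
        print(f"deploy failed: {error}")
    deploy(previous_version)  # automatic rollback
    return previous_version


# Simulated environment, for demonstration only.
live: List[str] = []


def fake_deploy(version: str) -> None:
    live.append(version)


result_ok = release("v2", "v1", fake_deploy, smoke_test=lambda: True)
result_bad = release("v3", "v2", fake_deploy, smoke_test=lambda: False)
```

Note that if the rollback deployment itself raises, the exception propagates, so the pipeline run fails visibly instead of reporting a false success.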

Key terms

Question

What is a model deployment name?

Answer

The identifier your application uses to call a deployed model. By referencing the deployment name (not the model name), you can swap model versions without changing application code.

Question

What is provisioned throughput measured in?

Answer

Provisioned Throughput Units (PTU). You reserve a fixed number of PTUs, which guarantee a model-specific Tokens Per Minute (TPM) rate and consistent latency for production workloads.

Question

How does CI/CD work for AI solutions?

Answer

Automated pipelines build the application, evaluate it (quality, safety, groundedness), deploy to staging, run integration tests, and promote to production. They use the Foundry SDK and evaluation framework for AI-specific testing.

Knowledge check

MediaForge wants to upgrade their content generation model from GPT-4o to GPT-4.1 without changing any application code. What should they do?

Atlas Financial's compliance team requires that every AI model change is auditable and can be rolled back within 5 minutes. Which practice best supports this?