Domain 3: Design and Implement Build and Release Pipelines

Safe Rollouts: Slots, Dependencies & Hotfix Paths

Ensure reliable deployments with dependency ordering, deployment slot swaps, hotfix planning, and resiliency strategies. Minimise downtime with load balancing and rolling updates.

Why Safe Rollouts Require Planning

Simple explanation

Think of moving house.

You cannot set up the TV before the power is connected. You cannot unpack kitchen boxes before the shelves are assembled. There is a natural order: electricity first, then furniture, then electronics. If you do it out of order, things break or you waste time redoing work.

Safe rollouts follow the same principle. Deploy the database changes before the API that needs them. Deploy the API before the frontend that calls it. Get the order wrong, and users see errors. Get it right, and nobody notices you shipped anything at all.

Dependency Deployment Ordering

When your application has multiple tiers (database, API, frontend, background workers), deployment order matters. The golden rule: deploy bottom-up, with infrastructure and data layers first and presentation layers last.

The Deployment Order

1. Database schema changes (expand phase)
2. Background services / workers
3. Backend APIs
4. API gateways / BFF layers
5. Frontend applications
6. Database cleanup (contract phase, after old code is fully retired)

The Expand-Contract Pattern

The expand-contract pattern (also called parallel change) ensures backward compatibility during multi-service deployments:

Expand phase:

  • Add the new database column (nullable or with default value)
  • Deploy new API version that writes to BOTH old and new columns
  • Old and new API versions coexist safely

Contract phase (after all consumers updated):

  • Migrate remaining data from old column to new column
  • Remove old column
  • Remove backward-compatibility code

This eliminates the "deploy database and API at the exact same millisecond" problem. Both versions work throughout the transition.
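
As a rough illustration, the two phases can map onto separate pipeline stages, with the contract stage held back until every consumer has moved over. This is a minimal sketch, not a definitive implementation: the table and column names (Claims, Status, StatusCode), the SqlServer and SqlDatabase variables, and the schema-contract environment are all hypothetical, authentication flags for sqlcmd are left out, and it assumes sqlcmd is available on the agent.

```yaml
# Sketch only: table, column, variable, and environment names are hypothetical.
stages:
  - stage: ExpandSchema
    jobs:
      - job: AddNewColumn
        steps:
          # Additive change: old code simply ignores the new nullable column.
          - script: >
              sqlcmd -S "$(SqlServer)" -d "$(SqlDatabase)"
              -Q "ALTER TABLE Claims ADD StatusCode INT NULL"
            displayName: 'Expand: add nullable column'

  - stage: DeployApi
    dependsOn: ExpandSchema
    jobs:
      - job: Deploy
        steps:
          - script: echo "Deploy the API version that writes to both columns"

  - stage: ContractSchema
    dependsOn: DeployApi
    jobs:
      - deployment: DropOldColumn
        # An approval check on this environment can hold the contract phase
        # until all consumers use the new column and remaining data is migrated.
        environment: schema-contract
        strategy:
          runOnce:
            deploy:
              steps:
                - script: >
                    sqlcmd -S "$(SqlServer)" -d "$(SqlDatabase)"
                    -Q "ALTER TABLE Claims DROP COLUMN Status"
                  displayName: 'Contract: drop old column'
```

In practice the contract stage often lives in a later release rather than the same run, since it is only safe once the old code is fully retired.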

Scenario: Nadia orders Meridian's deployment

Nadia manages a claims processing system with four tiers: SQL Database, Claims API, Notification Service, and the Claims Portal (SPA).

Her YAML pipeline uses dependsOn to enforce the order:

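stages:
  - stage: Database
    jobs:
      - job: MigrateSchema          # steps omitted for brevity
  # Both services depend only on Database, so they can deploy in parallel.
  - stage: NotificationService
    dependsOn: Database
  - stage: ClaimsAPI
    dependsOn: Database
  # Portal runs only after both upstream stages have succeeded.
  - stage: Portal
    dependsOn:
      - ClaimsAPI
      - NotificationService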

The Portal stage waits for BOTH ClaimsAPI and NotificationService to complete before deploying. If either fails, the Portal never deploys, which prevents users from hitting a broken frontend.

Nadia also adds health check gates between stages. The ClaimsAPI stage does not complete until the deployed API passes a /health endpoint check. This prevents the Portal from deploying against an API that deployed but is not actually healthy.
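
One simple way to approximate such a gate is a final step in the ClaimsAPI stage that polls the health endpoint and fails the stage if it never turns healthy. The sketch below is an assumption about how this could look, not Nadia's actual pipeline: the URL is a placeholder and the retry numbers are arbitrary.

```yaml
# Sketch only: the health URL and retry settings are placeholders.
stages:
  - stage: ClaimsAPI
    dependsOn: Database
    jobs:
      - job: DeployAndVerify
        steps:
          - script: echo "Deploy the Claims API here"
            displayName: 'Deploy Claims API'
          # --fail turns HTTP errors into a non-zero exit code;
          # --retry re-attempts transient failures (including refused connections).
          - script: >
              curl --fail --silent --show-error
              --retry 10 --retry-delay 15 --retry-connrefused
              "https://claims-api.example.com/health"
            displayName: 'Wait for /health to pass'
```

If the check never succeeds, the step exits non-zero, the stage fails, and any stage that depends on it is skipped.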

Question

What is the expand-contract pattern in database deployments?

Answer

A two-phase approach for backward-compatible schema changes. EXPAND: add the new column/table alongside the old one, deploy code that writes to both. CONTRACT: after all consumers use the new schema, remove the old column/table and backward-compatibility code. This eliminates the need for simultaneous database and application deployments.

Minimising Downtime

Load Balancing Strategies

| Strategy | How It Works | Downtime | Use When |
| --- | --- | --- | --- |
| Deployment slots | Deploy to staging, swap to production | Zero | Azure App Service |
| Rolling update | Update pods/VMs one at a time behind LB | Zero (if enough replicas) | Kubernetes, VM Scale Sets |
| Blue-green via Traffic Manager | Switch DNS-level traffic between regions | Near-zero (DNS TTL) | Multi-region apps |
| Weighted routing | Send percentage of traffic to new deployment | Zero | Azure Front Door, Traffic Manager |
| Connection draining | Finish in-flight requests before removing instance | Zero | All LB-based strategies |
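
The deployment-slot strategy can be sketched as two Azure Pipelines steps: deploy the package to a staging slot, then swap that slot into production. Treat this as an illustration under assumptions: the service connection, resource group, package path, app type, and the app name claims-api are placeholders.

```yaml
# Sketch only: values in angle brackets and the package path are placeholders.
steps:
  - task: AzureWebApp@1
    displayName: 'Deploy to staging slot'
    inputs:
      azureSubscription: '<service-connection>'
      appType: 'webApp'
      appName: 'claims-api'
      deployToSlotOrASE: true
      resourceGroupName: '<resource-group>'
      slotName: 'staging'
      package: '$(Pipeline.Workspace)/drop/*.zip'

  - task: AzureAppServiceManage@0
    displayName: 'Swap staging into production'
    inputs:
      azureSubscription: '<service-connection>'
      Action: 'Swap Slots'
      WebAppName: 'claims-api'
      ResourceGroupName: '<resource-group>'
      SourceSlot: 'staging'
      SwapWithProduction: true
```

Because a swap exchanges the two slots, the previous production build ends up in staging, which is what makes a quick swap-back possible.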

Health Checks and Readiness Probes

Health checks ensure traffic only routes to healthy instances:

  • Liveness probe: is the process alive? Restart if not.
  • Readiness probe: can the instance serve traffic? Remove from LB if not.
  • Startup probe: is the app still starting up? Liveness and readiness checks are held off until startup completes.
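
In Kubernetes, the three probes are declared per container. The fragment below is a minimal sketch: the image name, port, endpoint paths, and timing values are illustrative assumptions rather than recommended settings.

```yaml
# Container spec fragment; image, port, paths, and timings are illustrative.
containers:
  - name: claims-api
    image: example.azurecr.io/claims-api:1.0.0
    ports:
      - containerPort: 8080
    startupProbe:            # other probes are held off until this succeeds
      httpGet:
        path: /health/startup
        port: 8080
      periodSeconds: 5
      failureThreshold: 30   # allows up to 150 seconds to start
    livenessProbe:           # on failure, the kubelet restarts the container
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:          # on failure, the pod is removed from Service endpoints
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
```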

In Azure App Service, configure the Health Check feature at /health. The platform then automatically removes unhealthy instances from the load balancer rotation.

Question

What is the difference between a liveness probe and a readiness probe in Kubernetes?

Answer

A liveness probe checks if the container process is alive. If it fails, Kubernetes restarts the container. A readiness probe checks if the container can serve traffic. If it fails, Kubernetes removes the pod from the Service endpoints (no traffic routed to it). An app can be alive but not ready (e.g., still loading cache).

Question

What is connection draining and why is it critical during deployments?

Answer

Connection draining (also called graceful shutdown) allows in-flight requests to complete before an instance is removed from the load balancer. Without it, active users get dropped connections mid-request during deployments. Azure Application Gateway offers a configurable drain timeout, and Kubernetes gives terminating pods a grace period (terminationGracePeriodSeconds) in which to finish in-flight work.
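
For Kubernetes, a rolling update with draining might be sketched as follows. The replica count, grace period, image name, and the preStop sleep are assumptions chosen for illustration, and the sleep command presumes the image ships a sleep binary.

```yaml
# Sketch only: names, counts, and timings are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: claims-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: claims-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # bring up one extra pod at a time
  template:
    metadata:
      labels:
        app: claims-api
    spec:
      # Time allowed between the termination signal and a forced kill.
      terminationGracePeriodSeconds: 45
      containers:
        - name: claims-api
          image: example.azurecr.io/claims-api:1.0.0
          lifecycle:
            preStop:
              exec:
                # Brief pause so the pod leaves the endpoints list
                # before the process is asked to stop.
                command: ["sleep", "10"]
```

With maxUnavailable set to 0, an old pod is only removed once a new pod has passed its readiness probe, so capacity never dips during the rollout.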

Hotfix Path Planning

A hotfix path is a pre-planned, expedited route from code fix to production that bypasses the normal release cadence. Every team needs one BEFORE the first emergency.

Standard Flow vs Hotfix Flow

Pre-plan both paths so the team knows exactly what to do under pressure
| Aspect | Standard Release | Hotfix Path |
| --- | --- | --- |
| Trigger | Sprint end / release cadence | Critical production bug (P0/P1) |
| Branch source | Feature branch from main/develop | Hotfix branch from release tag or main |
| Testing | Full regression, UAT, performance | Targeted fix validation + smoke tests |
| Approval | Normal approval gates | Expedited approval (on-call lead + 1 reviewer) |
| Environments | Dev to Staging to Production | Hotfix env to Production (skip lower envs) |
| Deployment | Scheduled maintenance window | Immediate (ASAP) |
| Post-deploy | Standard monitoring | Enhanced monitoring + incident bridge open |
| Merge back | N/A (already in main) | Cherry-pick or merge hotfix branch back to main AND develop |

Hotfix Branching Approaches

Git Flow hotfix: Create hotfix/critical-fix from the main (or release) branch. Fix, test, deploy. Merge back into BOTH main and develop to prevent regression.

Trunk-based hotfix: Land the fix on main, either through a short-lived branch or a direct commit if CI is fast enough, and deploy from main. No merge-back is needed because the fix is already in the trunk.

Release branch hotfix: If you maintain release branches (release/2.4), apply the fix to the release branch, deploy, then cherry-pick to main for the next release.

Exam tip: Hotfix path questions

The exam often presents a scenario: "Production is down. The team has a fix ready. What is the FASTEST safe path to production?"

Key principles:

  • A hotfix path MUST still have at least one approval gate (no rogue deploys)
  • Automated tests must run, but only the subset relevant to the fix
  • The fix MUST be merged back to the main development branch after deployment
  • Skip lower environments only if you have a dedicated hotfix environment with production-like config
  • Document the expedited process BEFORE you need it, because decisions made during incidents are worse than decisions made calmly
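
One possible way to encode these principles in a single Azure Pipelines definition is to let hotfix branches skip the full regression stage while keeping smoke tests and the production approval. This is a sketch under assumptions: the stage, job, and environment names are hypothetical, and the approval itself would be configured as a check on the production environment rather than in YAML.

```yaml
# Sketch only: stage, job, and environment names are hypothetical.
trigger:
  branches:
    include:
      - main
      - hotfix/*

variables:
  # True when the run was triggered from a hotfix/* branch.
  isHotfix: $[startsWith(variables['Build.SourceBranch'], 'refs/heads/hotfix/')]

stages:
  - stage: SmokeTests
    jobs:
      - job: Smoke
        steps:
          - script: echo "Run targeted fix validation and smoke tests"

  - stage: FullRegression
    dependsOn: SmokeTests
    # Skipped on hotfix branches; standard releases still run it.
    condition: and(succeeded(), eq(variables.isHotfix, false))
    jobs:
      - job: Regression
        steps:
          - script: echo "Run full regression, UAT, and performance suites"

  - stage: Production
    dependsOn:
      - SmokeTests
      - FullRegression
    # Proceed when smoke tests passed and regression either passed or was skipped.
    condition: and(eq(dependencies.SmokeTests.result, 'Succeeded'), in(dependencies.FullRegression.result, 'Succeeded', 'Skipped'))
    jobs:
      - deployment: Deploy
        # Approval checks on this environment still apply to hotfixes.
        environment: production
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploy the fix"
```

Keeping both paths in one definition means the hotfix route is exercised by the same pipeline the team uses every day, instead of a rarely run special case.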

Resiliency Strategies for Deployment

Resiliency is not just about the application. Your deployment pipeline itself must be resilient too.

Application Resiliency Patterns

| Pattern | What It Does | When to Use |
| --- | --- | --- |
| Retry with backoff | Retry failed requests with increasing delays | Transient failures (network blips, throttling) |
| Circuit breaker | Stop calling a failing service, return fallback | Downstream service is consistently failing |
| Bulkhead | Isolate resources per consumer/feature | Prevent one failing feature from taking down everything |
| Graceful degradation | Disable non-critical features during partial outages | Maintain core functionality when dependencies fail |
| Immutable infrastructure | Never patch in place; replace with new instances | Eliminate configuration drift, ensure consistency |

Pipeline Resiliency Patterns

  • Automatic rollback: if post-deployment health checks fail, automatically redeploy the previous version
  • Deployment gates: automated quality gates between stages (Azure Monitor alerts, SonarQube quality gate, custom API checks)
  • Approval timeouts: approvals expire after a window to prevent stale deployments sitting in the pipeline
  • Retry on transient failure: configure pipeline tasks to retry on infrastructure errors (network timeout, agent unavailable)
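
The retry bullet can be illustrated with the retryCountOnTaskFailure step property in Azure Pipelines. The inputs below are placeholders, and note that the property retries on any task failure, not only transient ones, so it suits idempotent steps best.

```yaml
# Sketch only: values in angle brackets and the package path are placeholders.
steps:
  - task: AzureWebApp@1
    displayName: 'Deploy Claims API'
    retryCountOnTaskFailure: 3   # re-run the task up to 3 times if it fails
    timeoutInMinutes: 10         # fail the step rather than hang indefinitely
    inputs:
      azureSubscription: '<service-connection>'
      appType: 'webApp'
      appName: 'claims-api'
      package: '$(Pipeline.Workspace)/drop/*.zip'
```
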
Question

What is the circuit breaker pattern and how does it relate to deployment resiliency?

Answer

A circuit breaker monitors calls to a downstream service. After a threshold of failures, it 'opens' and immediately returns a fallback response instead of attempting the call. After a cooldown, it allows a test call through (half-open state). This prevents cascading failures during deployments when a newly deployed service is unhealthy. Azure API Management and Polly (.NET) implement this pattern.

Automatic Rollback Configuration

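In Azure Pipelines, a deployment job can roll back automatically through the on: failure lifecycle hook, which runs when the deploy steps fail. The example assumes the staging slot still holds the last known-good build, so swapping it into production restores the previous version. Values in angle brackets and the package path are placeholders.

stages:
  - stage: Production
    jobs:
      - deployment: Deploy
        environment: 'production'      # deployment jobs require an environment
        strategy:
          runOnce:
            deploy:
              steps:
                - task: AzureWebApp@1
                  inputs:
                    azureSubscription: '<service-connection>'
                    appType: 'webApp'
                    appName: 'claims-api'
                    package: '$(Pipeline.Workspace)/drop/*.zip'
            on:
              failure:
                steps:
                  # Swap the last known-good build in staging back into production.
                  - task: AzureAppServiceManage@0
                    inputs:
                      azureSubscription: '<service-connection>'
                      Action: 'Swap Slots'
                      WebAppName: 'claims-api'
                      ResourceGroupName: '<resource-group>'
                      SourceSlot: 'staging'
                      SwapWithProduction: true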

In GitHub Actions, use a separate rollback job that runs if: failure() and references the previous stable deployment.
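
A minimal sketch of that GitHub Actions arrangement is shown below. The deploy script and the LAST_STABLE_SHA configuration variable are hypothetical stand-ins for however the team deploys and records its last stable release.

```yaml
# Sketch only: the deploy script and LAST_STABLE_SHA variable are hypothetical.
name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy new version
        run: ./scripts/deploy.sh "${{ github.sha }}"

  rollback:
    runs-on: ubuntu-latest
    needs: deploy
    if: failure()   # runs only when a job it depends on has failed
    steps:
      - uses: actions/checkout@v4
      - name: Redeploy previous stable version
        run: ./scripts/deploy.sh "${{ vars.LAST_STABLE_SHA }}"
```

Putting the rollback in its own job keeps it independent of whichever step failed inside the deploy job.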

Question

How do deployment gates work in Azure Pipelines?

Answer

Deployment gates are automated checks evaluated between pipeline stages. Examples: query Azure Monitor for active alerts (no alerts = pass), check SonarQube quality gate, invoke a REST API that returns pass/fail. Gates are evaluated repeatedly at a configurable interval until they pass, timeout, or the deployment is cancelled. They prevent promoting a deployment that does not meet quality criteria.

Knowledge Check

Nadia's team deploys a multi-tier application: SQL Database, Claims API, Notification Service, and Portal SPA. The Portal calls the Claims API, which calls the Database. What is the correct deployment order?

Knowledge Check

Production is down due to a critical bug. The team has a fix ready and tested locally. The normal release process takes 4 hours with full regression testing. What should the team do?

Knowledge Check

Jordan configures a Kubernetes deployment with both liveness and readiness probes. During a rolling update, a new pod starts but its readiness probe fails for 30 seconds while caches warm up. What happens?
