Domain 4: Deploy and Maintain Data Pipelines and Workloads

Testing & Databricks Asset Bundles (Declarative Automation Bundles)

Implement unit tests, integration tests, and end-to-end testing strategies. Package and deploy with Databricks Asset Bundles via CLI and REST APIs.

Testing strategy

Simple explanation

Testing is taste-testing your food at every stage of cooking.

Unit test: taste each ingredient individually. Integration test: taste the combined sauce. End-to-end test: taste the full dish. UAT: have a customer taste it before putting it on the menu.

Without testing, you serve bad data and only find out when the CEO's dashboard is wrong.

Testing layers

| Test type | What it tests | How | When |
| --- | --- | --- | --- |
| Unit test | Individual functions/transforms | pytest with mock data | Every commit |
| Integration test | Components working together | Test tables in dev workspace | Every PR merge |
| End-to-end test | Full pipeline, bronze → gold | Run pipeline on test data in staging | Before production deploy |
| UAT | Business rules and output quality | Stakeholders validate sample output | Before production release |

Unit testing example

# test_transforms.py
from transforms import clean_amount, validate_date

def test_clean_amount_removes_negatives():
    assert clean_amount(-50) is None
    assert clean_amount(100) == 100.0

def test_validate_date_rejects_future():
    assert validate_date("2099-01-01") is False
    # Time-dependent: this assertion passes only when run on or after 2026-04-01
    assert validate_date("2026-04-01") is True
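
The tests above import clean_amount and validate_date from a transforms module that the lesson does not show. A minimal sketch of what such a module might look like follows. The function bodies and the optional today parameter are assumptions made for illustration, not the course's implementation. Passing today explicitly lets a test pin the reference date instead of depending on when it runs.

```python
# transforms.py (hypothetical implementation; the lesson only shows the tests)
from datetime import date
from typing import Optional


def clean_amount(amount: float) -> Optional[float]:
    """Return the amount as a float, or None if it is negative."""
    if amount < 0:
        return None
    return float(amount)


def validate_date(value: str, today: Optional[date] = None) -> bool:
    """Return True if value is an ISO date (YYYY-MM-DD) that is not in the future."""
    today = today or date.today()
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return False  # malformed dates are rejected
    return parsed <= today
```

With this shape, a test can call validate_date("2026-04-01", today=date(2026, 4, 2)) and get the same result on any day.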

Integration testing

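# Run in a dev workspace with test data.
# `spark` is the session a Databricks notebook provides; run_silver_pipeline is the function under test.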
test_df = spark.createDataFrame([
    (1, "Alice", 100.0, "2026-04-01"),
    (2, None, -50.0, "2099-01-01"),  # should be filtered out
], ["id", "name", "amount", "date"])

result = run_silver_pipeline(test_df)
assert result.count() == 1  # only valid row
assert result.filter("name = 'Alice'").count() == 1

Databricks Asset Bundles (DABs)

Asset Bundles package your entire project into a deployable unit:

# databricks.yml: bundle configuration
bundle:
  name: freshmart-etl

workspace:
  host: https://adb-1234567890.1.azuredatabricks.net

resources:
  jobs:
    nightly_etl:
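      name: "Freshmart Nightly ETL"
      # Defines the cluster that the tasks reference via job_cluster_key.
      # Runtime version and node type are example values; adjust for your workspace.
      job_clusters:
        - job_cluster_key: etl_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 2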
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/01_ingest.py
          job_cluster_key: etl_cluster
        - task_key: transform
          depends_on:
            - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/02_transform.py
          job_cluster_key: etl_cluster

  pipelines:
    quality_pipeline:
      name: "Freshmart Quality Pipeline"
      target: freshmart_silver
      libraries:
        - notebook:
            path: ./pipelines/quality_checks.sql

targets:
  dev:
    workspace:
      host: https://adb-dev.azuredatabricks.net
  prod:
    workspace:
      host: https://adb-prod.azuredatabricks.net
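
Settings under a target override the matching top-level settings when that target is selected. The sketch below illustrates that override rule on a plain dictionary that mirrors the workspace parts of the configuration above. resolve_workspace_host is a hypothetical helper written for this example; it is not how the Databricks CLI implements the merge.

```python
# Illustrative only: mirrors the relevant parts of databricks.yml as a dict.
BUNDLE_CONFIG = {
    "bundle": {"name": "freshmart-etl"},
    "workspace": {"host": "https://adb-1234567890.1.azuredatabricks.net"},
    "targets": {
        "dev": {"workspace": {"host": "https://adb-dev.azuredatabricks.net"}},
        "prod": {"workspace": {"host": "https://adb-prod.azuredatabricks.net"}},
    },
}


def resolve_workspace_host(config: dict, target: str) -> str:
    """Return the host for a target, falling back to the top-level host."""
    targets = config.get("targets", {})
    if target not in targets:
        raise KeyError(f"unknown target: {target}")
    target_workspace = (targets[target] or {}).get("workspace", {})
    # A target-level value wins; otherwise the top-level value applies.
    host = target_workspace.get("host") or config.get("workspace", {}).get("host")
    if host is None:
        raise ValueError(f"no workspace host configured for target: {target}")
    return host
```

Calling resolve_workspace_host(BUNDLE_CONFIG, "dev") returns the dev host, while a target with no workspace block of its own would fall back to the top-level host.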

Deploy via CLI

# Validate the bundle
databricks bundle validate

# Deploy to dev environment
databricks bundle deploy --target dev

# Run a specific job
databricks bundle run nightly_etl --target dev

# Deploy to production
databricks bundle deploy --target prod
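
In automation these commands are usually invoked from a script rather than typed by hand. The following sketch wraps them with Python's subprocess module. bundle_command and run_bundle are hypothetical helper names, and the sketch assumes the databricks CLI is installed and authenticated on the machine that runs it.

```python
import subprocess
from typing import List, Optional


def bundle_command(action: str, target: Optional[str] = None,
                   resource: Optional[str] = None) -> List[str]:
    """Build the argument list for a `databricks bundle` command."""
    if action not in {"validate", "deploy", "run"}:
        raise ValueError(f"unsupported action: {action}")
    command = ["databricks", "bundle", action]
    if resource is not None:
        if action != "run":
            raise ValueError("a resource key is only used with 'run'")
        command.append(resource)
    if target is not None:
        command += ["--target", target]
    return command


def run_bundle(action: str, target: Optional[str] = None,
               resource: Optional[str] = None) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(bundle_command(action, target, resource), check=True)
```

For example, run_bundle("run", target="dev", resource="nightly_etl") issues the same command as databricks bundle run nightly_etl --target dev.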

Deploy via REST API

Bundle deployment is CLI-driven (databricks bundle deploy). The CLI calls REST APIs internally, so you don't typically call the REST API directly to deploy a bundle.

# The CLI abstracts the REST API calls:
databricks bundle deploy --target prod
# Internally this calls several REST APIs, for example:
#   - /api/2.1/jobs/create or /api/2.1/jobs/reset
#   - /api/2.0/workspace/import
#   - POST /api/2.0/pipelines (create) or PUT /api/2.0/pipelines/{pipeline_id} (update)

Exam tip: Bundles and REST APIs

The exam expects you to know that:

  • Bundles are deployed via CLI (databricks bundle deploy), not by calling REST APIs directly
  • The CLI uses REST APIs internally to create/update jobs, pipelines, and workspace objects
  • For programmatic deployment in CI/CD, the CLI is invoked in pipeline steps (GitHub Actions, Azure DevOps)
  • Direct REST API calls (e.g., /api/2.1/jobs/create) are used for individual resource management, not for deploying complete bundles
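
For contrast, managing a single resource directly might look like the sketch below, which builds (but does not send) a request to the /api/2.1/jobs/create endpoint using only the standard library. The host, token, and job settings are placeholders, and build_create_job_request is a hypothetical helper, so treat this as an illustration of the request shape rather than a complete job definition.

```python
import json
import urllib.request


def build_create_job_request(host: str, token: str,
                             job_settings: dict) -> urllib.request.Request:
    """Build a POST request for the Jobs API create endpoint (not sent here)."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/create",
        data=json.dumps(job_settings).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; a real call needs a valid workspace URL and token.
request = build_create_job_request(
    host="https://adb-dev.azuredatabricks.net",
    token="<personal-access-token>",
    job_settings={"name": "Freshmart Nightly ETL"},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```

A call like this creates one job and nothing else, whereas a bundle deploy reconciles every resource declared in databricks.yml.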

CI/CD with Asset Bundles

A typical CI/CD pipeline:

  1. Developer pushes code to a feature branch
  2. CI pipeline (GitHub Actions/Azure DevOps) runs:
    • databricks bundle validate: check config syntax
    • pytest: run unit tests
    • databricks bundle deploy --target dev: deploy to dev
    • Integration tests in dev workspace
  3. PR merged → deploy to staging, run E2E tests
  4. Release → databricks bundle deploy --target prod
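
The ordering matters because each step gates the next: a failed validation or unit test should stop the run before anything is deployed. A minimal sketch of that fail-fast behaviour follows. run_steps and the injected execute callable are hypothetical, the test directory names are assumed examples, and a real CI system (GitHub Actions, Azure DevOps) provides this sequencing itself.

```python
from typing import Callable, List, Sequence

# CI steps from the list above, in order. The tests/ paths are assumed examples.
CI_STEPS: List[List[str]] = [
    ["databricks", "bundle", "validate"],
    ["pytest", "tests/unit"],
    ["databricks", "bundle", "deploy", "--target", "dev"],
    ["pytest", "tests/integration"],
]


def run_steps(steps: Sequence[Sequence[str]],
              execute: Callable[[Sequence[str]], int]) -> List[Sequence[str]]:
    """Run steps in order and stop at the first non-zero exit code.

    Returns the steps that completed successfully.
    """
    completed = []
    for step in steps:
        if execute(step) != 0:
            break  # later steps (including deploys) are skipped
        completed.append(step)
    return completed
```

In practice execute could be something like lambda cmd: subprocess.run(cmd).returncode; injecting it keeps the sequencing logic testable without a workspace.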

Question

What are the four testing levels for data engineering?

Answer

Unit tests (individual functions, every commit), integration tests (connected components, every PR), end-to-end tests (full pipeline, before deploy), UAT (business validation, before release).

Question

What are Databricks Asset Bundles?

Answer

DABs package notebooks, jobs, pipelines, and configuration into a deployable unit defined in databricks.yml. Deploy with the CLI (databricks bundle deploy), which calls the REST APIs internally. Bundles support multiple environments (dev/staging/prod) through targets.

Question

How do you deploy a bundle to different environments?

Answer

Define each environment as a target in databricks.yml with its own workspace host. Deploy with databricks bundle deploy --target dev (or staging/prod); the CLI applies that target's workspace configuration.

Knowledge check

Dr. Sarah Okafor needs to deploy Athena Group's ETL pipeline to three environments (dev, staging, prod) with the same code but different workspace URLs. Which tool should she use?

