Testing & Databricks Asset Bundles (Declarative Automation Bundles)
Implement unit tests, integration tests, and end-to-end testing strategies. Package and deploy with Databricks Asset Bundles via CLI and REST APIs.
Testing strategy
Testing is taste-testing your food at every stage of cooking.
Unit test: taste each ingredient individually. Integration test: taste the combined sauce. End-to-end test: taste the full dish. UAT: have a customer taste it before putting it on the menu.
Without testing, you serve bad data and only find out when the CEO's dashboard is wrong.
Testing layers
| Test Type | What It Tests | How | When |
|---|---|---|---|
| Unit test | Individual functions/transforms | pytest with mock data | Every commit |
| Integration test | Components working together | Test tables in dev workspace | Every PR merge |
| End-to-end test | Full pipeline bronze → gold | Run pipeline on test data in staging | Before production deploy |
| UAT | Business rules and output quality | Stakeholders validate sample output | Before production release |
Unit testing example
# test_transforms.py
from transforms import clean_amount, validate_date


def test_clean_amount_removes_negatives():
    assert clean_amount(-50) is None
    assert clean_amount(100) == 100.0


def test_validate_date_rejects_future():
    assert validate_date("2099-01-01") is False
    assert validate_date("2026-04-01") is True
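The tests above import clean_amount and validate_date from a transforms module that this guide does not show. Here is a minimal sketch of what such a module might look like, assuming negative or missing amounts are rejected and dates arrive as ISO strings. The `today` parameter is a hypothetical addition so that tests can pin the clock.

```python
# transforms.py (hypothetical sketch; the real module is not shown in this guide)
from datetime import date
from typing import Optional


def clean_amount(amount: Optional[float]) -> Optional[float]:
    """Return the amount as a float, or None if it is missing or negative."""
    if amount is None or amount < 0:
        return None
    return float(amount)


def validate_date(value: str, today: Optional[date] = None) -> bool:
    """Return True if value is an ISO date (YYYY-MM-DD) that is not in the future.

    `today` is an assumed extra parameter that lets tests pin the current date.
    """
    today = today or date.today()
    try:
        parsed = date.fromisoformat(value)
    except (TypeError, ValueError):
        # Missing or malformed dates are treated as invalid.
        return False
    return parsed <= today
```

Passing a fixed `today` in tests keeps an assertion such as validate_date("2026-04-01") from depending on the day the suite happens to run.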
Integration testing
# Run in a dev workspace with test data.
# Assumes an active `spark` session and a project-defined run_silver_pipeline().
test_df = spark.createDataFrame([
    (1, "Alice", 100.0, "2026-04-01"),
    (2, None, -50.0, "2099-01-01"),  # should be filtered out
], ["id", "name", "amount", "date"])

result = run_silver_pipeline(test_df)
assert result.count() == 1  # only valid row
assert result.filter("name = 'Alice'").count() == 1
Databricks Asset Bundles (DABs)
Asset Bundles package your entire project into a deployable unit:
# databricks.yml: bundle configuration
bundle:
  name: freshmart-etl

workspace:
  host: https://adb-1234567890.1.azuredatabricks.net

resources:
  jobs:
    nightly_etl:
      name: "Freshmart Nightly ETL"
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/01_ingest.py
          job_cluster_key: etl_cluster
        - task_key: transform
          depends_on:
            - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/02_transform.py
          job_cluster_key: etl_cluster
  pipelines:
    quality_pipeline:
      name: "Freshmart Quality Pipeline"
      target: freshmart_silver
      libraries:
        - notebook:
            path: ./pipelines/quality_checks.sql

targets:
  dev:
    workspace:
      host: https://adb-dev.azuredatabricks.net
  prod:
    workspace:
      host: https://adb-prod.azuredatabricks.net
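One way to picture the targets section: the settings under the chosen target are layered on top of the top-level settings, so dev and prod share the same resources but get different workspace hosts. The sketch below is a rough mental model in plain Python, not the CLI's actual implementation. The function names deep_merge and resolve_target are made up for illustration.

```python
# Rough mental model of target overrides; not the CLI's actual implementation.
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override merged in, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_target(config: dict, target: str) -> dict:
    """Apply one target's settings on top of the top-level bundle settings."""
    top_level = {k: v for k, v in config.items() if k != "targets"}
    return deep_merge(top_level, config["targets"][target])


# A trimmed-down dict mirroring the databricks.yml above.
config = {
    "bundle": {"name": "freshmart-etl"},
    "workspace": {"host": "https://adb-1234567890.1.azuredatabricks.net"},
    "targets": {
        "dev": {"workspace": {"host": "https://adb-dev.azuredatabricks.net"}},
        "prod": {"workspace": {"host": "https://adb-prod.azuredatabricks.net"}},
    },
}

dev = resolve_target(config, "dev")
print(dev["workspace"]["host"])  # https://adb-dev.azuredatabricks.net
print(dev["bundle"]["name"])     # freshmart-etl
```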
Deploy via CLI
# Validate the bundle
databricks bundle validate
# Deploy to dev environment
databricks bundle deploy --target dev
# Run a specific job
databricks bundle run nightly_etl --target dev
# Deploy to production
databricks bundle deploy --target prod
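If you want to drive these commands from a script rather than typing them, a thin Python wrapper is one option. This is a sketch only: bundle_command and run_bundle are hypothetical helper names, and the wrapper assumes the databricks CLI is installed and authenticated on the machine running it.

```python
import subprocess
from typing import List, Optional


def bundle_command(action: str, target: Optional[str] = None,
                   resource_key: Optional[str] = None) -> List[str]:
    """Build the argument list for a `databricks bundle <action>` call."""
    cmd = ["databricks", "bundle", action]
    if resource_key is not None:
        cmd.append(resource_key)  # e.g. the job key for `bundle run`
    if target is not None:
        cmd += ["--target", target]
    return cmd


def run_bundle(action: str, target: Optional[str] = None,
               resource_key: Optional[str] = None) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(bundle_command(action, target, resource_key), check=True)


# Building the commands does not execute anything.
print(bundle_command("validate"))
print(bundle_command("deploy", target="dev"))
print(bundle_command("run", target="dev", resource_key="nightly_etl"))
```

Keeping command construction separate from execution means the argument lists can be unit tested without a workspace.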
Deploy via REST API
Bundle deployment is primarily CLI-driven (databricks bundle deploy), and the CLI uses REST APIs internally. You don't typically call the REST API directly for bundle deployment.
# The CLI abstracts REST API calls:
databricks bundle deploy --target prod
# → internally calls multiple REST APIs:
# - /api/2.1/jobs/create or /api/2.1/jobs/reset
# - /api/2.0/workspace/import
# - /api/2.0/pipelines (POST to create, PUT /api/2.0/pipelines/{pipeline_id} to update)
Exam tip: Bundles and REST APIs
The exam expects you to know that:
- Bundles are deployed via CLI (databricks bundle deploy), not by calling REST APIs directly
- The CLI uses REST APIs internally to create/update jobs, pipelines, and workspace objects
- For programmatic deployment in CI/CD, the CLI is invoked in pipeline steps (GitHub Actions, Azure DevOps)
- Direct REST API calls (e.g., /api/2.1/jobs/create) are used for individual resource management, not for deploying complete bundles
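To make the last point concrete, here is a sketch of what managing a single job through the Jobs API might involve: choosing between the create and reset endpoints and shaping the request body. No HTTP request is sent. The helper name job_request and the job ID 123 are hypothetical, and authentication and the workspace host are left out.

```python
from typing import Optional, Tuple


def job_request(settings: dict,
                existing_job_id: Optional[int] = None) -> Tuple[str, dict]:
    """Return the (endpoint, JSON body) for creating or overwriting one job.

    No HTTP call is made here; sending the request (with a host and token)
    is left to whatever client you use.
    """
    if existing_job_id is None:
        # New job: the settings themselves are the request body.
        return "/api/2.1/jobs/create", dict(settings)
    # Existing job: reset replaces all of its settings with new_settings.
    return "/api/2.1/jobs/reset", {
        "job_id": existing_job_id,
        "new_settings": dict(settings),
    }


settings = {"name": "Freshmart Nightly ETL"}
print(job_request(settings))
print(job_request(settings, existing_job_id=123))  # 123 is a made-up job ID
```

This covers one resource at a time, which is why it is no substitute for deploying a whole bundle through the CLI.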
CI/CD with Asset Bundles
A typical CI/CD pipeline:
- Developer pushes code to feature branch
- CI pipeline (GitHub Actions/Azure DevOps) runs:
  - databricks bundle validate: check config syntax
  - pytest: run unit tests
  - databricks bundle deploy --target dev: deploy to dev
  - Integration tests in dev workspace
- PR merged → deploy to staging, run E2E tests
- Release → databricks bundle deploy --target prod
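The CI stage above is a fail-fast sequence: each step runs only if the previous one succeeded. Below is a minimal sketch of that ordering in Python. In practice these steps would usually be declared in a GitHub Actions or Azure DevOps pipeline file. The test paths tests/unit and tests/integration and the helper names are assumptions for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands for the CI stage, in the order listed above.
# The tests/unit and tests/integration paths are hypothetical.
CI_STEPS: List[List[str]] = [
    ["databricks", "bundle", "validate"],
    ["pytest", "tests/unit"],
    ["databricks", "bundle", "deploy", "--target", "dev"],
    ["pytest", "tests/integration"],
]


def run_steps(steps: Sequence[Sequence[str]],
              runner: Callable[[Sequence[str]], int]) -> List[Sequence[str]]:
    """Run steps in order and stop at the first non-zero exit code.

    Returns the steps that were actually executed (including the failing one).
    """
    executed: List[Sequence[str]] = []
    for step in steps:
        executed.append(step)
        if runner(step) != 0:
            break
    return executed


def shell_runner(step: Sequence[str]) -> int:
    """Real runner: execute the command and return its exit code."""
    return subprocess.run(list(step)).returncode


# Dry run with a fake runner that simulates a failing unit-test step.
fake_runner = lambda step: 1 if step == ["pytest", "tests/unit"] else 0
executed = run_steps(CI_STEPS, fake_runner)
print(len(executed))  # 2: validate ran, unit tests failed, deploy was skipped
```

Swapping fake_runner for shell_runner would execute the real commands, so a failed unit test never reaches the deploy step.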
Knowledge check
Dr. Sarah Okafor needs to deploy Athena Group's ETL pipeline to three environments (dev, staging, prod) with the same code but different workspace URLs. Which tool should she use?
Next up: Monitoring Clusters & Troubleshooting β cluster monitoring, job repair, and Spark troubleshooting.