Domain 3 β€” Module 2 of 10 20%
12 of 28 overall
Domain 3: Prepare and Process Data Free ⏱ ~14 min read

SCD, Granularity & Temporal Tables

Slowly changing dimensions, choosing the right granularity, and designing temporal history tables β€” the data modeling patterns that exam scenarios are built on.

Slowly changing dimensions

Simple explanation

A customer moves house. What happens to their old address in your database?

That’s the SCD problem. Dimension data (customers, products, stores) changes slowly over time. You have three choices:

  • Type 1: Overwrite the old value. Simple, but you lose history. β€œWhat was their address last year?” β€” can’t answer.
  • Type 2: Keep both versions. Add a new row with the new address, mark the old row as expired. You can see both past and present.
  • Type 3: Add a column for the old value. Quick to query but limited β€” you can only track one previous value.
FeatureSCD Type 1SCD Type 2SCD Type 3
History preserved?NoYes (all versions)Partial (one previous)
Storage growthConstantGrows with changesConstant
ImplementationUPDATE in placeINSERT new row + expire oldUPDATE + shift columns
Query complexitySimpleNeed WHERE is_current = TRUEModerate
Best forData where history doesn't matterCustomer addresses, pricing, org changesRarely used in practice
Delta Lake approachMERGE with UPDATEMERGE with INSERT + UPDATEMERGE with UPDATE

SCD Type 2 in Delta Lake

-- SCD Type 2: Merge new customer data preserving history
MERGE INTO dim_customer AS target
USING staging_customer AS source
ON target.customer_id = source.customer_id AND target.is_current = TRUE

-- When existing record changed: expire old, insert new
WHEN MATCHED AND (target.address != source.address OR target.city != source.city) THEN
  UPDATE SET
    target.is_current = FALSE,
    target.effective_end = CURRENT_DATE()

-- When no match: insert new record
WHEN NOT MATCHED THEN
  INSERT (customer_id, name, address, city, is_current, effective_start, effective_end)
  VALUES (source.customer_id, source.name, source.address, source.city,
          TRUE, CURRENT_DATE(), DATE('9999-12-31'));

Mei Lin implements SCD Type 2 for Freshmart’s store dimension β€” when a store changes its category (from β€œExpress” to β€œSupermarket”), both the old and new versions are preserved for historical reporting.

Granularity decisions

Granularity is the level of detail in your table β€” the β€œgrain” of each row.

GranularityRow RepresentsExample
Transaction-levelOne event/transactionEach POS scan at Freshmart
Daily aggregateOne day’s summaryTotal sales per store per day
Monthly aggregateOne month’s summaryMonthly revenue per region

Choosing granularity:

  • Finer grain (transaction) β†’ more flexibility for analysis, more storage, slower aggregation queries
  • Coarser grain (monthly) β†’ less storage, faster queries, but can’t drill down to individual transactions

Exam tip: Choose the finest granularity that your business requirements demand. You can always aggregate up from transactions to daily/monthly, but you can’t disaggregate monthly back to transactions.

Temporal (history) tables

A temporal table records every version of a row with timestamps:

CREATE TABLE prod_sales.curated.product_history (
  product_id INT,
  product_name STRING,
  price DECIMAL(10,2),
  category STRING,
  valid_from TIMESTAMP,
  valid_to TIMESTAMP,
  is_current BOOLEAN
);

This enables point-in-time queries: β€œWhat was the price of product X on March 1st?”

-- Point-in-time query
SELECT product_name, price
FROM product_history
WHERE product_id = 42
  AND valid_from <= '2026-03-01'
  AND valid_to > '2026-03-01';
Temporal tables vs Delta time travel

Don’t confuse these:

  • Temporal table β€” a table YOU design with valid_from/valid_to columns to track business history. Permanent. Queryable forever.
  • Delta time travel β€” Delta Lake’s built-in version history. Governed by retention settings. Intended for data recovery and auditing, NOT for business history.

Exam pattern: β€œTrack how customer data changes over time for business reporting” β†’ temporal table (SCD Type 2). β€œRecover data accidentally deleted yesterday” β†’ Delta time travel.

Question

What are the three SCD types and when should you use each?

Click or press Enter to reveal answer

Answer

Type 1: Overwrite (no history needed). Type 2: Add rows with effective dates (full history for reporting). Type 3: Add 'previous' columns (rarely used, limited history).

Click to flip back

Question

How do you choose the right granularity for a table?

Click or press Enter to reveal answer

Answer

Choose the finest granularity your business requires. You can aggregate up (transactions β†’ daily β†’ monthly) but can't disaggregate down. Finer grain = more flexibility but more storage.

Click to flip back

Question

What is the difference between a temporal table and Delta time travel?

Click or press Enter to reveal answer

Answer

Temporal table: business-designed history with valid_from/valid_to columns, permanent. Delta time travel: system version history governed by retention, for recovery/auditing.

Click to flip back

Knowledge check

Knowledge Check

Freshmart changes its store classification system. Mei Lin needs to track how each store's category changed over time so historical reports show the correct category at the time of each sale. Which approach should she use?

Knowledge Check

Ravi is designing a sales fact table for DataPulse clients. Marketing wants to know which promotions drove individual purchases. Finance only needs monthly revenue by region. What granularity should Ravi choose?


Next up: Partitioning, Clustering & Table Optimization β€” partitioning schemes, liquid clustering, Z-ordering, and deletion vectors.