Domain 2 β€” Module 2 of 7 29%
9 of 27 overall
Domain 2: Relational Data on Azure Free ⏱ ~10 min read

Normalization: Why Duplicate Data is Bad

Normalization is the art of organising data to eliminate redundancy. It's fundamental to good database design β€” and a guaranteed exam topic.

What is normalization?

Simple explanation

Normalization means β€œdon’t write the same thing twice.”

Imagine Tom writes every driver’s full address inside every delivery record. If Sarah Park moves house, Tom has to update hundreds of delivery records β€” miss one and the data is wrong.

Normalization says: store Sarah’s address ONCE in the Drivers table, and just reference her DriverID in the Deliveries table. Now you update one place and everything stays consistent.

The problem normalization solves

Before normalization (bad design)

Jake’s early CloudPulse database stored everything in one table:

OrderIDCustomerNameCustomerEmailCustomerCityProductPrice
5001Meridian Healthinfo@meridian.co.nzAucklandPro Plan49.00
5002Meridian Healthinfo@meridian.co.nzAucklandAPI Add-on19.00
5003Kiwi Fitnesshello@kiwi.fitWellingtonPro Plan49.00

Problems:

  • Redundancy: β€œMeridian Health” and their email/city are stored in every order row
  • Update anomaly: If Meridian Health changes their email, Jake must update EVERY order row. Miss one? Inconsistent data.
  • Delete anomaly: If Jake deletes all orders for Kiwi Fitness, he loses the customer’s contact info entirely
  • Insert anomaly: Jake can’t add a new customer until they place an order

After normalization (good design)

Split into separate tables:

Customers table:

CustomerIDNameEmailCity
C001Meridian Healthinfo@meridian.co.nzAuckland
C002Kiwi Fitnesshello@kiwi.fitWellington

Orders table:

OrderIDCustomerID (FK)ProductPrice
5001C001Pro Plan49.00
5002C001API Add-on19.00
5003C002Pro Plan49.00

Now each fact lives in ONE place. Update Meridian’s email once, and it’s correct everywhere.

Normal forms (simplified)

Normalization is done in stages called normal forms. For DP-900, you need to understand the concept, not memorise every rule:

Normal FormKey RulePlain English
1NFNo repeating groups; each cell has one valueEvery column stores a single value, not a list
2NFEvery non-key column depends on the WHOLE primary keyNo partial dependencies in tables with composite keys
3NFEvery non-key column depends ONLY on the primary keyNo column depends on another non-key column
Normal forms by example

1NF violation β€” a β€œProducts” column storing β€œPro Plan, API Add-on” in one cell. Fix: one row per product.

2NF violation β€” a table with composite key (OrderID + ProductID) where CustomerName depends only on OrderID, not on the full key. Fix: move CustomerName to a separate table.

3NF violation β€” a table where City depends on PostalCode, which depends on CustomerID. City doesn’t depend directly on the primary key. Fix: move City and PostalCode to a separate location table.

In practice, most well-designed databases achieve 3NF. That’s the sweet spot between data integrity and query simplicity.

Normalization vs denormalization

Normalization minimises duplication β€” great for transactional databases. But analytical databases (data warehouses) often denormalize on purpose β€” combining tables to speed up read-heavy queries.

Normalized vs denormalized data
FeatureNormalizedDenormalized
DuplicationMinimal β€” each fact stored onceIntentional β€” data repeated for speed
UpdatesFast β€” update one placeSlow β€” update many places
Read queriesRequire joins (slower for big queries)Pre-joined data (faster reads)
Best forTransactional (OLTP) databasesAnalytical (OLAP) data warehouses
Integrity riskLow β€” constraints enforcedHigher β€” duplicates can get out of sync
Exam tip: normalization questions

The exam asks about normalization at a conceptual level:

  • β€œReduce data redundancy” β†’ Normalization
  • β€œPrevent update anomalies” β†’ Normalization
  • β€œSplit one large table into multiple related tables” β†’ Normalization
  • β€œCombine tables for faster reporting” β†’ Denormalization
  • β€œOptimise for analytical queries” β†’ Denormalization

You don’t need to identify specific normal forms (1NF, 2NF, 3NF) on the exam β€” just understand WHY normalization exists and when denormalization is appropriate.

Flashcards

Question

What is normalization in database design?

Click or press Enter to reveal answer

Answer

The process of organising data to reduce redundancy by splitting data into multiple related tables. Each fact is stored in one place, connected through key relationships.

Click to flip back

Question

What is an update anomaly?

Click or press Enter to reveal answer

Answer

An inconsistency that occurs when the same data is stored in multiple places and one copy is updated while others are not. Normalization prevents this by storing each fact once.

Click to flip back

Question

When would you denormalize data on purpose?

Click or press Enter to reveal answer

Answer

In analytical databases (OLAP/data warehouses) where read performance is more important than write efficiency. Pre-joining data into fewer tables speeds up aggregation queries.

Click to flip back

Knowledge check

Knowledge Check

Jake notices that when he updates a customer's email address, he has to change it in 47 different order records. What database design problem is this?

Knowledge Check

Priya's data warehouse combines customer, product, and sales data into a single wide table for faster reporting. This is an example of:

Next up: SQL Basics: SELECT, INSERT, UPDATE, DELETE β€” learn the language that relational databases speak.