Domain 1 β€” Module 5 of 7 71%
5 of 26 overall
Domain 1: Prepare the Data Free ⏱ ~13 min read

Combining Data: Merge, Append & Queries

Learn how to merge tables (like SQL JOIN), append tables (like UNION), handle semi-structured data, and choose between reference and duplicate queries in Power Query.

Bringing data together

Simple explanation

Think of two puzzle pieces.

Merge is like connecting two pieces side by side β€” matching them on a shared edge (a common column). You get a wider table. This is like a SQL JOIN.

Append is like stacking one piece on top of another β€” both have the same shape (same columns), and you just add more rows. This is like a SQL UNION.

This module covers both operations, plus how to handle messy JSON/XML data and when to use reference vs duplicate queries.

Merge Queries: combining tables side by side

Riley at Coastal Fresh (πŸ›’) has two tables:

Sales table: OrderID, ProductID, Quantity, Revenue Products table: ProductID, ProductName, Category, Supplier

She needs product names on her sales data. She merges Sales with Products on the ProductID column β€” this adds product details to each sales row.

The six join types

Left Outer is the default and most common join type in Power Query
Join TypeWhat It ReturnsSQL Equivalent
Left OuterAll rows from the first table, matched rows from second (nulls where no match)LEFT JOIN
Right OuterAll rows from second table, matched rows from firstRIGHT JOIN
InnerOnly rows that match in both tablesINNER JOIN
Full OuterAll rows from both tables (nulls where no match on either side)FULL OUTER JOIN
Left AntiOnly rows from first table that have NO match in secondLEFT JOIN WHERE right IS NULL
Right AntiOnly rows from second table that have NO match in firstRIGHT JOIN WHERE left IS NULL

How to Merge:

  1. Home β†’ Merge Queries (or Merge Queries as New)
  2. Select the first table and key column
  3. Select the second table and matching column
  4. Choose join type (Left Outer is default)
  5. Expand the resulting nested column to select which fields to include
Exam tip: anti joins

Left Anti and Right Anti joins are exam favourites. They find rows that DON’T have a match.

Example: β€œFind all products that have never been sold.” Merge the Products table (first) with the Sales table (second) using a Left Anti join on ProductID. The result contains only products with zero sales.

Append Queries: stacking tables on top

Nadia at Prism Agency (πŸ“Š) receives campaign data from Google Ads, Meta Ads, and LinkedIn β€” three separate tables with identical column structures. She appends them into a single unified campaign table.

Append = same columns, more rows.

How to Append:

  1. Home β†’ Append Queries (or Append Queries as New)
  2. Choose β€œTwo tables” or β€œThree or more tables”
  3. Select the tables to stack
  4. Columns are matched by name β€” extra columns become null

Merge vs Append at a glance:

  • Merge: Different columns, same key β†’ wider table
  • Append: Same columns, different data β†’ taller table

Semi-structured data: JSON and XML

Dr. Ethan Moss at Bayview Medical Centre (πŸ₯) connects to a REST API that returns patient appointment data in JSON format. The raw import shows a single column of nested records.

Converting JSON to a table:

  1. Power Query imports JSON as a single column of Record or List values
  2. Click the expand icon (↔) on the column header
  3. Select which fields to include
  4. Repeat for nested objects

Key M functions for semi-structured data:

  • Json.Document() β€” parse JSON text into records/lists
  • Table.FromRecords() β€” convert a list of records into a table
  • Record.FieldValues() β€” extract values from a record
Real-world: expanding nested JSON

API responses often have nested structures. For example, a patient record might contain a name field and an appointments array, where each appointment has a date and doctor name.

You expand the top-level record first to get the name, then expand the appointments list, which creates new rows β€” one per appointment per patient. This row multiplication is normal when expanding nested lists.

Reference vs Duplicate queries

When you need to create a variation of an existing query, you have two options:

Reference reuses work; Duplicate starts fresh
FeatureReference QueryDuplicate Query
How it worksPoints back to the original query's output as its starting pointCreates an independent copy of all steps from the original
DependencyDepends on the original β€” if original changes, reference updates tooIndependent β€” changes to original don't affect the duplicate
ProcessingOriginal runs once, reference builds on its resultBoth queries run their full step sequences independently
Use whenYou want to branch from a common base (e.g., filter the same cleaned data differently)You want a completely independent copy to modify freely
PerformanceMore efficient β€” avoids re-processing shared stepsLess efficient β€” duplicates all processing work

Example: Kenji at Apex Manufacturing (🏭) has a clean FactoryData query. He creates:

  • A Reference query called TokyoFactory that filters to Tokyo
  • A Reference query called ShanghaiFactory that filters to Shanghai

Both reference queries share the same cleaning steps from FactoryData. If he adds a new cleaning step to the original, both references benefit automatically.

Knowledge check

Question

What's the difference between Merge and Append?

Click or press Enter to reveal answer

Answer

Merge combines columns from two tables based on a matching key (like SQL JOIN β€” makes tables wider). Append stacks rows from tables with the same columns (like SQL UNION β€” makes tables taller).

Click to flip back

Question

What join type finds rows that DON'T have a match?

Click or press Enter to reveal answer

Answer

Anti joins. Left Anti returns rows from the first table with no match in the second. Right Anti returns rows from the second table with no match in the first.

Click to flip back

Question

What's the difference between a Reference query and a Duplicate query?

Click or press Enter to reveal answer

Answer

Reference points back to the original query's output (dependent, efficient). Duplicate creates an independent copy of all steps (independent, less efficient). Use Reference when branching from a shared base.

Click to flip back

Knowledge Check

Riley has a Products table and a Sales table. She wants to find all products that have never been sold. Which operation should she use?

Knowledge Check

Nadia needs to create two filtered versions of a cleaned campaign dataset β€” one for Google Ads and one for Meta Ads. She wants changes to the shared cleaning steps to apply to both. What should she use?

Next up: Fact Tables, Dimension Tables and Keys β€” design the right table structure for Power BI.