Domain 3 β€” Module 2 of 3 67%
16 of 28 overall
Domain 3: Integrate and Move Data Free ⏱ ~14 min read

Analytical Workloads: Synapse Link and Fabric Mirroring

Implement HTAP analytics on Cosmos DB data using Azure Synapse Link's auto-synced analytical store and Microsoft Fabric mirroring β€” no ETL pipelines required.

The problem: analytics vs transactions

Simple explanation

Running analytics on your live database is like doing a full inventory count while the store is open. Customers bump into counters, shelves get blocked, everything slows down.

Synapse Link creates a second copy of your data in a format optimised for analytics (columns instead of documents). This copy auto-syncs from your live data, so analysts query the copy while your app runs at full speed.

Amara’s analytics challenge

πŸ“‘ Amara at SensorFlow ingests 500M sensor events per day. Her data scientist TomΓ‘s wants to run daily aggregations β€” average temperature per device, anomaly detection, trend analysis. But running these queries against the transactional store would consume massive RU/s and slow down real-time ingestion.

Synapse Link is the answer: TomΓ‘s queries the analytical store via Synapse or Fabric while Amara’s ingestion runs unimpacted.

⚠️ Important (2025): Azure Synapse Link for Cosmos DB is no longer supported for new projects. Microsoft recommends Azure Cosmos DB Mirroring for Microsoft Fabric instead, which is now GA and provides the same zero-ETL benefits. The exam may still test Synapse Link concepts, but know that Fabric Mirroring is the recommended replacement.

Step 1: Enable Synapse Link on the Cosmos DB account (one-time, irreversible):

az cosmosdb update --name sensorflow-cosmos \
  --resource-group rg-sensorflow \
  --enable-analytical-storage true

Step 2: Enable the analytical store on each container:

ContainerProperties props = new ContainerProperties("readings", "/deviceId")
{
    AnalyticalStoreTimeToLiveInSeconds = -1  // no expiry
};
await database.CreateContainerAsync(props);
TTL ValueBehaviour
-1Analytical store enabled, data retained indefinitely
0 or nullAnalytical store disabled
N (positive)Data retained for N seconds in analytical store
Exam tip: analytical store TTL is independent

The analytical store TTL is separate from the transactional store TTL. You can keep data in the analytical store longer than in the transactional store:

  • Transactional TTL = 30 days (keep recent data for the app)
  • Analytical TTL = -1 (keep all data forever for analytics)

When a document expires from the transactional store, it persists in the analytical store until its own TTL expires. This is a common exam scenario.

How the analytical store works

Transactional Store (row-oriented)         Analytical Store (column-oriented)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ { id, deviceId, temp, ts }   │──auto──→  β”‚ id     β”‚ deviceId β”‚ temp β”‚tsβ”‚
β”‚ { id, deviceId, temp, ts }   β”‚  sync     β”‚ ────── β”‚ ──────── β”‚ ──── │──│
β”‚ { id, deviceId, temp, ts }   β”‚  (~2min)  β”‚ val    β”‚ val      β”‚ val  β”‚  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        App reads/writes                      Synapse / Fabric queries
        (RU budget)                           (no RU impact)
  • Auto-sync latency: Typically under 2 minutes, can be up to 5 minutes
  • Schema: Fully-faithful, auto-inferred from the transactional store
  • Nested properties: Flattened into columns automatically
  • No RU consumption: Analytical sync and queries don’t consume transactional RU/s

Querying via Synapse

TomΓ‘s queries the analytical store using Synapse serverless SQL or Spark:

-- Synapse serverless SQL pool
SELECT deviceId,
       AVG(temperature) as avg_temp,
       MAX(temperature) as max_temp,
       COUNT(*) as reading_count
FROM OPENROWSET(
    'CosmosDB',
    'Account=sensorflow-cosmos;Database=sensorflow;Key=...',
    readings
) WITH (
    deviceId VARCHAR(50),
    temperature FLOAT,
    _ts BIGINT
) AS readings
GROUP BY deviceId
HAVING AVG(temperature) > 80

Microsoft Fabric mirroring is the recommended replacement for Synapse Link. It is GA and continuously replicates Cosmos DB data into Fabric OneLake:

FeatureSynapse LinkFabric Mirroring
Data locationCosmos DB analytical storeFabric OneLake (Delta tables)
Query enginesSynapse SQL/Spark poolsFabric Spark, SQL endpoint, Power BI
Latency~2 min auto-syncNear real-time continuous sync
Schema handlingAuto-inferred columnsDelta tables with schema evolution
Cost modelSynapse compute + analytical storeFabric capacity units
Setup complexityEnable on account + containerConfigure in Fabric workspace
Best forSynapse-centric architecturesFabric/Power BI-centric architectures
Exam tip: Synapse Link is the DP-420 focus

The DP-420 exam focuses primarily on Synapse Link and the analytical store. Fabric mirroring may appear in β€œwhich tool to choose” questions but deep configuration questions will be about Synapse Link. Know how to enable it, set TTL, and understand the auto-sync behaviour.

🎬 Video walkthrough

Flashcards

Question

What is the analytical store in Cosmos DB?

Click or press Enter to reveal answer

Answer

A column-oriented, auto-synced copy of your transactional data. It's optimised for analytical queries (aggregations, scans) and is queried via Synapse or Fabric β€” without consuming transactional RU/s. Auto-syncs within ~2 minutes.

Click to flip back

Question

What does setting analytical store TTL to -1 mean?

Click or press Enter to reveal answer

Answer

The analytical store is enabled and data is retained indefinitely. TTL=0 or null means disabled. A positive value sets expiry in seconds. The analytical TTL is independent of the transactional TTL.

Click to flip back

Question

Does querying the analytical store consume transactional RU/s?

Click or press Enter to reveal answer

Answer

No β€” analytical queries run against the column store and do NOT consume the container's transactional RU/s. The compute cost comes from Synapse or Fabric, not from Cosmos DB.

Click to flip back

Knowledge Check

Knowledge Check

TomΓ‘s needs to run daily aggregation queries on 500M sensor events. Running these directly on the transactional store would consume 50,000+ RU/s. What's the best approach?

Knowledge Check

Amara sets transactional store TTL to 7 days and analytical store TTL to -1. A document is ingested on Monday. What happens after the following Monday?

Knowledge Check

Which is required before you can enable the analytical store on a container?


Next up: Data Movement β€” Azure Data Factory, Kafka connectors, and Spark connectors for moving data into and out of Cosmos DB.