Domain 3: Monitor and Optimize an Analytics Solution

Monitoring & Alerts: Catch Problems Early

Use the Fabric Monitoring Hub to track data ingestion, transformations, and semantic model refreshes. Configure alerts to catch failures before users notice.

Why monitor?

Simple explanation

Think of a factory floor with cameras and alarms.

Without cameras, you don’t know a machine stopped until the production line backs up. Without alarms, you find out about a spill when someone slips. Monitoring gives you visibility (cameras) and early warning (alarms).

Fabric’s Monitoring Hub is those cameras: you see every pipeline run, notebook execution, dataflow refresh, and semantic model refresh. Alerts are the alarms: they notify you when something fails or takes too long.

The Monitoring Hub

| Item Type | Metrics Shown |
| --- | --- |
| Pipeline runs | Status (succeeded/failed/in progress), duration, start time, activity-level details |
| Notebook executions | Spark job status, duration, resource usage (vCores, memory) |
| Dataflow Gen2 refreshes | Refresh status, duration, rows processed, errors |
| Semantic model refresh | Refresh status, duration, tables refreshed, partition details |
| Spark jobs | Job status, stages, tasks, shuffle read/write, executor metrics |
| Eventstream health | Events ingested, processing lag, error rate |

Key monitoring areas

Ingestion: Watch pipeline durations, row counts (zero rows = source issue), Dataflow connector errors, Eventstream processing lag.

Transformation: Watch notebook execution time, Spark stage failures, memory usage (90%+ is the danger zone), and shuffle distribution (a skewed shuffle means one executor is overloaded).

Semantic model refresh: Watch refresh duration against the schedule interval, partition refresh behaviour (a full refresh when an incremental one was expected), and memory limit errors.
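
The checks above can be sketched as a few plain Python functions. This is only an illustration of the reasoning, not a Fabric API: the function names, the input values (which you would read off the Monitoring Hub yourself), and the skew ratio of 3x the median are all assumptions made for this example. Only the "zero rows" and "90%+ memory" thresholds come from the text.

```python
from statistics import median


def check_ingestion(rows_processed: int) -> list[str]:
    """Ingestion: a run that loaded zero rows usually points at the source."""
    if rows_processed == 0:
        return ["zero rows: possible source issue"]
    return []


def check_transformation(memory_used_pct: float,
                         shuffle_bytes_per_executor: list[int],
                         skew_ratio: float = 3.0) -> list[str]:
    """Transformation: flag high memory use and a skewed shuffle.

    skew_ratio is a hypothetical threshold: the busiest executor is
    compared with the median executor.
    """
    warnings = []
    if memory_used_pct >= 90:
        warnings.append("memory at 90%+: danger zone")
    if shuffle_bytes_per_executor:
        typical = median(shuffle_bytes_per_executor)
        busiest = max(shuffle_bytes_per_executor)
        if typical > 0 and busiest / typical >= skew_ratio:
            warnings.append("skewed shuffle: one executor overloaded")
    return warnings


def check_refresh(duration_minutes: float,
                  schedule_interval_minutes: float) -> list[str]:
    """Semantic model refresh: a refresh should finish before the next one is due."""
    if duration_minutes >= schedule_interval_minutes:
        return ["refresh duration has reached the schedule interval"]
    return []


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(check_ingestion(0))
    print(check_transformation(92.0, [100, 110, 90, 900]))
    print(check_refresh(12, 60))
```

Comparing the busiest executor with the median rather than the mean keeps one overloaded executor from hiding itself by pulling the average up.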

Scenario: Zoe's monitoring routine

Zoe at WaveMedia checks the Monitoring Hub every morning:

  1. Eventstreams: Processing lag under 5 seconds? ✅
  2. Overnight notebooks: 3/4 succeeded, 1 failed at 2:47 AM (OOM error), so she increases the pool size
  3. Semantic model: Refreshed at 5 AM, duration 12 min (under 15 min SLA) ✅

Configuring alerts

| Alert Source | How It Works | Best For |
| --- | --- | --- |
| Pipeline failure path | Add Teams/email activity on failure output | Immediate notification for ETL failures |
| Data Activator rules | Condition-based triggers on streaming data | Real-time SLA monitoring |
| Power BI alerts | Visual value crosses threshold | Business metric anomalies |

Scenario: Carlos's alert layers

Carlos configures three layers: (1) Pipeline failure → Teams channel post, (2) Eventstream lag > 60s → email on-call engineer, (3) Defect rate > 5% → alert quality manager via Power BI.
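
Carlos's three layers amount to a small table of condition-and-action rules. The sketch below models that idea in plain Python so the logic is easy to see. It is not how Fabric implements alerts: in Fabric you configure these in the pipeline's failure path, Data Activator, and Power BI. The `AlertRule` class and the metric keys (`pipeline_status`, `eventstream_lag_seconds`, `defect_rate_pct`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    """One alert layer: a condition over current metrics and the action to take."""
    name: str
    condition: Callable[[dict], bool]
    action: str


# The three layers from the scenario, with the thresholds stated there.
RULES = [
    AlertRule("pipeline failure",
              lambda m: m.get("pipeline_status") == "failed",
              "post to Teams channel"),
    AlertRule("eventstream lag",
              lambda m: m.get("eventstream_lag_seconds", 0) > 60,
              "email on-call engineer"),
    AlertRule("defect rate",
              lambda m: m.get("defect_rate_pct", 0) > 5,
              "alert quality manager via Power BI"),
]


def triggered_actions(metrics: dict) -> list[str]:
    """Return the action of every rule whose condition holds for these metrics."""
    return [rule.action for rule in RULES if rule.condition(metrics)]


if __name__ == "__main__":
    # Made-up snapshot: the pipeline is healthy, but Eventstream lag is high.
    snapshot = {
        "pipeline_status": "succeeded",
        "eventstream_lag_seconds": 75,
        "defect_rate_pct": 2.0,
    }
    print(triggered_actions(snapshot))
```

Each rule is evaluated independently, so several layers can fire for the same snapshot, matching the idea that each layer targets a different audience.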


Question

What is the Fabric Monitoring Hub?

Answer

A centralised dashboard showing status, duration, and outcome of all workspace activity: pipeline runs, notebook executions, Dataflow refreshes, Spark jobs, semantic model refreshes, and Eventstream health.

Question

What is Data Activator?

Answer

A rule-based alert engine in Real-Time Intelligence (RTI). Set conditions on data (e.g., processing lag > 60s) and trigger actions (email, Teams, Power Automate flow).

Question

How do you alert on pipeline failure?

Answer

Add a Teams or email activity on the pipeline's failure path. It fires when the upstream activity fails, including error details in the notification.


Knowledge Check

Zoe's overnight notebook has been taking 45 min instead of 20 min. The Monitoring Hub shows high shuffle on one executor. What's the likely cause?

Knowledge Check

Carlos wants Teams notifications when pipelines fail. Where does he configure this?

Next up: Troubleshoot Pipelines & Dataflows, where you identify and resolve the most common pipeline and Dataflow Gen2 errors.