Domain 3 β€” Module 1 of 7 14%
12 of 28 overall
Domain 3: Monitor, Configure, and Optimize Database Resources Free ⏱ ~13 min read

Performance Baselines and Monitoring Tools

Prepare operational performance baselines, identify metric sources, interpret performance data, and configure activity monitoring for Azure SQL.

Building a performance baseline

Simple explanation

A performance baseline is like knowing your resting heart rate.

If your normal heart rate is 70 BPM, then 120 BPM means something is wrong. But if you don’t know your normal, you can’t tell. A database baseline captures β€œnormal” β€” CPU usage, query times, I/O rates, wait stats β€” so you can spot problems when things change.

Creating a baseline

Tomas builds his first baseline for CloudFirst’s client database:

What to capture

Metric CategoryKey MetricsWhy It Matters
CPU% utilisation, vCore usageDetect compute bottlenecks
MemoryBuffer cache hit ratio, page life expectancyMemory pressure indicators
I/ORead/write IOPS, latency, throughput (MB/s)Storage performance
QueriesDuration, logical reads, CPU time per queryFind expensive queries
WaitsTop wait types, wait time per categoryIdentify what queries wait on
ConnectionsActive sessions, blocked sessionsConnection pool health
tempdbUsage, contention (PAGELATCH waits)tempdb pressure

When to capture

  • Capture during normal business hours (not maintenance windows)
  • Include both peak and off-peak periods
  • Run for at least one full business cycle (1-2 weeks minimum)
  • Re-baseline after major changes (migrations, schema changes, scaling)

Baseline DMV queries

-- CPU usage over time (Azure SQL DB)
SELECT end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;

-- Wait statistics baseline
SELECT wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK','BROKER_TO_FLUSH','CLR_AUTO_EVENT')
ORDER BY wait_time_ms DESC;

-- Top queries by CPU
SELECT TOP 20
  qs.total_worker_time / qs.execution_count AS avg_cpu_time,
  qs.execution_count,
  SUBSTRING(st.text, (qs.statement_start_offset/2)+1, 100) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY avg_cpu_time DESC;

Performance metric sources

Where to Find Performance Data
SourcePlatformData TypeRetention
Azure Monitor metricsSQL DB, MICPU, DTU, storage, connections93 days
sys.dm_db_resource_statsSQL DBCPU, I/O, memory (15-sec intervals)1 hour
sys.server_resource_statsMICPU, I/O, storage (5-min intervals)14 days
sys.dm_os_wait_statsAll platformsWait type aggregates since last resetUntil reset
sys.dm_exec_query_statsAll platformsCached query plan statisticsWhile plan is cached
Query StoreAll platformsHistorical query performanceConfigurable (30 days default)
Log AnalyticsSQL DB, MIDiagnostic logs, audit eventsConfigurable (up to 2 years)
Extended EventsAll platformsGranular event tracingSession lifetime
Database watcherSQL DB, MIMonitoring dashboard and metricsConfigurable

Interpreting key metrics

Sam teaches Tomas the critical thresholds:

MetricHealthyWarningCritical
CPU %Under 60%60-80%Over 80% sustained
Buffer cache hit ratioOver 99%97-99%Under 97%
Page life expectancyOver 300 seconds100-300Under 100
Avg disk latencyUnder 10ms10-20msOver 20ms
Batch requests/secBaseline + 20%Baseline + 50%Baseline + 100%

Common wait types and what they mean

Wait TypeIndicatesAction
CXPACKET / CXCONSUMERParallelism waits β€” uneven parallel query distributionTune MAXDOP, check for skewed parallelism
PAGEIOLATCH_SHWaiting for data pages from diskAdd memory, improve I/O, optimise queries
LCK_M_X / LCK_M_SLock contention β€” blockingInvestigate blocking chains, optimise transactions
WRITELOGTransaction log write latencyMove log to faster storage, reduce transaction frequency
SOS_SCHEDULER_YIELDCPU pressure β€” queries yielding for other queriesScale up CPU, optimise expensive queries
RESOURCE_SEMAPHOREMemory grant waits β€” queries can’t get enough memoryAdd memory, fix large sorts/hashes, check memory grants
Exam tip: Azure SQL DB specific metrics

Azure SQL Database uses DTU percentage as a composite metric:

  • DTU % = max(CPU %, Data I/O %, Log I/O %)
  • If DTU % is consistently above 80%, consider scaling up

For vCore databases, check avg_cpu_percent, avg_data_io_percent, and avg_log_write_percent separately in sys.dm_db_resource_stats.

The exam may show a graph and ask β€œwhat is the bottleneck?” β€” look for the highest percentage metric.

Configuring monitoring

Azure Monitor alerts

  1. Navigate to the SQL database in Azure Portal β†’ Alerts
  2. Create alert rules for key metrics (CPU > 80%, DTU > 90%, deadlocks > 0)
  3. Configure action groups (email, SMS, webhook, Logic App)

Diagnostic settings

  1. Database β†’ Diagnostic settings β†’ Add diagnostic setting
  2. Send to: Log Analytics workspace, Storage account, or Event Hub
  3. Select categories: SQLInsights, QueryStoreRuntimeStatistics, Errors, etc.

Azure SQL Analytics (Log Analytics)

  • Pre-built solution for monitoring multiple Azure SQL databases
  • Dashboards for DTU usage, query performance, timeouts, errors
  • KQL queries for custom analysis
Question

What is sys.dm_db_resource_stats and how long does it retain data?

Click or press Enter to reveal answer

Answer

A DMV specific to Azure SQL Database that shows resource utilisation (CPU, I/O, memory) at 15-second intervals. It retains data for 1 hour only β€” use it for recent monitoring, not historical analysis.

Click to flip back

Question

What does the PAGEIOLATCH_SH wait type indicate?

Click or press Enter to reveal answer

Answer

Queries are waiting for data pages to be read from disk into memory. Indicates I/O pressure β€” consider adding memory (more buffer cache), moving to faster storage, or optimising queries to read fewer pages.

Click to flip back

Question

What are the three components of DTU percentage?

Click or press Enter to reveal answer

Answer

DTU % = max of (CPU %, Data I/O %, Log I/O %). It's a composite metric β€” the highest of the three determines the DTU usage. Consistently above 80% suggests scaling up.

Click to flip back

Knowledge Check

Tomas notices that avg_cpu_percent is steady at 30%, but avg_data_io_percent spikes to 95% during business hours. What is the bottleneck?

Knowledge Check

Kenji needs to review query performance from the past 48 hours on Azure SQL Database. sys.dm_db_resource_stats only retains 1 hour. What should he use?

Next up: Database Watcher and Extended Events β€” deep-dive monitoring with two powerful diagnostic tools.