Domain 2 β€” Module 7 of 8 88%
15 of 27 overall
Domain 2: Develop AI solutions by using Azure data management services Free ⏱ ~12 min read

PostgreSQL Tuning: Compute, Memory, Connections

How to size Azure Database for PostgreSQL for vector workloads β€” vCores, memory, IOPS, storage, connection pooling, and the parameters that make pgvector queries fast.

Sizing in three numbers

Simple explanation

The three knobs that decide PostgreSQL performance are vCores, memory, and IOPS. vCores set how much CPU the planner has, memory controls how much of your data fits in cache (and how big your HNSW index can be), and IOPS controls how fast the disk delivers when you exceed the cache.

For vector workloads (pgvector with HNSW indexes), memory is usually the binding constraint. The HNSW graph wants to live in RAM. If it doesn’t fit, every query reads from disk and gets slow.

Connection pooling is the second big lever. Each PostgreSQL connection costs ~10 MB of memory; running 1000 idle connections from your container app eats gigabytes for nothing. Use PgBouncer (Flexible Server has it built in) or pool inside your app.

Sizing for vector workloads

Pick Memory Optimized when your HNSW index is large enough to thrash General Purpose.
FeatureBurstableGeneral PurposeMemory Optimized
Memory per vCore~2 GB~4 GB~8 GB
Best forDev/test, tiny POCsOLTP-style workloads, light vector useLarge vector indexes, RAG production
Cost shapeCheapest, throttles under loadSteady, mid-rangeMost per vCore, but vector workloads stop being I/O-bound

A rough sizing rule for HNSW:

HNSW index size β‰ˆ rows Γ— dimension Γ— 4 bytes Γ— m / 8

For 1M rows, 1536 dim, m=16: ~12 GB just for the index. Add the heap, the planner, the connections β€” you want a server with 32 GB+ RAM, putting you in Memory Optimized territory.

Memory parameters that matter

ParameterWhat it controlsTuning hint
shared_buffersPostgres’s main page cache25% of system RAM (default in Flexible Server is reasonable)
effective_cache_sizeHint to the planner about OS + Postgres cache50–75% of system RAM
work_memPer-operation sort/hash memoryStart at 16-64 MB; bump if EXPLAIN shows external sorts
maintenance_work_memMemory for VACUUM, CREATE INDEX1-4 GB during HNSW index builds
max_connectionsHard cap on connectionsKeep low (200-500); use a pooler for more

For HNSW index builds:

SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 4;
CREATE INDEX article_embeddings_hnsw_idx
  ON article_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
RESET maintenance_work_mem;

Building a 1M-row HNSW index with 4 GB of maintenance_work_mem and 4 parallel workers is roughly an order of magnitude faster than building with defaults.

Connection pooling β€” PgBouncer is your friend

Each PostgreSQL connection is a backend process consuming ~10 MB. A Container App scaling out to 50 replicas, each holding 20 connections, means 1000 connections β€” usually overkill, often hitting max_connections.

Two patterns:

PatternHowBest for
Built-in PgBouncer (Flexible Server)Enable in server parameters; clients connect to the same host on port 6432Most production workloads β€” Microsoft runs the pooler
Application-side pool (psycopg pool, Npgsql pool)Each replica caches a small number of connectionsLayered with PgBouncer for best results
# Python with psycopg_pool β€” connection pool inside the worker
from psycopg_pool import ConnectionPool

pool = ConnectionPool(
    conninfo="host=roo-pg.postgres.database.azure.com user=... ...",
    min_size=2,
    max_size=10,
    timeout=5,
)

with pool.connection() as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")

Two pooler modes (PgBouncer):

ModeWhat it doesUse for
Transaction poolingConnection is reused per transactionMost workloads β€” highest concurrency
Session poolingConnection is reused per sessionWhen you need session-state features (LISTEN/NOTIFY, prepared statements with session lifetime, temporary tables)
Exam tip: 'connection storms' are pooling questions

When the symptom is β€œthousands of connection errors during traffic spike” or β€œPostgres is hitting max_connections”, the answer is almost always to add a connection pooler β€” built-in PgBouncer or an application-side pool. Increasing max_connections is a band-aid; PostgreSQL backends are heavyweight.

IOPS and storage tuning

Premium SSD v2 (the default for new Flexible Server deployments) decouples size from IOPS:

WorkloadIOPS targetWhy
OLTP, mostly cache hits3,000–6,000Most reads from shared_buffers; IOPS only on writes/checkpoints
Heavy vector ingest + reindex6,000–20,000Large bulk loads + index builds saturate IO
Large analytical queries12,000+Sequential scans hit storage hard

WAL (write-ahead log) is on the same volume as data on Flexible Server. High write workloads benefit from more IOPS even when reads cache well.

Read replicas β€” when to add one

A read replica is a standby that streams WAL from the primary and serves read-only queries. Use cases:

Use caseWhy
Offload reporting / analyticsBig aggregate queries on the primary slow OLTP β€” move them to a replica
Geographic read localityCross-region replica serves distant readers with lower latency
RAG retrieval readsVector queries can run on a replica while the primary handles ingest

Caveats: replicas lag the primary by a few hundred ms typically; for β€œmust read your own write” semantics, query the primary.

Key terms

Question

Which Flexible Server SKU tier is best for large vector workloads?

Click or press Enter to reveal answer

Answer

Memory Optimized (E-series). HNSW indexes want to live in RAM; Memory Optimized gives ~8 GB per vCore versus ~4 GB on General Purpose. The extra memory keeps the index resident, avoiding disk reads on every vector query.

Click to flip back

Question

What is `maintenance_work_mem`, and why bump it for HNSW builds?

Click or press Enter to reveal answer

Answer

The memory budget Postgres allocates for VACUUM and index builds. The default (~64-256 MB) is too small for HNSW on millions of vectors β€” builds spill to disk and take many times longer. Set it to 1-4 GB for the build session, then reset.

Click to flip back

Question

What is PgBouncer and why use it?

Click or press Enter to reveal answer

Answer

A connection pooler that multiplexes many client connections onto a small number of PostgreSQL backends. Each PostgreSQL backend is heavyweight (~10 MB); PgBouncer lets your app open thousands of logical connections backed by ~50 real ones. Built into Flexible Server (port 6432).

Click to flip back

Question

What's the difference between transaction-mode and session-mode pooling?

Click or press Enter to reveal answer

Answer

Transaction mode reuses connections at transaction boundaries β€” highest concurrency, the right choice for most apps. Session mode reuses per session β€” needed if you depend on LISTEN/NOTIFY, session-lifetime prepared statements, or temporary tables.

Click to flip back

Question

When does adding a read replica help an AI workload?

Click or press Enter to reveal answer

Answer

When you can route read-only traffic β€” analytics, RAG retrieval reads, reporting β€” away from the primary. The replica streams WAL from the primary; reads see slightly stale data but offload the primary. Useful for separating ingest writes from retrieval reads.

Click to flip back

Knowledge check

Knowledge Check

Mira's pgvector queries are slow even though the HNSW index exists. The server is General Purpose (8 vCore / 32 GB) and the embeddings table holds 4M rows of 1536-dim vectors (~28 GB embedding plus ~10 GB index). What's the most likely cause?

Knowledge Check

Theo's clinical app spikes from 50 to 1500 concurrent users at handover time. PostgreSQL starts returning `too many connections` errors. The Flexible Server has `max_connections=200`. What's the recommended fix?

Knowledge Check

Lin builds an HNSW index on 1M rows with default `maintenance_work_mem`. The build takes 9 hours. What's the simplest way to make it dramatically faster?