PostgreSQL Tuning: Compute, Memory, Connections
How to size Azure Database for PostgreSQL for vector workloads β vCores, memory, IOPS, storage, connection pooling, and the parameters that make pgvector queries fast.
Sizing in three numbers
The three knobs that decide PostgreSQL performance are vCores, memory, and IOPS. vCores set how much CPU the planner has, memory controls how much of your data fits in cache (and how big your HNSW index can be), and IOPS controls how fast the disk delivers when you exceed the cache.
For vector workloads (pgvector with HNSW indexes), memory is usually the binding constraint. The HNSW graph wants to live in RAM. If it doesnβt fit, every query reads from disk and gets slow.
Connection pooling is the second big lever. Each PostgreSQL connection costs ~10 MB of memory; running 1000 idle connections from your container app eats gigabytes for nothing. Use PgBouncer (Flexible Server has it built in) or pool inside your app.
Sizing for vector workloads
| Feature | Burstable | General Purpose | Memory Optimized |
|---|---|---|---|
| Memory per vCore | ~2 GB | ~4 GB | ~8 GB |
| Best for | Dev/test, tiny POCs | OLTP-style workloads, light vector use | Large vector indexes, RAG production |
| Cost shape | Cheapest, throttles under load | Steady, mid-range | Most per vCore, but vector workloads stop being I/O-bound |
A rough sizing rule for HNSW:
HNSW index size β rows Γ dimension Γ 4 bytes Γ m / 8
For 1M rows, 1536 dim, m=16: ~12 GB just for the index. Add the heap, the planner, the connections β you want a server with 32 GB+ RAM, putting you in Memory Optimized territory.
Memory parameters that matter
| Parameter | What it controls | Tuning hint |
|---|---|---|
shared_buffers | Postgresβs main page cache | 25% of system RAM (default in Flexible Server is reasonable) |
effective_cache_size | Hint to the planner about OS + Postgres cache | 50β75% of system RAM |
work_mem | Per-operation sort/hash memory | Start at 16-64 MB; bump if EXPLAIN shows external sorts |
maintenance_work_mem | Memory for VACUUM, CREATE INDEX | 1-4 GB during HNSW index builds |
max_connections | Hard cap on connections | Keep low (200-500); use a pooler for more |
For HNSW index builds:
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 4;
CREATE INDEX article_embeddings_hnsw_idx
ON article_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
RESET maintenance_work_mem;
Building a 1M-row HNSW index with 4 GB of maintenance_work_mem and 4 parallel workers is roughly an order of magnitude faster than building with defaults.
Connection pooling β PgBouncer is your friend
Each PostgreSQL connection is a backend process consuming ~10 MB. A Container App scaling out to 50 replicas, each holding 20 connections, means 1000 connections β usually overkill, often hitting max_connections.
Two patterns:
| Pattern | How | Best for |
|---|---|---|
| Built-in PgBouncer (Flexible Server) | Enable in server parameters; clients connect to the same host on port 6432 | Most production workloads β Microsoft runs the pooler |
| Application-side pool (psycopg pool, Npgsql pool) | Each replica caches a small number of connections | Layered with PgBouncer for best results |
# Python with psycopg_pool β connection pool inside the worker
from psycopg_pool import ConnectionPool
pool = ConnectionPool(
conninfo="host=roo-pg.postgres.database.azure.com user=... ...",
min_size=2,
max_size=10,
timeout=5,
)
with pool.connection() as conn:
with conn.cursor() as cur:
cur.execute("SELECT 1")
Two pooler modes (PgBouncer):
| Mode | What it does | Use for |
|---|---|---|
| Transaction pooling | Connection is reused per transaction | Most workloads β highest concurrency |
| Session pooling | Connection is reused per session | When you need session-state features (LISTEN/NOTIFY, prepared statements with session lifetime, temporary tables) |
Exam tip: 'connection storms' are pooling questions
When the symptom is βthousands of connection errors during traffic spikeβ or βPostgres is hitting max_connectionsβ, the answer is almost always to add a connection pooler β built-in PgBouncer or an application-side pool. Increasing max_connections is a band-aid; PostgreSQL backends are heavyweight.
IOPS and storage tuning
Premium SSD v2 (the default for new Flexible Server deployments) decouples size from IOPS:
| Workload | IOPS target | Why |
|---|---|---|
| OLTP, mostly cache hits | 3,000β6,000 | Most reads from shared_buffers; IOPS only on writes/checkpoints |
| Heavy vector ingest + reindex | 6,000β20,000 | Large bulk loads + index builds saturate IO |
| Large analytical queries | 12,000+ | Sequential scans hit storage hard |
WAL (write-ahead log) is on the same volume as data on Flexible Server. High write workloads benefit from more IOPS even when reads cache well.
Read replicas β when to add one
A read replica is a standby that streams WAL from the primary and serves read-only queries. Use cases:
| Use case | Why |
|---|---|
| Offload reporting / analytics | Big aggregate queries on the primary slow OLTP β move them to a replica |
| Geographic read locality | Cross-region replica serves distant readers with lower latency |
| RAG retrieval reads | Vector queries can run on a replica while the primary handles ingest |
Caveats: replicas lag the primary by a few hundred ms typically; for βmust read your own writeβ semantics, query the primary.
Key terms
Knowledge check
Mira's pgvector queries are slow even though the HNSW index exists. The server is General Purpose (8 vCore / 32 GB) and the embeddings table holds 4M rows of 1536-dim vectors (~28 GB embedding plus ~10 GB index). What's the most likely cause?
Theo's clinical app spikes from 50 to 1500 concurrent users at handover time. PostgreSQL starts returning `too many connections` errors. The Flexible Server has `max_connections=200`. What's the recommended fix?
Lin builds an HNSW index on 1M rows with default `maintenance_work_mem`. The build takes 9 hours. What's the simplest way to make it dramatically faster?