Azure Managed Redis: Caching + Vector Search
The new fully managed Redis service on Azure, replacing Azure Cache for Redis Enterprise. Covers caching with TTLs and invalidation, plus vector indexing through RediSearch for low-latency similarity search over hot data.
What Azure Managed Redis is
Azure Managed Redis is the new managed Redis service on Azure; it replaces Azure Cache for Redis Enterprise. It's fast (sub-millisecond reads), fully managed, and ships with the Redis Stack modules, including RediSearch, which provides vector search.
Two main jobs in AI-200:
- Caching: store the result of an expensive operation (LLM call, vector search, computed feature) for a short time so repeated requests don't pay the cost again
- Vector similarity over hot data: store embeddings that need sub-millisecond search in Redis, with a vector index over them
The exam tests TTLs (expiration), invalidation patterns, and the basics of vector indexing in Redis.
Caching essentials: TTL, expiration, invalidation
The core caching pattern:
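```python
import hashlib
import json

import openai
import redis

r = redis.Redis(
    host="roo.redis.azure.net",
    port=10000,                       # Azure Managed Redis listens for TLS on 10000
    password="<entra-token-or-key>",
    ssl=True,
)


def hash_text(text: str) -> str:
    # Illustrative helper: SHA-256 is stable across processes (built-in hash() is not)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def get_embedding(text: str) -> list:
    cache_key = f"embedding:{hash_text(text)}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)                     # cache hit: decode stored JSON
    # Cache miss: compute
    result = openai.embeddings.create(...)            # model/input arguments elided
    embedding = result.data[0].embedding              # plain list of floats, JSON-serializable
    r.set(cache_key, json.dumps(embedding), ex=3600)  # 1 hour TTL
    return embedding
```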
Three TTL strategies:
| Feature | Fixed TTL | Sliding TTL | Explicit invalidation |
|---|---|---|---|
| How | `SET key value EX 3600` | On every read, refresh: `EXPIRE key 3600` | On data change, `DEL key` (or pub/sub event) |
| Best for | Predictable freshness windows (hourly, daily) | Hot keys you want to keep alive while in use | Source-of-truth changes: invalidate when the underlying data changes |
| Risk | Cold cache after expiry (one slow request) | Stale data if nothing's invalidated and the source changed | Forgetting to invalidate on every write path |
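The sliding-TTL and explicit-invalidation columns can be sketched with redis-py calls as below. This is a minimal sketch, not a reference implementation. `get_sliding`, `save_product`, `write_to_db` and the `product:` key prefix are hypothetical names. `r` is assumed to be any redis-py client, and GETEX needs Redis 6.2 or later.

```python
from typing import Any, Callable, Optional


def get_sliding(r: Any, key: str, ttl_s: int = 3600) -> Optional[bytes]:
    """Sliding TTL: read the value and push its expiry out in one round trip.

    GETEX returns None on a miss; on a hit it resets the TTL to ttl_s seconds.
    """
    return r.getex(key, ex=ttl_s)


def save_product(
    r: Any,
    write_to_db: Callable[[str, dict], None],  # hypothetical source-of-truth writer
    product_id: str,
    data: dict,
) -> None:
    """Explicit invalidation: write the source of truth first, then drop the cached copy."""
    write_to_db(product_id, data)
    # Deleting (rather than overwriting) lets the next reader repopulate from fresh data.
    r.delete(f"product:{product_id}")
```

Using GETEX instead of a separate GET plus EXPIRE avoids a second round trip on every hit. Every write path that changes a product must call the invalidating function. That obligation is exactly the "forgetting to invalidate" risk in the table.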
Exam tip: 'cache stampede' and the dogpile problem
When a hot cache key expires under heavy traffic, dozens or hundreds of app replicas miss at the same moment and all try to recompute it: the cache "stampedes". Two classic mitigations:
- Probabilistic early expiration: on each hit, a reader may refresh the key shortly before its TTL ends. The probability rises as expiry approaches, so typically one request recomputes while the rest keep getting hits
- Lock-and-recompute: the first miss takes a short-lived Redis lock (`SET <lock-key> <token> NX EX <seconds>`), recomputes, and sets the value. The other callers wait briefly and re-read the key instead of recomputing
AI scenarios with expensive recompute (LLM calls, large vector searches) are particularly vulnerable to stampedes.
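Both mitigations fit in a few lines, sketched here under stated assumptions. `should_refresh_early` follows the "XFetch" probabilistic early-recomputation rule: refresh when `now - delta * beta * ln(rand) >= expiry`, where `delta` is how long one recompute takes. `get_with_lock` uses a `SET ... NX EX` lock holding a random token. The function names, the `lock:` key prefix and the timing defaults are hypothetical, and `r` is assumed to be any redis-py client.

```python
import math
import random
import secrets
import time
from typing import Any, Callable, Optional


def should_refresh_early(
    delta_s: float,
    expiry_ts: float,
    beta: float = 1.0,
    now: Optional[float] = None,
    rand: Optional[float] = None,
) -> bool:
    """XFetch rule: recompute before expiry with a probability that grows as expiry nears.

    delta_s is how long one recompute takes; beta > 1 favours earlier refreshes.
    """
    now = time.time() if now is None else now
    rand = random.random() if rand is None else rand
    rand = max(rand, 1e-12)  # random() can return 0.0; avoid log(0)
    # -log(rand) >= 0, so this asks: is now + a random, delta-scaled gap past expiry?
    return now - delta_s * beta * math.log(rand) >= expiry_ts


def get_with_lock(
    r: Any,
    key: str,
    recompute: Callable[[], bytes],
    ttl_s: int = 600,
    lock_ttl_s: int = 10,
    wait_s: float = 0.05,
    max_wait_s: float = 5.0,
) -> bytes:
    """Lock-and-recompute: only the caller that wins the lock recomputes the value."""
    value = r.get(key)
    if value is not None:
        return value
    lock_key = f"lock:{key}"
    token = secrets.token_hex(8)  # identifies this caller's lock
    # NX: set only if absent; EX: the lock expires even if this worker crashes
    if r.set(lock_key, token, nx=True, ex=lock_ttl_s):
        try:
            value = recompute()
            r.set(key, value, ex=ttl_s)
            return value
        finally:
            # Release only our own lock. Check-then-delete is not atomic;
            # a small Lua script would close that gap.
            if r.get(lock_key) in (token, token.encode()):
                r.delete(lock_key)
    # Lost the race: poll briefly for the winner's value
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        time.sleep(wait_s)
        value = r.get(key)
        if value is not None:
            return value
    return recompute()  # winner took too long; recompute rather than fail
```

To use the early-expiration rule, store `delta_s` and the expiry timestamp next to the cached value, for example as fields of a hash. On each hit, call `should_refresh_early`. If it returns True, recompute and rewrite the key while still serving the current value.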
Eviction policies: what gets dropped when memory fills
Managed Redis evicts when you exceed the maxmemory budget. Choose a policy that matches your access pattern:
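| Policy | What it evicts | Best for |
|---|---|---|
| noeviction | Nothing: writes that need more memory fail with an OOM error | Data that must never be dropped silently (the app handles write errors) |
| allkeys-lru | Least recently used keys, across all keys | General caching: the safe choice when every key is a cache entry |
| allkeys-lfu | Least frequently used keys, across all keys | When some keys are perennially hot regardless of recency |
| volatile-lru | Least recently used keys, only among keys with a TTL | Mixed workloads; keys without a TTL are never evicted |
| volatile-ttl | Keys with the shortest remaining TTL first, only among keys with a TTL | Mixed workloads where some keys are persistent and others ephemeral |
Azure's Redis services default to volatile-lru, so only keys with a TTL can be evicted. If no keys have a TTL, writes start failing with OOM errors instead. For an instance used purely as a cache, allkeys-lru is the usual recommendation. volatile-ttl (or volatile-lru) makes sense when you store a mix of cache-with-TTL entries and permanent state in the same instance.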
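To make the eviction policies concrete, here is a toy model of which keys each policy may evict and in what order. It is an illustrative sketch, not how Redis works internally. Real Redis approximates LRU and LFU by sampling a few keys rather than sorting all of them. The `Key` record and the `eviction_order` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Key:
    name: str
    ttl_s: Optional[int]  # None = no expiry (e.g. a permanent feature flag)
    last_access: float    # timestamp of the last read/write (for LRU)
    hits: int             # access count (for LFU)


def eviction_order(keys: List[Key], policy: str) -> List[str]:
    """Return key names in the order the policy would evict them (toy model)."""
    volatile = [k for k in keys if k.ttl_s is not None]  # only keys with a TTL
    if policy == "noeviction":
        return []  # nothing is evicted; writes fail with OOM instead
    if policy == "allkeys-lru":
        ranked = sorted(keys, key=lambda k: k.last_access)
    elif policy == "allkeys-lfu":
        ranked = sorted(keys, key=lambda k: k.hits)
    elif policy == "volatile-lru":
        ranked = sorted(volatile, key=lambda k: k.last_access)
    elif policy == "volatile-ttl":
        ranked = sorted(volatile, key=lambda k: k.ttl_s)
    else:
        raise ValueError(f"unsupported policy: {policy}")
    return [k.name for k in ranked]


keys = [
    Key("session:a", ttl_s=1800, last_access=100.0, hits=5),
    Key("session:b", ttl_s=60, last_access=200.0, hits=50),
    Key("flag:new-ui", ttl_s=None, last_access=10.0, hits=1),
]
print(eviction_order(keys, "volatile-ttl"))  # ['session:b', 'session:a']
print(eviction_order(keys, "allkeys-lru"))   # ['flag:new-ui', 'session:a', 'session:b']
```

The rarely read feature flag is first in line under allkeys-lru. Under the volatile-* policies it is never a candidate.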
Vector indexing β RediSearch
Redis Stack's RediSearch module supports vector fields with HNSW or FLAT indexes. The pattern is conceptually similar to pgvector, but the API is Redis-flavoured.
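```python
import time

import numpy as np
from redis.commands.search.field import NumericField, TagField, VectorField
# Recent redis-py; older releases name this module redis.commands.search.indexDefinition
from redis.commands.search.index_definition import IndexDefinition, IndexType
from redis.commands.search.query import Query

# Assumes `r` is the redis.Redis client from the caching example, and that
# doc_id, vec and query_vec (1536-float embeddings) come from your own pipeline.

# Create the index once (create_index raises an error if the index already exists)
r.ft("idx:embeddings").create_index(
    [
        VectorField(
            "embedding",
            "HNSW",
            {
                "TYPE": "FLOAT32",
                "DIM": 1536,                  # must match the embedding model's output size
                "DISTANCE_METRIC": "COSINE",
                "M": 16,                      # max graph edges per node per layer
                "EF_CONSTRUCTION": 64,        # candidate list size while building the graph
            },
        ),
        TagField("tenant"),
        TagField("language"),
        NumericField("updated_at"),
    ],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Insert a doc: hashes whose key starts with "doc:" are indexed automatically
r.hset(
    f"doc:{doc_id}",
    mapping={
        "embedding": np.array(vec, dtype=np.float32).tobytes(),  # raw FLOAT32 bytes
        "tenant": "tidewater",
        "language": "en",
        "updated_at": int(time.time()),
        "title": "Insulin storage policy v3",
    },
)

# Query top-5 nearest with a metadata filter (KNN syntax requires dialect 2 or later)
query = (
    Query("(@tenant:{tidewater} @language:{en})=>[KNN 5 @embedding $vec AS score]")
    .sort_by("score")                 # COSINE score is a distance: lower = closer
    .return_fields("title", "score")
    .dialect(2)
)
res = r.ft("idx:embeddings").search(
    query,
    query_params={"vec": np.array(query_vec, dtype=np.float32).tobytes()},
)
for doc in res.docs:
    print(doc.title, doc.score)
```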
Two index types:
| Type | What it does | Best for |
|---|---|---|
| FLAT | Brute-force, exact | Small datasets where exactness matters |
| HNSW | Approximate nearest neighbour, low latency | Production AI workloads: usually the right choice |
Distance metrics: COSINE, IP (inner product), L2.
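FLAT search is conceptually an exhaustive scan, sketched here in plain Python. The sketch computes cosine distance against every stored vector and keeps the k smallest. Cosine distance is 1 minus cosine similarity, so lower means closer, which is how the COSINE score in the query above sorts. HNSW returns approximately the same neighbours without scanning everything. `cosine_distance` and `flat_knn` are illustrative names, not RediSearch APIs, and the toy 2-D vectors are made up.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite.

    Assumes neither vector is all zeros (the norm would be 0).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flat_knn(
    query: Sequence[float], docs: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Exact (brute-force) KNN: score every vector, keep the k smallest distances."""
    scored = ((doc_id, cosine_distance(query, vec)) for doc_id, vec in docs.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


docs = {
    "doc:1": [1.0, 0.0],
    "doc:2": [0.0, 1.0],
    "doc:3": [0.7, 0.7],
}
print(flat_knn([1.0, 0.1], docs, k=2))  # doc:1 first, then doc:3
```

The cost is one distance computation per stored vector per query. That cost is why FLAT suits small datasets and HNSW suits large ones.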
Hybrid search in Redis
Putting metadata filters in front of the KNN clause is RediSearch's hybrid search. Tag fields use `@field:{value}` and numeric fields use `@field:[min max]`. Logically, the filter selects the candidate documents first and KNN then ranks them:
(@tenant:{tidewater} @language:{en} @updated_at:[1714521600 +inf])=>[KNN 5 @embedding $vec]
Reads like: "among docs whose tenant is tidewater AND language is en AND updated_at >= 1714521600 (2024-05-01 00:00 UTC), return the 5 nearest to the query vector."
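When filter values come from user or tenant input, build the query string in code and escape the tag values. Punctuation such as `-` or `.` has special meaning in RediSearch query syntax, so tag values should be backslash-escaped. A minimal sketch follows. `escape_tag` and `hybrid_knn_query` are hypothetical helpers, the field names match the index defined earlier, and escaping every non-alphanumeric character is a deliberately conservative choice.

```python
import re


def escape_tag(value: str) -> str:
    """Backslash-escape every character that is not a letter, digit or underscore."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", value)


def hybrid_knn_query(tenant: str, language: str, updated_since: int, k: int = 5) -> str:
    """Filter first (tags + numeric range), then KNN over the documents that pass."""
    filters = (
        f"@tenant:{{{escape_tag(tenant)}}} "
        f"@language:{{{escape_tag(language)}}} "
        f"@updated_at:[{int(updated_since)} +inf]"
    )
    return f"({filters})=>[KNN {int(k)} @embedding $vec AS score]"


print(hybrid_knn_query("tidewater", "en", 1714521600))
print(escape_tag("north-east.eu"))  # north\-east\.eu
```

Pass the resulting string to `Query(...)` with `.dialect(2)` and the vector in `query_params`. Only the query text is built here.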
Authentication
| Mode | When | Notes |
|---|---|---|
| Microsoft Entra ID | Recommended | Container Apps / AKS managed identity authenticates; map identities to Redis access policies |
| Access keys | Legacy / quick start | Two keys per cache; rotate one at a time |
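```bash
# Attach a user-assigned managed identity to the cache resource itself.
# This identity is for the cache's own outbound access; it does not grant apps data access.
az redis identity assign \
  --name roo-redis -g roo-prod \
  --identities $UAI_RESOURCE_ID

# "Redis Cache Contributor" is an Azure RBAC role for managing the cache resource
# (control plane); it is not an Entra ID data-access grant.
az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Redis Cache Contributor" \
  --scope $(az redis show -n roo-redis -g roo-prod --query id -o tsv)
```
Neither command lets an app read or write cache data with Entra ID. For data access, assign the app's managed identity a Redis access policy on the cache, which is the mapping the Entra ID row above describes. Also note that `az redis` targets Azure Cache for Redis. Azure Managed Redis resources are managed with the `az redisenterprise` command group, so check that group's help for the equivalent commands.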
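On the application side, an Entra ID connection uses an access token as the password and the identity's object (principal) ID as the username. The sketch below assumes the `azure-identity` credential interface: `get_token(scope)` returns an object with `.token` and `.expires_on`. It also assumes the `https://redis.azure.com/.default` scope. `entra_connection_kwargs` is a hypothetical helper, and the host and object ID are placeholders. Tokens are short-lived, so long-running clients must re-authenticate before `expires_on`.

```python
from typing import Any, Dict

REDIS_SCOPE = "https://redis.azure.com/.default"  # Entra scope for Azure Redis data access


def entra_connection_kwargs(credential: Any, host: str, object_id: str) -> Dict[str, Any]:
    """Build redis.Redis(...) keyword arguments for Entra ID auth.

    credential: e.g. azure.identity.DefaultAzureCredential() (anything with get_token)
    object_id:  principal/object ID of the managed identity; used as the Redis username
    """
    token = credential.get_token(REDIS_SCOPE)
    return {
        "host": host,
        "port": 10000,            # Azure Managed Redis TLS port, as in the examples above
        "ssl": True,
        "username": object_id,
        "password": token.token,  # short-lived: refresh before token.expires_on
    }


# Usage (requires azure-identity, redis-py and network access):
# from azure.identity import DefaultAzureCredential
# import redis
# r = redis.Redis(**entra_connection_kwargs(
#     DefaultAzureCredential(), "roo.redis.azure.net", "<managed-identity-object-id>"))
```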
When to pick Managed Redis vs Cosmos vs PostgreSQL
| Feature | Cosmos DB NoSQL | PostgreSQL + pgvector | Azure Managed Redis |
|---|---|---|---|
| Latency target | Single-digit ms | 10-50 ms (depending on tuning) | Sub-millisecond |
| Volume per instance | TBs across partitions | TBs per server | GBs (RAM-bound) |
| Best for | Document store + vectors at scale | Relational + vectors with joins | Hot cache + ultra-low-latency vectors |
| Persistence | Persistent, replicated | Persistent, replicated | In-memory first; optional RDB snapshots (can lose writes since the last snapshot) or AOF logging (adds write overhead) |
| Use as | Primary store for chat / RAG sources | Primary store with strong relational shape | Cache layer + hot-tier vector retrieval |
Real-world example: Priya's two-tier vector setup
BeanCraft's menu-Q&A bot serves 240 stores. Priya's data architecture:
- PostgreSQL + pgvector: 80,000 menu items, dietary docs and brand standards (durable, joined to the product catalog)
- Managed Redis: the top 2,000 most-asked items, embedded in a RediSearch HNSW index and refreshed nightly
The bot first searches Redis (sub-ms). If Redis returns no sufficiently close match, the bot falls back to PostgreSQL (~30 ms). 92% of queries are answered from Redis, and average retrieval latency is about 4 ms, fast enough that customers feel the bot is "instant".
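Priya's routing logic and the latency arithmetic can be sketched as below. These are assumptions, not details from the setup described here. The Redis tier counts as a "miss" when its best match is farther than a cosine-distance threshold (`MAX_DISTANCE`, hypothetical). Each Redis lookup costs about 1 ms. The two search functions are injected as callables, so the sketch stays independent of any client library.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (doc id, cosine distance; lower = closer)

MAX_DISTANCE = 0.2  # hypothetical: beyond this, the hot tier counts as a miss


def two_tier_search(
    query_vec: List[float],
    redis_knn: Callable[[List[float]], List[Hit]],     # e.g. wraps r.ft(...).search
    pgvector_knn: Callable[[List[float]], List[Hit]],  # e.g. wraps a pgvector SQL query
) -> Tuple[str, List[Hit]]:
    """Try the hot Redis tier first; fall back to PostgreSQL when no close match."""
    hits = redis_knn(query_vec)  # assumed sorted by distance, closest first
    if hits and hits[0][1] <= MAX_DISTANCE:
        return "redis", hits
    return "postgres", pgvector_knn(query_vec)


def expected_latency_ms(hit_rate: float, redis_ms: float, postgres_ms: float) -> float:
    """Every query pays the Redis lookup; misses also pay the PostgreSQL lookup."""
    return redis_ms + (1.0 - hit_rate) * postgres_ms


print(round(expected_latency_ms(0.92, 1.0, 30.0), 2))  # 3.4
```

Under these assumptions the model gives 1 + 0.08 × 30 = 3.4 ms, in line with the roughly 4 ms average above. The remaining gap is plausibly overhead this model ignores, such as network round trips.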
Knowledge check
Mira caches expensive vector-search results in Redis with `SET cache_key result EX 600`. During a popular product launch, 200 replicas hit the cache simultaneously, the key expires, and 200 replicas all recompute. What's the cleanest mitigation?
Theo wants vector similarity search with sub-millisecond latency over the 5,000 most-recent clinical-policy embeddings, with metadata filters for tenant and language. The full set of 200,000 embeddings stays in pgvector. Which Azure service fits the hot-tier role?
Lin's Redis instance is a mix of session caches with TTLs and a small set of permanent feature flags without TTLs. Memory pressure occasionally causes evictions. Which eviction policy fits best?