Domain 2: Develop AI solutions by using Azure data management services

Azure Managed Redis: Caching + Vector Search

The new fully managed Redis service on Azure, replacing Azure Cache for Redis Enterprise. This module covers caching with TTLs, invalidation, and vector indexing through RediSearch for low-latency similarity search over hot data.

What Azure Managed Redis is

Simple explanation

Azure Managed Redis is the new managed Redis service on Azure; it replaces Azure Cache for Redis Enterprise. It's fast (sub-millisecond reads served from memory), fully managed, and ships with the Redis Stack modules, including RediSearch, which gives you vector search.

Two main jobs in AI-200:

  • Caching: store the result of an expensive operation (LLM call, vector search, computed feature) for a short time so repeated requests don't pay the cost again
  • Vector similarity over hot data: for embeddings you want sub-millisecond search on, store them in Redis with a vector index

The exam tests TTLs (expiration), invalidation patterns, and the basics of vector indexing in Redis.

Caching essentials: TTL, expiration, invalidation

The core caching pattern:

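import hashlib
import json

import openai
import redis

r = redis.Redis(
    host="roo.redis.azure.net",
    port=10000,
    password="<entra-token-or-key>",
    ssl=True,
)

def hash_text(text: str) -> str:
    # Stable content hash, so identical text always maps to the same cache key
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def get_embedding(text: str) -> list[float]:
    cache_key = f"embedding:{hash_text(text)}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: decode the stored vector

    # Cache miss: compute (the call arguments are elided in the original)
    response = openai.embeddings.create(...)
    vector = response.data[0].embedding
    r.set(cache_key, json.dumps(vector), ex=3600)  # 1 hour TTL
    return vector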

Three TTL strategies:

The three classic cache freshness strategies. Most production AI systems combine fixed TTL + explicit invalidation.

| Feature | Fixed TTL | Sliding TTL | Explicit invalidation |
| --- | --- | --- | --- |
| How | `SET key value EX 3600` | On every read, refresh: `EXPIRE key 3600` | On data change, `DEL key` (or pub/sub event) |
| Best for | Predictable freshness windows (hourly, daily) | Hot keys you want to keep alive while in use | Source-of-truth changes: invalidate when the underlying data changes |
| Risk | Cold cache after expiry (one slow request) | Stale data if nothing's invalidated and the source changed | Forgetting to invalidate on every write path |
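
As a rough illustration of the three strategies, each one maps onto a single redis-py call. This is a minimal sketch rather than a prescribed implementation: the function names are hypothetical, `client` is assumed to be a connected `redis.Redis` instance, and `GETEX` (used here instead of a separate `GET` plus `EXPIRE`) requires Redis 6.2 or later.

```python
def write_fixed_ttl(client, key, value, ttl=3600):
    # Fixed TTL: the key expires `ttl` seconds after this write, regardless of reads
    client.set(key, value, ex=ttl)


def read_sliding_ttl(client, key, ttl=3600):
    # Sliding TTL: GETEX returns the value and resets its expiry in one round trip
    return client.getex(key, ex=ttl)


def invalidate_on_change(client, key):
    # Explicit invalidation: call this from every write path that changes the source data
    client.delete(key)
```

Combining `write_fixed_ttl` with `invalidate_on_change` gives the fixed TTL + explicit invalidation pairing mentioned above: the TTL bounds staleness even if one write path forgets to invalidate.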
Exam tip: 'cache stampede' and the dogpile problem

When a hot cache key expires under heavy traffic, dozens of replicas all miss simultaneously and all try to recompute: the cache "stampedes". Two classic mitigations:

  • Probabilistic early expiration: refresh the key probabilistically a bit before the TTL ends, so misses happen one at a time
  • Lock-and-recompute: the first miss takes a short Redis lock, recomputes, and sets the value; others read the new value once the lock is released

AI scenarios with expensive recompute (LLM calls, large vector searches) are particularly vulnerable to stampedes.
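
The lock-and-recompute mitigation could be sketched as follows. This is a minimal illustration under stated assumptions, not a production lock: `get_or_recompute` and its parameters are hypothetical names, `client` is assumed to be a connected `redis.Redis` instance, and the lock is released with a plain `DEL`, so a recompute that outlives `lock_ttl` could remove a lock taken by another caller.

```python
import json
import time


def get_or_recompute(client, key, recompute, ttl=600, lock_ttl=30,
                     wait=0.05, max_wait=5.0):
    """Return the cached value for `key`, letting one caller at a time recompute it."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    deadline = time.monotonic() + max_wait
    while True:
        # SET NX EX succeeds only if no lock exists; the lock also expires on its own
        if client.set(lock_key, "1", nx=True, ex=lock_ttl):
            try:
                # Re-check: another caller may have filled the key just before we locked
                cached = client.get(key)
                if cached is not None:
                    return json.loads(cached)
                value = recompute()
                client.set(key, json.dumps(value), ex=ttl)
                return value
            finally:
                client.delete(lock_key)

        # Someone else holds the lock: wait briefly, then look for the fresh value
        time.sleep(wait)
        cached = client.get(key)
        if cached is not None:
            return json.loads(cached)
        if time.monotonic() >= deadline:
            # Give up waiting and compute without caching, rather than block forever
            return recompute()
```

With this shape, a burst of simultaneous misses results in one recompute while the remaining callers poll for the refreshed key. The `max_wait` fallback trades a possible duplicate computation for bounded latency.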

Eviction policies: what gets dropped when memory fills

Managed Redis evicts when you exceed the maxmemory budget. Choose a policy that matches your access pattern:

| Policy | What it evicts | Best for |
| --- | --- | --- |
| noeviction | Nothing; writes fail with an out-of-memory (OOM) error | Strict caches that must not lose data |
| allkeys-lru | Least Recently Used across all keys | General caching: the safe default |
| allkeys-lfu | Least Frequently Used | When some keys are perennially hot regardless of recency |
| volatile-ttl | Keys with the soonest TTL first, among keys with TTLs | Mixed workloads where some keys are persistent and others ephemeral |

allkeys-lru is the usual recommendation when the instance is used purely as a cache. volatile-ttl makes sense when you store a mix of cache entries with TTLs plus permanent state in the same instance, because only keys that have a TTL are eligible for eviction.
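
To make the difference between the policies concrete, here is a toy model of which key each policy would drop first. It is a deliberately simplified sketch: `KeyInfo` and `choose_victim` are hypothetical names, and real Redis approximates LRU and LFU by sampling keys rather than ranking all of them exactly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyInfo:
    name: str
    idle_seconds: float            # time since the key was last accessed
    access_count: int              # how often the key has been read
    ttl_seconds: Optional[float]   # None means the key has no expiry


def choose_victim(keys, policy):
    """Return the name of the key a policy would evict first, or None."""
    if policy == "noeviction" or not keys:
        return None  # nothing is evicted; the write fails instead
    if policy == "allkeys-lru":
        return max(keys, key=lambda k: k.idle_seconds).name
    if policy == "allkeys-lfu":
        return min(keys, key=lambda k: k.access_count).name
    if policy == "volatile-ttl":
        expiring = [k for k in keys if k.ttl_seconds is not None]
        if not expiring:
            return None  # no key has a TTL, so there is nothing eligible to evict
        return min(expiring, key=lambda k: k.ttl_seconds).name
    raise ValueError(f"unknown policy: {policy}")


keys = [
    KeyInfo("flag:new-ui", idle_seconds=900, access_count=3, ttl_seconds=None),
    KeyInfo("session:a", idle_seconds=5, access_count=40, ttl_seconds=1200),
    KeyInfo("session:b", idle_seconds=300, access_count=8, ttl_seconds=60),
]
victims = {p: choose_victim(keys, p) for p in ("allkeys-lru", "allkeys-lfu", "volatile-ttl")}
```

In this made-up data set, both `allkeys-*` policies pick the rarely read permanent flag, while `volatile-ttl` picks the session closest to expiry and leaves the flag alone.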

Vector indexing: RediSearch

Redis Stack's RediSearch module supports vector fields with HNSW or FLAT indexes. The pattern is conceptually similar to pgvector, but the API is Redis-flavoured.

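# Reuses the `r` client from the caching example; doc_id, vec, and query_vec are assumed to be defined
import time

import numpy as np
from redis.commands.search.field import NumericField, TagField, VectorField
# Module path in newer redis-py releases; older ones name it redis.commands.search.indexDefinition
from redis.commands.search.index_definition import IndexDefinition, IndexType
from redis.commands.search.query import Query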

# Create the index once
r.ft("idx:embeddings").create_index(
    [
        VectorField(
            "embedding",
            "HNSW",
            {
                "TYPE": "FLOAT32",
                "DIM": 1536,
                "DISTANCE_METRIC": "COSINE",
                "M": 16,
                "EF_CONSTRUCTION": 64,
            },
        ),
        TagField("tenant"),
        TagField("language"),
        NumericField("updated_at"),
    ],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Insert a doc
r.hset(
    f"doc:{doc_id}",
    mapping={
        "embedding": np.array(vec, dtype="float32").tobytes(),
        "tenant": "tidewater",
        "language": "en",
        "updated_at": int(time.time()),
        "title": "Insulin storage policy v3",
    },
)

# Query top-5 nearest with metadata filter
res = r.ft("idx:embeddings").search(
    Query("(@tenant:{tidewater} @language:{en})=>[KNN 5 @embedding $vec AS score]")
    .sort_by("score").return_fields("title", "score").dialect(2),
    query_params={"vec": np.array(query_vec, dtype="float32").tobytes()},
)

Two index types:

| Type | What it does | Best for |
| --- | --- | --- |
| FLAT | Brute-force, exact | Small datasets where exactness matters |
| HNSW | Approximate nearest neighbour, low latency | Production AI workloads: usually the right choice |

Distance metrics: COSINE, IP (inner product), L2.
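
For intuition, the three metrics and a FLAT-style exact search can be written in a few lines of plain Python. This is a sketch using the textbook definitions; the helper names are hypothetical, the score values RediSearch returns for each metric may be scaled or defined differently, and zero-length vectors are not handled.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions, 1 for orthogonal vectors
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


def flat_knn(query, docs, k, distance=cosine_distance):
    """Exact top-k: score every vector, then sort. Cost grows linearly with len(docs)."""
    scored = sorted((distance(query, vec), doc_id) for doc_id, vec in docs.items())
    return [doc_id for _, doc_id in scored[:k]]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
nearest = flat_knn([1.0, 0.0], docs, k=2)  # ["a", "c"]
```

The linear scan in `flat_knn` is why FLAT suits small datasets. HNSW avoids scoring every vector by walking a graph of neighbours, at the cost of occasionally missing a true nearest neighbour.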

Hybrid search in Redis

The @field:{value} filter syntax inside a vector query is RediSearch's hybrid search. Filter first, KNN second:

(@tenant:{tidewater} @language:{en} @updated_at:[1714521600 +inf])=>[KNN 5 @embedding $vec]

Reads like: "among docs whose tenant is tidewater AND language is en AND updated_at >= 1714521600, return the 5 nearest to the query vector."
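
When the filters come from request parameters, the query string can be assembled programmatically. The helper below is a hypothetical sketch (`build_hybrid_query` is not part of redis-py) that assumes tag values contain no spaces or punctuation; values that do would need RediSearch tag escaping, which is not handled here.

```python
def build_hybrid_query(tags, k, vector_field="embedding", since=None, param="vec"):
    """Compose '(<filters>)=>[KNN k @field $param]' from tag filters and an optional time bound."""
    clauses = [f"@{field}:{{{value}}}" for field, value in tags.items()]
    if since is not None:
        # Numeric range filter: updated_at >= since
        clauses.append(f"@updated_at:[{since} +inf]")
    # '*' matches every document when no filter is given
    prefilter = f"({' '.join(clauses)})" if clauses else "*"
    return f"{prefilter}=>[KNN {k} @{vector_field} ${param}]"


query = build_hybrid_query({"tenant": "tidewater", "language": "en"}, k=5, since=1714521600)
```

The resulting string is the same expression shown above and can be passed to `Query(...)` with the vector supplied through `query_params`, as in the earlier search example.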

Authentication

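| Mode | When | Notes |
| --- | --- | --- |
| Microsoft Entra ID | Recommended | Container Apps / AKS managed identity authenticates; map identities to Redis access policies |
| Access keys | Legacy / quick start | Two keys per cache; rotate one at a time |

# Note: `az redis` is the Azure Cache for Redis command group;
# Azure Managed Redis resources are managed through `az redisenterprise`.

# Step 1: attach a user-assigned managed identity to the cache
az redis identity assign \
  --name roo-redis -g roo-prod \
  --mi-user-assigned $UAI_RESOURCE_ID

# Step 2: grant that identity a role on the cache resource.
# "Redis Cache Contributor" is a management-plane role; data access for an
# Entra identity is granted through a Redis access policy, not this role.
az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Redis Cache Contributor" \
  --scope "$(az redis show -n roo-redis -g roo-prod --query id -o tsv)"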

When to pick Managed Redis vs Cosmos vs PostgreSQL

Three Azure-native data services for AI. Pick based on volume, latency, and data shape.

| Feature | Cosmos DB NoSQL | PostgreSQL + pgvector | Azure Managed Redis |
| --- | --- | --- | --- |
| Latency target | Single-digit ms | 10-50 ms (depending on tuning) | Sub-millisecond |
| Volume per instance | TBs across partitions | TBs per server | GBs (RAM-bound) |
| Best for | Document store + vectors at scale | Relational + vectors with joins | Hot cache + ultra-low-latency vectors |
| Persistence | Persistent, replicated | Persistent, replicated | RAM-first; persistence available but watch trade-offs |
| Use as | Primary store for chat / RAG sources | Primary store with strong relational shape | Cache layer + tier 1 vector retrieval |

Real-world example: Priya's two-tier vector setup

BeanCraft's menu-Q&A bot serves 240 stores. Priya's data architecture:

  1. PostgreSQL + pgvector: 80,000 menu items + dietary docs + brand standards (durable, joined to product catalog)
  2. Managed Redis: top 2,000 most-asked items embedded in a RediSearch HNSW index, refreshed nightly

The bot first searches Redis (sub-ms). On a miss, it falls back to PostgreSQL (~30 ms). 92% of queries hit Redis. Average retrieval latency: 4 ms, fast enough that customers feel the bot is "instant".
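
A back-of-envelope check of that average can be done with a weighted sum. The sketch below is illustrative only: `expected_latency_ms` is a hypothetical helper, the 1 ms hot-tier figure is an assumed upper bound for "sub-ms", and a miss is assumed to pay for the Redis lookup before the PostgreSQL query.

```python
def expected_latency_ms(hit_rate, hot_ms, cold_ms):
    # A miss pays for the failed hot-tier lookup and then the cold-tier query
    return hit_rate * hot_ms + (1 - hit_rate) * (hot_ms + cold_ms)


estimate = expected_latency_ms(hit_rate=0.92, hot_ms=1.0, cold_ms=30.0)
```

Under these assumptions the estimate comes out at roughly 3.4 ms, in the same range as the quoted 4 ms; the remainder may be network or application overhead that this sketch does not model. The formula also shows how sensitive the average is to hit rate: the cold-tier term dominates as soon as misses rise above a few percent.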

Key terms

Question

What is Azure Managed Redis?

Answer

The new fully-managed Redis service on Azure, replacing Azure Cache for Redis Enterprise. Ships with Redis Stack (RediSearch, RedisJSON, RedisTimeSeries, RedisBloom), supports Microsoft Entra ID auth, and offers performance tiers including Flash Optimized for very large datasets.

Question

What's the recommended Redis eviction policy for general AI caching?

Answer

`allkeys-lru`: Least Recently Used across all keys. It's the safe default when you want Redis to behave as a cache that drops cold data automatically. Use `volatile-ttl` if you mix cache and persistent state.

Question

What does RediSearch add for AI workloads?

Answer

Full-text search, secondary indexing, and vector similarity search (FLAT and HNSW indexes with cosine, IP, or L2 metrics). Lets Redis serve as a hot-tier vector database with sub-millisecond similarity queries.

Question

What is a cache stampede, and how do you prevent it?

Answer

When many concurrent requests miss a freshly-expired hot cache key and all try to recompute simultaneously. Mitigations include probabilistic early expiration (refresh keys probabilistically before TTL) and lock-and-recompute (first miss takes a short lock, others wait or read the new value).

Question

When does Redis make sense over Cosmos or pgvector for embeddings?

Answer

When latency below 5 ms matters more than data volume, and the working set fits in RAM. Common pattern: serve top-N hot embeddings from Redis, fall back to Cosmos / pgvector for cold ones. Don't put TBs of vectors in Redis: it's RAM-bound.

Knowledge check

Mira caches expensive vector-search results in Redis with `SET cache_key result EX 600`. During a popular product launch, 200 replicas hit the cache simultaneously, the key expires, and 200 replicas all recompute. What's the cleanest mitigation?

Theo wants vector similarity search with sub-millisecond latency over the 5,000 most-recent clinical-policy embeddings, with metadata filters for tenant and language. The full set of 200,000 embeddings stays in pgvector. Which Azure service fits the hot-tier role?

Lin's Redis instance is a mix of session caches with TTLs and a small set of permanent feature flags without TTLs. Memory pressure occasionally causes evictions. Which eviction policy fits best?