Domain 2: Develop AI solutions by using Azure data management services

Cosmos DB for Vectors: Embeddings + Similarity Search

Storing embeddings next to the documents that produced them, then finding the closest vectors with VectorDistance. Vector index types, dimensions, and the one-line RAG pattern.

What “vector search in Cosmos” actually means

Simple explanation

An embedding is a list of numbers that represents the meaning of a piece of text (or image). Two pieces of text with similar meanings have embeddings that are close to each other in mathematical space.
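To make "close in mathematical space" concrete, here is a minimal sketch that compares toy vectors with cosine similarity. The three-dimensional vectors and their names are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings", made up for illustration only.
insulin_storage = [0.9, 0.1, 0.0]
insulin_fridge = [0.8, 0.2, 0.1]
parking_policy = [0.0, 0.1, 0.9]

# Related texts should score higher than unrelated ones.
print(cosine_similarity(insulin_storage, insulin_fridge))
print(cosine_similarity(insulin_storage, parking_policy))
```

The first score comes out much higher than the second, which is the property similarity search relies on.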

Cosmos DB lets you store those embeddings inside the same documents as the source data (your chat message, your support ticket, your knowledge-base article) and search by similarity using a SQL function called VectorDistance. A query that orders by VectorDistance returns the documents closest to a given query vector.

That’s the foundation of semantic retrieval and RAG (Retrieval-Augmented Generation): store knowledge as embeddings, find the relevant ones at query time, ground the LLM in those results.

The data model

{
  "id": "kb-article-42",
  "tenantId": "tidewater",
  "title": "Insulin storage policy v3",
  "body": "Refrigerate at 2-8°C. Discard after 28 days at room temperature...",
  "embedding": [0.012, -0.034, 0.567, ..., 0.089],  // 1536 floats (text-embedding-3-small)
  "language": "en",
  "updatedAt": "2026-04-15T09:30:00Z"
}

The vector lives on /embedding. It’s just a JSON array of floats; Cosmos doesn’t care how you generated it (Azure OpenAI, a self-hosted model, anything).

Container configuration: two policies

Vector embedding policy

{
  "vectorEmbeddings": [
    {
      "path": "/embedding",
      "dataType": "float32",
      "distanceFunction": "cosine",
      "dimensions": 1536
    }
  ]
}

This declares: the embedding lives at /embedding, is a 1536-dimensional float32 vector, and we measure similarity with cosine distance.

| Setting | Common choices | When to pick which |
|---|---|---|
| dataType | float32, float16, int8, uint8 | float32 is the default; lower precisions trade accuracy for storage |
| distanceFunction | cosine, dotproduct, euclidean | cosine for most LLM embeddings (text-embedding-ada-002, text-embedding-3-*) |
| dimensions | 1536 (Azure OpenAI text-embedding-3-small), 3072 (text-embedding-3-large), 768 (sentence-transformers) | Match your embedding model exactly |
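Because the declared dimensions have to match the embedding model, it can help to check documents before writing them. The sketch below is a hypothetical client-side guard, not part of any SDK; `VECTOR_POLICY` and `check_embedding` are illustrative names, and only a top-level path such as /embedding is handled.

```python
from numbers import Real

# Mirrors the vector embedding policy entry for /embedding.
VECTOR_POLICY = {
    "path": "/embedding",
    "dataType": "float32",
    "distanceFunction": "cosine",
    "dimensions": 1536,
}


def check_embedding(doc: dict, policy: dict = VECTOR_POLICY) -> None:
    """Raise ValueError if the document's vector does not fit the policy."""
    field = policy["path"].lstrip("/")  # "/embedding" -> "embedding"
    vec = doc.get(field)
    if not isinstance(vec, list):
        raise ValueError(f"{field!r} must be a JSON array of numbers")
    if len(vec) != policy["dimensions"]:
        raise ValueError(
            f"{field!r} has {len(vec)} dimensions, policy expects {policy['dimensions']}"
        )
    # bool is a subclass of int in Python, so reject it explicitly.
    if not all(isinstance(x, Real) and not isinstance(x, bool) for x in vec):
        raise ValueError(f"{field!r} must contain only numbers")


check_embedding({"id": "kb-article-42", "embedding": [0.0] * 1536})  # passes silently
```

A 768-dimensional vector passed to the same function would raise, which surfaces a model mismatch before it reaches the container.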

Vector indexing policy

{
  "indexingMode": "consistent",
  "includedPaths": [{ "path": "/*" }],
  "excludedPaths": [{ "path": "/_etag/?" }, { "path": "/embedding/*" }],
  "vectorIndexes": [
    {
      "path": "/embedding",
      "type": "diskANN"
    }
  ]
}

Two important things:

  • excludedPaths lists /embedding/* so the regular index doesn’t waste effort indexing each float
  • vectorIndexes is a separate array; that’s where the vector index goes
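The two policies are easy to get out of sync, so here is a small sketch that builds both from one set of inputs and checks that every vector path is excluded from the regular index. `build_policies` and `vector_paths_excluded` are hypothetical helper names; the dict shapes follow the JSON shown in this section.

```python
def build_policies(vector_path: str = "/embedding",
                   dimensions: int = 1536,
                   index_type: str = "diskANN") -> tuple[dict, dict]:
    """Return (vector embedding policy, indexing policy) as plain dicts."""
    if index_type not in {"flat", "quantizedFlat", "diskANN"}:
        raise ValueError(f"unknown vector index type: {index_type}")
    embedding_policy = {
        "vectorEmbeddings": [{
            "path": vector_path,
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": dimensions,
        }]
    }
    indexing_policy = {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        # Keep the regular index from indexing each float of the vector.
        "excludedPaths": [{"path": "/_etag/?"}, {"path": f"{vector_path}/*"}],
        "vectorIndexes": [{"path": vector_path, "type": index_type}],
    }
    return embedding_policy, indexing_policy


def vector_paths_excluded(indexing_policy: dict) -> bool:
    """True if every vector index path also appears (as path/*) in excludedPaths."""
    excluded = {p["path"] for p in indexing_policy.get("excludedPaths", [])}
    return all(f"{v['path']}/*" in excluded
               for v in indexing_policy.get("vectorIndexes", []))


embedding_policy, indexing_policy = build_policies()
print(vector_paths_excluded(indexing_policy))
```

With the azure-cosmos Python SDK, dicts like these would typically be passed as the `vector_embedding_policy` and `indexing_policy` arguments when creating the container; verify the exact parameters against the SDK version you use.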

Pick the smallest index that fits your data; diskANN scales to millions but adds approximation.

| Feature | flat | quantizedFlat | diskANN |
|---|---|---|---|
| Algorithm | Brute-force scan, exact | Compressed brute-force, near-exact | Approximate nearest neighbour (ANN) |
| Best for sizes | Up to ~10k vectors per partition | 10k-100k vectors per partition | 100k-millions of vectors per partition |
| Recall | 100% (exact) | Very high (~99%) | High but tunable (~95-99%) |
| RU cost | Highest | Medium | Lowest at scale |
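The size guidance above can be written as a tiny helper. This is only a rule-of-thumb sketch using the thresholds quoted in this section; `suggest_index_type` is a hypothetical name, and service limits (for example, any cap on dimensions per index type) are not modelled.

```python
def suggest_index_type(vectors_per_partition: int) -> str:
    """Map a per-partition vector count to the index type suggested in this section."""
    if vectors_per_partition < 0:
        raise ValueError("vector count cannot be negative")
    if vectors_per_partition <= 10_000:
        return "flat"            # exact brute-force scan
    if vectors_per_partition <= 100_000:
        return "quantizedFlat"   # compressed brute-force, near-exact
    return "diskANN"             # approximate nearest neighbour


for count in (2_000, 50_000, 3_000_000):
    print(count, suggest_index_type(count))
```

Treat the boundaries as starting points and measure recall, latency, and RU cost on your own data before committing to an index type.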

The query: VectorDistance

-- Top-5 most similar articles to a query embedding, scoped to a tenant
SELECT TOP 5
  c.id, c.title,
  VectorDistance(c.embedding, @queryVec) AS score
FROM c
WHERE c.tenantId = 'tidewater'
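ORDER BY VectorDistance(c.embedding, @queryVec)

# Python: generate the query embedding, then run the search
from azure.cosmos import CosmosClient
from openai import AzureOpenAI

oa = AzureOpenAI(...)  # endpoint, credentials, and API version go here

# Placeholder endpoint, key, and resource names; substitute your own.
cosmos = CosmosClient("<cosmos-endpoint>", credential="<cosmos-key>")
container = cosmos.get_database_client("<database>").get_container_client("<container>")

query_text = "How long can insulin sit out at room temperature?"
query_vec = oa.embeddings.create(
    model="text-embedding-3-small",  # with Azure OpenAI, this is your deployment name
    input=query_text,
).data[0].embedding

results = container.query_items(
    query="""
        SELECT TOP @k c.id, c.title,
            VectorDistance(c.embedding, @vec) AS score
        FROM c
        WHERE c.tenantId = @tid
        ORDER BY VectorDistance(c.embedding, @vec)
    """,
    parameters=[
        {"name": "@k", "value": 5},
        {"name": "@vec", "value": query_vec},
        {"name": "@tid", "value": "tidewater"},
    ],
    partition_key="tidewater",  # keeps this a single-partition query
)

# query_items returns a lazy iterable; iterating it runs the query.
for item in results:
    print(item["id"], item["title"], item["score"])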

Key things:

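  • VectorDistance(field, vector) returns a similarity score computed with the container’s distance function. With cosine and dotproduct a higher score means more similar; with euclidean a lower score means closer
  • ORDER BY VectorDistance(...) triggers the vector index and returns the most similar items first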
  • Always include the partition-key filter when possible (turns it into a single-partition vector search)
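The points above can be bundled into a small query builder so the partition-key filter and the ORDER BY clause are never forgotten. `build_vector_query` is a hypothetical helper; the returned keys match the keyword arguments used in the `query_items` call earlier in this section.

```python
def build_vector_query(query_vec: list[float], tenant_id: str, k: int = 5) -> dict:
    """Return keyword arguments for a tenant-scoped top-k vector query."""
    if k < 1:
        raise ValueError("k must be at least 1")
    query = (
        "SELECT TOP @k c.id, c.title, "
        "VectorDistance(c.embedding, @vec) AS score "
        "FROM c "
        "WHERE c.tenantId = @tid "
        "ORDER BY VectorDistance(c.embedding, @vec)"
    )
    return {
        "query": query,
        "parameters": [
            {"name": "@k", "value": k},
            {"name": "@vec", "value": query_vec},
            {"name": "@tid", "value": tenant_id},
        ],
        # Same value as the WHERE filter, so the search stays in one partition.
        "partition_key": tenant_id,
    }


spec = build_vector_query([0.1, 0.2, 0.3], "tidewater", k=5)
print(spec["query"])
```

Assuming `container` is an azure-cosmos container client, the result could be passed along as `container.query_items(**spec)`.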

Hybrid search: vectors plus metadata filters

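Where Cosmos shines: filtering by metadata before ranking by vector similarity, all in one query. (Microsoft’s documentation uses “hybrid search” for combining vector and full-text ranking; in this module the term means vector ranking plus metadata filters.)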

SELECT TOP 10 c.id, c.title, VectorDistance(c.embedding, @vec) AS score
FROM c
WHERE c.tenantId = @tid
  AND c.language = 'en'
  AND c.updatedAt > '2026-01-01'
  AND c.docType IN ('policy', 'guideline')
ORDER BY VectorDistance(c.embedding, @vec)

The metadata predicates run first (against the regular index), narrowing the candidate set. The vector index then ranks the remaining candidates. That’s significantly faster than a separate “vector search, then metadata filter” pipeline.
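The filter-then-rank order can be illustrated in memory. This is a conceptual sketch only, with invented toy documents; it shows the order of operations described above, not how Cosmos DB executes the query internally.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_top_k(docs, query_vec, k, **filters):
    """Apply equality filters first, then rank the survivors by similarity."""
    # Step 1: metadata predicates narrow the candidate set.
    candidates = [d for d in docs if all(d.get(f) == v for f, v in filters.items())]
    # Step 2: only the survivors are ranked, most similar first.
    candidates.sort(key=lambda d: cosine_similarity(d["embedding"], query_vec), reverse=True)
    return candidates[:k]


# Toy documents with 2-dimensional vectors, made up for illustration.
docs = [
    {"id": "a", "tenantId": "tidewater", "language": "en", "embedding": [1.0, 0.0]},
    {"id": "b", "tenantId": "tidewater", "language": "fr", "embedding": [1.0, 0.1]},
    {"id": "c", "tenantId": "other", "language": "en", "embedding": [1.0, 0.0]},
    {"id": "d", "tenantId": "tidewater", "language": "en", "embedding": [0.0, 1.0]},
]

top = filtered_top_k(docs, [1.0, 0.0], k=2, tenantId="tidewater", language="en")
print([d["id"] for d in top])  # ['a', 'd']
```

Documents b and c are close to the query vector but never reach the ranking step, because the metadata filter removes them first.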

Real-world example: Theo's clinical RAG

Tidewater Health stores 30,000 clinical guidelines per tenant. Each is embedded with text-embedding-3-large (3072 dim) and stored on /embedding. The container is partitioned by /tenantId.

Theo’s RAG flow when a clinician asks a question:

  1. Embed the clinician’s question (Azure OpenAI)
  2. Single Cosmos query that filters by tenant, language, and recency, then ranks by vector distance
  3. Top 5 results pass to the LLM as RAG context
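The three steps above can be sketched as one function with the external services injected as callables. `answer_with_rag`, `embed`, `search`, and `generate` are hypothetical names; in practice `embed` would call the embedding model, `search` would run the Cosmos query, and `generate` would call the LLM.

```python
from typing import Callable, Sequence


def answer_with_rag(question: str,
                    embed: Callable[[str], Sequence[float]],
                    search: Callable[[Sequence[float], int], list[dict]],
                    generate: Callable[[str], str],
                    k: int = 5) -> str:
    """Embed the question, retrieve the top-k documents, and ground the LLM in them."""
    query_vec = embed(question)            # step 1: embed the question
    hits = search(query_vec, k)            # step 2: filtered vector search
    context = "\n\n".join(
        f"[{h['id']}] {h['title']}: {h.get('body', '')}" for h in hits
    )
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return generate(prompt)                # step 3: results become RAG context


# Stand-in callables so the sketch runs without any cloud services.
fake_embed = lambda text: [0.0, 1.0]
fake_search = lambda vec, k: [{"id": "kb-article-42",
                               "title": "Insulin storage policy v3",
                               "body": "Refrigerate at 2-8°C."}][:k]
fake_generate = lambda prompt: prompt      # echoes the prompt instead of calling an LLM

print(answer_with_rag("How long can insulin sit out?", fake_embed, fake_search, fake_generate))
```

Injecting the three callables keeps the flow testable and makes it clear that only the `search` step talks to Cosmos DB.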

All in one SDK round-trip, all in one billing source, all with one identity. Without hybrid search Theo would need a separate vector store and a join: twice the operational surface, twice the secrets, twice the latency.

Storage and RU economics for vectors

Vector storage and indexing aren’t free:

  • A 1536-dim float32 vector is about 6 KB (1536 × 4 bytes). 100,000 documents = roughly 600 MB just in vectors.
  • Quantising to int8 or float16 cuts storage 4× or 2× respectively, with small recall loss.
  • diskANN indexes have an upfront build cost; insert RU cost rises while the index ingests.
  • The first vector query after a long idle period may be slow (the index is paged in from storage).
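The storage figures above follow from simple arithmetic, sketched here. The helper counts raw component bytes only and does not account for document or index overhead; `raw_vector_bytes` is an illustrative name.

```python
# Bytes per vector component for each dataType named in this section.
BYTES_PER_COMPONENT = {"float32": 4, "float16": 2, "int8": 1, "uint8": 1}


def raw_vector_bytes(dimensions: int, data_type: str = "float32", count: int = 1) -> int:
    """Raw size in bytes of `count` vectors with the given dimensions and type."""
    return dimensions * BYTES_PER_COMPONENT[data_type] * count


print(raw_vector_bytes(1536))                          # 6144 bytes, about 6 KB
print(raw_vector_bytes(1536, count=100_000) / 1e6)     # 614.4 (MB)
print(raw_vector_bytes(1536, "float16") / raw_vector_bytes(1536))  # 0.5
print(raw_vector_bytes(1536, "int8") / raw_vector_bytes(1536))     # 0.25
```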

For very large vector sets where storage matters, consider:

  • int8 quantisation
  • quantizedFlat index for partial precision
  • Splitting embeddings into a separate container partitioned the same way as source data, so vector queries don’t carry full document weight
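As an illustration of the int8 option, here is a minimal sketch of symmetric scalar quantisation done on the client before storing a vector. This is one simple technique chosen for illustration, not necessarily what Cosmos DB or any embedding provider does internally; the function names are hypothetical, and the container's dataType would need to be declared as int8 to match.

```python
import math


def quantize_int8(vec: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] plus one float scale factor."""
    # "or 1.0" avoids dividing by zero for an all-zero vector.
    scale = (max(abs(x) for x in vec) / 127) or 1.0
    quantized = [max(-127, min(127, round(x / scale))) for x in vec]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    return [v * scale for v in quantized]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


original = [0.012, -0.034, 0.567, 0.089]
quantized, scale = quantize_int8(original)
restored = dequantize(quantized, scale)

print(quantized)
# Close to 1.0 means the direction of the vector is largely preserved.
print(cosine_similarity(original, restored))
```

Because cosine similarity ignores vector length, the scale factor does not need to be stored when the container uses the cosine distance function; measure recall on real data before adopting this.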

Key terms

Question

What is a vector embedding?

Answer

A list of floats that represents the meaning of a piece of text (or image, audio, etc). Generated by an embedding model. Two embeddings are 'close' (small distance) when their source content has similar meaning. The basis of semantic search and RAG.

Question

Which Cosmos DB index types support vector search?

Answer

`flat` (exact brute force, small sets), `quantizedFlat` (compressed flat for larger sets), and `diskANN` (approximate nearest neighbour for very large sets). Pick the smallest type that fits your data; diskANN trades exactness for scale.

Question

What does VectorDistance() do in a Cosmos query?

Answer

Computes the distance between a stored vector field and a query vector, using the distance function declared in the container's vector embedding policy (cosine, dot product, or euclidean). Used in WHERE filters and especially in ORDER BY for top-N similarity.

Question

Why exclude the vector path from the regular indexing policy?

Answer

Cosmos's default index would try to index every float in the array as a separate property, which is wasteful and unhelpful for similarity queries. Add the vector path to `excludedPaths` of the regular index, and add it separately to `vectorIndexes` for proper vector indexing.

Question

How does hybrid search in Cosmos DB work?

Answer

A single SQL query combines metadata predicates (using the regular index) with VectorDistance ranking (using the vector index). The metadata filter narrows the candidate set first, then vectors rank the survivors. Faster and cheaper than a separate vector store + post-filter pipeline.

Knowledge check

Mira stores 1.2 million product-image embeddings per tenant in Cosmos. Each tenant has its own partition. Top-5 similarity queries are slow at this scale. Which vector index should she configure?

Knowledge Check

Theo's vector search query is `SELECT TOP 5 c.id, VectorDistance(c.embedding, @vec) FROM c ORDER BY VectorDistance(c.embedding, @vec)`. The container is partitioned by `/tenantId` and there are 12 tenants. He sees high RU cost and high latency. What's the simplest fix?

Knowledge Check

Lin's container holds 30,000 docs per partition with 1536-dim embeddings. Storage is the limiting factor β€” the embedding alone is 6 KB per doc, and there's other content too. Which option offers the best storage-to-recall trade-off without sacrificing useful results?