Cosmos DB for NoSQL: Connect + Query with the SDK

Why Cosmos DB shows up so much on AI-200

Simple explanation

Cosmos DB is Azure’s fast, distributed JSON database. AI apps love it because the data is documents — chat history, agent memory, embeddings, prompts, scratchpads — and each document might have a slightly different shape. Cosmos handles that without crying about schemas.

It’s also globally distributed, scales horizontally, and has built-in vector search now. So one database can hold the chat log, the embeddings, the user profile, AND power similarity search.

The exam expects you to know how to connect, run queries, pick a partition key, and use the SDK without burning through Request Units.

The connection model

from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential

# Recommended: managed identity / DefaultAzureCredential
client = CosmosClient(
    url="https://roo-cosmos.documents.azure.com:443/",
    credential=DefaultAzureCredential(),
)

# Legacy: account key (works but discouraged for production)
# client = CosmosClient(url, credential=ACCOUNT_KEY)

database = client.get_database_client("inventory")
container = database.get_container_client("items")

Three things to know about the client:

Property	Why it matters
`CosmosClient` is thread-safe and expensive to construct	Create one per process, reuse. Don’t `new CosmosClient()` per request.
Connection mode: Direct (TCP) or Gateway (HTTPS)	Direct is faster and the default for .NET; Gateway works behind restrictive firewalls.
Preferred regions	Set `preferred_locations` so the SDK reads from the closest replica. Halves p99 latency in multi-region accounts.

Point reads, single-partition queries, cross-partition queries

This is where AI-200 separates careful candidates from careless ones. The same operation costs vastly different amounts of Request Units (RUs) depending on which path you take.

The single most exam-relevant fact about Cosmos: point reads cost ~1 RU; cross-partition queries can cost orders of magnitude more.
Feature	Point read	Single-partition query	Cross-partition query
What it is	Read by `id` AND partition key	Query that filters on the partition key	Query that doesn't filter on the partition key
Typical cost (1 KB doc)	~1 RU	~2.5+ RUs (depends on selectivity)	Multiplies — fan-out across partitions
Latency	~1 ms	~5 ms	~tens of ms, depends on partitions
When to use	You know id + partition key — always prefer this	Filter for many docs in one tenant/user	Last resort or rare admin queries

# Point read — fastest, cheapest
item = container.read_item(item="msg-789", partition_key="user-42")

# Single-partition query
results = container.query_items(
    query="SELECT * FROM c WHERE c.userId = @uid AND c.kind = 'message'",
    parameters=[{"name": "@uid", "value": "user-42"}],
    partition_key="user-42",  # ← keeps this single-partition
)

# Cross-partition query (avoid in hot paths)
results = container.query_items(
    query="SELECT * FROM c WHERE c.kind = 'message' AND c.createdAt > @t",
    parameters=[{"name": "@t", "value": "2026-05-01T00:00:00Z"}],
    enable_cross_partition_query=True,
)

Exam tip: 'this query is slow / expensive — what do I change?'

Almost always: the query lacks a partition-key filter. Check the WHERE clause. If it doesn’t reference the partition-key path, the query fans out to every physical partition. Either add the filter, or rethink the partition-key choice if you can’t.

Variants of this question dominate Cosmos-DB exam content. Whenever you see “high RU consumption” or “high latency on the read path”, look for the missing partition key first.

Partition keys — the most important design choice

Every Cosmos container is sliced into logical partitions by a single property path you choose at creation time. Get the choice wrong and you’ll feel it forever.

Good partition key	Bad partition key
High cardinality (millions of distinct values)	Low cardinality (a handful of distinct values)
Even access distribution	One value gets all the traffic (“hot partition”)
Filterable in most queries	Rarely appears in WHERE clauses
Examples: `/userId`, `/tenantId`, `/sessionId`	Examples: `/region`, `/type`, `/status`

Common AI patterns:

Container	Partition key	Why
Chat sessions	`/sessionId`	Each session is a hot read for its user; balanced load
Agent state	`/userId` (or `/tenantId` if multi-tenant)	All a user’s state pulled with one query
Embeddings keyed to documents	`/sourceDocumentId`	Vector docs co-locate with the source they came from
Audit log	`/tenantId` + a date suffix	Avoids ever-growing single-partition

A logical partition is capped at 20 GB of data plus 10,000 RU/s of provisioned throughput. If you’d cross either limit, your partition key isn’t selective enough.

Querying — the SQL surface area

Cosmos query language is SQL-flavoured, with extensions for working with JSON (arrays, nested objects, system functions).

-- Plain projection
SELECT c.id, c.userId, c.title FROM c WHERE c.kind = 'session'

-- Nested projection and filter
SELECT c.id, c.user.name AS userName
FROM c
WHERE c.user.tier IN ('gold', 'platinum')

-- Joining into an array property
SELECT c.id, m.content
FROM c JOIN m IN c.messages
WHERE m.role = 'assistant' AND c.userId = 'user-42'

-- System functions
SELECT VALUE COUNT(1) FROM c WHERE c.createdAt > '2026-05-01'
SELECT VALUE AVG(c.tokens) FROM c WHERE c.userId = 'user-42'

You also get string functions (STARTSWITH, CONTAINS), array functions (ARRAY_LENGTH, ARRAY_CONTAINS), and the vector function VectorDistance for similarity search (separate module).

Bulk and transactional operations

For high-throughput ingestion (a common AI-data-pipeline pattern), each SDK provides bulk APIs:

# Python — bulk by passing a partition key + list of operations
operations = [
    ("create", (msg,))
    for msg in batch
]
container.execute_item_batch(operations, partition_key="session-42")

Two important constraints:

Transactional batches are limited to one partition — you can’t atomically write across users.
Bulk creates are not transactional — they execute concurrently and report per-operation success.

Authentication: keys, RBAC, or Microsoft Entra

Mode	Pattern	Use
Account keys	`CosmosClient(url, credential=KEY)`	Quick demos, legacy code
Resource tokens	Server vends scoped, time-bound tokens to clients	Mobile / web apps to avoid distributing master keys
Microsoft Entra (data plane RBAC)	`CosmosClient(url, credential=DefaultAzureCredential())` plus a built-in role on the account	Recommended for production. No keys, audit logs every operation.

For Entra-backed access:

# Grant the Container App's managed identity Data Reader on the database
az cosmosdb sql role assignment create \
  --account-name roo-cosmos \
  --resource-group roo-prod \
  --scope "/dbs/inventory" \
  --principal-id $PRINCIPAL_ID \
  --role-definition-id "00000000-0000-0000-0000-000000000001"   # Data Reader