Cosmos DB for NoSQL: Connect + Query with the SDK
How an AI app reads and writes Cosmos DB documents using the .NET, Python, and Java SDKs. Connection patterns, partition keys, point reads vs queries, and the SQL query syntax that makes the exam fair.
Why Cosmos DB shows up so much on AI-200
Cosmos DB is Azureβs fast, distributed JSON database. AI apps love it because the data is documents β chat history, agent memory, embeddings, prompts, scratchpads β and each document might have a slightly different shape. Cosmos handles that without crying about schemas.
Itβs also globally distributed, scales horizontally, and has built-in vector search now. So one database can hold the chat log, the embeddings, the user profile, AND power similarity search.
The exam expects you to know how to connect, run queries, pick a partition key, and use the SDK without burning through Request Units.
The connection model
from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential
# Recommended: managed identity / DefaultAzureCredential
client = CosmosClient(
url="https://roo-cosmos.documents.azure.com:443/",
credential=DefaultAzureCredential(),
)
# Legacy: account key (works but discouraged for production)
# client = CosmosClient(url, credential=ACCOUNT_KEY)
database = client.get_database_client("inventory")
container = database.get_container_client("items")
Three things to know about the client:
| Property | Why it matters |
|---|---|
CosmosClient is thread-safe and expensive to construct | Create one per process, reuse. Donβt new CosmosClient() per request. |
| Connection mode: Direct (TCP) or Gateway (HTTPS) | Direct is faster and the default for .NET; Gateway works behind restrictive firewalls. |
| Preferred regions | Set preferred_locations so the SDK reads from the closest replica. Halves p99 latency in multi-region accounts. |
Point reads, single-partition queries, cross-partition queries
This is where AI-200 separates careful candidates from careless ones. The same operation costs vastly different amounts of Request Units (RUs) depending on which path you take.
| Feature | Point read | Single-partition query | Cross-partition query |
|---|---|---|---|
| What it is | Read by `id` AND partition key | Query that filters on the partition key | Query that doesn't filter on the partition key |
| Typical cost (1 KB doc) | ~1 RU | ~2.5+ RUs (depends on selectivity) | Multiplies β fan-out across partitions |
| Latency | ~1 ms | ~5 ms | ~tens of ms, depends on partitions |
| When to use | You know id + partition key β always prefer this | Filter for many docs in one tenant/user | Last resort or rare admin queries |
# Point read β fastest, cheapest
item = container.read_item(item="msg-789", partition_key="user-42")
# Single-partition query
results = container.query_items(
query="SELECT * FROM c WHERE c.userId = @uid AND c.kind = 'message'",
parameters=[{"name": "@uid", "value": "user-42"}],
partition_key="user-42", # β keeps this single-partition
)
# Cross-partition query (avoid in hot paths)
results = container.query_items(
query="SELECT * FROM c WHERE c.kind = 'message' AND c.createdAt > @t",
parameters=[{"name": "@t", "value": "2026-05-01T00:00:00Z"}],
enable_cross_partition_query=True,
)
Exam tip: 'this query is slow / expensive β what do I change?'
Almost always: the query lacks a partition-key filter. Check the WHERE clause. If it doesnβt reference the partition-key path, the query fans out to every physical partition. Either add the filter, or rethink the partition-key choice if you canβt.
Variants of this question dominate Cosmos-DB exam content. Whenever you see βhigh RU consumptionβ or βhigh latency on the read pathβ, look for the missing partition key first.
Partition keys β the most important design choice
Every Cosmos container is sliced into logical partitions by a single property path you choose at creation time. Get the choice wrong and youβll feel it forever.
| Good partition key | Bad partition key |
|---|---|
| High cardinality (millions of distinct values) | Low cardinality (a handful of distinct values) |
| Even access distribution | One value gets all the traffic (βhot partitionβ) |
| Filterable in most queries | Rarely appears in WHERE clauses |
Examples: /userId, /tenantId, /sessionId | Examples: /region, /type, /status |
Common AI patterns:
| Container | Partition key | Why |
|---|---|---|
| Chat sessions | /sessionId | Each session is a hot read for its user; balanced load |
| Agent state | /userId (or /tenantId if multi-tenant) | All a userβs state pulled with one query |
| Embeddings keyed to documents | /sourceDocumentId | Vector docs co-locate with the source they came from |
| Audit log | /tenantId + a date suffix | Avoids ever-growing single-partition |
A logical partition is capped at 20 GB of data plus 10,000 RU/s of provisioned throughput. If youβd cross either limit, your partition key isnβt selective enough.
Querying β the SQL surface area
Cosmos query language is SQL-flavoured, with extensions for working with JSON (arrays, nested objects, system functions).
-- Plain projection
SELECT c.id, c.userId, c.title FROM c WHERE c.kind = 'session'
-- Nested projection and filter
SELECT c.id, c.user.name AS userName
FROM c
WHERE c.user.tier IN ('gold', 'platinum')
-- Joining into an array property
SELECT c.id, m.content
FROM c JOIN m IN c.messages
WHERE m.role = 'assistant' AND c.userId = 'user-42'
-- System functions
SELECT VALUE COUNT(1) FROM c WHERE c.createdAt > '2026-05-01'
SELECT VALUE AVG(c.tokens) FROM c WHERE c.userId = 'user-42'
You also get string functions (STARTSWITH, CONTAINS), array functions (ARRAY_LENGTH, ARRAY_CONTAINS), and the vector function VectorDistance for similarity search (separate module).
Bulk and transactional operations
For high-throughput ingestion (a common AI-data-pipeline pattern), each SDK provides bulk APIs:
# Python β bulk by passing a partition key + list of operations
operations = [
("create", (msg,))
for msg in batch
]
container.execute_item_batch(operations, partition_key="session-42")
Two important constraints:
- Transactional batches are limited to one partition β you canβt atomically write across users.
- Bulk creates are not transactional β they execute concurrently and report per-operation success.
Authentication: keys, RBAC, or Microsoft Entra
| Mode | Pattern | Use |
|---|---|---|
| Account keys | CosmosClient(url, credential=KEY) | Quick demos, legacy code |
| Resource tokens | Server vends scoped, time-bound tokens to clients | Mobile / web apps to avoid distributing master keys |
| Microsoft Entra (data plane RBAC) | CosmosClient(url, credential=DefaultAzureCredential()) plus a built-in role on the account | Recommended for production. No keys, audit logs every operation. |
For Entra-backed access:
# Grant the Container App's managed identity Data Reader on the database
az cosmosdb sql role assignment create \
--account-name roo-cosmos \
--resource-group roo-prod \
--scope "/dbs/inventory" \
--principal-id $PRINCIPAL_ID \
--role-definition-id "00000000-0000-0000-0000-000000000001" # Data Reader
Key terms
Knowledge check
Mira's Container App reads chat sessions from a Cosmos container. The query `SELECT * FROM c WHERE c.kind = 'session'` is the slowest in the app. The container is partitioned by `/userId`. What's the most likely cause?
Theo retrieves a single chat message by its `id`, and the session is the partition key. He uses `container.query_items` with the SQL `SELECT FROM c WHERE c.id matches the message id`, plus the option `enable_cross_partition_query=True`. What's the better approach?
Lin is choosing a partition key for a multi-tenant chat-history container that holds 1 message per document. Most queries filter by tenant. Which is the best choice?