Domain 2 β€” Module 1 of 8 13%
9 of 27 overall
Domain 2: Develop AI solutions by using Azure data management services Free ⏱ ~13 min read

Cosmos DB for NoSQL: Connect + Query with the SDK

How an AI app reads and writes Cosmos DB documents using the .NET, Python, and Java SDKs. Connection patterns, partition keys, point reads vs queries, and the SQL query syntax that makes the exam fair.

Why Cosmos DB shows up so much on AI-200

Simple explanation

Cosmos DB is Azure’s fast, distributed JSON database. AI apps love it because the data is documents β€” chat history, agent memory, embeddings, prompts, scratchpads β€” and each document might have a slightly different shape. Cosmos handles that without crying about schemas.

It’s also globally distributed, scales horizontally, and has built-in vector search now. So one database can hold the chat log, the embeddings, the user profile, AND power similarity search.

The exam expects you to know how to connect, run queries, pick a partition key, and use the SDK without burning through Request Units.

The connection model

from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential

# Recommended: managed identity / DefaultAzureCredential
client = CosmosClient(
    url="https://roo-cosmos.documents.azure.com:443/",
    credential=DefaultAzureCredential(),
)

# Legacy: account key (works but discouraged for production)
# client = CosmosClient(url, credential=ACCOUNT_KEY)

database = client.get_database_client("inventory")
container = database.get_container_client("items")

Three things to know about the client:

PropertyWhy it matters
CosmosClient is thread-safe and expensive to constructCreate one per process, reuse. Don’t new CosmosClient() per request.
Connection mode: Direct (TCP) or Gateway (HTTPS)Direct is faster and the default for .NET; Gateway works behind restrictive firewalls.
Preferred regionsSet preferred_locations so the SDK reads from the closest replica. Halves p99 latency in multi-region accounts.

Point reads, single-partition queries, cross-partition queries

This is where AI-200 separates careful candidates from careless ones. The same operation costs vastly different amounts of Request Units (RUs) depending on which path you take.

The single most exam-relevant fact about Cosmos: point reads cost ~1 RU; cross-partition queries can cost orders of magnitude more.
FeaturePoint readSingle-partition queryCross-partition query
What it isRead by `id` AND partition keyQuery that filters on the partition keyQuery that doesn't filter on the partition key
Typical cost (1 KB doc)~1 RU~2.5+ RUs (depends on selectivity)Multiplies β€” fan-out across partitions
Latency~1 ms~5 ms~tens of ms, depends on partitions
When to useYou know id + partition key β€” always prefer thisFilter for many docs in one tenant/userLast resort or rare admin queries
# Point read β€” fastest, cheapest
item = container.read_item(item="msg-789", partition_key="user-42")

# Single-partition query
results = container.query_items(
    query="SELECT * FROM c WHERE c.userId = @uid AND c.kind = 'message'",
    parameters=[{"name": "@uid", "value": "user-42"}],
    partition_key="user-42",  # ← keeps this single-partition
)

# Cross-partition query (avoid in hot paths)
results = container.query_items(
    query="SELECT * FROM c WHERE c.kind = 'message' AND c.createdAt > @t",
    parameters=[{"name": "@t", "value": "2026-05-01T00:00:00Z"}],
    enable_cross_partition_query=True,
)
Exam tip: 'this query is slow / expensive β€” what do I change?'

Almost always: the query lacks a partition-key filter. Check the WHERE clause. If it doesn’t reference the partition-key path, the query fans out to every physical partition. Either add the filter, or rethink the partition-key choice if you can’t.

Variants of this question dominate Cosmos-DB exam content. Whenever you see β€œhigh RU consumption” or β€œhigh latency on the read path”, look for the missing partition key first.

Partition keys β€” the most important design choice

Every Cosmos container is sliced into logical partitions by a single property path you choose at creation time. Get the choice wrong and you’ll feel it forever.

Good partition keyBad partition key
High cardinality (millions of distinct values)Low cardinality (a handful of distinct values)
Even access distributionOne value gets all the traffic (β€œhot partition”)
Filterable in most queriesRarely appears in WHERE clauses
Examples: /userId, /tenantId, /sessionIdExamples: /region, /type, /status

Common AI patterns:

ContainerPartition keyWhy
Chat sessions/sessionIdEach session is a hot read for its user; balanced load
Agent state/userId (or /tenantId if multi-tenant)All a user’s state pulled with one query
Embeddings keyed to documents/sourceDocumentIdVector docs co-locate with the source they came from
Audit log/tenantId + a date suffixAvoids ever-growing single-partition

A logical partition is capped at 20 GB of data plus 10,000 RU/s of provisioned throughput. If you’d cross either limit, your partition key isn’t selective enough.

Querying β€” the SQL surface area

Cosmos query language is SQL-flavoured, with extensions for working with JSON (arrays, nested objects, system functions).

-- Plain projection
SELECT c.id, c.userId, c.title FROM c WHERE c.kind = 'session'

-- Nested projection and filter
SELECT c.id, c.user.name AS userName
FROM c
WHERE c.user.tier IN ('gold', 'platinum')

-- Joining into an array property
SELECT c.id, m.content
FROM c JOIN m IN c.messages
WHERE m.role = 'assistant' AND c.userId = 'user-42'

-- System functions
SELECT VALUE COUNT(1) FROM c WHERE c.createdAt > '2026-05-01'
SELECT VALUE AVG(c.tokens) FROM c WHERE c.userId = 'user-42'

You also get string functions (STARTSWITH, CONTAINS), array functions (ARRAY_LENGTH, ARRAY_CONTAINS), and the vector function VectorDistance for similarity search (separate module).

Bulk and transactional operations

For high-throughput ingestion (a common AI-data-pipeline pattern), each SDK provides bulk APIs:

# Python β€” bulk by passing a partition key + list of operations
operations = [
    ("create", (msg,))
    for msg in batch
]
container.execute_item_batch(operations, partition_key="session-42")

Two important constraints:

  • Transactional batches are limited to one partition β€” you can’t atomically write across users.
  • Bulk creates are not transactional β€” they execute concurrently and report per-operation success.

Authentication: keys, RBAC, or Microsoft Entra

ModePatternUse
Account keysCosmosClient(url, credential=KEY)Quick demos, legacy code
Resource tokensServer vends scoped, time-bound tokens to clientsMobile / web apps to avoid distributing master keys
Microsoft Entra (data plane RBAC)CosmosClient(url, credential=DefaultAzureCredential()) plus a built-in role on the accountRecommended for production. No keys, audit logs every operation.

For Entra-backed access:

# Grant the Container App's managed identity Data Reader on the database
az cosmosdb sql role assignment create \
  --account-name roo-cosmos \
  --resource-group roo-prod \
  --scope "/dbs/inventory" \
  --principal-id $PRINCIPAL_ID \
  --role-definition-id "00000000-0000-0000-0000-000000000001"   # Data Reader

Key terms

Question

What is a Cosmos DB point read, and why is it preferred?

Click or press Enter to reveal answer

Answer

A read by `id` AND partition key β€” both keys known. Costs ~1 RU and returns in ~1 ms. Always prefer point reads over single-partition queries when you have both keys; the SDK automatically optimises this case.

Click to flip back

Question

What's the difference between a single-partition and cross-partition query in Cosmos DB?

Click or press Enter to reveal answer

Answer

A single-partition query filters on the partition-key path, so the SDK targets exactly one physical partition. A cross-partition query has no partition-key filter and fans out to all partitions β€” slower and more expensive. Always include the partition key in WHERE when possible.

Click to flip back

Question

Why is `CosmosClient` expensive to construct?

Click or press Enter to reveal answer

Answer

It opens connections to Cosmos endpoints and caches metadata about the account, regions, and partitions. Construct once per process and reuse. Constructing per request adds ~100 ms latency and exhausts socket pools under load.

Click to flip back

Question

What are the limits of a Cosmos DB logical partition?

Click or press Enter to reveal answer

Answer

20 GB of data and 10,000 RU/s of provisioned throughput per partition-key value. If a partition key value would exceed either limit, it's not selective enough β€” choose a higher-cardinality property.

Click to flip back

Question

Which authentication mode is recommended for production Cosmos DB access?

Click or press Enter to reveal answer

Answer

Microsoft Entra ID with data-plane RBAC (DefaultAzureCredential / managed identity). Account keys still work but distribute master credentials and lack per-operation audit.

Click to flip back

Knowledge check

Knowledge Check

Mira's Container App reads chat sessions from a Cosmos container. The query `SELECT * FROM c WHERE c.kind = 'session'` is the slowest in the app. The container is partitioned by `/userId`. What's the most likely cause?

Knowledge Check

Theo retrieves a single chat message by its `id`, and the session is the partition key. He uses `container.query_items` with the SQL `SELECT FROM c WHERE c.id matches the message id`, plus the option `enable_cross_partition_query=True`. What's the better approach?

Knowledge Check

Lin is choosing a partition key for a multi-tenant chat-history container that holds 1 message per document. Most queries filter by tenant. Which is the best choice?