Domain 1: Plan and Manage an Azure AI Solution (Module 3 of 8)

Retrieval, Indexing & Agent Memory

Vector search, hybrid search, semantic search, and how agents remember conversations. Learn the architecture decisions behind retrieval and memory before you build anything.

Choosing a retrieval strategy

Simple explanation

Retrieval is how your AI finds the right information before answering a question.

Imagine you’re studying for an exam with 500 pages of notes. You could: search for exact words (keyword search), search by meaning (semantic search), search by “vibes”, finding notes that feel similar even if the words are different (vector search), or combine all three (hybrid search).

Each method has trade-offs. The exam tests whether you know when to pick which one.
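
To make the difference between matching words and matching meaning concrete, here is a minimal sketch. The three-dimensional vectors are hand-picked, hypothetical values used only for illustration; a real embedding model produces vectors with hundreds or thousands of dimensions, and a real keyword engine does far more than the word overlap shown here.

```python
import math


def keyword_match(query: str, doc: str) -> bool:
    """Exact-word matching: True only if a query word appears in the document."""
    return bool(set(query.lower().split()) & set(doc.lower().split()))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not output from any real model.
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

# Keyword search misses the synonym entirely.
print(keyword_match("car", "vehicle maintenance guide"))  # False

# Vector comparison scores the synonym high and the unrelated word low.
print(cosine_similarity(toy_embeddings["car"], toy_embeddings["vehicle"]))
print(cosine_similarity(toy_embeddings["car"], toy_embeddings["banana"]))
```

The keyword check fails because no word is shared, while the cosine score for "car" and "vehicle" is close to 1 and the score for "car" and "banana" is close to 0.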

The four search methods

Four retrieval methods in Azure AI Search

| Feature | Keyword | Semantic | Vector | Hybrid |
| --- | --- | --- | --- | --- |
| How it works | Matches exact words (BM25) | Understands meaning using a re-ranker | Compares embeddings in vector space | Combines keyword + vector + re-ranker |
| Strengths | Fast, precise for exact terms | Understands synonyms and intent | Finds conceptually similar content | Best of all worlds |
| Weaknesses | Misses synonyms ('car' won't find 'vehicle') | Slower than keyword alone | Needs embedding pipeline | Most complex to configure |
| Best for | Product codes, error IDs, exact names | Natural language questions | Finding similar documents | Production RAG applications |
| Azure AI Search feature | Full-text search (default) | Semantic ranker (add-on) | Vector search (configure embeddings) | Hybrid search (combine all) |

Exam tip: Hybrid search is usually the right answer

When the exam asks “which search method should you use for a RAG application?” and the scenario doesn’t have a specific constraint, hybrid search is almost always correct. It combines the precision of keyword search with the semantic understanding of vector search, plus re-ranking for relevance.

Only pick a single method when the scenario explicitly constrains you (e.g., “exact product SKU lookup” = keyword, “find conceptually similar research papers” = vector).
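
To see how a keyword ranking and a vector ranking can be merged into one hybrid result list, here is a minimal sketch of Reciprocal Rank Fusion (RRF), the technique Azure AI Search documents for merging hybrid results. The document IDs are hypothetical, and `k=60` is a commonly used constant that is assumed here; the service's real scoring has more moving parts than this.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically by ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the same query, best match first.
keyword_ranking = ["doc_sku", "doc_faq", "doc_guide"]
vector_ranking = ["doc_faq", "doc_guide", "doc_blog"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['doc_faq', 'doc_guide', 'doc_sku', 'doc_blog']
```

Documents that appear in both lists rise to the top, which is why hybrid search tends to be robust: an exact-term hit and a conceptual hit reinforce each other.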

Indexing strategies

Before you can search, you need to index your content. Key decisions:

| Decision | Options | Impact |
| --- | --- | --- |
| Chunking strategy | Fixed-size, paragraph, semantic, document | Affects retrieval precision: too big and you get noise, too small and you lose context |
| Embedding model | text-embedding-3-small, text-embedding-ada-002, custom | Affects vector search quality and cost |
| Metadata | Title, source URL, date, section headings | Enables filtering and improves citation quality |
| Refresh frequency | Real-time, scheduled, on-change | Balances freshness against indexing cost |

Real-world example: NeuralMed's indexing strategy

NeuralMed indexes 10,000 medical articles for their patient chatbot:

  • Chunking: Paragraph-level (medical information needs context; a sentence alone is often meaningless)
  • Embedding: text-embedding-3-small (good accuracy, lower cost than text-embedding-3-large)
  • Metadata: Article title, publication date, medical specialty, source journal
  • Refresh: Weekly batch (medical literature doesn’t change hourly)
  • Search type: Hybrid (patients ask natural-language questions, but drug names need exact match)
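
The paragraph-level chunking and metadata choices in this example can be sketched as follows. This is an illustration under stated assumptions, not NeuralMed's pipeline: the function name, the field names, and the sample values are all hypothetical, and the embedding step is left out.

```python
import re


def chunk_by_paragraph(article_text, metadata):
    """Split an article on blank lines and attach metadata to every chunk.

    `metadata` must include an 'article_id' key, used to build chunk IDs.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", article_text) if p.strip()]
    return [
        {"chunk_id": f"{metadata['article_id']}-{i}", "content": p, **metadata}
        for i, p in enumerate(paragraphs)
    ]


# Placeholder article text and hypothetical metadata values.
article = "First paragraph of the article.\n\nSecond paragraph of the article."
metadata = {
    "article_id": "article-001",
    "title": "Sample article",
    "publication_date": "2024-01-15",
    "specialty": "cardiology",
    "journal": "Sample Journal",
}

chunks = chunk_by_paragraph(article, metadata)
for chunk in chunks:
    print(chunk["chunk_id"], "|", chunk["specialty"], "|", chunk["content"])
```

Because every chunk carries the article's metadata, a search can filter by specialty or date, and an answer can cite the title and journal of the chunk it used.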

Agent memory and knowledge integration

Agents draw on three sources of context: two kinds of memory, plus knowledge.

| Type | What It Stores | Scope |
| --- | --- | --- |
| Conversation memory | Chat history within a session | Per-thread (one conversation) |
| Persistent memory | Facts learned across conversations | Per-user or per-agent |
| Knowledge | External data sources the agent can search | Shared across all conversations |

Tool and knowledge integration for agents

| Integration Type | Service | Use Case |
| --- | --- | --- |
| Knowledge stores | Foundry IQ, Azure AI Search | Agent searches company docs to answer questions |
| Function calling | Custom functions, APIs | Agent calls external systems (CRM, database, calendar) |
| Code interpreter | Built-in Foundry tool | Agent writes and runs Python code to analyse data |
| Web search | Bing grounding | Agent searches the web for current information |

Exam tip: Memory vs knowledge

The exam distinguishes between memory (what the agent remembers from conversations) and knowledge (external data the agent can search). A common trap:

  • “The agent needs to remember the user’s preferences across sessions” → Persistent memory
  • “The agent needs to answer questions about company policies” → Knowledge integration (Foundry IQ or Azure AI Search)

Memory is about the conversation. Knowledge is about the data.
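
The three scopes can be sketched in a few lines. This is a conceptual illustration only: the class, method names, and sample data are hypothetical and are not the Foundry agent API, and the knowledge lookup is a naive word match standing in for a real search index.

```python
from collections import defaultdict


class AgentMemory:
    def __init__(self, knowledge_docs):
        self.threads = defaultdict(list)     # conversation memory: per thread
        self.user_facts = defaultdict(dict)  # persistent memory: per user
        self.knowledge = knowledge_docs      # knowledge: shared by everyone

    def add_message(self, thread_id, role, text):
        self.threads[thread_id].append((role, text))

    def remember(self, user_id, key, value):
        self.user_facts[user_id][key] = value

    def build_context(self, user_id, thread_id, query):
        """Collect what the agent can see for one turn of one conversation."""
        words = query.lower().split()
        hits = [d for d in self.knowledge if any(w in d.lower() for w in words)]
        return {
            "history": list(self.threads.get(thread_id, [])),
            "user_facts": dict(self.user_facts.get(user_id, {})),
            "knowledge_hits": hits,
        }


memory = AgentMemory(["Annual leave policy document", "Expense claims policy document"])
memory.add_message("thread-1", "user", "Please keep replies formal.")
memory.remember("user-42", "preferred_tone", "formal")

# A new thread for the same user: the history starts fresh, the facts persist.
context = memory.build_context("user-42", "thread-2", "leave")
print(context)
```

In the new thread the history is empty, the stored preference is still available, and the policy document is found regardless of which user or thread asks.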

Key terms

Question

What is vector search?

Answer

A search method that converts text into numerical vectors (embeddings) and finds similar content by measuring distance in vector space. Finds conceptually similar results even when words differ.

Question

What is hybrid search?

Answer

A search strategy that combines keyword search (BM25) with vector search, then applies semantic re-ranking to produce the most relevant results. Recommended for most production RAG applications.

Question

What is chunking in the context of indexing?

Answer

The process of splitting documents into smaller segments (chunks) for indexing. Chunk size affects retrieval quality: too large captures noise, too small loses context. Common strategies: fixed-size, paragraph, or semantic chunking.

Question

What is conversation memory in an agent?

Answer

The chat history stored within a single conversation thread. It allows the agent to reference earlier messages in the same session. Scope is per-thread, so starting a new conversation starts fresh.

Question

What is Foundry IQ?

Answer

Foundry's built-in knowledge integration for agents. Upload documents and Foundry IQ automatically indexes them, making them searchable by agents without configuring Azure AI Search manually.

Knowledge check

Atlas Financial needs to search 50,000 regulatory documents. Compliance officers type natural-language questions like 'What are the capital requirements for commercial lending?' but also search for specific regulation numbers like 'Basel III Section 4.2'. Which search method should they use?

MediaForge's content agent needs to remember each client's brand guidelines across multiple conversations over weeks. Which type of memory should they implement?