Embeddings: Design, Chunk, and Generate

Turning text into searchable numbers

Simple explanation

Imagine every product in a store as a point on a map.

Similar products are close together: all the red shoes are in one area, all the running shoes in another, and red running shoes are where those areas overlap. That map is what embeddings create — except instead of physical location, each product is represented as a list of numbers (a vector) where similar items have similar numbers.

To create this map, you need to: (1) decide what information matters (product name? description? reviews?), (2) break long text into manageable pieces (chunks), and (3) send each piece to an AI model that converts it into a vector. This module covers all three steps.

Which columns to embed

Not every column should be embedded. Choose columns that contain semantic meaning users would search for:

Column Type	Embed?	Why
Product description	Yes	Rich semantic content users search with natural language
Customer review text	Yes	Contains opinions and experiences users want to find
Technical documentation	Yes	Long-form content with searchable concepts
Product name	Maybe	Short text — useful if combined with description
Price, quantity	No	Numeric — use standard SQL comparison operators
Status codes, IDs	No	Categorical — use exact match, not semantic search
Timestamps	No	Temporal — use date range queries

Combine related columns for richer embeddings:

-- Concatenate meaningful columns before embedding
SELECT
    ProductId,
    Name + ' | ' + Category + ' | ' + Description AS EmbeddingInput
FROM Products;

Chunking: breaking long text into pieces

Embedding models have token limits (typically 512-8,192 tokens). Long documents must be split into chunks.

Chunking strategies

Strategy	How It Works	Pros	Cons
Fixed-size	Split every N characters/tokens	Simple, predictable	May split mid-sentence
Sentence-based	Split on sentence boundaries	Preserves meaning	Variable chunk sizes
Paragraph-based	Split on paragraph breaks	Natural boundaries	Paragraphs vary in length
Semantic	Split where topics change	Best meaning preservation	Most complex to implement
Overlapping	Each chunk includes some text from adjacent chunks	Preserves cross-boundary context	More chunks to store and search

SQL Server 2025: AI_GENERATE_CHUNKS

-- Built-in chunking function
SELECT chunk, chunk_order, chunk_offset, chunk_length
FROM AI_GENERATE_CHUNKS(
    SOURCE = @text,
    CHUNK_TYPE = FIXED,
    CHUNK_SIZE = 500        -- characters (not tokens)
);

Exam tip: Chunk size affects search quality

Smaller chunks (256-512 characters) give more precise search results but lose broader context. Larger chunks (1024-2048 characters) preserve context but may dilute the relevance signal. The sweet spot depends on your content:

FAQ entries, short descriptions: no chunking needed (already small)
Product documentation: 500-character chunks with overlap
Legal documents: paragraph-based chunking to preserve clause boundaries
Conversations: message-level chunking (each message = one chunk)

The exam may present a scenario where search results are too broad — the fix is usually smaller chunks.

Generating embeddings

Using AI_GENERATE_EMBEDDINGS (SQL Server 2025)

-- Generate embeddings using a registered model
SELECT
    ProductId,
    AI_GENERATE_EMBEDDINGS(Description USE MODEL EmbeddingModel) AS DescriptionVector
FROM Products;

-- Store embeddings in a vector column
ALTER TABLE Products ADD DescriptionEmbedding VECTOR(1536);

UPDATE Products
SET DescriptionEmbedding = AI_GENERATE_EMBEDDINGS(Description USE MODEL EmbeddingModel);

Maintaining embeddings as data changes

When source data changes, embeddings become stale. Choose a maintenance method based on your data change patterns:

Embedding maintenance methods — choose based on volume, latency, and architecture
Method	How It Detects Changes	Latency	Best For
Table triggers	AFTER INSERT/UPDATE trigger regenerates embedding	Immediate (synchronous)	Low-volume tables where latency matters
Change Tracking	Lightweight change flags on rows	Near-real-time (polling)	Medium-volume, polling-based sync
Azure Functions SQL trigger	Serverless function triggered by row changes	Seconds to minutes	Serverless architecture, event-driven
CDC (Change Data Capture)	Full change history in change tables	Near-real-time	When you need the full change history, not just current state
CES (Change Event Streaming)	Streams changes to Event Hubs	Real-time streaming	High-volume, event-driven architectures
Azure Logic Apps	Workflow triggered by schedule or event	Minutes to hours	Low-code workflow, scheduled batch updates
Microsoft Foundry	AI pipeline orchestration	Configurable	Complex AI pipelines with multiple models

Scenario: Leo's embedding pipeline at SearchWave

SearchWave’s product catalog has 500,000 products. About 1,000 products change daily (new listings, updated descriptions). Leo designs the embedding maintenance strategy:

Initial load: batch process all 500K products overnight using a SQL Agent job
Ongoing updates: Azure Functions with SQL trigger binding — when a product’s description changes, the function automatically regenerates its embedding
Why not triggers? Calling an external AI model synchronously inside a trigger would block the INSERT/UPDATE transaction. Azure Functions processes the embedding asynchronously.

Question

Why should you not embed numeric columns like Price or Quantity?

Click or press Enter to reveal answer

Answer

Embeddings capture semantic meaning in text. Numeric values like prices do not have semantic meaning — 29.99 and 30.00 are semantically identical but would produce different vectors. Use standard SQL comparison operators (=, >, <, BETWEEN) for numeric filtering.

Click to flip back

Question

What is the purpose of overlapping chunks?

Click or press Enter to reveal answer

Answer

Overlapping chunks include some text from adjacent chunks at the boundaries. This preserves context that would be lost if a concept spans two chunks. For example, with 50-token overlap, the last 50 tokens of chunk 1 are also the first 50 tokens of chunk 2.

Click to flip back

Question

Why should you avoid using triggers for embedding maintenance on high-volume tables?

Click or press Enter to reveal answer

Answer

Triggers run synchronously inside the transaction. Calling an external AI model from a trigger blocks the INSERT/UPDATE until the model responds — adding seconds of latency to every write. For high-volume tables, use async methods (Azure Functions, CES, CDC) instead.

Click to flip back

Knowledge Check

Priya at Vault Bank has a customer support database with 10 million tickets. Each ticket has a subject line and a 2,000-word description. She wants to enable semantic search. How should she prepare the data for embeddings?

Next up: Choose Your Search Strategy — compare full-text, vector, and hybrid search to pick the right approach for your data.