Domain 3 β€” Module 6 of 7 86%
27 of 28 overall
Domain 3: Implement AI Capabilities in Database Solutions Free ⏱ ~12 min read

Hybrid Search and Reciprocal Rank Fusion

Combine full-text and vector search for superior results β€” implement hybrid search patterns and use Reciprocal Rank Fusion (RRF) to merge ranked lists intelligently.

The best of both worlds

Simple explanation

Imagine two talent scouts looking for the best footballer.

Scout A ranks players by statistics (goals, assists, pass accuracy). Scout B ranks players by watching them play (game intelligence, creativity, leadership). Each scout’s list is good, but combining them gives a better list than either alone. Hybrid search does the same: full-text search finds keyword matches, vector search finds semantic matches, and Reciprocal Rank Fusion (RRF) intelligently merges both lists into one final ranking.

How hybrid search works

User Query: "comfortable running shoes for flat feet"
                    β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό                       β–Ό
  Full-Text Search         Vector Search
  (keyword match)          (semantic match)
        β”‚                       β”‚
  1. Nike Flat Support     1. Brooks Adrenaline
  2. ASICS Kayano          2. Nike Flat Support
  3. Brooks Stability      3. New Balance 860
  4. Saucony Guide         4. ASICS Gel-Kayano
                                β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β–Ό
           Reciprocal Rank Fusion
                    β”‚
  1. Nike Flat Support  (in both lists!)
  2. ASICS Kayano/Gel-Kayano (in both)
  3. Brooks Adrenaline  (top in vector)
  4. Brooks Stability   (keyword match)

Implementing hybrid search in SQL

-- Step 1: Full-text search results with ranking
SELECT ProductId, ROW_NUMBER() OVER (ORDER BY ft.RANK DESC) AS FTRank
INTO #FullTextResults
FROM CONTAINSTABLE(Products, Description, @searchTerms) ft;

-- Step 2: Vector search results with ranking
SELECT TOP (50) WITH APPROXIMATE
    ProductId,
    ROW_NUMBER() OVER (ORDER BY VECTOR_DISTANCE('cosine', DescriptionEmbedding, @queryVector)) AS VecRank
INTO #VectorResults
FROM Products
ORDER BY VECTOR_DISTANCE('cosine', DescriptionEmbedding, @queryVector);

-- Step 3: Reciprocal Rank Fusion
WITH RRF AS (
    SELECT
        COALESCE(ft.ProductId, vs.ProductId) AS ProductId,
        ISNULL(1.0 / (60 + ft.FTRank), 0) +
        ISNULL(1.0 / (60 + vs.VecRank), 0) AS RRFScore
    FROM #FullTextResults ft
    FULL OUTER JOIN #VectorResults vs ON ft.ProductId = vs.ProductId
)
SELECT TOP 10 r.ProductId, p.Name, r.RRFScore
FROM RRF r
INNER JOIN Products p ON r.ProductId = p.ProductId
ORDER BY r.RRFScore DESC;

Reciprocal Rank Fusion explained

The RRF formula for each result:

RRF Score = sum of 1 / (k + rank) across all ranked lists

Where k is a constant (typically 60) that dampens the impact of high ranks.

Example calculation

ProductFull-Text RankVector RankRRF Score
Nike Flat Support121/(60+1) + 1/(60+2) = 0.0164 + 0.0161 = 0.0325
ASICS Kayano241/(60+2) + 1/(60+4) = 0.0161 + 0.0156 = 0.0317
Brooks AdrenalineNot found10 + 1/(60+1) = 0.0164
Saucony Guide4Not found1/(60+4) + 0 = 0.0156

Results in BOTH lists score highest. Nike Flat Support wins because it ranked well in both searches.

Exam tip: Why k = 60?

The constant k controls how much the rank position matters. With k = 60:

  • Rank 1 gets score 1/61 = 0.0164
  • Rank 2 gets score 1/62 = 0.0161
  • The difference between rank 1 and rank 10 is small (0.0164 vs 0.0143)

This means RRF focuses more on which lists a result appears in rather than its exact position. A result at rank 5 in both lists beats a result at rank 1 in only one list. This is the correct behaviour for fusion β€” appearing in multiple search modalities is a stronger signal than ranking first in just one.

Evaluating hybrid search performance

MetricWhat to MeasureHybrid vs Single
Precision@KAre the top K results relevant?Hybrid usually +10-20% over vector alone
Recall@KAre all relevant results found?Hybrid usually +5-15% over full-text alone
MRR (Mean Reciprocal Rank)How high is the first relevant result?Hybrid typically best
LatencyTotal query time (both searches + fusion)~2x single search (run in parallel to minimise)
Scenario: Leo optimises SearchWave's hybrid search

Leo at SearchWave compares three search approaches on 1,000 test queries:

  • Full-text only: Precision@10 = 62%, MRR = 0.71
  • Vector only: Precision@10 = 74%, MRR = 0.78
  • Hybrid (RRF): Precision@10 = 83%, MRR = 0.89

Hybrid search improves precision by 9 points over vector alone and 21 points over full-text. The latency cost is minimal because Leo runs both searches in parallel (total time is the max of the two, not the sum).

Question

What is Reciprocal Rank Fusion (RRF)?

Click or press Enter to reveal answer

Answer

A score fusion algorithm that combines multiple ranked lists. Each result gets a score of 1/(k + rank) from each list it appears in. Results appearing in multiple lists score highest. The constant k (typically 60) dampens rank position impact, emphasising cross-list presence over exact rank.

Click to flip back

Question

Why does hybrid search outperform either search type alone?

Click or press Enter to reveal answer

Answer

Full-text excels at exact keyword matches. Vector excels at semantic meaning. Each misses what the other catches. Hybrid combines both strengths: a query for 'refund policy' finds documents containing those exact words (full-text) AND documents about 'return process' or 'money back guarantee' (vector).

Click to flip back

Question

What does the k constant control in the RRF formula?

Click or press Enter to reveal answer

Answer

k dampens the impact of exact rank position. With k=60, the difference between rank 1 and rank 10 is small. This makes RRF focus on WHICH lists a result appears in rather than its exact position. Appearing in multiple search modes is a stronger relevance signal than ranking first in one mode.

Click to flip back

Knowledge Check

Leo at SearchWave runs hybrid search but notices that some very relevant products only appear in vector results (not full-text) and get low hybrid scores. What should he adjust?

Next up: RAG with SQL: Prompt, Process, Respond β€” build retrieval-augmented generation directly in your SQL database.