Domain 1 β€” Module 8 of 10 80%
8 of 28 overall
Domain 1: Design and Develop Database Solutions Free ⏱ ~10 min read

Fuzzy String Matching: Finding Similar Text

Use EDIT_DISTANCE and JARO_WINKLER functions to find strings that are close but not identical β€” catch typos, match misspelled names, and deduplicate messy data.

When exact matching is not enough

Simple explanation

Imagine searching for β€œJon Smith” in a customer database.

An exact search finds nothing β€” but the customer exists as β€œJohn Smith.” A fuzzy match says: β€œthose two strings are 90% similar β€” probably the same person.” That is fuzzy string matching: measuring how close two strings are, even when they are not identical.

SQL Server 2025 adds built-in functions that calculate the β€œdistance” between strings β€” how many edits (insertions, deletions, swaps) it takes to turn one into the other. Fewer edits = more similar.

The fuzzy matching functions

Four fuzzy functions β€” two algorithms, each with distance and similarity variants
FunctionAlgorithmReturnsBest For
EDIT_DISTANCEDamerau-LevenshteinInteger (number of edits)Measuring exact number of changes needed
EDIT_DISTANCE_SIMILARITYDamerau-Levenshtein0-100 (percentage)Comparing similarity as a normalised score
JARO_WINKLER_DISTANCEJaro-WinklerFloat 0.0-1.0 (distance)Matching names and short strings where prefix matters
JARO_WINKLER_SIMILARITYJaro-Winkler0-100 (percentage)Similarity score favouring strings that match from the start

EDIT_DISTANCE: count the changes

EDIT_DISTANCE calculates the minimum number of insertions, deletions, substitutions, and transpositions needed to transform one string into another.

-- How many edits to go from "Colour" to "Color"?
SELECT EDIT_DISTANCE('Colour', 'Color') AS Distance;
-- Returns: 1 (delete the 'u')

-- Find customers whose name is within 2 edits of a search term
SELECT FullName, Email
FROM Customers
WHERE EDIT_DISTANCE(FullName, 'Jon Smith') <= 2;
-- Matches: "John Smith" (1 edit), "Jon Smyth" (2 edits)

EDIT_DISTANCE_SIMILARITY: normalised score

SELECT EDIT_DISTANCE_SIMILARITY('Colour', 'Color') AS Similarity;
-- Returns: 83  (83% similar)

-- Find products with names at least 80% similar to the search
SELECT Name, EDIT_DISTANCE_SIMILARITY(Name, @SearchTerm) AS Score
FROM Products
WHERE EDIT_DISTANCE_SIMILARITY(Name, @SearchTerm) >= 80
ORDER BY Score DESC;

The formula: (1 - (edit_distance / GREATEST(LEN(string1), LEN(string2)))) * 100

JARO_WINKLER: prefix-weighted matching

Jaro-Winkler gives extra weight to strings that match from the beginning β€” making it ideal for matching names where the first few characters are usually correct.

-- Compare "Martha" and "Marhta" (transposition)
SELECT JARO_WINKLER_DISTANCE('Martha', 'Marhta') AS Distance;
-- Returns: ~0.039 (very close β€” low distance = high similarity)

SELECT JARO_WINKLER_SIMILARITY('Martha', 'Marhta') AS Similarity;
-- Returns: 96 (96% similar)

When to use which algorithm?

ScenarioBest AlgorithmWhy
Name matching (people, companies)Jaro-WinklerPrefix weighting helps β€” β€œJohn” vs β€œJon” scores higher than β€œohnJ” vs β€œJon”
General typo detectionEdit DistanceCounts all edits equally β€” good for any position
Address matchingEdit DistanceAddresses have errors anywhere, not just at the end
Short strings (2-5 chars)Jaro-WinklerMore sensitive to small differences
Long strings (50+ chars)Edit DistanceMore meaningful absolute counts
DeduplicationEDIT_DISTANCE_SIMILARITYNormalised score makes threshold setting easier across varying string lengths
Exam tip: Preview features require configuration

Fuzzy matching functions are currently in preview. To use them:

ALTER DATABASE SCOPED CONFIGURATION SET PREVIEW_FEATURES = ON;

The exam may test whether you know this prerequisite. If a question mentions fuzzy functions failing, check whether PREVIEW_FEATURES is enabled.

Scenario: Priya's customer deduplication at Vault Bank

Vault Bank has 2 million customer records from three merged systems. Many are duplicates with slight name variations: β€œRobert Chen” / β€œRob Chen” / β€œRobert C Chen.” Priya builds a deduplication query:

SELECT a.CustomerId, a.FullName, b.CustomerId, b.FullName,
       EDIT_DISTANCE_SIMILARITY(a.FullName, b.FullName) AS NameScore,
       JARO_WINKLER_SIMILARITY(a.Email, b.Email) AS EmailScore
FROM Customers a
INNER JOIN Customers b ON a.CustomerId < b.CustomerId
WHERE EDIT_DISTANCE_SIMILARITY(a.FullName, b.FullName) >= 85
  AND JARO_WINKLER_SIMILARITY(a.Email, b.Email) >= 90;

She uses both algorithms together β€” Edit Distance for name similarity and Jaro-Winkler for email prefix matching β€” to find likely duplicates with high confidence.

Question

What does EDIT_DISTANCE measure?

Click or press Enter to reveal answer

Answer

The minimum number of insertions, deletions, substitutions, and transpositions needed to transform one string into another (Damerau-Levenshtein algorithm). A distance of 0 means identical strings. Lower distance = more similar.

Click to flip back

Question

Why is Jaro-Winkler better than Edit Distance for name matching?

Click or press Enter to reveal answer

Answer

Jaro-Winkler gives extra weight to strings that match from the beginning (prefix weighting). Since people's names usually have the correct first few characters (even with typos), this makes it more accurate for name matching. 'John' vs 'Jon' scores higher with Jaro-Winkler than with Edit Distance.

Click to flip back

Question

What prerequisite is needed to use fuzzy matching functions in SQL Server 2025?

Click or press Enter to reveal answer

Answer

Fuzzy matching functions are in preview and require: ALTER DATABASE SCOPED CONFIGURATION SET PREVIEW_FEATURES = ON; Without this, the functions will not be available.

Click to flip back

Knowledge Check

Ingrid at Nordic Shield Insurance needs to match incoming claim names against the policyholder database. Names often have minor typos but the first few characters are usually correct. Which function is most appropriate?

Next up: AI-Assisted SQL with GitHub Copilot β€” use AI tools to write, debug, and optimise T-SQL code.