Domain 4 — Module 1 of 2 50%
23 of 27 overall
Domain 4: Implement Text Analysis Solutions Free ⏱ ~14 min read

Text Analysis with Language Models

Extract entities, detect sentiment, summarise documents, translate text, and customise language models for domain-specific tasks — all using generative prompting and Foundry Tools.

Making sense of text

Simple explanation

Text analysis is like having a speed reader who can instantly tell you: what’s this about (topics), who’s mentioned (entities), what’s the mood (sentiment), and give you a one-paragraph summary — for any document, in any language.

In AI-103, you use two approaches: (1) prompt a language model to extract information (“Read this contract and extract all party names as JSON”), or (2) use Foundry Tools like Azure Translator for specialised tasks.

Text analysis capabilities

CapabilityApproachOutput
Entity extractionPrompt LLM: “Extract all person names, organisations, and dates”Structured JSON with entities and types
Topic extractionPrompt LLM: “What are the main topics discussed?”List of topics with relevance scores
SummarisationPrompt LLM: “Summarise this document in 3 sentences”Concise summary
Structured JSON outputPrompt LLM with schema: “Extract fields matching this schema”JSON matching specified schema
Sentiment detectionPrompt LLM: “Classify the sentiment as positive, negative, or neutral”Positive/negative/neutral + confidence
Tone detectionPrompt LLM: “What is the tone of this message?”Formal/informal/urgent/frustrated/etc.
Safety detectionContent Safety APIFlags for hate, violence, self-harm, sexual content
Sensitive contentPrompt LLM + custom rulesPII detection, confidential information flags

Translation approaches

Azure Translator vs LLM translation
FeatureAzure Translator (Foundry Tool)LLM-Powered Translation
How it worksDedicated translation enginePrompt an LLM to translate
Best forLarge-volume document translation, 100+ languagesNuanced translation with context awareness
CostLower per characterHigher (LLM tokens)
QualityExcellent for standard textBetter for idioms, context, tone preservation
SpeedVery fastSlower (model inference)
Custom terminologyCustom glossaries and dictionariesFew-shot examples in the prompt
Exam tip: When to use Translator vs LLM

Decision rule for the exam:

  • Bulk document translation → Azure Translator (cost-effective, fast)
  • Translation needing context and nuance → LLM (better quality for complex text)
  • Real-time chat translation → Depends on volume — low volume = LLM, high volume = Translator

If the scenario mentions cost or scale, lean toward Translator. If it mentions nuance or context, lean toward LLM.

Domain customisation

TechniqueWhat It DoesExample
System prompt with domain contextTell the model about industry terminology”You are a legal analyst. ‘Material adverse change’ means…”
Few-shot examplesShow the model expected input/output pairs3 examples of correctly extracted contract clauses
Output schemaDefine exact JSON structure for extracted data”Return JSON with fields: clause_type, parties, obligation, deadline”
Custom glossaryMap domain terms to standard definitions”EBITDA” → “Earnings Before Interest, Taxes, Depreciation, and Amortization”
Real-world example: Atlas Financial's compliance summariser

Atlas Financial customises text analysis for compliance:

Entity extraction: Custom prompt extracts regulatory-specific entities:

  • Regulation references (Basel III, Dodd-Frank, MiFID II)
  • Financial amounts and thresholds
  • Compliance deadlines
  • Responsible parties

Compliance summarisation: System prompt includes:

  • Financial regulatory terminology definitions
  • Output format: risk level, key obligations, deadlines, affected departments
  • Few-shot examples of correctly summarised regulations

Sensitive content detection: Custom rules flag:

  • Client SSNs and account numbers (PII)
  • Non-public financial data
  • Insider information indicators

Key terms

Question

What is structured JSON output from an LLM?

Click or press Enter to reveal answer

Answer

Prompting a language model to return its response in a specific JSON format with defined fields and types. Used to extract structured data from unstructured text for database storage or API consumption.

Click to flip back

Question

What is domain customisation for text analysis?

Click or press Enter to reveal answer

Answer

Tailoring language model outputs for industry-specific tasks using system prompts with domain context, few-shot examples with domain terminology, and custom output schemas. No fine-tuning required — it's all prompt engineering.

Click to flip back

Question

When should you use Azure Translator vs an LLM for translation?

Click or press Enter to reveal answer

Answer

Azure Translator: bulk document translation, 100+ languages, lower cost. LLM: nuanced translation needing context awareness, tone preservation, or domain-specific terminology. Scale/cost → Translator; nuance/context → LLM.

Click to flip back

Knowledge check

Knowledge Check

Kai needs to extract shipment details (tracking number, origin, destination, weight, delivery date) from 50,000 shipping confirmation emails and store them in a database. Which approach is most appropriate?

Knowledge Check

MediaForge needs to translate their client's 200-page product catalogue from English into 15 languages. Budget is tight. Which approach minimises cost?