Choosing the Right AI Model

Why model choice matters

Simple explanation

Picking an AI model is like choosing the right tool from a toolbox.

You wouldn’t use a hammer to cut wood, and you wouldn’t use a saw to drive a nail. AI models work the same way — each one is designed for specific tasks. A text model excels at writing, a vision model excels at understanding images, and a speech model excels at converting voice to text.

The exam tests your ability to match the right model to the right scenario.

Model categories

Category	What They Do	Examples	Best For
Large Language Models (LLMs)	Generate and understand text	GPT-4o, GPT-4, Llama 3, Mistral Large	Chat, summarisation, translation, code
Small Language Models (SLMs)	Text tasks with lower cost/latency	Phi-4-mini, Phi-3-small	Simple tasks, edge devices, cost-sensitive apps
Image generation models	Create images from text descriptions	GPT-image-1.5	Marketing visuals, concept art, design
Vision models	Analyse and understand images	GPT-4o (vision), Florence	Image classification, object detection, OCR
Speech models	Convert speech ↔ text	Azure Speech Service	Transcription, voice assistants, TTS
Embedding models	Convert text to numerical vectors	text-embedding-ada-002	Search, similarity, RAG retrieval

How to choose: the decision framework

When the exam gives you a scenario, use this framework:

Step 1: What type of input/output do you need?

Text in, text out → LLM
Text in, image out → Image generation
Image in, text out → Vision model
Audio in, text out → Speech model (speech to text)
Text in, audio out → Speech model (text to speech)
Multiple types → Multimodal model (GPT-4o)
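Step 1 is essentially a lookup from input and output types to a model category. The sketch below encodes that list as a Python dictionary. The function name `model_category` and the modality labels (`"text"`, `"image"`, `"audio"`) are illustrative assumptions. They are not part of any Azure or Foundry API.

```python
# Minimal sketch of Step 1: map (input, output) modalities to a model category.
# The function name and modality labels are hypothetical, chosen for illustration.

STEP1_RULES: dict[tuple[str, str], str] = {
    ("text", "text"): "LLM",
    ("text", "image"): "Image generation model",
    ("image", "text"): "Vision model",
    ("audio", "text"): "Speech model",  # speech to text (transcription)
    ("text", "audio"): "Speech model",  # text to speech (TTS)
}


def model_category(inputs: set[str], outputs: set[str]) -> str:
    """Return the model category for the given input and output modalities."""
    if not inputs or not outputs:
        raise ValueError("specify at least one input and one output modality")
    # More than one modality on either side points to a multimodal model (e.g. GPT-4o).
    if len(inputs) > 1 or len(outputs) > 1:
        return "Multimodal model"
    key = (next(iter(inputs)), next(iter(outputs)))
    return STEP1_RULES.get(key, "No rule matches; revisit the scenario")


print(model_category({"text"}, {"text"}))           # LLM
print(model_category({"image", "text"}, {"text"}))  # Multimodal model
```

Scenarios that combine an image with a typed question take the multimodal branch. They do not match the single-modality vision rule.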

Step 2: What’s the complexity?

Simple task (classify, extract) → Smaller, cheaper model
Complex task (reason, create) → Larger, more capable model

Step 3: What are your constraints?

Low budget → SLM (Phi-4-mini)
Low latency → SLM or smaller LLM
Highest quality → GPT-4o or GPT-4
Privacy-sensitive → On-device or edge model
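Steps 2 and 3 can be sketched the same way. The framework does not say which constraint wins when several apply at once. The order of checks below is therefore an assumption: privacy first, then quality, then budget, then latency, then complexity. The function name `suggest_tier` is also hypothetical.

```python
# Minimal sketch of Steps 2 and 3. The precedence of the checks is an
# assumption; the framework itself does not rank competing constraints.


def suggest_tier(
    *,
    complex_task: bool,
    low_budget: bool = False,
    low_latency: bool = False,
    highest_quality: bool = False,
    privacy_sensitive: bool = False,
) -> str:
    """Suggest a model tier from task complexity and deployment constraints."""
    if privacy_sensitive:
        return "On-device or edge model"  # keep data on the device
    if highest_quality:
        return "Large model (GPT-4o or GPT-4)"
    if low_budget:
        return "SLM (Phi-4-mini)"
    if low_latency:
        return "SLM or smaller LLM"
    # No overriding constraint: fall back to Step 2 (task complexity).
    return "Larger, more capable model" if complex_task else "Smaller, cheaper model"


# High-volume ticket classification on a tight budget.
print(suggest_tier(complex_task=False, low_budget=True))  # SLM (Phi-4-mini)
```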

Model selection guide — matching scenarios to models
Scenario	Key Requirement	Model Choice
Chat assistant for customers	Need natural conversation, reasoning	GPT-4o or GPT-4
Summarise meeting notes	Text in, text out, moderate complexity	GPT-4o-mini or Phi-4
Generate product images	Text description → new image	GPT-image-1.5
Classify support tickets	Simple text classification	Phi-4-mini (cost-efficient)
Transcribe phone calls	Audio → text	Azure Speech Service
Analyse medical X-rays	Image understanding + reasoning	GPT-4o (multimodal)
Search company documents	Need to find relevant passages	Embedding model (retrieval) + LLM (answer), i.e. RAG
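Document search works in two stages. First, an embedding model turns each document and the user's question into vectors. Second, the documents whose vectors are most similar to the question's vector are retrieved and passed to an LLM as context.

The sketch below covers only the retrieval stage and uses cosine similarity, a common similarity measure for embeddings. The three-dimensional vectors are hand-made stand-ins, not output from a real embedding model. Real models such as text-embedding-ada-002 return vectors with far more dimensions. `DOCUMENTS` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made "embeddings" standing in for a real embedding model.
DOCUMENTS: dict[str, list[float]] = {
    "Expense policy": [0.9, 0.1, 0.0],
    "VPN setup guide": [0.1, 0.9, 0.2],
    "Holiday calendar": [0.2, 0.1, 0.9],
}


def top_k(query_vector: list[float], documents: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the titles of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), title) for title, vec in documents.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]


# Hypothetical embedding of the question "How do I claim travel costs?"
query = [0.8, 0.2, 0.1]
print(top_k(query, DOCUMENTS, k=1))  # ['Expense policy']
```

In a full RAG setup, the retrieved passages would then be inserted into the LLM prompt so that its answer is grounded in company documents.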

Large vs small models

Large language models vs small language models
Feature	Large Models (GPT-4o)	Small Models (Phi-4-mini)
Parameters	Often hundreds of billions (exact counts are frequently undisclosed)	A few billion (roughly 10x to 100x fewer)
Capability	Broad, complex reasoning	Focused, specific tasks
Cost	Higher per-token pricing	Significantly cheaper
Latency	Slower (more computation)	Faster responses
Best for	Complex tasks, multimodal, creative	Classification, extraction, simple chat
Can run on edge?	No — cloud only	Yes — can run on devices
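At high volume, per-token pricing adds up quickly. The daily cost is requests × tokens per request × price per token. The prices below are placeholders chosen only to keep the arithmetic simple. They are not real Azure or OpenAI list prices. Real pricing usually also charges input and output tokens at different rates, which this sketch ignores. `daily_cost` is a hypothetical helper.

```python
def daily_cost(requests_per_day: int, tokens_per_request: int,
               price_per_million_tokens: float) -> float:
    """Estimated daily spend, assuming one flat price for all tokens."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


# Placeholder prices (NOT real list prices): assume the large model
# costs 20x more per token than the small one.
LARGE_PRICE = 10.0
SMALL_PRICE = 0.5

large = daily_cost(10_000, 500, LARGE_PRICE)
small = daily_cost(10_000, 500, SMALL_PRICE)
print(large, small, large / small)  # 50.0 2.5 20.0
```

The ratio between the two models stays at whatever the price ratio is. The absolute gap, however, grows linearly with request volume, which is why cost-sensitive, high-volume tasks favour small models.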

Microsoft's Phi family — small but mighty

Microsoft developed the Phi family of small language models specifically for scenarios where cost, latency, or deployment location matters more than maximum capability.

Phi-4 — latest, most capable small model
Phi-4-mini — even smaller, great for classification and extraction
Phi-3 — previous generation, still widely deployed

The key insight: for many business tasks (email classification, FAQ answers, data extraction), a small model can perform close to GPT-4o at a fraction of the cost. Test a small model on your own data before defaulting to a large one.

Exam relevance: When a scenario mentions “cost-effective” or “edge deployment” or “low latency” → think Phi or other SLMs.

The Foundry model catalog

Microsoft Foundry includes a model catalog — a library of models from multiple providers that you can deploy directly:

Provider	Models	Strengths
OpenAI	GPT-4o, GPT-4, GPT-image-1.5	Best general-purpose, multimodal
Microsoft	Phi-4, Phi-4-mini	Cost-efficient, edge-friendly
Meta	Llama 3	Open-weight, customisable
Mistral	Mistral Large, Mistral Small	European alternative, efficient
Cohere	Command R	Strong at RAG and retrieval

Key exam concept: You don’t need to memorise every model. You need to understand the categories (LLM, SLM, vision, speech, embedding) and know how to choose based on task requirements.

Flashcards

Question

What is the difference between a Large Language Model (LLM) and a Small Language Model (SLM)?

Answer

LLMs typically have hundreds of billions of parameters and excel at complex reasoning and multimodal tasks, but they cost more and respond more slowly. SLMs have a few billion parameters (roughly 10x to 100x fewer), are cheaper and faster, and work well for focused tasks like classification and extraction.

Question

When should you choose a small model (like Phi-4-mini) over GPT-4o?

Answer

When the task is simple (classification, extraction, FAQ), when cost is a concern, when low latency is required, or when you need to run the model on edge devices.

Question

What is an embedding model used for?

Answer

Converting text into numerical vectors (lists of numbers) that capture semantic meaning. Used for search, document similarity, and RAG retrieval — finding relevant documents to feed to an LLM.

Question

What model would you use to generate images from text descriptions?

Answer

GPT-image-1.5 — an image generation model available in Microsoft Foundry. You provide a text prompt, and it creates a new image matching that description.

Knowledge Check

GreenLeaf wants to automatically classify incoming support emails into categories: billing, technical, general inquiry. They process 50,000 emails per day and need to keep costs low. Which model approach is most appropriate?

Knowledge Check

MediSpark needs an AI model that can accept both a medical image (X-ray) and a text question ('What abnormalities are visible?') and return a text response. Which type of model do they need?

Knowledge Check

Priya needs to build a search feature that finds the most relevant company documents when a user types a question. Which combination of models should she use?

Next up: Deploying AI Models — configuration parameters like temperature, top-p, and max tokens that control how your model behaves.