Domain 1: AI Concepts and Capabilities

Choosing the Right AI Model

Not all AI models are the same. Some are great at text, others at images, others at code. This module teaches you how to pick the right model for the job, a key exam skill.

Why model choice matters

Simple explanation

Picking an AI model is like choosing the right tool from a toolbox.

You wouldn’t use a hammer to cut wood, and you wouldn’t use a saw to drive a nail. AI models work the same way: each one is designed for specific tasks. A text model excels at writing, a vision model excels at understanding images, and a speech model excels at converting voice to text.

The exam tests your ability to match the right model to the right scenario.

Model categories

| Category | What They Do | Examples | Best For |
|---|---|---|---|
| Large Language Models (LLMs) | Generate and understand text | GPT-4o, GPT-4, Phi-4 | Chat, summarisation, translation, code |
| Small Language Models (SLMs) | Text tasks with lower cost/latency | Phi-4-mini, Phi-3-small | Simple tasks, edge devices, cost-sensitive apps |
| Image generation models | Create images from text descriptions | GPT-image-1.5 | Marketing visuals, concept art, design |
| Vision models | Analyse and understand images | GPT-4o (vision), Florence | Image classification, object detection, OCR |
| Speech models | Convert speech ↔ text | Azure Speech Service | Transcription, voice assistants, TTS |
| Embedding models | Convert text to numerical vectors | text-embedding-ada-002 | Search, similarity, RAG retrieval |

How to choose: the decision framework

When the exam gives you a scenario, use this framework:

Step 1: What type of input/output do you need?

  • Text in, text out → LLM
  • Text in, image out → Image generation
  • Image in, text out → Vision model
  • Audio in, text out → Speech model
  • Multiple types → Multimodal model (GPT-4o)

Step 2: What’s the complexity?

  • Simple task (classify, extract) → Smaller, cheaper model
  • Complex task (reason, create) → Larger, more capable model

Step 3: What are your constraints?

  • Low budget → SLM (Phi-4-mini)
  • Low latency → SLM or smaller LLM
  • Highest quality → GPT-4o or GPT-4
  • Privacy-sensitive → On-device or edge model
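
The three steps above can be read as a small decision procedure. The sketch below is one hypothetical way to encode them in Python. The `Scenario` class, its field names, and the rule that any budget, latency, or privacy constraint tips a text task toward a small model are illustrative assumptions, not part of the exam material or of any Microsoft API.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical description of an exam scenario."""
    inputs: set                  # e.g. {"text"} or {"image", "text"}
    output: str                  # "text" or "image"
    complex_task: bool = False   # True for reasoning/creation, False for classify/extract
    constraints: set = field(default_factory=set)  # "low_budget", "low_latency", "privacy"


def choose_model_category(s: Scenario) -> str:
    """Apply the three framework steps in order and return a model category."""
    # Step 1: the input/output types decide the model family.
    if s.output == "image":
        return "Image generation model"
    if s.inputs == {"audio"}:
        return "Speech model"
    if len(s.inputs) > 1:
        return "Multimodal model"
    if s.inputs == {"image"}:
        return "Vision model"

    # Text in, text out: Steps 2 and 3 decide the model size.
    # Assumption: any of these constraints outweighs task complexity.
    small_model_constraints = {"low_budget", "low_latency", "privacy"}
    if not s.complex_task or (s.constraints & small_model_constraints):
        return "Small language model"
    return "Large language model"
```

For example, `choose_model_category(Scenario({"image", "text"}, "text", complex_task=True))` falls into the multimodal branch, while a simple text classification scenario with a `"low_budget"` constraint lands on a small language model. Real scenarios often need judgement that a fixed rule like this cannot capture.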

Model selection guide: matching scenarios to models

| Scenario | When to Use | Model Choice |
|---|---|---|
| Chat assistant for customers | Need natural conversation, reasoning | GPT-4o or GPT-4 |
| Summarise meeting notes | Text in, text out, moderate complexity | GPT-4o-mini or Phi-4 |
| Generate product images | Text description → new image | GPT-image-1.5 |
| Classify support tickets | Simple text classification | Phi-4-mini (cost-efficient) |
| Transcribe phone calls | Audio → text | Azure Speech Service |
| Analyse medical X-rays | Image understanding + reasoning | GPT-4o (multimodal) |
| Search company documents | Need to find relevant passages | Embedding model + RAG |
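
To make the "Search company documents" scenario more concrete, here is a minimal sketch of the retrieval step behind embedding-based search. It assumes the documents and the query have already been turned into vectors by an embedding model. The three-dimensional vectors, the document titles, and the `top_match` helper are hypothetical and chosen only for illustration; real embedding models return vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written embeddings (not produced by a real model).
DOCUMENTS = {
    "Expense policy": [0.9, 0.1, 0.0],
    "Holiday calendar": [0.1, 0.9, 0.1],
    "VPN setup guide": [0.0, 0.2, 0.9],
}


def top_match(query_vector, documents=DOCUMENTS):
    """Return the title whose embedding is most similar to the query vector."""
    return max(
        documents,
        key=lambda title: cosine_similarity(query_vector, documents[title]),
    )
```

With these toy vectors, a query embedded as `[0.8, 0.2, 0.1]` is closest to "Expense policy". In a RAG setup, the best-matching passages would then be passed to an LLM as context for its answer.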

Large vs small models

Large language models vs small language models

| Feature | Large Models (GPT-4o) | Small Models (Phi-4-mini) |
|---|---|---|
| Parameters | Hundreds of billions | Billions (10x-100x smaller) |
| Capability | Broad, complex reasoning | Focused, specific tasks |
| Cost | Higher per-token pricing | Significantly cheaper |
| Latency | Slower (more computation) | Faster responses |
| Best for | Complex tasks, multimodal, creative | Classification, extraction, simple chat |
| Can run on edge? | No, cloud only | Yes, can run on devices |

Microsoft's Phi family: small but mighty

Microsoft developed the Phi family of small language models specifically for scenarios where cost, latency, or deployment location matters more than maximum capability.

  • Phi-4: latest, most capable small model
  • Phi-4-mini: even smaller, great for classification and extraction
  • Phi-3: previous generation, still widely deployed

The key insight: for many business tasks (email classification, FAQ answers, data extraction), a small model can perform nearly as well as GPT-4o at a fraction of the cost.
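
A quick back-of-the-envelope calculation shows why per-token pricing matters at volume. The sketch below uses the 50,000 emails per day from the GreenLeaf knowledge-check scenario. The tokens-per-email figure and both prices are hypothetical placeholders, not real list prices, so substitute current pricing before drawing conclusions.

```python
def daily_cost(requests_per_day, tokens_per_request, price_per_million_tokens):
    """Estimate daily spend from request volume and a per-million-token price."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


# All values below are illustrative assumptions.
EMAILS_PER_DAY = 50_000    # from the GreenLeaf scenario
TOKENS_PER_EMAIL = 400     # hypothetical average, prompt plus response
LARGE_MODEL_PRICE = 5.00   # hypothetical price per million tokens
SMALL_MODEL_PRICE = 0.25   # hypothetical price per million tokens

large = daily_cost(EMAILS_PER_DAY, TOKENS_PER_EMAIL, LARGE_MODEL_PRICE)
small = daily_cost(EMAILS_PER_DAY, TOKENS_PER_EMAIL, SMALL_MODEL_PRICE)
print(f"Large model: {large:.2f} per day, small model: {small:.2f} per day")
```

Under these assumed numbers the workload is 20 million tokens per day, so the large model costs 100.00 per day and the small model 5.00. The ratio depends entirely on the prices you plug in, but the structure of the calculation stays the same.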

Exam relevance: When a scenario mentions “cost-effective” or “edge deployment” or “low latency” → think Phi or other SLMs.

The Foundry model catalog

Microsoft Foundry includes a model catalog: a library of models from multiple providers that you can deploy directly.

| Provider | Models | Strengths |
|---|---|---|
| OpenAI | GPT-4o, GPT-4, GPT-image-1.5 | Best general-purpose, multimodal |
| Microsoft | Phi-4, Phi-4-mini | Cost-efficient, edge-friendly |
| Meta | Llama 3 | Open-source, customisable |
| Mistral | Mistral Large, Mistral Small | European alternative, efficient |
| Cohere | Command R | Strong at RAG and retrieval |

Key exam concept: You don’t need to memorise every model. You need to understand the categories (LLM, SLM, vision, speech, embedding) and know how to choose based on task requirements.

Flashcards

Question

What is the difference between a Large Language Model (LLM) and a Small Language Model (SLM)?

Answer

LLMs have hundreds of billions of parameters and excel at complex reasoning and multimodal tasks but cost more and are slower. SLMs have billions of parameters (10-100x smaller), are cheaper and faster, and work well for focused tasks like classification and extraction.

Question

When should you choose a small model (like Phi-4-mini) over GPT-4o?

Answer

When the task is simple (classification, extraction, FAQ), when cost is a concern, when low latency is required, or when you need to run the model on edge devices.

Question

What is an embedding model used for?

Answer

Converting text into numerical vectors (lists of numbers) that capture semantic meaning. Used for search, document similarity, and RAG retrieval, that is, finding relevant documents to feed to an LLM.

Question

What model would you use to generate images from text descriptions?

Answer

GPT-image-1.5, an image generation model available in Microsoft Foundry. You provide a text prompt, and it creates a new image matching that description.

Knowledge Check

GreenLeaf wants to automatically classify incoming support emails into categories: billing, technical, general inquiry. They process 50,000 emails per day and need to keep costs low. Which model approach is most appropriate?

MediSpark needs an AI model that can accept both a medical image (X-ray) and a text question ('What abnormalities are visible?') and return a text response. Which type of model do they need?

Priya needs to build a search feature that finds the most relevant company documents when a user types a question. Which combination of models should she use?


Next up: Deploying AI Models, covering configuration parameters like temperature, top-p, and max tokens that control how your model behaves.