Question 1

Should I always start with the simplest pattern?

Accepted Answer

Yes. Start with a direct LLM call. If answers need grounding in your own data, add RAG; if the agent needs to take actions, add tools; if the task is genuinely multi-step, add a planner. Each step up the ladder adds latency (more LLM round-trips), cost (more tokens), and failure modes (more places for things to go wrong). Most production agents land on RAG plus one or two tools. Multi-step planner-executor and multi-agent debate are powerful but expensive, so make sure simpler patterns genuinely fall short before reaching for them.
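The ladder above can be read as a small decision rule. The sketch below is only an illustration: the `Needs` class and the `choose_pattern` function are hypothetical names invented for this example, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical checklist describing what the agent must do."""
    grounding: bool = False   # answers must come from your own documents
    actions: bool = False     # the agent must call external systems
    multi_step: bool = False  # the task has several dependent steps


def choose_pattern(needs: Needs) -> list[str]:
    """Return the layers to stack, always starting from the simplest one."""
    layers = ["direct LLM call"]
    if needs.grounding:
        layers.append("RAG")
    if needs.actions:
        layers.append("tools")
    if needs.multi_step:
        layers.append("planner")
    return layers


if __name__ == "__main__":
    # A support bot that answers from a knowledge base and can open tickets.
    print(choose_pattern(Needs(grounding=True, actions=True)))
```

Each layer is added only when a concrete requirement calls for it, which keeps latency, cost, and failure modes at the minimum the task allows.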

Question 2

What's the difference between RAG and fine-tuning?

Accepted Answer

RAG (Retrieval-Augmented Generation): at runtime, fetch relevant documents from a vector store and include them in the prompt. The model itself is unchanged. Pros: cheap, easy to update (just add docs), and answers can cite their sources. Cons: adds a retrieval step to every request. Fine-tuning: further train the model's weights on your own data. Pros: bakes in tone and style, no retrieval latency. Cons: expensive, hard to update, and risks overfitting. RAG is the right default for about 95% of knowledge-task agents. Reserve fine-tuning for narrow style or tone needs, and only once RAG quality has proven genuinely insufficient.
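A minimal sketch of the RAG flow described above, under loud assumptions: a toy word-overlap score stands in for a real vector store, and the function names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical. The resulting prompt would then be sent to whichever LLM you use; that call is left out here.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k docs sharing the most words with the query.

    Assumption: word overlap is a stand-in for embedding similarity
    in a real vector store.
    """
    query_tokens = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_tokens & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Put the retrieved docs into the prompt; the model itself is unchanged."""
    sources = retrieve(query, docs, k)
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(sources, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    knowledge_base = [
        "Refunds are processed within 14 days.",
        "Our office is in Berlin.",
        "Shipping takes 3 days.",
    ]
    print(build_prompt("How long do refunds take?", knowledge_base, k=1))
```

Updating the agent's knowledge here means appending a string to `knowledge_base`, which is the "easy to update" property the answer refers to.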

Question 3

What's a 'planner-executor' agent?

Accepted Answer

An agent that makes multiple LLM calls in sequence: first a 'planning' call that breaks the user's request into steps, then 'executor' calls (often using tools) that perform each step, and sometimes a 'reflection' call that checks the result and re-plans if needed. ReAct (Reason + Act) is a popular related approach that interleaves reasoning and tool calls rather than planning everything up front. Useful for genuinely multi-step tasks (e.g., 'analyse Q3 sales, draft a recap, post to Teams'). Cost: typically 3-10x a single LLM call, so monitor spend and set a max-step limit.
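The plan-then-execute loop with a max-step limit can be sketched as follows. This is an illustration under assumptions: `run_agent` is a hypothetical name, and the `plan` and `execute` callables are stubs standing in for real LLM and tool calls. The reflection step is omitted to keep the sketch short.

```python
from typing import Callable


def run_agent(
    request: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[str], str],
    max_steps: int = 5,
) -> tuple[list[str], bool]:
    """Plan once, then execute steps until the plan ends or the budget runs out.

    Returns the step results and a flag telling whether the plan was cut short.
    """
    steps = plan(request)  # one 'planning' LLM call
    results = []
    for step in steps[:max_steps]:  # hard cap on 'executor' calls
        results.append(execute(step))
    truncated = len(steps) > max_steps
    return results, truncated


# Stubs standing in for real LLM and tool calls (assumptions for this sketch).
def fake_plan(request: str) -> list[str]:
    return ["analyse Q3 sales", "draft a recap", "post to Teams"]


def fake_execute(step: str) -> str:
    return f"done: {step}"


if __name__ == "__main__":
    results, truncated = run_agent("Summarise Q3 for the team", fake_plan, fake_execute)
    print(results, truncated)
```

With these stubs, a three-step plan costs four calls in total (one planning call plus three executor calls), which is where the multiple of a single call's cost comes from.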

Question 4

When do I need multi-agent debate or LLM-as-judge?

Accepted Answer

Both are quality-amplification patterns for high-stakes outputs. Multi-agent debate: several agents propose answers, debate or vote, and surface the consensus. Useful when reasoning quality matters more than cost. LLM-as-judge: a separate LLM evaluates the output against criteria you've defined (correctness, tone, safety). Useful for quality monitoring at scale. Both cost 5-20x as much as a single call. Good for: legal drafts, medical advice, financial analysis, regulated outputs. Overkill for: chatbots, simple Q&A, internal tools.
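Both patterns reduce to a small amount of orchestration around extra LLM calls. The sketch below shows the voting variant of debate and a per-criterion judge; the function names (`majority_vote`, `judge_output`) and the PASS/FAIL prompt format are assumptions made for illustration, and the judge model is a stub rather than a real LLM.

```python
from collections import Counter
from typing import Callable


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the share of agents that gave it."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)


def judge_output(
    output: str,
    criteria: list[str],
    judge_llm: Callable[[str], str],
) -> dict[str, bool]:
    """Ask a separate judge model for a PASS/FAIL verdict on each criterion."""
    verdicts = {}
    for criterion in criteria:
        prompt = f"Criterion: {criterion}\nOutput: {output}\nReply PASS or FAIL."
        verdicts[criterion] = judge_llm(prompt).strip().upper() == "PASS"
    return verdicts


# Stub judge (assumption): fails the tone criterion, passes everything else.
def fake_judge(prompt: str) -> str:
    return "FAIL" if prompt.startswith("Criterion: tone") else "PASS"


if __name__ == "__main__":
    print(majority_vote(["clause is valid", "clause is void", "clause is valid"]))
    print(judge_output("The contract is valid.", ["correctness", "tone"], fake_judge))
```

Every agent answer and every criterion is a separate LLM call, so the cost grows with the number of agents and the number of criteria you check.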

Free AI Agent Patterns — Pick the Right Architecture
