Prompt Tester

A/B compare two prompts with CRAFTS scoring and see which is better.


Comparison History

Stored locally in this browser only. Nothing leaves your device.


The CRAFTS Scoring Framework

Every AI prompt is scored across six pillars. Here's what the Prompt Tester looks for and how to score higher.

C: Context (15 pts)

Background information, domain keywords, specific details that help the AI understand your situation.

Low (0–5): No background. "Write about AI."
Mid (6–10): Some detail. "Write about AI for my company blog."
High (11–15): Rich context. "I'm an IT admin at a 500-person company migrating to Azure. Write about AI governance for our internal newsletter."

R: Role (20 pts)

A persona or expertise level for the AI to adopt; it guides the depth and style of the response.

Low (0): No role assigned.
Mid (15): "Act as a consultant."
High (20): "You are a senior cloud architect with 10 years of Azure experience."

A: Action (25 pts)

Clear task verbs, specific instructions, multi-step requests. The core of what you want done.

Low (0–8): Vague. "Help me with security."
Mid (9–18): Clear verb. "Create a security checklist for Azure."
High (19–25): Detailed. "Create a 10-item security checklist for Azure VMs, covering network, identity, and encryption."

F: Format (15 pts)

Explicit output structure: table, bullet points, JSON, step-by-step, code block, etc.

Low (0): No format specified.
Mid (6–12): Implicit format. "List the benefits."
High (13–15): Explicit. "Format as a markdown table with columns: Risk, Impact, Mitigation."

T: Tone (10 pts)

Voice, formality, audience awareness: professional, casual, beginner-friendly, executive, etc.

Low (0): No tone guidance.
Mid (5–8): Audience specified. "For a technical audience."
High (9–10): Explicit tone. "Use a professional but approachable tone suitable for IT managers."

S: Scope (15 pts)

Boundaries, constraints, exclusions, word counts, and focus areas. These tell the AI what NOT to do.

Low (0): No boundaries set.
Mid (4–8): Some limits. "Keep it under 500 words."
High (9–15): Multiple constraints. "Focus on Azure only, exclude AWS. Maximum 300 words."

🎯 Pro tip: You don't need a perfect 100 for every prompt. Aim for 60+ on important prompts. The biggest gains come from adding Role (often forgotten) and Scope (most underused).
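
The arithmetic behind the pillar maxima and the 60+ target can be sketched in a few lines. This is only an illustration: the maxima come from the framework above, but the function name, the clamping behaviour, and the example scores are assumptions, not the tool's actual code.

```python
# Maximum points per CRAFTS pillar, taken from the framework above.
PILLAR_MAX = {
    "Context": 15,
    "Role": 20,
    "Action": 25,
    "Format": 15,
    "Tone": 10,
    "Scope": 15,
}

TARGET = 60  # the "aim for 60+" threshold from the pro tip


def total_score(pillar_scores):
    """Sum per-pillar points, clamping each to its 0..max range (assumed behaviour)."""
    total = 0
    for pillar, max_pts in PILLAR_MAX.items():
        pts = pillar_scores.get(pillar, 0)  # a missing pillar scores 0
        total += max(0, min(pts, max_pts))
    return total


# Hypothetical scores for a prompt with no role and no scope.
before = {"Context": 8, "Action": 20, "Format": 13, "Tone": 5}
# The same prompt after adding a detailed role and two constraints.
after = {**before, "Role": 20, "Scope": 9}

print(sum(PILLAR_MAX.values()))                             # 100
print(total_score(before), total_score(before) >= TARGET)   # 46 False
print(total_score(after), total_score(after) >= TARGET)     # 75 True
```

In this made-up case, Role and Scope alone move the prompt from 46 to 75, which is the kind of gain the tip describes.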

Want to practise? Try our Prompt Engineering Guide or score a single prompt with the Prompt Polisher.

Frequently Asked Questions

How does the Prompt Tester compare two prompts?

Both prompts are scored independently using the CRAFTS framework: Context, Role, Action, Format, Tone, and Scope. Each pillar is scored out of its maximum, and the totals (0–100) are compared. You’ll see which prompt is more complete and exactly which elements make the difference.
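
As a rough sketch of that comparison step, assume each prompt has already been scored per pillar. The `compare` function and the example scores below are hypothetical; how the tool derives the per-pillar scores is not shown here.

```python
PILLARS = ["Context", "Role", "Action", "Format", "Tone", "Scope"]


def compare(scores_a, scores_b):
    """Return both totals plus the pillars where the prompts differ.

    Each gap is (B minus A), so a positive number means B scored higher.
    """
    total_a = sum(scores_a.get(p, 0) for p in PILLARS)
    total_b = sum(scores_b.get(p, 0) for p in PILLARS)
    gaps = {}
    for p in PILLARS:
        delta = scores_b.get(p, 0) - scores_a.get(p, 0)
        if delta != 0:
            gaps[p] = delta
    return total_a, total_b, gaps


# Hypothetical per-pillar scores for two versions of the same prompt.
prompt_a = {"Context": 8, "Role": 0, "Action": 14, "Format": 0, "Tone": 0, "Scope": 6}
prompt_b = {"Context": 12, "Role": 20, "Action": 14, "Format": 13, "Tone": 0, "Scope": 6}

total_a, total_b, gaps = compare(prompt_a, prompt_b)
print(total_a, total_b)  # 28 65
print(gaps)              # {'Context': 4, 'Role': 20, 'Format': 13}
```

Here the totals show that B is more complete, and the gaps show that most of the difference comes from Role and Format.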

What does 'more CRAFTS-complete' mean?

It means one prompt includes more of the six elements that help AI give better responses. A higher CRAFTS score doesn’t guarantee a better AI output, but it does mean you’ve given the AI more to work with, which tends to lead to better results.

Can I compare prompts from different AI platforms?

Absolutely. The CRAFTS framework is platform-agnostic: it measures prompt structure, not platform-specific features. A well-structured prompt works better on ChatGPT, Claude, Copilot, Gemini, and any other AI.

What does the diff view show?

The diff view highlights word-level differences between your two prompts: green for words only in B, red for words only in A. It’s most useful when comparing a revised version of the same prompt. If the prompts are very different, the diff is hidden automatically.
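
One way to approximate a word-level diff like this is with Python's standard `difflib`. This is a sketch under stated assumptions: the tool's real algorithm is not documented here, splitting on whitespace is a simplification, and the `HIDE_BELOW_RATIO` cut-off for hiding the diff is an invented value.

```python
from difflib import SequenceMatcher

# Hypothetical similarity cut-off below which the diff is hidden.
HIDE_BELOW_RATIO = 0.3


def word_diff(prompt_a, prompt_b):
    """Return (words only in A, words only in B), or None if too dissimilar."""
    a_words = prompt_a.split()
    b_words = prompt_b.split()
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    if matcher.ratio() < HIDE_BELOW_RATIO:
        return None  # prompts are very different, so no diff is shown
    only_a, only_b = [], []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            only_a.extend(a_words[a0:a1])  # would be shown in red
        if tag in ("insert", "replace"):
            only_b.extend(b_words[b0:b1])  # would be shown in green
    return only_a, only_b


revised = word_diff(
    "Create a security checklist for Azure",
    "Create a 10-item security checklist for Azure VMs",
)
print(revised)  # ([], ['10-item', 'VMs'])

unrelated = word_diff(
    "Write about AI",
    "Create a 10-item security checklist for Azure VMs",
)
print(unrelated)  # None
```

The first call compares a revision of the same prompt, so only the added words are reported. The second pair shares no words, so the sketch returns nothing to display.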

Is my data private?

Yes. Everything runs 100% in your browser: your prompts are never sent to any server or API. History is stored in your browser’s localStorage only, and you can clear it at any time.

How does the 'Improve' feature work?

Click ‘Improve Weaker Prompt’ to auto-generate a rewritten version of the lower-scoring prompt. The rewrite adds whichever CRAFTS elements the prompt lacks, such as a role, context, format instructions, tone, or scope. It uses the same engine as our Prompt Polisher tool.

What if the scores are very close?

If the difference is less than 5 points, the tool shows ‘Too close to call’ instead of declaring a winner. In that case both prompts are similarly complete, and the gap falls within the scoring margin.
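
The rule above amounts to a simple threshold check. The sketch below assumes "less than 5 points" is a strict comparison, so a gap of exactly 5 still produces a winner. The function name and the "A"/"B" return labels are illustrative.

```python
TOO_CLOSE_MARGIN = 5  # points, from the rule described above


def verdict(score_a, score_b):
    """Name the higher-scoring prompt, or report a tie within the margin."""
    if abs(score_a - score_b) < TOO_CLOSE_MARGIN:
        return "Too close to call"
    return "A" if score_a > score_b else "B"


print(verdict(62, 58))  # Too close to call
print(verdict(65, 60))  # A
print(verdict(40, 72))  # B
```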

Can I use this for Copilot Studio agent instructions?

Yes! Paste two versions of your agent instructions to see which is more structured. For dedicated agent instruction building, also check out our Agent Instruction Builder tool.
