Prompt Tester
A/B compare two prompts with CRAFTS scoring and see which is better.
The CRAFTS Scoring Framework
Every AI prompt is scored across six pillars. Here's what the Prompt Tester looks for and how to score higher.
Context: background information, domain keywords, and specific details that help the AI understand your situation.
Role: a persona or expertise level for the AI to adopt, which guides the depth and style of the response.
Action: clear task verbs, specific instructions, and multi-step requests. This is the core of what you want done.
Format: an explicit output structure, such as a table, bullet points, JSON, step-by-step instructions, or a code block.
Tone: voice, formality, and audience awareness (professional, casual, beginner-friendly, executive, etc.).
Scope: boundaries, constraints, exclusions, word counts, and focus areas that tell the AI what NOT to do.
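This page doesn't say how each pillar is detected. As a minimal sketch, assuming simple keyword cues, a pillar scorer could look like the code below. The regular expressions, the per-pillar maximums (20/15/20/15/15/15, chosen only so they total 100), and the name `scorePillars` are all hypothetical; they are not the Prompt Tester's actual rules.

```typescript
// Hypothetical CRAFTS pillar scorer based on keyword cues.
// The cues, weights, and maximums below are illustrative assumptions only.
type Pillar = "context" | "role" | "action" | "format" | "tone" | "scope";

interface PillarRule {
  max: number;    // assumed maximum points for this pillar
  perHit: number; // assumed points awarded per matching cue
  cues: RegExp[]; // assumed cue patterns (no /g flag, so .test() is stateless)
}

const RULES: Record<Pillar, PillarRule> = {
  context: { max: 20, perHit: 10, cues: [/\bbackground\b/i, /\bfor (our|my) \w+/i, /\bbecause\b/i] },
  role: { max: 15, perHit: 15, cues: [/\byou are\b/i, /\bact as\b/i] },
  action: { max: 20, perHit: 10, cues: [/\b(write|summari[sz]e|explain|list|compare|draft)\b/i, /\bthen\b/i] },
  format: { max: 15, perHit: 15, cues: [/\b(table|bullet|json|step-by-step|code block)\b/i] },
  tone: { max: 15, perHit: 15, cues: [/\b(professional|casual|friendly|formal|beginner)\b/i] },
  scope: { max: 15, perHit: 8, cues: [/\b(under|no more than|at most) \d+ words\b/i, /\b(do not|don't|avoid|only)\b/i] },
};

// Counts how many cues each pillar matches and caps the result at that pillar's maximum.
function scorePillars(prompt: string): Record<Pillar, number> {
  const scores = {} as Record<Pillar, number>;
  for (const pillar of Object.keys(RULES) as Pillar[]) {
    const rule = RULES[pillar];
    const hits = rule.cues.filter((re) => re.test(prompt)).length;
    scores[pillar] = Math.min(rule.max, hits * rule.perHit);
  }
  return scores;
}
```

For example, `scorePillars("You are an editor. Write a table.")` picks up Role, Action, and Format cues but scores zero for Context, Tone, and Scope. Real scoring would need far richer signals than single keywords; the cap per pillar is what keeps one over-stuffed pillar from dominating the total.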
Pro tip: You don't need a perfect 100 for every prompt. Aim for 60+ on important prompts. The biggest gains come from adding Role (often forgotten) and Scope (most underused).
Want to practise? Try our Prompt Engineering Guide or score a single prompt with the Prompt Polisher.
Frequently Asked Questions
How does the Prompt Tester compare two prompts?
Both prompts are scored independently using the CRAFTS framework: Context, Role, Action, Format, Tone, and Scope. Each pillar is scored out of its own maximum, and each prompt's pillar scores add up to a total from 0 to 100. The two totals are then compared, so you can see which prompt is more complete and exactly which elements make the difference.
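The comparison step can be sketched as follows, assuming each prompt has already been scored per pillar. The names `PillarScores`, `comparePrompts`, and the shape of the result are hypothetical; the sketch only shows summing pillar scores into totals and listing the pillars where the two prompts differ.

```typescript
// Hypothetical per-pillar scores for one prompt, keyed by CRAFTS pillar.
type PillarScores = Record<"context" | "role" | "action" | "format" | "tone" | "scope", number>;

// Sums the pillar scores into a single total (0 to 100 if the maximums total 100).
function total(scores: PillarScores): number {
  return Object.values(scores).reduce((sum, s) => sum + s, 0);
}

interface Comparison {
  totalA: number;
  totalB: number;
  // Pillars where the prompts differ; delta is B's score minus A's score.
  differences: { pillar: keyof PillarScores; delta: number }[];
}

function comparePrompts(a: PillarScores, b: PillarScores): Comparison {
  const differences = (Object.keys(a) as (keyof PillarScores)[])
    .map((pillar) => ({ pillar, delta: b[pillar] - a[pillar] }))
    .filter((d) => d.delta !== 0);
  return { totalA: total(a), totalB: total(b), differences };
}
```

Reporting per-pillar deltas alongside the totals is what lets the result say *why* one prompt wins, not just that it does.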
What does 'more CRAFTS-complete' mean?
It means one prompt includes more of the six elements that help AI give better responses. A higher CRAFTS score doesn’t guarantee a better AI output, but it does mean you’ve given the AI more to work with, which usually leads to better results.
Can I compare prompts from different AI platforms?
Absolutely. The CRAFTS framework is platform-agnostic: it measures prompt structure, not platform-specific features. A well-structured prompt works better on ChatGPT, Claude, Copilot, Gemini, and any other AI.
What does the diff view show?
The diff view highlights word-level differences between your two prompts: green for words only in B, red for words only in A. It’s most useful when comparing a revised version of the same prompt. If the prompts are very different, the diff is hidden automatically.
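A word-level diff like this is commonly built on a longest-common-subsequence (LCS) table. The sketch below assumes that approach, splitting on whitespace; the tool's actual algorithm isn't documented here. `wordDiff` and `similarity` are hypothetical names, and using the share of shared words to decide when to hide the diff is an assumed rule, not the tool's stated threshold.

```typescript
// "onlyA" words would be shown red, "onlyB" words green, "same" words unhighlighted.
type DiffOp = { kind: "same" | "onlyA" | "onlyB"; word: string };

function wordDiff(a: string, b: string): DiffOp[] {
  const x = a.split(/\s+/).filter(Boolean);
  const y = b.split(/\s+/).filter(Boolean);
  // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..].
  const lcs: number[][] = Array.from({ length: x.length + 1 }, () =>
    new Array<number>(y.length + 1).fill(0),
  );
  for (let i = x.length - 1; i >= 0; i--) {
    for (let j = y.length - 1; j >= 0; j--) {
      lcs[i][j] = x[i] === y[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table, emitting shared words and words unique to each side.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < x.length && j < y.length) {
    if (x[i] === y[j]) {
      ops.push({ kind: "same", word: x[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "onlyA", word: x[i++] });
    } else {
      ops.push({ kind: "onlyB", word: y[j++] });
    }
  }
  while (i < x.length) ops.push({ kind: "onlyA", word: x[i++] });
  while (j < y.length) ops.push({ kind: "onlyB", word: y[j++] });
  return ops;
}

// Assumed rule for hiding the diff: fraction of diff entries that are shared words.
function similarity(ops: DiffOp[]): number {
  const same = ops.filter((o) => o.kind === "same").length;
  return ops.length === 0 ? 1 : same / ops.length;
}
```

For `wordDiff("write a summary", "write a short summary")`, only "short" is marked as B-only, and `similarity` returns 0.75. A page could then skip rendering the diff when similarity falls below some cut-off.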
Is my data private?
Yes. Everything runs 100% in your browser; your prompts are never sent to any server or API. History is stored in your browser’s localStorage only, and you can clear it at any time.
How does the 'Improve' feature work?
Click ‘Improve Weaker Prompt’ to auto-generate a rewritten version of the lower-scoring prompt. The rewrite adds whichever CRAFTS elements are missing (a role, context, format instructions, tone, or scope). It uses the same engine as our Prompt Polisher tool.
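The Polisher engine itself isn't described here, but the basic idea of "add what's missing" can be sketched as follows: for every pillar that scored zero, add a fill-in line. The template wording, the choice to put the role line first, and the names `TEMPLATES` and `improvePrompt` are all hypothetical.

```typescript
type Pillar = "context" | "role" | "action" | "format" | "tone" | "scope";

// Hypothetical fill-in lines; the real engine's wording is not documented here.
const TEMPLATES: Record<Pillar, string> = {
  role: "You are an experienced [expert role].",
  context: "Background: [describe your situation and audience].",
  action: "Task: [state exactly what you want done].",
  format: "Format the answer as [a table / bullet points / JSON].",
  tone: "Use a [professional / friendly] tone for [audience].",
  scope: "Keep it under [N] words and do not [exclusion].",
};

// Adds a template line for every pillar that scored zero; other pillars are left alone.
function improvePrompt(prompt: string, scores: Record<Pillar, number>): string {
  const missing = (Object.keys(TEMPLATES) as Pillar[]).filter((p) => scores[p] === 0);
  // Assumed ordering: the role line goes first so the persona is set before the task.
  const before = missing.includes("role") ? [TEMPLATES.role] : [];
  const after = missing.filter((p) => p !== "role").map((p) => TEMPLATES[p]);
  return [...before, prompt.trim(), ...after].join("\n");
}
```

Leaving bracketed placeholders, rather than inventing details, keeps the user in charge of the specifics that only they know.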
What if the scores are very close?
If the difference is less than 5 points, the tool shows ‘Too close to call’ instead of declaring a winner. Both prompts are similarly complete, and the difference is within the scoring margin.
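The close-call rule can be written as a small function. It uses the 5-point threshold stated above and reads "less than 5 points" literally, so a gap of exactly 5 still declares a winner; the names `verdict` and `CLOSE_CALL_THRESHOLD` are hypothetical.

```typescript
type Verdict = "A" | "B" | "Too close to call";

// Threshold from the FAQ: gaps strictly under 5 points are not called.
const CLOSE_CALL_THRESHOLD = 5;

function verdict(totalA: number, totalB: number): Verdict {
  const gap = totalB - totalA;
  if (Math.abs(gap) < CLOSE_CALL_THRESHOLD) return "Too close to call";
  return gap > 0 ? "B" : "A";
}
```

For example, totals of 62 and 65 are too close to call, while 60 and 65 give the win to B.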
Can I use this for Copilot Studio agent instructions?
Yes! Paste two versions of your agent instructions to see which is more structured. For dedicated agent instruction building, also check out our Agent Instruction Builder tool.