AI Prompt Word Counter

Paste your prompt. See token count, context fit, and cost for every major AI model side by side. No signup, nothing sent to a server.

0 words
0 characters
0 characters (no spaces)

Fit and cost per model

Model             | Tokens | Context used | Input cost
GPT-4o            | 0      | 0.00%        | $0.0000
GPT-4.1           | 0      | 0.00%        | $0.0000
o3                | 0      | 0.00%        | $0.0000
o4-mini           | 0      | 0.00%        | $0.0000
Claude Opus 4.6   | 0      | 0.00%        | $0.0000
Claude Sonnet 4.6 | 0      | 0.00%        | $0.0000
Claude Haiku 4.5  | 0      | 0.00%        | $0.0000
Gemini 2.5 Pro    | 0      | 0.00%        | $0.0000
Gemini 2.5 Flash  | 0      | 0.00%        | $0.0000
Grok 3            | 0      | 0.00%        | $0.0000

Token counts are estimated at 0.75 words per token (English prose). Prices shown are October 2026 list rates per million input tokens. Output is billed separately at 3 to 5x the input rate. Verify current provider pricing before relying on these numbers at scale.
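The estimate described above reduces to two lines of arithmetic. A minimal sketch, assuming the 0.75 words-per-token rule and an illustrative $2.50-per-million input rate (not a live price):

```javascript
// Tokens from words via the ~0.75 words-per-token rule (English prose),
// cost from a per-million-token list rate. The $2.50/M rate used in the
// example is an assumption for illustration, not a quoted price.
const WORDS_PER_TOKEN = 0.75;

function estimateTokens(wordCount) {
  return Math.ceil(wordCount / WORDS_PER_TOKEN);
}

function estimateInputCost(tokens, pricePerMillion) {
  return (tokens / 1e6) * pricePerMillion;
}

// A 1,500-word prompt:
const tokens = estimateTokens(1500);          // 2000 estimated tokens
const cost = estimateInputCost(tokens, 2.5);  // $0.005 at $2.50/M input
```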

Quick Answer

This tool counts your prompt in words and estimates tokens for every major model. It shows what percentage of each model's context window your prompt uses, whether it fits at all, and the input cost at current list pricing. Token estimates use the published 0.75 words-per-token ratio for English prose. For exact counts, use each provider's official tokenizer.

Why prompt length matters more than people think

Three things get harder as your prompt gets longer, and most people discover them the expensive way.

Cost scales with tokens, not with request count. A 100-token prompt and a 100,000-token prompt are both "one API call," but one costs a thousand times more. If you're sending the same lengthy system prompt plus a reference document on every user query, you're paying for the full context on every turn. This is what makes retrieval-augmented generation (RAG) cheaper than pasting your whole knowledge base.
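The per-turn arithmetic makes the gap concrete. A sketch with assumed sizes and an assumed $3-per-million input rate:

```javascript
// Input cost of a conversation where a fixed prefix (system prompt plus
// pasted document) is resent on every turn. All sizes and the $3/M rate
// are illustrative assumptions.
function conversationInputCost(prefixTokens, perTurnTokens, turns, pricePerMillion) {
  return ((prefixTokens + perTurnTokens) * turns / 1e6) * pricePerMillion;
}

// 50,000-token pasted document vs a 2,000-token retrieved slice,
// 200-token questions, 100 turns, $3/M input:
const pasted = conversationInputCost(50_000, 200, 100, 3); // ≈ $15.06
const rag = conversationInputCost(2_000, 200, 100, 3);     // ≈ $0.66
```

Same model, same questions, roughly 23x the input bill.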

Quality degrades with length. Every major model performs measurably worse on multi-step reasoning as prompts get longer, even well within the stated context window. Research from Anthropic, Google DeepMind, and independent academic teams shows consistent drop-off. The sweet spot for most complex tasks is under 20,000 tokens, even on models that accept 200,000 or 2,000,000.

Latency scales with input. Large prompts take longer to process before the first output token arrives. On a 200,000-token prompt to Claude, time-to-first-token can be 20 to 40 seconds. Your users notice.

How to read the fit column

The percentage tells you how much of each model's context window your prompt eats up. Colors mean:

  • Green (under 80%): Fits comfortably with room for a meaningful response. Safe to send.
  • Amber (80% to 99%): Fits, but the model has limited space to think or respond. Output quality may suffer. Trim if you can.
  • Red (OVER): Won't fit at all. The API will reject the request, or the web app will truncate it.

Rule of thumb: aim for under 50% utilization if you care about response quality. The model's best work happens when it has plenty of room, not when it's squeezed against the ceiling.
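The banding above is a single comparison per model. A minimal sketch of the fit logic, using the thresholds stated in this section:

```javascript
// Classify a prompt against a model's context window into the
// green / amber / red bands described above.
function contextFit(promptTokens, contextWindow) {
  const pct = (promptTokens / contextWindow) * 100;
  if (promptTokens > contextWindow) return { pct, band: "red" };
  if (pct >= 80) return { pct, band: "amber" };
  return { pct, band: "green" };
}

contextFit(50_000, 200_000);  // { pct: 25, band: "green" }
contextFit(190_000, 200_000); // { pct: 95, band: "amber" }
contextFit(250_000, 200_000); // band: "red" — won't fit
```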

Why the cost estimate is input-only

The tool shows input cost because that's the portion you control by adjusting your prompt. Output cost is a function of what the model generates, which is usually 3 to 5x more expensive per token than input and harder to predict before you send.

For a typical use case where output is roughly 10% of input length, double the shown cost for a rough total. For tasks that generate long responses (translation, summarization into full documents, code generation), triple it. For classification or extraction tasks where output is a few words, the shown number is basically the full cost.
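Those rules of thumb can be written down directly. A sketch assuming output is billed at 4x the input rate (the middle of the 3 to 5x range above):

```javascript
// Rough total cost: input at the list rate, plus output at an assumed 4x
// multiplier, with output length given as a fraction of input length.
function roughTotalCost(inputTokens, inputPricePerMillion, outputRatio, outputMultiplier = 4) {
  const inputCost = (inputTokens / 1e6) * inputPricePerMillion;
  const outputCost = (inputTokens * outputRatio / 1e6) * inputPricePerMillion * outputMultiplier;
  return inputCost + outputCost;
}

// 10,000 input tokens at $3/M, output ~10% of input length:
roughTotalCost(10_000, 3, 0.10); // ≈ $0.042 ($0.03 input + $0.012 output)
```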

Accuracy of the token estimate

This tool uses a word-to-token ratio of roughly 0.75 for English prose, matching OpenAI's published guidance. Actual tokenization varies by model and content type. Expect:

  • Within 5 to 10% for English prose. The estimate is close enough for planning.
  • Under-counts for code. Code tokenizes 30 to 60% denser than prose because variable names split aggressively and punctuation marks become their own tokens. A 1,000-word Python file might be 1,600+ tokens instead of the estimated 1,330.
  • Under-counts for non-English languages. Chinese, Japanese, Arabic, and most non-Latin scripts tokenize at 1.5x to 3x the rate of English. Gemini's tokenizer handles non-English better than OpenAI's and Anthropic's, but the gap is still significant.
  • Under-counts for numeric data. Long numbers, timestamps, and JSON often split digit-by-digit or key-by-key. A table of 10,000 numeric values can tokenize at nearly 1 token per digit.
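One way to compensate is a density multiplier per content type. A sketch where the multipliers are rough midpoints of the ranges quoted above, not measured values:

```javascript
// Adjust the base word-count estimate for content that tokenizes denser
// than English prose. Multipliers are illustrative midpoints of the
// ranges in the list above, not calibrated constants.
const DENSITY = { prose: 1.0, code: 1.45, nonLatin: 2.0, numeric: 2.5 };

function adjustedTokenEstimate(wordCount, contentType = "prose") {
  const baseTokens = wordCount / 0.75;
  return Math.ceil(baseTokens * (DENSITY[contentType] ?? 1.0));
}

adjustedTokenEstimate(1000);         // 1334 — plain English prose
adjustedTokenEstimate(1000, "code"); // 1934 — the 1,000-word file example
```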

For exact counts before you spend real money, use each provider's official tokenizer. OpenAI has a free tokenizer playground at platform.openai.com/tokenizer. Anthropic has a count_tokens API endpoint that runs for free. Google AI Studio includes a live token counter in its interface.

When to use which model for prompt length

Given a long prompt, the cheapest model that gives you acceptable quality wins. A practical decision tree:

  • Under 10,000 tokens, simple task: Gemini Flash-Lite or Claude Haiku 4.5. Pennies per call, sub-second response.
  • Under 50,000 tokens, needs reasoning: Claude Sonnet 4.6 or GPT-4o. Good quality at reasonable cost.
  • 100,000 to 200,000 tokens, complex task: Claude Opus 4.6 or GPT-4.1. Higher cost, but quality holds up on long contexts.
  • 200,000 to 1,000,000 tokens, any task: Gemini 2.5 Pro or GPT-4.1. Only two models handle this range.
  • Over 1,000,000 tokens: Gemini 2.5 Pro is the only option (and quality on multi-step reasoning starts degrading past 500k regardless of the stated ceiling).

For repeated queries against the same long document, context caching on Anthropic or Google drops the cost of reused prefixes to 10 to 25% of normal input rates. This is the single biggest lever for running long-context apps affordably.
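The caching math is worth seeing once. A simplified sketch that bills the cached prefix at 10% of the input rate after the first turn (the low end of the range above) and ignores cache-write surcharges, which some providers also charge:

```javascript
// Input cost over a session where the shared prefix is cached after the
// first turn. The 10% cache rate is an assumption; real billing also
// includes cache-write premiums this sketch omits.
function cachedInputCost(prefixTokens, perTurnTokens, turns, pricePerMillion, cacheRate = 0.10) {
  const firstTurn = (prefixTokens + perTurnTokens) / 1e6 * pricePerMillion;
  const laterTurns = (turns - 1) * (prefixTokens * cacheRate + perTurnTokens) / 1e6 * pricePerMillion;
  return firstTurn + laterTurns;
}

// 100,000-token document, 500-token questions, 50 turns at $3/M:
cachedInputCost(100_000, 500, 50, 3); // ≈ $1.85 with caching
// versus (100_500 * 50) / 1e6 * 3 ≈ $15.08 without
```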

Privacy — what happens to your prompt

Nothing leaves your browser. This tool counts words and estimates tokens locally with JavaScript. Open DevTools, Network tab, paste a 10,000-word document, and watch: no outbound requests containing your text. The calculation is arithmetic on the word count, nothing more.
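The counting itself is a few lines of string handling, which is why no server is needed. A minimal sketch of what a tool like this runs in the browser:

```javascript
// Client-side counting: words by whitespace runs, characters by length,
// no-space characters by stripping whitespace. Nothing here requires a
// network call.
function countText(text) {
  const trimmed = text.trim();
  return {
    words: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    characters: text.length,
    charactersNoSpaces: text.replace(/\s/g, "").length,
  };
}

countText("Summarize this document in three bullets.");
// → { words: 6, characters: 41, charactersNoSpaces: 36 }
```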

We don't log the text. We don't train anything on it. Standard site analytics collect page views and referrer data, same as any site. Your prompt contents stay in your browser.

Frequently Asked Questions

How accurate are the token counts?

Within 5 to 10 percent for English prose. Code and non-English text tokenize denser, so the estimate under-counts those. For exact counts, use the provider's official tokenizer.

Why do some models cost so much more than others?

Reasoning models (o3, Claude Opus) cost more because they're optimized for complex multi-step work. Fast models (Gemini Flash-Lite, Claude Haiku) cost less and work well for simpler tasks. Match the model to the task complexity to control costs.

Is output cost included in the estimate?

No. The shown cost is input only. Output is billed separately at 3 to 5x the input rate. For rough total cost, double or triple the shown number depending on how much output you expect.

Does the tool work offline?

Yes, once the page loads. All counting happens in your browser with no server calls. If you lose internet mid-session, the tool keeps working.

Is my prompt sent anywhere?

No. Nothing leaves your browser. The tool runs entirely client-side. Standard site analytics capture page views but not textarea contents.

Why do prices differ from what I see on the provider's site?

Prices shown are October 2026 list rates. Provider pricing updates regularly. Enterprise tiers, volume discounts, and regional pricing can all reduce costs below list. Always verify current pricing before making large commitments.

How do I reduce my prompt size?

Remove examples you don't need, trim system prompts, chunk long documents and send only relevant sections, use retrieval-augmented generation for large knowledge bases, and enable prompt caching for reused prefixes.
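The "chunk long documents" step above can be sketched simply: split on paragraph boundaries and group paragraphs up to a word budget, so only the relevant chunk is sent. The 500-word default is an arbitrary illustration:

```javascript
// Group paragraphs into chunks of at most maxWordsPerChunk words, so a
// long document can be searched and only relevant sections sent. The
// default budget is an assumption, not a recommendation.
function chunkByParagraphs(text, maxWordsPerChunk = 500) {
  const chunks = [];
  let current = [];
  let count = 0;
  for (const para of text.split(/\n\s*\n/)) {
    const words = para.trim() === "" ? 0 : para.trim().split(/\s+/).length;
    if (count + words > maxWordsPerChunk && current.length > 0) {
      chunks.push(current.join("\n\n"));
      current = [];
      count = 0;
    }
    current.push(para);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```

An oversized single paragraph still becomes its own chunk; production chunkers usually add sentence-level splitting and overlap between chunks, which this sketch omits.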

Related Tools and Guides