Claude Word Limit by Model (2026)

How much text Claude Opus 4.6, Sonnet 4.6 and Haiku 4.5 can actually handle, where the real ceiling sits, and what it costs to fill the window.

Quick Answer

All current Claude 4 models (Opus 4.6, Sonnet 4.6, Haiku 4.5) accept 200,000 tokens of input, roughly 150,000 English words or about 600 single-spaced pages. Output is capped separately at around 64,000 tokens (~48,000 words). The Claude.ai web app enforces tighter per-message limits than the API, and attached files get processed through a retrieval layer rather than stuffed into context directly.

Word limits by Claude model

| Model | Input tokens | Input words | Max output words |
| --- | --- | --- | --- |
| Claude Opus 4.6 | 200,000 | ~150,000 | ~48,000 |
| Claude Sonnet 4.6 | 200,000 | ~150,000 | ~48,000 |
| Claude Haiku 4.5 | 200,000 | ~150,000 | ~48,000 |
| Claude 3.5 Sonnet (legacy) | 200,000 | ~150,000 | ~6,000 |
| Claude 3 Opus (legacy) | 200,000 | ~150,000 | ~3,000 |

Token figures from Anthropic's published model documentation. Word conversions at ~0.75 words per token for English prose.
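The ~0.75 words-per-token heuristic is easy to apply yourself. A minimal sketch (for exact counts, use Anthropic's token-counting API rather than this approximation):

```python
# Rough word <-> token conversion for English prose, using the
# ~0.75 words-per-token heuristic cited above. Real token counts
# vary with content; code and non-English text run denser.

WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a token budget."""
    return int(tokens * WORDS_PER_TOKEN)

def words_to_tokens(words: int) -> int:
    """Approximate tokens consumed by an English word count."""
    return int(words / WORDS_PER_TOKEN)

print(tokens_to_words(200_000))  # -> 150000, the full input window in words
print(words_to_tokens(48_000))   # -> 64000, the output cap in tokens
```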

The three Claude models, practically speaking

They share a context window. They do not share a price tag or a speed profile, and the difference matters once you start moving real volume through them.

  • Opus 4.6. The flagship. Best reasoning, best at long complex tasks, highest price. Use it when output quality matters more than the bill — legal analysis, research synthesis, complex code. Input is about $15 per million tokens.
  • Sonnet 4.6. The workhorse. Close to Opus for most tasks, about a fifth the price at $3 per million input tokens. Most production apps standardize here.
  • Haiku 4.5. The fast one. Sub-second response times, cheapest at $0.80 per million input tokens. Use it for classification, extraction, routing, and anywhere latency beats nuance.

All three see the same 200,000 tokens. The question isn't "can this fit?" but "is this worth Opus-grade reasoning, or will Haiku handle it for a twentieth of the cost?"

What 150,000 words actually looks like

Abstract numbers don't help when you're trying to decide if your document fits. Concrete examples of what Claude's 200k context window holds:

  • The full text of Pride and Prejudice (~122,000 words) with 28,000 words of headroom for your question and the response
  • A PhD dissertation (typically 70,000 to 100,000 words) — fits with plenty of room for analysis output
  • A corporate 10-K filing (60,000 to 100,000 words for most large companies) with room for summarization
  • About 600 single-spaced pages, or 1,200 double-spaced pages
  • Roughly 10 hours of podcast transcript at 150 wpm speaking rate
  • A full quarter of sent email for a heavy email user — roughly 150,000 words

Where it stops fitting: full-length epic fantasy novels (The Way of Kings is 387,000 words), complete codebases of medium apps, multi-year meeting transcripts, or combined collections of 3+ books. For those, you need chunking with retrieval, or a longer-context model such as Gemini 2.5 Pro's 1M-token window as an alternative.

Claude.ai web app vs. API — they have different limits

The 200k-token figure is the raw model limit, available through the API. If you use Claude.ai or the Claude mobile apps, you run into several additional caps the consumer interface layers on top:

  • Per-message limits on pasted text. The web app has historically capped the size of a single pasted message; oversized messages get rejected or truncated. The cap expands periodically but stays tighter than the API.
  • Attached file handling. When you upload a PDF or DOCX, the web app extracts text server-side and often routes it through a retrieval layer, meaning only relevant chunks get sent to the model — not the whole document. This is why asking about "page 47 of this 300-page PDF" sometimes misses.
  • Conversation context compression. Long chat sessions get silently summarized to keep fitting in the window. The model sees a compressed version of earlier messages, not the verbatim history.
  • Rate limits on the Free and Pro tiers cap total tokens per hour or per day, independent of the context window size.

If you need the full 200k for a single request, use the API directly (or a tool that wraps it cleanly, like the Anthropic Workbench). Claude.ai is optimized for conversational use, not bulk document stuffing.
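A full-context API request is just a Messages API call with the whole document in the user turn. A sketch that assembles the payload without sending it (the model name is a placeholder; sending requires the `anthropic` SDK and an API key):

```python
# Assemble kwargs for a full-context Messages API request, e.g.
# anthropic.Anthropic().messages.create(**req). The model string
# below is an assumed placeholder -- check Anthropic's docs for
# the current identifier.

def build_full_context_request(document: str, question: str,
                               model: str = "claude-sonnet-4-6",
                               max_tokens: int = 64_000) -> dict:
    """Build a single-message request that puts the whole document in context."""
    return {
        "model": model,
        "max_tokens": max_tokens,  # output cap; the API requires this explicitly
        "messages": [{
            "role": "user",
            "content": f"{document}\n\n---\n\n{question}",
        }],
    }

req = build_full_context_request("...long document text...",
                                 "Summarize the key obligations.")
```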

Output cap — the one that surprises people

Input gets all the headlines. Output is where people actually hit walls.

Claude 4 models cap output at around 64,000 tokens per response (~48,000 words). That's generous, but it's still a cap. Ask Claude to "translate this whole novel" or "rewrite all 50 of these blog posts in one response" and the output stops when it hits the ceiling, usually mid-sentence.

Three patterns work when you need more:

  • Chunk the task. Ask for section one, get it, ask for section two. Slower but reliable.
  • "Continue" prompts. When output cuts mid-thought, reply "continue" and Claude usually picks up from where it stopped. Surprisingly robust.
  • Raise max_tokens on the API. The Messages API requires an explicit max_tokens, and examples and wrappers often set it well below the model's ceiling. Set it as high as your task needs.
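The "chunk the task" pattern above can be as simple as a word-based splitter. A minimal sketch; the chunk size is illustrative, not a model constant:

```python
# Split long input into pieces that each fit comfortably under the
# per-response output budget, for section-by-section processing.

def chunk_words(text: str, max_words: int = 40_000) -> list[str]:
    """Split text into chunks of at most max_words words each."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

chunks = chunk_words("word " * 100_000, max_words=40_000)
# 100,000 words -> 3 chunks (40k, 40k, 20k)
```

A word boundary is a crude split point; for prose, splitting on paragraph or section boundaries gives the model cleaner units to work with.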

Pricing — what it costs to fill the window

Claude is billed per token. At October 2026 list prices (verify with Anthropic before relying on these):

| Model | Input / 1M tokens | Output / 1M tokens | Full 200k input cost |
| --- | --- | --- | --- |
| Opus 4.6 | $15.00 | $75.00 | $3.00 |
| Sonnet 4.6 | $3.00 | $15.00 | $0.60 |
| Haiku 4.5 | $0.80 | $4.00 | $0.16 |

A team pushing 1,000 full-context Opus requests a day spends $3,000 on input tokens alone, before output. This math is why retrieval-augmented generation became the default pattern at scale: send 5,000 tokens of relevant chunks instead of 200,000 tokens of everything.
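The arithmetic behind that table is worth having as a function. A sketch using the October 2026 list prices quoted above (verify current pricing with Anthropic before relying on it):

```python
# Per-request dollar cost at the list prices in the table above.

PRICES = {  # USD per million tokens: (input rate, output rate)
    "opus-4.6":   (15.00, 75.00),
    "sonnet-4.6": (3.00,  15.00),
    "haiku-4.5":  (0.80,  4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Dollar cost of one request: tokens times rate, per side."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(round(request_cost("opus-4.6", 200_000), 2))   # full window on Opus -> 3.0
print(round(request_cost("haiku-4.5", 200_000), 2))  # same window on Haiku -> 0.16
```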

Prompt caching — the workaround most people miss

Anthropic supports prompt caching: if you send the same long prefix (like a 100,000-word manual) in multiple requests within a few minutes, the cached read is charged at about a tenth of the normal input rate. A 200,000-token document that would cost $3.00 per call at Opus rates drops to roughly $0.30 per call after the first.

If you're building a product that lets users ask repeated questions against the same long document, caching changes the unit economics entirely. Most teams discover it months in and wish they'd known at design time.
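The unit-economics shift is easy to model. A back-of-envelope sketch assuming, as the paragraph above describes, that the first call pays the full input rate and each subsequent cache hit pays roughly a tenth of it (the exact multiplier and cache lifetime come from Anthropic's pricing docs):

```python
# Total input cost for a series of requests that reuse one long
# cached prefix. cached_fraction=0.10 is the "about a tenth"
# figure from the text, not an official rate.

def cached_series_cost(calls: int, full_cost: float,
                       cached_fraction: float = 0.10) -> float:
    """First call at full price, remaining calls at the cached rate."""
    if calls == 0:
        return 0.0
    return full_cost + (calls - 1) * full_cost * cached_fraction

# 10 Opus calls against a 200k-token document ($3.00 uncached each):
print(round(cached_series_cost(10, 3.00), 2))  # -> 5.7, versus 30.00 uncached
```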

Does Claude actually use the full context?

Big window and effective window aren't the same thing. Research from Anthropic and independent teams on "needle in a haystack" tests shows Claude is stronger than most models at retrieving specific facts from deep context, but performance still degrades compared to short prompts. Information at the beginning and end of the context is weighted more heavily than information in the middle.

Practical takeaway: put the most important instructions and reference material at the start or end of your prompt, not buried at the 40% mark. For long documents, extract the relevant passages first if you can. Claude won't refuse a 180,000-word prompt. It will just answer better with a focused 20,000-word one.
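One way to apply that placement advice is to sandwich the bulk material between two copies of the task, so the key instruction sits at both the start and the end of the prompt. A sketch (the `<documents>` delimiter is a common convention, not a required format):

```python
# Put the instruction at both edges of the context, where retrieval
# is strongest, with the long reference material in the middle.

def build_prompt(instruction: str, documents: list[str]) -> str:
    """Sandwich long reference material between two copies of the task."""
    middle = "\n\n---\n\n".join(documents)
    return (f"{instruction}\n\n"
            f"<documents>\n{middle}\n</documents>\n\n"
            f"Reminder of the task: {instruction}")

p = build_prompt("List every deadline mentioned.",
                 ["doc one ...", "doc two ..."])
```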

Check if your prompt fits before sending

Paste your prompt into our AI word counter to see token count and cost for every model

  • AI Prompt Word Counter
  • Tokens to Words

Frequently Asked Questions

What is Claude's word limit?

All current Claude 4 models (Opus 4.6, Sonnet 4.6, Haiku 4.5) accept 200,000 tokens of input, or roughly 150,000 English words (about 600 single-spaced pages).

Does Claude have a character limit?

Anthropic measures in tokens, not characters. 200,000 tokens is roughly 800,000 characters of English text. Non-English scripts and code use more tokens per character.

Can Claude read an entire book?

Most books fit. Pride and Prejudice (~122k words), Gatsby (~50k words), and standard novels under 150k words all fit in one prompt with room for analysis. Very long epic fantasy novels and multi-book series do not fit and require chunking.

Is Claude.ai's limit the same as the API?

No. The API gives you the full 200k window. Claude.ai adds consumer-facing caps on per-message size, and it routes attached files through a retrieval layer rather than stuffing them directly into context.

What's Claude's output limit?

Claude 4 models cap output at around 64,000 tokens (~48,000 words) per response. For longer outputs, chunk the task or use "continue" prompts.

Does Claude Pro raise the word limit?

No. Pro subscription raises rate limits (more messages per hour) and unlocks higher-tier models, but the raw 200k context window is the same across tiers.

What's the cheapest way to use Claude's full context?

Use Haiku 4.5 ($0.80 per million input tokens, so about $0.16 to fill the full 200k window) for simpler tasks, or enable prompt caching with Sonnet or Opus when you'll reuse the same long prefix across multiple requests.

Related Tools and Guides