ChatGPT Word Limit by Model (2026)
Every current ChatGPT model has a different ceiling. Here's how many words fit in each, what happens when you go over, and why the number you see in the UI isn't always what you get.
Quick Answer
GPT-4o accepts inputs up to 128,000 tokens, roughly 96,000 English words. GPT-4.1 accepts up to 1,000,000 tokens (~750,000 words). The o3 and o4-mini reasoning models accept 200,000 tokens (~150,000 words). Output is capped separately: about 16,384 tokens (~12,000 words) for GPT-4o, 32,768 tokens (~24,000 words) for GPT-4.1, and 100,000 tokens (~75,000 words) for o3 and o4-mini. The ChatGPT web app enforces smaller per-message limits than the raw API.
Word limits by ChatGPT model
| Model | Input tokens | Input words | Max output words |
|---|---|---|---|
| GPT-4o | 128,000 | ~96,000 | ~12,000 |
| GPT-4o-mini | 128,000 | ~96,000 | ~12,000 |
| GPT-4.1 | 1,000,000 | ~750,000 | ~24,000 |
| GPT-4.1-mini | 1,000,000 | ~750,000 | ~24,000 |
| o3 | 200,000 | ~150,000 | ~75,000 |
| o4-mini | 200,000 | ~150,000 | ~75,000 |
| GPT-3.5 Turbo (legacy) | 16,385 | ~12,300 | ~3,000 |
Token figures are from OpenAI's official model documentation. Word conversions use the 0.75 words-per-token ratio OpenAI publishes for English prose.
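The table's word conversions can be reproduced in a few lines. A rough planning helper, assuming the published 0.75 English-prose ratio holds (it's an average, not a guarantee):

```python
# Rough token/word conversions using OpenAI's published
# ~0.75 words-per-token average for English prose.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a token budget."""
    return round(tokens * WORDS_PER_TOKEN)

def words_to_tokens(words: int) -> int:
    """Approximate token cost of an English word count."""
    return round(words / WORDS_PER_TOKEN)

print(tokens_to_words(128_000))  # 96000, GPT-4o's input window in words
print(words_to_tokens(96_000))   # 128000
```

For exact counts, run the text through a real tokenizer instead of relying on the ratio.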
Why the ChatGPT website and the API give you different limits
The word counts above are API limits. If you use ChatGPT through chat.openai.com, the interface enforces its own caps that are usually much tighter. A few reasons:
- Per-message input limits. The ChatGPT web app historically capped single messages at around 4,000 to 8,000 words even on Plus, because longer inputs hit server timeouts. This expanded in 2024 and again in 2025, but the web interface still rejects very long single pastes.
- Total conversation context. The website runs a silent summarization when the chat gets long, condensing earlier messages so the model keeps fitting in context. This means the model technically still "sees" the whole conversation, but it sees a compressed version of the early parts.
- Attachment handling. When you attach a PDF or Word file, the web app extracts the text server-side and inserts it as context. Large files get chunked and only relevant sections may be sent to the model, which is why asking about "page 47" of a 200-page PDF sometimes returns wrong answers.
If you need the full context window, use the API or a playground that talks directly to the API. ChatGPT.com is the consumer-friendly version with extra safety rails.
What happens when you exceed the limit
On the API, you get an error. Specifically, OpenAI returns a 400 with a message like `This model's maximum context length is 128000 tokens. However, your messages resulted in 142000 tokens. Please reduce the length.` Your request is rejected, not truncated.
On ChatGPT.com, behavior varies. Sometimes the UI refuses to send the message. Sometimes it quietly truncates your input and only sends the first portion. Sometimes it splits the message into chunks and sends them as separate turns, which can confuse the model. The UI has gotten better at flagging this, but the failure modes are inconsistent across browsers and sessions.
If you're working with long documents, the reliable workflow is: count tokens yourself before sending, leave 10% headroom for the model's response, and chunk if you have to. Our tokens to words converter does the math instantly.
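That pre-flight check can be sketched with a word-count estimate. A rough sketch assuming English prose at 0.75 words per token; a real pipeline would count tokens exactly with a tokenizer such as tiktoken:

```python
WORDS_PER_TOKEN = 0.75  # OpenAI's English-prose average

def fits_with_headroom(word_count: int, context_tokens: int,
                       headroom: float = 0.10) -> bool:
    """Rough pre-flight check: does a prompt of `word_count` English
    words fit `context_tokens`, leaving `headroom` for the reply?"""
    estimated_tokens = word_count / WORDS_PER_TOKEN
    return estimated_tokens <= context_tokens * (1 - headroom)

print(fits_with_headroom(70_000, 128_000))   # True: novel-length prompt fits GPT-4o
print(fits_with_headroom(110_000, 128_000))  # False: too close to the cap
```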
ChatGPT character limit vs. word limit
OpenAI measures in tokens, not characters or words. A character limit for ChatGPT isn't a number they publish, because two strings of the same character length can tokenize very differently. But if you want a rough rule of thumb for planning:
- 1 token ≈ 4 characters of English text on average
- 128,000 tokens ≈ 512,000 characters for GPT-4o input
- 1,000,000 tokens ≈ 4,000,000 characters for GPT-4.1 input
Watch out for non-English text and code. Chinese, Japanese, Arabic and most code languages are significantly more token-dense per character. A 10,000-character Python file might use 4,000 tokens where English prose would use 2,500.
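A character-based planning estimate, sketched with density divisors: the 4.0 figure matches the English rule of thumb above, while the code and CJK divisors are illustrative assumptions, not published numbers.

```python
# Approximate characters per token. Only the English figure comes
# from the rule of thumb above; the others are rough assumptions.
CHARS_PER_TOKEN = {"english": 4.0, "code": 2.5, "cjk": 1.5}

def estimate_tokens(char_count: int, kind: str = "english") -> int:
    """Planning-grade token estimate from a character count."""
    return round(char_count / CHARS_PER_TOKEN[kind])

print(estimate_tokens(512_000))          # 128000: a full GPT-4o window of English
print(estimate_tokens(10_000, "code"))   # 4000: the Python-file example above
```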
How much text is really 128,000 tokens?
GPT-4o's 128,000-token context is the ceiling people hit most often. Concrete examples of what fits:
- The entire text of The Great Gatsby (~50,000 words) — fits comfortably, with ~46,000 words of context to spare (output itself is still capped at ~12,000 words)
- A full PhD dissertation (~70,000 words) — fits with ~26,000 words of context headroom
- 300 pages of single-spaced text (~96,000 words) — fills the window, leaving almost no room for a meaningful response
- A typical corporate annual 10-K filing (~65,000 to 100,000 words) — usually fits, but may need trimming for long ones
- All seven Harry Potter books combined (~1.08 million words) — doesn't fit in GPT-4o, and at roughly 1.44 million tokens it exceeds even GPT-4.1's window. You'd need chunking or retrieval.
The rule of thumb: if your source document is under 70,000 words, GPT-4o handles it one-shot. Between 70,000 and 150,000, use o3 or GPT-4.1. Over 150,000, you're either using GPT-4.1's million-token window or you're chunking and using retrieval.
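The rule of thumb reduces to a small dispatch. A sketch, with the return values as loose labels rather than exact API model identifiers:

```python
def pick_model(source_words: int) -> str:
    """Choose a model tier from the source document's English word
    count, following the rule of thumb above."""
    if source_words < 70_000:
        return "gpt-4o"                 # fits one-shot
    if source_words <= 150_000:
        return "o3 or gpt-4.1"          # needs a 200k+ token window
    return "gpt-4.1 or chunking + RAG"  # million-token window or retrieval

print(pick_model(50_000))
print(pick_model(100_000))
print(pick_model(400_000))
```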
Output limits — the quieter cap that bites you
Everyone talks about input limits. The output limit catches people off guard.
GPT-4o's default output ceiling is 16,384 tokens, which is about 12,000 words. Ask it to "translate this entire novel" or "rewrite these 50 blog posts in one response" and the output stops, usually mid-sentence. The model doesn't always tell you it stopped because it hit the cap — it just stops, and you have to prompt it to continue.
For long outputs, three patterns work:
- Chunk the task. Ask for the first section, get it, then ask for the next. Chat gets slow, but completions stay reliable.
- Use "continue" prompts. When you see output cut off, reply with just "continue" and the model usually picks up where it stopped. Works better than it has any right to.
- Increase max_tokens on the API. The default in most SDKs is lower than the model's true ceiling. Explicitly set max_tokens to get closer to the full limit.
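The first pattern, chunking, can be sketched as a plain word-bounded splitter. Sizing by words is an approximation (real code would count tokens), and the 10,000-word default is just a safe margin under GPT-4o's ~12,000-word output ceiling:

```python
def chunk_words(text: str, max_words: int = 10_000) -> list[str]:
    """Split text into chunks of at most `max_words` words so each
    completion stays under the output cap."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

sections = chunk_words("lorem " * 25_000)
print(len(sections))  # 3 chunks for a 25,000-word input
```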
The reasoning models (o3, o4-mini) have higher output limits (~75,000 words) because long outputs are often the point with those models.
Does the model actually use the full context?
Having a 128k window and using all 128k well are different problems. Published research from Anthropic, Google, and independent academic teams has consistently shown that large-context models perform worse on needle-in-a-haystack tasks when the key passage sits in the middle of the context rather than at the beginning or end. The effect is real, and it has a name: "lost in the middle."
Practically, this means:
- Put your most important information (the question, the key constraints, the primary reference) near the start or end of your prompt, not buried in the middle.
- For long documents, extract the relevant passages first rather than pasting the whole thing. A focused 5,000-word prompt usually beats a scattered 80,000-word one.
- Don't assume that because your prompt fit, every sentence got equal attention. It didn't.
This is more about getting better responses than avoiding errors. The model won't refuse to answer. It'll just answer worse.
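One simple way to act on this is a "sandwich" layout: state the task before the document and repeat it after. A minimal sketch; the field labels here are arbitrary, not a required format:

```python
def sandwich_prompt(question: str, document: str) -> str:
    """Place the question at both the start and end of the prompt,
    keeping the key instruction out of the weak middle of the context."""
    return (f"Task: {question}\n\n"
            f"Document:\n{document}\n\n"
            f"Reminder: {question}")

prompt = sandwich_prompt("List every deadline mentioned.",
                         "<long document text>")
print(prompt.splitlines()[0])  # Task: List every deadline mentioned.
```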
Pricing implications of hitting the word limit
API pricing is per token, so longer prompts cost more. At roughly current (October 2026) list pricing:
- A full 128,000-token GPT-4o input run costs around $0.32 for input alone
- A full 1,000,000-token GPT-4.1 input run is around $2.00 to $5.00 depending on tier
- Output is 3 to 5x more expensive than input per token on most models
A team running 1,000 large-context requests per day at GPT-4.1 prices is spending $2,000 to $5,000 a day on input tokens alone. This is why retrieval-augmented generation (RAG) became the dominant pattern for applications at scale — you send 2,000 tokens of relevant context, not 1,000,000 tokens of everything.
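The arithmetic behind those figures, as a sketch. The per-million prices are the approximate list figures quoted above, so treat them as placeholders and check current pricing before budgeting:

```python
# Approximate USD list prices per million input tokens (taken from
# the figures above; verify against current pricing before relying on them).
INPUT_PRICE_PER_M = {"gpt-4o": 2.50, "gpt-4.1": 2.00}

def input_cost(model: str, tokens: int, requests: int = 1) -> float:
    """Input-side spend for `requests` calls of `tokens` tokens each."""
    return INPUT_PRICE_PER_M[model] * tokens / 1_000_000 * requests

print(round(input_cost("gpt-4o", 128_000), 2))           # 0.32 per full-window call
print(round(input_cost("gpt-4.1", 1_000_000, 1_000), 2)) # 2000.0 per day at 1,000 requests
```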
Check if your prompt fits before you send it
Our free tokens-to-words converter shows you exactly how much context window your text uses.

Frequently Asked Questions
What is ChatGPT's word limit?
It depends on the model. GPT-4o accepts about 96,000 words of input. GPT-4.1 accepts about 750,000. o3 and o4-mini accept about 150,000. These are API limits; the ChatGPT web app enforces tighter per-message caps.
How many words can I paste into ChatGPT?
On the web interface, a single message historically caps around 4,000 to 8,000 words before the UI starts rejecting or truncating. On the API, you can paste up to the model's full context window minus whatever you need for output.
What is ChatGPT's character limit?
OpenAI measures in tokens, not characters. For English text, 128,000 tokens is roughly 512,000 characters. Non-English text and code use more tokens per character.
Can ChatGPT read an entire book?
GPT-4o can read most novels in one shot (under ~96,000 words). GPT-4.1 can handle very long books (up to ~750,000 words), though all seven Harry Potter books combined (~1.08 million words) still exceed even its window. Reading quality also degrades somewhat on very long contexts due to the "lost in the middle" effect.
What happens if I exceed ChatGPT's word limit?
On the API you get a rejection error. On the web app, the UI may refuse to send, silently truncate your input, or split it into multiple messages. Behavior is inconsistent, so count tokens yourself before sending anything near the cap.
Is the ChatGPT word limit the same as the output limit?
No. Input and output have separate caps. GPT-4o takes up to 128,000 tokens of input but generates at most ~16,000 tokens of output per request (about 12,000 words). For longer outputs, chunk the task or use "continue" prompts.
Does Plus or Pro raise the word limit?
The underlying model limits are the same. ChatGPT Plus and Pro remove some rate limits and unlock access to higher-tier models (like o3 with 200,000 token context), which effectively raises your usable limit. The raw model caps don't change by subscription tier.