DeepSeek Word Limit by Model (2026)
DeepSeek changed the economics of frontier AI in early 2025 and stayed cheap. Here's what each current model accepts, what it costs, and why V4 at $0.30 per million tokens is eating the market.
Quick Answer
DeepSeek V4 (March 2026) accepts 1,000,000 tokens (~750K words). DeepSeek V3.2 accepts 128,000 tokens (~96K words). DeepSeek R1 (reasoning model) accepts 64,000 tokens of input but allows up to 64K tokens of output, which is unusual among frontier models. At $0.30 per million input tokens, V4 is roughly 50x cheaper than Claude Opus for equivalent context, which is why it dominates high-volume workloads.
DeepSeek context windows by model
| Model | Input tokens | Max output | Released |
|---|---|---|---|
| DeepSeek V4 | 1,000,000 | 8,000 | Mar 2026 |
| DeepSeek V3.2 | 128,000 | 8,000 | Late 2025 |
| DeepSeek R1 | 64,000 | 64,000 | Jan 2025 |
| DeepSeek V3.1 | 128,000 | 7,168 | Jan 2025 |
| DeepSeek V3 (original) | 64,000 | 8,000 | Dec 2024 |
Specs from DeepSeek API documentation and Hugging Face model cards, April 2026.
The DeepSeek pricing story
DeepSeek R1's launch in January 2025 is often called the "DeepSeek moment" because it demonstrated ChatGPT-level reasoning at a fraction of the training and API cost. The pricing held. Current rates:
| Model | Input / 1M | Cache hit / 1M | Output / 1M |
|---|---|---|---|
| DeepSeek V4 | $0.30 | $0.03 | $0.50 |
| DeepSeek V3.2 Chat | $0.28 | $0.028 | $0.42 |
| DeepSeek R1 | $0.55 | $0.055 | $2.19 |
For reference: Claude Opus 4.6 is $15 per million input tokens. DeepSeek V4 is $0.30. That is a 50x difference. For output, R1 at $2.19 compares to OpenAI o1 at $60, making R1 roughly 96% cheaper for reasoning-heavy workloads.
The cache hit discount is the detail that actually changes the math. If your prompts share a common prefix (system prompt, tool definitions, a reference document), cached input tokens cost 90% less. A production app with a well-structured system prompt sees effective input costs below $0.05 per million tokens on V4. That is approaching commodity pricing.
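The blended-rate math can be sketched in a few lines. The default prices are V4's rates from the pricing table above; the 95% hit ratio is an illustrative assumption, not a measured figure:

```python
def effective_input_cost(cache_hit_ratio, price_miss=0.30, price_hit=0.03):
    """Blended input price per 1M tokens, given the fraction of tokens
    served from the prefix cache. Defaults are the V4 rates from the
    pricing table above ($0.30 on a miss, $0.03 on a hit)."""
    return cache_hit_ratio * price_hit + (1 - cache_hit_ratio) * price_miss

# A prompt where ~95% of tokens are a shared, cached prefix:
print(f"${effective_input_cost(0.95):.4f} per 1M tokens")  # $0.0435 per 1M tokens
```

The takeaway: the more of your prompt you can move into a stable shared prefix (system prompt first, tool definitions next, variable content last), the closer you get to the cache-hit price.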
R1 and the separate output budget
DeepSeek R1 does something unusual among major models: output tokens don't count against the input budget. Most models share one 128K, 200K, or 1M token pool between input and output. R1 gives you 64K for input and then up to 64K more for output, including chain-of-thought reasoning tokens.
This matters for reasoning workloads. If you're asking R1 to solve a multi-step math problem with step-by-step working shown, the reasoning chain itself can consume 10K-40K tokens. Other reasoning models pay for this out of the shared context, meaning your usable input window shrinks accordingly. R1's architecture lets you think long without sacrificing input capacity.
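To make the budgeting difference concrete, here is a toy sketch. The function and its parameter names are illustrative, not part of any API; the 64K figures come from the table above:

```python
def usable_input(input_window, reasoning_tokens, separate_output_budget):
    """Input tokens still available once a long reasoning chain is
    accounted for. Hypothetical helper to illustrate the difference
    between R1's split budget and a shared-pool model."""
    if separate_output_budget:
        # R1-style: the reasoning chain draws from its own 64K output pool
        return input_window
    # Shared-pool models: the chain eats into the same context window
    return input_window - reasoning_tokens

print(usable_input(64_000, 40_000, separate_output_budget=True))   # 64000
print(usable_input(64_000, 40_000, separate_output_budget=False))  # 24000
```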
When DeepSeek V4 is the right pick
V4 scores 81% on SWE-bench Verified (vs V3's 69%) and holds its own against GPT-5 on general benchmarks. At $0.30 input, it's the best price-to-quality ratio on the market for most production workloads. Specifically strong for:
- Any high-volume workflow where per-token cost dominates (classification, extraction, basic Q&A)
- Code agents and coding assistants (V4's coding scores are competitive with GPT-4 tier)
- Long-document summarization with the 1M token context
- Multilingual applications (DeepSeek was trained heavily on Chinese and English, strong on both)
- Startups that cannot afford Claude Opus pricing but need frontier-tier quality
Where V4 is not the right pick: workloads that genuinely need the deepest reasoning (use R1 or Claude Opus), vision-heavy tasks (DeepSeek's multimodal is behind GPT-4o and Gemini), or enterprise environments with data-residency concerns about servers hosted in mainland China.
The statelessness gotcha
DeepSeek's API is stateless. There is no persistent conversation memory. Every multi-turn chat requires re-sending the full conversation history in each API call. For long sessions, this is expensive even at DeepSeek's low rates because cumulative token volume grows quadratically with turn count: each new turn re-sends everything that came before it.
Workarounds: use context caching aggressively for shared prefixes, summarize older turns instead of replaying verbatim, or use conversation-summarization techniques to compress the history. The API is powerful but you're responsible for managing context yourself.
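One way to sketch the summarize-older-turns workaround. The function and its placeholder summary are illustrative; in practice `summarize` would be a cheap (and cacheable) model call over the old turns:

```python
def compact_history(messages, keep_last=4, summarize=None):
    """Replace older turns with a single summary message so each
    stateless API call re-sends a bounded history rather than the
    full transcript. `summarize` is injectable for testing."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    if len(turns) <= keep_last:
        return system + turns
    old, recent = turns[:-keep_last], turns[-keep_last:]
    text = summarize(old) if summarize else f"[summary of {len(old)} earlier messages]"
    return system + [{"role": "user", "content": text}] + recent
```

Note the order: keep the system prompt first so it stays a stable, cacheable prefix, then the summary, then the recent verbatim turns.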
See DeepSeek cost estimates for your actual prompt
Our AI prompt counter shows token counts and input costs across DeepSeek and 9 other models.
AI Prompt Word Counter
FAQ
What is DeepSeek's word limit?
V4 accepts about 750,000 words (1M tokens). V3.2 accepts about 96,000 words (128K). R1 accepts about 48,000 words input (64K tokens) but allows up to 64K tokens output.
Why is DeepSeek so much cheaper than Claude or GPT?
Lower training costs (DeepSeek pioneered efficient MoE and FP8 training), simpler deployment, and a deliberate pricing strategy to capture market share. The quality is genuinely frontier-tier; the pricing isn't an accident.
Can I trust DeepSeek with sensitive data?
Their privacy policy states data may be stored on servers in mainland China. For sensitive or regulated data, check whether that meets your compliance requirements. Alternatively, self-host DeepSeek weights (they're open-source under MIT License) on your own infrastructure.
Does DeepSeek R1 really use reasoning tokens like OpenAI o1?
Yes. R1 generates a chain-of-thought before its final answer, and unlike o1, which hides its reasoning, R1's chain is fully visible: you can read the model's step-by-step working in the response.
How do I handle DeepSeek's small output limit?
V4 and V3.2 cap output at 8,000 tokens per response. For longer outputs, chunk the task or use continue prompts. R1's 64K output cap is much more generous and designed for long reasoning chains.
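A minimal continue-prompt loop might look like the sketch below. `call_model` is a stand-in for the actual API call, injected so the loop can be shown offline; OpenAI-compatible responses (which DeepSeek's API follows) report `finish_reason == "length"` when a response was cut off at the output cap:

```python
def generate_long(call_model, prompt, max_rounds=5):
    """Stitch together a response longer than the 8K output cap by
    asking the model to continue whenever it was truncated.
    `call_model(messages)` must return (text, finish_reason)."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = call_model(messages)
        parts.append(text)
        if finish_reason != "length":  # "length" means the cap was hit
            break
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user",
                         "content": "Continue exactly where you left off."})
    return "".join(parts)
```

Because each continuation re-sends the growing history, long outputs compound the statelessness cost described above; context caching on the shared prefix softens this.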