What is a reasoning model?
Standard AI models work like this: you ask a question, the model immediately starts generating the most likely response one token at a time. It is extremely fast and often excellent — but it has no ability to pause and think before committing to a direction.
A reasoning model adds a step before the visible response: it generates a long internal "chain of thought" — working through the problem, checking its reasoning, considering alternatives — before producing its final answer. You often see this as a collapsed "Thinking..." section in interfaces like Claude or ChatGPT.
Why does thinking in tokens help?
This sounds almost too simple to work — but the key insight is that those intermediate reasoning tokens become part of the model's context. The model is literally reading its own reasoning as it generates it, which allows it to catch errors, reconsider assumptions, and build on intermediate conclusions in a way that immediate response generation cannot.
Think of it like the difference between answering a maths problem in your head immediately versus writing out the working. The act of writing the working changes the answer — you catch mistakes you'd otherwise miss.
When does it actually matter?
Reasoning mode makes a meaningful difference for:
Complex maths and logic. Multi-step problems where an error at step two invalidates everything after. This is where reasoning models most dramatically outperform standard models — the difference on competition-level maths can be the gap between 40% and 90% accuracy.
Code debugging. Tracing through what code actually does versus what it was intended to do requires holding multiple states in mind simultaneously. Reasoning models are significantly better at this.
Strategic analysis. Problems with multiple interacting variables where the right answer depends on considering second and third-order effects — business decisions, policy analysis, risk assessment.
Where it adds less value: Simple questions, creative writing, summarisation, and conversational tasks where the standard model already performs well. Using reasoning mode for these is slower and more expensive without meaningful quality gain.
Use standard mode for: writing, summarising, brainstorming, simple Q&A. Switch to reasoning/thinking mode for: maths, complex logic, code debugging, multi-step planning, and any task where you've been burned by a confident wrong answer before.
The models available right now
OpenAI o3 and o4-mini — the current benchmark leaders on hard reasoning tasks. o4-mini offers strong reasoning at lower cost than o3. Available in ChatGPT and via API.
Claude's extended thinking — available in Claude 4 Opus and Sonnet. Particularly strong on complex writing and analysis tasks that benefit from careful reasoning. Toggle in the interface or set via API.
Gemini 2.5 Pro with deep research — Google's reasoning-enabled model, particularly strong when combined with its 1M context window for analysing large documents carefully.
DeepSeek R1 — the open-weight reasoning model that matched o1's performance in January 2025. Available to self-host, making frontier-quality reasoning accessible without API costs.