Q2 2026 at a glance

Model of the quarter: Claude 4 Opus, the clearest quality jump in long-form reasoning since GPT-4
Biggest trend: Agentic AI crossed from experiment to production deployment at scale
Most important shift: Cost per million tokens fell another 40% across most frontier providers
Watch in Q3: Multimodal reasoning (models that see, hear, and read) enters daily professional use

Where the models stand

The tier-one cluster is now three-way competitive in a way it wasn't a year ago. OpenAI's o3 retains the top position on reasoning benchmarks, but the gap between it and the field has narrowed substantially. Claude 4 Opus and Gemini 2.5 Pro are competitive with it on most real-world professional tasks. This is the first quarter in which choosing between frontier providers has been a real decision rather than a default to OpenAI.

The more important development is below the top tier. Claude 4 Sonnet, GPT-4o, and Gemini 2.0 Flash now sit at a capability level that would have been considered frontier twelve months ago — and they're significantly cheaper to run. For the vast majority of professional use cases, the "best" model is no longer the most capable one; it's the one with the best capability-to-cost ratio for your specific task.
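
To make the capability-to-cost idea concrete, here is a minimal Python sketch. It assumes you have scored each candidate model on your own task. The model names, quality scores, and prices below are placeholders rather than measurements, and `pick_model` is a hypothetical helper, not part of any provider's SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float       # task-specific score in [0, 1] from your own evaluation
    usd_per_mtok: float  # blended price per million tokens (placeholder values)


def pick_model(candidates, min_quality):
    """Return the candidate with the best quality per dollar that clears the bar."""
    eligible = [c for c in candidates if c.quality >= min_quality]
    if not eligible:
        raise ValueError("no candidate meets the quality bar")
    return max(eligible, key=lambda c: c.quality / c.usd_per_mtok)


# Illustrative numbers only; substitute your own evaluation results and prices.
CANDIDATES = [
    Candidate("top-tier-model", quality=0.95, usd_per_mtok=30.0),
    Candidate("mid-tier-model", quality=0.88, usd_per_mtok=6.0),
    Candidate("small-model", quality=0.70, usd_per_mtok=0.5),
]

print(pick_model(CANDIDATES, min_quality=0.85).name)
```

Raising `min_quality` to 0.90 in this toy data selects the top-tier model instead. The quality bar comes from the task, and price decides among everything that clears it.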

Claude 4 Opus (Anthropic)
Best long-form reasoning and writing of any model this quarter. Significant jump over Claude 3.5.
Status: ↑ Model of the quarter

o3 / o4-mini (OpenAI)
Still leads on hard maths and competitive coding benchmarks. o4-mini offers strong reasoning at lower cost.
Status: → Holding top position

Gemini 2.5 Pro (Google DeepMind)
Unmatched on long-context and multimodal tasks. 1M token context is genuinely useful at this point.
Status: → Stable at tier 1

DeepSeek V3 / R1 (DeepSeek)
Continues to dominate on cost. Open weights enable self-hosted deployments at near-zero marginal cost.
Status: ⚠ Data sovereignty concerns

The three trends that defined Q2

01
Agentic AI crossed the production threshold

This is the structural shift of the quarter — and possibly of the year. "Agentic" AI systems don't just answer questions; they plan and execute multi-step tasks autonomously. In Q2 2026, this stopped being an experimental capability and became a production reality at scale.

The evidence: Microsoft Copilot agents are now deployed in tens of thousands of enterprises. Anthropic's Claude handles complex multi-tool workflows as a standard API feature. Google's Gemini operates autonomously within Workspace. For the first time, the question is no longer "can AI do multi-step tasks?" but "how much autonomy should we give it?"

Q1 2026 — AI as assistant
You describe a task → AI suggests how to do it → You execute it → You come back with the next step. Every action still requires human initiation and decision-making between steps.
Q2 2026 — AI as agent
You define a goal and constraints → AI plans the steps, executes them in sequence, handles errors, and delivers a finished result. Human oversight moves from step-by-step to goal-level.
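
The agent pattern described above can be sketched as a small loop. This is an illustration under stated assumptions, not any vendor's implementation: `run_agent`, `toy_plan`, and `toy_execute` are hypothetical names, the planner and executor stand in for model and tool calls, and the step and retry limits are example constraints.

```python
from typing import Callable


def run_agent(
    goal: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[str], str],
    max_steps: int = 10,
    max_retries: int = 2,
) -> list[str]:
    """Plan once, then execute each step in order, retrying failed steps."""
    steps = plan(goal)
    if len(steps) > max_steps:
        # Constraint check: refuse plans that exceed the human-set budget.
        raise RuntimeError(f"plan has {len(steps)} steps, limit is {max_steps}")

    results: list[str] = []
    for step in steps:
        for attempt in range(max_retries + 1):
            try:
                results.append(execute(step))
                break  # step succeeded, move on to the next one
            except Exception as exc:
                if attempt == max_retries:
                    # Escalate to the human instead of continuing blindly.
                    raise RuntimeError(f"step failed: {step}") from exc
    return results


# Toy stand-ins (hypothetical): a fixed plan and an executor that fails once.
_failures_left = {"draft summary": 1}


def toy_plan(goal: str) -> list[str]:
    return ["collect figures", "draft summary", "format report"]


def toy_execute(step: str) -> str:
    if _failures_left.get(step, 0) > 0:
        _failures_left[step] -= 1
        raise TimeoutError("transient failure")
    return f"done: {step}"


print(run_agent("produce the quarterly report", toy_plan, toy_execute))
```

The human sets the goal, the step budget, and the retry limit up front. The loop only returns control when it has a finished result or a failure it could not recover from, which is what goal-level oversight means in practice.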
02
The cost collapse continued — and accelerated

The per-token cost of frontier AI fell another 40% on average across major providers in Q2. This continues a trend that has seen costs drop approximately 100x since 2023. The mechanism: competition, improved model efficiency (smaller models getting smarter), and hardware improvements — all compounding simultaneously.
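
As a sanity check on how the two figures above relate, the arithmetic can be worked through in a few lines of Python. This is a sketch that assumes a constant rate of decline, which real prices do not follow. The function names are ours, and the 14-quarter window (Q1 2023 through Q2 2026) is our reading of "since 2023".

```python
import math


def quarters_for_reduction(quarterly_drop: float, total_reduction: float) -> float:
    """Quarters needed for cost to fall total_reduction-fold at a constant quarterly drop."""
    retained = 1.0 - quarterly_drop  # fraction of the cost left after one quarter
    return math.log(1.0 / total_reduction) / math.log(retained)


def implied_quarterly_drop(total_reduction: float, quarters: int) -> float:
    """Constant quarterly drop that yields a total_reduction-fold fall over the given quarters."""
    return 1.0 - (1.0 / total_reduction) ** (1.0 / quarters)


# Assumption: "since 2023" spans 14 quarters (Q1 2023 through Q2 2026).
print(f"{quarters_for_reduction(0.40, 100):.1f} quarters at 40% per quarter")
print(f"{implied_quarterly_drop(100, 14):.0%} average drop per quarter over 14 quarters")
```

Under these assumptions, a 100x fall over 14 quarters works out to an average of roughly 28% per quarter, while a sustained 40% per quarter would get there in about nine. A 40% quarter is therefore faster than the long-run average, which is consistent with the trend accelerating.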

The practical implication: the ROI calculation for AI integration has crossed a threshold for many use cases that previously looked marginal. Tasks where AI assistance used to cost more than the equivalent human time now frequently cost less. This shift is driving deployment decisions more than improvements in model quality are.
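
The threshold effect can be illustrated with a simple break-even comparison. The sketch below is hypothetical throughout: the token count, prices, hourly rate, and time estimates are made-up inputs, and the model ignores integration and maintenance costs.

```python
def ai_cost_usd(tokens: int, usd_per_mtok: float) -> float:
    """API cost for one task at a given price per million tokens."""
    return tokens / 1_000_000 * usd_per_mtok


def labour_cost_usd(minutes: float, hourly_rate_usd: float) -> float:
    """Cost of human time spent on one task."""
    return minutes / 60 * hourly_rate_usd


def ai_is_cheaper(tokens, usd_per_mtok, review_minutes, manual_minutes, hourly_rate_usd) -> bool:
    """Compare AI plus human review against doing the task entirely by hand."""
    with_ai = ai_cost_usd(tokens, usd_per_mtok) + labour_cost_usd(review_minutes, hourly_rate_usd)
    manual = labour_cost_usd(manual_minutes, hourly_rate_usd)
    return with_ai < manual


# A token-heavy task (illustrative values): 2M tokens, 5 minutes of review,
# versus 20 minutes of manual work at 60 USD per hour.
task = dict(tokens=2_000_000, review_minutes=5, manual_minutes=20, hourly_rate_usd=60.0)

print(ai_is_cheaper(usd_per_mtok=10.0, **task))        # before the price cut
print(ai_is_cheaper(usd_per_mtok=10.0 * 0.6, **task))  # after a 40% price cut
```

With these inputs the task costs 25 USD with AI against 20 USD by hand before the cut, and 17 USD against 20 USD after it. Nothing about the model's quality changed; only the price moved the task across the line.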

03
Open weight models reached professional viability

Llama 3.3 70B and DeepSeek V3 are now competitive with closed frontier models on most professional tasks. The parity shows up less in academic benchmarks than in the things professionals actually need: clear writing, structured analysis, code generation, document summarisation.

This matters for two reasons. First, cost: organisations running high-volume workflows can self-host open models with no per-token API fees, so the marginal cost of each request is limited to their own compute. Second, privacy: sensitive data never leaves your infrastructure. For legal, healthcare, and financial services, this has moved open models from "interesting experiment" to "serious option."

"The most significant shift in Q2 isn't which model is smartest — it's that the capability gap between open and closed models has narrowed to the point where the decision is now about deployment model and cost, not intelligence."

LLMs 101 editorial assessment, June 2026

What changed for non-technical professionals

If you use AI for writing and analysis

Claude 4 Opus is the clearest upgrade this quarter for long-form writing, editing, and complex analysis. If you're on the standard Claude plan, Sonnet provides roughly 90% of Opus's capability at lower cost. The improvement in document quality over six months ago is noticeable.

If you manage a team or organisation

Agentic workflows deserve your attention now rather than later. The productivity gap between organisations that have integrated AI agents into their workflows and those still using AI as a chat tool is becoming measurable in 2026. Adopting agents doesn't require technical staff: most agentic tools are now configured through point-and-click interfaces.

If you're evaluating which AI tool to pay for

The honest answer in Q2 2026: Claude for writing-heavy, reasoning-heavy, and document-heavy work. ChatGPT / GPT-4o for breadth, integrations, and anything where the OpenAI plugin ecosystem matters. Gemini if you're deep in Google Workspace or need to process very long documents. For budget-constrained high-volume use: DeepSeek via API (with data sovereignty caveats).

If you're building something on top of AI

The Anthropic and OpenAI APIs are both now mature enough for production deployment. The key architectural decision of Q2 is whether to use large context windows (feed the whole document) or RAG (retrieve relevant chunks). Increasingly, the answer is the context window for smaller corpora and RAG for large knowledge bases. The crossover point has shifted substantially as context costs fell.
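
One way to reason about that crossover is a rule of thumb that checks two things: whether the corpus fits in the window, and whether sending all of it on every query stays within budget. The sketch below is an assumption-laden illustration, not a recommendation from either provider. `choose_strategy`, the reserved-token margin, and every number in the examples are hypothetical.

```python
def choose_strategy(
    corpus_tokens: int,
    context_window_tokens: int,
    usd_per_mtok: float,
    max_usd_per_query: float,
    reserved_tokens: int = 8_000,
) -> str:
    """Pick 'full-context' when the corpus fits and is affordable per query, else 'rag'."""
    # Leave room for the question, instructions, and the model's answer.
    fits = corpus_tokens + reserved_tokens <= context_window_tokens
    # Full-context pays for the whole corpus as input on every query.
    cost_per_query = corpus_tokens / 1_000_000 * usd_per_mtok
    if fits and cost_per_query <= max_usd_per_query:
        return "full-context"
    return "rag"


# Same 200k-token corpus and 1 USD per-query budget at two input prices.
print(choose_strategy(200_000, 1_000_000, usd_per_mtok=10.0, max_usd_per_query=1.0))
print(choose_strategy(200_000, 1_000_000, usd_per_mtok=3.0, max_usd_per_query=1.0))
# A 5M-token knowledge base does not fit in a 1M window at any price.
print(choose_strategy(5_000_000, 1_000_000, usd_per_mtok=0.1, max_usd_per_query=1.0))
```

In this toy setup the same corpus flips from RAG to full-context purely because the input price fell, while the large knowledge base stays with RAG regardless. A real decision would also weigh latency, prompt caching, and retrieval quality, none of which are modelled here.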

What to watch in Q3 2026

Multimodal reasoning at scale. The ability to reason across text, images, audio, and video simultaneously is moving from demo feature to daily professional tool. Expect Q3 to bring the first practical multimodal workflows for non-technical users, particularly in sectors dealing with visual data: architecture, engineering, medicine, retail.

AI regulation implementation. The EU AI Act's substantive requirements begin to take effect in Q3. European businesses using AI in consequential decisions face new compliance obligations. This will drive procurement decisions and is already influencing which models enterprises choose in regulated industries.

The next wave of open models. Llama 4, Qwen 3, and likely a new DeepSeek release are all expected in Q3. The open model tier is about to take another step toward frontier parity. For anyone evaluating self-hosted deployment, waiting until Q3 before committing to a model choice is reasonable.
