State of LLMs — Q2 2026

Q2 2026 at a glance

Model of the quarter

Claude Opus 4.8 — the most capable model people can actually, reliably use right now

Biggest trend

Open-weight models (DeepSeek, Qwen, Kimi) closed the gap to near-frontier — while Meta itself moved away from open weights

Most important shift

Frontier model access proved to be conditional on government policy, not just your subscription

Watch in Q3

The EU AI Act's next major deadline on August 2, and whether the suspended model returns at all

Where the models stand

The top tier is a three-way contest. Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro trade places depending on the task and benchmark. There is no longer a single, stable "best model," and that has been true for several quarters. Reasoning that used to live in separate model lines (OpenAI's o-series) has folded into a Thinking-mode toggle within the main flagship models at both OpenAI and Anthropic.

The more important development is how close the open-weight field has gotten. On independent benchmarks, not just vendor claims, DeepSeek V4, Qwen3.6, and Kimi K2.6 are now competitive with closed frontier models on most real-world tasks, at a fraction of the cost. For the vast majority of professional use cases, the "best" model is no longer the most capable one available; it's the one with the best capability-to-cost ratio for your specific task.

Claude Opus 4.8

Anthropic

The most capable model people can actually use right now. 1M-token context at flat API rates, no surcharge.

↑ Model of the quarter

GPT-5.5

OpenAI

Strong across reasoning and coding via Thinking mode. GPT-5.5 Instant became ChatGPT's default for all users in May.

→ Holding top tier

Gemini 3.1 Pro

Google DeepMind

Unmatched on long-context and multimodal tasks. 1M-token context, deeply integrated into Workspace.

→ Stable at tier 1

DeepSeek V4

DeepSeek

Continues to dominate on cost. Open weights enable self-hosted deployments at near-zero marginal cost.

⚠ Data sovereignty concerns

The three trends that defined Q2

A frontier model went offline by government order, three days after launch

This was the quarter's real surprise. On June 9, Anthropic launched Claude Fable 5, which by its own account was more capable than any model it had ever released. On June 12, the US Commerce Department ordered it suspended worldwide, citing national security concerns over a disputed jailbreak finding. Anthropic complied within hours, disabling Fable 5 (and its more restricted sibling, Mythos 5) for every customer, while publicly disputing the severity of the underlying claim.

As of this writing, Fable 5 remains unavailable to the general public. A June 26 government letter partially restored Mythos 5, but only for a short list of approved US entities, not for ordinary users. This is the first time a government export-control directive has reached into a live commercial AI service rather than physical chips or downloadable weights, and it creates a new kind of risk for anyone building on a single model. Claude Opus 4.8 has remained fully available throughout and is Anthropic's recommended fallback.

Before — access was a subscription decision

Whether you could use a frontier model came down to which plan you paid for. Outages were rare, brief, and technical.

After — access can be a policy decision

A model you depend on today can be withdrawn by government order overnight, for reasons outside the provider's control, with no transition window.

Open-weight models reached genuine near-frontier parity

DeepSeek V4, Alibaba's Qwen3.6 family, and Moonshot AI's Kimi K2.6 all advanced significantly this quarter, narrowing the gap to closed frontier models on independent benchmarks — not just on cost, but on actual capability. Kimi in particular built a strong reputation for agentic coding and long, multi-step tool-use tasks, releasing a dedicated coding-specialised variant (K2.7 Code) in June.

For organisations running high-volume workflows, this matters twice over: cost (self-hosting at near-zero marginal cost) and privacy (sensitive data never leaves your infrastructure). For legal, healthcare, and financial services, open weights have moved from "interesting experiment" to "serious option" this quarter.

Meta walked away from open weights for its own flagship

In a genuine reversal of the strategy that built its reputation, Meta's new Superintelligence Labs division launched Muse Spark in April — a proprietary, closed-weight model that now powers Meta AI across WhatsApp, Instagram, Facebook, Messenger, and Meta's smart glasses, replacing Llama in that role. Meta says it hopes to open-source future versions, but hasn't yet.

Llama itself hasn't been discontinued: Llama 4 Maverick and Scout remain open, downloadable, and widely deployed. But Meta's own frontier investment has clearly moved elsewhere. It's a striking signal when set beside the previous trend: the open-weight ecosystem is thriving, increasingly led by Chinese labs, even as the company that did the most to popularise open weights in the West has stepped back from the approach.

"The most capable model Anthropic had ever shipped became unavailable to most of its customers before they'd had a real chance to use it — proof that in 2026, what you can access depends as much on policy as on price."

LLMs 101 editorial assessment, June 2026

What changed for non-technical professionals

If you use AI for writing and analysis

Claude Opus 4.8 remains the strongest choice for long-form writing, editing, and complex analysis. If you're on a standard plan, Sonnet 4.6 provides most of Opus's capability at lower cost. If you used Fable 5 during the few days it was available, Opus 4.8 is the recommended fallback; the gap shows up mainly on the hardest, longest reasoning tasks, not in daily work.

If you manage a team or organisation

The practical lesson of the Fable 5 suspension applies even if you never used it: don't hard-wire your workflows to one specific model. A model you depend on today can become unavailable overnight for reasons entirely outside your provider's control. Build model flexibility in from the start rather than adding it as an afterthought.

If you're evaluating which AI tool to pay for

The honest answer in Q2 2026: Claude (Opus 4.8 or Sonnet 4.6) for writing-heavy, reasoning-heavy, and document-heavy work. ChatGPT / GPT-5.5 for breadth, integrations, and anything where the OpenAI ecosystem matters. Gemini 3.1 Pro if you're deep in Google Workspace or need to process very long documents. For budget-constrained high-volume use: DeepSeek, Qwen, or Kimi via API (with data sovereignty caveats, given all three are Chinese-operated).

If you're building something on top of AI

The Anthropic and OpenAI APIs are both mature for production deployment. The new architectural lesson of Q2: abstract your model selection behind a config setting rather than hardcoding a model name, so a provider-level disruption — whether technical or, as this quarter showed, regulatory — becomes a routing decision instead of an emergency.
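To make the config-driven approach concrete, here is a minimal Python sketch of one way it could look. Everything in it is an illustrative assumption rather than any provider's real API: the `ModelRouter` class, the `ModelUnavailable` exception, the placeholder model names, and the two stand-in backend functions are all hypothetical. In a real system each backend would wrap an actual provider SDK call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ModelUnavailable(Exception):
    """Raised by a backend when its model cannot serve requests."""


@dataclass
class ModelRouter:
    # Ordered preference list, e.g. loaded from a config file or env var.
    preferred: List[str]
    # Maps a model name to a function that sends a prompt to that model.
    backends: Dict[str, Callable[[str], str]]

    def complete(self, prompt: str) -> str:
        """Try each configured model in order; return the first success."""
        errors = []
        for name in self.preferred:
            backend = self.backends.get(name)
            if backend is None:
                errors.append(f"{name}: no backend registered")
                continue
            try:
                return backend(prompt)
            except ModelUnavailable as exc:
                errors.append(f"{name}: {exc}")
        # Every configured model failed; surface all the reasons together.
        raise ModelUnavailable("; ".join(errors))


# Hypothetical stand-ins for real provider calls (assumptions, not real APIs).
def suspended_model(prompt: str) -> str:
    raise ModelUnavailable("suspended by policy")


def fallback_model(prompt: str) -> str:
    return f"[fallback] {prompt}"


# The model names live in config, not in application code.
CONFIG = {"preferred": ["primary-model", "fallback-model"]}

router = ModelRouter(
    preferred=CONFIG["preferred"],
    backends={
        "primary-model": suspended_model,
        "fallback-model": fallback_model,
    },
)

result = router.complete("Summarise this contract.")
print(result)
```

With this shape, responding to a withdrawn model means reordering or editing the `preferred` list in config; the calling code does not change. A production version would likely also need timeouts, retries, and prompt adjustments per model, which this sketch leaves out.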

What to watch in Q3 2026

The EU AI Act's next major deadline. August 2, 2026 — squarely in Q3 — is when transparency obligations and most high-risk AI system rules take effect across the EU. A proposed "Digital Omnibus" amendment, agreed in principle in May, would push some high-risk obligations (recruitment, credit scoring, border control) out to December 2027, but it still needs formal adoption before the August deadline arrives. Watch whether that lands in time.

Whether Fable 5 actually comes back. Anthropic has said it believes the suspension is a misunderstanding and is negotiating restoration; as of this writing, only a narrow, government-approved subset of Mythos 5 access has been restored, and Fable 5 itself remains fully offline for the public. Whatever the outcome, the precedent — that a deployed commercial AI model can be pulled by national security order — doesn't go away even if this specific case resolves quickly.

The next wave of model releases. Google's Gemini 3.5 Pro has been announced but wasn't yet generally available as of late June. OpenAI's GPT-5.6 exists in a restricted preview with a small number of partner companies. Whether either reaches general availability in Q3 — and under what access conditions — is worth watching given this quarter's events.
