State of LLMs — Q4 2025

Q4 2025 at a glance

Model of the quarter: DeepSeek V3 (open weight, near-frontier capability, training cost that shocked the industry)

Biggest trend: Open-weight models crossed the threshold of professional viability across most business tasks

Most important shift: The cost gap between open and closed models widened to the point of being a serious procurement consideration

Watch in Q1 2026: Whether Western labs respond to DeepSeek's efficiency pressure with architectural changes of their own

The open-weight revolution that defined Q4

The headline story of Q4 2025 was the open-weight tier closing the gap to frontier closed models in a way that was qualitatively different from anything before. Previous open models had been impressive for their size but clearly inferior to GPT-4 or Claude 3 on complex tasks. By Q4 2025, that distinction was becoming hard to defend for a wide range of professional tasks.

Llama 3.3 70B was the key data point. At 70 billion parameters, it is large but still self-hostable on a single multi-GPU server, or on less hardware with quantization, and it performed comparably to GPT-4-class models on the writing, analysis, and coding tasks that most professionals actually care about. The benchmark gaps that remained were real but increasingly irrelevant for most use cases.
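To make "self-hostable" concrete, here is a rough back-of-the-envelope sketch of how much memory the weights of a 70-billion-parameter model occupy at different precisions. It counts weights only (the KV cache, activations, and runtime overhead are ignored), and the function name and the choice of precisions are illustrative assumptions rather than anything published for Llama 3.3.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


PARAMS_70B = 70e9  # parameter count quoted in the text

# Precisions chosen for illustration; real deployments vary.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS_70B, bits):.0f} GB of weights")
```

Under these assumptions the weights alone need roughly 140 GB at 16-bit precision and about a quarter of that at 4-bit, which is why quantization matters so much for teams running these models on their own hardware.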

DeepSeek V3 extended this further. It is a 671-billion-parameter Mixture-of-Experts model that, despite its enormous total size, activates only about 37 billion parameters per token, so the compute cost of generating each token is comparable to that of a much smaller dense model. The quality was unmistakably frontier-adjacent. And the weights were free to download.
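The arithmetic behind that claim is simple enough to sketch. The snippet below uses the two parameter counts quoted above and the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token. That rule is an approximation that ignores attention and routing overhead, and the helper names are made up for illustration.

```python
TOTAL_PARAMS = 671e9   # total parameters, as quoted in the text
ACTIVE_PARAMS = 37e9   # parameters activated per token, as quoted in the text


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token."""
    return 2 * active_params


moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of equal size

print(f"Active share per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
print(f"Compute vs. same-size dense model: ~{dense_flops / moe_flops:.0f}x less per token")
```

One caveat the estimate does not capture: all 671 billion parameters still have to be held in memory or otherwise be quickly reachable, so the saving is in per-token compute, not in hardware footprint.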

Self-hosting became a serious enterprise option

For the first time, the calculus tilted decisively in favour of self-hosting for high-volume enterprise use cases. A legal firm processing thousands of contracts monthly, a financial institution running millions of document extractions, and a technology company serving AI features to users at scale all faced economics that increasingly favoured open weights on owned infrastructure over paying per token to closed providers.
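The procurement logic can be sketched as a break-even calculation: self-hosting carries a fixed monthly cost (hardware, hosting, staff time) plus a small marginal cost per token, while a closed provider charges purely per token. Every figure below is a hypothetical placeholder and not a quoted price from any vendor; the function names are likewise illustrative.

```python
from typing import Optional


def api_monthly_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Cost of paying a provider per token."""
    return tokens_per_month / 1e6 * price_per_million


def self_host_monthly_cost(tokens_per_month: float, fixed_cost: float,
                           marginal_per_million: float) -> float:
    """Fixed infrastructure cost plus a marginal (power, wear) cost per token."""
    return fixed_cost + tokens_per_month / 1e6 * marginal_per_million


def break_even_tokens(fixed_cost: float, api_price_per_million: float,
                      marginal_per_million: float) -> Optional[float]:
    """Monthly token volume above which self-hosting is cheaper, or None if never."""
    saving_per_million = api_price_per_million - marginal_per_million
    if saving_per_million <= 0:
        return None  # the API is cheaper per token, so the fixed cost is never recovered
    return fixed_cost / saving_per_million * 1e6


# Hypothetical inputs, for illustration only.
FIXED_COST = 18_000.0        # USD per month for servers and operations
API_PRICE = 5.0              # USD per million tokens from a closed provider
SELF_HOST_MARGINAL = 0.5     # USD per million tokens on owned hardware

volume = break_even_tokens(FIXED_COST, API_PRICE, SELF_HOST_MARGINAL)
print(f"Break-even volume: {volume / 1e9:.1f} billion tokens per month")
```

With these made-up inputs the crossover sits at 4 billion tokens per month. The point of the sketch is the shape of the decision, not the number: below the break-even volume the per-token provider wins, above it the fixed cost of owned infrastructure is spread thinly enough to come out ahead.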

The barrier shifted from capability (were open models good enough?) to operations (does your team have the capacity to manage the infrastructure?).

The safety and alignment conversation matured

Q4 2025 was the quarter where AI safety moved from theoretical concern to active policy. The EU AI Act's requirements were confirmed. Major enterprises started issuing internal AI governance policies. Anthropic's Constitutional AI approach attracted serious enterprise interest as organisations sought models with documented, consistent safety properties.

For non-technical professionals, the practical implication was increased procurement scrutiny — IT and legal teams started asking more detailed questions about model safety properties, data handling, and audit capabilities before approving AI tool adoption.

Multimodal capability became table stakes

By Q4 2025, the ability to process images alongside text was no longer a differentiating feature — it was expected. GPT-4o, Claude 3.5, and Gemini 1.5 all handled images natively. The competitive frontier moved to video understanding (Gemini), real-time audio (GPT-4o), and the combination of multimodal input with long context windows.

The DeepSeek R1 shadow

By Q4, DeepSeek R1's January release was most of a year in the rear-view mirror — but its effects were still working through the industry. R1's benchmark performance and reported training cost had already forced every major lab to respond months earlier; DeepSeek V3's Q4 emergence was a direct continuation of that pressure, building on techniques that had been visible in DeepSeek's earlier publications, not a separate surprise.

In retrospect, Q4 2025 was the quarter in which the industry's reset cost assumptions fully played out in mainstream open models. The R1 release back in January had already changed the conversation fundamentally; Q4 was when that shift showed up directly in the near-frontier performance of DeepSeek V3 and Llama 3.3.

Looking back from 2026

Q4 2025 looks, in hindsight, like the quarter when AI transitioned from an exciting technology to a business infrastructure decision. The questions shifted from "should we use AI?" to "which models, at what cost, with what governance?" Those are fundamentally more mature questions, and they reflect how far the landscape had moved in eighteen months.
