The open-weight revolution that defined Q4
The headline story of Q4 2025 was the open-weight tier closing the gap to frontier closed models in a way that was qualitatively different from anything before. Previous open models had been impressive for their size but clearly inferior to GPT-4 or Claude 3 on complex tasks. By Q4 2025, that distinction was becoming hard to defend for a wide range of professional tasks.
Llama 3.3 70B was the key data point. At 70 billion parameters, it was large but still self-hostable on modest infrastructure, and it performed comparably to GPT-4-class models on the writing, analysis, and coding tasks that professionals actually care about. The remaining benchmark gaps were real but increasingly irrelevant for most use cases.
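To put "self-hostable" in rough numbers, the sketch below estimates the memory needed just to hold a model's weights at a few common precisions. It is a back-of-the-envelope calculation, not a sizing guide: it assumes weights dominate memory and ignores the KV cache, activations, and runtime overhead. The function name `weight_memory_gb` is illustrative and does not come from any library.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 16-bit is a typical unquantised format; 8-bit and 4-bit are common
    # quantisation levels (assumed here purely for illustration).
    for bits in (16, 8, 4):
        print(f"70B parameters at {bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB")
```

Under these assumptions a 70B model needs roughly 140 GB at 16-bit and roughly 35 GB at 4-bit for weights alone, which is why quantisation matters so much to the self-hosting argument.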
DeepSeek V3 extended this further. It was a 671 billion parameter Mixture of Experts model that, despite its enormous total size, activated only ~37 billion parameters per token, which made per-token inference compute comparable to that of a much smaller model. The quality was unmistakably frontier-adjacent. And the weights were free.
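The gap between total and active parameters can be made concrete with a little arithmetic. The sketch below uses the figures quoted above (671B total, ~37B active) and the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token. That rule is an approximation, and the function names are illustrative only.

```python
TOTAL_PARAMS_B = 671   # total parameters, in billions (figure from the text)
ACTIVE_PARAMS_B = 37   # parameters activated per token, in billions (approximate)


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def approx_forward_flops_per_token(active_params_b: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter."""
    return 2 * active_params_b * 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active share per token: {frac:.1%}")
    # Compare per-token compute against a dense 70B model, where every
    # parameter is active for every token.
    ratio = approx_forward_flops_per_token(ACTIVE_PARAMS_B) / approx_forward_flops_per_token(70)
    print(f"Per-token compute relative to a dense 70B model: {ratio:.2f}x")
```

Note that this only describes compute per token. All 671B parameters still have to be held in memory, so the hardware footprint of such a model remains far larger than that of a dense model with 37B parameters.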
For the first time, the calculus tilted decisively towards self-hosting for high-volume enterprise use cases. A legal firm processing thousands of contracts monthly, a financial institution running millions of document extractions, or a technology company serving AI features to users at scale all faced economics that increasingly favoured open weights on owned infrastructure over paying per-token to closed providers.
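The underlying trade-off is a fixed cost versus a per-token cost, and a simple break-even calculation shows its shape. The sketch below is illustrative only: every price in it is a hypothetical placeholder rather than a quote from any provider, and it leaves out staffing, utilisation, and latency considerations.

```python
def breakeven_tokens_per_month(
    monthly_infra_cost: float,
    api_price_per_million: float,
    selfhost_marginal_per_million: float = 0.0,
) -> float:
    """Monthly token volume above which self-hosting is cheaper.

    All inputs share one currency. Returns infinity when the API is no more
    expensive per token than the self-hosted marginal cost.
    """
    saving_per_million = api_price_per_million - selfhost_marginal_per_million
    if saving_per_million <= 0:
        return float("inf")
    return monthly_infra_cost / saving_per_million * 1_000_000


if __name__ == "__main__":
    # Hypothetical figures: 20,000 per month for owned or rented GPUs,
    # 10 per million tokens from a closed provider.
    tokens = breakeven_tokens_per_month(20_000, 10.0)
    print(f"Break-even volume: {tokens / 1e9:.1f} billion tokens per month")
```

Below the break-even volume the per-token API is cheaper. Above it, the fixed infrastructure cost is spread across enough tokens that owned infrastructure comes out ahead, which is why the argument applies mainly to high-volume workloads.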
The barrier shifted from capability (are open models good enough?) to operations (does your team have the capacity to manage the infrastructure?).
Q4 2025 was the quarter where AI safety moved from theoretical concern to active policy. The EU AI Act's requirements were confirmed. Major enterprises started issuing internal AI governance policies. Anthropic's Constitutional AI approach attracted serious enterprise interest as organisations sought models with documented, consistent safety properties.
For non-technical professionals, the practical implication was increased procurement scrutiny — IT and legal teams started asking more detailed questions about model safety properties, data handling, and audit capabilities before approving AI tool adoption.
By Q4 2025, the ability to process images alongside text was no longer a differentiating feature — it was expected. GPT-4o, Claude 3.5, and Gemini 1.5 all handled images natively. The competitive frontier moved to video understanding (Gemini), real-time audio (GPT-4o), and the combination of multimodal input with long context windows.
The DeepSeek R1 shadow
By late Q4, word was circulating in AI research circles about DeepSeek's upcoming reasoning model. When R1 was released the following January, it was not a complete surprise to the research community: the techniques had been visible in earlier DeepSeek publications. What surprised everyone was the benchmark performance and the reported training cost.
In retrospect, Q4 2025 was the last quarter before the AI industry's cost assumptions were reset. The R1 release the following January changed the conversation fundamentally.
Looking back from 2026
Q4 2025 looks, in hindsight, like the quarter when AI transitioned from an exciting technology to a business infrastructure decision. The questions shifted from "should we use AI?" to "which models, at what cost, with what governance?" That is a fundamentally more mature set of questions, and it reflects how far the landscape had moved in eighteen months.