Frontier model directory
No benchmark scores. No jargon. Just what each model family is genuinely good at, where it falls short, and who should use it.
Updated June 2026
OpenAI's GPT-4o and o-series models are among the most widely deployed large language models in the world. GPT-4o gives fast, multimodal responses across text, images, and audio. The o3 and o4-mini reasoning models work through a problem step by step (chain-of-thought reasoning) before answering, which makes them especially strong at mathematics, coding, and complex logic. GPT-4.1 is optimised for coding, precise instruction following, and very long inputs. All are available via the OpenAI API and ChatGPT.
Anthropic's Claude models are trained with Constitutional AI, a safety approach in which the model learns to follow a written set of principles, with the aim of making Claude reliable and honest. Claude Opus 4 offers the highest capability for complex reasoning and writing tasks. Claude Sonnet 4 balances capability and cost for everyday professional use. Claude Haiku is optimised for speed and cost efficiency. Claude models support a 200,000-token context window, large enough to take in a full-length book or a sizeable codebase in a single conversation.
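To give a feel for what a 200,000-token window holds, the sketch below estimates whether a document fits. It assumes roughly 4 characters per token, a common rule of thumb for English text; real tokenizers differ by model and language, so treat the answer as a ballpark. The function names and the reply reserve are illustrative, not part of any vendor's API.

```python
# Rough check of whether a text fits in a model's context window.
# ASSUMPTION: ~4 characters per token, a common rule of thumb for English;
# actual counts depend on each model's own tokenizer.

CHARS_PER_TOKEN = 4  # heuristic, not an exact tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of `text` from its character length."""
    return round(len(text) / chars_per_token)


def fits_in_context(text: str, context_tokens: int = 200_000,
                    reserve_for_reply: int = 4_000) -> bool:
    """Return True if the estimated prompt still leaves room for a reply."""
    return estimate_tokens(text) + reserve_for_reply <= context_tokens


if __name__ == "__main__":
    # Stand-in for a ~90,000-word novel at ~6 characters per word (with spaces)
    novel = "x" * (90_000 * 6)
    print(estimate_tokens(novel))  # 135000
    print(fits_in_context(novel))  # True
```

Reserving some space for the reply matters because the window is shared between what you send and what the model writes back.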
Google DeepMind's Gemini models are natively multimodal, handling text, images, audio, and video as first-class inputs. Gemini 2.5 Pro supports a 1-million-token context window, one of the largest publicly available, enabling analysis of large codebases or lengthy video content in one go. Gemini 2.0 Flash is optimised for speed and cost efficiency in production applications. Gemini powers AI Overviews in Google Search and is deeply integrated into Google Workspace products including Docs, Gmail, and Sheets.
Meta's Llama 3 series is among the most capable openly available large language models. Llama 3.3 70B delivers performance competitive with many proprietary models and is free to use for most organisations under Meta's community licence. Llama 3.1 405B approaches GPT-4 class performance; its weights can be downloaded for self-hosting, though running it requires server-class hardware. Because the weights are open, Llama models are widely used in tools such as Ollama, which let anyone run AI locally on consumer hardware, including Apple Silicon Macs. When run on your own infrastructure, Llama models can be fine-tuned on private data without exposing that data to any third party.
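To see why the 70B model suits a well-equipped laptop while the 405B model does not, here is a back-of-the-envelope estimate of the memory needed just to hold each model's weights. It counts weights only (the runtime also needs memory for the conversation cache and overhead), and the 4-bit row assumes a quantised build of the kind local tools commonly use.

```python
# Back-of-the-envelope memory needed to hold model weights.
# Counts weights only; conversation cache and runtime overhead add more.

BYTES_PER_PARAM = {
    "16-bit": 2.0,  # bf16/fp16, the usual published precision
    "8-bit": 1.0,
    "4-bit": 0.5,   # typical quantised build for local use (assumption)
}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate gigabytes (10**9 bytes) needed for the weights alone."""
    return params_billions * BYTES_PER_PARAM[precision]


for name, size in [("Llama 3.3 70B", 70), ("Llama 3.1 405B", 405)]:
    for precision in BYTES_PER_PARAM:
        print(f"{name} @ {precision}: ~{weight_memory_gb(size, precision):.0f} GB")
```

At 4-bit, the 70B model needs roughly 35 GB for its weights, within reach of a high-memory Apple Silicon Mac, while the 405B model needs over 200 GB, which points to server hardware.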
DeepSeek's January 2025 release of R1 shocked the AI industry by matching OpenAI o1 on several reasoning benchmarks at a fraction of the expected cost. R1 is built on DeepSeek V3, whose final training run was reported at roughly $6 million in GPU time, compared with the hundreds of millions often cited for rival frontier models; that figure excludes earlier research and experiments. DeepSeek V3 is a 671-billion-parameter Mixture of Experts model that activates only 37 billion parameters per token, achieving frontier performance at dramatically lower inference cost. DeepSeek R1 uses chain-of-thought reasoning and is available as open weights. DeepSeek-Coder is specialised for software development tasks.
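The inference saving from Mixture of Experts comes down to simple arithmetic: the work done per token scales with the parameters actually used, not the total. The sketch below uses the common approximation of about 2 floating-point operations per active parameter per token (a rule of thumb that ignores attention cost over long inputs) and compares DeepSeek V3 with a hypothetical dense model of the same total size.

```python
# Compare per-token compute for a Mixture of Experts model vs a dense one.
# ASSUMPTION: ~2 FLOPs per active parameter per token (forward pass only),
# a standard approximation that ignores attention cost over long inputs.


def flops_per_token(active_params: float) -> float:
    return 2 * active_params


total_params = 671e9   # DeepSeek V3, total parameters
active_params = 37e9   # parameters activated per token

active_share = active_params / total_params
moe_flops = flops_per_token(active_params)
dense_flops = flops_per_token(total_params)  # hypothetical dense 671B model

print(f"Active share per token: {active_share:.1%}")
print(f"Compute saving vs dense: {dense_flops / moe_flops:.1f}x")
```

Only about 5.5% of the model does work on each token, roughly 18 times less compute than a dense model of the same size. All 671 billion parameters must still be held in memory, so the saving is in compute per token, not in storage.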
Mistral AI, founded in France in 2023, made its name with Mistral 7B, which outperformed larger models such as Llama 2 13B at launch. Mixtral 8x7B is an open-weight Mixture of Experts model with 46.7 billion total parameters, of which only 12.9 billion are active per token, delivering strong performance at low inference cost. Mistral Large competes with GPT-4 class models. Mistral is a vocal advocate for open-source AI and European digital sovereignty, although not all of its models are released as open weights. Models are available via the Mistral API, and many can be downloaded as open weights for local deployment.
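Mixtral's figures follow from how its routing works: each layer has 8 expert feed-forward blocks, a small router picks the top 2 for every token, and the attention layers are shared by all experts. That is why the total is 46.7 billion rather than 8 × 7 = 56 billion, and why only about 12.9 billion are used per token. The toy sketch below shows top-2 routing for a single token; the router scores and the tiny "experts" are made-up stand-ins, not Mistral's code.

```python
import math

# Toy top-2 Mixture of Experts routing for a single token.
# Router scores and experts below are hypothetical stand-ins.


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(token_value, router_logits, experts):
    """Run only the 2 highest-scoring experts and mix their outputs."""
    # Indices of the two largest router scores
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:2]
    # Gate weights are normalised over the selected experts only
    weights = softmax([router_logits[i] for i in top])
    output = sum(w * experts[i](token_value) for w, i in zip(weights, top))
    return output, top


# 8 toy "experts": each just scales its input by a different factor
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]

out, used = top2_moe(1.0, logits, experts)
print(used)  # [1, 4]
```

The other six experts are never called for this token, which is where the low inference cost comes from.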