Frontier AI Models — LLMs 101 Directory

OpenAI Market Leader

GPT-5.6 Sol (flagship) · GPT-5.6 Terra · GPT-5.6 Luna · GPT-5.5 (Thinking / Pro) · GPT-5.5 Instant · GPT-5.4

Core superpower

A three-tier flagship family that lets you dial cost against capability — from top-end reasoning to fast, cheap, high-volume work — without switching brands.

Key trade-off

The tiers can be genuinely confusing: "available in ChatGPT" doesn't mean every model shows up in your picker, and access depends on your plan, effort settings, and rollout wave.

Speed

8/10

Reasoning

9.5/10

Cost

Premium

Best non-technical use

Working through a hard, multi-step task — drafting a detailed report, planning a project, or reasoning through a tricky decision — where you want the strongest thinking available in ChatGPT.

Cost tier

Premium

Anthropic Market Leader

Claude

Fable 5, Opus 4.8, Sonnet 5, Haiku 4.5

Core superpower

Careful, nuanced reasoning and long-form writing that stays coherent across very long, complex tasks.

Key trade-off

Top-tier models sit at a premium price, and Fable 5 access briefly hinged on shifting US export rules — a reminder that frontier availability can change fast.

Speed

7/10

Reasoning

9.6/10

Cost

Premium

Best non-technical use

Drafting, editing, and thinking through long documents — reports, proposals, research summaries — where tone and accuracy matter.

Cost tier

Premium

Google DeepMind Highly Competitive

Gemini

Gemini 3.5 Flash, Gemini 3.1 Pro, Gemini 3.1 Flash-Lite

Core superpower

Gemini 3.5 Flash packs near-Pro coding and agentic ability into a fast, high-throughput package, so it can churn through multi-step tool-use tasks quickly and cheaply.

Key trade-off

It is fast for its intelligence class but not the outright speed leader — and on deep reasoning, long-context retrieval, and knowledge-heavy work, the pricier Gemini 3.1 Pro (and rivals from OpenAI and Anthropic) still pull ahead.

Speed

8.5/10

Reasoning

8.2/10

Cost

Good value

Best non-technical use

Running an AI assistant that works through long, multi-step tasks — like reading a stack of invoices or a 100-page document and pulling out the answers — where you want speedy, affordable results and can accept slightly-below-frontier accuracy.

Cost tier

Low

xAI Highly Competitive

Grok

Grok 4.5 · Grok 4.3 · Grok 4.1 Fast

Core superpower

Live access to what people are posting on X right now, so it's unusually good at "what's happening lately" questions that stump models working from a fixed training set.

Key trade-off

The newest model has the smallest memory: Grok 4.5 reads about 500K tokens at once, while the older Grok 4.3 handles 1M and Grok 4.1 Fast handles 2M — so for very long documents, the newest isn't always the right pick.
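As a rough illustration of that trade-off, the sketch below picks the newest Grok model whose context window fits a given document. The token limits are the figures quoted in this card (with "about 500K" treated as exactly 500,000). The function names and the four-characters-per-token estimate are assumptions made for illustration, not part of any xAI API.

```python
from typing import Optional

# Context windows in tokens, as quoted in the card above.
# "about 500K" is treated as exactly 500,000 here (assumption).
CONTEXT_WINDOWS = {
    "Grok 4.5": 500_000,
    "Grok 4.3": 1_000_000,
    "Grok 4.1 Fast": 2_000_000,
}

# Newest first, matching the order the card lists them in.
NEWEST_FIRST = ["Grok 4.5", "Grok 4.3", "Grok 4.1 Fast"]


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def pick_model(token_count: int) -> Optional[str]:
    """Return the newest model whose window holds token_count, or None."""
    for name in NEWEST_FIRST:
        if token_count <= CONTEXT_WINDOWS[name]:
            return name
    return None  # Too long for any listed model; the text must be split.


if __name__ == "__main__":
    for size in (400_000, 800_000, 1_500_000, 3_000_000):
        print(size, "->", pick_model(size))
```

A real tokenizer would give more accurate counts than the character-based estimate, so treat the result as a first guess rather than a guarantee that a document fits.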

Speed

8/10

Reasoning

8/10

Cost

Moderate

Best non-technical use

Getting a fast, current read on breaking news or public reaction — asking what people are saying about a company, event, or trend as it unfolds.

Cost tier

Standard

Meta AI Rising

Llama

Llama 4 Maverick · Llama 4 Scout · Llama 3.3 70B

Core superpower

Fully open weights: download and run privately on your own hardware, free to use with no API costs (you still pay for the hardware). Llama 4 Scout's 10-million-token context window is the largest of any openly available model.

Key trade-off

Meta has shifted its frontier investment to a new proprietary model (Muse Spark, below). Llama remains available but is expected to receive maintenance updates rather than continued frontier development, and it now trails closed frontier models by a wider margin than before.

Speed

7.3/10

Reasoning

4.8/10

Cost

Free*

Best non-technical use

Privacy-sensitive workflows where data cannot leave your machine, and high-volume automation where API costs would otherwise be prohibitive.

Cost tier

Free

Meta Superintelligence Labs Rising

Muse Spark

Muse Spark 1.1 (current), Muse Spark 1.0

Core superpower

A multimodal reasoning model built for agentic tasks — planning, using tools, and orchestrating work across apps with a very large 1-million-token memory that it actively manages during long jobs.

Key trade-off

It's tuned for tool use and orchestration rather than raw coding accuracy, so it trails top rivals on the hardest coding and reasoning tasks — and all the standout benchmark numbers so far are Meta's own, not independent tests.

Speed

7.5/10

Reasoning

7.8/10

Cost

Low

Best non-technical use

Chatting free in the Meta AI app or at meta.ai, where "Thinking" mode shows the model's step-by-step reasoning before it answers — handy for multi-step questions like planning an event or working through a problem out loud.

Cost tier

Low

DeepSeek Disruptor

DeepSeek

DeepSeek V4 Pro · DeepSeek V4 Flash

Core superpower

Frontier-competitive reasoning at a small fraction of closed-model pricing, arguably the most significant price disruption in AI to date.

Key trade-off

Operated from China, with some content restrictions and data-privacy considerations for sensitive enterprise use.

Speed

5.4/10

Reasoning

8.6/10

Cost

Ultra-low

Best non-technical use

High-volume automation where cost per prompt needs to be near zero, and advanced coding tasks where quality needs to match closed frontier models at a fraction of the cost.

Cost tier

Ultra-low

Alibaba Rising

Qwen

Qwen3.8-Max-Preview, Qwen3.7-Max, Qwen3.7-Plus, Qwen3.6-27B (open weights)

Core superpower

A fast-shipping frontier lineup topped by a huge new preview model, backed by long-horizon agent skills and a genuinely open-weight budget tier you can download and run yourself.

Key trade-off

The lineup is split: the newest and strongest models (Qwen3.8-Max-Preview, Qwen3.7-Max, Qwen3.7-Plus) are cloud-only and proprietary, while the free-to-download open weights are a generation behind — so you can't have the top capability and self-hosting at the same time.

Speed

5/10

Reasoning

8.5/10

Cost

Mixed

Best non-technical use

Working through long, multi-step reasoning tasks — like analyzing a big pile of documents or research at once — thanks to a very large context window that fits huge amounts of text in a single request.

Cost tier

Low

Moonshot AI Rising

Kimi

Kimi K3 (current flagship), K3 Swarm Max; prior: K2.7 Code, K2.6, K2.5, K2

Core superpower

A huge open-weight model that can hold an entire large project in view at once and reason across all of it without losing the thread.

Key trade-off

It always "thinks" before answering, and at launch you can't dial that down — so simple questions cost more time and money than they need to.

Speed

6/10

Reasoning

8.8/10

Cost

Mid

Best non-technical use

Working through a very long document set — say a stack of contracts or a full research folder — and getting answers that stay consistent from the first page to the last.

Cost tier

Low

Mistral AI Rising

Mistral

Mistral Medium 3.5 · Mistral Large 3 · Mistral Small 4

Core superpower

Fast, efficient models with a strong price-to-performance ratio, plus a unified Vibe agent that now handles both research and coding tasks across web, IDE, and terminal.

Key trade-off

The lineup has grown complex: Medium 3.5 is the newest release and now the default in Mistral's own tools, but Large 3 remains the largest model for the heaviest workloads, so "which Mistral model" no longer has a one-line answer.

Speed

9.5/10

Reasoning

7.1/10

Cost

Standard

Best non-technical use

Fast, cost-efficient production applications where European data sovereignty and open weights matter, plus real-time summarization and classification at scale.

Cost tier

Standard

Every major AI model, actually explained
