LLMs 101 — Complete Guide to Large Language Models

What is a Large Language Model?

Overview

Large Language Models

The foundation

A Large Language Model (LLM) is an AI system trained to understand and generate human language at scale. The "large" refers to two things: the vast amount of text it was trained on, and the billions of mathematical parameters (weights) inside it.

At its core, an LLM does one deceptively simple thing: given some text, predict what comes next. When trained extraordinarily well across trillions of words, reasoning, knowledge, and language understanding emerge as side effects of that prediction task.

The four key dimensions of how LLMs work are: the mathematics underpinning them, the training process that builds them, the architectural choices that define them, and the prompting techniques that get the best out of them.

Key concepts Next token prediction Parameters Foundation models Emergent capabilities

Sources

Wikipedia — Large language model Anthropic — Core Views on AI Safety

The Mathematics of LLMs

Mathematics

LLM maths isn't exotic — it builds on linear algebra, calculus, and probability applied at enormous scale. The entire forward pass that converts your prompt into a response is essentially a chain of matrix multiplications. Every token becomes a vector — an ordered list of ~4,096 numbers — and every transformation the model applies is a matrix multiplication.

Linear algebra

Vectors & matrices

The foundation of all LLM computation. Every token is converted into an embedding vector — an ordered list of ~4,096 floating-point numbers that encodes its meaning as a position in high-dimensional space. Words with similar meanings end up geometrically close in this space.

Every transformation the model applies — attention, feed-forward layers — is a matrix multiplication: multiplying two grids of numbers together. The billions of "parameters" in a model are literally the individual numbers inside these matrices.

Key concepts Embedding vectors Weight matrices Matrix multiplication Dot products Cosine similarity

Sources

3Blue1Brown — Linear Algebra series Wikipedia — Word embedding

Calculus & gradient descent

How models learn

Calculus drives training. The model's error — how wrong its predictions are — forms a high-dimensional "loss landscape." Training is the process of rolling a ball downhill on that landscape, adjusting parameters in the direction that reduces error.

The gradient tells you which direction downhill is. Backpropagation is the algorithm that efficiently calculates that gradient across billions of parameters simultaneously, flowing error signals backwards through the network layer by layer. The Adam optimiser is the most widely used variant for adapting the learning rate per parameter.

Key concepts Loss function Gradient descent Backpropagation Learning rate Adam optimiser

Sources

3Blue1Brown — Backpropagation explained Rumelhart et al. 1986 — Original backprop paper

Probability & statistics

Softmax & sampling

The final step of every forward pass produces a probability distribution — not just "the next word is X" but a ranked list across all ~100,000 tokens in the vocabulary.

The temperature parameter controls this distribution. At temperature 0, you always pick the highest-probability token (deterministic, repetitive). At temperature 1, you sample proportionally (creative, varied). At temperature 2, the distribution flattens (chaotic).

Top-p (nucleus) sampling is a refinement that samples only from the smallest set of tokens whose cumulative probability exceeds a threshold p — avoiding very low-probability tokens regardless of temperature.

Key concepts Softmax function Temperature Top-p sampling Top-k sampling Perplexity

Sources

Holtzman et al. 2020 — The Curious Case of Neural Text Degeneration Wikipedia — Softmax function

Self-attention mechanism

Query / Key / Value

The key innovation of the Transformer (Vaswani et al., 2017). Self-attention asks: for every token, which other tokens in the input should I pay attention to when building my understanding of this token?

It computes three vectors per token: a Query (what am I looking for?), a Key (what do I offer?), and a Value (what information do I carry?). The attention score between any two tokens is the dot product of their Q and K vectors, scaled and passed through softmax to produce weights. Those weights determine how much each token's Value contributes to the output.

This is computed in parallel across all tokens simultaneously — far faster than older recurrent (LSTM) models that processed one token at a time. Modern models use multi-head attention, running this process in parallel across many independent heads.

Key concepts Multi-head attention Query / Key / Value Attention weights Scaled dot-product Positional encoding

Sources

Vaswani et al. 2017 — Attention Is All You Need Jay Alammar — The Illustrated Transformer Anthropic — Transformer circuits

How LLMs are Trained

Training

Training a frontier model is one of the most resource-intensive activities in computing. It happens in sequential stages: data collection → pre-training → instruction tuning → alignment. Training Meta's Llama 3 70B consumed approximately 6.4 million GPU-hours on H100 chips — roughly $30–50M AUD for a single run.

Data collection & cleaning

Web crawl, books, code

Models are trained on massive text corpora: web crawls (Common Crawl is the most used source), books (Books3, Project Gutenberg), Wikipedia, GitHub code repositories, scientific papers (ArXiv), and more. Meta's Llama 3 was trained on 15 trillion tokens. OpenAI has never disclosed GPT-4's exact data volume.

The raw data is enormous but messy — full of spam, duplicate content, low-quality pages, and toxic material. Significant engineering effort goes into filtering, deduplication, and quality scoring before training begins. The quality of training data arguably matters as much as the architecture itself.

Key concepts Common Crawl The Pile RedPajama FineWeb C4 dataset

Sources

Penedo et al. 2024 — FineWeb dataset Gao et al. 2020 — The Pile Meta — Llama 3 Model Card

Pre-training

Next token prediction

The foundational and most expensive stage. The model starts with random parameters (essentially noise) and learns by reading text, trying to predict the next token, comparing its prediction to the actual token, calculating how wrong it was (the "loss"), and adjusting all parameters slightly to be less wrong. This process repeats billions of times.

The result is a base model — sometimes called a foundation model — that deeply understands language but is strange to talk to. If you ask "What is the capital of France?", it might continue your text as if it's a quiz, not a question. Base models need further stages to become useful assistants.

Key concepts Causal language modelling Next token prediction Foundation models Base models Training loss

Sources

Brown et al. 2020 — GPT-3 paper Karpathy — Let's build GPT from scratch

Instruction tuning (SFT)

Supervised fine-tuning

After pre-training, supervised fine-tuning (SFT) trains the model on thousands to millions of examples of the desired input-output format: human message → assistant reply → human message → assistant reply.

This is where a base model becomes a usable assistant — it learns the conversational format, how to respond helpfully, and basic safety behaviours. SFT is relatively cheap compared to pre-training. Every major model (GPT, Claude, Gemini, Llama-Instruct) goes through this stage. The quality and diversity of the SFT dataset heavily influences the model's personality and instruction-following ability.

Key concepts Conversational format FLAN dataset Alpaca ShareGPT Open Hermes

Sources

Wei et al. 2022 — Finetuned Language Models are Zero-Shot Learners Taori et al. 2023 — Stanford Alpaca

Reinforcement Learning from Human Feedback

Reward model, PPO

The step that defines modern chatbots. Human raters are shown multiple responses to the same prompt and rank them from best to worst. These rankings train a separate reward model that learns to predict human preferences.

The main LLM is then fine-tuned using reinforcement learning (specifically PPO — Proximal Policy Optimisation) to generate responses the reward model scores highly. This gives models their tendency to be helpful, to decline harmful requests, and to structure answers in particular ways.

Anthropic's Constitutional AI is a variation where the model is also evaluated against a written set of principles — a "constitution" — and the model itself uses these to self-critique and revise responses before human raters see them.

Key concepts Reward model PPO optimiser Constitutional AI Human preference data Proximal Policy Optimisation

Sources

Ouyang et al. 2022 — Training language models to follow instructions Bai et al. 2022 — Constitutional AI (Anthropic)

DPO — Direct Preference Optimisation

Direct preference optimisation

A more recent and efficient alternative to RLHF that achieves similar alignment results without needing a separate reward model. DPO directly optimises the language model on preference pairs — shown a prompt and a preferred vs rejected response — using a mathematically elegant reformulation that treats the LLM itself as the implicit reward model.

Many open-source models use DPO: Mistral models, Zephyr, many Llama fine-tunes, and Tulu. Results are comparable to RLHF for most tasks at significantly lower training cost, making it popular in the research community and for smaller labs that can't afford full RLHF infrastructure.

Key concepts Preference pairs Implicit reward model Zephyr model Tulu ORPO variant

Sources

Rafailov et al. 2023 — Direct Preference Optimization paper Tunstall et al. 2023 — Zephyr (DPO in practice)

Synthetic training data

AI-generated training

Models like Claude, GPT-4, and Gemini are now partially trained on data generated by other AI models. Meta used Llama 3 to help generate instruction-following training data for Llama 3. Microsoft's Phi series (Phi-1, Phi-2, Phi-3) is almost entirely trained on high-quality synthetic data generated by GPT-4 — and achieves remarkable performance for its tiny size.

This raises interesting questions: can AI quality improve recursively? Research suggests it can — up to a point — but errors and biases compound over generations if synthetic data isn't carefully curated and filtered. It also enables labs to generate specific kinds of training data that are rare or expensive to collect naturally.

Key concepts Phi-3 (Microsoft) Orca methodology WizardLM Self-Instruct Magpie dataset

Sources

Abdin et al. 2024 — Phi-3 Technical Report Mukherjee et al. 2023 — Orca paper

Model Architectures & Families

Architectures

Not all LLMs are built the same way. The Transformer (2017) is the universal foundation — but within that, there are major design variants. The AI landscape is also divided between closed models (GPT-4, Claude, Gemini — weights never released) and open weight models (Llama, Mistral, Qwen, DeepSeek — weights publicly available).

Dense transformer

GPT-2, GPT-3 style

The classic architecture — every parameter is activated for every token processed. Simple, well-understood, and the foundation of early large models. GPT-2 (2019) and the original GPT-3 (2020) were dense transformers.

The limitation: at very large scales, running all parameters for every token becomes prohibitively expensive. A 175B dense model must activate all 175B parameters to process a single token. This drove research into more efficient architectures like Mixture of Experts, where only a fraction of parameters activate per token. Dense models are still widely used for smaller scales (7B–13B) where efficiency is less critical.

Examples GPT-2 BERT T5 original GPT-3 LLaMA 1 Falcon

Sources

Radford et al. 2019 — GPT-2 paper Brown et al. 2020 — GPT-3 paper

Mixture of Experts (MoE)

Routing & efficiency

Now dominant at the frontier. Instead of one large neural network, MoE models have many smaller "expert" networks and a router that decides which 2–4 experts should handle each token. You get the total capacity of a very large model while only activating a fraction of parameters per token.

Mixtral 8x7B (Mistral AI) has 46.7B total parameters but only activates ~12.9B per token — similar inference cost to a 13B dense model but with much higher knowledge capacity. GPT-4 is widely believed to be an MoE model (never confirmed by OpenAI). DeepSeek V3 uses MoE with 671B total but only ~37B active parameters per token.

Examples Mixtral 8x7B GPT-4 (believed MoE) Qwen 3.5 DeepSeek V3 Switch Transformer

Sources

Jiang et al. 2024 — Mixtral of Experts paper Fedus et al. 2021 — Switch Transformer

Reasoning models

o1, o3, DeepSeek R1

A different paradigm: rather than generating an answer immediately, the model produces a long internal "chain of thought" — working through the problem step by step before giving its final answer. This dramatically improves performance on mathematics, logic, and complex coding problems at the cost of being slower and more expensive to run.

OpenAI's o1 (September 2024) and o3 were the first widely deployed reasoning models. DeepSeek R1 (January 2025) shocked the industry by matching o1's benchmark performance at a fraction of the training cost — reportedly ~$6M vs hundreds of millions.

Examples OpenAI o1 OpenAI o3 DeepSeek R1 Gemini Thinking QwQ-32B

Sources

OpenAI — o1 System Card DeepSeek — DeepSeek-R1 Technical Report

OpenAI model family

GPT-5 family

OpenAI's current generation is the GPT-5 family. GPT-5.5 is the current flagship. GPT-5.4 covers most applications — in Pro, Thinking, mini, and nano variants. GPT-5.5 Instant became the default for all ChatGPT users (including free tier) on May 5, 2026.

Reasoning capability — previously a separate "o-series" of models (o1, o3) — is now integrated into GPT-5.4 Thinking mode. The o-series and entire GPT-4 family (GPT-4o, GPT-4.1) were retired from ChatGPT in early-to-mid 2026. Both were important milestones: GPT-4o pioneered native multimodality; o1 and o3 established the reasoning paradigm now standard across the industry.

Models GPT-5.5 GPT-5.4 GPT-5.5 Instant ChatGPT OpenAI API DALL-E 3

Sources

OpenAI — Introducing GPT-5.5 OpenAI — Models overview

Anthropic — Claude models

Constitutional AI

Anthropic was co-founded in 2021 by former OpenAI researchers with a specific focus on AI safety research. Their models are known for strong writing, nuanced instruction following, and strong coding ability.

Constitutional AI is Anthropic's key differentiator: rather than purely human feedback, Claude is also trained against a written set of principles — a "constitution" — that the model itself uses to self-critique and revise its own responses before human raters evaluate them. This allows more scalable and principled alignment.

Current lineup (mid-2026): Claude Opus 4.8 is the most capable model currently available to the public — best suited for complex reasoning and long-horizon agentic tasks. Claude Sonnet 4.6 is the recommended default for most use cases. Claude Haiku 4.5 is optimised for speed and cost at scale. Fable 5 is Anthropic's most capable model overall, but has been suspended for all users since June 12, 2026 under a US export-control directive and is not currently accessible.

Models Fable 5 Claude Opus 4.8 Claude Sonnet 4.6 Claude Haiku 4.5 Constitutional AI claude.ai Anthropic API

Sources

Bai et al. 2022 — Constitutional AI paper Anthropic — Claude Model Card

Google DeepMind — Gemini

Gemini 3.x family

Google's frontier model family, built natively multimodal from the ground up — text, images, audio, and video are treated as equal inputs rather than text being primary with vision bolted on as a separate module.

Gemini 3.5 Flash (May 2026) is the newest model — optimised for agentic tasks and faster and cheaper than the Pro tier while matching or exceeding it on many benchmarks. Gemini 3.1 Pro is the advanced high-reasoning flagship. Google's extended context window — reaching into the millions of tokens — remains a key differentiator. Gemini powers Google Search AI Overviews and deep Workspace integrations.

Models Gemini 3.5 Flash Gemini 3.1 Pro 1M token context Google AI Studio Vertex AI Gemini API

Sources

Google DeepMind — Gemini 3.5 Flash Google AI — Gemini models overview

Open weight models

Llama · Mistral · DeepSeek

Meta Llama 3 — fully open weights. Llama 3.3 70B is exceptional for its size and highly competitive with closed models. Being open enables the entire Ollama ecosystem.

Mistral AI (France) — punches above its weight. Mistral 7B outperformed much larger models at launch (2023). Mixtral 8x7B is a landmark open MoE model.

DeepSeek (China) — shocked the world with R1 in January 2025, matching OpenAI o1 for ~$6M training cost. DeepSeek V3 is an exceptional open MoE model. Qwen (Alibaba) and Gemma (Google) round out the major open families.

Models Llama 3.3 70B Mistral Large Mixtral 8x7B DeepSeek V3 Qwen 3.5 Gemma 3 Phi-4

Sources

Meta — Llama 3 GitHub Mistral AI — Model releases DeepSeek — DeepSeek-V3 Technical Report

Prompting Techniques

Prompting

Prompting is often underrated as a skill, but technique has a massive effect on output quality — arguably more impact than switching between similar-sized models. The model has the capability; your job is to activate it precisely. Key principles: be specific about context and goal; specify format explicitly; give the model a role; and use examples.

Chain of thought prompting

Step-by-step reasoning

One of the most powerful prompting techniques. Adding "let's think step by step" or "think through this carefully before answering" to a prompt measurably improves performance on reasoning tasks, even in standard (non-reasoning) models.

You force the model to generate intermediate reasoning tokens before its conclusion — those intermediate tokens become part of its context, influencing the final answer. Works particularly well for maths, logic, multi-step planning, and complex decision-making.

Reasoning models (OpenAI o1, DeepSeek R1) essentially do chain-of-thought automatically and at greater depth, running hundreds or thousands of reasoning tokens internally before producing their visible response.

Key concepts Zero-shot CoT Tree of Thoughts Self-consistency ReAct prompting Scratchpad

Sources

Wei et al. 2022 — Chain-of-Thought Prompting paper Yao et al. 2023 — Tree of Thoughts paper

Few-shot prompting

Examples in context

Providing examples before your actual request so the model extrapolates the pattern. Rather than "classify this email: [email]", you provide two or three labelled examples first, then your actual input.

Especially powerful for: specific output formats, classification tasks, style matching, and any task where showing is clearer than telling. The model doesn't need to be retrained — it pattern-matches from examples in its context window alone. This is called in-context learning, and it was one of the surprising capabilities that emerged from large-scale pre-training.

Key concepts Zero-shot (no examples) One-shot (1 example) Few-shot (2–5 examples) In-context learning Format matching

Sources

Brown et al. 2020 — GPT-3 and few-shot learning Min et al. 2022 — Rethinking Role of Demonstrations

System prompts

Frames model behaviour

The invisible instructions that frame the entire conversation. Every commercial AI product — Claude.ai, ChatGPT, Gemini, Copilot — has a system prompt you don't see that shapes how the model behaves: its persona, what it will and won't do, its output style, and its focus area.

When you use the Ollama API, OpenAI API, or Anthropic API directly, you control this fully via the "system" message role. A well-designed system prompt can transform a general model into a focused specialist. "You are a senior contract lawyer reviewing clauses for liability" genuinely shifts outputs toward that expertise domain by activating relevant training patterns.

Key concepts OpenAI API "system" role Ollama Modelfile Anthropic system param Persona assignment Role prompting

Sources

OpenAI — Prompt Engineering Guide Anthropic — System prompts guide

RAG — Retrieval Augmented Generation

Retrieval augmented generation

Rather than asking the model to recall information from training (unreliable for specific facts), RAG retrieves relevant documents first and injects them into the context window: "Here are relevant sections from these documents. Based only on this, answer: [question]."

This is how Perplexity, Microsoft Copilot in Word, and many document tools work. It dramatically reduces hallucination for factual tasks and allows the model to answer questions about content it was never trained on — your private documents, recent news, internal company data.

The key components are: a vector database (to store document chunks as embeddings), an embedding model (to convert query and chunks to vectors for similarity matching), and a retrieval step (to find the most relevant chunks before calling the LLM).

Key concepts Vector database LlamaIndex LangChain Pinecone pgvector Chroma

Sources

Lewis et al. 2020 — Retrieval-Augmented Generation paper LlamaIndex documentation

Context window

~1M tokens at the frontier

The maximum amount of text an LLM can "see" at once — its working memory for a single conversation or task. Everything outside the context window is invisible to the model. This is why LLMs have no memory between conversations unless memory is explicitly managed.

Context windows have grown dramatically and frontier models have now largely converged near 1 million tokens: GPT-5.4 supports up to ~1M tokens, Claude Opus 4.8 and Sonnet 4.6 offer 1M tokens via the API, and Gemini 3.1 Pro has a 1,048,576-token context window. The constraint that once sharply distinguished models — 128K vs 200K vs 1M — has largely dissolved at the frontier.

1 token ≈ 0.75 words, so 1M tokens ≈ several large novels or an entire mid-sized codebase in a single prompt. However, model quality degrades in the middle of very long contexts — the "lost in the middle" problem — so raw context size is not the only factor.

Key facts GPT-5.4: ~1M Claude 4.x: 1M API Gemini 3.1 Pro: 1M Lost in the middle

Sources

Liu et al. 2023 — Lost in the Middle paper Gemini 3.1 Pro — Model Card (Google DeepMind) Claude context windows — Anthropic docs

Prompt injection attacks

Security & safety

A security risk in AI-powered systems. If an LLM processes untrusted external content — emails, web pages, documents, database entries — that content can contain hidden instructions designed to hijack the model's behaviour.

Indirect prompt injection is particularly dangerous: the attack payload is hidden in content the model retrieves autonomously — a web page the agent visits, a PDF it reads — not in the user's direct input. This is a genuine attack vector in production AI systems, distinct from jailbreaking.

Why it's hard to fix: the model fundamentally cannot distinguish between trusted instructions and untrusted injected data if both appear in the same context window as text.

Key concepts Indirect injection Direct jailbreak Data exfiltration Agent hijacking LLM firewall

Sources

Greshake et al. 2023 — Not What You've Signed Up For OWASP — LLM Top 10 Security Risks

Broader Themes

Context

Open vs closed models

Weights, access, safety

Closed models (GPT-4, Claude, Gemini) — the weights are never released. You can only access them via API at per-token cost. Labs argue this is necessary for safety and commercial sustainability.

Open weights (Llama 3, Mistral, Qwen, DeepSeek) — the actual model file is publicly downloadable. Anyone can run, inspect, study, or fine-tune it. This is what makes Ollama and local AI possible. "Open weights" is subtly different from "open source" — Meta releases Llama's weights but not always all training code or data.

Fully open source (rare) — weights + training code + training data all released. EleutherAI's models and Pythia qualify. Enables full scientific reproducibility.

Examples Meta Llama (open weights) Mistral (open weights) EleutherAI GPT-NeoX Hugging Face AI2 OLMo

Sources

Bommasani et al. 2023 — Considerations for open foundation models EleutherAI — Open source LLM work

Safety & alignment

Hallucination, bias, risk

Alignment is the problem of ensuring AI systems do what humans actually want, not just what they were literally instructed to do at training time. Safety is the broader challenge of ensuring AI systems don't cause harm as they become more capable.

Hallucination — where models generate plausible-sounding but false information — is the most common practical safety concern. It occurs because models are pattern-matchers, not fact-databases. RAG, grounding with tool use, and output verification all help.

Bias — models can reflect and amplify biases present in training data. This is an active area of research and the subject of significant regulatory attention globally. Anthropic was founded specifically around AI safety research and their "responsible scaling policy" commits to safety evaluations before deploying more capable models.

Key concepts Constitutional AI RLHF Red-teaming Scalable oversight Mechanistic interpretability

Sources

Anthropic — Core Views on AI Safety Bender et al. 2021 — Stochastic Parrots paper Bowman et al. 2022 — Measuring Progress on Scalable Oversight

Hardware & infrastructure

GPUs, CUDA, H100

LLMs run on GPUs — Graphics Processing Units — because their architecture is optimised for the massively parallel matrix multiplications that neural networks require. NVIDIA dominates the training hardware market, with the H100 being the current standard chip for frontier model training. Each H100 costs around $30,000 USD and frontier training runs use tens of thousands of them.

CUDA is NVIDIA's programming platform that makes GPUs accessible to AI frameworks like PyTorch. For local inference, llama.cpp enables running quantised models on consumer hardware — including Apple Silicon Macs via the Metal Performance Shaders (MPS) backend. Quantisation reduces model precision from 32-bit to 4-bit or 8-bit floats, reducing memory requirements ~8x with modest quality loss.

Key concepts NVIDIA H100 A100 CUDA platform Google TPU llama.cpp GGUF format Apple Metal (MPS)

Sources

NVIDIA — H100 GPU Overview llama.cpp GitHub

Who Builds LLMs?

Roles

Building a large language model requires a cross-disciplinary team. Each role tackles a distinct part of the pipeline — from raw data to the finished product people actually use.

Data Scientist

the information wrangler

An LLM is only as smart as the data it reads. Data Scientists are responsible for gathering, cleaning, and organizing the trillions of words of text that the AI will learn from. They make sure the "diet" of the AI is high-quality and diverse.

ML Engineer

the architect

If Data Scientists provide the fuel, Machine Learning (ML) Engineers build the engine. They write the complex mathematics and code, set up the neural networks, and run the massive supercomputers required to actually "train" the AI model to understand language.

AI Ethicist & Safety Researcher

the guardrail designer

Because AI learns from the human internet, it can easily pick up human biases or generate harmful content. Ethicists and Safety Researchers test the models for weaknesses, build safety filters, and ensure the AI behaves responsibly before it is released to the public.

Software Engineer

the app builder

If the ML Engineer builds the engine, the Software Engineer builds the steering wheel, the dashboard, and the chassis. They take the raw AI model and wrap it into a usable app or website so everyday people can actually interact with the technology easily.

Hardware Engineer

the silicon architect

AI requires mind-boggling amounts of physical computing power. Hardware Engineers design the specialized computer chips (like GPUs) and build the massive, stadium-sized data centers that provide the electricity and processing muscle needed to run these enormous models.

Everything you need to understand Large Language Models

What is a Large Language Model?

Large Language Models

The Mathematics of LLMs

Linear algebra

Calculus & gradient descent

Probability & statistics

Self-attention mechanism

How LLMs are Trained

Data collection & cleaning

Pre-training

Instruction tuning (SFT)

Reinforcement Learning from Human Feedback

DPO — Direct Preference Optimisation

Synthetic training data

Model Architectures & Families

Dense transformer

Mixture of Experts (MoE)

Reasoning models

OpenAI model family

Anthropic — Claude models

Google DeepMind — Gemini

Open weight models

Prompting Techniques

Chain of thought prompting

Few-shot prompting

System prompts

RAG — Retrieval Augmented Generation

Context window

Prompt injection attacks

Broader Themes

Open vs closed models

Safety & alignment

Hardware & infrastructure

Who Builds LLMs?

Data Scientist

ML Engineer

AI Ethicist & Safety Researcher

Software Engineer

Hardware Engineer

Explore the interactive mind map