Language Model
Transform, generate, or reason about text. The core building block for chatbots, summarization, translation, and more.
How Large Language Models Work
A technical deep-dive into transformer-based language models. How text becomes tokens, tokens become vectors, and vectors predict the next word.
Tokenization: Text to Numbers
LLMs don't see text as characters. They see tokens - subword units learned from training data.
How BPE (Byte Pair Encoding) Works
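The merge loop at the heart of BPE can be sketched in a few lines of Python. The toy corpus and merge count below are illustrative, not a real tokenizer's vocabulary: the algorithm repeatedly finds the most frequent adjacent symbol pair and fuses it into a new symbol.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges from a toy corpus of whitespace-split words.

    Each word starts as a tuple of characters; on every step the most
    frequent adjacent symbol pair is merged into one new symbol.
    """
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)      # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # fuse the pair
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges(["low", "lower", "lowest", "newest", "widest"], 5)
print(merges)  # frequent fragments like ('l', 'o') merge first
```

Real tokenizers (e.g. GPT-family byte-level BPE) learn tens of thousands of merges over bytes rather than characters, but the pair-count-and-merge loop is the same idea.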
Embeddings: Tokens to Vectors
Each token ID is mapped to a learned vector (typically 4096-12288 dimensions). These vectors capture semantic meaning - similar words have similar vectors.
Token Embedding Lookup
Semantic Similarity
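Lookup and similarity can be sketched with a made-up 4-dimensional embedding table; real models use thousands of dimensions with learned values, and these vectors are purely illustrative:

```python
import numpy as np

# Toy embedding table: token -> vector. In a real model this is a matrix
# indexed by token id; the values here are invented for illustration.
embedding = {
    "cat":   np.array([0.9, 0.1, 0.0, 0.2]),
    "dog":   np.array([0.8, 0.2, 0.1, 0.3]),
    "paris": np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, ~0 for unrelated ones."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embedding["cat"], embedding["dog"]))    # high: related words
print(cosine(embedding["cat"], embedding["paris"]))  # low: unrelated words
```

The point the numbers make: related words end up near each other in the vector space, so geometric distance stands in for semantic similarity.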
Attention: How Tokens Relate
The key innovation of transformers. Each token can "attend" to every other token, learning which words are relevant for understanding each position.
Example: in "The cat sat on the mat", the token "sat" receives an attention weight for every other position, indicating how relevant each one is to it.
The Q-K-V Mechanism
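Scaled dot-product attention can be sketched directly in NumPy. As a simplification, Q, K, and V below are taken straight from random token vectors rather than from the learned projection matrices a real transformer applies:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq, seq) relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V, weights                       # weighted mix of values

# One token per row; d_model = 8 here, thousands in a real model.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                           # 4 tokens
out, w = scaled_dot_product_attention(x, x, x)        # self-attention
print(w.sum(axis=-1))                                 # each row sums to 1
```

Each output row is a weighted average of all value vectors, with the weights saying how much that position "attends" to every other position.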
Next Token Prediction
The core task of language models. Given context, predict the probability distribution over all possible next tokens.
Example prompt: "The capital of France is"
Sampling Strategies
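A common way to turn the model's raw scores into a concrete next token is temperature scaling plus top-k filtering. A minimal NumPy sketch, where the logits are invented for a pretend 4-token vocabulary:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample a token id from raw logits with temperature and top-k filtering."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]              # k-th largest score
        logits = np.where(logits < cutoff, -np.inf, logits)  # drop the rest
    probs = np.exp(logits - logits.max())             # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5, -1.0]   # pretend scores for a 4-token vocabulary
tok = sample_next_token(logits, temperature=0.7, top_k=2)
print(tok)  # always 0 or 1: only the two highest-scoring tokens survive top-k
```

Lower temperature sharpens the distribution toward the top score (greedy-like); higher temperature flattens it, trading coherence for variety.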
Model Size Comparison
| Model | Parameters | Layers | Context |
|---|---|---|---|
| GPT-2 | 1.5B | 48 | 1K |
| GPT-3 | 175B | 96 | 2K |
| GPT-4 | ~1.8T* | ~120* | 128K |
| Llama 3.1 70B | 70B | 80 | 128K |
| Claude 3.5 | undisclosed | undisclosed | 200K |
* Estimated values for proprietary models
The Complete Pipeline
The model generates text by repeatedly predicting the next token, appending it to the context, and predicting again. This autoregressive process continues until a stop condition is met.
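The predict-append-repeat loop described above can be sketched as follows. `toy_model` and `ToyTokenizer` are hypothetical stand-ins for a real model and tokenizer, and greedy decoding stands in for the sampling strategies discussed earlier:

```python
class ToyTokenizer:
    # Hypothetical character-level tokenizer, for demonstration only.
    def encode(self, text): return [ord(c) for c in text]
    def decode(self, ids): return "".join(chr(i) for i in ids)

def toy_model(ids):
    # Stand-in for a real LLM: always scores "!" (id 33) highest.
    logits = [0.0] * 128
    logits[33] = 1.0
    return logits

def generate(model, tokenizer, prompt, max_new_tokens=5, eos_id=None):
    """Autoregressive loop: predict, append, repeat until a stop condition."""
    ids = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        logits = model(ids)                  # scores over the whole vocabulary
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        if next_id == eos_id:                # stop condition: end-of-sequence
            break
        ids.append(next_id)                  # grow the context and go again
    return tokenizer.decode(ids)

print(generate(toy_model, ToyTokenizer(), "Hi"))  # "Hi!!!!!"
```

Note that every new token requires a full forward pass over the grown context, which is why generation cost scales with output length.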
Use Cases
- ✓ Chatbots and assistants
- ✓ Text summarization
- ✓ Translation
- ✓ Content generation
- ✓ Question answering
Architectural Patterns
Direct LLM Generation
Pass input to an LLM with appropriate prompting.
- + Simple
- + Flexible
- + Handles many tasks
- − May hallucinate
- − Limited to training knowledge
RAG (Retrieval-Augmented Generation)
Retrieve relevant context, then generate with LLM.
- + Grounded in data
- + Up-to-date
- + Citable
- − Retrieval quality matters
- − More complex pipeline
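The retrieve-then-generate flow can be sketched end to end in a few lines. The keyword-overlap retriever below is a toy stand-in for the vector search a real RAG pipeline would use, and the documents are invented:

```python
def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:k]

def build_prompt(query, documents):
    """Assemble retrieved context plus the question into one LLM prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer:")

docs = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
    "Paris is the capital of France.",
]
prompt = build_prompt("What is the capital of France?", docs)
print(prompt)  # this assembled prompt would then be sent to any LLM API
```

The generation step is just a normal LLM call on the assembled prompt; the grounding comes entirely from what the retriever puts into the context.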
Agent with Tools
LLM that can call external tools, APIs, and functions.
- + Can take actions
- + Access real-time data
- − Complex error handling
- − Security considerations
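The tool-dispatch half of an agent loop can be sketched as follows. The registry shape and tool names are illustrative; a real agent would parse structured tool calls (e.g. JSON function-call messages) out of the model's output and feed results, including errors, back into the context:

```python
import datetime

# Hypothetical tool registry: name -> callable.
TOOLS = {
    "get_time": lambda: datetime.datetime.now().isoformat(),
    "add": lambda a, b: a + b,
}

def run_tool(name, *args):
    """Dispatch a tool call; return errors as text the model can react to."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](*args)
    except Exception as e:               # never crash the loop on a bad call
        return f"error: {e}"

print(run_tool("add", 2, 3))   # 5
print(run_tool("fly"))         # error: unknown tool 'fly'
```

Returning errors as plain strings rather than raising is deliberate: the model sees the failure in its context and can retry with different arguments or a different tool.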
Implementations
API Services
GPT-4o
OpenAI. Top-tier reasoning. Good balance of speed and quality.
Claude 3.5 Sonnet
Anthropic. Excellent for long context and code. Strong reasoning.
Gemini 1.5 Pro
Google. 1M-token context. Good for very long documents.
Mistral Large
Mistral. Strong European option. Good for function calling.
Open Source
Llama 3.1 405B
Llama 3.1 Community. Best open-source model. Requires significant compute.
Quick Facts
- Input: Text
- Output: Text
- Implementations: 2 open source, 4 API
- Patterns: 3 approaches