How LLMs Work: A Reading Series

A curated reading path from LLM fundamentals to production agents — covering weights, attention, tool use, and the emerging agent protocol stack.

  1. microGPT from First Principles: 200 Lines That Explain LLMs
     A line-by-line walkthrough of Karpathy's 200-line GPT implementation, with ASCII diagrams covering embeddings, attention, backpropagation, and the Adam optimizer -- the same algorithm powering ChatGPT and Claude, just at toy scale.
  2. The Grammar of LLM Special Tokens
     Special tokens like <|im_start|> and <|im_end|> are the invisible structural grammar of chat LLMs -- atomic vocabulary entries that impose conversational structure on a next-token predictor.
  3. Post-Training vs. In-Context Learning
     How large language models adapt to new tasks through two fundamentally different mechanisms: permanently updating weights via post-training, or steering behavior at inference time through in-context learning.
  4. Induction Heads: The Circuit Behind In-Context Learning
     How Anthropic researchers discovered a two-head attention circuit that explains why language models can learn from examples in their context window -- and what it reveals about the structure of model intelligence.
  5. Fine-Tuning LLMs: What Happens to the Weights
     Fine-tuning modifies a model's weights to specialize it for a task, but how those weights change varies dramatically -- from updating every parameter to injecting tiny low-rank matrices that can be hot-swapped at inference time.
  6. Function Calling Internals: Grammars and Constrained Sampling
     LLM function calls produce valid JSON not because the model is perfectly reliable, but because a grammar engine masks invalid tokens at every sampling step.
  7. Why Built-in Tools Outperform Function Tools in LLMs
     Built-in tools like code_execution are in-distribution -- the model was trained on their exact invocation patterns -- while custom function tools force the model to generalize from a schema it has never seen.
  8. How LLMs Keep Built-in and Function Tools From Colliding
     What stops you from naming a function tool 'code_interpreter'? Nothing -- and it still won't collide with the built-in one. The answer is namespace separation, from Harmony tokens to the Responses API.
  9. The Tool Invocation Gap: From ChatML to the Responses API
     Tool execution is moving inward -- from client-side function calls into provider-managed infrastructure -- creating a widening gap between what built-in and custom tools can do.
  10. Side-Effects, All the Way Up
      Functional programming didn't teach us to eliminate side-effects -- it taught us to make them explicit. Now LLMs with tool access are forcing the same lesson, at a higher level.
  11. The Simplest Agent Loop
      Strip away the frameworks and an AI agent is just a while loop -- with the LLM deciding when to stop.
  12. Why Agents Hallucinate Tool Calls (and How to Stop It)
      Tool call hallucination isn't random noise -- it's the model activating trained patterns that don't match your actual tool list. Understanding the mechanism makes it debuggable.
  13. SDKs, Frameworks, Agents: Pick Your Tier
      A three-tier mental model for the AI tooling landscape: API SDKs call models, multi-agent frameworks coordinate them, and coding agents do the engineering work autonomously.
  14. Composing MCP Tools with TypeScript
      mcp-compose lets LLMs chain multiple MCP tools in a single TypeScript snippet executed in a sandboxed runtime, keeping intermediate data out of the context window and saving tokens.
  15. Skills vs. MCP: How Context Gets to the Model
      MCP tools land in the model's context as a flat, static schema at every ReAct iteration. Skills use a three-tier progressive disclosure strategy that keeps context lean until the capability is actually needed.
  16. MCP, A2A, Skills, Toolbox: Where Agent Protocols Are Converging
      MCP handles how agents connect to tools; A2A handles how agents connect to each other. Skills and Toolbox fill the gaps above and below -- and under the Linux Foundation, these layers are settling into a coherent stack.