How LLMs Work: A Reading Series

A curated reading path from LLM fundamentals to production agents — covering weights, attention, tool use, and the emerging agent protocol stack.

  1. The Grammar of LLM Special Tokens. Special tokens like <|im_start|> and <|im_end|> are the invisible structural grammar of chat LLMs -- atomic vocabulary entries that impose conversational structure on a next-token predictor.
  2. Function Calling Internals: Grammars and Constrained Sampling. LLM function calls produce valid JSON not because the model is perfectly reliable, but because a grammar engine masks invalid tokens at every sampling step.
  3. Why Built-in Tools Outperform Function Tools in LLMs. Built-in tools like code_execution are in-distribution -- the model was trained on their exact invocation patterns -- while custom function tools force the model to generalize from a schema it has never seen.
  4. How LLMs Keep Built-in and Function Tools From Colliding. What stops you from naming a function tool 'code_interpreter'? Nothing -- and it still won't collide with the built-in one. The answer is namespace separation, from Harmony tokens to the Responses API.
  5. The Simplest Agent Loop. Strip away the frameworks and an AI agent is just a while loop — with the LLM deciding when to stop.
  6. Composing MCP Tools with TypeScript. mcp-compose lets LLMs chain multiple MCP tools in a single TypeScript snippet executed in a sandboxed runtime, keeping intermediate data out of the context window and saving tokens.
  7. Skills vs. MCP: How Context Gets to the Model. MCP tools land in the model's context as a flat, static schema at every ReAct iteration. Skills use a three-tier progressive disclosure strategy that keeps context lean until the capability is actually needed.
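To make the "agent is just a while loop" thesis of entry 5 concrete, here is a minimal sketch. It is an illustration, not any particular framework's implementation: `call_llm` and `run_tool` are hypothetical stand-ins for a real LLM API call and a tool dispatcher, and the message shape is simplified.

```python
def run_agent(call_llm, run_tool, user_message, max_turns=10):
    """Minimal agent loop: the LLM decides when to stop.

    call_llm and run_tool are hypothetical stand-ins for a real
    LLM API client and a tool dispatcher.
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):          # safety cap on iterations
        reply = call_llm(messages)
        messages.append(reply)
        if not reply.get("tool_call"):  # no tool requested:
            return reply["content"]     # the model chose to stop
        name, args = reply["tool_call"]
        result = run_tool(name, args)   # execute the tool...
        messages.append({"role": "tool", "content": result})  # ...and feed the result back
    raise RuntimeError("agent exceeded max_turns")
```

Everything a framework adds — retries, streaming, context management — wraps this loop; the termination condition is simply the model replying without a tool call.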