Skills vs. MCP: How Context Gets to the Model
When you connect multiple MCP servers to a coding agent like Claude Code, something specific happens to the model’s context window at every step of the reasoning loop. All tool schemas, from every server, are presented simultaneously as a flat list. The model must parse them at inference time, weigh them for relevance, and decide which – if any – to invoke. Add enough servers and the tool list starts to crowd out the actual task.
Skills work differently. They load in tiers: descriptions first, full instructions only when invoked, supporting resources only when explicitly requested. The two mechanisms are solving different problems, but understanding how they diverge at the token level clarifies when to reach for each one.
What Happens at Each ReAct Iteration
The ReAct (Reason + Act) loop is the core of how agentic LLMs operate: the model thinks, decides on an action, observes the result, thinks again. At each iteration, the model receives the full accumulated context – conversation history, tool results, system instructions – and generates its next move.
Tool-calling ability isn’t an emergent property of pretraining. It’s taught during post-training with special tokens and structured formats, so the model learns specific patterns for reading schemas and generating valid invocations. The key point: at every single iteration, the model pays attention cost to everything in context – including the complete tool list, whether or not those tools are relevant to the current step.
With MCP, that tool list is always full. With skills, it starts minimal and expands on demand.
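The loop is worth seeing in code. This is a minimal sketch under stated assumptions: `callModel` and `runTool` are stubs standing in for a real model API and tool runtime. The point is the shape of the loop, in which every pass re-sends the entire tool list along with all accumulated history:

```typescript
// Minimal ReAct loop sketch. callModel and runTool are illustrative stubs,
// not a real API: the stub model acts once, observes, then answers.
type Step = { thought: string; action?: { tool: string; input: string }; answer?: string };

function callModel(toolSchemas: string[], history: string[]): Step {
  // A real call would pay attention cost to toolSchemas AND history here.
  return history.some((h) => h.startsWith("observation:"))
    ? { thought: "done", answer: "42" }
    : { thought: "need data", action: { tool: "read_file", input: "/tmp/x" } };
}

function runTool(tool: string, input: string): string {
  return `contents of ${input}`; // stub tool runtime
}

function reactLoop(task: string, toolSchemas: string[]): string {
  const history = [`task: ${task}`];
  for (;;) {
    // Every iteration: the FULL schema list plus the FULL history goes in.
    const step = callModel(toolSchemas, history);
    if (step.answer) return step.answer;
    const obs = runTool(step.action!.tool, step.action!.input);
    history.push(`observation: ${obs}`);
  }
}

console.log(reactLoop("find the answer", ["read_file", "write_file"])); // prints: 42
```

In a real agent, `callModel` is an LLM API call, and the per-iteration cost of `toolSchemas` is exactly the cost that progressive disclosure attacks.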
MCP: A Flat, Static Schema
The Model Context Protocol is a transport and discovery protocol. When Claude Code connects to an MCP server, it calls tools/list to discover available tools. The server returns structured schemas:
```json
[
  {
    "name": "read_file",
    "description": "Read the complete contents of a file from the file system.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "path": { "type": "string", "description": "Absolute path to the file" }
      },
      "required": ["path"]
    }
  },
  {
    "name": "write_file",
    "description": "Write content to a file, creating it if it doesn't exist.",
    "inputSchema": { ... }
  }
]
```
Connect three MCP servers – a filesystem server, a GitHub server, and a database server – and the model receives maybe 40 tool schemas at every iteration. The namespace separation keeps them from colliding (each lives under its server name in the function namespace), but all 40 are present simultaneously.
This is by design. MCP is explicitly a connectivity protocol: its job is to expose capabilities from external systems in a structured, discoverable way. It does that job well. The tradeoff is that the full capability surface is always visible, always consuming context.
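On the wire, discovery is a plain JSON-RPC exchange. The sketch below shows its shape; the method and field names follow the MCP spec (`name` and `inputSchema` are the hard requirements, with `description` conventionally present), while the completeness check is purely illustrative:

```typescript
// Shape of an MCP tools/list exchange (JSON-RPC 2.0). The validation helper
// is illustrative, not part of any SDK.
type Tool = { name: string; description?: string; inputSchema: object };

const request = { jsonrpc: "2.0", id: 1, method: "tools/list", params: {} };

const response = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    tools: [
      {
        name: "read_file",
        description: "Read the complete contents of a file.",
        inputSchema: {
          type: "object",
          properties: { path: { type: "string" } },
          required: ["path"],
        },
      },
    ] as Tool[],
  },
};

// Every entry arrives complete; the protocol has no stub-then-expand form.
const complete = response.result.tools.every(
  (t) => typeof t.name === "string" && typeof t.inputSchema === "object",
);
console.log(complete); // prints: true
```

Everything the client will ever know about a tool is in this one synchronous response, which is why the capability surface cannot be revealed gradually.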
Research from Anthropic’s engineering team showed that with 50+ function tools, accuracy drops to 49%. The attention cost of parsing many schemas degrades the model’s ability to reason about any of them well. Anthropic’s solution – a “tool search” mechanism that retrieves relevant tools at query time – is essentially retrofitting progressive disclosure onto MCP.
MCP Composition: A Partial Workaround
The schema-and-data-bloat problem isn’t new, and there are approaches that work within MCP’s constraints.
mcp-compose addresses a real cost: when a task requires chaining multiple tools, intermediate data flows through the model’s context even though the model doesn’t need to reason about it. A getDoc → emailDoc chain means the full document body hits the context window between calls. This problem isn’t unique to MCP — skills that read large files and process them have the same issue. mcp-compose sidesteps it by having the model write a TypeScript snippet instead — the runtime executes it in a sandbox, and only the final result returns. Skills address it differently: context: fork runs the skill in an isolated subagent, so intermediate tool results never reach the main session’s context. Both mechanisms isolate execution; they just do it at different levels — mcp-compose at the tool composition layer, context: fork at the subagent layer.
It also compresses the schema surface. Rather than exposing all tools from all connected servers, mcp-compose exposes exactly two: compose (accepts TypeScript, runs it) and listAvailableTools (returns typed signatures for what’s available). The model’s visible schema shrinks from N to 2.
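Under that model, the `getDoc → emailDoc` chain becomes a single snippet submitted to `compose`. A sketch, with assumptions labeled: `getDoc` and `emailDoc` are hypothetical inner tools, stubbed here so the code runs standalone; in mcp-compose they would be typed bindings discovered via `listAvailableTools`:

```typescript
// Hypothetical inner tools, stubbed locally for illustration. In mcp-compose
// these would be typed bindings generated from the connected servers' schemas.
async function getDoc(id: string): Promise<string> {
  return "...full document body (thousands of tokens)...";
}
async function emailDoc(to: string, body: string): Promise<{ sent: boolean; bytes: number }> {
  return { sent: true, bytes: body.length };
}

// The snippet the model would submit to `compose`. The document body exists
// only inside the sandbox; the model's context receives just the return value.
async function run() {
  const body = await getDoc("doc-123");          // never enters model context
  return emailDoc("reviewer@example.com", body); // only this result does
}

run().then((r) => console.log(r.sent)); // prints: true
```

The design choice is the same one `context: fork` makes for skills: keep bulky intermediate data in an execution boundary the model never has to read through.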
But listAvailableTools is not progressive disclosure. With skills, the model passively knows what capabilities exist from the always-in-context description — no tool call required. With mcp-compose, the model must proactively invoke listAvailableTools to discover what inner tools are available. That’s a full ReAct iteration just to learn what you can do:
| Tier | Skills | mcp-compose |
|---|---|---|
| Always in context | name + description | 2 stub schemas |
| On invocation | full instructions | full typed signatures (via tool call) |
| On explicit request | reference files | — |
The deeper constraint is the MCP protocol itself. The tools/list specification requires a complete, synchronous response: every entry must arrive with its name and full inputSchema up front (and, in practice, a description). There’s no provision for stub entries that expand on demand, no lazy schema endpoint, no mechanism to mark a tool as “description only until invoked.” You can compress N tools to 2 — as mcp-compose does — but you cannot implement true progressive disclosure without changing the protocol.
Skills: Three-Tier Progressive Disclosure
Skills are defined as markdown files with YAML front matter. The metadata and body are loaded separately, at different points in the interaction.
Tier 1: Always in context. The skill’s name and description fields are loaded at session start. These are short – a sentence or two per skill. Regardless of how many skills are available, the context cost is proportional only to the number of skill descriptions, not their full instruction bodies.
Tier 2: Loaded on invocation. When a skill is actually used (either the user types /skill-name or the model calls the Skill tool), the full SKILL.md body is injected into context. This is where the actual instructions live – the procedures, the examples, the decision logic. It only appears when it’s needed.
Tier 3: Loaded on explicit request. Skills can bundle supporting files: reference material in a references/ subdirectory, example scripts, templates. These are never auto-injected. If the model determines it needs the detailed patterns in references/advanced.md, it reads that file explicitly. The content enters context only at that point.
The contrast with MCP is stark. An MCP server exposes all its tools immediately and completely. A skill exposes a description immediately, its instructions when invoked, and its supporting resources only when the model reaches for them.
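A typical skill directory makes the tiers concrete. The scripts/ subdirectory and file names here are illustrative, not prescribed:

```
deploy-preview/
├── SKILL.md              # Tier 1: front matter · Tier 2: body
├── references/
│   └── advanced.md       # Tier 3: read only on explicit request
└── scripts/
    └── deploy.sh         # Tier 3: bundled, never auto-injected
```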
The Skill Front Matter
The YAML front matter of a SKILL.md file controls more than just the skill’s name. Here are the available fields:
```yaml
---
name: deploy-preview            # Kebab-case identifier, max 64 chars.
                                # Defaults to the directory name if omitted.
description: |
  Deploy a preview environment and return the URL. Use this skill when
  the user asks to "preview", "stage", or "deploy to preview".
argument-hint: "[branch-name]"  # Shown in autocomplete; hints at expected args
disable-model-invocation: true  # If true, only the user can invoke via /deploy-preview;
                                # Claude will not attempt to call it autonomously
user-invocable: false           # If false, hidden from the / menu; Claude-only
allowed-tools: Bash, Read       # Restrict which tools Claude may use in this skill
context: fork                   # Run in an isolated subagent context
agent: Explore                  # Which subagent type to use when context: fork
model: claude-opus-4-6          # Override the model for this skill's execution
hooks:                          # Skill-scoped hooks (PreInvoke, PostInvoke, etc.)
  PreInvoke:
    - matcher: ".*"
      hooks: [{ type: command, command: "echo starting" }]
---
```
A few fields are worth examining closely.
description controls when the skill is considered. The model reads descriptions at the start of each session to build an internal map of available capabilities. A vague description leads to the model never recognizing when the skill is relevant. A precise one – with specific trigger phrases like “use this skill when the user asks to ‘create X’ or ‘configure Y’” – acts as a learned routing signal.
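The difference is easy to see side by side. Both descriptions below are invented for illustration:

```yaml
# Vague: the model has no signal for when this skill applies.
description: Helps with deployments.

# Precise: trigger phrases double as a routing signal.
description: |
  Deploy a preview environment and return the URL. Use this skill when
  the user asks to "preview", "stage", or "deploy to preview".
```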
disable-model-invocation and user-invocable give fine-grained control over who can trigger what. The default allows both user and model invocation. Setting disable-model-invocation: true is appropriate for operations with side effects – commits, deployments, messages sent to external services. The model can inform the user that this operation exists, but cannot execute it autonomously. Conversely, user-invocable: false creates skills that are pure background knowledge: the description appears in context so the model can use them, but they’re hidden from the slash-command menu since they’re not meant to be called by the user directly.
context: fork creates an isolated subagent. When set, the skill executes in a separate subagent with its own context. All intermediate tool results — file reads, command outputs, data fetched and transformed — stay inside the subagent. The main session only sees the final result. This is the skills-level answer to the same data-bloat problem that mcp-compose’s sandbox solves for tool composition chains.
Skills also support dynamic content injection using backtick commands in the file body:
```markdown
---
name: pr-review
description: Review the current pull request for issues and improvements.
---
Current PR diff:
!`gh pr diff`
Open comments:
!`gh pr view --comments`
Review the above diff and comments...
```
The `` !`command` `` syntax runs the shell command at invocation time and injects its output into the skill body before the model sees it. The model gets a skill that arrives with live context already embedded.
Scope and Location
Skills are discovered based on where their SKILL.md files live:
| Location | Path | Scope |
|---|---|---|
| Personal | ~/.claude/skills/<name>/SKILL.md | All projects for this user |
| Project | .claude/skills/<name>/SKILL.md | This project only |
| Plugin | <plugin>/skills/<name>/SKILL.md | Namespaced as plugin:skill-name |
| Enterprise | Managed settings | Organization-wide, highest priority |
When a skill name exists at multiple levels, enterprise beats personal beats project. Plugin skills never collide because they use a namespace prefix – /my-plugin:deploy rather than /deploy.
This layering allows teams to define shared project workflows in .claude/skills/ (committed to the repository), while individuals maintain personal utilities in ~/.claude/skills/. The project-level skills become part of the codebase, versioned and reviewable alongside the code itself.
The Model’s Perspective at Each Iteration
To make the comparison concrete, consider what the model sees at each ReAct step when using MCP versus skills to accomplish the same task: deploying a preview environment.
With MCP (GitHub MCP server + a custom deploy server):
At every iteration – whether the model is reading source code, fixing a bug, or deciding whether to deploy – the full schema for every available tool is present. create_pull_request, list_issues, get_repository, create_deployment, list_environments… all of them, all the time. The model must parse and reason about relevance for all of them at each step.
With a /deploy-preview skill:
At every iteration, the model sees one line: /deploy-preview: Deploy a preview environment and return the URL. Use this skill when.... That’s it. The deployment procedure – the steps, the flags, the error handling logic – isn’t loaded until the model actually invokes the skill. When it does, the instructions arrive in full, and the model executes them with complete context.
The difference compounds across a long agentic session. An agent making 50 reasoning steps while fixing a bug doesn’t need deployment knowledge at 49 of those steps. With MCP, it’s carrying that knowledge the whole time. With skills, it’s not.
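A back-of-envelope model shows the scale of the gap. Every constant below is an assumption for illustration, not a measurement:

```typescript
// Rough context-cost comparison. All constants are assumed, not measured.
const ITERATIONS = 50;          // reasoning steps in the session
const TOOLS = 40;               // schemas exposed by connected MCP servers
const TOKENS_PER_SCHEMA = 150;  // typical size of one JSON schema
const DESC_TOKENS = 30;         // one short skill description line
const BODY_TOKENS = 1200;       // full SKILL.md body, loaded once on invocation

// MCP: every schema is re-read at every iteration.
const mcpCost = ITERATIONS * TOOLS * TOKENS_PER_SCHEMA;

// Skills: only the description recurs; the body is paid for once.
const skillCost = ITERATIONS * DESC_TOKENS + BODY_TOKENS;

console.log(mcpCost, skillCost); // prints: 300000 2700
```

Under these assumptions the gap is about two orders of magnitude, and it grows linearly with both the number of tools and the length of the session.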
What Each Mechanism Is For
This isn’t a competition. MCP and skills are designed for different layers of the problem.
MCP handles connectivity. Authentication to external systems, network transport, schema discovery, live data access. If you need to read from a database, query an API, or invoke a service that lives outside the agent, MCP is the right mechanism. The structured schema is essential here: the model needs to know the exact parameter types and required fields to invoke an external service correctly. And as mcp-compose demonstrates, you can compose multiple MCP tools into higher-level operations to keep intermediate data out of context — though this is a compression workaround, not a substitute for progressive disclosure.
Skills handle procedural knowledge. Multi-step workflows, team conventions, operational runbooks. If you want the agent to “follow the team’s PR process” or “deploy using our staging pipeline,” that knowledge doesn’t come from a schema – it comes from prose instructions that describe a sequence of actions. Skills bundle that prose with supporting scripts and references, and load it progressively so it only occupies context when actually in use.
The architecture that falls out naturally: use MCP to expose your external systems’ capabilities, use skills to encode your workflows that use those capabilities. MCP gives the model the verbs; skills give it the sentences.
Why This Matters for Context
Post-training shapes how models handle in-context information. The model’s ability to reason about tools degrades as the number of tools increases, because each additional schema competes for attention. This is the same fundamental constraint that makes in-context learning sensitive to prompt length and ordering – there’s a finite budget, and everything in context competes for it.
Skills’ progressive disclosure is a deliberate response to this constraint. By keeping descriptions short and deferring full instruction bodies until needed, a coding agent can have dozens of available skills without the context cost of exposing all of them simultaneously. The model knows they exist (from descriptions), but doesn’t pay the attention cost of reasoning about them until one becomes relevant.
This is a design pattern worth carrying into any agentic system you build: prefer deferred loading over static listing. Don’t give the model information it doesn’t need yet. The context window is finite, and every token that doesn’t contribute to the current reasoning step is a token taken from something that might.