SDKs, Frameworks, Agents: Pick Your Tier


The AI tooling landscape has fractured into a bewildering number of SDKs, frameworks, and agents – each claiming to be the right way to build with large language models. OpenAI has an API SDK and an Agents SDK. Anthropic has a Claude SDK and Claude Code. Google has a GenAI SDK and an Agent Development Kit. Microsoft merged Semantic Kernel and AutoGen into a single Agent Framework. Then there’s LangGraph, CrewAI, Cursor, Windsurf, Aider, Devin, and more arriving every week.

If you squint at all of this, a clear three-tier architecture emerges. Understanding these tiers – what each one does, where the boundaries are, and where they’re heading – is the key to cutting through the noise.

┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│  TIER 3: CODING AGENTS                                           │
│  Claude Code, GitHub Copilot CLI, Codex, Cursor, Devin, Aider    │
│  ── Autonomous systems that inhabit your dev environment ──      │
│                                                                  │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  TIER 2: MULTI-AGENT FRAMEWORKS                                  │
│  LangGraph, CrewAI, Microsoft Agent Framework, Google ADK        │
│  ── Orchestration layers for coordinating multiple agents ──     │
│                                                                  │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  TIER 1: API SDKs                                                │
│  OpenAI SDK, Anthropic SDK, Google GenAI SDK                     │
│  ── Client libraries for calling LLM APIs ──                     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Each tier answers a fundamentally different question. Tier 1 asks: how do I call the model? Tier 2 asks: how do I coordinate multiple models? Tier 3 asks: what if the model just does the work?

Tier 1: API SDKs – The Foundation

An API SDK is a thin client library that wraps HTTP calls to a model provider’s inference endpoint. You send a prompt, you get a completion. Everything else – the application logic, the retry handling, the tool execution, the conversation state – is your responsibility.

The three major providers each ship official SDKs:

OpenAI offers client libraries in Python, TypeScript, Java, Go, and Ruby. The API is organized around Chat Completions (the workhorse endpoint) and the newer Responses API (which adds built-in tools like web search and code execution as first-class primitives). A typical call looks like this:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)

Anthropic provides Python and TypeScript clients for the Messages API. Key differentiators include extended thinking (where the model reasons before responding), a 1M token context window in beta, and computer use capabilities where the model can control a desktop environment.

Google ships the GenAI SDK across Python, TypeScript, Go, and Java. It reached general availability in May 2025 and supports the Gemini 3 model series, the Live API for real-time audio/video streaming, and grounding with Google Search and Google Maps.

What API SDKs Are Good At

API SDKs excel when you’re building applications where the AI is a component – not the whole system. A chatbot that answers customer questions. A content pipeline that summarizes articles. A search engine that generates embeddings. In these cases, you want precise control over every prompt, every parameter, and every retry. The SDK stays out of your way and gives you exactly that.

Where They Fall Short

The moment your task requires multiple steps – read a file, analyze it, write a response, check it, fix it – you’re on your own. The SDK gives you send prompt → get response. The loop, the state management, the error recovery, the decision about what to do next – all of that is application code you must write yourself. This is where frameworks enter the picture.
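Even the simplest piece of that application code, a retry with exponential backoff around the raw call, is yours to write. Here is a stdlib-only sketch of the idea; fake_completion is a hypothetical stand-in for a real SDK call such as client.chat.completions.create:

```python
import time

def fake_completion(prompt):
    # Stand-in for a real SDK call; fails twice, then succeeds,
    # to simulate transient network errors.
    if fake_completion.failures > 0:
        fake_completion.failures -= 1
        raise ConnectionError("transient network error")
    return f"answer to: {prompt}"

fake_completion.failures = 2

def complete_with_retry(prompt, max_attempts=4, base_delay=0.01):
    """Retry transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fake_completion(prompt)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

print(complete_with_retry("What is the capital of France?"))
```

Multiply this by rate limiting, streaming, tool dispatch, and conversation state, and the scaffolding quickly dwarfs the model calls themselves.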

Tier 2: Multi-Agent Frameworks – The Orchestration Layer

Multi-agent frameworks exist because most real-world AI tasks require more than a single prompt-response cycle. They need an LLM to make decisions, invoke tools, inspect results, and decide what to do next – sometimes with multiple specialized agents collaborating on different parts of a problem. Frameworks provide the scaffolding for this coordination.

LangGraph

LangGraph, built by the LangChain team, models agent workflows as directed graphs. Nodes are agents, functions, or decision points. Edges define the flow of data. A central StateGraph maintains shared context across the entire execution.

┌────────────┐     ┌────────────┐     ┌────────────┐
│  Research  │────▶│  Analyze   │────▶│   Write    │
│   Agent    │     │   Agent    │     │   Agent    │
└────────────┘     └────────────┘     └────────────┘
       │                                     │
       │           ┌────────────┐            │
       └──────────▶│   Review   │◀───────────┘
                   │   Agent    │
                   └────────────┘

LangGraph 1.0 shipped in October 2025. Key features include conditional routing (edges can branch based on agent output), parallel execution with downstream merging, immutable state management, built-in persistence for cross-session memory, and human-in-the-loop approval workflows. LangChain reports approximately 400 companies in production and around 90 million monthly downloads.
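The conditional-routing idea can be sketched without the library. This stdlib-only toy is not LangGraph's actual API (which uses StateGraph and add_conditional_edges); it just shows a shared state dict flowing through nodes, with a review node deciding whether to loop back or finish:

```python
def research(state):
    state["notes"] = "facts about " + state["topic"]
    return state

def write(state):
    state["draft"] = f"Draft using {state['notes']}"
    return state

def review(state):
    # Approve on the second pass, so the retry edge fires exactly once.
    state["attempts"] = state.get("attempts", 0) + 1
    state["approved"] = state["attempts"] >= 2
    return state

def run(state):
    state = research(state)
    while True:
        state = write(state)
        state = review(state)
        if state["approved"]:  # conditional edge: finish or loop back to write
            return state

result = run({"topic": "LangGraph"})
print(result["draft"], "after", result["attempts"], "review passes")
```

The real framework adds what this toy omits: persistence of that state across sessions, parallel branches with merging, and human approval gates on specific edges.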

CrewAI

CrewAI takes a different approach: role-based collaboration. Each agent is defined with a distinct role, goal, and backstory. A “Researcher” agent might gather information, a “Writer” agent might draft content, and an “Editor” agent might review it. The framework supports two architectures: Crews (autonomous teams where agents decide when to delegate) and Flows (event-driven pipelines for deterministic production workloads).

A distinctive feature is hierarchical process mode, which auto-generates a manager agent that delegates tasks, reviews outputs, and coordinates the team – mimicking how a human project manager operates. CrewAI ships with 100+ tools out of the box and sophisticated memory management (shared short-term, long-term, entity, and contextual memory). The project has accumulated over 20,000 GitHub stars.
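The role-based model is easy to picture with a few dataclasses. This stdlib-only sketch mirrors the shape of CrewAI's sequential process, not its actual classes; run_task is a hypothetical stub where a real crew would prompt an LLM with the agent's role, goal, and backstory:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    role: str
    goal: str
    backstory: str

def run_task(agent, task, context=""):
    # Stand-in for an LLM call parameterized by the agent's persona.
    return f"[{agent.role}] {task} (context: {context or 'none'})"

researcher = Agent("Researcher", "gather information", "a meticulous analyst")
writer = Agent("Writer", "draft content", "a clear technical writer")
editor = Agent("Editor", "review drafts", "a ruthless line editor")

# Sequential process: each agent's output becomes the next agent's context.
output = ""
for agent, task in [(researcher, "collect sources"),
                    (writer, "write the article"),
                    (editor, "polish the draft")]:
    output = run_task(agent, task, output)

print(output)
```

Hierarchical mode replaces the fixed loop above with a generated manager agent that decides the ordering and reviews each output itself.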

Microsoft Agent Framework

In October 2025, Microsoft released the Agent Framework in public preview – a convergence of two previously separate projects. Semantic Kernel was Microsoft’s production-grade SDK for building AI applications with plugins and planners. AutoGen was a research project for dynamic multi-agent orchestration with an event-driven architecture. The Agent Framework merges both into a single open-source framework supporting Python and .NET, with general availability planned for Q1 2026.

Semantic Kernel v1.x and AutoGen both continue to receive critical bug fixes, but new feature development is concentrated in the Agent Framework. If you’re starting a new project in the Microsoft ecosystem, this is where you should land.

OpenAI Agents SDK

OpenAI’s entry into the framework tier arrived in March 2025 when they launched the Agents SDK as the production-ready successor to Swarm (an experimental, educational project from October 2024). The Agents SDK is deliberately minimal – three core primitives: Agents (LLMs with instructions and tools), Handoffs (delegation between agents), and Guardrails (input/output validation). It includes built-in tracing, prompt caching, and is designed to handle most agent workflows without additional abstractions.

Google Agent Development Kit (ADK)

Announced at Cloud NEXT in April 2025, ADK is a code-first, model-agnostic framework supporting Python, TypeScript, Go, and Java. It supports multi-agent orchestration with workflow agents (Sequential, Parallel, Loop), MCP tools, and Google’s Agent-to-Agent (A2A) protocol for cross-vendor agent coordination.

The Framework Landscape at a Glance

Framework          Origin       Core Abstraction      Language Support        Production Status
LangGraph          LangChain    Directed graph        Python, JS              GA (v1.0, Oct 2025)
CrewAI             Independent  Role-based teams      Python                  GA
Agent Framework    Microsoft    Merged SK + AutoGen   Python, .NET            Preview (GA Q1 2026)
Agents SDK         OpenAI       3 primitives          Python                  GA (Mar 2025)
ADK                Google       Multi-agent + A2A     Python, TS, Go, Java    GA

What Frameworks Are Good At

Frameworks excel at building AI-powered products: customer service systems with escalation logic, data analysis pipelines with multiple specialized agents, content generation workflows with review stages. They handle the coordination complexity that would be painful to build from scratch on top of raw API SDKs.

Where They Fall Short

Frameworks coordinate AI but they don’t apply it to real-world environments. A LangGraph agent can “plan” to fix a bug, but it can’t open your repository, read the stack trace, edit the file, run the tests, and verify the fix. That capability gap leads us to the third tier.

Tier 3: Coding Agents – AI That Does the Work

This is where the paradigm shift lives. Coding agents don’t help you build AI applications – they are AI applications that do software engineering work directly in your development environment. They read your codebase, write code, run tests, commit changes, and iterate on failures. The developer’s role shifts from writing code to reviewing code.

Claude Code

Claude Code is Anthropic’s agentic coding tool, available in the terminal, IDEs, desktop, and browser. It’s not a chatbot with code suggestions – it’s an autonomous system that operates directly in your development environment. It reads files, executes commands, modifies code, manages git workflows, and connects to external services via MCP.

What sets it apart architecturally is multi-agent orchestration: Claude Code can spawn specialized subagents for different parts of a task – an Explore agent for codebase analysis, a Plan agent for implementation design, a general-purpose agent for complex multi-step work – and coordinate them in parallel. It follows the Unix philosophy of composability: you can pipe it, run it in CI, or chain it with other tools via its SDK.

GitHub Copilot CLI

GitHub Copilot CLI reached general availability on February 25, 2026. Like Claude Code, it operates as an autonomous agent: it plans complex tasks, executes multi-step workflows, edits files, runs tests, and iterates until done. It ships with specialized built-in agents (Explore, Task, Code Review, Plan), an autopilot mode that executes without stopping for approval, and background delegation – prefix a prompt with & to dispatch it to a cloud coding agent. It supports multiple models (Claude Opus 4.6, Sonnet 4.6, GPT-5.3-Codex, Gemini 3 Pro), MCP integration, persistent memory across sessions, and a plugin system for community extensions.

OpenAI Codex

Codex is OpenAI’s cloud-based coding agent. Each task runs in its own cloud sandbox preloaded with your repository. It reads and edits files, runs commands (test harnesses, linters, type checkers), and reruns tests until they pass. Tasks take 1-30 minutes. Internet access is intentionally disabled during execution for security. Codex introduced Automations – unprompted work like issue triage, alert monitoring, and CI/CD – bringing agents closer to autonomous background operation.

What Makes Coding Agents Fundamentally Different

The distinction between tiers isn’t just about features – it’s about who is in the loop. Here’s the same task at each tier:

Task: “Fix the failing test in auth.test.ts”

With an API SDK, you write code that:

  1. Reads auth.test.ts (you implement the file reading)
  2. Sends the content to the model with a prompt asking for a fix (you write the prompt)
  3. Parses the model’s response (you implement the parsing)
  4. Writes the fix to disk (you implement the file writing)
  5. Runs the test (you implement the test runner invocation)
  6. If it fails, you loop back to step 2 (you implement the loop)
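Strung together, those six steps are a hand-rolled agent loop. A stdlib sketch with the model and test runner stubbed out (ask_model and run_tests are hypothetical stand-ins for an SDK call and a subprocess invocation of your real test runner):

```python
from pathlib import Path
import tempfile

def ask_model(source, error):
    # Stand-in for an SDK call that returns a proposed fix;
    # here it just corrects a known typo.
    return source.replace("retrun", "return")

def run_tests(path):
    # Stand-in for subprocess.run(["npx", "jest", ...]); "fails"
    # while the typo is still present.
    return "retrun" not in path.read_text()

test_file = Path(tempfile.mkstemp(suffix=".ts")[1])
test_file.write_text("function auth() { retrun true; }")

error = "SyntaxError"
for attempt in range(5):                      # 6. the loop
    source = test_file.read_text()            # 1. read the file
    fixed = ask_model(source, error)          # 2-3. prompt + parse
    test_file.write_text(fixed)               # 4. write the fix
    if run_tests(test_file):                  # 5. run the test
        print(f"fixed after {attempt + 1} attempt(s)")
        break
```

Every line of that loop is your code to write, test, and maintain, and this version omits the hard parts: diffing instead of whole-file rewrites, parsing real test output, and knowing when to give up.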

With a multi-agent framework, you define:

  1. A “Diagnosis” agent that reads files and identifies issues
  2. A “Fix” agent that generates patches
  3. A “Verification” agent that runs tests
  4. A graph connecting them with conditional edges for retry logic

You still write the agent definitions, tool implementations, and orchestration logic. The framework handles the coordination, but you build the pieces.

With a coding agent, you type:

> Fix the failing test in auth.test.ts

The agent reads the test file, reads the source file it tests, identifies the issue, edits the code, runs the test, sees it fail again, reads the error output, makes a second fix, runs the test again, sees it pass, and reports back. No code written. No tools defined. No orchestration logic. The agent is the developer.

┌───────────────────────────────────────────────────────────────┐
│                   THE AUTONOMY GRADIENT                       │
│                                                               │
│  API SDK           Framework           Coding Agent           │
│  ──────────────────────────────────────────────────▶          │
│                                                               │
│  You write         You define          You describe           │
│  everything        the agents          the outcome            │
│                                                               │
│  You orchestrate   Framework           Agent                  │
│  the loop          orchestrates        orchestrates           │
│                                                               │
│  You handle        Framework           Agent                  │
│  failures          routes failures     debugs failures        │
│                                                               │
│  No environment    Limited tool        Full environment       │
│  access            interfaces          access                 │
└───────────────────────────────────────────────────────────────┘

The key capabilities that enable this:

  1. Code execution: Agents write code AND run it, observe results, and iterate. This closes the feedback loop that SDKs and frameworks leave open.
  2. File system access: They navigate entire project structures, read configuration, and make coordinated multi-file changes.
  3. Tool chain access: They run the same tools human developers use – test suites, linters, type checkers, build systems, deployment scripts.
  4. Version control: They create branches, commit changes, open pull requests, and handle merge conflicts.
  5. Iterative debugging: When something fails, they read error output, diagnose the issue, apply fixes, and re-run – without human intervention.
  6. Extended duration: Anthropic reports that Claude can code autonomously for more than 30 hours without major performance degradation, spawning subagents for subtasks.

The Evidence – and a Counterpoint

The enterprise results are striking. TELUS reports 500,000+ hours saved. Rakuten achieved 99.9% accuracy on massive codebase migrations in hours. 92% of US developers now use AI coding tools daily. Gartner reported a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025. The AI agent market is growing at 46.3% CAGR, from $7.84 billion in 2025 to a projected $52.62 billion by 2030.

But intellectual honesty requires mentioning a significant counterpoint. METR, a safety research organization, conducted a randomized controlled trial with 16 experienced open-source developers completing 246 tasks between February and June 2025. They found that AI tools (primarily Cursor Pro with Claude 3.5/3.7 Sonnet) increased completion time by 19% for experienced developers on codebases where they averaged 5 years of prior experience. Developers predicted AI would make them 24% faster, but it measurably slowed them down.

The nuance matters: this study measured experienced developers on codebases they deeply understood – precisely the scenario where human expertise already provides fast, accurate navigation. The study doesn’t claim AI is useless; it suggests the productivity gains are more pronounced on unfamiliar codebases, greenfield projects, and tasks outside the developer’s domain expertise. Notably, METR’s follow-up study (August 2025 onward) was hampered because a significant number of developers refused to participate if they couldn’t use AI – suggesting the perceived value is high even when measured productivity gains are ambiguous. The study also used early 2025 models; Opus 4.6 and Sonnet 4.6 (February 2026) represent a meaningful capability jump, particularly in sustained agentic task execution.

The Convergence

The three-tier architecture is real, but the boundaries are blurring. API SDKs now include agent primitives – OpenAI’s Agents SDK ships built-in tracing and guardrails. Frameworks now include production deployment tools. And coding agents are absorbing framework capabilities – Claude Code’s multi-agent orchestration is essentially a built-in framework.

┌──────────────────────────────────────────────────────┐
│                                                      │
│          2024              2025             2026     │
│                                                      │
│  SDKs:   API calls ──▶ + tools ──────▶ + agents      │
│                                                      │
│  Frmwks: Chains ──────▶ Graphs ──────▶ + deploy      │
│                                                      │
│  Agents: Copilot ─────▶ Autonomous ──▶ + teams       │
│                                                      │
│          ◀──────── each tier absorbs the one below   │
│                                                      │
└──────────────────────────────────────────────────────┘

Several forces are accelerating this convergence:

MCP as the universal connector. The Model Context Protocol has become the standard for connecting AI systems to external tools and data sources. In December 2025, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation, co-founded with Block and OpenAI, and supported by Google, Microsoft, AWS, Cloudflare, and Bloomberg. MCP has surpassed 97 million monthly SDK downloads with over 10,000 published servers. It’s now integrated into ChatGPT, Cursor, Gemini, Copilot, and VS Code. When every agent speaks the same tool protocol, the integration layer collapses.

Multi-model support. GitHub Copilot CLI supports Claude Opus 4.6, GPT-5.3-Codex, and Gemini 3 Pro. Aider works with any model provider. Cursor and Windsurf are model-agnostic. The coding agent is decoupling from the model underneath it – the agent becomes an interface to any foundation model, making the API SDK tier increasingly invisible to end users.

Background and autonomous operation. Codex runs tasks in cloud sandboxes for up to 30 minutes. Claude Code runs async background tasks while you work on other things. Copilot CLI’s & prefix dispatches work to cloud agents. The direction is clear: agents that work while you sleep, triaging issues, running maintenance, and preparing pull requests for your morning review.

Why CLI Agents Win

If you’re deciding where to invest your time and attention, the answer depends on what you’re building. But the trend line favors coding agents for a specific, structural reason: they close the feedback loop.

An API SDK lets you ask a model a question. A framework lets you chain questions together. But a coding agent can act on the answers – and critically, it can observe the results of its actions and correct course. This is the difference between a consultant who writes a report and an engineer who writes the code, runs it, debugs it, and ships it.

The feedback loop is why coding agents can handle tasks that are genuinely hard to solve with frameworks alone:

  • “Upgrade this project from React 17 to React 19.” The agent reads every file, makes incremental changes, runs the build after each change, fixes new errors that surface, and continues until the build passes. A framework could coordinate specialized agents for this, but you’d need to build the file reading, the build runner, the error parser, and the retry logic yourself.
  • “Find and fix the security vulnerability in the authentication flow.” The agent reads the code, identifies the issue, applies a fix, writes a test for the fix, runs the test suite to ensure nothing else breaks, and commits the result. It thinks like a developer because it has access to the same tools a developer has.
  • “Add pagination to the API endpoint and update the frontend to use it.” The agent modifies the backend, updates the frontend, runs integration tests, and iterates until everything works together. Multi-file, multi-layer changes coordinated by a single intent.

The CLI form factor matters here. Terminal-based agents like Claude Code, Copilot CLI, and Aider inherit the composability of Unix: they can be piped, scripted, run in CI, and chained with other tools. They operate on real projects with real build systems and real test suites. They are not sandboxed demonstrations – they are production tools operating in production environments.

How to Think About This

A practical mental model for choosing the right tier:

You’re building…                                 Use…                    Why
An app that uses AI as a feature                 API SDK                 You need precise control over prompts, parameters, and error handling
A system where multiple AI agents collaborate    Multi-agent framework   You need orchestration, state management, and coordination logic
Nothing – you want AI to build it for you        Coding agent            You describe the outcome; the agent does the engineering

The first two tiers are for developers building with AI. The third tier is for developers working alongside AI. The distinction is subtle but important: in tiers 1 and 2, you’re the architect and the AI is a tool. In tier 3, the AI is a peer – sometimes a junior peer that needs guidance, sometimes a remarkably capable one that handles complex refactors while you focus on design decisions.

We are early in this shift. The METR study reminds us that the productivity gains are not universal, and the tooling is still maturing. But the trajectory is unmistakable. Every major platform is converging on the same bet: the most valuable AI developer tool is not an SDK you call or a framework you configure – it’s an agent that writes the code.
