
Model Context Protocol (MCP) Explained for Engineers: Protocol Design, Function Calling Trade-offs, and Building Your First Server


MCP has hit 97 million monthly SDK downloads as of March 2026, up from 2 million at launch 16 months ago — a 4,750% increase. The protocol now has 9,652 server entries in its official registry, 300+ client integrations, and 41% enterprise production adoption per the Stacklok 2026 software report. That's not hype anymore. That's adoption.

What's missing from most of the coverage is the engineering reality underneath it. How does the protocol actually work? What's on the wire? When does it make more sense than just writing a function and passing it to the model directly? And what do you need to know before you ship a server into production?

This guide skips the pitch. It covers the wire format, the three core primitives, the honest comparison with function calling, and a working server you can run in 20 minutes.

Key Takeaways

  • MCP is a JSON-RPC 2.0 protocol over stdio, SSE, or Streamable HTTP — not an abstraction layer or agent framework. (Official MCP Spec, 2025)
  • In 2026, 41% of organizations have MCP in production; the SDK crossed 97M monthly downloads in March 2026. (Stacklok Survey, 2026; Digital Applied / Anthropic, 2026)
  • The best model on the MCPMark benchmark achieves only 52.56% pass@1 across 127 real tasks — MCP doesn't solve reliability by itself. (MCPMark, arXiv, September 2025)
  • Use function calling when latency is critical or you own the full single-model stack. Use MCP when multiple clients or models need the same tools.

What Is MCP? (Not the Marketing Version)

MCP is a JSON-RPC 2.0 protocol that standardizes how AI models communicate with external tools and data sources. In November 2024, Anthropic published the open specification; by late 2025, they transferred governance to the Linux Foundation's Agentic AI Foundation (AAIF) — now co-stewarded by Anthropic, Block, and OpenAI, with Google, Microsoft, and AWS as supporters. The core innovation isn't the capability: it's the standardization.

Before MCP, every AI application had to build its own tool-wiring layer. Claude had a different interface than GPT-4. Cursor's plugin system didn't work with Windsurf. You built integrations once per model, per application. MCP collapses that into a single interface any conforming client can consume.

The wire format

Everything in MCP is JSON-RPC 2.0. A tool call looks like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_top_stories",
    "arguments": { "limit": 5 }
  }
}

And the response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      { "type": "text", "text": "[{\"title\": \"...\", \"score\": 312 }]" }
    ],
    "isError": false
  }
}

There's nothing magic here. It's a message-passing protocol with a defined method namespace. The complexity isn't in the wire format — it's in the lifecycle management and capability negotiation on top of it.
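To make the framing concrete, here is a minimal Python sketch that builds the request above and unpacks a response. The helper names (make_tool_call, extract_text) are illustrative and not part of any SDK. It also shows the two places a failure can surface: a JSON-RPC "error" object, or "isError" inside an otherwise successful result.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def make_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def extract_text(response: dict) -> list[str]:
    """Return the text blocks from a tools/call result, raising on failure."""
    if "error" in response:  # protocol-level failure (unknown method, bad params)
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    if result.get("isError"):  # tool-level failure reported inside the result
        raise RuntimeError("tool reported an error")
    return [c["text"] for c in result["content"] if c.get("type") == "text"]


request = make_tool_call("get_top_stories", {"limit": 5})
wire = json.dumps(request)  # this string is what crosses the transport

# A hand-written response mirroring the example above
response = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {
        "content": [{"type": "text", "text": "[{\"title\": \"...\", \"score\": 312}]"}],
        "isError": False,
    },
}
texts = extract_text(response)
```

Note that the tool's payload arrives as a JSON string inside a text block, so the caller parses it a second time.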

Transport options

MCP supports three transports:

  • stdio: The client spawns a server subprocess; communication happens over stdin/stdout. The default for local desktop integrations (Claude Desktop, Cursor, VS Code). Zero network setup required.
  • SSE (Server-Sent Events): An HTTP-based transport that's now deprecated in the 2025 spec update. Still common in the wild — you'll encounter it — but don't start new projects on it.
  • Streamable HTTP: The new default for remote and cloud-deployed servers. Uses standard HTTP POST for client-to-server messages, with SSE or response body for server-to-client streaming. This is the transport to use for anything you're deploying.

For local development, use stdio. For any cloud deployment, use Streamable HTTP.
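For a feel of what stdio transport means in practice, here is a rough sketch of newline-delimited JSON framing, which is how the spec describes stdio messages as far as I can tell. The function names are made up for illustration, and an in-memory buffer stands in for the subprocess pipes.

```python
import io
import json


def write_message(stream, message: dict) -> None:
    """Serialize one JSON-RPC message as a single line."""
    # json.dumps escapes newlines inside strings, so each message stays on one line
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_messages(stream):
    """Yield one parsed message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


pipe = io.StringIO()  # stands in for the server subprocess's stdin/stdout
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {"jsonrpc": "2.0", "method": "notifications/initialized"})
pipe.seek(0)
received = list(read_messages(pipe))
```

One practical consequence: anything else a stdio server prints to stdout corrupts the message stream, so diagnostics belong on stderr.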

The three primitives

Every MCP server exposes some combination of three primitives:

Primitive | Who controls it | What it does
Tools | Model-controlled | Callable functions the model invokes by name, analogous to function calling.
Resources | App-controlled | Read-only data sources the host application attaches to context. Think files, database records, live API responses.
Prompts | User-controlled | Reusable prompt templates that users can activate through the client UI.

Most engineers only need Tools in their first server. Resources and Prompts matter more once you're building multi-server or multi-client architectures where the host app needs to control what context the model sees.
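On the wire, each primitive corresponds to a small group of JSON-RPC methods. The sketch below lists the method names as I understand them from the spec; the lookup helper is purely illustrative.

```python
# Method names per primitive (the dict layout itself is just for illustration)
PRIMITIVE_METHODS = {
    "tools": ("tools/list", "tools/call"),
    "resources": ("resources/list", "resources/read"),
    "prompts": ("prompts/list", "prompts/get"),
}


def primitive_for(method: str) -> str:
    """Map a JSON-RPC method name to the primitive it belongs to."""
    for primitive, methods in PRIMITIVE_METHODS.items():
        if method in methods:
            return primitive
    raise ValueError(f"not a primitive method: {method}")


owner = primitive_for("resources/read")
```

Each primitive follows the same pattern: one method to discover what exists, one to use it.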

The initialization handshake

When a client connects to an MCP server, they negotiate capabilities:

  1. Client sends initialize with its protocol version and supported capabilities.
  2. Server responds with its own protocol version and what it supports.
  3. Client sends notifications/initialized (a notification, so no response is expected).
  4. Session is now active.

This handshake is why you can't fire a tools/call at a cold server without connecting first. It's also how the protocol stays backward-compatible as versions evolve: each side declares what it supports, and both sides operate within the intersection.
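The handshake can be sketched as three messages plus a version check. This is a simplified illustration, not SDK code: the version strings are real spec revisions, but the client name, the empty capability objects, and the fallback rule (server offers its newest version when it doesn't recognize the client's) are simplifying assumptions.

```python
SUPPORTED_VERSIONS = ["2025-06-18", "2025-03-26", "2024-11-05"]  # client side, newest first


def initialize_request(request_id: int) -> dict:
    """Step 1: the client announces its version and capabilities."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": SUPPORTED_VERSIONS[0],
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }


def initialize_response(request: dict, server_versions: list[str]) -> dict:
    """Step 2: the server answers with the version it will speak."""
    requested = request["params"]["protocolVersion"]
    # Echo the client's version if supported, otherwise offer the server's newest
    version = requested if requested in server_versions else server_versions[0]
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {
            "protocolVersion": version,
            "capabilities": {"tools": {}, "resources": {}},
            "serverInfo": {"name": "hackernews", "version": "0.1.0"},
        },
    }


def initialized_notification() -> dict:
    """Step 3: notifications carry no id, so no response is expected."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def complete_handshake(server_versions: list[str]):
    request = initialize_request(1)
    response = initialize_response(request, server_versions)
    agreed = response["result"]["protocolVersion"]
    if agreed not in SUPPORTED_VERSIONS:
        # The client cannot speak what the server offered, so it gives up
        raise ConnectionError(f"unsupported protocol version: {agreed}")
    return agreed, initialized_notification()


agreed_version, notification = complete_handshake(["2025-03-26", "2024-11-05"])
```

Here the client asks for 2025-06-18, the server only knows older revisions and offers 2025-03-26, and the client accepts because it also supports that revision.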


According to the official MCP anniversary report (Anthropic, November 2025), the MCP registry grew 407% in a single quarter — from under 500 entries in September 2025 to nearly 2,000 by November 2025. As of May 2026, the registry holds 9,652 server entries across developer tools, business applications, web services, and AI automation categories. The GitHub mcp-server topic now has 15,926 repositories.

For a broader look at how tools fit into production AI architectures, see How to Build AI Agents That Don't Fall Apart in Production.


[Figure: MCP SDK monthly downloads (TypeScript + Python SDKs combined), November 2024 to March 2026, reaching 97M. Source: Anthropic / Digital Applied, March 2026]
MCP SDK monthly downloads grew 4,750% in 16 months — from 2M at launch to 97M by March 2026.

How Does MCP Compare to Function Calling?

Function calling isn't going away, and MCP doesn't replace it — they solve different problems. Here's the honest trade-off.

The MCPMark benchmark (arXiv, September 2025) tested models across 127 realistic tasks spanning 55 MCP servers. The best-performing model achieved only 52.56% pass@1. That's not a failure of the protocol; it reflects the inherent complexity of multi-step tool orchestration with real-world servers. Function calling in a tightly controlled single-model context typically outperforms general-purpose MCP in raw accuracy because the schema is optimized for exactly one model and one application.

The core architectural difference

Function calling is in-process. You define a JSON schema, pass it to the model API alongside your prompt, and the model returns a structured tool call you execute in your own runtime. The tool definition lives in your code. The execution happens in your process. There's no network hop, no subprocess, no handshake.
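A stripped-down sketch of that in-process loop follows. The schema shape and the simulated model output are generic stand-ins; each provider has its own exact format, and the stubbed tool returns fake data so the example runs offline.

```python
import json

# Tool definition you would pass to the model API alongside the prompt
# (generic shape; provider formats differ in detail)
TOOL_SCHEMA = {
    "name": "get_top_stories",
    "description": "Fetch the top HackerNews stories.",
    "parameters": {
        "type": "object",
        "properties": {"limit": {"type": "integer"}},
        "required": [],
    },
}


def get_top_stories(limit: int = 10) -> list[dict]:
    # Stubbed data so the sketch runs without network access
    return [{"title": f"Story {i}", "score": 100 - i} for i in range(limit)]


REGISTRY = {"get_top_stories": get_top_stories}


def execute_tool_call(tool_call: dict):
    """Look up the named function and run it in this process."""
    func = REGISTRY[tool_call["name"]]
    return func(**json.loads(tool_call["arguments"]))


# Simulated structured output from a model; no API call is made here
model_output = {"name": "get_top_stories", "arguments": "{\"limit\": 2}"}
stories = execute_tool_call(model_output)
```

Everything here lives in one process: the schema, the registry, and the execution. That is the source of both the low overhead and the tight coupling.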

MCP is out-of-process. You run a separate server — a different process, sometimes on a different machine. The client (your app, Claude Desktop, Cursor) connects over stdio or HTTP, negotiates capabilities, and the model requests tool calls through the protocol layer. That separation is exactly the point: the server can be shared across any client that speaks MCP.

Latency comparison

For local stdio servers, MCP protocol overhead is under 10 ms — negligible for most use cases. For remote Streamable HTTP servers, you're adding a network round-trip. Published gateway benchmarks from TrueFoundry and Bifrost show p99 latencies in the 3–11 ms range for well-optimized setups (Portkey Blog, March 2026). Compare that to function calling, which executes in-process with microsecond overhead.

The latency gap is real but rarely the deciding factor. If your tool is calling a remote API anyway — GitHub, Jira, a database — the MCP overhead is noise. Where it matters is streaming UX or high-frequency tool loops where every millisecond compounds.
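A quick back-of-the-envelope calculation shows why. The 10 ms overhead comes from the stdio figure above; the 300 ms remote API call and the 0.5 ms local lookup are assumed numbers chosen only to illustrate the two regimes.

```python
def overhead_share(protocol_ms: float, tool_ms: float) -> float:
    """Fraction of total call time spent on protocol overhead."""
    return protocol_ms / (protocol_ms + tool_ms)


# Assumed tool latencies, for illustration only
remote_api = overhead_share(protocol_ms=10, tool_ms=300)    # tool calls a remote API
local_lookup = overhead_share(protocol_ms=10, tool_ms=0.5)  # tool is a fast local lookup

print(f"remote API tool: {remote_api:.1%} of the call is protocol overhead")
print(f"local lookup tool: {local_lookup:.1%} of the call is protocol overhead")
```

Under these assumptions the overhead is roughly 3% of a remote API call but about 95% of a fast local lookup, which is the case where in-process function calling pays off.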

Side-by-side comparison

Aspect | Function Calling | MCP
Overhead | ~0 ms (in-process) | <10 ms stdio / 3–50 ms HTTP
Reusability | One model, one app | Any MCP-compatible client or model
Schema ownership | Defined in your code | Server declares its own capabilities
Versioning | Manual, per-deployment | Protocol-level capability negotiation
Security boundary | Same process | Separate process or service
Ecosystem reach | Proprietary to model provider | Open, cross-provider standard
Operational complexity | Low: no server to run | Higher: server process to manage
Best for | Single-model pipelines | Multi-client or multi-model architectures

The per-model MCPMark results (arXiv, September 2025) show how wide the gap is: the best-performing model (GPT-5-medium) reaches 52.56% pass@1, with Claude Sonnet 4 at 28.1% and o3 at 25.4%. These numbers reflect the difficulty of real-world multi-step tool orchestration, not protocol limitations. Build fallback handling and human-in-the-loop checkpoints into any MCP pipeline where accuracy matters.
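One way such fallback handling might look is sketched below: retry the primary tool, try a fallback, and escalate to a human when both fail. The names (call_with_fallback, NeedsHumanReview) and the retry policy are assumptions for illustration, not a pattern prescribed by MCP.

```python
import asyncio


class NeedsHumanReview(Exception):
    """Raised when automated attempts are exhausted and a person should decide."""


async def call_with_fallback(primary, fallback=None, retries=2, delay=0.0):
    """Retry the primary coroutine, then the fallback, then escalate."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await primary()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < retries:
                await asyncio.sleep(delay * (2 ** attempt))  # exponential backoff
    if fallback is not None:
        try:
            return await fallback()
        except Exception as exc:
            last_error = exc
    raise NeedsHumanReview(str(last_error))


attempts = {"n": 0}


async def flaky_tool():
    # Simulated tool that fails twice before succeeding
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("tool timed out")
    return "ok"


result = asyncio.run(call_with_fallback(flaky_tool))
```

The escalation exception is the human-in-the-loop checkpoint: the caller catches it and queues the task for review instead of letting the model guess.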

For a deeper look at production reliability and fallback design for AI pipelines, see How to Build AI Agents That Don't Fall Apart in Production.


[Figure: MCPMark benchmark pass@1 by model, 127 tasks across 55 real MCP servers. GPT-5-medium 52.56%, Grok-4 31.69%, Claude Opus 4.1 29.92%, Claude Sonnet 4 28.1%, o3 25.4%, Qwen3-Coder-Plus 24.80%. Source: MCPMark, arXiv:2509.24002, September 2025]
Even the best model achieves only 52.56% pass@1 on realistic MCP tasks — production pipelines need fallback handling.

When Should You Choose MCP Over Function Calling?

This is the question that actually matters in practice. Neither option is universally better — the right answer depends on three variables: who's calling the tools, how many clients exist, and whether you need cross-provider compatibility.

Choose MCP when:

  1. Multiple clients need the same tools. If you're building a GitHub integration and you want Cursor, Claude Desktop, and your custom chatbot to share the same implementation, build it once as an MCP server. Every conforming client gets it for free.

  2. Multiple models need access. If your architecture includes Claude for reasoning and GPT-4o for generation — or any mix — MCP gives you a single server that any conforming client can call. Function calling locks you to per-provider schemas.

  3. Tools should be independently deployable. Separating tools into their own processes gives you independent deployability, separate observability, and clean security boundaries. Especially relevant when tools have elevated permissions: database writes, code execution, file system access.

  4. You want ecosystem interoperability. With 300+ MCP clients and 9,600+ servers in the registry, the ecosystem is real. Building an MCP server means your tool works with Windsurf, Zed, Continue, Sourcegraph Cody, and whatever ships next year — without changes.

Choose function calling when:

  1. Latency is critical. Sub-millisecond tool execution matters for streaming UX or high-volume pipelines. In-process function calls win here, and the gap is non-trivial.

  2. You control the full stack. Single model, single application, tight coupling between the tool schema and your prompt engineering — function calling is simpler and has fewer operational surfaces to break.

  3. You're iterating fast. MCP adds operational complexity: you need to run and manage a server process. For early-stage development where the schema changes weekly, function calling lets you iterate without the overhead.

  4. You don't need cross-provider portability. If you're committed to one model provider and one client, the portability benefits of MCP don't apply — you're paying the operational cost for a benefit you don't use.

The most common mistake engineers make is choosing MCP because it sounds more "production-grade," then spending two sprints debugging stdio transport edge cases and capability negotiation failures — for a tool that one app calls with one model. The protocol complexity is worth it when you actually need the portability. It isn't when you don't.

For a broader look at how LLM tool choices fit into a modern development workflow, see The Modern AI-Assisted Dev Workflow.


How to Build Your First MCP Server

Now the practical part. You'll build a real MCP server that fetches top HackerNews stories — with a Tool (callable by the model) and a Resource (a status data source for the host app). You'll be able to connect it to Claude Desktop or test it with the MCP Inspector.

You'll need:

  • Python 3.10+ (python.org)
  • pip or uv for package management
  • ~20 minutes
  • Claude Desktop (optional, for live testing)

The server uses FastMCP, the dominant Python framework for MCP servers. It's downloaded approximately 1 million times per day and powers around 70% of Python MCP server deployments. It handles JSON-RPC framing, capability negotiation, and transport selection — you write tool handlers, not protocol code.

Step 1: Set Up Your Environment

# Create a project directory
mkdir hn-mcp-server && cd hn-mcp-server

# Install dependencies
pip install fastmcp httpx

# Or with uv (recommended for faster resolution); uv add needs a project, so create one first
uv init
uv add fastmcp httpx

Verify the install:

fastmcp version
# FastMCP 2.x.x

Step 2: Define Your Tools and Resources

Create server.py:

from fastmcp import FastMCP
import httpx

mcp = FastMCP("hackernews")

HN_API = "https://hacker-news.firebaseio.com/v0"

@mcp.tool()
async def get_top_stories(limit: int = 10) -> list[dict]:
    """Fetch the top HackerNews stories with titles, scores, and URLs."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        ids_resp = await client.get(f"{HN_API}/topstories.json")
        ids_resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        ids = ids_resp.json()[:limit]

        stories = []
        for story_id in ids:
            resp = await client.get(f"{HN_API}/item/{story_id}.json")
            resp.raise_for_status()
            item = resp.json()
            if not item:  # the API returns null for missing items
                continue
            stories.append({
                "id": item.get("id"),
                "title": item.get("title"),
                # Text posts have no external URL, so link to the discussion page
                "url": item.get("url", f"https://news.ycombinator.com/item?id={story_id}"),
                "score": item.get("score"),
                "by": item.get("by"),
            })
        return stories

@mcp.tool()
async def get_story_comments(story_id: int, limit: int = 5) -> list[dict]:
    """Fetch the top comments for a HackerNews story by its ID."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        story_resp = await client.get(f"{HN_API}/item/{story_id}.json")
        story_resp.raise_for_status()
        story = story_resp.json() or {}  # null when the story ID doesn't exist
        comment_ids = story.get("kids", [])[:limit]

        comments = []
        for cid in comment_ids:
            resp = await client.get(f"{HN_API}/item/{cid}.json")
            resp.raise_for_status()
            item = resp.json()
            if not item or item.get("deleted"):  # skip missing or deleted comments
                continue
            comments.append({
                "by": item.get("by"),
                "text": item.get("text", ""),  # comment body as returned by the API
            })
        return comments

@mcp.resource("hn://status")
async def get_status() -> str:
    """Current HackerNews API connectivity status and available tools."""
    return (
        "HackerNews API is operational. "
        "Use get_top_stories(limit=N) to fetch recent posts. "
        "Use get_story_comments(story_id=ID, limit=N) to fetch comments."
    )

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport

What's happening: @mcp.tool() registers a function as a callable Tool — the model decides when to call it. @mcp.resource() registers a URI-addressable data source that the host app can attach to context independent of the model. FastMCP infers the JSON schema from Python type hints automatically — you don't write schema definitions by hand.
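To demystify the schema inference, here is a deliberately simplified sketch of deriving a JSON Schema from a function signature. This is not FastMCP's actual implementation, which handles far more types; sketch_input_schema and the small type map are hypothetical.

```python
import inspect
from typing import get_type_hints

# Only the handful of types this sketch understands
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def sketch_input_schema(func) -> dict:
    """Derive a minimal JSON Schema for a function's parameters."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        prop = {"type": JSON_TYPES.get(hints.get(name), "object")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


async def get_story_comments(story_id: int, limit: int = 5) -> list[dict]:
    """Fetch the top comments for a HackerNews story by its ID."""
    return []  # body omitted; only the signature matters here


schema = sketch_input_schema(get_story_comments)
```

The takeaway is that precise type hints and defaults do real work: they become the contract the model sees.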

Watch out: Don't let raw exceptions propagate from tool handlers. FastMCP will catch them and return error responses, but your error messages will leak implementation details. Wrap external API calls in explicit try/except and return structured error information instead.
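One possible shape for that wrapping is a small decorator that logs the full exception server-side and hands the client only a generic message. The decorator name safe_tool and the demo function are hypothetical, and the sketch is shown standalone rather than combined with @mcp.tool(), since how the two compose depends on your FastMCP version.

```python
import asyncio
import functools
import logging
import sys

# Log to stderr: with the stdio transport, stdout is reserved for protocol messages
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("hn-mcp")


def safe_tool(public_message: str):
    """Catch handler exceptions and return a structured, non-leaky error."""
    def decorator(func):
        @functools.wraps(func)  # keep the original name, docstring, and signature
        async def wrapper(*args, **kwargs):
            try:
                return await func(*args, **kwargs)
            except Exception:
                # Full traceback stays in the server log
                log.exception("tool %s failed", func.__name__)
                return {"error": public_message}
        return wrapper
    return decorator


@safe_tool("HackerNews API unavailable")
async def fetch_stories_demo(limit: int = 10):
    # Simulated failure whose message contains internal details
    raise ConnectionError("connect to 10.0.0.12:443 failed")


result = asyncio.run(fetch_stories_demo(limit=3))
```

The client sees only the public message, while the internal address in the exception text never leaves the server.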

Step 3: Test With the MCP Inspector

Before connecting any real client, test locally:

fastmcp dev server.py

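This opens the MCP Inspector, an interactive browser UI, at the local URL printed in the terminal (http://localhost:5173 on older Inspector releases; newer ones default to a different port). You can call your tools, inspect the raw JSON-RPC request/response pairs, and verify schema generation without touching Claude Desktop or any other client.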

Startup output looks roughly like this (exact lines vary by version):

Starting MCP Inspector...
Server: hackernews
Tools: get_top_stories, get_story_comments
Resources: hn://status
Transport: stdio
Inspector available at http://localhost:5173


Step 4: Connect to Claude Desktop

To wire your server into Claude Desktop on macOS, add it to the MCP config file at ~/Library/Application Support/Claude/claude_desktop_config.json (JSON doesn't allow comments, so keep the file to the object below):

{
  "mcpServers": {
    "hackernews": {
      "command": "python",
      "args": ["/absolute/path/to/hn-mcp-server/server.py"],
      "env": {}
    }
  }
}

Use an absolute path in args — relative paths break when Claude Desktop launches the subprocess from a different working directory. Restart Claude Desktop. The MCP icon (⚡) appears in the input area. Click it to verify your server is connected and its tools are listed.
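Since JSON syntax errors and relative paths are the usual culprits, a small pre-flight check can save a restart cycle. The sketch below is a hypothetical helper, not a Claude Desktop tool; it only knows about the mcpServers, command, and args keys shown above, and its "ends with .py" heuristic for spotting script paths is an assumption.

```python
import json
import os


def check_mcp_config(text: str) -> list[str]:
    """Return a list of problems found in a claude_desktop_config.json body."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    servers = config.get("mcpServers", {})
    if not servers:
        problems.append("no entries under mcpServers")
    for name, entry in servers.items():
        if "command" not in entry:
            problems.append(f"{name}: missing command")
        for arg in entry.get("args", []):
            # Heuristic: only arguments that look like script paths are checked
            if arg.endswith(".py") and not os.path.isabs(arg):
                problems.append(f"{name}: relative path in args: {arg}")
    return problems


good = '{"mcpServers": {"hackernews": {"command": "python", "args": ["/absolute/path/to/hn-mcp-server/server.py"]}}}'
bad = '{"mcpServers": {"hackernews": {"command": "python", "args": ["server.py"]}}}'

good_report = check_mcp_config(good)
bad_report = check_mcp_config(bad)
```

To use it on a real file, read the config with pathlib.Path(...).read_text() and pass the result in.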

Step 5: Deploy as a Remote Server

For cloud deployment, switch the transport:

# Change the entry point in server.py:
if __name__ == "__main__":
    mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)

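Or expose the server as an ASGI app and serve it with uvicorn for production. The FastMCP instance itself is not an ASGI app, so point uvicorn at the app it builds:

# Add to server.py (http_app() is available in recent FastMCP 2.x releases)
app = mcp.http_app()

# Then, from the shell
pip install uvicorn
uvicorn server:app --host 0.0.0.0 --port 8000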

Remote clients connect at http://your-host:8000/mcp. Add authentication before exposing this publicly — see the Production section below.
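For a sense of what a remote client sends, the sketch below builds (but does not send) the HTTP POST for an initialize call. The header set reflects my reading of the Streamable HTTP transport; build_mcp_post, the placeholder host, and the example client name are illustrative.

```python
import json
import urllib.request


def build_mcp_post(url: str, message: dict, token=None, session_id=None):
    """Build the POST request for one JSON-RPC message without sending it."""
    headers = {
        "Content-Type": "application/json",
        # The server may answer with plain JSON or with an SSE stream
        "Accept": "application/json, text/event-stream",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    if session_id:
        # Echoed on later requests when the server issued a session id
        headers["Mcp-Session-Id"] = session_id
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


init_message = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}
request = build_mcp_post("http://your-host:8000/mcp", init_message, token="example-token")
```

In practice you would let an MCP client library handle this; the point is that the transport is ordinary HTTP that curl or any proxy can inspect.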


[Figure: Enterprise MCP adoption stage, 2026. Broad production 12%, limited production 29%, pilot 30%, planning 29%. Source: Stacklok 2026 Software Survey]
41% of organizations surveyed have MCP in production as of 2026 — 12% at broad scale and 29% in limited deployments. (Source: Stacklok 2026 Software Report)

What You Need to Know Before Going to Production

The production problems with MCP servers are rarely about the protocol itself. They're about the same things that bite you with any external process: authentication, error handling, observability, and security. Get these right early — retrofitting them after launch is painful.

Authentication

The MCP spec defines an OAuth 2.1 flow with PKCE for remote servers. Local stdio servers run with the permissions of the parent process (the AI client), so you typically don't need auth there. For any HTTP-exposed server:

  • Add bearer token validation at the transport level before requests reach your tool handlers.
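  • The Authorization: Bearer <token> header arrives with every HTTP request. FastMCP exposes request headers to handlers through its request context; the accessor has changed across versions, so confirm it in the docs for your release rather than assuming mcp.get_context().request.headers.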
  • Don't roll your own auth scheme. Implement the spec's OAuth 2.1 flow or use a gateway like Portkey or Zuplo that enforces auth at the HTTP layer.
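As a picture of what "validate at the transport level" means, here is a minimal sketch of parsing and checking a bearer header. The static shared-secret comparison is a stand-in for illustration only; a real deployment would validate tokens issued through the OAuth flow or delegate the check to a gateway. Both function names are hypothetical.

```python
import hmac


def extract_bearer_token(authorization):
    """Pull the token out of an Authorization header value, or return None."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(authorization, expected_token: str) -> bool:
    """Check a bearer header against an expected token (stand-in for real validation)."""
    token = extract_bearer_token(authorization)
    if token is None:
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(token.encode(), expected_token.encode())


allowed = is_authorized("Bearer s3cret", "s3cret")
denied = is_authorized("Basic s3cret", "s3cret")
```

Rejecting the request here, before any tool handler runs, keeps authorization logic out of individual tools.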

Security

In 2026, 43% of sampled MCP servers contain command injection vulnerabilities, and CVE-2025-6514 exposed over 437,000 environments (Portkey Blog, citing security research, March 2026). The attack surface is concrete:

  1. Tool argument injection: Validate and sanitize all arguments before passing them to system commands or external APIs. Never pass user-supplied strings directly to subprocess.run or exec.
  2. Prompt injection via tool output: A malicious external API can return a string like "Ignore previous instructions and..." — the model reads that as context and may act on it. Add output sanitization or content filtering before returning tool results to the client.
  3. Capability over-exposure: Don't expose more tools than each client actually needs. The MCP spec allows per-client capability filtering — use it.
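The first item can be sketched as an allowlist check plus argument-list invocation. The character allowlist, the length limit, and the helper names are assumptions for illustration; the right rules depend on what the argument represents. The child process here is just the Python interpreter echoing its argument.

```python
import re
import subprocess
import sys

# Conservative allowlist: letters, digits, dot, underscore, hyphen
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]{1,64}")


def validate_name(value: str) -> str:
    """Reject anything outside the allowlist, including option-like values."""
    if not SAFE_NAME.fullmatch(value) or value.startswith("-"):
        raise ValueError(f"rejected argument: {value!r}")
    return value


def run_echo(value: str) -> str:
    """Pass a validated argument as a list element, never through a shell."""
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", validate_name(value)]
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=10, check=True)
    return completed.stdout.strip()


echoed = run_echo("fastmcp")
```

Two layers matter: validation rejects hostile input early, and the list form of subprocess.run means that even an unexpected value is never interpreted by a shell.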

Observability

MCP servers are separate processes — their logs don't appear in your main application logs. Wire up structured logging from the start:

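import sys

import httpx
import structlog  # pip install structlog

# With the stdio transport, stdout carries the JSON-RPC stream, so send logs to stderr
structlog.configure(logger_factory=structlog.PrintLoggerFactory(file=sys.stderr))
logger = structlog.get_logger()

@mcp.tool()
async def get_top_stories(limit: int = 10) -> list[dict]:
    logger.info("tool_called", tool="get_top_stories", limit=limit)
    try:
        # ... tool logic that builds `stories` ...
        logger.info("tool_succeeded", tool="get_top_stories", result_count=len(stories))
        return stories
    except httpx.HTTPError as e:
        # Keep the detailed error in the server log; return only a generic message
        logger.error("tool_failed", tool="get_top_stories", error=str(e))
        return [{"error": "HackerNews API unavailable"}]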

Use correlation IDs to tie model requests to tool calls across process boundaries. FastMCP's request context carries session and request identifiers (how you obtain the context depends on your FastMCP version); include one in every log line.
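A standard-library sketch of the idea: store the ID in a context variable and let a logging filter stamp it on every record. The logger name, the "cid" field, and the fallback to a random ID are illustrative choices; in a real server the ID would come from the MCP session or request context.

```python
import contextvars
import io
import logging
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s cid=%(correlation_id)s %(message)s"))
    handler.addFilter(CorrelationFilter())
    logger = logging.getLogger("hn-mcp.tools")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_tool_call(logger, name: str, session_id=None) -> None:
    # Use the session ID when available, otherwise generate one per call
    token = correlation_id.set(session_id or uuid.uuid4().hex)
    try:
        logger.info("tool_called tool=%s", name)
        logger.info("tool_succeeded tool=%s", name)
    finally:
        correlation_id.reset(token)  # never leak an ID into the next call


buffer = io.StringIO()  # use sys.stderr in a real stdio server
handle_tool_call(build_logger(buffer), "get_top_stories", session_id="sess-42")
lines = buffer.getvalue().splitlines()
```

Because context variables are task-local under asyncio, concurrent tool calls each keep their own ID without passing it through every function.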


Troubleshooting

Here are the five most common problems and their fixes:

Problem | Symptom | Solution
Server not in Claude Desktop | No ⚡ icon after restart | Check claude_desktop_config.json for JSON syntax errors; ensure args uses absolute paths.
initialize never completes | Client hangs on connection | Check stderr for startup errors; verify the Python executable path is correct for your env.
Tool returns empty content | Model says tool returned nothing | Tool returned None, which FastMCP serializes as empty content. Return a string with a message when there's nothing to return.
Schema mismatch error | Client rejects the tool call | Python type hints generated invalid JSON Schema. Avoid bare Any; use explicit Optional[str] instead of str | None for older clients.
Security scanner alert | CVE-2025-6514 flagged | The CVE is in the mcp-remote proxy, not FastMCP: update mcp-remote to 0.1.16 or later if it is part of your setup. Separately, audit tools that call subprocess or exec, and validate all external inputs.

Frequently Asked Questions

What's the difference between MCP tools and function calling?

Function calling is in-process: you define a schema in your application code, the model returns a structured call, and you execute it in your runtime. MCP tools run in a separate server process — any conforming client or model can invoke them without you changing your server. Use function calling for tight single-model integrations; use MCP when multiple clients or models need the same tools.

Can I use MCP with OpenAI or other non-Anthropic models?

Yes. OpenAI added native MCP support in 2025, and the protocol is intentionally model-agnostic. Over 300 clients support MCP as of 2026 — including GitHub Copilot in VS Code, Cursor, Gemini, Windsurf, Zed, and most major AI development tools. A server you build today works with any of them.

Is MCP production-ready in 2026?

In 2026, 41% of surveyed organizations have MCP in active production — 29% in limited deployment and 12% at broad scale (Stacklok 2026 Software Report). The protocol spec is stable, Linux Foundation governance is in place, and FastMCP is mature. The real production risks are operational — auth, security, observability — not protocol instability.

How do I handle authentication for a public MCP server?

Use the OAuth 2.1 + PKCE flow defined in the MCP spec for remote Streamable HTTP servers. For a simpler path, a gateway like Portkey or Zuplo can enforce auth at the HTTP layer, keeping your server code auth-agnostic. Never expose an unauthenticated HTTP MCP server publicly; CVE-2025-6514 (a command injection flaw in the mcp-remote proxy) showed how quickly weaknesses around remote MCP connections become real exposure.

What's the best way to monitor MCP server performance in production?

Treat your MCP server like any other microservice: structured logging with correlation IDs, distributed tracing, and explicit health check endpoints. FastMCP's lifespan context manager lets you instrument startup and shutdown cleanly. For production at scale, route requests through an observability-aware gateway to get request/response tracing without adding middleware to every tool handler.


Conclusion

MCP is a real protocol with real production adoption. At 97 million monthly SDK downloads and 41% enterprise deployment, the question isn't whether to pay attention to it — it's whether it fits your specific architecture.

If you're building something that needs to be called by multiple clients or models, MCP gives you a standard interface that already works with 300+ hosts. If you're optimizing a single-model pipeline where every millisecond of latency counts, function calling is still the right call. Neither is universal.

The server you built above is production-capable with auth and logging added — that's 30 minutes of additional work. From here: harden the security posture, wire up structured observability, and explore the Resource primitive for exposing structured data sources the host app can attach to context without waiting for the model to request it.

