Model Context Protocol in 2026: building production AI integrations on a real standard

Eighteen months ago, every AI integration was a one-off. If you wanted Claude to read a Notion page, you wrote a Notion-to-Claude connector. If you wanted the same agent to also write a Linear ticket, you wrote a second one. Move to a new model vendor and the whole library went in the bin. In 2026, that work is gone. Model Context Protocol (MCP) ate it. Anthropic introduced MCP in November 2024. Today Anthropic, OpenAI, Google, and Microsoft all support it on the client side, the protocol is governed by the Linux Foundation, and the official registry is approaching ten thousand public servers. This is how we build on it, what the spec actually contains, and where it still bites.

The N×M problem MCP actually solves

Before MCP, every AI host (Claude, ChatGPT, Cursor, a bespoke agent) needed its own custom integration for every external system. If you had N hosts and M tools, you needed N×M custom connectors. Each had its own auth flow, schema, error-handling story, and lifecycle. None of them were reusable across vendors. That maths is why the early agent ecosystem looked the way it did: every product reinvented the wheel, and changing models meant rebuilding the wheel.

MCP collapses N×M to N+M. You write one MCP server for your system, and every MCP-compatible host can use it. You build one MCP client inside your host, and every MCP server is suddenly within reach. It is the same trick TCP/IP, USB, and LSP pulled in their own categories: define the protocol once, let the ecosystem compound around it.
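To make the arithmetic concrete, here is a tiny sketch. The host and system counts are made-up numbers for a hypothetical estate, not figures from any survey.

```typescript
// Connectors needed when every host integrates with every system directly.
function pointToPointConnectors(hosts: number, systems: number): number {
  return hosts * systems;
}

// With a shared protocol: one client per host plus one server per system.
function sharedProtocolComponents(hosts: number, systems: number): number {
  return hosts + systems;
}

// Hypothetical estate: 4 hosts, 25 internal systems.
const connectorsBefore = pointToPointConnectors(4, 25); // 100
const componentsAfter = sharedProtocolComponents(4, 25); // 29
```

Adding a fifth host costs 25 more connectors in the first model and one more client in the second.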

The signal in the numbers is hard to argue with. Per Anthropic’s December 2025 ecosystem update, the Python and TypeScript SDKs alone see roughly 97 million monthly downloads, and there are more than ten thousand active public MCP servers. Stacklok’s State of MCP in Software 2026 survey of senior software leaders found 41% of respondents already in some form of production with MCP. For a protocol that did not exist two years ago, that is one of the steepest adoption curves we have seen in agent infrastructure.

The three roles, and why the boundary matters

MCP defines three roles. Understand them as security and lifecycle boundaries, not as code modules.

Host. The application the user actually interacts with. Claude Desktop, Cursor, ChatGPT, VS Code, or your own agent. The host owns user consent, the model, and the overall task.
Client. Lives inside the host and manages one connection to one server. A host typically runs many clients in parallel, one per server. The client is the place where session state, transport, and the protocol handshake live.
Server. Exposes capabilities through the protocol. A server wraps an existing system (an API, a database, a SaaS platform) and presents it as a set of MCP primitives. Servers are where most of the engineering work goes.

A single user request can fan out across many servers. “Summarize the Q3 thread in Slack and open a Linear ticket for the top action” routes through both a Slack server and a Linear server, with the host composing results. That composition only works because every server speaks the same protocol; the host does not need bespoke logic for either.
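The fan-out can be sketched in a few lines. Everything here is hypothetical: the `Host` class, the `ServerClient` shape, and the `server.tool` naming scheme are illustrations, not the SDK's API, and real calls are asynchronous where this sketch is synchronous for brevity.

```typescript
// Hypothetical shapes for illustration; the real SDK client has a richer API.
interface ToolResult {
  text: string;
  isError?: boolean;
}

interface ServerClient {
  serverName: string;
  callTool(name: string, args: Record<string, unknown>): ToolResult;
}

class Host {
  private clients = new Map<string, ServerClient>();

  // One client per connected server.
  connect(client: ServerClient): void {
    this.clients.set(client.serverName, client);
  }

  // Route a namespaced call such as "slack.summarize_thread" to the right client.
  call(qualifiedName: string, args: Record<string, unknown>): ToolResult {
    const dot = qualifiedName.indexOf(".");
    const client = dot < 0 ? undefined : this.clients.get(qualifiedName.slice(0, dot));
    if (!client) {
      return { text: `No connected server for ${qualifiedName}`, isError: true };
    }
    return client.callTool(qualifiedName.slice(dot + 1), args);
  }
}

// Stub servers standing in for real Slack and Linear connections.
const host = new Host();
host.connect({ serverName: "slack", callTool: (name) => ({ text: `slack ran ${name}` }) });
host.connect({ serverName: "linear", callTool: (name) => ({ text: `linear ran ${name}` }) });

const summary = host.call("slack.summarize_thread", { channel: "q3" });
const ticket = host.call("linear.create_issue", { title: summary.text });
```

The host code contains no Slack-specific or Linear-specific logic; adding a third server is another `connect` call.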

Three primitives: tools, resources, prompts

Every MCP server exposes its capabilities through the same three primitives. Choose the right one for each capability or your server will fight the protocol.

Tools. Actions the model can invoke. Send a message, create a record, run a query, trigger a deployment. Tools are the write side of MCP. The host can require user approval per call.
Resources. Data the model or user can read. Files, rows, API responses, documents. Resources are the read side, and tend to be the safest surface to expose first.
Prompts. Reusable templates that standardize how the model engages with a domain. A code-review prompt, an incident summary prompt, a sales-call recap prompt. Prompts are the part of MCP most teams underuse on day one and miss most by day ninety.

MCP is not a replacement for your APIs. Your APIs still do the work. MCP is a uniform discovery and invocation layer that sits on top, so a model can find your capabilities without bespoke code on the host side. A common rule of thumb on our engagements: expose reads as resources, writes as tools, and your house style as prompts. Mixing those up is the most frequent design mistake we see in community-built servers.
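The rule of thumb is easy to encode as a design-review helper. The method names in the table are the JSON-RPC methods the spec defines for each primitive; the `Capability` shape and `choosePrimitive` function are our own hypothetical sketch of the heuristic, not part of the protocol.

```typescript
type Primitive = "tool" | "resource" | "prompt";

// JSON-RPC methods used to discover and use each primitive.
const methods: Record<Primitive, { list: string; use: string }> = {
  tool: { list: "tools/list", use: "tools/call" },
  resource: { list: "resources/list", use: "resources/read" },
  prompt: { list: "prompts/list", use: "prompts/get" },
};

// Hypothetical description of a capability you are about to expose.
interface Capability {
  name: string;
  sideEffects: boolean; // does calling it change state in the backing system?
  isTemplate: boolean; // is it reusable instruction text rather than data or action?
}

// Reads become resources, writes become tools, house style becomes prompts.
function choosePrimitive(c: Capability): Primitive {
  if (c.isTemplate) return "prompt";
  return c.sideEffects ? "tool" : "resource";
}

const refund = choosePrimitive({ name: "issue_refund", sideEffects: true, isTemplate: false });
const invoice = choosePrimitive({ name: "invoice_pdf", sideEffects: false, isTemplate: false });
const recap = choosePrimitive({ name: "sales_call_recap", sideEffects: false, isTemplate: true });
```

Treat this as a first pass only. Parameterized reads, such as an ad hoc SQL query, often end up as tools even though they change nothing.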

Transports: stdio and Streamable HTTP

The current spec defines two standard transports. The older HTTP+SSE transport from the 2024-11-05 version is deprecated, with backwards compatibility called out explicitly in the spec.

stdio. The host launches the server as a local subprocess and they exchange JSON-RPC messages over standard input/output. Zero network, zero auth, lowest possible setup cost. The right choice for desktop tools, local dev integrations, and anything the user runs on their own machine.
Streamable HTTP. The remote transport. Server-Sent Events for progress and notifications, HTTP for the rest, fully compatible with existing load balancers, proxies, and CDNs. This is what you ship to a SaaS customer or run inside an enterprise. The 2025-03 spec made it production-ready; the 2025-11 spec added the session, task, and auth machinery you actually need to operate it at scale.

Connections are stateful in both modes. That distinguishes MCP from a plain REST API and is the feature that makes multi-step workflows feel coherent: the server can remember context across calls within a session, which matters for database transactions, multi-file refactors, or any tool that needs to track intermediate state.
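On stdio, the wire format is simple enough to sketch by hand: each JSON-RPC message is one line of JSON terminated by a newline. The SDK's transport does this for you; the `encodeFrame` and `FrameDecoder` names below are ours, and the sketch only models requests.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Serialize one message per line. JSON.stringify escapes newlines inside
// strings, so the output never contains a raw line break.
function encodeFrame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// Accumulate raw chunks and emit complete messages; a chunk may end mid-line.
class FrameDecoder {
  private buffer = "";

  push(chunk: string): JsonRpcRequest[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the incomplete tail for the next chunk
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as JsonRpcRequest);
  }
}

// Two messages arriving in two chunks that do not align with message boundaries.
const wire =
  encodeFrame({ jsonrpc: "2.0", id: 1, method: "tools/list" }) +
  encodeFrame({ jsonrpc: "2.0", id: 2, method: "tools/call", params: { name: "create_ticket" } });
const cut = wire.indexOf("\n") + 5;
const decoder = new FrameDecoder();
const firstBatch = decoder.push(wire.slice(0, cut)); // first message only
const secondBatch = decoder.push(wire.slice(cut)); // second message, now complete
```

This framing is also why a stray line on stdout is fatal: the decoder on the other side will try to parse it as a message.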

A minimal MCP server in TypeScript

Here is the smallest server worth running. It exposes a single tool over stdio using the official TypeScript SDK and Zod for input schemas. This is the shape every server starts with, and the one we copy when bootstrapping a new integration on a client engagement.

src/server.ts

typescript

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "tickets-server",
  version: "1.0.0",
});

server.registerTool(
  "create_ticket",
  {
    description: "Create a support ticket in the internal tracker.",
    inputSchema: {
      title: z.string().min(3).describe("Short, action-oriented title"),
      priority: z
        .enum(["low", "medium", "high"])
        .describe("Triage priority assigned by the agent"),
      description: z.string().describe("Full ticket body, markdown allowed"),
    },
  },
  async ({ title, priority, description }) => {
    const res = await fetch(`${process.env.TICKETS_API}/tickets`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.TICKETS_TOKEN}`,
      },
      body: JSON.stringify({ title, priority, description }),
    });

    if (!res.ok) {
      return {
        isError: true,
        content: [{ type: "text", text: `Tickets API failed: ${res.status}` }],
      };
    }

    const ticket = (await res.json()) as { id: string; url: string };
    return {
      content: [
        {
          type: "text",
          text: `Created ticket ${ticket.id}: ${ticket.url}`,
        },
      ],
    };
  },
);

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

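main().catch((err) => {
  // Report startup failures on stderr; stdout is reserved for JSON-RPC.
  console.error(err);
  process.exit(1);
});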

Two patterns in there earn their keep in production. First, return a structured isError response instead of throwing on upstream failure; hosts surface that cleanly to the model, which then often retries or asks the user for guidance. Second, treat the Zod schema as the contract. The host turns it into the JSON Schema the model sees, so the field descriptions are not documentation, they are the prompt.

A serious operational note: if you log on a stdio server, log to stderr. Anything written to stdout corrupts the JSON-RPC stream and breaks the server. This is the single most common bug we have debugged in community-built servers, and it is one line of config away.
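A minimal logger that respects this rule might look like the sketch below. `createLogger` and its sink parameter are our own names; the one fact it relies on is that `console.error` writes to stderr in Node.js while `console.log` writes to stdout.

```typescript
type Sink = (line: string) => void;
type Level = "info" | "warn" | "error";

// Build a logger that emits one JSON object per line to the given sink.
// The default sink is console.error, which Node.js writes to stderr.
function createLogger(sink: Sink = (line) => console.error(line)) {
  return (level: Level, msg: string, fields: Record<string, unknown> = {}): void => {
    sink(JSON.stringify({ level, msg, ...fields }));
  };
}

const log = createLogger();
log("info", "tickets-server started"); // goes to stderr, stdout stays clean
```

Injecting the sink keeps the logger testable and makes it easy to swap in a file or a log shipper later without touching call sites.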

Wiring a Streamable HTTP server with auth

For remote MCP, you need to terminate Streamable HTTP and enforce OAuth 2.1 token validation. Here is the shape we use when shipping a production server behind an Express app. It validates the bearer token on every request, scopes each request to a tenant, and treats the MCP server as an OAuth Resource Server per the June 2025 spec revision.

src/http-server.ts

typescript

import express from "express";
import { z } from "zod";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { verifyAccessToken } from "./auth";
import { crm } from "./crm"; // assumed: your own CRM data-access module

const app = express();
app.use(express.json()); // handleRequest below is given the parsed JSON body

// Reject any request that does not carry a valid bearer token.
app.use(async (req, res, next) => {
  const auth = req.headers.authorization;
  if (!auth?.startsWith("Bearer ")) {
    res.setHeader(
      "WWW-Authenticate",
      'Bearer resource_metadata="https://mcp.example.com/.well-known/oauth-protected-resource"',
    );
    return res.status(401).end();
  }

  const claims = await verifyAccessToken(auth.slice(7));
  if (!claims) return res.status(401).end();

  (req as any).tenant = claims.tenant_id;
  next();
});

app.all("/mcp", async (req, res) => {
  // Fresh server and transport per request: nothing is shared between calls.
  const server = new McpServer({ name: "crm", version: "1.0.0" });

  server.registerTool(
    "find_account",
    {
      description: "Find a CRM account by domain.",
      inputSchema: { domain: z.string() },
    },
    async ({ domain }) => {
      const tenant = (req as any).tenant as string;
      const account = await crm.findAccount(tenant, domain);
      return { content: [{ type: "text", text: JSON.stringify(account) }] };
    },
  );

  const transport = new StreamableHTTPServerTransport({
    // Stateless mode: a per-request transport cannot resume an earlier session.
    sessionIdGenerator: undefined,
  });
  // Release the per-request instances once the response is finished.
  res.on("close", () => {
    transport.close();
    server.close();
  });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000);

Three points worth flagging. The WWW-Authenticate header points the client at your protected resource metadata (RFC 9728), which is how it discovers your authorization server; the June 2025 spec revision formalized this and also adopted RFC 8707 Resource Indicators, which prevent a token issued for one server from being reused against another. The tenant claim on the request is the cleanest place we have found to enforce multi-tenancy on a shared server, since the protocol itself still does not define one. And the per-request McpServer instance is deliberate: it makes session state easier to reason about under load and matches the stateless-by-default direction the 2026-07 draft spec is moving in.
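Resource indicators only help if the server checks them. Here is a sketch of the check a `verifyAccessToken` helper could run after signature validation. The claim names are assumptions: `aud` and `exp` follow JWT convention, `tenant_id` mirrors the example above, and the resource URL is a placeholder.

```typescript
// Assumed claim layout; adjust to whatever your authorization server issues.
interface AccessTokenClaims {
  aud: string | string[]; // audience: the resource(s) the token was issued for
  tenant_id: string;
  exp: number; // expiry, seconds since epoch
}

// Placeholder identifier for this MCP server.
const THIS_RESOURCE = "https://mcp.example.com/mcp";

function acceptToken(
  claims: AccessTokenClaims,
  nowSeconds: number,
  resource: string = THIS_RESOURCE,
): boolean {
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(resource)) return false; // issued for a different server
  return claims.exp > nowSeconds; // reject expired tokens
}

const accepted = acceptToken(
  { aud: "https://mcp.example.com/mcp", tenant_id: "acme", exp: 2000 },
  1000,
);
const replayed = acceptToken(
  { aud: "https://other.example.com/mcp", tenant_id: "acme", exp: 2000 },
  1000,
);
```

A token minted for another server fails the audience check even when its signature and expiry are fine, which is exactly the replay the resource indicator is meant to stop.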

Auth in 2026: OAuth 2.1, CIMD, and session scoping

Auth is the area of MCP that has changed the most, and it is where every enterprise review focuses. The story now: remote servers act as OAuth 2.1 Resource Servers. The client goes through a standard authorization-code flow with PKCE against the authorization server your MCP server advertises, gets back a scoped token, and presents it on every request. Local stdio servers inherit the host’s permissions, so the security boundary is the user’s own machine.

The November 2025 spec shifted the default for client registration from Dynamic Client Registration (DCR) to Client ID Metadata Documents (CIMD). With CIMD, a client’s identity is a URL pointing to a JSON document the client controls, and the authorization server fetches that metadata on demand instead of maintaining a registration database per client. This is the change that makes MCP’s scale tractable: a single client may connect to thousands of servers it has never met, and a database row per server-client pair was never going to hold.
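As a rough illustration of what the authorization server does with a CIMD URL, the sketch below validates a fetched document. The field names follow common OAuth client metadata conventions and the specific checks are our assumptions about a sensible minimum, not a transcription of the spec; the fetch itself is left out.

```typescript
// Assumed document shape; real documents can carry more metadata fields.
interface ClientMetadataDocument {
  client_id: string;
  client_name?: string;
  redirect_uris: string[];
}

// Checks an authorization server might run after fetching the document
// at `fetchedFrom`. Returns a list of problems; empty means acceptable.
function validateClientMetadata(fetchedFrom: string, doc: ClientMetadataDocument): string[] {
  const problems: string[] = [];
  if (!fetchedFrom.startsWith("https://")) {
    problems.push("client_id URL must use https");
  }
  if (doc.client_id !== fetchedFrom) {
    problems.push("client_id must equal the URL it was fetched from");
  }
  if (doc.redirect_uris.length === 0) {
    problems.push("at least one redirect_uri is required");
  }
  return problems;
}

// Hypothetical client hosting its own metadata document.
const clientUrl = "https://agent.example.com/oauth/client.json";
const okProblems = validateClientMetadata(clientUrl, {
  client_id: clientUrl,
  client_name: "Example Agent",
  redirect_uris: ["https://agent.example.com/callback"],
});
```

The point of the self-referencing `client_id` check is that the client proves control of its identity by hosting the document, so no registration row has to exist beforehand.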

On top of the spec, the pattern we recommend for sensitive workloads is session-scoped authorization: do not hand the agent a long-lived token at all. Issue access for the duration of a single task, revoke when the task ends, and require an explicit human approval for a new session. Static client secrets remain common in production and they will burn you. The 2026 roadmap calls them out by name as a gap to close.
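Session-scoped authorization is a pattern on top of the spec, so the shape is yours to choose. The sketch below is one hypothetical in-memory version: a grant is tied to a task, expires on a timer, and can be revoked early. The class and method names are ours, and time is passed in explicitly to keep the logic testable.

```typescript
interface Grant {
  taskId: string;
  scopes: string[];
  expiresAt: number; // milliseconds since epoch
  revoked: boolean;
}

class TaskScopedGrants {
  private grants = new Map<string, Grant>();

  // Call only after a human has approved the new session.
  issue(taskId: string, scopes: string[], now: number, ttlMs: number): Grant {
    const grant: Grant = { taskId, scopes, expiresAt: now + ttlMs, revoked: false };
    this.grants.set(taskId, grant);
    return grant;
  }

  // Revoke as soon as the task finishes, whatever the outcome.
  revoke(taskId: string): void {
    const grant = this.grants.get(taskId);
    if (grant) grant.revoked = true;
  }

  // A call is allowed only while the grant is live and covers the scope.
  allows(taskId: string, scope: string, now: number): boolean {
    const grant = this.grants.get(taskId);
    return !!grant && !grant.revoked && now < grant.expiresAt && grant.scopes.includes(scope);
  }
}

const grants = new TaskScopedGrants();
grants.issue("task-42", ["tickets:write"], 0, 60_000); // valid for one minute
const duringTask = grants.allows("task-42", "tickets:write", 30_000);
grants.revoke("task-42");
const afterTask = grants.allows("task-42", "tickets:write", 31_000);
```

In production the grant store would live in your authorization server or gateway, not in the MCP server's memory.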

Beyond request/response: tasks, sampling, elicitation

The 2025-11-25 spec turned MCP from a strict call-and-respond protocol into something closer to a collaboration framework. Three primitives drive the change.

Tasks. A call-now, fetch-later pattern. Any request can return a task handle immediately and continue work in the background. Clients poll or subscribe for progress. Tasks move through defined states (working, input_required, completed, failed, cancelled) so an agent can report meaningful status on a ten-minute ETL job without holding a connection open.
Sampling. The server asks the host to run a model completion. Lets a server reason about intermediate state, validate assumptions, or generate content inside its own workflow, with the user able to review and edit sampled output before it returns.
Elicitation. The server pauses execution and asks the user for input. Form-mode handles structured questions; URL-mode hands the user off to a trusted external page for OAuth, payment, or credential entry, then resumes. This is how you keep humans in the loop without inventing a side channel.

Combined, these turn the server from a passive endpoint into an active participant. A research server can spawn its own internal agent loop using sampling, report progress through tasks, and stop to ask the user for clarification through elicitation, all over the same MCP connection. The orchestration code we used to write by hand is now the protocol.
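The task lifecycle reads naturally as a small state machine. The state names below are the ones listed above; the transition table is our assumption about which moves make sense (terminal states have no way out), so check it against the spec before relying on it.

```typescript
type TaskStatus = "working" | "input_required" | "completed" | "failed" | "cancelled";

// Assumed transition table: terminal states have no outgoing edges.
const transitions: Record<TaskStatus, TaskStatus[]> = {
  working: ["input_required", "completed", "failed", "cancelled"],
  input_required: ["working", "completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

function isTerminal(status: TaskStatus): boolean {
  return transitions[status].length === 0;
}

// Move to the next state, or throw if the move is not in the table.
function advance(current: TaskStatus, next: TaskStatus): TaskStatus {
  if (!transitions[current].includes(next)) {
    throw new Error(`Illegal task transition: ${current} -> ${next}`);
  }
  return next;
}

// A long ETL job that pauses once to ask the user a question.
let etlStatus: TaskStatus = "working";
etlStatus = advance(etlStatus, "input_required"); // elicitation pending
etlStatus = advance(etlStatus, "working"); // user answered
etlStatus = advance(etlStatus, "completed");
```

A client polling the task only needs `isTerminal` to know when to stop.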

MCP Apps: the UI layer

Until January 2026, every MCP interaction was text. MCP Apps, launched as the first official spec extension and co-developed with OpenAI, lets tools return rich HTML interfaces that render in sandboxed iframes inside the host’s chat. Users edit a Figma board, manipulate an Amplitude dashboard, or compose a Slack message without leaving the conversation. Launch partners include Amplitude, Asana, Box, Canva, Clay, Figma, Hex, monday.com, Slack, and Salesforce, with support across Claude, ChatGPT, Goose, and VS Code.

The security model is iframe sandboxing, pre-declared templates, auditable JSON-RPC messaging, and user consent for any UI-initiated tool call. The pragmatic reading: this is the first protocol-level answer to the long-running question of how agentic workflows escape the chat box without each host building its own custom plugin UI.

Real workloads where MCP earned its place

Three shapes of work where MCP has paid for itself on engagements this year.

Internal developer agents. An MCP server over a private codebase exposes repo search, dependency graphs, and CI history as resources, with a small set of tools for opening PRs and triggering test runs. The same server is used by Cursor, VS Code, Claude Desktop, and an internal CLI agent. Four hosts, one server, one auth surface. The cost saving versus the previous one-connector-per-host world was not subtle.
Customer support copilots. A Streamable HTTP server fronts the support stack: Zendesk for tickets, the data warehouse for customer history, the docs site for canonical answers. The model gets read access by default; ticket updates, refunds, and account changes go through explicit approval tools. Multi-tenancy is enforced from the bearer token, not the protocol. Audit logs are written outside MCP, against a SIEM the security team already runs.
Data analyst agents. A read-only MCP server over the data warehouse exposes catalogs and sample rows as resources and SQL queries as a tool. The model plans, the server executes, results stream back. Cloudflare’s “Code Mode” pattern, publicly demoed at 98%+ token savings by letting agents discover tools dynamically rather than loading every definition upfront, is the right reference design here for hosts with hundreds of available servers.

Across all three, the consistent win is portability: when the underlying model or host changes, the server does not. That is not a cost saving you can point at on a slide, but it is what “moving fast” actually means twelve months in.
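The read-by-default, approve-to-write policy from the support copilot fits in a few lines. The tool names and the `decide` function are hypothetical; in practice the approval itself is collected by the host, and this check is the server-side backstop.

```typescript
type Verdict = "allow" | "needs_approval" | "deny";

// Hypothetical tool catalog for a support copilot.
const toolPolicy = new Map<string, { readOnly: boolean }>([
  ["search_tickets", { readOnly: true }],
  ["get_customer_history", { readOnly: true }],
  ["update_ticket", { readOnly: false }],
  ["issue_refund", { readOnly: false }],
]);

function decide(toolName: string, approvedByUser: boolean): Verdict {
  const policy = toolPolicy.get(toolName);
  if (!policy) return "deny"; // unknown tools are rejected outright
  if (policy.readOnly) return "allow"; // reads are open by default
  return approvedByUser ? "allow" : "needs_approval"; // writes need a human
}

const readVerdict = decide("search_tickets", false);
const refundVerdict = decide("issue_refund", false);
const approvedRefund = decide("issue_refund", true);
```

Denying unknown tools by default means a newly added write tool cannot slip through before someone classifies it.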

Where MCP still hurts

We use it on most engagements. It is not free.

Enterprise observability is DIY. The protocol does not define a standardized audit trail. Teams building production MCP deployments invent their own logging, tracing, and SIEM integration. The 2026 roadmap calls this out as a gap and most of the work is expected to land as an extension, not in core.
Multi-tenancy is not in the spec. SaaS providers building MCP servers have to enforce tenant isolation themselves, usually via claims on the OAuth token. There is no canonical model for this yet.
Rate limiting and cost attribution are unsolved. When agents invoke tools autonomously, organizations need caps and per-team attribution. The protocol does not address this; payment protocols like x402 and Stripe MPP are starting to fill the cross-organizational side, but internal cost governance is still your problem.
Server quality is uneven. Some servers are official and well-maintained (GitHub, Stripe). Many are weekend hacks. There is no conformance testing yet, though the roadmap commits to it. Treat any community server you adopt the same way you treat any other open-source dependency in production: read the code.
Config portability is missing. Setting up the same MCP server in Claude Desktop, Cursor, and VS Code means configuring it three times. The roadmap lists this as a 2026 priority. Until then, document your server’s setup carefully.
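For the rate-limiting and attribution gap, most teams start with something like the sketch below: a per-team call budget checked before each tool call. This is a deliberately naive in-memory version with names of our own choosing; a real deployment would back it with shared storage and windowed resets.

```typescript
class CallBudget {
  private used = new Map<string, number>();
  private readonly limitPerWindow: number;

  constructor(limitPerWindow: number) {
    this.limitPerWindow = limitPerWindow;
  }

  // Returns true and records the call if the team is still under its cap.
  tryConsume(team: string, cost: number = 1): boolean {
    const current = this.used.get(team) ?? 0;
    if (current + cost > this.limitPerWindow) return false;
    this.used.set(team, current + cost);
    return true;
  }

  // Per-team usage, for cost attribution reports.
  usage(team: string): number {
    return this.used.get(team) ?? 0;
  }

  // Call at the start of each window, for example hourly.
  reset(): void {
    this.used.clear();
  }
}

const budget = new CallBudget(2);
const firstCall = budget.tryConsume("growth");
const secondCall = budget.tryConsume("growth");
const thirdCall = budget.tryConsume("growth"); // over the cap, rejected
```

When a call is rejected, returning an `isError` result that names the cap gives the model something useful to relay to the user.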

The 2026 roadmap

The official roadmap published in March 2026 organizes work around four priorities. These are the changes that will shape what you build for the rest of the year.

Transport evolution. Streamable HTTP gets a stateless mode so sessions can migrate across server instances during scale-out. MCP Server Cards standardize metadata discovery at a .well-known URL so registries and crawlers can inspect a server without connecting.
Agent communication. More sophisticated patterns for sampling, server-side agent loops, and parallel tool calls. The protocol is moving toward letting servers coordinate multi-step reasoning natively, not just expose endpoints.
Enterprise readiness. Structured audit trails, SSO-integrated auth, gateway and proxy patterns, and configuration portability. Most of this is expected to land as extensions rather than core spec changes. This is the bucket that decides whether you can stop building plumbing around the protocol.
Governance maturation. A clear contributor ladder, Working Group delegation, and standardized charter templates under the Linux Foundation, so SEPs stop bottlenecking on core-maintainer review.

How MCP compares to neighbors

MCP vs plain APIs. APIs are point-to-point. MCP is a protocol layer on top of your APIs that gives any AI host a standard way to discover and call them. Your APIs still do the work.
MCP vs function calling. Function calling is the model’s ability to decide to invoke a tool. MCP is the protocol that connects the model to the tool. OpenAI adopted MCP precisely to give its function-calling infrastructure access to a shared ecosystem instead of maintaining a separate plugin architecture.
MCP vs A2A. Google’s A2A standardizes agent-to-agent communication. MCP standardizes agent-to-tool. They are complementary, not competitive. A production agent system will commonly use MCP for tools and A2A for coordination.
MCP vs LangChain or LangGraph. Different layers. MCP is the wire protocol. LangChain and LangGraph are orchestration frameworks that sit above it. LangChain already integrates MCP, and we routinely pair a LangGraph state machine with MCP servers for tool access.

How to introduce MCP on a real project

Pick one integration you already maintain a connector for. The win on day one is replacing existing custom plumbing, not inventing a new product.
Build it as a stdio server first. Iterate locally against Claude Desktop or Cursor until the tool surface and resource shape are right. This is the cheapest possible feedback loop.
Port to Streamable HTTP only when you have a remote consumer or a multi-user requirement. Most of the spec’s complexity (auth, sessions, tasks) only matters at this step.
Adopt the most boring auth that satisfies your security review. OAuth 2.1 with PKCE, RFC 8707 resource indicators, scoped tokens. Reach for CIMD when you find yourself maintaining a client registration table that grows faster than your users do.
Wire structured logs from day one. The protocol will not give you observability; your existing APM and SIEM tooling has to. Treat MCP traffic like any other production HTTP traffic.
Only adopt async tasks, sampling, elicitation, or MCP Apps when a specific user-facing requirement demands them. Each one adds surface area you have to operate.
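For the structured-logging step, it helps to fix the record shape early. The sketch below is one hypothetical shape for a per-tool-call record; the field names are ours, and it logs argument keys rather than values so that customer data stays out of the log pipeline.

```typescript
interface ToolCallRecord {
  ts: string; // ISO 8601 start time
  tenant: string;
  tool: string;
  durationMs: number;
  outcome: "ok" | "error";
  argKeys: string[]; // keys only; values may contain sensitive content
}

function toolCallRecord(
  tenant: string,
  tool: string,
  args: Record<string, unknown>,
  startedAtMs: number,
  finishedAtMs: number,
  isError: boolean,
): ToolCallRecord {
  return {
    ts: new Date(startedAtMs).toISOString(),
    tenant,
    tool,
    durationMs: finishedAtMs - startedAtMs,
    outcome: isError ? "error" : "ok",
    argKeys: Object.keys(args).sort(),
  };
}

// Example record for a call that started at the epoch and took 120 ms.
const record = toolCallRecord("acme", "find_account", { domain: "acme.com" }, 0, 120, false);
```

Serialize each record as one JSON line and ship it through the same pipeline as the rest of your HTTP traffic.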

Where to go from here

If you have AI features in production and you are still maintaining bespoke connectors per host, the next ninety days are when MCP starts paying back. Start with the smallest server that replaces an existing connector. The official TypeScript and Python SDKs are production-quality, the registry has reference servers for most common SaaS systems, and Claude Desktop or Cursor make for a fast local test loop.

If you want a second opinion on whether MCP is the right next step for your agent stack, or help planning the migration from a custom connector layer, reach out.