
Sankara R. Avula 78c8e6fb99 feat: Add TelemetryPassbackMiddleware for serverExecutionTelemetry capability (#797)

Implements: [SEP-2448: server execution telemetry](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2448)

Description:

The Observability Gap (The Problem)

MCP clients propagate trace context to servers, but server-side execution remains a black box. The client sees a single tools/call or resources/read span; everything the server does (auth checks, policy evaluation, API calls, sub-tool invocations) is invisible. In cross-organization deployments, clients and servers use separate observability backends with no shared collector access, making traditional span export useless.

Server Execution Telemetry (The Solution)

Servers advertise serverExecutionTelemetry and return a curated slice of their execution spans directly in _meta.otel of the response. Clients ingest these verbatim OTLP spans into their own collector, stitching server-side execution into their distributed trace; no shared infrastructure required. The black box becomes transparent.

Summary: Implement MCP serverExecutionTelemetry capability that enables cross-organization distributed tracing by returning server-side OpenTelemetry spans to clients inline via _meta.otel.traces.

Server-side (middleware):
- TelemetryPassbackMiddleware intercepts tools/call and resources/read
- ContextVarSpanCollector isolates span collection per-request via ContextVar
- Propagates traceparent from client request for distributed trace stitching
- Serializes collected spans to verbatim OTLP JSON (resourceSpans format), directly POSTable to /v1/traces
- Top-level span filtering by default; full span tree via detailed opt-in
- Middleware advertises capabilities via get_capabilities() on the Middleware base class
- Provisional API: FutureWarning emitted until SEP-2448 is ratified

Client-side (reference agent):
- LangChain ReAct agent connects to MCP server via streamable_http_client with OAuth 2.1
- Detects serverExecutionTelemetry capability at initialization
- Dynamically wraps discovered MCP tools with traceparent propagation and _meta.otel span request
- Ingests returned server spans into Jaeger (OTLP JSON) and Galileo (OTLP protobuf)
- Two-act demo: --no-passback (black box) vs default (full server-side visibility)

Dependencies:
- opentelemetry-api and opentelemetry-sdk added to arcade-mcp-server

Bump arcade-mcp-server version to 1.18.0.

2026-03-25 15:57:50 -07:00
src/telemetry_passback
.env.example
docker-compose.yml
pyproject.toml
README.md

README.md

SEP-2448: MCP server execution telemetry — Reference Implementation

End-to-end reference implementation of SEP-2448 serverExecutionTelemetry — cross-organization distributed tracing via MCP.

Overview

This example demonstrates how an MCP server can pass back OpenTelemetry spans to the calling client, enabling full distributed tracing across organizational boundaries. Without this capability, the server side of an MCP tool call is a black box — you can see that it was called, but not what happened inside.

The example includes three components:

Server (server.py) — An Arcade MCP server with Gmail tools that uses TelemetryPassbackMiddleware to collect and return spans. This shows how a vendor adopts the SEP.
Agent (agent.py) — A LangChain ReAct agent that requests span passback, receives server spans, and ingests them into Jaeger/Galileo. This shows how a consumer uses the SEP.
Jaeger (docker-compose.yml) — Local trace collector and UI for visualizing the stitched traces.

Prerequisites

Python 3.11+
uv package manager
Docker (for Jaeger)
An Arcade account (quickstart)
An OpenAI API key (for the LangChain agent)

Setup

cd examples/mcp_servers/telemetry_passback

# Copy env file and add your keys
cp .env.example .env
# Edit .env: set OPENAI_API_KEY, ARCADE_API_KEY, ARCADE_USER_ID

# Install dependencies
uv sync

# Start Jaeger
docker compose up -d

Usage

The server and agent run as separate processes. Start the server first, then run the agent in another terminal.

Start the Server

# Terminal 1
uv run python src/telemetry_passback/server.py

The server listens at http://127.0.0.1:8000/mcp with OAuth 2.1 resource server auth via Arcade.

Run the Agent

Run the agent in a separate terminal. On first run, the MCP SDK opens your browser for a one-time OAuth authorization.

Act 1 — "The Black Box" (no passback)

uv run python src/telemetry_passback/agent.py --no-passback "List my 3 most recent emails"

Open Jaeger at http://localhost:16686. You see the agent's LLM reasoning spans plus a single opaque mcp.call_tool CLIENT span. The tool call took ~3 seconds, but there is no way to tell why. Is it the LLM? The network? Auth? The Gmail API? Everything inside the server is invisible.

Act 2 — "The Revelation" (with passback)

uv run python src/telemetry_passback/agent.py --detailed "List my 3 most recent emails"

Same call, but now the span tree reveals the server's internal structure:

mcp-gmail-agent
├── LangChain agent reasoning
├── ChatOpenAI (LLM decides to call tool)
├── mcp.call_tool list_emails (CLIENT)
│   └── tools/call list_emails (SERVER)           ← FROM SPAN PASSBACK
│       ├── auth.validate (50ms)
│       ├── gmail.list_messages (400ms)
│       │   └── GET messages (HTTP)
│       ├── gmail.fetch_details (1.6s)             ← bottleneck!
│       │   ├── GET messages/abc (HTTP, 520ms)
│       │   ├── GET messages/def (HTTP, 510ms)
│       │   └── GET messages/ghi (HTTP, 530ms)
│       └── format_response (5ms)
└── ChatOpenAI (LLM — final answer)

Now the consumer can see exactly what's happening: auth is fast, listing is fine, but detail fetching is sequential — three HTTP calls in a waterfall. Armed with this information, the consumer can:

File an informed bug report with the server vendor: "your list_emails has an N+1 in detail fetching: each email triggers a sequential HTTP call"
Adjust their usage: request fewer emails, use a query filter to reduce N
Make an informed vendor choice: compare span trees across MCP server providers

This is the core value of the SEP — the consumer doesn't need access to the server's code or deployment to understand its performance characteristics.
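
The waterfall diagnosis above can be checked mechanically once the spans are in hand. The following is a minimal sketch, assuming spans have been reduced to plain dicts with hypothetical `start_ms` and `end_ms` fields; the durations come from the span tree above, while the start offsets are invented for illustration.

```python
def is_sequential(spans):
    """Return True if no span starts before the previous one has ended."""
    ordered = sorted(spans, key=lambda s: s["start_ms"])
    return all(prev["end_ms"] <= cur["start_ms"]
               for prev, cur in zip(ordered, ordered[1:]))


def total_ms(spans):
    """Sum of the individual span durations (cost when run back to back)."""
    return sum(s["end_ms"] - s["start_ms"] for s in spans)


def longest_ms(spans):
    """Duration of the slowest span (lower bound if all ran concurrently)."""
    return max(s["end_ms"] - s["start_ms"] for s in spans)


# Durations match the tree above (520, 510, 530 ms); offsets are hypothetical.
fetches = [
    {"name": "GET messages/abc", "start_ms": 0, "end_ms": 520},
    {"name": "GET messages/def", "start_ms": 520, "end_ms": 1030},
    {"name": "GET messages/ghi", "start_ms": 1030, "end_ms": 1560},
]

print(is_sequential(fetches))  # the three calls never overlap
print(total_ms(fetches))       # 1560 ms, close to the 1.6 s parent span
print(longest_ms(fetches))     # 530 ms if the calls were issued concurrently
```

The gap between the summed duration and the longest single call is the time a concurrent fetch could save, which is the kind of evidence that makes a bug report to the vendor concrete.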

Granularity Control

The --detailed flag demonstrates the SEP's span filtering. Without it, the server returns only top-level phase spans (auth, list, fetch, format). With --detailed, the full tree including HTTP child spans is returned. This lets the server vendor control how much internal detail is exposed.

# Top-level phases only (default)
uv run python src/telemetry_passback/agent.py "List my 3 most recent emails"

# Full span tree including HTTP child spans
uv run python src/telemetry_passback/agent.py --detailed "List my 3 most recent emails"
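
One way to picture the filtering is as a selection over the collected spans. This is a sketch only, not the middleware's actual logic: it assumes "top-level" means the SERVER span plus its direct children, and `select_spans` is a hypothetical helper name. The field names `spanId` and `parentSpanId` follow the OTLP JSON span encoding.

```python
def select_spans(spans, root_span_id, detailed=False):
    """Pick which spans to return to the client.

    Assumption: the default view keeps the root SERVER span and its direct
    children (the phase spans); detailed=True keeps the whole tree.
    """
    if detailed:
        return list(spans)
    return [
        s for s in spans
        if s["spanId"] == root_span_id or s.get("parentSpanId") == root_span_id
    ]


# Illustrative span tree with made-up IDs.
collected = [
    {"spanId": "srv", "parentSpanId": "cli", "name": "tools/call list_emails"},
    {"spanId": "a1", "parentSpanId": "srv", "name": "auth.validate"},
    {"spanId": "l1", "parentSpanId": "srv", "name": "gmail.list_messages"},
    {"spanId": "h1", "parentSpanId": "l1", "name": "GET messages"},
]

default_view = select_spans(collected, root_span_id="srv")
detailed_view = select_spans(collected, root_span_id="srv", detailed=True)
print([s["name"] for s in default_view])
print(len(detailed_view))
```

In the default view the HTTP child span is dropped, so the client still sees the phase timings without seeing which upstream endpoints the server calls.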

CLI Options

Flag	Default	Description
`query`	`"List my 5 most recent emails"`	The question to ask the agent
`--detailed`	`false`	Request full span tree
`--no-passback`	`false`	Disable span passback (Act 1 — server is a black box)
`--server-url`	`http://127.0.0.1:8000/mcp`	MCP server URL
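
For readers who want to reproduce the interface, the table maps naturally onto `argparse`. The sketch below is an assumption about how such a parser could look, not a copy of agent.py; `build_parser` is a hypothetical name, while the flags and defaults are taken from the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the CLI options table (sketch, not agent.py)."""
    parser = argparse.ArgumentParser(description="MCP span passback demo agent")
    # Positional and optional, so the agent can run with no arguments.
    parser.add_argument("query", nargs="?",
                        default="List my 5 most recent emails",
                        help="The question to ask the agent")
    parser.add_argument("--detailed", action="store_true",
                        help="Request full span tree")
    parser.add_argument("--no-passback", action="store_true",
                        help="Disable span passback")
    parser.add_argument("--server-url", default="http://127.0.0.1:8000/mcp",
                        help="MCP server URL")
    return parser


args = build_parser().parse_args(["--detailed", "List my 3 most recent emails"])
print(args.query, args.detailed, args.no_passback, args.server_url)
```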

Expected Results in Jaeger

Open http://localhost:16686 and search for service mcp-gmail-agent.

Mode	What you see
`--no-passback`	Only agent-side spans: LLM calls + opaque `mcp.call_tool`. Server is a black box.
Default	Server phase spans stitched into the same trace: `auth.validate`, `gmail.list_messages`, `gmail.fetch_details`, `format_response`.
`--detailed`	Full span tree: phase spans plus the HTTP child spans under the Gmail phases, revealing the sequential N+1 pattern in `gmail.fetch_details`.

Architecture

┌─────────────────────────┐     HTTP (streamable)    ┌──────────────────────────┐
│   agent.py              │ ───────────────────────> │   server.py              │
│   (LangChain ReAct)     │   :8000/mcp              │   (Arcade MCP Server)    │
│                         │                          │                          │
│   OAuth 2.1 via MCP SDK │  traceparent in _meta    │   OAuth 2.1 (Arcade)     │
│   OTel → Jaeger/Galileo │ ───────────────────────> │   OTel (internal only)   │
│                         │  spans back in _meta     │   TelemetryPassback MW   │
│                         │ <─────────────────────── │                          │
└─────────────────────────┘                          └──────────────────────────┘
         │                                                     │
         └──────────── Stitched trace in Jaeger ───────────────┘

How It Works

Server side (server.py):

Validates Bearer tokens via ArcadeResourceServerAuth (OAuth 2.1, RFC 9728 discovery)
TelemetryPassbackMiddleware intercepts tools/call requests
Reads _meta.traceparent and _meta.otel.traces.{request, detailed}
Creates a SERVER span under the client's trace (via traceparent propagation)
Tool function creates logical-phase spans with gen_ai.* semantic conventions
httpx auto-instrumentation creates HTTP child spans for Gmail API calls
Middleware serializes to OTLP JSON and attaches to response._meta.otel.traces

Client side (agent.py):

MCP SDK handles OAuth 2.1 automatically (discovers auth server on 401, PKCE flow, token caching)
Connects to the server via streamable HTTP, detects serverExecutionTelemetry capability
For each tool call, creates a CLIENT span and injects traceparent in _meta
Sends _meta.otel.traces.request: true to opt into span passback
Receives server spans in response _meta.otel.traces.resourceSpans
POSTs OTLP JSON to Jaeger for trace stitching
Optionally exports to Galileo (protobuf) if GALILEO_API_KEY is set
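
The client-side request and ingestion steps can be sketched in the same spirit. Everything here is illustrative: the helper names are hypothetical, the code does not use the MCP SDK or LangChain, and the endpoint `http://localhost:4318/v1/traces` assumes the Jaeger container exposes the standard OTLP/HTTP port, which this README does not state.

```python
import json
import secrets
import urllib.request


def new_traceparent(trace_id=None):
    """Create a W3C traceparent for a new CLIENT span; returns (header, span_id)."""
    trace_id = trace_id or secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01", span_id


def build_request_meta(traceparent, detailed=False):
    """Build the _meta payload that opts into span passback."""
    return {
        "traceparent": traceparent,
        "otel": {"traces": {"request": True, "detailed": detailed}},
    }


def extract_server_spans(response_meta):
    """Return an OTLP JSON payload from response _meta, or None if absent."""
    traces = (response_meta or {}).get("otel", {}).get("traces", {})
    resource_spans = traces.get("resourceSpans")
    return {"resourceSpans": resource_spans} if resource_spans else None


def build_otlp_post(payload, endpoint="http://localhost:4318/v1/traces"):
    """Prepare (but do not send) the HTTP POST that ingests the spans."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


traceparent, client_span_id = new_traceparent()
request_meta = build_request_meta(traceparent, detailed=True)
# Sending would be: urllib.request.urlopen(build_otlp_post(payload))
```

Returning `None` when the response carries no spans lets the same agent talk to servers that do not advertise the capability, or run with `--no-passback`, without special cases at the call site.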

Configuration

Copy .env.example to .env:

Variable	Default	Description
`OPENAI_API_KEY`	(required)	OpenAI API key for the LangChain agent
`ARCADE_API_KEY`	(required)	Arcade API key
`ARCADE_USER_ID`	(required)	Your Arcade account email
`ARCADE_API_URL`	`https://api.arcade.dev`	Arcade API endpoint
`GALILEO_API_KEY`	(optional)	Enables export to Galileo alongside Jaeger
`GALILEO_PROJECT`	(optional)	Galileo project name
`GALILEO_LOG_STREAM`	`default`	Galileo log stream
`GALILEO_OTLP_ENDPOINT`	`https://app.galileo.ai/api/galileo/otel/traces`	Galileo OTLP endpoint
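
As a sketch of how these variables might be consumed, the function below validates the required keys and applies the documented defaults. `load_config` and the `galileo_enabled` key are hypothetical names; the sketch reads from a mapping such as `os.environ` and does not parse the .env file itself, so it assumes the variables have already been loaded into the environment.

```python
import os

REQUIRED = ("OPENAI_API_KEY", "ARCADE_API_KEY", "ARCADE_USER_ID")
OPTIONAL = ("GALILEO_API_KEY", "GALILEO_PROJECT")
DEFAULTS = {
    "ARCADE_API_URL": "https://api.arcade.dev",
    "GALILEO_LOG_STREAM": "default",
    "GALILEO_OTLP_ENDPOINT": "https://app.galileo.ai/api/galileo/otel/traces",
}


def load_config(env=None):
    """Validate required variables and fill in the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name) or default
    for name in OPTIONAL:
        config[name] = env.get(name) or None
    # Galileo export is only attempted when an API key is present.
    config["galileo_enabled"] = config["GALILEO_API_KEY"] is not None
    return config
```

Failing fast on missing required keys surfaces a misconfigured .env before the agent starts an OAuth flow or an LLM call.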