**Implements**: [SEP-2448: server execution telemetry](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2448)

**Description:**

**The Observability Gap (The Problem)**

MCP clients propagate trace context to servers, but server-side execution remains a black box. The client sees a single tools/call or resources/read span; everything the server does (auth checks, policy evaluation, API calls, sub-tool invocations) is invisible. In cross-organization deployments, clients and servers use separate observability backends with no shared collector access, making traditional span export useless.

<img width="1015" height="450" alt="Screenshot 2026-03-23 at 3 43 21 PM" src="https://github.com/user-attachments/assets/58c817b5-fee6-46a3-9877-d523a25368ad" />

**Server Execution Telemetry (The Solution)**

Servers advertise serverExecutionTelemetry and return a curated slice of their execution spans directly in _meta.otel of the response. Clients ingest these verbatim OTLP spans into their own collector, stitching server-side execution into their distributed trace; no shared infrastructure required. The black box becomes transparent.

<img width="945" height="574" alt="Screenshot 2026-03-23 at 3 43 44 PM" src="https://github.com/user-attachments/assets/38d97c94-aa73-4e62-9b4e-3264600e5ed0" />

**Summary:** Implement MCP serverExecutionTelemetry capability that enables cross-organization distributed tracing by returning server-side OpenTelemetry spans to clients inline via _meta.otel.traces.

Server-side (middleware):
- TelemetryPassbackMiddleware intercepts tools/call and resources/read
- ContextVarSpanCollector isolates span collection per-request via ContextVar
- Propagates traceparent from client request for distributed trace stitching
- Serializes collected spans to verbatim OTLP JSON (resourceSpans format), directly POSTable to /v1/traces
- Top-level span filtering by default; full span tree via detailed opt-in
- Middleware advertises capabilities via get_capabilities() on the Middleware base class
- Provisional API: FutureWarning emitted until SEP-2448 is ratified

Client-side (reference agent):
- LangChain ReAct agent connects to MCP server via streamable_http_client with OAuth 2.1
- Detects serverExecutionTelemetry capability at initialization
- Dynamically wraps discovered MCP tools with traceparent propagation and _meta.otel span request
- Ingests returned server spans into Jaeger (OTLP JSON) and Galileo (OTLP protobuf)
- Two-act demo: --no-passback (black box) vs default (full server-side visibility)

Dependencies:
- opentelemetry-api and opentelemetry-sdk added to arcade-mcp-server

Bump arcade-mcp-server version to 1.18.0.
SEP-2448: MCP server execution telemetry — Reference Implementation
End-to-end reference implementation of SEP-2448 serverExecutionTelemetry — cross-organization distributed tracing via MCP.
Overview
This example demonstrates how an MCP server can pass back OpenTelemetry spans to the calling client, enabling full distributed tracing across organizational boundaries. Without this capability, the server side of an MCP tool call is a black box — you can see that it was called, but not what happened inside.
The example includes three components:
- Server (server.py): An Arcade MCP server with Gmail tools that uses TelemetryPassbackMiddleware to collect and return spans. This shows how a vendor adopts the SEP.
- Agent (agent.py): A LangChain ReAct agent that requests span passback, receives server spans, and ingests them into Jaeger/Galileo. This shows how a consumer uses the SEP.
- Jaeger (docker-compose.yml): Local trace collector and UI for visualizing the stitched traces.
Prerequisites
- Python 3.11+
- uv package manager
- Docker (for Jaeger)
- An Arcade account (quickstart)
- An OpenAI API key (for the LangChain agent)
Setup
cd examples/mcp_servers/telemetry_passback
# Copy env file and add your keys
cp .env.example .env
# Edit .env: set OPENAI_API_KEY, ARCADE_API_KEY, ARCADE_USER_ID
# Install dependencies
uv sync
# Start Jaeger
docker compose up -d
Usage
The server and agent run as separate processes. Start the server first, then run the agent in another terminal.
Start the Server
# Terminal 1
uv run python src/telemetry_passback/server.py
The server listens at http://127.0.0.1:8000/mcp with OAuth 2.1 resource server auth via Arcade.
Run the Agent
Run the agent in a separate terminal. On first run, the MCP SDK opens your browser for a one-time OAuth authorization.
Act 1 — "The Black Box" (no passback)
uv run python src/telemetry_passback/agent.py --no-passback "List my 3 most recent emails"
Open Jaeger at http://localhost:16686: you see agent LLM reasoning spans + one opaque mcp.call_tool CLIENT span. The tool call took ~3 seconds but there's no way to tell why. Is it the LLM? The network? Auth? The Gmail API? Everything inside the server is invisible.
Act 2 — "The Revelation" (with passback)
uv run python src/telemetry_passback/agent.py --detailed "List my 3 most recent emails"
Same call, but now the span tree reveals the server's internal structure:
mcp-gmail-agent
├── LangChain agent reasoning
├── ChatOpenAI (LLM decides to call tool)
├── mcp.call_tool list_emails (CLIENT)
│ └── tools/call list_emails (SERVER) ← FROM SPAN PASSBACK
│ ├── auth.validate (50ms)
│ ├── gmail.list_messages (400ms)
│ │ └── GET messages (HTTP)
│ ├── gmail.fetch_details (1.6s) ← bottleneck!
│ │ ├── GET messages/abc (HTTP, 520ms)
│ │ ├── GET messages/def (HTTP, 510ms)
│ │ └── GET messages/ghi (HTTP, 530ms)
│ └── format_response (5ms)
└── ChatOpenAI (LLM — final answer)
Now the consumer can see exactly what's happening: auth is fast, listing is fine, but detail fetching is sequential — three HTTP calls in a waterfall. Armed with this information, the consumer can:
- File an informed bug report to the server vendor: "your list_emails has an N+1 in detail fetching: each email triggers a sequential HTTP call"
- Adjust their usage: request fewer emails, use a query filter to reduce N
- Make an informed vendor choice: compare span trees across MCP server providers
This is the core value of the SEP — the consumer doesn't need access to the server's code or deployment to understand its performance characteristics.
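To make the waterfall diagnosis concrete, here is a minimal sketch of how a consumer might check whether sibling spans ran back to back. It assumes the spans are OTLP JSON span objects, whose startTimeUnixNano and endTimeUnixNano fields are string-encoded integers. The helper names (runs_sequentially, total_duration_ms, _span) and the sample timings are hypothetical; the timings are taken from the illustrative tree above, not from a real trace.

```python
from typing import Any


def runs_sequentially(spans: list[dict[str, Any]]) -> bool:
    """Return True when no two spans overlap, i.e. each starts after the previous ends."""
    ordered = sorted(spans, key=lambda s: int(s["startTimeUnixNano"]))
    return all(
        int(nxt["startTimeUnixNano"]) >= int(prev["endTimeUnixNano"])
        for prev, nxt in zip(ordered, ordered[1:])
    )


def total_duration_ms(spans: list[dict[str, Any]]) -> float:
    """Sum the individual span durations in milliseconds."""
    nanos = sum(int(s["endTimeUnixNano"]) - int(s["startTimeUnixNano"]) for s in spans)
    return nanos / 1e6


def _span(name: str, start_ms: int, duration_ms: int) -> dict[str, Any]:
    """Build a tiny OTLP-style span dict for the example (hypothetical helper)."""
    start = start_ms * 1_000_000
    end = start + duration_ms * 1_000_000
    return {"name": name, "startTimeUnixNano": str(start), "endTimeUnixNano": str(end)}


# The three HTTP children of gmail.fetch_details from the tree above.
waterfall = [
    _span("GET messages/abc", 0, 520),
    _span("GET messages/def", 520, 510),
    _span("GET messages/ghi", 1030, 530),
]
# The same calls if the server had issued them concurrently.
concurrent = [
    _span("GET messages/abc", 0, 520),
    _span("GET messages/def", 0, 510),
    _span("GET messages/ghi", 0, 530),
]

print(runs_sequentially(waterfall), total_duration_ms(waterfall))
print(runs_sequentially(concurrent))
```

When the children run sequentially, their durations add up to roughly the parent phase's duration, which is the signature of an N+1 pattern.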
Granularity Control
The --detailed flag demonstrates the SEP's span filtering. Without it, the server returns only top-level phase spans (auth, list, fetch, format). With --detailed, the full tree including HTTP child spans is returned. This lets the server vendor control how much internal detail is exposed.
# Top-level phases only (default)
uv run python src/telemetry_passback/agent.py "List my 3 most recent emails"
# Full span tree including HTTP child spans
uv run python src/telemetry_passback/agent.py --detailed "List my 3 most recent emails"
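The filtering rule can be sketched as follows. This is an illustration only: it assumes "top-level" means the SERVER span plus its direct children, and that spans are OTLP JSON objects with spanId and parentSpanId fields. The function name filter_spans and the sample span IDs are hypothetical; the middleware's actual rule may differ.

```python
from typing import Any


def filter_spans(
    spans: list[dict[str, Any]],
    root_span_id: str,
    detailed: bool = False,
) -> list[dict[str, Any]]:
    """Return every span when detailed, otherwise the root span and its direct children."""
    if detailed:
        return list(spans)
    return [
        s
        for s in spans
        if s["spanId"] == root_span_id or s.get("parentSpanId") == root_span_id
    ]


# Hypothetical spans mirroring part of the tree shown in Act 2.
spans = [
    {"spanId": "01", "parentSpanId": "c1", "name": "tools/call list_emails"},
    {"spanId": "02", "parentSpanId": "01", "name": "auth.validate"},
    {"spanId": "03", "parentSpanId": "01", "name": "gmail.fetch_details"},
    {"spanId": "04", "parentSpanId": "03", "name": "GET messages/abc"},
]

default_names = [s["name"] for s in filter_spans(spans, root_span_id="01")]
detailed_names = [s["name"] for s in filter_spans(spans, root_span_id="01", detailed=True)]
print(default_names)
print(detailed_names)
```

Filtering on the server keeps the decision about exposure with the vendor: the client can ask for detail, but the server decides what leaves its boundary.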
CLI Options
| Flag | Default | Description |
|---|---|---|
| query | "List my 5 most recent emails" | The question to ask the agent |
| --detailed | false | Request full span tree |
| --no-passback | false | Disable span passback (Act 1: server is a black box) |
| --server-url | http://127.0.0.1:8000/mcp | MCP server URL |
Expected Results in Jaeger
Open http://localhost:16686 and search for service mcp-gmail-agent.
| Mode | What you see |
|---|---|
| --no-passback | Only agent-side spans: LLM calls + opaque mcp.call_tool. Server is a black box. |
| Default | Server phase spans stitched into the same trace: auth.validate, gmail.list_messages, gmail.fetch_details, format_response. |
| --detailed | Full span tree: phase spans plus HTTP child spans under each phase, revealing the sequential N+1 pattern in gmail.fetch_details. |
Architecture
┌─────────────────────────┐ HTTP (streamable) ┌──────────────────────────┐
│ agent.py │ ───────────────────────>│ server.py │
│ (LangChain ReAct) │ :8000/mcp │ (Arcade MCP Server) │
│ │ │ │
│ OAuth 2.1 via MCP SDK │ traceparent in _meta │ OAuth 2.1 (Arcade) │
│ OTel → Jaeger/Galileo │ ───────────────────────>│ OTel (internal only) │
│ │ spans back in _meta │ TelemetryPassback MW │
│ │ <───────────────────────│ │
└─────────────────────────┘ └──────────────────────────┘
│ │
└──────────── Stitched trace in Jaeger ───────────────┘
How It Works
Server side (server.py):
- Validates Bearer tokens via ArcadeResourceServerAuth (OAuth 2.1, RFC 9728 discovery)
- TelemetryPassbackMiddleware intercepts tools/call requests
- Reads _meta.traceparent and _meta.otel.traces.{request, detailed}
- Creates a SERVER span under the client's trace (via traceparent propagation)
- Tool function creates logical-phase spans with gen_ai.* semantic conventions
- httpx auto-instrumentation creates HTTP child spans for Gmail API calls
- Middleware serializes to OTLP JSON and attaches to response._meta.otel.traces
Client side (agent.py):
- MCP SDK handles OAuth 2.1 automatically (discovers auth server on 401, PKCE flow, token caching)
- Connects to the server via streamable HTTP, detects serverExecutionTelemetry capability
- For each tool call, creates a CLIENT span and injects traceparent in _meta
- Sends _meta.otel.traces.request: true to opt into span passback
- Receives server spans in response _meta.otel.traces.resourceSpans
- POSTs OTLP JSON to Jaeger for trace stitching
- Optionally exports to Galileo (protobuf) if GALILEO_API_KEY is set
Configuration
Copy .env.example to .env:
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | (required) | OpenAI API key for the LangChain agent |
| ARCADE_API_KEY | (required) | Arcade API key |
| ARCADE_USER_ID | (required) | Your Arcade account email |
| ARCADE_API_URL | https://api.arcade.dev | Arcade API endpoint |
| GALILEO_API_KEY | (optional) | Enables export to Galileo alongside Jaeger |
| GALILEO_PROJECT | (optional) | Galileo project name |
| GALILEO_LOG_STREAM | default | Galileo log stream |
| GALILEO_OTLP_ENDPOINT | https://app.galileo.ai/api/galileo/otel/traces | Galileo OTLP endpoint |