# MCP Server Tool Evaluation Support
## Overview
Add support for evaluating tools from remote MCP servers without
requiring Python callables. Enables direct evaluation of any
MCP-compatible tool server.
## What's New
### Core Features
- **`MCPToolRegistry`**: Evaluate tools from a single MCP server
- **`CompositeMCPRegistry`**: Evaluate tools from multiple MCP servers
simultaneously
- **Automatic loaders**: `load_from_stdio()` and `load_from_http()` to
fetch tools from running servers
- **Automatic namespacing**: Tools prefixed with server name (e.g.,
`server_tool_name`)
- **Smart name resolution**: Use short names if unique, full names if
ambiguous
- **OpenAI strict mode**: Automatic schema conversion prevents parameter
hallucinations
### Usage
**Automatic Loading:**
```python
from arcade_evals import load_from_stdio, MCPToolRegistry
# Load tools automatically from MCP server
tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-github"])
registry = MCPToolRegistry(tools)
```
**Single MCP Server:**
```python
from arcade_evals import MCPToolRegistry, ExpectedToolCall
registry = MCPToolRegistry(mcp_tools)
suite = EvalSuite(catalog=registry)
suite.add_case(
expected_tool_calls=[
ExpectedToolCall(tool_name="tool_name", args={...})
]
)
```
**Multiple MCP Servers:**
```python
from arcade_evals import CompositeMCPRegistry, load_from_stdio
# Load from multiple servers
github_tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-github"])
slack_tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-slack"])
composite = CompositeMCPRegistry(
tool_lists={
"github": github_tools,
"slack": slack_tools,
}
)
suite = EvalSuite(catalog=composite)
suite.add_case(
expected_tool_calls=[
ExpectedToolCall(tool_name="github_list_issues", args={...})
]
)
```
## Implementation
### Files Changed
- **`libs/arcade-evals/arcade_evals/registry.py`** (NEW): Registry
abstractions and implementations
- **`libs/arcade-evals/arcade_evals/loaders.py`** (NEW): Automatic tool
loading from MCP servers
- **`libs/arcade-evals/arcade_evals/eval.py`** (MODIFIED): Enhanced
`ExpectedToolCall` and evaluation logic
- **`libs/arcade-evals/arcade_evals/__init__.py`** (MODIFIED): Exported
new registries and loaders
### Key Technical Details
- Added `BaseToolRegistry` interface for abstraction
- `MCPToolRegistry` handles single server tools
- `CompositeMCPRegistry` manages multiple servers with collision
detection
- `load_from_stdio()` and `load_from_http()` for automatic tool
discovery
- Fixed name normalization bug: MCP tools use underscores (not dots)
- Optimized tool copying: 2.5x faster via shallow copy
## Testing
- ✅ 41 tests passing (25 new tests added)
- ✅ `test_eval_mcp_registry.py`: MCPToolRegistry functionality
- ✅ `test_eval_composite_mcp.py`: CompositeMCPRegistry with multiple
servers
- ✅ Verified backward compatibility with Python tools
## Backward Compatibility
✅ **100% backward compatible** - No breaking changes
## Breaking Changes
**None**
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Adds end-to-end eval UX: examples, a robust CLI runner, and rich
outputs.
>
> - **New examples**: `eval_arcade_gateway.py`,
`eval_stdio_mcp_server.py`, `eval_http_mcp_server.py`,
`eval_comprehensive_comparison.py` with timeouts, error handling, and
track-based comparisons; detailed `README.md`
> - **CLI runner**: `arcade_cli/evals_runner.py` to execute
evals/capture in parallel with progress, error isolation, failed-only
filtering, context inclusion, and multi-provider/model support
> - **Output formatters**: `arcade_cli/formatters/` (txt, md, html,
json) for evals and capture; comparative and multi-model HTML with tabs
and context rendering
> - **Display refactor**: `display.py` now supports writing multiple
formats, failed-only disclaimers, include-context, and improved console
summaries
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
ff8acf9c34a6b61462a019a1ee9df081006517d0. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Francisco Liberal <francisco@arcade.dev>
Co-authored-by: Mateo Torres <torresmateo@gmail.com>
|
||
|---|---|---|
| .cursor/rules | ||
| .github | ||
| .vscode | ||
| contrib | ||
| examples | ||
| libs | ||
| schemas/preview | ||
| toolkits | ||
| .editorconfig | ||
| .gitignore | ||
| .pre-commit-config.yaml | ||
| .prettierignore | ||
| .prettierrc.toml | ||
| .ruff.toml | ||
| CONTRIBUTING.md | ||
| core | ||
| cspell.config.yaml | ||
| LICENSE | ||
| Makefile | ||
| pyproject.toml | ||
| README.md | ||
| SECURITY.md | ||
| uv_setup.sh | ||
Arcade MCP Server Framework
-
To see example servers built with Arcade MCP Server Framework (this repo), check out our examples
-
To learn more about the Arcade MCP Server Framework (this repo), check out our Arcade MCP documentation
-
To learn more about other offerings from Arcade.dev, check out our documentation.
Pst. hey, you, give us a star if you like it!
Quick Start: Create a New Server
The fastest way to get started is with the arcade new CLI command, which creates a complete MCP server project:
# Install the CLI
uv tool install arcade-mcp
# Create a new server project
arcade new my_server
# Navigate to the project
cd my_server/src/my_server
This generates a project with:
-
server.py - Main server file with MCPApp and example tools
-
pyproject.toml - Dependencies and project configuration
-
.env.example - Example
.envfile containing a secret required by one of the generated tools inserver.py
The generated server.py includes proper command-line argument handling and three example tools:
#!/usr/bin/env python3
"""simple_server MCP server"""
import sys
from typing import Annotated
import httpx
from arcade_mcp_server import Context, MCPApp
from arcade_mcp_server.auth import Reddit
app = MCPApp(name="simple_server", version="1.0.0", log_level="DEBUG")
@app.tool
def greet(name: Annotated[str, "The name of the person to greet"]) -> str:
"""Greet a person by name."""
return f"Hello, {name}!"
# To use this tool locally, you need to either set the secret in the .env file or as an environment variable
@app.tool(requires_secrets=["MY_SECRET_KEY"])
def whisper_secret(context: Context) -> Annotated[str, "The last 4 characters of the secret"]:
"""Reveal the last 4 characters of a secret"""
# Secrets are injected into the context at runtime.
# LLMs and MCP clients cannot see or access your secrets
# You can define secrets in a .env file.
try:
secret = context.get_secret("MY_SECRET_KEY")
except Exception as e:
return str(e)
return "The last 4 characters of the secret are: " + secret[-4:]
# To use this tool locally, you need to install the Arcade CLI (uv tool install arcade-mcp)
# and then run 'arcade login' to authenticate.
@app.tool(requires_auth=Reddit(scopes=["read"]))
async def get_posts_in_subreddit(
context: Context, subreddit: Annotated[str, "The name of the subreddit"]
) -> dict:
"""Get posts from a specific subreddit"""
# Normalize the subreddit name
subreddit = subreddit.lower().replace("r/", "").replace(" ", "")
# Prepare the httpx request
# OAuth token is injected into the context at runtime.
# LLMs and MCP clients cannot see or access your OAuth tokens.
oauth_token = context.get_auth_token_or_empty()
headers = {
"Authorization": f"Bearer {oauth_token}",
"User-Agent": "{{ toolkit_name }}-mcp-server",
}
params = {"limit": 5}
url = f"https://oauth.reddit.com/r/{subreddit}/hot"
# Make the request
async with httpx.AsyncClient() as client:
response = await client.get(url, headers=headers, params=params)
response.raise_for_status()
# Return the response
return response.json()
# Run with specific transport
if __name__ == "__main__":
# Get transport from command line argument, default to "stdio"
# - "stdio" (default): Standard I/O for Claude Desktop, CLI tools, etc.
# Supports tools that require_auth or require_secrets out-of-the-box
# - "http": HTTPS streaming for Cursor, VS Code, etc.
# Does not support tools that require_auth or require_secrets unless the server is deployed
# using 'arcade deploy' or added in the Arcade Developer Dashboard with 'Arcade' server type
transport = sys.argv[1] if len(sys.argv) > 1 else "stdio"
# Run the server
app.run(transport=transport, host="127.0.0.1", port=8000)
This approach gives you:
-
Complete Project Setup - Everything you need in one command
-
Best Practices - Proper dependency management with pyproject.toml
-
Example Code - Learn from working examples of common patterns
-
Production Ready - Structured for growth and deployment
Running Your Server
Run your server directly with Python:
# Run with stdio transport (default)
uv run server.py
# Run with http transport via command line argument
uv run server.py http
# Or use python directly
python server.py http
python server.py stdio
Your server will start and listen for connections. With HTTP transport, you can access the API docs at http://127.0.0.1:8000/docs.
Configure MCP Clients
Once your server is running, connect it to your favorite AI assistant:
arcade configure claude # Configure Claude Desktop to connect to your stdio server in your current directory
arcade configure cursor --transport http --port 8080 # Configure Cursor to connect to your local HTTP server on port 8080
arcade configure vscode --entrypoint my_server.py # Configure VSCode to connect to your stdio server that will run when my_server.py is executed directly
Installing this Repo from Source
git clone https://github.com/ArcadeAI/arcade-mcp.git && cd arcade-mcp && make install
Support and Community
- Discord: Join our Discord community for real-time support and discussions.
- GitHub: Contribute or report issues on the Arcade GitHub repository.
- Documentation: Find in-depth guides and API references at Arcade Documentation.