Commit graph

10 commits

Author SHA1 Message Date
jottakka
fe8ddfd500
[TOO-326] Windows papercuts (#768)
<!-- CURSOR_SUMMARY -->
> [!NOTE]
> **Medium Risk**
> Touches authentication/login flow, credentials-file permissions, and
subprocess lifecycle behavior across platforms; while mostly defensive,
regressions could impact login or process management on Windows/macOS
runners.
> 
> **Overview**
> Improves Windows/cross-platform reliability across the CLI and MCP
server: OAuth login now binds the callback server to `127.0.0.1`, avoids
slow loopback reverse-DNS, adds a configurable callback timeout
(`--timeout` + env default), and opens URLs via a Windows-friendly
`_open_browser` to avoid flashing console windows.
> 
> Centralizes CLI output via a shared `console` that forces UTF-8 on
Windows, standardizes UTF-8 file reads/writes throughout, tightens
credentials-file permissions on Windows using `icacls`, and adds shared
Windows subprocess helpers for **no-window** process creation and
graceful termination (used by `deploy`, MCP reload, and usage-tracking
worker).
> 
> Updates client configuration UX/robustness (Windows AppData resolution
via `platformdirs`, Cursor config path fallbacks + compatibility writes,
overwrite warnings, absolute `uv` path for GUI clients, safer path
display) and improves `deploy` child-process handling to avoid
pipe-buffer deadlocks while giving better debug-aware error messages.
> 
> Expands CI to run tests on Linux/Windows/macOS, adds a no-auth CLI
integration workflow, disables usage tracking in toolkits CI, and adds
extensive regression tests for Windows signals, subprocess cleanup,
UTF-8, and config-path edge cases; bumps `arcade-core` to `4.4.2` and
`arcade-mcp-server` to `1.17.2` (with updated dependency pin).
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
0fabd8ca1cd647039ba6ddbdf3f7809c330bab9e. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-02-25 13:18:16 -03:00
jottakka
98fad93d21
Adding MCP Servers supports to Arcade Evals (#689)
# MCP Server Tool Evaluation Support

## Overview
Add support for evaluating tools from remote MCP servers without
requiring Python callables. Enables direct evaluation of any
MCP-compatible tool server.

## What's New

### Core Features
- **`MCPToolRegistry`**: Evaluate tools from a single MCP server
- **`CompositeMCPRegistry`**: Evaluate tools from multiple MCP servers
simultaneously
- **Automatic loaders**: `load_from_stdio()` and `load_from_http()` to
fetch tools from running servers
- **Automatic namespacing**: Tools prefixed with server name (e.g.,
`server_tool_name`)
- **Smart name resolution**: Use short names if unique, full names if
ambiguous
- **OpenAI strict mode**: Automatic schema conversion prevents parameter
hallucinations

### Usage

**Automatic Loading:**
```python
from arcade_evals import load_from_stdio, MCPToolRegistry

# Load tools automatically from MCP server
tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-github"])
registry = MCPToolRegistry(tools)
```

**Single MCP Server:**
```python
from arcade_evals import MCPToolRegistry, ExpectedToolCall

registry = MCPToolRegistry(mcp_tools)
suite = EvalSuite(catalog=registry)

suite.add_case(
    expected_tool_calls=[
        ExpectedToolCall(tool_name="tool_name", args={...})
    ]
)
```

**Multiple MCP Servers:**
```python
from arcade_evals import CompositeMCPRegistry, load_from_stdio

# Load from multiple servers
github_tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-github"])
slack_tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-slack"])

composite = CompositeMCPRegistry(
    tool_lists={
        "github": github_tools,
        "slack": slack_tools,
    }
)

suite = EvalSuite(catalog=composite)

suite.add_case(
    expected_tool_calls=[
        ExpectedToolCall(tool_name="github_list_issues", args={...})
    ]
)
```

## Implementation

### Files Changed
- **`libs/arcade-evals/arcade_evals/registry.py`** (NEW): Registry
abstractions and implementations
- **`libs/arcade-evals/arcade_evals/loaders.py`** (NEW): Automatic tool
loading from MCP servers
- **`libs/arcade-evals/arcade_evals/eval.py`** (MODIFIED): Enhanced
`ExpectedToolCall` and evaluation logic
- **`libs/arcade-evals/arcade_evals/__init__.py`** (MODIFIED): Exported
new registries and loaders

### Key Technical Details
- Added `BaseToolRegistry` interface for abstraction
- `MCPToolRegistry` handles single server tools
- `CompositeMCPRegistry` manages multiple servers with collision
detection
- `load_from_stdio()` and `load_from_http()` for automatic tool
discovery
- Fixed name normalization bug: MCP tools use underscores (not dots)
- Optimized tool copying: 2.5x faster via shallow copy

## Testing
-  41 tests passing (25 new tests added)
-  `test_eval_mcp_registry.py`: MCPToolRegistry functionality
-  `test_eval_composite_mcp.py`: CompositeMCPRegistry with multiple
servers
-  Verified backward compatibility with Python tools

## Backward Compatibility
 **100% backward compatible** - No breaking changes


## Breaking Changes
**None**


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds end-to-end eval UX: examples, a robust CLI runner, and rich
outputs.
> 
> - **New examples**: `eval_arcade_gateway.py`,
`eval_stdio_mcp_server.py`, `eval_http_mcp_server.py`,
`eval_comprehensive_comparison.py` with timeouts, error handling, and
track-based comparisons; detailed `README.md`
> - **CLI runner**: `arcade_cli/evals_runner.py` to execute
evals/capture in parallel with progress, error isolation, failed-only
filtering, context inclusion, and multi-provider/model support
> - **Output formatters**: `arcade_cli/formatters/` (txt, md, html,
json) for evals and capture; comparative and multi-model HTML with tabs
and context rendering
> - **Display refactor**: `display.py` now supports writing multiple
formats, failed-only disclaimers, include-context, and improved console
summaries
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
ff8acf9c34a6b61462a019a1ee9df081006517d0. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Francisco Liberal <francisco@arcade.dev>
Co-authored-by: Mateo Torres <torresmateo@gmail.com>
2026-01-07 20:26:23 -03:00
Eric Gustin
98fd13c4ed
Front-Door Auth (#696)
# Valuable references for the reviewer:
- Docs PR: https://github.com/ArcadeAI/docs/pull/583
- Implements Phase 1 of the following planning doc:
https://linear.app/arcadedev/project/arcade-mcp-supports-mcp-auth-front-door-auth-7cbaa20cb054/overview


https://github.com/user-attachments/assets/79ad43fd-f5e8-4793-a1dd-18b35acefdc3

# PR Description
Adds OAuth 2.1 Resource Server authentication to arcade-mcp-server,
enabling HTTP MCP servers to validate Bearer tokens on every request.
This unlocks tool-level authorization and secrets support for HTTP
servers.

- Multiple authorization server support
- Granular token validation options (verify_exp, verify_iat, verify_iss)
- Environment variable configuration
- OAuth discovery metadata endpoint
(/.well-known/oauth-protected-resource)
- Extracts sub claim from token as context.user_id
- Lifts transport restrictions for tools requiring auth/secrets on HTTP
when protected

```python
from arcade_mcp_server import MCPApp
from arcade_mcp_server.resource_server import ResourceServerAuth, AuthorizationServerEntry

resource_server_auth = ResourceServerAuth(
    canonical_url="http://127.0.0.1:8000/mcp",
    authorization_servers=[
        AuthorizationServerEntry(
            authorization_server_url="https://auth.example.com",
            issuer="https://auth.example.com",
            jwks_uri="https://auth.example.com/jwks",
        )
    ],
)

app = MCPApp(name="my_server", version="1.0.0", auth=resource_server_auth)
```

# Testing
Beyond the comprehensive unit tests, I also manually tested end-to-end
with WorkOS Authkit (DCR) and KeyCloak (non-DCR).

# Future Work
- CIMD support
- An `ArcadeResourceServer` to make adding front-door auth super easy
when using Arcade's Auth Server



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds OAuth 2.1 front-door auth (JWKS validation + OAuth discovery) and
propagates user identity to tools, enabling auth/secret-requiring tools
over HTTP.
> 
> - **Authentication (Front-Door OAuth 2.1)**
> - New `resource_server` module with `ResourceServerAuth`
(multi-authorization-server, metadata) and `JWKSTokenValidator`
(JWKS-based JWT validation) plus granular validation options.
> - ASGI `ResourceServerMiddleware` validates Bearer tokens on every
HTTP request and injects `resource_owner`.
> - OAuth discovery endpoint via FastAPI router at
`/.well-known/oauth-protected-resource[/<path>]`.
> - **Integration**
> - `MCPApp`/`worker` accept `auth`/`resource_server_validator`, mount
middleware, expose discovery; logs accepted auth servers.
> - HTTP transport (`http_streamable`) carries `SessionMessage` with
`resource_owner` from request → session.
> - `Context`/`Session`/`Server` plumb `resource_owner`; `Server`
selects `user_id` preferring token `sub`.
> - **Behavior Changes**
> - HTTP transport restriction lifted for tools requiring
`authorization`/`secrets` when request is authenticated; otherwise
blocked with actionable error.
> - **Configuration**
> - Env-var based auth config via `MCP_RESOURCE_SERVER_*` in
`MCPSettings.ResourceServerSettings`; `.env` auto-load.
> - **Telemetry**
>   - Usage tracking records `resource_server_type` on server start.
> - **Examples**
> - New `examples/mcp_servers/authorization` sample server (HTTP auth,
secrets, Reddit tool) with Docker setup.
> - **Tests**
> - Extensive unit tests for validators, middleware, env config,
multi-AS, transport rules, and app integration.
> - **Version**
> - Bump `arcade-mcp-server` to `1.12.0`; minor docstring tweak in
`__init__.py`.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
d1116cdcafb0c7cb8f91e66682eb1fbae380da31. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->





Resolves TOO-152
2025-12-11 12:51:20 -08:00
Eric Gustin
5602578b2f
Worker Stability (#688)
This PR does three things:
1. Executes synchronous tool calls in thread pool allowing for up to 4 +
# of CPUs executions in parallel.
2. Makes force quitting via double SIGINT/SIGTERM possible and via
single SIGINT/SIGTERM + graceful shutdown timeout expiry possible, even
if there are active connections.
3. Sets `timeout_graceful_shutdown` to
`ARCADE_UVICORN_TIMEOUT_GRACEFUL_SHUTDOWN` env var if set, else defaults
to 15.
4. Disable the worker health check span to reduce noise

Tradeoffs:
Since this PR introduces executing synchronous tools via `await
asyncio.to_thread(func, **func_args)`, this means that there is no way
for the thread to be killed until it finishes. The ramifications of this
is that the force quitting logic that is also implemented in this PR has
to be very harsh `os._exit(1)` just in case there is a sync tool
actively executing. This means that `MCPApp` teardown logic will not
execute when force quitting is required. Although this was already the
case because we weren't previously able to force quit! This tradeoff is
justified for now since "parallel" tool executions will relieve us of
many worker timeouts that we are seeing in prod.

Future work:
Minimize/eliminate the need for `os._exit(1)` such that `MCPApp`
teardown logic will always execute, even when force quitting. The
solution will likely be moving away from `await asyncio.to_thread(func,
**func_args)` (while maintaining "parallelism" and then utilize the
`TaskTrackerMiddleware` introduced in this PR to cancel all of the
active HTTP requests.

Resolves PLT-713
2025-11-20 11:13:41 -08:00
Eric Gustin
4ca824cf8f
Improve arcade deploy CLI Command (#634)
Also fleshed out `arcade server` commands and MCPApp.name validation.

Example of output of `arcade deploy`:
<img width="2112" height="1320" alt="image"
src="https://github.com/user-attachments/assets/51fd3dd9-0ff1-442c-a9bb-1dbcd7337e7a"
/>
2025-11-03 11:19:04 -08:00
Eric Gustin
19bbaddf75
Reload for MCPApp (#622)
Previously, MCPApp did not truly have reload capabilities. Instead, if
`reload=True`, then under the hood we would just change over to the
module execution code path (e.g., `arcade mcp`, or `python -m
arcade_mcp_server`). This was bad because custom `MCPApp` startup code
was not being executed and tools that were not added to `MCPApp`'s
catalog were being discovered and added to the server.

`MCPApp` now contains its own custom reload logic. It doesn't use
uvicorn's reload because uvicorn's discovery & factory pattern wasn't
the best fit for `MCPApp`'s self-contained pattern.

Now when `MCPApp.run(reload=True)` is called, `MCPApp` becomes the
parent process that manages reload itself.
2025-10-17 17:38:11 -07:00
Eric Gustin
a8fc6691e7
arcade deploy for MCP Servers (#618)
Blocked by https://github.com/ArcadeAI/arcade-mcp/pull/614 (and the
reason for failing tests)

# PR Description
`arcade deploy` will deploy your local MCP server to Arcade. `arcade
deploy` should be executed at the root of your MCP Server package.
Before deploying, the command runs your server locally to ensure your
project is setup correctly and the server runs properly. `arcade deploy`
assumes your entrypoint file will execute `MCPApp.run` when the file is
invoked directly. This means you must either have an `if __name__ ==
"__main__" block that contains `MCPApp.run`, or `MCPApp.run` should be
top-level code (unindented living directly in the body of the file).

<img width="3318" height="594" alt="image"
src="https://github.com/user-attachments/assets/8249843e-6f9d-4d01-854d-356b0aae5055"
/>

<img width="1662" height="1056" alt="image"
src="https://github.com/user-attachments/assets/f44951f2-2718-4799-aecc-0e22c1b951b8"
/>
2025-10-16 09:00:10 -07:00
Eric Gustin
83c0eeab2b
Fix server info bug (#614)
Name, title, version, etc. for an `MCPApp` were being overwritten by its
internal `MCPServer`.

⚠️ this is blocking `arcade deploy` from working
2025-10-13 13:04:18 -07:00
Eric Gustin
805ad2d888
Add envvar overrides for mcp app (#606)
Needed for deployed MCP servers so that the Engine can override port,
host, and transport.
2025-10-07 12:43:59 -07:00
Eric Gustin
3424ec8219
MCP Local (#563)
Versions:
* arcade-mcp\==1.0.0rc1
* arcade-mcp-server\==1.0.0rc1
* arcade-core\==2.5.0rc1
* arcade-tdk\==2.6.0rc1
* arcade-serve\==2.2.0rc1

### Summary
Adds first-class MCP support across Arcade, introduces a new MCP server
and CLI, unifies the project under the arcade-mcp name, overhauls
templates/scaffolding, and improves developer tooling, secrets
management, and examples.

### Highlights
- **MCP Server & Core**
- New MCP server with stdio and HTTP/SSE transports, session management,
resumability, and lifecycle handling.
- FastAPI-like `MCPApp` for building servers with lazy init; integrated
worker+MCP HTTP app option.
- Middleware system (logging and error handling), robust exception
hierarchy, and Pydantic-based settings.
- Async-safe managers for tools, resources, and prompts backed by
registries and locks.
- Developer-facing, transport-agnostic runtime context interfaces (logs,
tools, prompts, resources, sampling, UI, notifications).
- Conversion from Arcade ToolDefinition to MCP tool schema; OpenAI JSON
tool schema converter.
  - Parser supports `@app.tool`/`@app.tool(...)` decorators.

- **CLI**
  - New `mcp` command to run MCP servers with stdio or HTTP/SSE.
- New `secret` command to set/list/unset tool secrets (supports .env
input, preserves original casing for lookups).
- `new` command refactored; option to create a full toolkit package with
scaffolding.
  - `chat` command removed.
- `serve.py` imports updated to `arcade_serve.fastapi.telemetry`;
version retrieval now uses `arcade-mcp`.
  - `show.py` refactor to use new local catalog utilities.
- `display_tool_details` improved: adds “Default” column and handles
nested properties.

- **Configuration & Discovery**
- New `configure.py` to set up Claude Desktop, Cursor, and VS Code to
connect to local or Arcade Cloud MCP servers.
- Discovery utilities to find/install toolkits, build `ToolCatalog`s,
analyze files for tools, load kits from directories (pyproject parsing),
and build minimal toolkits.
- Better handling of provider API key resolution and evaluation suite
loading.

- **Templates & Scaffolding**
- Reorganized template structure (minimal vs full); moved
`.pre-commit-config.yaml`, `.ruff.toml`, license, Makefile, README,
tests, and tools layout to correct paths.
  - Minimal template adds `.env.example` for runtime secret injection.
- Template pyproject updated for MCP servers; includes sample server
with greeting and secret-reveal tools.
  - Authorization flow in templates simplified.

- **Repo-wide Renaming & Examples**
- Migrates references from `arcade-ai` to `arcade-mcp` across READMEs,
scripts, and package metadata.
- Examples updated (LangChain/LangGraph/AI SDK/TypeScript) and package
name changed to `arcade-mcp-sdk`.

- **Evals & Core Utilities**
- Evals now use OpenAI tooling format (`OpenAIToolList`, `to_openai`);
`tool_eval` takes `provider_api_key`.
- Core utilities: fixed `does_function_return_value` by dedenting before
parse; version bump to `2.5.0rc1` and dependency cleanup.

- **Tooling & CI**
- `setup-uv-env` action splits toolkit vs contrib dependency
installation.
- Pre-commit: excludes `libs/arcade-mcp-server/mkdocs.yml` and
`libs/tests/` from YAML and Ruff hooks; Ruff per-file ignores (e.g.,
C901 in `libs/**/*.py`, TRY400 in server docs paths).
- Makefile updates for uv env setup, quality checks, tests, builds, and
new `shell` target.
  - Added Makefile to MCP server library to streamline dev workflow.

- **Cleanup**
  - Removed `claude.json` config.
- Simplified stdio entrypoint; removed unused imports (`arcade_gmail`,
`arcade_search`).

### Breaking Changes
- **CLI**: `chat` command removed; use `mcp`, `secret`, and updated
`new`.
- **Naming**: All users should update references from `arcade-ai` to
`arcade-mcp`.
- **Templates**: File paths moved; downstream scripts referencing old
template locations may need updates.

### Getting Started
- Run an MCP server:
  - `arcade mcp --stdio --toolkits your_toolkit`
  - `arcade mcp --http --toolkits your_toolkit`
- Manage secrets:
  - `arcade secret set your_toolkit KEY=value`
  - `arcade secret list your_toolkit`
  - `arcade secret unset your_toolkit KEY`
- Configure clients:
- `arcade configure` to set up Claude Desktop, Cursor, and VS Code for
local/Arcade Cloud MCP.

---------

Co-authored-by: Sam Partee <sam@arcade-ai.com>
Co-authored-by: Shub <125150494+shubcodes@users.noreply.github.com>
2025-09-25 15:28:15 -07:00