arcade-mcp

Author	SHA1	Message	Date
jottakka	fe8ddfd500	[TOO-326] Windows papercuts (#768)	2026-02-25 13:18:16 -03:00

> [!NOTE]
> Medium Risk
> Touches authentication/login flow, credentials-file permissions, and subprocess lifecycle behavior across platforms; while mostly defensive, regressions could impact login or process management on Windows/macOS runners.
>
> Overview
> Improves Windows/cross-platform reliability across the CLI and MCP server: OAuth login now binds the callback server to `127.0.0.1`, avoids slow loopback reverse-DNS, adds a configurable callback timeout (`--timeout` + env default), and opens URLs via a Windows-friendly `_open_browser` to avoid flashing console windows.
>
> Centralizes CLI output via a shared `console` that forces UTF-8 on Windows, standardizes UTF-8 file reads/writes throughout, tightens credentials-file permissions on Windows using `icacls`, and adds shared Windows subprocess helpers for no-window process creation and graceful termination (used by `deploy`, MCP reload, and usage-tracking worker).
>
> Updates client configuration UX/robustness (Windows AppData resolution via `platformdirs`, Cursor config path fallbacks + compatibility writes, overwrite warnings, absolute `uv` path for GUI clients, safer path display) and improves `deploy` child-process handling to avoid pipe-buffer deadlocks while giving better debug-aware error messages.
>
> Expands CI to run tests on Linux/Windows/macOS, adds a no-auth CLI integration workflow, disables usage tracking in toolkits CI, and adds extensive regression tests for Windows signals, subprocess cleanup, UTF-8, and config-path edge cases; bumps `arcade-core` to `4.4.2` and `arcade-mcp-server` to `1.17.2` (with updated dependency pin).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 0fabd8ca1cd647039ba6ddbdf3f7809c330bab9e. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

jottakka	7472b18106	Fixing bug with multiple providers + stats for multiple runs (#752)	2026-02-09 14:25:28 -03:00

@EricGustin you can use this cli command:

```
uv run arcade evals mcp_building_evals_results/eval_toolkit_iteration_dict.py \
  -p openai:gpt-4o,gpt-4o-mini \
  -p anthropic:claude-sonnet-4-20250514 \
  -k openai:$OPENAI_API_KEY \
  -k anthropic:$ANTHROPIC_API_KEY \
  -d \
  --num-runs 3 \
  --seed random \
  --multi-run-pass-rule majority \
  --max-concurrent 6 \
  -o mcp_building_evals_results/results
```

---

> [!NOTE]
> Medium Risk
> Touches core eval execution and all result formatters while adding new CLI inputs and output schema (`run_stats`/`critic_stats` and capture `runs`), so regressions could affect evaluation results and report compatibility despite being additive and validated.
>
> Overview
> Adds multi-run evaluation support to `arcade evals` via new flags `--num-runs`, `--seed`, and `--multi-run-pass-rule`, with upfront validation and plumbing through the CLI runner into eval/capture suite execution.
>
> Fixes provider selection UX/bug by making `--use-provider/-p` repeatable (instead of a space-delimited string), updates docs/examples accordingly, and extends capture mode to optionally record per-run tool calls (`CapturedRun`) when `num_runs > 1`.
>
> Enhances all output formatters (HTML/Markdown/Text/JSON) to propagate and display per-case `run_stats` and `critic_stats`, including new HTML UI for run tabs/cards and comparative tables showing mean ± stddev when multi-run data is present.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 2ee1654b7d1fbb9538373507355636164b16a066.</sup>

Eric Gustin	a4160dd9fe	Four bug fixes (#754)	2026-01-29 15:12:06 -08:00

1. Resolves [TOO-363](https://linear.app/arcadedev/issue/TOO-363/arcade-deploy-fails-when-additional-deps-are-added-to-the-server).
2. Resolves [TOO-364](https://linear.app/arcadedev/issue/TOO-364/arcade-cores-tool-skip-logic-is-missing-case-for-direct-execution).
3. Resolves [TOO-358](https://linear.app/arcadedev/issue/TOO-358/missing-evals-error-message-shows-wrong-command).
4. Resolves [TOO-365](https://linear.app/arcadedev/issue/TOO-365/arcade-evals-unit-tests-are-hanging).

---

> [!NOTE]
> Medium Risk
> Medium risk because it changes how `arcade deploy` spawns the server process and adjusts toolkit discovery skip logic, which can affect deployments and tool discovery; however, the changes are small and covered by new unit/integration tests.
>
> Overview
> `arcade deploy` now starts the validation server using the project’s `.venv` interpreter (via `find_python_interpreter`) instead of the CLI’s own `sys.executable`, preventing missing dependency failures when the CLI is installed in an isolated env.
>
> `arcade-core`’s `Toolkit.tools_from_directory` skip logic is hardened to also skip the currently executing entrypoint by module name (`__main__.__spec__.name`) when file paths don’t match (e.g., bundled execution). CLI error printing now escapes plain messages to avoid rich markup issues, and `arcade-evals` lock acquisition accepts an optional timeout default.
>
> Adds unit tests for the new toolkit skip behavior and an integration test that boots the MCP server via direct Python invocation to mirror deployment behavior, and bumps `arcade-core`, `arcade-mcp-server`, and root dependency versions accordingly.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit e7785634c231c059f2e0bd1bc73a56bd7470a494.</sup>

jottakka	98fad93d21	Adding MCP Servers supports to Arcade Evals (#689)	2026-01-07 20:26:23 -03:00

# MCP Server Tool Evaluation Support

## Overview

Add support for evaluating tools from remote MCP servers without requiring Python callables. Enables direct evaluation of any MCP-compatible tool server.

## What's New

### Core Features

- `MCPToolRegistry`: Evaluate tools from a single MCP server
- `CompositeMCPRegistry`: Evaluate tools from multiple MCP servers simultaneously
- Automatic loaders: `load_from_stdio()` and `load_from_http()` to fetch tools from running servers
- Automatic namespacing: Tools prefixed with server name (e.g., `server_tool_name`)
- Smart name resolution: Use short names if unique, full names if ambiguous
- OpenAI strict mode: Automatic schema conversion prevents parameter hallucinations

### Usage

Automatic Loading:

```python
from arcade_evals import load_from_stdio, MCPToolRegistry

# Load tools automatically from MCP server
tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-github"])
registry = MCPToolRegistry(tools)
```

Single MCP Server:

```python
from arcade_evals import MCPToolRegistry, ExpectedToolCall

registry = MCPToolRegistry(mcp_tools)
suite = EvalSuite(catalog=registry)
suite.add_case(
    expected_tool_calls=[
        ExpectedToolCall(tool_name="tool_name", args={...})
    ]
)
```

Multiple MCP Servers:

```python
from arcade_evals import CompositeMCPRegistry, load_from_stdio

# Load from multiple servers
github_tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-github"])
slack_tools = load_from_stdio(["npx", "-y", "@modelcontextprotocol/server-slack"])

composite = CompositeMCPRegistry(
    tool_lists={
        "github": github_tools,
        "slack": slack_tools,
    }
)
suite = EvalSuite(catalog=composite)
suite.add_case(
    expected_tool_calls=[
        ExpectedToolCall(tool_name="github_list_issues", args={...})
    ]
)
```

## Implementation

### Files Changed

- `libs/arcade-evals/arcade_evals/registry.py` (NEW): Registry abstractions and implementations
- `libs/arcade-evals/arcade_evals/loaders.py` (NEW): Automatic tool loading from MCP servers
- `libs/arcade-evals/arcade_evals/eval.py` (MODIFIED): Enhanced `ExpectedToolCall` and evaluation logic
- `libs/arcade-evals/arcade_evals/__init__.py` (MODIFIED): Exported new registries and loaders

### Key Technical Details

- Added `BaseToolRegistry` interface for abstraction
- `MCPToolRegistry` handles single server tools
- `CompositeMCPRegistry` manages multiple servers with collision detection
- `load_from_stdio()` and `load_from_http()` for automatic tool discovery
- Fixed name normalization bug: MCP tools use underscores (not dots)
- Optimized tool copying: 2.5x faster via shallow copy

## Testing

- ✅ 41 tests passing (25 new tests added)
- ✅ `test_eval_mcp_registry.py`: MCPToolRegistry functionality
- ✅ `test_eval_composite_mcp.py`: CompositeMCPRegistry with multiple servers
- ✅ Verified backward compatibility with Python tools

## Backward Compatibility

✅ 100% backward compatible - No breaking changes

## Breaking Changes

None

---

> [!NOTE]
> Adds end-to-end eval UX: examples, a robust CLI runner, and rich outputs.
>
> - New examples: `eval_arcade_gateway.py`, `eval_stdio_mcp_server.py`, `eval_http_mcp_server.py`, `eval_comprehensive_comparison.py` with timeouts, error handling, and track-based comparisons; detailed `README.md`
> - CLI runner: `arcade_cli/evals_runner.py` to execute evals/capture in parallel with progress, error isolation, failed-only filtering, context inclusion, and multi-provider/model support
> - Output formatters: `arcade_cli/formatters/` (txt, md, html, json) for evals and capture; comparative and multi-model HTML with tabs and context rendering
> - Display refactor: `display.py` now supports writing multiple formats, failed-only disclaimers, include-context, and improved console summaries
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit ff8acf9c34a6b61462a019a1ee9df081006517d0.</sup>

---------

Co-authored-by: Francisco Liberal <francisco@arcade.dev>
Co-authored-by: Mateo Torres <torresmateo@gmail.com>

Eric Gustin	3424ec8219	MCP Local (#563)	2025-09-25 15:28:15 -07:00

Versions:

* arcade-mcp==1.0.0rc1
* arcade-mcp-server==1.0.0rc1
* arcade-core==2.5.0rc1
* arcade-tdk==2.6.0rc1
* arcade-serve==2.2.0rc1

### Summary

Adds first-class MCP support across Arcade, introduces a new MCP server and CLI, unifies the project under the arcade-mcp name, overhauls templates/scaffolding, and improves developer tooling, secrets management, and examples.

### Highlights

- MCP Server & Core
  - New MCP server with stdio and HTTP/SSE transports, session management, resumability, and lifecycle handling.
  - FastAPI-like `MCPApp` for building servers with lazy init; integrated worker+MCP HTTP app option.
  - Middleware system (logging and error handling), robust exception hierarchy, and Pydantic-based settings.
  - Async-safe managers for tools, resources, and prompts backed by registries and locks.
  - Developer-facing, transport-agnostic runtime context interfaces (logs, tools, prompts, resources, sampling, UI, notifications).
  - Conversion from Arcade ToolDefinition to MCP tool schema; OpenAI JSON tool schema converter.
  - Parser supports `@app.tool`/`@app.tool(...)` decorators.
- CLI
  - New `mcp` command to run MCP servers with stdio or HTTP/SSE.
  - New `secret` command to set/list/unset tool secrets (supports .env input, preserves original casing for lookups).
  - `new` command refactored; option to create a full toolkit package with scaffolding.
  - `chat` command removed.
  - `serve.py` imports updated to `arcade_serve.fastapi.telemetry`; version retrieval now uses `arcade-mcp`.
  - `show.py` refactor to use new local catalog utilities.
  - `display_tool_details` improved: adds “Default” column and handles nested properties.
- Configuration & Discovery
  - New `configure.py` to set up Claude Desktop, Cursor, and VS Code to connect to local or Arcade Cloud MCP servers.
  - Discovery utilities to find/install toolkits, build `ToolCatalog`s, analyze files for tools, load kits from directories (pyproject parsing), and build minimal toolkits.
  - Better handling of provider API key resolution and evaluation suite loading.
- Templates & Scaffolding
  - Reorganized template structure (minimal vs full); moved `.pre-commit-config.yaml`, `.ruff.toml`, license, Makefile, README, tests, and tools layout to correct paths.
  - Minimal template adds `.env.example` for runtime secret injection.
  - Template pyproject updated for MCP servers; includes sample server with greeting and secret-reveal tools.
  - Authorization flow in templates simplified.
- Repo-wide Renaming & Examples
  - Migrates references from `arcade-ai` to `arcade-mcp` across READMEs, scripts, and package metadata.
  - Examples updated (LangChain/LangGraph/AI SDK/TypeScript) and package name changed to `arcade-mcp-sdk`.
- Evals & Core Utilities
  - Evals now use OpenAI tooling format (`OpenAIToolList`, `to_openai`); `tool_eval` takes `provider_api_key`.
  - Core utilities: fixed `does_function_return_value` by dedenting before parse; version bump to `2.5.0rc1` and dependency cleanup.
- Tooling & CI
  - `setup-uv-env` action splits toolkit vs contrib dependency installation.
  - Pre-commit: excludes `libs/arcade-mcp-server/mkdocs.yml` and `libs/tests/` from YAML and Ruff hooks; Ruff per-file ignores (e.g., C901 in `libs/*/.py`, TRY400 in server docs paths).
  - Makefile updates for uv env setup, quality checks, tests, builds, and new `shell` target.
  - Added Makefile to MCP server library to streamline dev workflow.
- Cleanup
  - Removed `claude.json` config.
  - Simplified stdio entrypoint; removed unused imports (`arcade_gmail`, `arcade_search`).

### Breaking Changes

- CLI: `chat` command removed; use `mcp`, `secret`, and updated `new`.
- Naming: All users should update references from `arcade-ai` to `arcade-mcp`.
- Templates: File paths moved; downstream scripts referencing old template locations may need updates.

### Getting Started

- Run an MCP server:
  - `arcade mcp --stdio --toolkits your_toolkit`
  - `arcade mcp --http --toolkits your_toolkit`
- Manage secrets:
  - `arcade secret set your_toolkit KEY=value`
  - `arcade secret list your_toolkit`
  - `arcade secret unset your_toolkit KEY`
- Configure clients:
  - `arcade configure` to set up Claude Desktop, Cursor, and VS Code for local/Arcade Cloud MCP.

---------

Co-authored-by: Sam Partee <sam@arcade-ai.com>
Co-authored-by: Shub <125150494+shubcodes@users.noreply.github.com>

Sam Partee	b6b4cd0a4c	🏗️ Restructure: Multi-Package Architecture + uv Migration (#412)	2025-06-11 16:48:17 -07:00

### Overview

Major restructuring from monolithic `arcade-ai` package to modular library architecture with standardized uv-based dependency management.

![arcade-ai Monorepo (2)](https://github.com/user-attachments/assets/25f102b0-bb87-4a04-9701-d227d05664b1)

### New Package Structure

- `arcade-tdk` - Lightweight toolkit development kit (core decorators, auth)
- `arcade-core` - Core execution engine and catalog functionality
- `arcade-serve` - FastAPI/MCP server components
- `arcade-ai` - Meta package that includes CLI functionality. Optionally include evals via the `evals` extra. Optionally include all packages via the `all` extra.

### Key Benefits

- Lighter Dependencies: Toolkits now depend only on `arcade-tdk` (~2 deps) vs full `arcade-ai` (~30+ deps)
- Faster Builds: uv provides 10-100x faster dependency resolution and installation
- Better Modularity: Clear separation of concerns, consumers import only what they need
- Standard Tooling: Eliminates custom poetry scripts, uses standard Python packaging

### Migration Impact

- All 20 toolkits converted from poetry → uv with `arcade-tdk` dependencies plus `arcade-ai[evals]` and `arcade-serve` dev dependencies. When developing locally, devs should install toolkits via `make install-local`.
- Modern Python 3.10+ type hints throughout
- Standardized build system with hatchling backend
- Enhanced Makefile with robust toolkit management commands
- Removed `arcade dev` CLI command
- Reduce the number of files created by `arcade new` and add an option to not generate a tests and evals folder.

This foundation enables faster development cycles and cleaner dependency chains for the growing toolkit ecosystem.

### Todo After this PR is merged

- [ ] Post-merge workflow(s) (release & publish containers, etc)
- [ ] Release order plan. @EricGustin suggests releasing in the following order:
  1. `arcade-core` version 0.1.0
  2. `arcade-serve` version 0.1.0 and `arcade-tdk` version 0.1.0
  3. `arcade-ai` version 2.0.0
  4. Patch release for all toolkits (all changes in toolkits are internal refactors)
- [ ] [Update docs](https://github.com/ArcadeAI/docs/pull/318)

---------

Co-authored-by: Eric Gustin <eric@arcade.dev>
Co-authored-by: Eric Gustin <34000337+EricGustin@users.noreply.github.com>

6 commits
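
Commit 7472b18106 above mentions a `majority` value for `--multi-run-pass-rule` and tables that show mean ± stddev across runs. The following is a minimal sketch of how such an aggregation could work, not the arcade-evals implementation. The function name `summarize_runs`, the `pass_threshold` parameter, the returned keys, and the choice to treat a tie as a failure are all assumptions made for illustration.

```python
from statistics import mean, stdev


def summarize_runs(scores, pass_threshold=0.8, rule="majority"):
    """Combine per-run scores for one eval case (hypothetical helper).

    scores: one score in [0, 1] per run.
    pass_threshold: assumed per-run pass cutoff, not taken from the commit.
    rule: only "majority" is sketched, since it is the only value the log shows.
    """
    if not scores:
        raise ValueError("at least one run is required")
    if rule != "majority":
        raise ValueError(f"rule not covered by this sketch: {rule!r}")

    pass_count = sum(1 for s in scores if s >= pass_threshold)
    return {
        # Strict majority: a tie (e.g. 1 of 2 runs) counts as a failure here.
        "passed": pass_count > len(scores) / 2,
        "pass_count": pass_count,
        "mean": mean(scores),
        # Sample standard deviation is undefined for a single run.
        "stddev": stdev(scores) if len(scores) > 1 else 0.0,
    }


summary = summarize_runs([0.9, 0.7, 1.0])
print(f"passed={summary['passed']} score={summary['mean']:.2f} ± {summary['stddev']:.2f}")
# prints: passed=True score=0.87 ± 0.15
```

Two of the three runs clear the assumed 0.8 cutoff, so the case passes under a strict majority even though one run fell short. Reporting the spread alongside the mean is what makes that one weak run visible.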