arcade-mcp

jottakka 7472b18106 Fixing bug with multiple providers + stats for multiple runs (#752)	2026-02-09 14:25:28 -03:00

@EricGustin you can use this cli command:

```
uv run arcade evals mcp_building_evals_results/eval_toolkit_iteration_dict.py \
  -p openai:gpt-4o,gpt-4o-mini \
  -p anthropic:claude-sonnet-4-20250514 \
  -k openai:$OPENAI_API_KEY \
  -k anthropic:$ANTHROPIC_API_KEY \
  -d \
  --num-runs 3 \
  --seed random \
  --multi-run-pass-rule majority \
  --max-concurrent 6 \
  -o mcp_building_evals_results/results
```

> [!NOTE]
> Medium Risk
> Touches core eval execution and all result formatters while adding new CLI inputs and output schema (`run_stats`/`critic_stats` and capture `runs`), so regressions could affect evaluation results and report compatibility despite being additive and validated.
>
> Overview
> Adds multi-run evaluation support to `arcade evals` via new flags `--num-runs`, `--seed`, and `--multi-run-pass-rule`, with upfront validation and plumbing through the CLI runner into eval/capture suite execution.
>
> Fixes provider selection UX/bug by making `--use-provider/-p` repeatable (instead of a space-delimited string), updates docs/examples accordingly, and extends capture mode to optionally record per-run tool calls (`CapturedRun`) when `num_runs > 1`.
>
> Enhances all output formatters (HTML/Markdown/Text/JSON) to propagate and display per-case `run_stats` and `critic_stats`, including new HTML UI for run tabs/cards and comparative tables showing mean ± stddev when multi-run data is present.
>
> Written by Cursor Bugbot for commit 2ee1654b7d1fbb9538373507355636164b16a066.
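
The multi-run behavior described in the commit note can be illustrated with a minimal sketch. It assumes that `majority` means strictly more than half of the runs pass and that the reported spread is the sample standard deviation; the commit message confirms neither. The function `aggregate_runs` and the dictionary keys it returns are hypothetical names for illustration, not part of the `arcade evals` API or its `run_stats` schema.

```python
from statistics import mean, stdev


def aggregate_runs(scores, passed, pass_rule="majority"):
    """Summarize repeated runs of a single eval case.

    Hypothetical helper: `scores` holds one numeric score per run and
    `passed` holds one boolean per run. Only the "majority" rule named
    in the commit message is sketched here.
    """
    if not scores or len(scores) != len(passed):
        raise ValueError("scores and passed must be non-empty and equal in length")

    if pass_rule != "majority":
        raise ValueError(f"unsupported pass rule in this sketch: {pass_rule!r}")

    # Assumption: "majority" means strictly more than half of the runs passed.
    case_passed = 2 * sum(passed) > len(passed)

    # Assumption: sample standard deviation; it is undefined for a single run,
    # so this sketch reports 0.0 in that case.
    spread = stdev(scores) if len(scores) > 1 else 0.0

    return {
        "num_runs": len(scores),
        "mean": mean(scores),
        "stddev": spread,
        "passed": case_passed,
    }


if __name__ == "__main__":
    summary = aggregate_runs([1.0, 0.5, 1.0], [True, False, True])
    print(f"{summary['mean']:.2f} ± {summary['stddev']:.2f}, passed={summary['passed']}")
```

With three runs, as in `--num-runs 3`, two passing runs are enough for the case to pass under this reading of the rule.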
..
arcade_math	Add toolkits (#514)	2025-07-25 15:44:06 -07:00
evals	Fixing bug with multiple providers + stats for multiple runs (#752)	2026-02-09 14:25:28 -03:00
tests	Add toolkits (#514)	2025-07-25 15:44:06 -07:00
.pre-commit-config.yaml	Add toolkits (#514)	2025-07-25 15:44:06 -07:00
.ruff.toml	Add toolkits (#514)	2025-07-25 15:44:06 -07:00
LICENSE	Add toolkits (#514)	2025-07-25 15:44:06 -07:00
Makefile	Add toolkits (#514)	2025-07-25 15:44:06 -07:00
pyproject.toml	Update toolkit deps (#627)	2025-10-16 09:05:56 -07:00