### Overview

Major restructuring from monolithic `arcade-ai` package to modular library architecture with standardized uv-based dependency management.

### New Package Structure

- **`arcade-tdk`** - Lightweight toolkit development kit (core decorators, auth)
- **`arcade-core`** - Core execution engine and catalog functionality
- **`arcade-serve`** - FastAPI/MCP server components
- **`arcade-ai`** - Meta package that includes CLI functionality. Optionally include evals via the `evals` extra. Optionally include all packages via the `all` extra.

### Key Benefits

- **Lighter Dependencies**: Toolkits now depend only on `arcade-tdk` (~2 deps) vs full `arcade-ai` (~30+ deps)
- **Faster Builds**: uv provides 10-100x faster dependency resolution and installation
- **Better Modularity**: Clear separation of concerns, consumers import only what they need
- **Standard Tooling**: Eliminates custom poetry scripts, uses standard Python packaging

### Migration Impact

- All 20 toolkits converted from poetry → uv with `arcade-tdk` dependencies plus `arcade-ai[evals]` and `arcade-serve` dev dependencies. When developing locally, devs should install toolkits via `make install-local`.
- Modern Python 3.10+ type hints throughout
- Standardized build system with hatchling backend
- Enhanced Makefile with robust toolkit management commands
- Removed `arcade dev` CLI command
- Reduced the number of files created by `arcade new` and added an option to skip generating the tests and evals folders.

This foundation enables faster development cycles and cleaner dependency chains for the growing toolkit ecosystem.

### Todo After this PR is merged

- [ ] Post-merge workflow(s) (release & publish containers, etc.)
- [ ] Release order plan. @EricGustin suggests releasing in the following order:
    1. `arcade-core` version 0.1.0
    2. `arcade-serve` version 0.1.0 and `arcade-tdk` version 0.1.0
    3. `arcade-ai` version 2.0.0
    4. Patch release for all toolkits (all changes in toolkits are internal refactors)
- [ ] [Update docs](https://github.com/ArcadeAI/docs/pull/318)

---------

Co-authored-by: Eric Gustin <eric@arcade.dev>
Co-authored-by: Eric Gustin <34000337+EricGustin@users.noreply.github.com>
Arcade Evals
Evaluation toolkit for testing Arcade tools.
Overview
Arcade Evals provides comprehensive evaluation capabilities for Arcade tools:
- Evaluation Framework: Cases, suites, and rubrics for systematic testing
- Critics: Different types of comparisons (binary, numeric, similarity, datetime)
- Tool Evaluation: Decorators and utilities for evaluating tool performance
- Result Analysis: Comprehensive evaluation results and reporting
Installation
pip install 'arcade-ai[evals]'
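Because the evaluation package ships as an optional extra, it can be useful to confirm that it is importable before running a suite. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of Arcade.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # `arcade_evals` is the import name used by the examples in this README.
    if has_module("arcade_evals"):
        print("arcade_evals is available")
    else:
        print("arcade_evals is missing; try: pip install 'arcade-ai[evals]'")
```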
Usage
Basic Evaluation
    from arcade_evals import EvalCase, EvalSuite, tool_eval

    # Create evaluation cases
    case1 = EvalCase(
        input={"query": "What is 2+2?"},
        expected_output="4"
    )
    case2 = EvalCase(
        input={"query": "What is the capital of France?"},
        expected_output="Paris"
    )

    # Create evaluation suite
    suite = EvalSuite(cases=[case1, case2])

    # Evaluate a tool
    @tool_eval(suite)
    def my_calculator(query: str) -> str:
        # Tool implementation
        return "4" if "2+2" in query else "Unknown"
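To make the idea of running a suite of cases against a tool concrete, here is a minimal standalone sketch. It deliberately does not use `arcade_evals`: the `Case` dataclass and the `run_cases` helper are hypothetical stand-ins that only illustrate the pass/fail bookkeeping, under the assumption that a case passes when the output equals the expected string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """Hypothetical stand-in for an evaluation case."""
    query: str
    expected_output: str


def run_cases(tool: Callable[[str], str], cases: list[Case]) -> dict[str, bool]:
    """Call `tool` on each case's query and record whether the output matched."""
    return {case.query: tool(case.query) == case.expected_output for case in cases}


def my_calculator(query: str) -> str:
    # Same toy implementation as in the example above.
    return "4" if "2+2" in query else "Unknown"


results = run_cases(
    my_calculator,
    [
        Case("What is 2+2?", "4"),
        Case("What is the capital of France?", "Paris"),
    ],
)
passed = sum(results.values())
print(f"{passed}/{len(results)} cases passed")
```

With the toy calculator, only the arithmetic case passes, which is the kind of per-case signal a suite is meant to surface.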
Using Critics
    from arcade_evals import NumericCritic, SimilarityCritic

    # Numeric comparison
    numeric_critic = NumericCritic(tolerance=0.1)
    result = numeric_critic.evaluate(expected=10.0, actual=10.05)

    # Similarity comparison
    similarity_critic = SimilarityCritic(threshold=0.8)
    result = similarity_critic.evaluate(
        expected="The capital of France is Paris",
        actual="Paris is the capital of France"
    )
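The comparison types mentioned above (numeric, similarity, datetime) can be illustrated with plain standard-library Python. This sketch is not how `arcade_evals` implements its critics: the function names are hypothetical, and word-set Jaccard overlap is used here as a simple stand-in for whatever similarity measure the library actually applies.

```python
from datetime import datetime, timedelta


def numeric_match(expected: float, actual: float, tolerance: float) -> bool:
    """Pass when the absolute difference is within `tolerance`."""
    return abs(expected - actual) <= tolerance


def word_overlap(expected: str, actual: str) -> float:
    """Jaccard similarity over lowercase word sets (ignores word order)."""
    a = set(expected.lower().split())
    b = set(actual.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def datetime_match(expected: datetime, actual: datetime, tolerance: timedelta) -> bool:
    """Pass when the two timestamps differ by at most `tolerance`."""
    return abs(expected - actual) <= tolerance


numeric_ok = numeric_match(10.0, 10.05, tolerance=0.1)
similarity = word_overlap(
    "The capital of France is Paris",
    "Paris is the capital of France",
)
datetime_ok = datetime_match(
    datetime(2024, 1, 1, 12, 0, 0),
    datetime(2024, 1, 1, 12, 0, 30),
    tolerance=timedelta(minutes=1),
)
print(numeric_ok, similarity >= 0.8, datetime_ok)
```

The two sentences contain the same words in a different order, so a word-set measure scores them as identical; an order-sensitive measure would give a lower score, which is one reason the threshold is configurable.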
Advanced Evaluation
    from arcade_evals import EvalRubric, EvalSuite, ExpectedToolCall

    # Create rubric with tool calls
    rubric = EvalRubric(
        expected_tool_calls=[
            ExpectedToolCall(
                tool_name="calculator",
                parameters={"operation": "add", "a": 2, "b": 2}
            )
        ]
    )

    # Evaluate with rubric (case1 is defined in the Basic Evaluation example)
    suite = EvalSuite(cases=[case1], rubric=rubric)
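One way to think about checking an expected tool call against the call that was actually made is to compare the tool name first and then the parameters. The sketch below is a standalone illustration under that assumption; `ToolCall` and `score_call` are hypothetical names, and the scoring rule (fraction of expected parameters matched) is not taken from `arcade_evals`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical record of a tool invocation: a name plus keyword parameters."""
    tool_name: str
    parameters: dict[str, Any] = field(default_factory=dict)


def score_call(expected: ToolCall, actual: ToolCall) -> float:
    """Return 0.0 on a name mismatch, else the fraction of expected parameters matched."""
    if expected.tool_name != actual.tool_name:
        return 0.0
    if not expected.parameters:
        return 1.0
    matched = sum(
        1
        for key, value in expected.parameters.items()
        if key in actual.parameters and actual.parameters[key] == value
    )
    return matched / len(expected.parameters)


expected = ToolCall("calculator", {"operation": "add", "a": 2, "b": 2})
exact = ToolCall("calculator", {"operation": "add", "a": 2, "b": 2})
partial = ToolCall("calculator", {"operation": "add", "a": 2, "b": 3})
wrong_tool = ToolCall("search", {"operation": "add", "a": 2, "b": 2})

exact_score = score_call(expected, exact)
partial_score = score_call(expected, partial)
wrong_score = score_call(expected, wrong_tool)
print(exact_score, round(partial_score, 2), wrong_score)
```

A graded score like this distinguishes a call that picked the right tool with one wrong argument from a call that picked the wrong tool entirely, which a plain pass/fail check would not.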
License
MIT License - see LICENSE file for details.