1. New Eval SDK (`arcade/sdk/eval.py`):
   - Introduces `EvalSuite`, `EvalCase`, and `EvalRubric` classes for structured evaluation.
   - Implements several Critic classes (Binary, Numeric, Similarity) for flexible scoring.
   - Adds a `tool_eval` decorator for easy integration with existing tools.
2. CLI Integration (`arcade/cli/main.py` and `arcade/cli/utils.py`):
   - Adds an `evals` command to run evaluation suites from the CLI.
   - Implements result display for evaluation outcomes.
3. Toolkit Updates:
   - Adds evaluation scripts for the Gmail (`toolkits/gmail/evals/eval_gmail_tools.py`) and Slack (`toolkits/slack/evals/eval_slack_messaging.py`) toolkits.
   - Demonstrates practical usage of the Eval SDK with real-world scenarios.
4. Miscellaneous:
   - Updates `arcade/cli/new.py` to optionally generate an `evals` directory for new toolkits.

---------

Co-authored-by: Nate Barbettini <nate@arcade-ai.com>

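
The commit message names three kinds of critic (Binary, Numeric, Similarity) but does not show how they combine into a score. The following is a minimal, self-contained sketch of that idea, not the code in `arcade/sdk/eval.py`: every name here (`Critic`, `binary_score`, `numeric_score`, `similarity_score`, `run_case`), the weighting scheme, and the `0.8` pass threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, List, Tuple

# NOTE: hypothetical stand-ins for illustration; not the arcade SDK's API.


def binary_score(expected: Any, actual: Any) -> float:
    """Exact match: 1.0 if the values are equal, otherwise 0.0."""
    return 1.0 if expected == actual else 0.0


def numeric_score(expected: float, actual: float,
                  value_range: Tuple[float, float] = (0.0, 100.0)) -> float:
    """Closeness of two numbers, normalized by the width of an assumed range."""
    low, high = value_range
    return max(0.0, 1.0 - abs(expected - actual) / (high - low))


def similarity_score(expected: str, actual: str) -> float:
    """String similarity in [0, 1] using difflib's matching-ratio heuristic."""
    return SequenceMatcher(None, expected, actual).ratio()


@dataclass
class Critic:
    """Scores one argument of a tool call and scales the result by a weight."""
    field: str
    weight: float
    score_fn: Callable[[Any, Any], float]

    def evaluate(self, expected: Any, actual: Any) -> float:
        return self.weight * self.score_fn(expected, actual)


def run_case(critics: List[Critic], expected_args: Dict[str, Any],
             actual_args: Dict[str, Any],
             pass_threshold: float = 0.8) -> Tuple[float, bool]:
    """Return the weight-normalized score and whether it meets the threshold."""
    total_weight = sum(c.weight for c in critics)
    total = sum(
        c.evaluate(expected_args[c.field], actual_args[c.field]) for c in critics
    )
    score = total / total_weight
    return score, score >= pass_threshold


if __name__ == "__main__":
    critics = [
        Critic("recipient", 0.5, binary_score),
        Critic("subject", 0.3, similarity_score),
        Critic("max_results", 0.2, numeric_score),
    ]
    expected = {"recipient": "a@example.com", "subject": "Weekly report", "max_results": 10}
    actual = {"recipient": "a@example.com", "subject": "Weekly reports", "max_results": 12}
    print(run_case(critics, expected, actual))
```

Normalizing by the total weight keeps the final score in the range 0 to 1 regardless of how the individual weights are chosen, which makes a single pass threshold meaningful across cases.
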
| Name | | |
|---|---|---|
| fastapi | ||
| flask | ||
| langchain | ||
| .gitignore | ||
| modal-deploy.py | ||