arcade-mcp/examples
Sam Partee db948125d5
Tool Evalulation SDK (#35)
1. New Eval SDK (`arcade/sdk/eval.py`):
- Introduces `EvalSuite`, `EvalCase`, and `EvalRubric` classes for
structured evaluation.
- Implements various Critic classes (Binary, Numeric, Similarity) for
flexible scoring.
- Adds a `tool_eval` decorator for easy integration with existing tools.

2. CLI Integration (`arcade/cli/main.py` and `arcade/cli/utils.py`):
   - Adds an `evals` command to run evaluation suites from the CLI.
   - Implements result display functionality for evaluation outcomes.

3. Toolkit Updates:
- Adds evaluation scripts for Gmail
([toolkits/gmail/evals/eval_gmail_tools.py](file:///Users/spartee/Dropbox/Arcade/platform/Team/arcade-ai/toolkits/gmail/evals/eval_gmail_tools.py#1%2C1-1%2C1))
and Slack
([toolkits/slack/evals/eval_slack_messaging.py](file:///Users/spartee/Dropbox/Arcade/platform/Team/arcade-ai/toolkits/slack/evals/eval_slack_messaging.py#1%2C1-1%2C1))
toolkits.
- Demonstrates practical usage of the Eval SDK with real-world
scenarios.

4. Miscellaneous:
- Updates `arcade/cli/new.py` to optionally generate an `evals`
directory for new toolkits.

---------

Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
2024-09-19 03:36:44 -07:00
..
fastapi gmail: search_emails_by_header tool (#28) 2024-08-30 15:29:02 -07:00
flask Pass ToolContext and CLI cleanup (#13) 2024-08-13 15:40:08 -07:00
langchain Tool auth (#30) 2024-09-09 15:00:17 -07:00
.gitignore Pass ToolContext and CLI cleanup (#13) 2024-08-13 15:40:08 -07:00
modal-deploy.py Tool Evalulation SDK (#35) 2024-09-19 03:36:44 -07:00