1. New Eval SDK (`arcade/sdk/eval.py`):
- Introduces `EvalSuite`, `EvalCase`, and `EvalRubric` classes for
structured evaluation.
- Implements various Critic classes (Binary, Numeric, Similarity) for
flexible scoring.
- Adds a `tool_eval` decorator for easy integration with existing tools.
2. CLI Integration (`arcade/cli/main.py` and `arcade/cli/utils.py`):
- Adds an `evals` command to run evaluation suites from the CLI.
- Implements result display functionality for evaluation outcomes.
3. Toolkit Updates:
- Adds evaluation scripts for Gmail
([toolkits/gmail/evals/eval_gmail_tools.py](file:///Users/spartee/Dropbox/Arcade/platform/Team/arcade-ai/toolkits/gmail/evals/eval_gmail_tools.py#1%2C1-1%2C1))
and Slack
([toolkits/slack/evals/eval_slack_messaging.py](file:///Users/spartee/Dropbox/Arcade/platform/Team/arcade-ai/toolkits/slack/evals/eval_slack_messaging.py#1%2C1-1%2C1))
toolkits.
- Demonstrates practical usage of the Eval SDK with real-world
scenarios.
4. Miscellaneous:
- Updates `arcade/cli/new.py` to optionally generate an `evals`
directory for new toolkits.
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
Note - This Engine PR must go first:
https://github.com/ArcadeAI/Engine/pull/65
In this PR:
- Add `client.tool.authorize` to authorize a tool by name by @Spartee
- Refactored client.auth methods to always pass around scopes (as needed
by the above Engine PR) by @nbarbettini
- Reduced the scopes needed in the Slack toolkit, which was blocked by
this until now! @nbarbettini
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
Working now:
- `arcade login` works against the Cloud
- `arcade logout` deletes your local credentials
---------
Co-authored-by: Sam Partee <sam@arcade-ai.com>
Protects Actors by requiring the Engine to send a token signed with the
Arcade API key.
Also, updates the FastAPI example so the Arcade API key is sent to the
Engine in a chat request.
Added
- `arcade dev` - serves a simple fastapi actor
- `arcade config` - show/edit/change config in `~/.arcade`
- `arcade chat` - chat with LLM without toolcalls
Changed:
- `arcade show`, `arcade run` - can now use all installed toolkits
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
- Adds initial `ToolContext` to tool invocations
- This unlocks the ability to call authenticated tools (e.g. Gmail),
which works in this branch against Nate's dev engine
As the title states.
Also added in a prompt/execute capabiltiy into the CLI for the ``arcade
run`` command
Committed by @nbarbettini
Original work by @Spartee
This PR makes a few sweeping changes to the actor, cli, and overall
structure of the project.
- CLI commands skeleton
- ``arcade run``, ``arcade show``, and ``arcade new``
- Working package mangement solution (``arcade_`` packages)
- Actor approach for using frameworks other than FastAPI
- Client for calling Engine within ``arcade/core``
- beginning of the config interface.
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
MyPy compliance for the whole codebase
- systematic way of executing tools (`executor.py`)
- support for using pydantic models in tool inputs and outputs
- mypy compliance (most of the changes)
- removal of unused code (from previous iterations)
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>