SDK support for tool secrets (stored and managed by the engine):
- [x] New `requires_secrets=` option in the `@tool` decorator
- [x] Internal plumbing in the catalog and `ToolContext`
- [x] Full test coverage of all added code
- [x] Bumped minor version (new feature)
This PR can be merged without waiting for Engine changes, because it is
additive only (no breaking changes).
After this is merged, I will open another PR to update existing toolkits
that will benefit from this feature!
## PR Description
Add the ability to mark a tool as deprecated and display the warning in
the user's runtime. This PR also lays the foundation for future work for
emitting other levels of logs (debug, info, etc) that occur during the
tool's execution.
NOTE: Updates to the Arcade Clients (Python and JS) still need to be
done before the deprecation warning is emitted, but this PR needs to be
merged before those updates!
Let's cross our fingers that we'll never need to deprecate
`@tool.deprecated`!
### Example
1. Mark your tool as deprecated
```python
from typing import Annotated
from arcade.sdk import tool
@tool.deprecated("Use the 'Math.AddInt' tool instead.") # order of decorators does not matter
@tool
def add(
a: Annotated[int, "The first number"], b: Annotated[int, "The second number"]
) -> Annotated[int, "The sum of the two numbers"]:
"""
Add two numbers together
"""
return a + b
```
2. Call the deprecated tool
```python
from arcadepy import Arcade
client = Arcade()
tool_input = {"a": 9001, "b": 42}
response = client.tools.execute(
tool_name="Math.Add",
input=tool_input,
user_id="me@example.com",
)
print(f"The result of adding {tool_input['a']} and {tool_input['b']} is: {response.output.value}")
```
3. Observe the DeprecationWarning:
```
❯ python examples/call_a_tool_directly.py
/Users/ericgustin/repos/Team/arcade-ai/examples/call_a_tool_directly.py:22: DeprecationWarning: 'Math.Add' is deprecated: Use the `Math.AddInt` tool instead.
response = client.tools.execute(
The result of adding 9001 and 42 is: 9043
```
# PR Description
This PR adds ~~four~~ three improvements to evals.
~~## 1. Add parameterized eval cases~~
~~Adds a new method named `add_parameterized_case`. Just like pytest’s
parameterized tests, eval cases can be parameterized with multiple user
messages. Adds a case to the `EvalSuite` for each user message. All
cases have the same expected tool call(s), params, additional_messages.
This reduces duplicate code and makes it easy to observe how a model
performs based on increasingly more difficult prompts.~~
```python
""" NO LONGER IN THIS PR
user_messages = [
"Call the delete tweet by id tool with the tweet ID '148975632'.",
"Delete the tweet with ID '148975632'.",
"I don't want to have this tweet (148975632) on my account anymore.",
"do the opposite of post for https://x.com/x/status/148975632",
]
suite.add_parameterized_case(
name="Delete a tweet by ID",
user_messages=user_messages,
expected_tool_calls=[
ExpectedToolCall(
func=delete_tweet_by_id,
args={"tweet_id": "148975632"},
)
],
critics=[
BinaryCritic(
critic_field="tweet_id",
weight=1.0,
),
],
)
"""
```
~~PASSED Delete a tweet by ID (user_message 1 of 4) -- Score: 100.00%~~
~~PASSED Delete a tweet by ID (user_message 2 of 4) -- Score: 100.00%~~
~~PASSED Delete a tweet by ID (user_message 3 of 4) -- Score: 100.00%~~
~~FAILED Delete a tweet by ID (user_message 4 of 4) -- Score: 0.00%~~
~~Summary -- Total: 4 -- Passed: 3 -- Failed: 1~~
## 2. Parameters that are not explicitly criticized are assigned a
`NoneCritic`.
A NoneCritic has no effect on the evaluation results and does not
actually evaluate. Parameters that have a NoneCritic will be displayed
as ‘un-criticized’ in the evaluation summary (if `-d` flag is used).

## 3. Add a hardcoded `seed` parameter for evals.
The seed parameter aides in receiving (mostly) consistent outputs -
aiding in reproducibility for evaluations.
## 4. Disallow more than one critic for the same field.
Raises a `ValueError` if more than one critic is assigned to a field.
---------
Co-authored-by: Eric Gustin <eric@arcade-ai.com>
The CLI was starting the login process on the old domain, which is why
we were seeing a CSRF error. The cookie was placed on `arcade-ai.com`
not `arcade.dev`!
# PR Description
This PR introduces a new environment variable `ARCADE_DISABLED_TOOLS`.
Tools that are added to this env var are not added to the worker's
`ToolCatalog`. In effect, they are disabled for the worker.
## How to use the `ARCADE_DISABLED_TOOLS` environment variable
* Each tool is separated by a comma.
* For each tool, specify the toolkit name in camel case and the tool
name in camel case.
* Do not include versions. (This is a simple implementation. We can add
disabling specific versions in the future if needed)
* Separate the toolkit name and the tool name with your environment's
tool name separator. By default, the tool name separator is `.`, but you
can override this with the `ARCADE_TOOL_NAME_SEPARATOR` environment
variable.
Correct: `export
ARCADE_DISABLED_TOOLS="Math.Add,Spotify.GetAvailableDevices,Math.Sqrt"`
Incorrect: `export
ARCADE_DISABLED_TOOLS="Math.Add@0.1.0,Spotify.get_available_devices,Sqrt`
# PR Description
Well known providers (Google, X, Dropbox, etc.) can optionally have an
`id` in addition to their hardcoded `provider_id`. For non well known
providers, they must provide an `id`, and the `provider_id` is hardcoded
as `None`.
```python
OAuth2() # INVALID
OAuth2(provider_id="abc") # INVALID
OAuth2(id="abc") # VALID
OAuth2(provider_id="abc", id="def") # INVALID
```
```python
Google() # VALID
Google(provider_id="abc") # INVALID
Google(id="abc") # VALID
Google(provider_id="abc", id="def") # INVALID
```
---------
Co-authored-by: Wils Dawson <wils@arcade-ai.com>
# PR Description
* Adds/updates the following files to all toolkits:
- `.pre-commit-config.yaml`
- `.ruff.toml`
- `LICENSE`
- `Makefile`
- `pyproject.toml`
* Lint all toolkits such that they pass `make check` and `make test` (a
total doozy). This includes adding some unit tests and evals.
* Github workflow for testing toolkits before merge into main (courtesy
of @sdreyer)
* Added a QOL improvement for tool developers for when they need to get
the context's auth token.
* Minor updates to `arcade new` template.
# PR Description
This PR renames `ExpectedToolCall` to `NamedExpectedToolCall` and then
creates a new dataclass called `ExpectedToolCall`. `ExpectedToolCall`
can be passed to the `EvalSuite.add_case` and `EvalSuite.extend_case`
methods.
1. Enhance `EvalSuite.add_case` and `EvalSuite.extend_case` by accepting
a list of `ExpectedToolCall` as their `expected_tool_calls` input
parameter. This helps create a scaffolding for developers. Previously,
the expected type was `list[tuple[Callable, dict[str, Any]]]`, which is
still valid for backward compatibility.
```python
# Before (still valid for backward compatibility)
expected_tool_calls=[
(
adjust_playback_position,
{
"absolute_position_ms": 10000,
},
)
]
# After
expected_tool_calls=[
ExpectedToolCall(
func=adjust_playback_position,
args={"absolute_position_ms": 10000},
)
]
```
2. Removed any references to arcade.core in toolkits directory.
3. Some linting for import organization.
# PR Description
### The following bug was observed:
* When connected to the cloud engine for `arcade chat`, and the user
types `/show`, then the local environment tools are displayed. Instead,
the cloud engine's tools should be displayed.
### Why was this bug happening?:
* When a user entered the `/show` command, the CLI Command `show` was
being called directly. Since the function was a CLI command, the `local`
parameter was not being processed and resolved to its intended value
because the Typer CLI interface was being bypassed. So, the conditional
`if local:` would always evaluate to `True`.
### How this was fixed:
* I created a wrapper function for the `show` CLI Command. Now, when the
user types `/show`, then the wrapper function is called instead of the
`show` CLI command. This ensures that all input parameters are resolved
to their intended values.
1. Fixes bug where arcade login doesn't work for localhost
- `arcade login -h localhost` will open login page at
`http://localhost:8000/...`
- Optionally specify the port: `arcade login -h localhost -p 8000`
3. Adds `local` flag to `arcade show`
- `-h localhost`, `-h 127.0.0.1`, and `-h 0.0.0.0` shows the tools that
are in the local engine's catalog
- `--local` show the tools that are in the local environment.
This PR ensures that `arcade.core` does not show up anywhere in "user
space". This is crucial for helping developers understand what objects
are safe to use, and helps maintain a good developer experience.
Specific changes:
- `ToolAuthorizationContext` and `ToolContext` are now visible via
`arcade.sdk`
- `ToolCatalog` is now visible via `arcade.sdk`
- `Toolkit` is now visible via `arcade.sdk`
- `config` is now visible via `arcade.sdk.config`
This PR introduces the following changes:
- **Engine Environment Configuration**: Adds support for specifying an
environment variables file for the engine via the `arcade dev` CLI
command.
- **Configuration File Handling**: Refactors configuration file handling
in the CLI launcher to generalize logic for locating configuration
files.
- **Tool Execution Logging**: Enhances logging in `BaseActor` to include
execution duration and adjusts logging levels for better visibility.
- **Enhanced Tool Exception Handling**: Improves exception handling in
`ToolExecutor` and updates the `@tool` decorator to ensure proper
propagation and handling of exceptions raised during tool execution.
# PR Description
When a tool call required authorization, `arcade chat` would hit rate
limits extremely quickly when waiting to authorize with a given link.
This PR introduces using long polling when sending a GET request to
Arcade API `/auth/status`. The `/auth/status` endpoint supports the
`wait` query parameter, where if present, will not respond until either
auth status becomes `completed` or `timeout` is reached. If the
`timeout` is reached, `arcade chat` catches the 408 response and tries
again. For `arcade chat` we set the `wait` query param to 60 seconds.
## Problem
I found a bug with `Github.CountStargazers` where a stargazer count of
`0` was interpreted as a null result. In other words, 0 wasn't passed
back to the Engine correctly.
Separately, the tool function was also not authorized correctly.
## Fix
- Don't use a falsy comparison when evaluating `result` inside the
`ToolOutputFactory`
- Add unit tests for `ToolOutputFactory` to give us confidence in the
business logic
- Added `ToolContext` to pass in the authorization token correctly.
Before
```
User (nate@arcade-ai.com):
how many stars does the ArcadeAI/Docs repo have on github?
Assistant (gpt-4o):
I successfully checked the repository, but unfortunately, I cannot provide the number of stars for the ArcadeAI/Docs repository. Please try checking directly on GitHub for the most accurate information.
Called tool 'Github_CountStargazers'
Parameters:{"owner":"ArcadeAI","name":"Docs"}
'Github_CountStargazers' tool returned:Github.CountStargazers called successfully
```
After
```
User (nate@arcade-ai.com):
how many stars does the ArcadeAI/Docs repo have on github?
Assistant (gpt-4o):
The ArcadeAI/Docs repository on GitHub has 0 stars.
Called tool 'Github_CountStargazers'
Parameters:{"owner":"ArcadeAI","name":"Docs"}
'Github_CountStargazers' tool returned:0
This PR adds four new tools to the Google ToolKit
* `create_event`
* `list_events`
* `update_event`
* `delete_event`
I also improved an error log when tools are being registered by the
actor.
This PR also sneaks in an eval for gmail
Here is a sample conversation that shows the tools and their
capabilities and limitiations:

On the last few PRs I have noticed two problems:
1. `ruff format` fails even though it seems OK on our local machines
(sometimes, not always)
2. Nate's and Sam's machines kept flip-flopping a specific piece of
formatting back and forth, indicating a subtle difference of config
hiding somewhere
3. This was reproducible by running `ruff format` in the terminal,
followed by `make check`. The former would edit files, and then `make
check` would edit them back!
This PR addresses both issues, and further standardizes our editor &
linter configs to be super stable.
Specifically:
1. The main fix for the above, the pre-commit hook was pinned to a super
old version of ruff.
This resulted in subtle differences in behavior between our machines,
and on CI.
2. Moved ruff settings from `pyproject.toml` to `.ruff.toml`
pyproject files in subdirectories (e.g. `toolkits/**`) were overriding
the main pyproject file and erasing the custom ruff config we set at the
root. This meant that our ruff config was applied to `arcade` but not to
any of the other packages.
By moving the config to `.ruff.toml` at the root, all projects will
inherit the same ruff linting & formatting config.
4. Un-ignored the `.vscode/` directory so that we can share
vscode/cursor workspace settings.
This is valuable for standardizing settings like the default formatter
(ruff) and default test framework (pytest).
However, it's important that going forward we _only_ commit things here
that should apply across all of our machines.
5. To avoid any conflict between prettier and ruff, prettier now
explicitly ignores *.py files
6. Finally, `ruff format` and `make check` agree. A number of files are
newly auto-formatted.
This PR includes several improvements to the Arcade client and adds
LangGraph examples:
1. Enhanced error handling in the Arcade client:
- Improved HTTP error handling in `BaseArcadeClient`
- Simplified request methods in `SyncArcadeClient` and
`AsyncArcadeClient`
2. Updated `ToolResource` class:
- Changed base path from `/v1/tool` to `/v1/tools`
- Added `tool_version` parameter to `authorize` method
3. Improved Toolkit discovery:
- Updated `find_all_arcade_toolkits` to search only in the current
Python interpreter's site-packages
5. Added LangGraph examples:
- New `langgraph_auth.py` example demonstrating Gmail authentication
- New `langgraph_with_tool_exec.py` example showing tool execution
within a LangGraph
6. Minor updates:
- Changed default `BASE_URL` to `https://api.arcade.com/`
- Updated import error message for eval dependencies
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
In this PR:
- Handle and require fully-qualified tool names `Toolkit.ToolName` in
the actor
Also, unrelated changes/fixes:
- Cleaned up the logic around actor secrets and `$ARCADE_ACTOR_SECRET`
- Removes experimental Flask actor for now
Note: Must be merged along with
https://github.com/ArcadeAI/Engine/pull/87
* Renamed `arcade_arithmetic` to `arcade_math`
* Deleted `arcade_github` toolkit for the next release. This will be
reintroduced later.
* Added 5 tools to `arcade_x` toolkit
- post_tweet
- delete_tweet_by_id
- search_recent_tweets_by_username
- search_recent_tweets_by_keywords
- lookup_single_user_by_username
Fixes 2 issues that were causing CI to fail:
- Loading `config` in `eval.py` breaks because no API key can be found
in CI
- Python 3.11+ changed `Union` to `UnionType`
1. New Eval SDK (`arcade/sdk/eval.py`):
- Introduces `EvalSuite`, `EvalCase`, and `EvalRubric` classes for
structured evaluation.
- Implements various Critic classes (Binary, Numeric, Similarity) for
flexible scoring.
- Adds a `tool_eval` decorator for easy integration with existing tools.
2. CLI Integration (`arcade/cli/main.py` and `arcade/cli/utils.py`):
- Adds an `evals` command to run evaluation suites from the CLI.
- Implements result display functionality for evaluation outcomes.
3. Toolkit Updates:
- Adds evaluation scripts for Gmail
([toolkits/gmail/evals/eval_gmail_tools.py](file:///Users/spartee/Dropbox/Arcade/platform/Team/arcade-ai/toolkits/gmail/evals/eval_gmail_tools.py#1%2C1-1%2C1))
and Slack
([toolkits/slack/evals/eval_slack_messaging.py](file:///Users/spartee/Dropbox/Arcade/platform/Team/arcade-ai/toolkits/slack/evals/eval_slack_messaging.py#1%2C1-1%2C1))
toolkits.
- Demonstrates practical usage of the Eval SDK with real-world
scenarios.
4. Miscellaneous:
- Updates `arcade/cli/new.py` to optionally generate an `evals`
directory for new toolkits.
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
In this PR:
- Rename `scope` to `scopes` so it is more understandable by humans
- DRY up provider structs, it was starting to get silly with so many
providers that just have 1 property called `scopes`
Must go along with this Engine PR:
https://github.com/ArcadeAI/Engine/pull/79
Adds:
- New options to `arcade chat`: `-h/--host`, `-p/--port`, and
`--tls/--notls`. This allows us to point `arcade chat` at a different
server than what's configured in `arcade.toml` which is very helpful for
debugging.
- Special case: if you do `-h localhost`, it will automatically use port
9099 and no TLS unless otherwise specified.
- Adds a non-fatal engine health check to `arcade chat` startup:
<img width="499" alt="image"
src="https://github.com/user-attachments/assets/b7fae29e-2f8d-4004-a27b-645b4cd997a8">
Note - This Engine PR must go first:
https://github.com/ArcadeAI/Engine/pull/65
In this PR:
- Add `client.tool.authorize` to authorize a tool by name by @Spartee
- Refactored client.auth methods to always pass around scopes (as needed
by the above Engine PR) by @nbarbettini
- Reduced the scopes needed in the Slack toolkit, which was blocked by
this until now! @nbarbettini
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
A few quick fixes while testing the gmail tool with the real Engine:
- Renamed `tool.requirements.auth` to `authorization` -- Engine already
used `authorization`
- Fixed the credentials initializer in the gmail tool
Added
- `arcade dev` - serves a simple fastapi actor
- `arcade config` - show/edit/change config in `~/.arcade`
- `arcade chat` - chat with LLM without toolcalls
Changed:
- `arcade show`, `arcade run` - can now use all installed toolkits
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
- Adds initial `ToolContext` to tool invocations
- This unlocks the ability to call authenticated tools (e.g. Gmail),
which works in this branch against Nate's dev engine
This PR makes a few sweeping changes to the actor, cli, and overall
structure of the project.
- CLI commands skeleton
- ``arcade run``, ``arcade show``, and ``arcade new``
- Working package mangement solution (``arcade_`` packages)
- Actor approach for using frameworks other than FastAPI
- Client for calling Engine within ``arcade/core``
- beginning of the config interface.
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
MyPy compliance for the whole codebase
- systematic way of executing tools (`executor.py`)
- support for using pydantic models in tool inputs and outputs
- mypy compliance (most of the changes)
- removal of unused code (from previous iterations)
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>