This tool will be useful in RAG-like scenarios, where someone wants to
ask questions about, or request a summary of, a set of documents related
to a particular topic. Currently, to fulfill such requests, the LLM first
needs to call `list_documents` and then `get_document_by_id` for each
document.
We also implement utility functions to return documents in Markdown
and HTML, since the Drive API JSON is verbose and would waste tokens
unnecessarily.
Limitations: the Markdown/HTML utilities do not handle tables of contents
(which I don't think are very useful here), headers, footers, or
footnotes.
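
To illustrate why a Markdown rendering is so much cheaper than the raw JSON, here is a minimal sketch of such a converter. It assumes the Google Docs API v1 `documents.get` response shape (`body.content[].paragraph.elements[].textRun`, with `paragraphStyle.namedStyleType` for headings and a `bullet` key on list items). The function names `document_to_markdown` and `text_run_to_markdown` are hypothetical and this is not the PR's actual utility. Because it only walks `body.content`, it also ignores headers, footers, footnotes, tables, and tables of contents, in line with the limitations listed above.

```python
from typing import Any

# Assumed mapping from Docs API named paragraph styles to Markdown heading prefixes.
HEADING_PREFIXES = {
    "TITLE": "# ",
    "HEADING_1": "# ",
    "HEADING_2": "## ",
    "HEADING_3": "### ",
}


def text_run_to_markdown(text_run: dict[str, Any]) -> str:
    """Render one textRun, wrapping bold/italic text in Markdown markers."""
    content = text_run.get("content", "")
    body = content.rstrip("\n")
    if not body.strip():
        return content  # whitespace-only runs are passed through untouched
    trailing_newlines = content[len(body):]
    style = text_run.get("textStyle", {})
    if style.get("bold"):
        body = f"**{body}**"
    if style.get("italic"):
        body = f"_{body}_"
    return body + trailing_newlines


def document_to_markdown(document: dict[str, Any]) -> str:
    """Convert a Docs API document resource into compact Markdown."""
    blocks = []
    for element in document.get("body", {}).get("content", []):
        paragraph = element.get("paragraph")
        if paragraph is None:
            continue  # tables, section breaks, tables of contents, etc. are skipped
        text = "".join(
            text_run_to_markdown(part["textRun"])
            for part in paragraph.get("elements", [])
            if "textRun" in part
        ).rstrip("\n")
        if not text:
            continue
        if "bullet" in paragraph:
            prefix = "- "
        else:
            style = paragraph.get("paragraphStyle", {}).get("namedStyleType", "")
            prefix = HEADING_PREFIXES.get(style, "")
        blocks.append(prefix + text)
    return "\n\n".join(blocks)
```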
---
This PR deprecates `list_documents` and implements `search_documents`
(in addition to `search_and_retrieve_documents`). This configuration
makes it easier for LLMs to understand when to call each tool.
Both tools had their interfaces refactored to remove Google API-specific
arguments that sometimes confused LLMs, such as `corpora` and
`support_all_drives`. They now accept arguments that better match
expected user requests.
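
As a rough illustration of what user-oriented arguments can look like, the sketch below translates plain search terms into a Drive API v3 `files.list` `q` expression restricted to Google Docs. The function and parameter names (`build_search_documents_query`, `document_contains`, `title_contains`) are hypothetical, not the tools' actual signatures; the query operators (`mimeType =`, `trashed = false`, `fullText contains`, `name contains`) are standard Drive search syntax, and quotes inside terms are backslash-escaped as Drive requires.

```python
from typing import Optional

GOOGLE_DOCS_MIME_TYPE = "application/vnd.google-apps.document"


def escape_drive_literal(value: str) -> str:
    """Escape backslashes and single quotes for a single-quoted Drive query string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_search_documents_query(
    document_contains: Optional[list[str]] = None,
    title_contains: Optional[list[str]] = None,
) -> str:
    """Translate user-level search terms into a Drive API v3 `q` expression."""
    # Always restrict to non-trashed Google Docs; no corpora/drive flags exposed.
    clauses = [f"mimeType = '{GOOGLE_DOCS_MIME_TYPE}'", "trashed = false"]
    for keyword in document_contains or []:
        clauses.append(f"fullText contains '{escape_drive_literal(keyword)}'")
    for keyword in title_contains or []:
        clauses.append(f"name contains '{escape_drive_literal(keyword)}'")
    return " and ".join(clauses)
```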
---------
Co-authored-by: Eric Gustin <eric@arcade.dev>
Break down `search_contacts` into `search_contacts_by_name` and
`search_contacts_by_email`. The `query` argument of `search_contacts`
was not clear enough for LLMs.
Improved the Gmail toolkit. Added support for threading in draft replies,
multipart email parsing, and label management. Fixed the `DateRange`
parameter issue in `list_emails_by_header`. Added logging and removed
print statements. Created custom exceptions for each specific Google
toolkit.
-----
Summary of changes by @byrro:
- Fixed minor bug related to the `date_range` argument of
`list_emails_by_header`
- A few utility functions (`build_email_message`,
`build_reply_recipients`, `build_reply_body`) to centralize logic and
remove repeated code from email-sending tools
- New `reply_to_email` tool (in addition to `write_draft_reply_email`,
implemented by Alex) to keep the toolkit consistent
- Evals and unit tests
- Handling of reply (sender only) and reply-all recipients
- Removed some unnecessary debug messages, which Alex had added to
replace print statements
- Removed HTML handling implemented by Alex in `write_draft_reply_email`
> I think we should either support HTML across all applicable tools or
not at all; I decided to remove it and leave this feature for a future
PR.
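
The reply vs. reply-all distinction above boils down to a small recipient computation. The following is a minimal sketch of that logic, assuming the original message's headers are available as a plain dict; `compute_reply_recipients` is a hypothetical name and is not the PR's `build_reply_recipients`. A plain reply goes to the sender only, while reply-all adds the original `To` and `Cc` addresses, drops duplicates, and excludes the replying user's own address. It uses only `email.utils.getaddresses` from the standard library, parsing each header separately so an empty header cannot affect the others.

```python
from email.utils import getaddresses


def _addresses(header_value: str) -> list[str]:
    """Extract bare email addresses from one header value (empty -> [])."""
    if not header_value:
        return []
    return [addr for _, addr in getaddresses([header_value]) if addr]


def compute_reply_recipients(
    original_headers: dict[str, str], my_address: str, reply_all: bool
) -> list[str]:
    """Hypothetical sketch: who should receive a reply to the original message."""
    sender = _addresses(original_headers.get("From", ""))
    if not reply_all:
        return sender  # plain reply: only the original sender
    candidates = (
        sender
        + _addresses(original_headers.get("To", ""))
        + _addresses(original_headers.get("Cc", ""))
    )
    recipients: list[str] = []
    seen = {my_address.lower()}  # never include our own address
    for address in candidates:
        if address.lower() not in seen:
            seen.add(address.lower())
            recipients.append(address)
    return recipients
```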
---------
Co-authored-by: Renato Byrro <rmbyrro@gmail.com>
# PR Description
This PR adds ~~four~~ three improvements to evals.
~~## 1. Add parameterized eval cases~~
~~Adds a new method named `add_parameterized_case`. Just like pytest’s
parameterized tests, eval cases can be parameterized with multiple user
messages. Adds a case to the `EvalSuite` for each user message. All
cases have the same expected tool call(s), params, additional_messages.
This reduces duplicate code and makes it easy to observe how a model
performs based on increasingly more difficult prompts.~~
```python
""" NO LONGER IN THIS PR
user_messages = [
    "Call the delete tweet by id tool with the tweet ID '148975632'.",
    "Delete the tweet with ID '148975632'.",
    "I don't want to have this tweet (148975632) on my account anymore.",
    "do the opposite of post for https://x.com/x/status/148975632",
]
suite.add_parameterized_case(
    name="Delete a tweet by ID",
    user_messages=user_messages,
    expected_tool_calls=[
        ExpectedToolCall(
            func=delete_tweet_by_id,
            args={"tweet_id": "148975632"},
        )
    ],
    critics=[
        BinaryCritic(
            critic_field="tweet_id",
            weight=1.0,
        ),
    ],
)
"""
```
~~PASSED Delete a tweet by ID (user_message 1 of 4) -- Score: 100.00%~~
~~PASSED Delete a tweet by ID (user_message 2 of 4) -- Score: 100.00%~~
~~PASSED Delete a tweet by ID (user_message 3 of 4) -- Score: 100.00%~~
~~FAILED Delete a tweet by ID (user_message 4 of 4) -- Score: 0.00%~~
~~Summary -- Total: 4 -- Passed: 3 -- Failed: 1~~
## 2. Parameters that are not explicitly criticized are assigned a `NoneCritic`.
A `NoneCritic` has no effect on the evaluation results and does not
actually evaluate anything. Parameters that have a `NoneCritic` are
displayed as ‘un-criticized’ in the evaluation summary (if the `-d` flag
is used).
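A minimal sketch of the idea, using simplified stand-in classes rather than Arcade's actual critic classes (`Critic`, `NoneCritic`, `assign_none_critics`, and `score` here are all hypothetical): every expected argument without an explicit critic gets a zero-weight `NoneCritic`, and scoring simply skips those critics, so a mismatch on an un-criticized field cannot change the result.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Critic:
    """Simplified stand-in for an eval critic (not Arcade's actual class)."""

    critic_field: str
    weight: float = 1.0

    def evaluate(self, expected: Any, actual: Any) -> float:
        return self.weight if expected == actual else 0.0


@dataclass
class NoneCritic(Critic):
    """Placeholder critic: never evaluates and never affects the score."""

    weight: float = 0.0

    def evaluate(self, expected: Any, actual: Any) -> float:
        return 0.0


def assign_none_critics(critics: list[Critic], expected_args: dict[str, Any]) -> list[Critic]:
    """Give every expected argument without an explicit critic a NoneCritic."""
    covered = {critic.critic_field for critic in critics}
    missing = [NoneCritic(critic_field=name) for name in expected_args if name not in covered]
    return critics + missing


def score(critics: list[Critic], expected_args: dict[str, Any], actual_args: dict[str, Any]) -> float:
    """Weighted score over real critics only; NoneCritics are reported, not scored."""
    real = [c for c in critics if not isinstance(c, NoneCritic)]
    total_weight = sum(c.weight for c in real)
    if total_weight == 0:
        return 0.0
    earned = sum(
        c.evaluate(expected_args.get(c.critic_field), actual_args.get(c.critic_field))
        for c in real
    )
    return earned / total_weight
```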

## 3. Add a hardcoded `seed` parameter for evals.
The `seed` parameter helps produce (mostly) consistent outputs, which
aids the reproducibility of evaluations.
## 4. Disallow more than one critic for the same field.
Raises a `ValueError` if more than one critic is assigned to a field.
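A minimal sketch of such a check, assuming each critic exposes a `critic_field` attribute as in the example earlier in this PR; `Critic` and `validate_critics` are hypothetical stand-ins, not Arcade's code. It counts critics per field and raises a `ValueError` naming every duplicated field.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Critic:
    """Simplified stand-in: only the attributes the check needs."""

    critic_field: str
    weight: float = 1.0


def validate_critics(critics: list[Critic]) -> None:
    """Raise ValueError if any field has more than one critic assigned."""
    counts = Counter(critic.critic_field for critic in critics)
    duplicates = sorted(field for field, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError(f"Multiple critics assigned to field(s): {', '.join(duplicates)}")
```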
---------
Co-authored-by: Eric Gustin <eric@arcade-ai.com>
# PR Description
* This PR updates code in `examples/` to be compatible with version
1.0.0
* This PR removes the Spotify examples since the Arcade hosted worker
doesn't currently catalog the Spotify toolkit. We can reintroduce these
examples when it does.
* This PR performs various renames across the codebase for
`arcade-ai.com` --> `arcade.dev` and `Arcade AI` --> `Arcade`
# PR Description
For the `Google.GetThread` tool, we had a parameter named
`metadata_headers`. This parameter only makes a difference if the format
is "metadata", but the tool never uses the "metadata" format, so the
input parameter is useless. This parameter should never have been added
to the tool, and we should remove it before public beta.
# PR Description
Poetry released v2, with many breaking changes, a couple of days ago. The
`install-poetry` action that our workflows use defaults to v2, so many
of our workflows are failing. This PR pins that action to Poetry version
1.8.5 and also uses 1.8.5 for toolkits.
A ticket to migrate to 2.0.0 has been filed for future work.
# PR Description
* Adds/updates the following files in all toolkits:
  - `.pre-commit-config.yaml`
  - `.ruff.toml`
  - `LICENSE`
  - `Makefile`
  - `pyproject.toml`
* Lints all toolkits so that they pass `make check` and `make test` (a
total doozy). This includes adding some unit tests and evals.
* GitHub workflow for testing toolkits before merge into main (courtesy
of @sdreyer)
* Added a QOL improvement for tool developers for when they need to get
the context's auth token.
* Minor updates to the `arcade new` template.
# PR Description
This PR renames `ExpectedToolCall` to `NamedExpectedToolCall` and then
creates a new dataclass called `ExpectedToolCall`. `ExpectedToolCall`
can be passed to the `EvalSuite.add_case` and `EvalSuite.extend_case`
methods.
1. Enhance `EvalSuite.add_case` and `EvalSuite.extend_case` by accepting
a list of `ExpectedToolCall` as their `expected_tool_calls` input
parameter. This provides clearer scaffolding for developers. Previously,
the expected type was `list[tuple[Callable, dict[str, Any]]]`, which is
still accepted for backward compatibility.
```python
# Before (still valid for backward compatibility)
expected_tool_calls=[
    (
        adjust_playback_position,
        {
            "absolute_position_ms": 10000,
        },
    )
]
# After
expected_tool_calls=[
    ExpectedToolCall(
        func=adjust_playback_position,
        args={"absolute_position_ms": 10000},
    )
]
```
2. Removed all references to `arcade.core` in the `toolkits` directory.
3. Some linting for import organization.
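
Accepting both input shapes in item 1 amounts to normalizing them into one internal form. The sketch below shows one way to do that with simplified stand-in dataclasses; it is not the PR's implementation, and deriving the tool name from the Python function's `__name__` is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Union


@dataclass
class ExpectedToolCall:
    """Stand-in for the new user-facing dataclass (function + expected args)."""

    func: Callable[..., Any]
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class NamedExpectedToolCall:
    """Stand-in for the internal form keyed by tool name."""

    name: str
    args: dict[str, Any]


def normalize_expected_tool_calls(
    calls: list[Union[ExpectedToolCall, tuple[Callable[..., Any], dict[str, Any]]]],
) -> list[NamedExpectedToolCall]:
    """Accept both the new dataclass and the legacy (func, args) tuple."""
    normalized = []
    for call in calls:
        if isinstance(call, tuple):  # legacy form, kept for backward compatibility
            func, args = call
            call = ExpectedToolCall(func=func, args=args)
        # Assumption: the tool name is the Python function name.
        normalized.append(NamedExpectedToolCall(name=call.func.__name__, args=call.args))
    return normalized
```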
# PR Description
1. This PR adds three new tools:
   - GetThread (by ID)
   - ListThreads
   - SearchThreads
2. This PR updates the return type of various Gmail tools from `str` to
   `dict`.
3. This PR adds evals and tests for the added tools.
This PR ensures that `arcade.core` does not show up anywhere in "user
space". This is crucial for helping developers understand what objects
are safe to use, and helps maintain a good developer experience.
Specific changes:
- `ToolAuthorizationContext` and `ToolContext` are now visible via
`arcade.sdk`
- `ToolCatalog` is now visible via `arcade.sdk`
- `Toolkit` is now visible via `arcade.sdk`
- `config` is now visible via `arcade.sdk.config`
**New Tools Added**
- `docs.py`: Provides tools for Google Docs functionality, including
creating documents and inserting text.
- `drive.py`: Introduces tools for Google Drive operations, such as
listing documents.
This PR also simplifies the error handling logic in the Google toolkit,
specifically within the Calendar and Gmail tools. The primary change is
the removal of redundant `try-except` blocks that caught `HttpError` and
general exceptions and re-raised them as `ToolExecutionError`. Removing
these blocks lets exceptions propagate naturally and be handled by the
`ToolExecutor`.
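The pattern is easiest to see side by side: the tool body contains no `try-except`, and a single executor is the one place that turns any exception into a structured error. This is a minimal sketch with hypothetical names (`get_calendar_event`, `execute_tool`); it does not reproduce Arcade's `ToolExecutor` or its actual error format.

```python
from typing import Any, Callable


def get_calendar_event(event_id: str) -> dict[str, Any]:
    """Hypothetical tool body: no try/except, errors simply propagate."""
    if not event_id:
        raise ValueError("event_id must not be empty")
    return {"id": event_id, "summary": "Team sync"}


def execute_tool(func: Callable[..., Any], **kwargs: Any) -> dict[str, Any]:
    """Hypothetical executor: the one place where tool exceptions are caught."""
    try:
        return {"success": True, "output": func(**kwargs)}
    except Exception as exc:  # any error raised by the tool, e.g. an HttpError
        return {
            "success": False,
            "error_type": type(exc).__name__,
            "message": str(exc),
        }
```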
---------
Co-authored-by: Eric Gustin <eric@arcade-ai.com>
```
authors = ["Arcade AI <dev@arcade-ai.com>"]
```
vs
```
authors = ["Arcade AI <dev@arcade-ai.com"]
```
There is now also a `make install-toolkits` command.
### Adds the following tools to the Github Toolkit:
1. CreateIssueComment
2. SetStarred
3. CountStargazers
4. ListOrgRepositories
5. GetRepository
6. ListRepositoryActivities
7. ListReviewCommentsInARepository
8. ListPullRequests
9. GetPullRequest
10. UpdatePullRequest
11. ListPullRequestCommits
12. CreateReplyForReviewComment
13. ListReviewCommentsOnPullRequest
14. CreateReviewComment
Adds evals and unit tests for all of these tools.
---------
Co-authored-by: Sam Partee <sam@arcade-ai.com>
This PR adds four new tools to the Google toolkit:
* `create_event`
* `list_events`
* `update_event`
* `delete_event`
I also improved an error log for when tools are registered by the
actor.
This PR also sneaks in an eval for Gmail.
Here is a sample conversation that shows the tools and their
capabilities and limitations:

This PR improves the Docker build process by shifting from building the
project within the Docker image to using pre-built wheels. The main
changes are:
1. **Updated Makefile:**
   - **`VERSION` Variable:** Set to `0.1.0.dev0` to reflect the new default
     development version.
   - **`docker` Target:**
     - Added steps to build the Arcade and toolkit wheels before building the
       Docker image.
     - Exports the required extras (`fastapi`, `evals`) to a
       `requirements.txt` file.
   - **`full-dist` Target:**
     - Builds distributions for the main project and all toolkits.
     - Copies all the built wheels to a centralized `./dist` directory.
   - **`clean-dist` Target:**
     - Cleans build artifacts from `./dist`, `arcade/dist`, and
       `toolkits/*/dist` directories.
2. **Modified Dockerfile:**
   - **Copy Pre-built Wheels:** Adjusted to copy wheels and the
     `requirements.txt` from the `./dist` directory into the Docker image.
   - **Installation Process:**
     - Installs the Arcade wheel with the necessary extras.
     - Installs toolkits from the copied wheel files, eliminating the need to
       build them inside the Docker image.
   - **Simplification:** Removed unnecessary commands, such as installing
     build tools and copying the entire codebase, to streamline the
     Dockerfile.
3. **Toolkits `pyproject.toml` Updates:**
   - Changed the `arcade-ai` dependency version from `^0.1.0` to `0.1.*` in
     all toolkit `pyproject.toml` files to ensure compatibility with the new
     versioning scheme.
4. **Docker Makefile Adjustments:**
   - Set the `VERSION` variable to `0.1.0.dev0` to align with the main
     Makefile.
   - Ensures consistent versioning across Docker-related build processes.
**Benefits:**
- **Efficiency:** Building wheels outside the Docker context reduces the
Docker image build time and resource consumption. Overall Docker image
size is reduced by **1 GB**!!!
- **Reliability:** Using pre-built wheels ensures consistency across
different environments and simplifies dependency management.
- **Maintainability:** The Dockerfile and Makefiles are cleaner and more
straightforward, making them easier to understand and maintain.
**Notes:**
- Developers should run `make docker` to build and run the Docker
container using the new process.
- Ensure that any CI/CD pipelines are updated to accommodate these
changes in the build process. @sdreyer
Included toolkits as part of the linting process.
Cleaned up any tools that needed to be updated because of this.
This portion of the PR description was added via arcade chat!
On the last few PRs I have noticed two problems:
1. `ruff format` fails even though it seems OK on our local machines
(sometimes, not always)
2. Nate's and Sam's machines kept flip-flopping a specific piece of
formatting back and forth, indicating a subtle config difference
hiding somewhere. This was reproducible by running `ruff format` in the
terminal, followed by `make check`: the former would edit files, and
then `make check` would edit them back!
This PR addresses both issues, and further standardizes our editor &
linter configs to be super stable.
Specifically:
1. The main fix for the above: the pre-commit hook was pinned to a super
old version of ruff.
This resulted in subtle differences in behavior between our machines
and CI.
2. Moved ruff settings from `pyproject.toml` to `.ruff.toml`.
`pyproject.toml` files in subdirectories (e.g. `toolkits/**`) were
overriding the main pyproject file and erasing the custom ruff config we
set at the root. This meant that our ruff config was applied to `arcade`
but not to any of the other packages.
By moving the config to `.ruff.toml` at the root, all projects will
inherit the same ruff linting & formatting config.
3. Un-ignored the `.vscode/` directory so that we can share
vscode/cursor workspace settings.
This is valuable for standardizing settings like the default formatter
(ruff) and default test framework (pytest).
However, it's important that going forward we _only_ commit things here
that should apply across all of our machines.
4. To avoid any conflict between prettier and ruff, prettier now
explicitly ignores `*.py` files.
5. Finally, `ruff format` and `make check` agree. A number of files are
newly auto-formatted.
This PR includes several improvements to the Arcade client and adds
LangGraph examples:
1. Enhanced error handling in the Arcade client:
   - Improved HTTP error handling in `BaseArcadeClient`
   - Simplified request methods in `SyncArcadeClient` and
     `AsyncArcadeClient`
2. Updated `ToolResource` class:
   - Changed base path from `/v1/tool` to `/v1/tools`
   - Added `tool_version` parameter to `authorize` method
3. Improved Toolkit discovery:
   - Updated `find_all_arcade_toolkits` to search only in the current
     Python interpreter's site-packages
4. Added LangGraph examples:
   - New `langgraph_auth.py` example demonstrating Gmail authentication
   - New `langgraph_with_tool_exec.py` example showing tool execution
     within a LangGraph
5. Minor updates:
   - Changed default `BASE_URL` to `https://api.arcade.com/`
   - Updated import error message for eval dependencies
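
To make the toolkit-discovery change concrete, here is a minimal sketch of scanning only the running interpreter's site-packages for `arcade_*` packages. It is an assumption-laden illustration, not `find_all_arcade_toolkits` itself: the name `find_arcade_toolkits`, the `arcade_` prefix rule, and the optional `search_dirs` override (added so the logic can be exercised in isolation) are all hypothetical. `site.getsitepackages()` reports the directories of the current interpreter or active virtualenv only, and `isidentifier()` skips metadata folders such as `*.dist-info`.

```python
import os
import site
from typing import Optional


def find_arcade_toolkits(search_dirs: Optional[list[str]] = None) -> list[str]:
    """Hypothetical sketch: list importable `arcade_*` package directories."""
    if search_dirs is None:
        # Only the running interpreter's (or active virtualenv's) site-packages.
        search_dirs = site.getsitepackages()
    found = set()
    for directory in search_dirs:
        if not os.path.isdir(directory):
            continue
        for entry in os.listdir(directory):
            # isidentifier() filters out e.g. "arcade_google-0.1.0.dist-info".
            if (
                entry.startswith("arcade_")
                and entry.isidentifier()
                and os.path.isdir(os.path.join(directory, entry))
            ):
                found.add(entry)
    return sorted(found)
```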
---------
Co-authored-by: Nate Barbettini <nate@arcade-ai.com>
# PR Description
## Summary
Changes include renaming the `arcade_gmail` toolkit to `arcade_google`,
adding unit tests for the Google toolkit, and adding new tools to the
Google toolkit.
## Changes
### Makefile
- Added a new `make test-toolkits` target to iterate over all toolkits
and run pytest on each one.
### Added new tools for the Google toolkit
1. `send_email`
This tool sends an email using the Gmail API.
2. `write_draft_email`
This tool creates a draft email using the Gmail API.
3. `update_draft_email`
This tool updates an existing draft email using the Gmail API.
4. `send_draft_email`
This tool sends a draft email using the Gmail API.
5. `delete_draft_email`
This tool deletes a draft email using the Gmail API.
6. `list_draft_emails`
This tool retrieves a list of draft emails using the Gmail API.
7. `list_emails_by_header`
   This tool searches for emails by a specific header using the Gmail API.
   - `sender`: The sender's email address to search for.
   - `limit`: The maximum number of emails to retrieve.
8. `list_emails`
This tool retrieves a list of emails using the Gmail API.
9. `trash_email`
This tool moves an email to the trash using the Gmail API.
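
Header-based listing like `list_emails_by_header` can be expressed as a Gmail search string. The sketch below is an assumption about how that might be done, not the tool's code: `build_gmail_query` is hypothetical, and only `sender` comes from the tool description above (`recipient`, `subject`, and `newer_than_days` are illustrative extras). It uses the standard Gmail search operators `from:`, `to:`, `subject:`, and `newer_than:`. Such a string could then be passed as `q` to the Gmail API's `users.messages.list`, with `limit` mapping to `maxResults`.

```python
from typing import Optional


def build_gmail_query(
    sender: Optional[str] = None,
    recipient: Optional[str] = None,
    subject: Optional[str] = None,
    newer_than_days: Optional[int] = None,
) -> str:
    """Hypothetical sketch: combine header filters into Gmail search syntax."""
    parts = []
    if sender:
        parts.append(f"from:{sender}")
    if recipient:
        parts.append(f"to:{recipient}")
    if subject:
        parts.append(f"subject:({subject})")  # parentheses group multi-word subjects
    if newer_than_days is not None:
        parts.append(f"newer_than:{newer_than_days}d")
    return " ".join(parts)
```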