This tool will be useful in scenarios akin to RAG, where someone wants
to ask questions or request the production of a summary, for instance,
about a bunch of documents related to a particular topic. Currently, to
fulfill such requests, the LLM needs to first `list_documents`, then
`get_document_by_id` for each document.
We also implement a utility functions to return documents in Markdown
and HTML, since the Drive API JSON is verbose and would waste too many
tokens unnecessarily.
Limitations: the Markdown/HTML utilities do not handle table of contents
(which I think aren't really useful here), headers, footers, or
footnotes.
---
This PR deprecates `list_documents` and implements `search_documents`,
apart from `search_and_retrieve_documents`). This configuration makes it
easier for LLMs to understand when to call each tool.
Both tools had their interfaces refactored to remove Google API-specific
arguments that were confusing LLMs sometimes, such as "corpora" and
"support_all_drives". It now accepts arguments that better relate to
expected user requests.
---------
Co-authored-by: Eric Gustin <eric@arcade.dev>
~~Note: Don't merge until the correct secrets have been added to Arcade
Cloud.~~
Ready to merge, the feature is already on its way to prod.
---------
Co-authored-by: Eric Gustin <eric@arcade.dev>
Break down `search_contacts` into `search_contacts_by_name` and
`search_contacts_by_email`. The search_contacts' `query` argument was
not clear enough for LLMs.
Improved gmail toolkit. Added support for threading in draft replies,
multipart email parsing, and label management. Fixed the DateRange
parameter issue in list_emails_by_headers. Added logging and removed
print statements. Created custom exceptions for each specific google
toolkit.
-----
Summary of changes by @byrro:
- Fixed minor bug related to the `date_range` argument of
`list_emails_by_header`
- A few utility functions (`build_email_message`,
`build_reply_recipients`, `build_reply_body`) to centralize logic and
remove repeated code from email-sending tools
- New `reply_to_email` tool (apart from `write_draft_reply_email`,
implemented by Alex) to keep the toolkit consistent
- Evals and unit tests
- Handling of reply-to (only sender) and reply-to-all recipients
- Removed some unnecessary debug messages, which Alex had added to
replace print statements
- Removed HTML handling implemented by Alex in `write_draft_reply_email`
> I think we should either support HTML across all applicable tools or
not at all; I decided to remove it and leave this feature for a future
PR.
---------
Co-authored-by: Renato Byrro <rmbyrro@gmail.com>
## PR Description
Search ([see API docs
here](https://docs.x.com/x-api/posts/recent-search)) returns a 'data'
field that maps to a list of tweets returned. We've observed that the
'data' field is not present if no tweets match the search. This PR
handles that case safely.
Currently, retrieving DMs with a given username requires several
actions: first get the current user's ID; list all users and find the ID
of the username; then scan all DM conversations and find the one with
the current user's ID and the username's ID, to finally retrieve the
messages using that conversation ID.
This tool abstracts all that in a single call.
PS: we'll implement a similar tool for multi-person DM conversations in
a subsequent PR.
# PR Description
This PR adds ~~four~~ three improvements to evals.
~~## 1. Add parameterized eval cases~~
~~Adds a new method named `add_parameterized_case`. Just like pytest’s
parameterized tests, eval cases can be parameterized with multiple user
messages. Adds a case to the `EvalSuite` for each user message. All
cases have the same expected tool call(s), params, additional_messages.
This reduces duplicate code and makes it easy to observe how a model
performs based on increasingly more difficult prompts.~~
```python
""" NO LONGER IN THIS PR
user_messages = [
"Call the delete tweet by id tool with the tweet ID '148975632'.",
"Delete the tweet with ID '148975632'.",
"I don't want to have this tweet (148975632) on my account anymore.",
"do the opposite of post for https://x.com/x/status/148975632",
]
suite.add_parameterized_case(
name="Delete a tweet by ID",
user_messages=user_messages,
expected_tool_calls=[
ExpectedToolCall(
func=delete_tweet_by_id,
args={"tweet_id": "148975632"},
)
],
critics=[
BinaryCritic(
critic_field="tweet_id",
weight=1.0,
),
],
)
"""
```
~~PASSED Delete a tweet by ID (user_message 1 of 4) -- Score: 100.00%~~
~~PASSED Delete a tweet by ID (user_message 2 of 4) -- Score: 100.00%~~
~~PASSED Delete a tweet by ID (user_message 3 of 4) -- Score: 100.00%~~
~~FAILED Delete a tweet by ID (user_message 4 of 4) -- Score: 0.00%~~
~~Summary -- Total: 4 -- Passed: 3 -- Failed: 1~~
## 2. Parameters that are not explicitly criticized are assigned a
`NoneCritic`.
A NoneCritic has no effect on the evaluation results and does not
actually evaluate. Parameters that have a NoneCritic will be displayed
as ‘un-criticized’ in the evaluation summary (if `-d` flag is used).

## 3. Add a hardcoded `seed` parameter for evals.
The seed parameter aides in receiving (mostly) consistent outputs -
aiding in reproducibility for evaluations.
## 4. Disallow more than one critic for the same field.
Raises a `ValueError` if more than one critic is assigned to a field.
---------
Co-authored-by: Eric Gustin <eric@arcade-ai.com>
# PR Description
* This PR updates code in `examples/` to be compatible with version
1.0.0
* This PR removes the Spotify examples since the Arcade hosted worker
doesn't currently cataloge the Spotify toolkit. We can reintroduce these
examples when it does.
* This PR performs various renames across the codebase for
`arcade-ai.com` --> `arcade.dev` and `Arcade AI` --> `Arcade`
# PR Description
The `github.event.pull_request.author_association` in the "Prevent
Unauthorized Version Updates" workflow was returning inconsistent
results by saying that MEMBERS were CONTRIBUTORS. This PR moves away
from `author_association` in favor of a whitelist text file containing
the GitHub usernames of authorized toolkit release managers.
A toolkit release manager has the following special permissions:
* Can change the version of an existing toolkit
* Can delete an existing toolkit
* Can rename an existing toolkit
The `get_conversation_metadata_by_name` tool retrieves conversation
metadata from another tool, `list_conversations_metadata`, but was
accessing the `next_cursor` using the Slack API response dict structure,
instead of the tool response structure. As a result, in that tool, the
tool would never actually paginate to the second page. This PR fixes it
and also adjust tests to capture the issue appropriately.
`starred` is a required argument of the
`arcade_github.activity.set_starred` tool, but when it is not provided
in the tool call, the engine is somehow passing it with a falsy value,
instead of raising an error. the falsy value makes the tool unstar a
repo by default, which is not the desired behavior.
we're setting the `starred` arg to True in the tool interface to prevent
that.
implements additional tools for Slack related to retrieving
conversations metadata, list of members, history of messages, as well as
sending messages to private/public channels and DMs / multi-person DMs.
---------
Co-authored-by: Eric Gustin <eric@arcade-ai.com>
Co-authored-by: Renato Byrro <rmbyrro@gmail.com>
# PR Description
For the `Google.GetThread` tool, we had a parameter named
`metadata_headers`. This parameter only makes a difference if the format
is "metadata", but the tool will never have the format "metadata". So,
the input parameter is useless. This parameter should have never been
added to the tool and we should remove it before public beta.
# PR Description
Changes to a toolkit without changes to the toolkit's version fail the
'Publish Toolkit' workflow with `HTTP Error 400: File already exists
('arcade_zoom-0.1.7.tar.gz', with blake2_256 hash
'02183cda607f06616e7edb17e3d22bc11d1d83b074b3e44066b78ec72602fb37'). See
https://pypi.org/help/#file-name-reuse for more information.`, for
example.
This PR adds the `--skip-existing` flag to `poetry publish` to avoid
attempting to publish an existing version. Skips slack notification if
publish is skipped.
The `grep`'d string comes from
https://github.com/python-poetry/poetry/blob/main/src/poetry/publishing/uploader.py#L246-L249
Modifies X tweet tools to return metadata about media attachments
(photo, GIF or video) when retrieving a tweet by ID, username or
keywords.
The tool will always return media attachments by default. Since it's
only metadata, it shouldn't add significant network overhead to existing
implementations of the tool.
My guess is more often than not people will want this info included.
When not needed, it doesn't hurt to include by default. It'd be annoying
to have to ask the LLM to include it every time they need.
# PR Description
Poetry released v2 with many breaking changes a couple days ago. The
`install-poetry` action that our workflows use default to that v2
version, so many of our workflows are failing. This PR forces that
action to use poetry version 1.8.5 and also uses 1.8.5 for toolkits
A ticket to migrate to 2.0.0 has been filed for future work
# PR Description
* Adds/updates the following files to all toolkits:
- `.pre-commit-config.yaml`
- `.ruff.toml`
- `LICENSE`
- `Makefile`
- `pyproject.toml`
* Lint all toolkits such that they pass `make check` and `make test` (a
total doozy). This includes adding some unit tests and evals.
* Github workflow for testing toolkits before merge into main (courtesy
of @sdreyer)
* Added a QOL improvement for tool developers for when they need to get
the context's auth token.
* Minor updates to `arcade new` template.
# PR Description
This PR renames `ExpectedToolCall` to `NamedExpectedToolCall` and then
creates a new dataclass called `ExpectedToolCall`. `ExpectedToolCall`
can be passed to the `EvalSuite.add_case` and `EvalSuite.extend_case`
methods.
1. Enhance `EvalSuite.add_case` and `EvalSuite.extend_case` by accepting
a list of `ExpectedToolCall` as their `expected_tool_calls` input
parameter. This helps create a scaffolding for developers. Previously,
the expected type was `list[tuple[Callable, dict[str, Any]]]`, which is
still valid for backward compatibility.
```python
# Before (still valid for backward compatibility)
expected_tool_calls=[
(
adjust_playback_position,
{
"absolute_position_ms": 10000,
},
)
]
# After
expected_tool_calls=[
ExpectedToolCall(
func=adjust_playback_position,
args={"absolute_position_ms": 10000},
)
]
```
2. Removed any references to arcade.core in toolkits directory.
3. Some linting for import organization.
# PR Description
* Update `search_recent_tweets_by_username`,
`search_recent_tweets_by_keywords`, and `lookup_tweet_by_id` to support
long tweets. Previously, only the first 280 characters of the tweet's
text were returned by the tool.
# PR Description
Adds an optional `next_token` input parameter to the
`X.SearchRecentTweetsByUsername` and `X.SearchRecentTweetsByKeywords`
tools.
This allows users to paginate through tweets. A `next_token` is provided
in the tools's response.
For example, to access the `next_token` when using the `tools.execute`,
you can do `next_token = response.output.value["meta"].get("next_token",
None)` and then pass it to the tool on your next call through the tools'
`next_token` input parameter.
This PR introduces the `lookup_tweet_by_id` tool to the X toolkit,
enabling users to retrieve tweet details by tweet ID. This enhancement
extends the toolkit's capabilities, allowing for more comprehensive
interactions with the X (Twitter) API.
**Key Changes:**
- **Added `lookup_tweet_by_id` Tool:**
- Implemented the `lookup_tweet_by_id` function in `tools/tweets.py`,
which allows users to fetch tweet information using a tweet ID.
- Included error handling for API response codes and expanded URLs in
tweets to assist language models in avoiding hallucinations due to
shortened URLs.
- **Enhanced Toolkit Structure:**
- Added several configuration files to the X toolkit to establish a
standardized project structure, which in the future will be generated by
`arcade new`. These include:
- `.pre-commit-config.yaml`: Defines pre-commit hooks for code quality
checks.
- `.ruff.toml`: Configuration for the Ruff linter.
- `LICENSE`: MIT License file for the toolkit.
- `Makefile`: Contains common commands for building, testing, and
linting the toolkit.
- **Updated Makefile:**
- Added `make check-toolkits` command to the top-level `Makefile`. This
command runs code quality tools for each toolkit that contains a
`Makefile`.
**Additional Notes:**
- **Tests:**
- Added unit tests for the new `lookup_tweet_by_id` tool in
`tests/test_tweets.py`.
- Included tests for the user lookup functionality in
`tests/test_users.py`.
- **Linting and Code Quality:**
- Configured pre-commit hooks and Ruff linter to enforce code standards.
- Updated the `pyproject.toml` file with development dependencies for
testing and linting.
-
---------
Co-authored-by: Eric Gustin <eric@arcade-ai.com>
# PR Description
1. This PR adds three new tools:
- GetThread (by ID)
- ListThreads
- SearchThreads
2. This PR updates the return type for various Gmail tools from str to
dict.
3. This PR adds evals and tests for the added tools
# PR Description
This PR creates a new toolkit called CodeSandbox. This toolkit has two
tools:
1. `RunCode`: Creates an E2B sandbox and runs the provided code in that
sandbox. Returns the execution logs, result, and errors. Supports
Python, JavaScript, R, Java, and Bash code.
2. `CreateStaticMatplotlibChart`: Creates a sandbox, runs the provided
python code that uses matplotlib, and returns the base64 encoded image
of the chart along with any logs or errors.
- I recommend not using `tool_choice="generate"` since the return object
contains a base64 image can be a lot of tokens that will not provide
much value to a generate's response.
Example of creating a pie chart:
```python
import base64
import json
import os
from openai import OpenAI
def call_tool_with_openai(client: OpenAI) -> dict:
response = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "There are 17 red apples, 4 green apples, and 10 yellow apples. Create a pie chart for this data.",
},
],
model="gpt-4o-mini",
user="you@example.com",
tools=["CodeSandbox.CreateStaticMatplotlibChart"],
tool_choice="execute",
)
return response
arcade_api_key = os.environ.get("ARCADE_API_KEY")
cloud_host = "http://localhost:9099/v1"
openai_client = OpenAI(
api_key=arcade_api_key,
base_url=cloud_host,
)
chat_result = call_tool_with_openai(openai_client)
tool_call_id = chat_result.choices[0].message.tool_calls[0].id
content = json.loads(chat_result.choices[0].message.content)
base64_image = content[tool_call_id]["value"]["base64_image"]
image_data = base64.b64decode(base64_image)
with open("output_image.png", "wb") as image_file:
image_file.write(image_data)
```
## PR Description
This PR adds 7 examples.
* `call_a_tool_directly_with_auth.py` - Simple example that uses Arcade
client to execute a tool that lists Gmail emails
* `call_a_tool_directly.py` - Simple example that uses Arcade client to
execute a tool that adds two numbers together
* `call_a_tool_with_llm.py` - Simple example that uses the LLM api to
star the arcade-ai repository
* `get_auth_token.py` - Simple example that gets a Google auth token and
then calls the Google API
* `call_multiple_tools_directly_with_auth.py` - A more involved example
that directly calls multiple spotify tools sequentially
* `call_multiple_tools_with_llm.py` - A more involved example that uses
an llm to call multiple spotify tools sequentially
* `simple_chatbot.py` - Simple chatbot that uses arcade tools and has
history
---------
Co-authored-by: Nate Barbettini <nathanaelb@gmail.com>
# PR Description
Previously, if a tool adjusted the playback state, then the tool would
return the current playback state after the modification had occurred.
The problem with this approach was that Spotify would not update the
playback state in time (sometimes), so the tools were returning stale
data!
# PR Description
This PR adds three new spotify tools that are natural language friendly.
1. `search` - Search Spotify Catalog information
2. `play_artist_by_name` - Gets 5 songs by the specified artist and
plays them. Uses `search`, and `start_tracks_playback_by_id` under the
hood
3. `play_track_by_name` - Plays the specified song, optionally provide
the artist name who plays the song. Uses `search`, and
`start_tracks_playback_by_id` under the hood
# PR Description
1. `adjust_playback_position` - Adjust the playback position within the
currently playing track
2. `skip_to_previous_track` - Skip to the previous track in the user's
queue, if any
3. `skip_to_next_track` - Skip to the next track in the user's queue, if
any
4. `pause_playback` - Pause the currently playing track, if any
5. `resume_playback` - Resume the currently playing track, if any
6. `start_tracks_playback_by_id` - Start playback of a list of tracks
(songs)
7. `get_playback_state` - Get information about the user's current
playback state, including track or episode, and active device
8. `get_currently_playing` - Get information about the user's currently
playing track
9. `get_track_from_id` - Get information about a track
10. `get_recommendations` - Get track (song) recommendations based on
seed artists, genres, and tracks, and multiple target audio stats
11. `get_tracks_audio_features` - Get audio features for a list of
tracks (songs)
----------------------
My favorite feature of this toolkit is
1. Start playing my favorite song
2. Get the song that I'm currently playing
3. Get audio features of that song
4. Ask for recommended songs that are similar to it
5. Jam out
------------