# PR Description
* Adds/updates the following files to all toolkits:
- `.pre-commit-config.yaml`
- `.ruff.toml`
- `LICENSE`
- `Makefile`
- `pyproject.toml`
* Lint all toolkits such that they pass `make check` and `make test` (a
total doozy). This includes adding some unit tests and evals.
* Github workflow for testing toolkits before merge into main (courtesy
of @sdreyer)
* Added a QOL improvement for tool developers for when they need to get
the context's auth token.
* Minor updates to `arcade new` template.
# PR Description
This PR renames `ExpectedToolCall` to `NamedExpectedToolCall` and then
creates a new dataclass called `ExpectedToolCall`. `ExpectedToolCall`
can be passed to the `EvalSuite.add_case` and `EvalSuite.extend_case`
methods.
1. Enhance `EvalSuite.add_case` and `EvalSuite.extend_case` by accepting
a list of `ExpectedToolCall` as their `expected_tool_calls` input
parameter. This helps create a scaffolding for developers. Previously,
the expected type was `list[tuple[Callable, dict[str, Any]]]`, which is
still valid for backward compatibility.
```python
# Before (still valid for backward compatibility)
expected_tool_calls=[
(
adjust_playback_position,
{
"absolute_position_ms": 10000,
},
)
]
# After
expected_tool_calls=[
ExpectedToolCall(
func=adjust_playback_position,
args={"absolute_position_ms": 10000},
)
]
```
2. Removed any references to arcade.core in toolkits directory.
3. Some linting for import organization.
# PR Description
This PR creates a new toolkit called CodeSandbox. This toolkit has two
tools:
1. `RunCode`: Creates an E2B sandbox and runs the provided code in that
sandbox. Returns the execution logs, result, and errors. Supports
Python, JavaScript, R, Java, and Bash code.
2. `CreateStaticMatplotlibChart`: Creates a sandbox, runs the provided
python code that uses matplotlib, and returns the base64 encoded image
of the chart along with any logs or errors.
- I recommend not using `tool_choice="generate"` since the return object
contains a base64 image can be a lot of tokens that will not provide
much value to a generate's response.
Example of creating a pie chart:
```python
import base64
import json
import os
from openai import OpenAI
def call_tool_with_openai(client: OpenAI) -> dict:
response = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "There are 17 red apples, 4 green apples, and 10 yellow apples. Create a pie chart for this data.",
},
],
model="gpt-4o-mini",
user="you@example.com",
tools=["CodeSandbox.CreateStaticMatplotlibChart"],
tool_choice="execute",
)
return response
arcade_api_key = os.environ.get("ARCADE_API_KEY")
cloud_host = "http://localhost:9099/v1"
openai_client = OpenAI(
api_key=arcade_api_key,
base_url=cloud_host,
)
chat_result = call_tool_with_openai(openai_client)
tool_call_id = chat_result.choices[0].message.tool_calls[0].id
content = json.loads(chat_result.choices[0].message.content)
base64_image = content[tool_call_id]["value"]["base64_image"]
image_data = base64.b64decode(base64_image)
with open("output_image.png", "wb") as image_file:
image_file.write(image_data)
```