arcade-mcp/.claude/skills/build-error-adapter/SKILL.md
Francisco Or Something 8f5d0ff54e
Improve typed httpx error mapping and adapter guidance (#820)
## Summary

Routes HTTP adapter exceptions to the right error class instead of
shoehorning everything into `UpstreamError`. Addresses Eric's earlier
feedback that several exceptions this PR was wrapping as `UpstreamError`
didn't satisfy the "something happened with the upstream" claim (local
pool exhaustion, client-side request construction, local TLS failures).

### Scope

- `UpstreamError` (unchanged) — upstream responded with an HTTP status
code.
- **`NetworkTransportError`** (new sibling in `arcade-core`) — no
complete response was received. `status_code=None`. Three kinds:
`NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, `_UNREACHABLE`, `_UNMAPPED`.
- **`FatalToolError`** (existing) — client construction bugs
(`InvalidURL`, `UnsupportedProtocol`, `MissingSchema`, `InvalidHeader`,
`LocalProtocolError`, …) and local TLS/cert config failures. Never
retried.

---

## Before / After (per Eric's request)

Shows the error payload a tool produces for each exception, before this
PR vs. after. "Before" = current `main` (exceptions without real HTTP
responses fall through to the generic `@tool` `FatalToolError` catch-all
with `message=str(exc)`).

### No-response transport failures

| Exception | Before (class / message / kind) | After (class / message / kind) |
|---|---|---|
| `httpx.PoolTimeout` | `FatalToolError`; `str(exc)` leaks raw detail; `TOOL_RUNTIME_FATAL`, not retryable | `NetworkTransportError`; `"HTTP request timed out before a complete response was received."`; `NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, **retryable** |
| `httpx.ConnectTimeout` | same as above | same as `PoolTimeout`; `TIMEOUT`, retryable |
| `httpx.ConnectError` (refused / DNS) | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP request failed before reaching the upstream service."`; `UNREACHABLE`, retryable |
| `httpx.RemoteProtocolError` (upstream sent bad HTTP) | `FatalToolError`; `str(exc)` | `NetworkTransportError`; same message as `ConnectError`; `UNREACHABLE`, retryable |
| `httpx.DecodingError` | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP response from upstream could not be decoded."`; `UNMAPPED`, retryable |
| `httpx.TooManyRedirects` | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP redirect limit exceeded before a final response was received."`; `UNMAPPED`, **not** retryable |

### Client construction / local env bugs

| Exception | Before | After |
|---|---|---|
| `httpx.UnsupportedProtocol`, `httpx.InvalidURL`, `httpx.LocalProtocolError` | `FatalToolError` with `message=str(exc)` (may leak scheme / URL content) | `FatalToolError`; `"Tool constructed an invalid HTTP request — likely a tool-authoring bug."`; `TOOL_RUNTIME_FATAL`, not retryable |
| `requests.MissingSchema`, `InvalidURL`, `InvalidHeader`, `InvalidSchema`, `InvalidProxyURL`, `URLRequired` | same as above | same as above |
| `requests.SSLError` | `FatalToolError`; `str(exc)` often contains raw cert chain detail | `FatalToolError`; `"TLS handshake failed — likely a local certificate or trust configuration issue."`; `TOOL_RUNTIME_FATAL`, not retryable |

### Real HTTP response errors (UNCHANGED — same behavior)

| Exception | Class | Message | Kind | Retryable |
|---|---|---|---|---|
| `httpx.HTTPStatusError` 404 | `UpstreamError` | `"Upstream HTTP request failed (Not Found, client error)."` | `UPSTREAM_RUNTIME_NOT_FOUND` | No |
| `httpx.HTTPStatusError` 429 (w/ Retry-After: 60) | `UpstreamRateLimitError` | `"Upstream HTTP request failed (Too Many Requests, client error). Retry after 60 second(s)."` | `UPSTREAM_RUNTIME_RATE_LIMIT` | Yes |
| `httpx.HTTPStatusError` 500 | `UpstreamError` | `"Upstream HTTP request failed (Internal Server Error, server error)."` | `UPSTREAM_RUNTIME_SERVER_ERROR` | Yes |

### What's no longer in the message

- Raw exception `str(exc)` output (which frequently includes the full
URL with query-string tokens, connection pool details, or cert chains)
is **no longer the agent-facing `message`**. It's preserved in
`developer_message` for server-side diagnostics.
- The misleading "Upstream HTTP…" prefix is gone from network-transport
and construction-bug messages. Those messages now honestly describe what
happened on the tool side.
- For 429s without a `Retry-After` header, we still show "Retry after N
seconds." (pre-existing behavior; see follow-up notes).

---

## Companion PRs

- [ArcadeAI/arcade-mcp#823](https://github.com/ArcadeAI/arcade-mcp/pull/823):
  introduces `NetworkTransportError` in `arcade-core`
- [ArcadeAI/monorepo#911](https://github.com/ArcadeAI/monorepo/pull/911)
— adds the 3 `ErrorKind` constants to the Go engine and Datadog
dashboards
- [ArcadeAI/docs#920](https://github.com/ArcadeAI/docs/pull/920) —
documents the new hierarchy and adapter routing

## Follow-ups (out of scope for this PR)

A short investigation surfaced several pre-existing issues that are
worth fixing separately. A full list is in
`NETWORK_TRANSPORT_ERROR_FOLLOWUPS.md` (shared offline). Summary:

1. `requests.HTTPError` with `response is None` returns `None` from the
adapter; should fall through to the `NetworkTransportError(UNMAPPED)`
fallback instead of becoming a generic `FatalToolError`.
2. `developer_message` can leak URL query strings (and therefore tokens)
since it stores raw `str(exc)`.
3. `_sanitize_uri` does not strip userinfo (credentials embedded in the
URL authority, e.g. `user:pass@host`).
4. `_parse_retry_ms` misinterprets epoch-style `x-ratelimit-reset`
headers.
5. 429 responses without `Retry-After` synthesize a fabricated "Retry
after 1 second(s)." suffix.
6. `UPSTREAM_RUNTIME_VALIDATION_ERROR` is defined but never emitted.
7. `UpstreamError` silently accepts out-of-range status codes.
8. `requests.HTTPError` branch re-extracts `request_url` /
`request_method` inconsistently (dead work).

## Test plan

- [x] Existing `libs/tests/sdk/test_httpx_adapter.py` +
`test_graphql_adapter.py` updated; every no-response / construction-bug
test asserts the new class + kind + `can_retry`.
- [x] Full test suite passes locally.
- [x] mypy clean on `arcade-core`, `arcade-tdk`, `arcade-mcp-server`.
- [x] Smoke-tested 21 exception routing cases end-to-end against real
httpx / requests exceptions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes core error classification and retryability for
`httpx`/`requests`/GraphQL transport failures, which can affect tool
retry behavior and telemetry. Risk is mitigated by extensive new/updated
tests covering the new mappings and privacy expectations.
> 
> **Overview**
> **Improves error adapter behavior to be more semantically correct and
privacy-safe.** The HTTP adapter now distinguishes real HTTP responses
(`UpstreamError`/`UpstreamRateLimitError`) from no-response failures
(`NetworkTransportError` with `ErrorKind` + retryability) and from
client construction/local TLS issues (`FatalToolError`).
> 
> **Reduces sensitive data exposure in agent-facing messages.**
Status-based errors now emit standardized messages derived from status
phrase/class, while preserving raw exception detail in
`developer_message`; Google/Microsoft/Slack fallback paths similarly
switch to `unhandled <ExceptionType>` messages and move `str(exc)` into
`developer_message`. GraphQL transport connection/protocol errors are
reclassified from `UpstreamError` (502) to `NetworkTransportError`, and
transport/server messages are standardized.
> 
> Bumps `arcade-tdk` version to `3.8.0` and expands/updates the SDK test
suite to assert new classes, `kind`, `can_retry`, request metadata
extraction, and privacy behavior.
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 20:32:17 -03:00

name: build-error-adapter
description: Build new Arcade error adapters from scratch using public Arcade TDK patterns. Use when adding provider integrations, mapping SDK exceptions, or extending HTTP/GraphQL/auth adapter behavior.

Build Error Adapter

Use this workflow to create new error adapters that fit Arcade TDK conventions.

Official Reference

Start here and align behavior with this doc:

Quick Context

  • Adapter protocol: arcade_tdk.error_adapters.base.ErrorAdapter
  • Common error classes:
    • arcade_tdk.errors.UpstreamError — upstream responded with an HTTP status code
    • arcade_tdk.errors.UpstreamRateLimitError — 429 / quota-exhausted with retry_after_ms
    • arcade_tdk.errors.NetworkTransportError — no complete response was received (timeouts, connection/DNS/TLS failures, decoding errors, redirect exhaustion). status_code is always None; use one of the NETWORK_TRANSPORT_RUNTIME_* kinds: _TIMEOUT, _UNREACHABLE, _UNMAPPED.
    • arcade_tdk.errors.FatalToolError — unrecoverable tool-authoring bug or environment misconfiguration (invalid URL, unsupported protocol, bad headers, TLS trust failures). Never retried.
    • arcade_tdk.errors.RetryableToolError — transient tool-body failure with a hint for the LLM to retry.
    • arcade_tdk.errors.ContextRequiredToolError — needs human input before retry.

Rules To Follow

  1. Keep imports at top-level only (no inline imports), except optional dependency imports that must be lazy by design.
  2. Adapter interface contract:
    • slug class attribute
    • from_exception(self, exc: Exception) -> ToolRuntimeError | None
  3. Return None when the exception is not recognized for that adapter.
  4. Return a ToolRuntimeError subclass for recognized exceptions (UpstreamError, UpstreamRateLimitError, etc.).
  5. Preserve privacy:
    • Agent-facing message must be safe.
    • Put raw vendor detail into developer_message when needed.
  6. Add tests for every new mapping path.
  7. Match your installed Arcade version's decorator API and parameter names.

Privacy Rule When Uncertain

If you are not fully sure what str(exc), vendor reason, or nested payload fields can contain, treat them as potentially sensitive.

  • Default to a safe agent-facing message template:
    • "Upstream <Service> request failed with status code <code>."
    • "Upstream <Service> error: unhandled <ExceptionType>."
  • Put raw details in developer_message instead of message.
  • Prefer structured non-secret context in message (status code, error class, stable provider error code).
  • Never put tokens, auth headers, full URLs with query params, raw response bodies, or stack traces in agent-facing message.

Use this decision rule:

  1. Known-safe field (documented stable code/reason without sensitive payload): may be included in message.
  2. Unknown or mixed-content field: keep out of message; include only in developer_message.
  3. High-risk content (headers/body/credential-like strings): never include in message; sanitize or omit even in developer_message if policy requires.

When in doubt, prefer slightly less detail in message and richer diagnostics in developer_message.
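
As a rough illustration of the privacy split described above, the sketch below builds the two safe message templates and routes the raw exception text into developer_message only. `AdaptedError` and `build_safe_error` are hypothetical stand-ins invented for this example; they are not Arcade classes or helpers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AdaptedError:
    """Hypothetical stand-in for a ToolRuntimeError subclass (not the real Arcade class)."""

    message: str            # agent-facing, must stay safe
    developer_message: str  # server-side diagnostics only


def build_safe_error(service: str, exc: Exception, status_code: int | None = None) -> AdaptedError:
    """Build an error whose agent-facing message uses only known-safe fields."""
    if status_code is not None:
        message = f"Upstream {service} request failed with status code {status_code}."
    else:
        message = f"Upstream {service} error: unhandled {type(exc).__name__}."
    # str(exc) may contain URLs, tokens, or payload fragments, so it is kept out of `message`.
    return AdaptedError(message=message, developer_message=str(exc))


err = build_safe_error("Vendor", ValueError("GET https://api.example.com/v1?token=secret failed"))
print(err.message)
```

If policy forbids secrets even in developer_message, the raw string would need to be sanitized before it is stored; this sketch does not attempt that.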

Decide: Adapter vs explicit tool error

Use an error adapter when:

  • You need repeatable translation from vendor exceptions to Arcade errors.
  • The same exception family appears across multiple tools.

Raise explicit tool errors in tool code when:

  • You need user guidance for immediate retry (RetryableToolError).
  • You need user/orchestrator input before retry (ContextRequiredToolError).
  • You need a special business rule for one endpoint/tool path only.

Implementation Pattern

1) Create adapter skeleton

from arcade_core.errors import ToolRuntimeError


class VendorErrorAdapter:
    slug = "_vendor"

    def from_exception(self, exc: Exception) -> ToolRuntimeError | None:
        # recognize typed vendor exceptions
        # return mapped ToolRuntimeError
        return None

2) Use typed exception matching

  • Match most specific subclasses first.
  • Keep a final typed fallback for broad vendor exceptions.
  • Avoid broad except Exception handling inside from_exception.

Example ordering:

  1. Rate limit subtype
  2. Auth subtype
  3. Timeout/transport subtype
  4. General vendor exception fallback
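
The ordering above can be sketched with a small isinstance chain. The `Vendor*` exception classes and the `classify` helper are hypothetical, defined here only to show why the most specific subclasses must be checked before the broad fallback.

```python
from __future__ import annotations


# Hypothetical vendor SDK exception hierarchy, defined only for illustration.
class VendorError(Exception): ...
class VendorRateLimitError(VendorError): ...
class VendorAuthError(VendorError): ...
class VendorTimeoutError(VendorError): ...


def classify(exc: Exception) -> str | None:
    """Return a label for the mapping branch taken, or None if unrecognized."""
    # Most specific subclasses first; each of these is also a VendorError.
    if isinstance(exc, VendorRateLimitError):
        return "rate_limit"
    if isinstance(exc, VendorAuthError):
        return "auth"
    if isinstance(exc, VendorTimeoutError):
        return "timeout"
    # Typed fallback for the broad vendor base class.
    if isinstance(exc, VendorError):
        return "vendor_fallback"
    # Not a vendor exception: let other adapters (or the caller) handle it.
    return None
```

Had the `VendorError` check come first, every subclass would land in the fallback branch and the specific mappings would never run.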

3) Normalize metadata

For adapted errors:

  • Include extra["service"] = self.slug
  • Include extra["error_type"] = type(exc).__name__ for non-status failures
  • Include sanitized endpoint/method when available
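
One possible way to assemble this metadata, using only the standard library, is sketched below. The helper names and the "endpoint" / "http_method" keys are assumptions made for the example; use whatever key names your Arcade version expects.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def sanitize_endpoint(url: str) -> str:
    """Drop userinfo, query string, and fragment; keep scheme, host, port, and path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals lose their brackets in `hostname`; restore them.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def build_extra(slug: str, exc: Exception, url: str | None = None, method: str | None = None) -> dict[str, str]:
    """Collect non-secret context for an adapted error (key names are hypothetical)."""
    extra = {"service": slug, "error_type": type(exc).__name__}
    if url:
        extra["endpoint"] = sanitize_endpoint(url)
    if method:
        extra["http_method"] = method.upper()
    return extra
```

For example, `sanitize_endpoint("https://user:pw@api.example.com:8443/v1/items?token=abc#frag")` yields `"https://api.example.com:8443/v1/items"`.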

4) Map status-like semantics consistently

Upstream responded with an HTTP status code → UpstreamError:

  • 429 → UpstreamRateLimitError with retry_after_ms
  • 5xx → retryable UpstreamError (status_code >= 500)
  • 4xx → non-retryable UpstreamError

UpstreamError derives retryability from the status code, so adapters get consistent retry behavior without setting it by hand.
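
The status rules above can be written down as a small decision function. This is only a sketch: `StatusMapping`, `map_status`, and `retry_after_to_ms` are hypothetical names, the class names are returned as strings rather than real Arcade classes, and only the delta-seconds form of Retry-After is handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StatusMapping:
    error_class: str  # name of the Arcade error class an adapter would return
    can_retry: bool


def map_status(status_code: int) -> StatusMapping:
    """Apply the 429 / 5xx / 4xx rules for responses that carry a status code."""
    if status_code == 429:
        return StatusMapping("UpstreamRateLimitError", True)
    if status_code >= 500:
        return StatusMapping("UpstreamError", True)
    return StatusMapping("UpstreamError", False)


def retry_after_to_ms(header_value: str | None) -> int | None:
    """Convert a delta-seconds Retry-After value to milliseconds, else None."""
    if header_value is None:
        return None
    value = header_value.strip()
    # HTTP-date values and anything non-numeric are left unparsed in this sketch.
    if not (value.isascii() and value.isdigit()):
        return None
    return int(value) * 1000
```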

No complete response from upstream → NetworkTransportError:

Use this class when the exception inherently means the request never reached the upstream, or no complete response came back. status_code is None by design.

| Exception kind | kind= | can_retry= |
|---|---|---|
| Timeouts (connect, read, pool) | NETWORK_TRANSPORT_RUNTIME_TIMEOUT | True |
| Connection refused, DNS, TLS handshake, remote-protocol errors | NETWORK_TRANSPORT_RUNTIME_UNREACHABLE | True |
| Decoding failures, generic transport fallback | NETWORK_TRANSPORT_RUNTIME_UNMAPPED | True |
| Redirect-loop exhaustion | NETWORK_TRANSPORT_RUNTIME_UNMAPPED | False |
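
The table can be expressed as an ordered routing list. To keep the sketch runnable without httpx or requests installed, it uses built-in exceptions as stand-ins and a hypothetical `RedirectLimitExceeded` class; a real adapter would match the library's own exception types instead.

```python
from __future__ import annotations


class RedirectLimitExceeded(Exception):
    """Hypothetical stand-in for a redirect-exhaustion exception."""


# (exception type, kind, can_retry); built-in types are stand-ins for library exceptions.
TRANSPORT_ROUTES = [
    (TimeoutError, "NETWORK_TRANSPORT_RUNTIME_TIMEOUT", True),
    (ConnectionError, "NETWORK_TRANSPORT_RUNTIME_UNREACHABLE", True),
    (UnicodeDecodeError, "NETWORK_TRANSPORT_RUNTIME_UNMAPPED", True),
    (RedirectLimitExceeded, "NETWORK_TRANSPORT_RUNTIME_UNMAPPED", False),
]


def route_transport_error(exc: BaseException) -> tuple[str, bool] | None:
    """Return (kind, can_retry) for a no-response failure, or None if unrecognized."""
    for exc_type, kind, can_retry in TRANSPORT_ROUTES:
        if isinstance(exc, exc_type):
            return kind, can_retry
    return None
```

Keeping the routes in one list makes the retryability of each kind easy to audit and to assert on in tests.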

Tool-authoring bugs / local environment misconfiguration → FatalToolError:

Use this class for exceptions that will never succeed on retry — the tool's code or environment needs to change:

  • Invalid URL, unsupported scheme, missing scheme, bad headers, malformed local HTTP protocol state
  • TLS / certificate / trust configuration failures (ssl.SSLError and siblings)

Do not dress these up as UpstreamError — an UpstreamError implies the upstream service actually said "no". Miscategorizing pollutes telemetry and sends the wrong retry signal.

5) Optional dependency handling

For SDK-specific adapters, lazy-import the SDK module inside from_exception if that dependency may be optional.

  • If import fails, log and return None.
  • Do not raise import errors from adapter code paths.
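
A minimal sketch of the lazy-import rule follows. The module name `vendor_sdk_not_installed`, the `VendorError` attribute, and the `RuntimeError` return value are all hypothetical placeholders; a real adapter would import its actual SDK and return a ToolRuntimeError subclass.

```python
from __future__ import annotations

import importlib
import logging

logger = logging.getLogger(__name__)


class OptionalSdkAdapter:
    slug = "_vendor"
    sdk_module = "vendor_sdk_not_installed"  # hypothetical optional dependency

    def from_exception(self, exc: Exception) -> Exception | None:
        try:
            # Lazy import: the SDK is only needed when an exception is being adapted.
            sdk = importlib.import_module(self.sdk_module)
        except ImportError:
            # Log through the logging module and decline to handle the exception.
            logger.debug("Optional SDK %s is not installed; skipping adapter.", self.sdk_module)
            return None
        base_error = getattr(sdk, "VendorError", None)
        if base_error is None or not isinstance(exc, base_error):
            return None
        # Stand-in return value; map to a ToolRuntimeError subclass in real code.
        return RuntimeError(f"Upstream vendor error: unhandled {type(exc).__name__}.")


result = OptionalSdkAdapter().from_exception(ValueError("boom"))
```

Because the placeholder module does not exist, `result` is None here, which is the behavior the missing-dependency path should have.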

Registration Pattern

For httpx and requests, automatic adaptation is typically available.

For SDK-specific adapters, register explicitly on tools.

from arcade_mcp_server import tool
from arcade_tdk.error_adapters import GoogleErrorAdapter

@tool(
    # Depending on Arcade version, this may be `adapters=` or `error_adapters=`.
    adapters=[GoogleErrorAdapter()],
)
def my_tool(...) -> ...:
    ...

If your project uses a different parameter name, follow your installed API docs/signature.

Required Test Matrix

Create or extend tests in your project test suite:

  • recognized typed exception -> expected ToolRuntimeError subclass
  • expected status_code, kind, can_retry
  • expected extra keys (service, error_type, endpoint/method when applicable)
  • unknown exception returns None
  • optional dependency missing path returns None
  • privacy split is verified:
    • message stays safe for uncertain/raw exceptions
    • developer_message carries deep diagnostics
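
The matrix above might translate into tests along these lines. `FakeToolError`, `VendorTimeout`, and `ToyAdapter` are hypothetical stand-ins so the example runs without Arcade installed; real tests would assert against your adapter and the actual ToolRuntimeError subclasses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeToolError:
    """Hypothetical stand-in for a ToolRuntimeError subclass."""

    message: str
    developer_message: str
    kind: str
    can_retry: bool
    extra: dict[str, str] = field(default_factory=dict)


class VendorTimeout(Exception):
    """Hypothetical vendor exception."""


class ToyAdapter:
    slug = "_vendor"

    def from_exception(self, exc: Exception) -> FakeToolError | None:
        if isinstance(exc, VendorTimeout):
            return FakeToolError(
                message="HTTP request timed out before a complete response was received.",
                developer_message=str(exc),
                kind="NETWORK_TRANSPORT_RUNTIME_TIMEOUT",
                can_retry=True,
                extra={"service": self.slug, "error_type": type(exc).__name__},
            )
        return None


def test_recognized_exception_maps_to_timeout() -> None:
    err = ToyAdapter().from_exception(VendorTimeout("GET /v1?token=secret timed out"))
    assert err is not None
    assert (err.kind, err.can_retry) == ("NETWORK_TRANSPORT_RUNTIME_TIMEOUT", True)
    assert err.extra == {"service": "_vendor", "error_type": "VendorTimeout"}
    # Privacy split: the raw detail appears only in developer_message.
    assert "secret" not in err.message
    assert "secret" in err.developer_message


def test_unknown_exception_returns_none() -> None:
    assert ToyAdapter().from_exception(KeyError("x")) is None
```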

Done Checklist

  • Adapter returns ToolRuntimeError | None
  • Safe agent-facing messages
  • Uncertain exception content defaults to safe templates
  • Typed exception coverage added
  • Tests added/updated and passing
  • Any required package versioning updated for your repo rules
  • No noisy stdout/stderr output in MCP tool runtime paths