## Summary

Routes HTTP adapter exceptions to the right error class instead of shoe-horning everything into `UpstreamError`. Addresses Eric's earlier feedback that several exceptions this PR was wrapping as `UpstreamError` didn't satisfy the "something happened with the upstream" claim (local pool exhaustion, client-side request construction, local TLS failures).

### Scope

- `UpstreamError` (unchanged): upstream responded with an HTTP status code.
- **`NetworkTransportError`** (new sibling in `arcade-core`): no complete response was received. `status_code=None`. Three kinds: `NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, `_UNREACHABLE`, `_UNMAPPED`.
- **`FatalToolError`** (existing): client construction bugs (`InvalidURL`, `UnsupportedProtocol`, `MissingSchema`, `InvalidHeader`, `LocalProtocolError`, …) and local TLS/cert config failures. Never retried.

---

## Before / After (per Eric's request)

Shows the error payload a tool produces for each exception, before this PR vs. after. "Before" = current `main` (exceptions without real HTTP responses fall through to the generic `@tool` `FatalToolError` catch-all with `message=str(exc)`).

### No-response transport failures

| Exception | Before: class / message / kind | After: class / message / kind |
|---|---|---|
| `httpx.PoolTimeout` | `FatalToolError`; `str(exc)` leaks raw detail; `TOOL_RUNTIME_FATAL`, not retryable | `NetworkTransportError`; `"HTTP request timed out before a complete response was received."`; `NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, **retryable** |
| `httpx.ConnectTimeout` | same as above | same as PoolTimeout: `TIMEOUT`, retryable |
| `httpx.ConnectError` (refused / DNS) | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP request failed before reaching the upstream service."`; `UNREACHABLE`, retryable |
| `httpx.RemoteProtocolError` (upstream sent bad HTTP) | `FatalToolError`; `str(exc)` | `NetworkTransportError`; same message as ConnectError; `UNREACHABLE`, retryable |
| `httpx.DecodingError` | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP response from upstream could not be decoded."`; `UNMAPPED`, retryable |
| `httpx.TooManyRedirects` | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP redirect limit exceeded before a final response was received."`; `UNMAPPED`, **not** retryable |

### Client construction / local env bugs

| Exception | Before | After |
|---|---|---|
| `httpx.UnsupportedProtocol`, `httpx.InvalidURL`, `httpx.LocalProtocolError` | `FatalToolError` with `message=str(exc)` (may leak scheme / URL content) | `FatalToolError`; `"Tool constructed an invalid HTTP request — likely a tool-authoring bug."`; `TOOL_RUNTIME_FATAL`, not retryable |
| `requests.MissingSchema`, `InvalidURL`, `InvalidHeader`, `InvalidSchema`, `InvalidProxyURL`, `URLRequired` | same as above | same as above |
| `requests.SSLError` | `FatalToolError`; `str(exc)` often contains raw cert chain detail | `FatalToolError`; `"TLS handshake failed — likely a local certificate or trust configuration issue."`; `TOOL_RUNTIME_FATAL`, not retryable |

### Real HTTP response errors (UNCHANGED, same behavior)

| Exception | Class | Message | Kind | Retryable |
|---|---|---|---|---|
| `httpx.HTTPStatusError` 404 | `UpstreamError` | `"Upstream HTTP request failed (Not Found, client error)."` | `UPSTREAM_RUNTIME_NOT_FOUND` | No |
| `httpx.HTTPStatusError` 429 (w/ Retry-After: 60) | `UpstreamRateLimitError` | `"Upstream HTTP request failed (Too Many Requests, client error). Retry after 60 second(s)."` | `UPSTREAM_RUNTIME_RATE_LIMIT` | Yes |
| `httpx.HTTPStatusError` 500 | `UpstreamError` | `"Upstream HTTP request failed (Internal Server Error, server error)."` | `UPSTREAM_RUNTIME_SERVER_ERROR` | Yes |

### What's no longer in the message

- Raw exception `str(exc)` output (which frequently includes the full URL with query-string tokens, connection pool details, or cert chains) is **no longer the agent-facing `message`**. It's preserved in `developer_message` for server-side diagnostics.
- The misleading "Upstream HTTP…" prefix is gone from network-transport and construction-bug messages. Those messages now honestly describe what happened on the tool side.
- For 429s without a `Retry-After` header, we still show "Retry after N seconds." (pre-existing behavior; see follow-up notes).

---

## Companion PRs

- [ArcadeAI/arcade-mcp#823](https://github.com/ArcadeAI/arcade-mcp/pull/823): introduces `NetworkTransportError` in `arcade-core`
- [ArcadeAI/monorepo#911](https://github.com/ArcadeAI/monorepo/pull/911): adds the 3 `ErrorKind` constants to the Go engine and Datadog dashboards
- [ArcadeAI/docs#920](https://github.com/ArcadeAI/docs/pull/920): documents the new hierarchy and adapter routing

## Follow-ups (out of scope for this PR)

A short investigation surfaced several pre-existing issues that are worth fixing separately. A full list is in `NETWORK_TRANSPORT_ERROR_FOLLOWUPS.md` (shared offline). Summary:

1. `requests.HTTPError` with `response is None` returns `None` from the adapter; should fall through to the `NetworkTransportError(UNMAPPED)` fallback instead of becoming a generic `FatalToolError`.
2. `developer_message` can leak URL query strings (and therefore tokens) since it stores raw `str(exc)`.
3. `_sanitize_uri` does not strip userinfo (credentials embedded in the URL authority, e.g. `user:pass@host`).
4. `_parse_retry_ms` misinterprets epoch-style `x-ratelimit-reset` headers.
5. 429 responses without `Retry-After` synthesize a fabricated "Retry after 1 second(s)." suffix.
6. `UPSTREAM_RUNTIME_VALIDATION_ERROR` is defined but never emitted.
7. `UpstreamError` silently accepts out-of-range status codes.
8. `requests.HTTPError` branch re-extracts `request_url` / `request_method` inconsistently (dead work).

## Test plan

- [x] Existing `libs/tests/sdk/test_httpx_adapter.py` + `test_graphql_adapter.py` updated; every no-response / construction-bug test asserts the new class + kind + `can_retry`.
- [x] Full test suite passes locally.
- [x] mypy clean on `arcade-core`, `arcade-tdk`, `arcade-mcp-server`.
- [x] Smoke-tested 21 exception routing cases end-to-end against real httpx / requests exceptions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes core error classification and retryability for `httpx`/`requests`/GraphQL transport failures, which can affect tool retry behavior and telemetry. Risk is mitigated by extensive new/updated tests covering the new mappings and privacy expectations.
>
> **Overview**
> **Improves error adapter behavior to be more semantically correct and privacy-safe.** The HTTP adapter now distinguishes real HTTP responses (`UpstreamError`/`UpstreamRateLimitError`) from no-response failures (`NetworkTransportError` with `ErrorKind` + retryability) and from client construction/local TLS issues (`FatalToolError`).
>
> **Reduces sensitive data exposure in agent-facing messages.** Status-based errors now emit standardized messages derived from status phrase/class, while preserving raw exception detail in `developer_message`; Google/Microsoft/Slack fallback paths similarly switch to `unhandled <ExceptionType>` messages and move `str(exc)` into `developer_message`. GraphQL transport connection/protocol errors are reclassified from `UpstreamError` (502) to `NetworkTransportError`, and transport/server messages are standardized.
>
> Bumps `arcade-tdk` version to `3.8.0` and expands/updates the SDK test suite to assert new classes, `kind`, `can_retry`, request metadata extraction, and privacy behavior.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 1041cb1bec4fa3b0bae3e7c6b860b84cf376cf9a.</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| name | description |
|---|---|
| build-error-adapter | Build new Arcade error adapters from scratch using public Arcade TDK patterns. Use when adding provider integrations, mapping SDK exceptions, or extending HTTP/GraphQL/auth adapter behavior. |
# Build Error Adapter

Use this workflow to create new error adapters that fit Arcade TDK conventions.

## Official Reference

Start here and align behavior with this doc:

## Quick Context

- Adapter protocol: `arcade_tdk.error_adapters.base.ErrorAdapter`
- Common error classes:
  - `arcade_tdk.errors.UpstreamError`: upstream responded with an HTTP status code
  - `arcade_tdk.errors.UpstreamRateLimitError`: 429 / quota-exhausted with `retry_after_ms`
  - `arcade_tdk.errors.NetworkTransportError`: no complete response was received (timeouts, connection/DNS/TLS failures, decoding errors, redirect exhaustion). `status_code` is always `None`; use one of the `NETWORK_TRANSPORT_RUNTIME_*` kinds: `_TIMEOUT`, `_UNREACHABLE`, `_UNMAPPED`.
  - `arcade_tdk.errors.FatalToolError`: unrecoverable tool-authoring bug or environment misconfiguration (invalid URL, unsupported protocol, bad headers, TLS trust failures). Never retried.
  - `arcade_tdk.errors.RetryableToolError`: transient tool-body failure with a hint for the LLM to retry.
  - `arcade_tdk.errors.ContextRequiredToolError`: needs human input before retry.
## Rules To Follow

- Keep imports at top-level only (no inline imports), except optional dependency imports that must be lazy by design.
- Adapter interface contract:
  - `slug` class attribute
  - `from_exception(self, exc: Exception) -> ToolRuntimeError | None`
- Return `None` when the exception is not recognized for that adapter.
- Return a `ToolRuntimeError` subclass for recognized exceptions (`UpstreamError`, `UpstreamRateLimitError`, etc.).
- Preserve privacy:
  - Agent-facing `message` must be safe.
  - Put raw vendor detail into `developer_message` when needed.
- Add tests for every new mapping path.
- Match your installed Arcade version's decorator API and parameter names.
## Privacy Rule When Uncertain

If you are not fully sure what `str(exc)`, vendor `reason`, or nested payload fields can contain, treat them as potentially sensitive.

- Default to a safe agent-facing message template:
  - `"Upstream <Service> request failed with status code <code>."`
  - `"Upstream <Service> error: unhandled <ExceptionType>."`
- Put raw details in `developer_message` instead of `message`.
- Prefer structured non-secret context in `message` (status code, error class, stable provider error code).
- Never put tokens, auth headers, full URLs with query params, raw response bodies, or stack traces in agent-facing `message`.

Use this decision rule:

- Known-safe field (documented stable code/reason without sensitive payload): may be included in `message`.
- Unknown or mixed-content field: keep out of `message`; include only in `developer_message`.
- High-risk content (headers/body/credential-like strings): never include in `message`; sanitize or omit even in `developer_message` if policy requires.

When in doubt, prefer slightly less detail in `message` and richer diagnostics in `developer_message`.
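The privacy split can be sketched as follows. This is a minimal illustration, not TDK code: `AdaptedMessages` and `split_messages` are hypothetical names, and the service name and URL are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptedMessages:
    """Hypothetical container for the two message channels."""

    message: str            # agent-facing, must be safe
    developer_message: str  # server-side diagnostics, may carry raw detail


def split_messages(service: str, exc: Exception) -> AdaptedMessages:
    """Build a safe agent-facing message and keep raw detail separate."""
    # Only the exception type name reaches the agent. str(exc) may hold
    # URLs with query-string tokens, headers, or response bodies.
    safe = f"Upstream {service} error: unhandled {type(exc).__name__}."
    return AdaptedMessages(message=safe, developer_message=str(exc))


leaky = RuntimeError("GET https://api.example.com/v1/items?token=s3cr3t failed")
result = split_messages("Example", leaky)
print(result.message)
```

If policy requires it, the `developer_message` value would also be sanitized before storage; this sketch leaves it raw to show where the detail goes.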
## Decide: Adapter vs explicit tool error

Use an error adapter when:

- You need repeatable translation from vendor exceptions to Arcade errors.
- The same exception family appears across multiple tools.

Raise explicit tool errors in tool code when:

- You need user guidance for immediate retry (`RetryableToolError`).
- You need user/orchestrator input before retry (`ContextRequiredToolError`).
- You need a special business rule for one endpoint/tool path only.
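## Implementation Pattern

### 1) Create adapter skeleton

```python
from arcade_core.errors import ToolRuntimeError


class VendorErrorAdapter:
    # Identifies this adapter in error metadata.
    slug = "_vendor"

    def from_exception(self, exc: Exception) -> ToolRuntimeError | None:
        # Recognize typed vendor exceptions here and
        # return the mapped ToolRuntimeError subclass.
        # Returning None means this adapter does not recognize exc.
        return None
```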
### 2) Use typed exception matching

- Match most specific subclasses first.
- Keep a final typed fallback for broad vendor exceptions.
- Avoid broad `except Exception` handling inside `from_exception`.
Example ordering:
- Rate limit subtype
- Auth subtype
- Timeout/transport subtype
- General vendor exception fallback
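The ordering above can be sketched with `isinstance` checks. The `Vendor*` exception classes and the string labels are hypothetical stand-ins for a real SDK's hierarchy and for the Arcade error classes an adapter would return.

```python
from typing import Optional


class VendorError(Exception):
    """Hypothetical base class for a vendor SDK's exceptions."""


class VendorRateLimitError(VendorError):
    pass


class VendorAuthError(VendorError):
    pass


class VendorTimeoutError(VendorError):
    pass


def classify(exc: Exception) -> Optional[str]:
    """Return a label for recognized vendor exceptions, else None."""
    # Most specific subclasses first. The broad base class goes last,
    # otherwise it would shadow every subtype checked after it.
    if isinstance(exc, VendorRateLimitError):
        return "rate_limit"
    if isinstance(exc, VendorAuthError):
        return "auth"
    if isinstance(exc, VendorTimeoutError):
        return "timeout"
    if isinstance(exc, VendorError):
        return "vendor_fallback"
    # Not a vendor exception: leave it for another adapter or the caller.
    return None
```

Using `isinstance` on typed exceptions keeps unrelated errors out of the adapter, which is the reason to avoid a broad `except Exception` here.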
### 3) Normalize metadata

For adapted errors:

- Include `extra["service"] = self.slug`
- Include `extra["error_type"] = type(exc).__name__` for non-status failures
- Include sanitized endpoint/method when available
### 4) Map status-like semantics consistently

Upstream responded with an HTTP status code → `UpstreamError`:

- 429 → `UpstreamRateLimitError` with `retry_after_ms`
- 5xx → retryable `UpstreamError` (`status_code >= 500`)
- 4xx → non-retryable `UpstreamError`

`UpstreamError` derives retryability from the status code, so retry behavior stays consistent without per-adapter logic.
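The status rules above, sketched as a pure function. `StatusMapping` and `map_status` are hypothetical; a real adapter would construct the error instances instead of returning class names, and leaving `retry_after_ms` as `None` when no hint is available is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusMapping:
    """Hypothetical result type standing in for a constructed error."""

    error_class: str
    can_retry: bool
    retry_after_ms: Optional[int] = None


def map_status(
    status_code: int, retry_after_seconds: Optional[float] = None
) -> StatusMapping:
    """Pick the error class and retryability for a real HTTP response."""
    if status_code == 429:
        # Convert a Retry-After value in seconds to milliseconds if given.
        retry_ms = (
            None if retry_after_seconds is None else int(retry_after_seconds * 1000)
        )
        return StatusMapping("UpstreamRateLimitError", True, retry_ms)
    if status_code >= 500:
        return StatusMapping("UpstreamError", True)
    # Remaining error statuses (4xx) are treated as non-retryable.
    return StatusMapping("UpstreamError", False)
```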
No complete response from upstream → `NetworkTransportError`:

Use this class when the exception inherently means the request never reached the upstream, or no complete response came back. `status_code` is `None` by design.

| Exception kind | `kind=` | `can_retry=` |
|---|---|---|
| Timeouts (connect, read, pool) | `NETWORK_TRANSPORT_RUNTIME_TIMEOUT` | `True` |
| Connection refused, DNS, TLS handshake, remote-protocol errors | `NETWORK_TRANSPORT_RUNTIME_UNREACHABLE` | `True` |
| Decoding failures, generic transport fallback | `NETWORK_TRANSPORT_RUNTIME_UNMAPPED` | `True` |
| Redirect-loop exhaustion | `NETWORK_TRANSPORT_RUNTIME_UNMAPPED` | `False` |
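The table can be sketched as a lookup that returns `(kind, can_retry)`. Standard-library exceptions stand in for HTTP-client exceptions here, and `RedirectLimitExceeded` is a hypothetical class; a real adapter would match the client library's own types (for example the `httpx` exceptions) and return a `NetworkTransportError`.

```python
import socket

TIMEOUT = "NETWORK_TRANSPORT_RUNTIME_TIMEOUT"
UNREACHABLE = "NETWORK_TRANSPORT_RUNTIME_UNREACHABLE"
UNMAPPED = "NETWORK_TRANSPORT_RUNTIME_UNMAPPED"


class RedirectLimitExceeded(Exception):
    """Hypothetical stand-in for a client's too-many-redirects error."""


def classify_transport(exc: Exception) -> tuple[str, bool]:
    """Map a no-response failure to (kind, can_retry)."""
    if isinstance(exc, TimeoutError):
        return TIMEOUT, True
    # ConnectionError covers refused/reset connections; gaierror is DNS.
    if isinstance(exc, (ConnectionError, socket.gaierror)):
        return UNREACHABLE, True
    if isinstance(exc, RedirectLimitExceeded):
        # Retrying the same request would follow the same redirect chain.
        return UNMAPPED, False
    # Decoding failures and anything else transport-related.
    return UNMAPPED, True
```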
Tool-authoring bugs / local environment misconfiguration → `FatalToolError`:

Use this class for exceptions that will never succeed on retry, because the tool's code or environment needs to change:

- Invalid URL, unsupported scheme, missing scheme, bad headers, malformed local HTTP protocol state
- TLS / certificate / trust configuration failures (`ssl.SSLError` and siblings)

Do not dress these up as `UpstreamError`: an `UpstreamError` implies the upstream service actually said "no". Miscategorizing pollutes telemetry and sends the wrong retry signal.
### 5) Optional dependency handling

For SDK-specific adapters, lazy-import the SDK module inside `from_exception` if that dependency may be optional.

- If import fails, log and return `None`.
- Do not raise import errors from adapter code paths.
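One way to sketch the lazy import, assuming a hypothetical SDK module named `hypothetical_vendor_sdk` that exposes a `VendorError` base class. The adapter returns a plain `RuntimeError` only as a stand-in for a `ToolRuntimeError` subclass.

```python
import importlib
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class LazyVendorAdapter:
    slug = "_vendor"
    sdk_module = "hypothetical_vendor_sdk"  # assumed name, not a real package

    def from_exception(self, exc: Exception) -> Optional[Exception]:
        try:
            # Imported here so the adapter module loads without the SDK.
            sdk = importlib.import_module(self.sdk_module)
        except ImportError:
            # Log quietly and decline; never raise from the adapter path.
            logger.debug("%s is not installed; adapter skipped", self.sdk_module)
            return None
        base = getattr(sdk, "VendorError", None)
        if not isinstance(base, type) or not isinstance(exc, base):
            return None
        # Stand-in for the real mapping to a ToolRuntimeError subclass.
        return RuntimeError(f"Upstream Vendor error: unhandled {type(exc).__name__}.")


adapted = LazyVendorAdapter().from_exception(ValueError("boom"))
```

Logging at debug level keeps the missing-dependency path quiet in normal runs.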
## Registration Pattern

For `httpx` and `requests`, automatic adaptation is typically available.
For SDK-specific adapters, register explicitly on tools.

```python
from arcade_mcp_server import tool
from arcade_tdk.error_adapters import GoogleErrorAdapter


@tool(
    # Depending on Arcade version, this may be `adapters=` or `error_adapters=`.
    adapters=[GoogleErrorAdapter()],
)
def my_tool() -> str:
    # Placeholder signature and body; replace with your tool's parameters and logic.
    ...
```

If your project uses a different parameter name, follow your installed API docs/signature.
## Required Test Matrix

Create or extend tests in your project test suite:

- recognized typed exception -> expected `ToolRuntimeError` subclass
- expected `status_code`, `kind`, `can_retry`
- expected `extra` keys (`service`, `error_type`, endpoint/method when applicable)
- unknown exception returns `None`
- optional dependency missing path returns `None`
- privacy split is verified:
  - `message` stays safe for uncertain/raw exceptions
  - `developer_message` carries deep diagnostics
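A sketch of how part of this matrix might look as pytest-style tests. Everything here is a stand-in: `StubError` replaces the real `ToolRuntimeError` subclasses, and `StubAdapter` and `VendorTimeout` are hypothetical. In a real suite you would import your adapter and the Arcade error classes instead.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StubError:
    """Hypothetical stand-in for a ToolRuntimeError subclass."""

    message: str
    developer_message: str
    kind: str
    can_retry: bool
    status_code: Optional[int] = None
    extra: dict[str, str] = field(default_factory=dict)


class VendorTimeout(Exception):
    """Hypothetical vendor SDK timeout."""


class StubAdapter:
    slug = "_vendor"

    def from_exception(self, exc: Exception) -> Optional[StubError]:
        if isinstance(exc, VendorTimeout):
            return StubError(
                message="HTTP request timed out before a complete response was received.",
                developer_message=str(exc),
                kind="NETWORK_TRANSPORT_RUNTIME_TIMEOUT",
                can_retry=True,
                extra={"service": self.slug, "error_type": type(exc).__name__},
            )
        return None


def test_timeout_mapping_and_privacy_split() -> None:
    err = StubAdapter().from_exception(VendorTimeout("GET /v1?token=s3cr3t timed out"))
    assert err is not None
    assert err.kind == "NETWORK_TRANSPORT_RUNTIME_TIMEOUT"
    assert err.can_retry is True
    assert err.status_code is None
    assert err.extra == {"service": "_vendor", "error_type": "VendorTimeout"}
    assert "s3cr3t" not in err.message      # agent-facing text stays safe
    assert "s3cr3t" in err.developer_message  # raw detail kept for diagnostics


def test_unknown_exception_returns_none() -> None:
    assert StubAdapter().from_exception(ValueError("nope")) is None
```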
## Done Checklist

- Adapter returns `ToolRuntimeError | None`
- Safe agent-facing messages
- Uncertain exception content defaults to safe templates
- Typed exception coverage added
- Tests added/updated and passing
- Any required package versioning updated for your repo rules
- No noisy stdout/stderr output in MCP tool runtime paths