## Summary Routes HTTP adapter exceptions to the right error class instead of shoe-horning everything into `UpstreamError`. Addresses Eric's earlier feedback that several exceptions this PR was wrapping as `UpstreamError` didn't satisfy the "something happened with the upstream" claim (local pool exhaustion, client-side request construction, local TLS failures). ### Scope - `UpstreamError` (unchanged) — upstream responded with an HTTP status code. - **`NetworkTransportError`** (new sibling in `arcade-core`) — no complete response was received. `status_code=None`. Three kinds: `NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, `_UNREACHABLE`, `_UNMAPPED`. - **`FatalToolError`** (existing) — client construction bugs (`InvalidURL`, `UnsupportedProtocol`, `MissingSchema`, `InvalidHeader`, `LocalProtocolError`, …) and local TLS/cert config failures. Never retried. --- ## Before / After (per Eric's request) Shows the error payload a tool produces for each exception, before this PR vs. after. "Before" = current `main` (exceptions without real HTTP responses fall through to the generic `@tool` `FatalToolError` catch-all with `message=str(exc)`). ### No-response transport failures | Exception | Before — class / message / kind | After — class / message / kind | |---|---|---| | `httpx.PoolTimeout` | `FatalToolError` — `str(exc)` leaks raw detail — `TOOL_RUNTIME_FATAL`, not retryable | `NetworkTransportError` — `"HTTP request timed out before a complete response was received."` — `NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, **retryable** | | `httpx.ConnectTimeout` | same as above | same as PoolTimeout — `TIMEOUT`, retryable | | `httpx.ConnectError` (refused / DNS) | `FatalToolError` — `str(exc)` | `NetworkTransportError` — `"HTTP request failed before reaching the upstream service."` — `UNREACHABLE`, retryable | | `httpx.RemoteProtocolError` (upstream sent bad HTTP) | `FatalToolError` — `str(exc)` | `NetworkTransportError` — same message as ConnectError — `UNREACHABLE`, retryable | | `httpx.DecodingError` | `FatalToolError` — `str(exc)` | `NetworkTransportError` — `"HTTP response from upstream could not be decoded."` — `UNMAPPED`, retryable | | `httpx.TooManyRedirects` | `FatalToolError` — `str(exc)` | `NetworkTransportError` — `"HTTP redirect limit exceeded before a final response was received."` — `UNMAPPED`, **not** retryable | ### Client construction / local env bugs | Exception | Before | After | |---|---|---| | `httpx.UnsupportedProtocol`, `httpx.InvalidURL`, `httpx.LocalProtocolError` | `FatalToolError` with `message=str(exc)` (may leak scheme / URL content) | `FatalToolError` — `"Tool constructed an invalid HTTP request — likely a tool-authoring bug."` — `TOOL_RUNTIME_FATAL`, not retryable | | `requests.MissingSchema`, `InvalidURL`, `InvalidHeader`, `InvalidSchema`, `InvalidProxyURL`, `URLRequired` | same as above | same as above | | `requests.SSLError` | `FatalToolError` — `str(exc)` often contains raw cert chain detail | `FatalToolError` — `"TLS handshake failed — likely a local certificate or trust configuration issue."` — `TOOL_RUNTIME_FATAL`, not retryable | ### Real HTTP response errors (UNCHANGED — same behavior) | Exception | Class | Message | Kind | Retryable | |---|---|---|---|---| | `httpx.HTTPStatusError` 404 | `UpstreamError` | `"Upstream HTTP request failed (Not Found, client error)."` | `UPSTREAM_RUNTIME_NOT_FOUND` | No | | `httpx.HTTPStatusError` 429 (w/ Retry-After: 60) | `UpstreamRateLimitError` | `"Upstream HTTP request failed (Too Many Requests, client error). Retry after 60 second(s)."` | `UPSTREAM_RUNTIME_RATE_LIMIT` | Yes | | `httpx.HTTPStatusError` 500 | `UpstreamError` | `"Upstream HTTP request failed (Internal Server Error, server error)."` | `UPSTREAM_RUNTIME_SERVER_ERROR` | Yes | ### What's no longer in the message - Raw exception `str(exc)` output (which frequently includes the full URL with query-string tokens, connection pool details, or cert chains) is **no longer the agent-facing `message`**. It's preserved in `developer_message` for server-side diagnostics. - The misleading "Upstream HTTP…" prefix is gone from network-transport and construction-bug messages. Those messages now honestly describe what happened on the tool side. - For 429s without a `Retry-After` header, we still show "Retry after N seconds." (pre-existing behavior; see follow-up notes). --- ## Companion PRs - [ArcadeAI/arcade-mcp#823](https://github.com/ArcadeAI/arcade-mcp/pull/823) — introduces `NetworkTransportError` in `arcade-core` - [ArcadeAI/monorepo#911](https://github.com/ArcadeAI/monorepo/pull/911) — adds the 3 `ErrorKind` constants to the Go engine and Datadog dashboards - [ArcadeAI/docs#920](https://github.com/ArcadeAI/docs/pull/920) — documents the new hierarchy and adapter routing ## Follow-ups (out of scope for this PR) A short investigation surfaced several pre-existing issues that are worth fixing separately. A full list is in `NETWORK_TRANSPORT_ERROR_FOLLOWUPS.md` (shared offline). Summary: 1. `requests.HTTPError` with `response is None` returns `None` from the adapter; should fall through to the `NetworkTransportError(UNMAPPED)` fallback instead of becoming a generic `FatalToolError`. 2. `developer_message` can leak URL query strings (and therefore tokens) since it stores raw `str(exc)`. 3. `_sanitize_uri` does not strip userinfo (credentials in URL path). 4. `_parse_retry_ms` misinterprets epoch-style `x-ratelimit-reset` headers. 5. 429 responses without `Retry-After` synthesize a fabricated "Retry after 1 second(s)." suffix. 6. `UPSTREAM_RUNTIME_VALIDATION_ERROR` is defined but never emitted. 7. `UpstreamError` silently accepts out-of-range status codes. 8. `requests.HTTPError` branch re-extracts `request_url` / `request_method` inconsistently (dead work). ## Test plan - [x] Existing `libs/tests/sdk/test_httpx_adapter.py` + `test_graphql_adapter.py` updated; every no-response / construction-bug test asserts the new class + kind + `can_retry`. - [x] Full test suite passes locally. - [x] mypy clean on `arcade-core`, `arcade-tdk`, `arcade-mcp-server`. - [x] Smoke-tested 21 exception routing cases end-to-end against real httpx / requests exceptions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Changes core error classification and retryability for `httpx`/`requests`/GraphQL transport failures, which can affect tool retry behavior and telemetry. Risk is mitigated by extensive new/updated tests covering the new mappings and privacy expectations. > > **Overview** > **Improves error adapter behavior to be more semantically correct and privacy-safe.** The HTTP adapter now distinguishes real HTTP responses (`UpstreamError`/`UpstreamRateLimitError`) from no-response failures (`NetworkTransportError` with `ErrorKind` + retryability) and from client construction/local TLS issues (`FatalToolError`). > > **Reduces sensitive data exposure in agent-facing messages.** Status-based errors now emit standardized messages derived from status phrase/class, while preserving raw exception detail in `developer_message`; Google/Microsoft/Slack fallback paths similarly switch to `unhandled <ExceptionType>` messages and move `str(exc)` into `developer_message`. GraphQL transport connection/protocol errors are reclassified from `UpstreamError` (502) to `NetworkTransportError`, and transport/server messages are standardized. > > Bumps `arcade-tdk` version to `3.8.0` and expands/updates the SDK test suite to assert new classes, `kind`, `can_retry`, request metadata extraction, and privacy behavior. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 1041cb1bec4fa3b0bae3e7c6b860b84cf376cf9a. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
296 lines
11 KiB
Python
296 lines
11 KiB
Python
from __future__ import annotations
|
|
|
|
import sys
|
|
from collections.abc import Iterator
|
|
from http import HTTPStatus
|
|
from pathlib import Path
|
|
from typing import Any
|
|
from unittest.mock import patch
|
|
|
|
import pytest
|
|
from arcade_core.errors import (
|
|
ErrorKind,
|
|
NetworkTransportError,
|
|
UpstreamError,
|
|
UpstreamRateLimitError,
|
|
)
|
|
|
|
LIBS_DIR = Path(__file__).resolve().parents[2]
|
|
TDK_SRC = LIBS_DIR / "arcade-tdk"
|
|
if str(TDK_SRC) not in sys.path:
|
|
sys.path.insert(0, str(TDK_SRC))
|
|
|
|
import arcade_tdk.providers.graphql.error_adapter as gql_adapter # noqa: E402
|
|
|
|
# --- Dummy exception classes for testing ---
|
|
|
|
|
|
class DummyTransportError(Exception):
|
|
def __init__(self, message: str, code: int | None = None) -> None:
|
|
super().__init__(message)
|
|
self.code = code
|
|
|
|
|
|
class DummyTransportQueryError(Exception):
|
|
def __init__(self, errors: list[dict[str, Any]] | None = None) -> None:
|
|
super().__init__("query error")
|
|
self.errors = errors
|
|
|
|
|
|
class DummyResponse:
|
|
def __init__(self, headers: dict[str, str] | None = None) -> None:
|
|
self.headers = headers or {}
|
|
|
|
|
|
class DummyTransportServerError(Exception):
|
|
def __init__(
|
|
self, message: str, code: int | None = None, headers: dict[str, str] | None = None
|
|
):
|
|
super().__init__(message)
|
|
self.code = code
|
|
if headers is not None:
|
|
self.response = DummyResponse(headers)
|
|
|
|
|
|
class DummyTransportConnectionFailed(DummyTransportError):
|
|
pass
|
|
|
|
|
|
class DummyTransportProtocolError(DummyTransportError):
|
|
pass
|
|
|
|
|
|
@pytest.fixture(autouse=True)
|
|
def reset_cache() -> Iterator[None]:
|
|
"""Clear cached gql import state between tests."""
|
|
gql_adapter._load_gql_transport_errors.cache_clear()
|
|
yield
|
|
gql_adapter._load_gql_transport_errors.cache_clear()
|
|
|
|
|
|
def _patch_loader() -> Any:
|
|
"""Patch the loader to return our dummy classes."""
|
|
return patch.object(
|
|
gql_adapter,
|
|
"_load_gql_transport_errors",
|
|
return_value=(
|
|
DummyTransportError,
|
|
DummyTransportQueryError,
|
|
DummyTransportServerError,
|
|
DummyTransportConnectionFailed,
|
|
DummyTransportProtocolError,
|
|
),
|
|
)
|
|
|
|
|
|
class TestGraphQLErrorAdapter:
|
|
# --- Import/caching tests ---
|
|
|
|
def test_skips_when_gql_not_installed(self, monkeypatch: pytest.MonkeyPatch) -> None:
|
|
"""Should return None and cache the import failure."""
|
|
call_count = {"n": 0}
|
|
|
|
def fake_import(name: str) -> None:
|
|
call_count["n"] += 1
|
|
raise ImportError("no gql")
|
|
|
|
monkeypatch.setattr(gql_adapter.importlib, "import_module", fake_import)
|
|
adapter = gql_adapter.GraphQLErrorAdapter()
|
|
|
|
assert adapter.from_exception(Exception("x")) is None
|
|
assert adapter.from_exception(Exception("y")) is None
|
|
assert call_count["n"] == 1 # Only tried once
|
|
|
|
def test_ignores_non_gql_exceptions(self) -> None:
|
|
"""Non-gql exceptions should return None."""
|
|
with _patch_loader():
|
|
adapter = gql_adapter.GraphQLErrorAdapter()
|
|
assert adapter.from_exception(RuntimeError("not gql")) is None
|
|
|
|
# --- TransportQueryError tests ---
|
|
|
|
def test_query_error_extracts_messages_and_codes(self) -> None:
|
|
"""Should extract messages and map error codes to status."""
|
|
errors = [
|
|
{"message": "Not authorized", "extensions": {"code": "FORBIDDEN"}},
|
|
{"message": "Server error", "extensions": {"code": "INTERNAL_SERVER_ERROR"}},
|
|
]
|
|
exc = DummyTransportQueryError(errors=errors)
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, UpstreamError)
|
|
assert result.status_code == HTTPStatus.INTERNAL_SERVER_ERROR # Highest mapped status
|
|
assert "Not authorized" in result.message
|
|
assert "Server error" in result.message
|
|
assert result.extra["gql_error_codes"] == ["FORBIDDEN", "INTERNAL_SERVER_ERROR"]
|
|
|
|
def test_query_error_defaults_when_empty(self) -> None:
|
|
"""Should handle empty/missing errors gracefully."""
|
|
exc = DummyTransportQueryError(errors=None)
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, UpstreamError)
|
|
assert result.status_code == HTTPStatus.UNPROCESSABLE_ENTITY
|
|
assert "Unknown GraphQL error" in result.message
|
|
|
|
def test_query_error_deduplicates_codes(self) -> None:
|
|
"""Duplicate error codes should be deduplicated."""
|
|
errors = [
|
|
{"message": "A", "extensions": {"code": "FORBIDDEN"}},
|
|
{"message": "B", "extensions": {"code": "FORBIDDEN"}},
|
|
]
|
|
exc = DummyTransportQueryError(errors=errors)
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert result.extra["gql_error_codes"] == ["FORBIDDEN"]
|
|
|
|
# --- TransportServerError tests ---
|
|
|
|
def test_server_error_detects_rate_limit(self) -> None:
|
|
"""Should detect rate limits from status + headers."""
|
|
exc = DummyTransportServerError(
|
|
message="Too many requests",
|
|
code=429,
|
|
headers={"retry-after": "5"},
|
|
)
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, UpstreamRateLimitError)
|
|
assert result.retry_after_ms == 5000
|
|
|
|
def test_server_error_defaults_to_500(self) -> None:
|
|
"""Should default to 500 when no status code."""
|
|
exc = DummyTransportServerError("Server error", code=None)
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, UpstreamError)
|
|
assert result.status_code == HTTPStatus.INTERNAL_SERVER_ERROR
|
|
assert result.message == "Upstream GraphQL request failed with status code 500."
|
|
assert result.developer_message == "Server error"
|
|
|
|
def test_server_error_extracts_headers_from_cause(self) -> None:
|
|
"""Should extract headers from __cause__ if not on exception."""
|
|
exc = DummyTransportServerError("Error", code=429)
|
|
# No headers on exc, but on __cause__
|
|
cause = Exception("inner")
|
|
cause.response = DummyResponse({"retry-after": "10"}) # type: ignore
|
|
exc.__cause__ = cause
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, UpstreamRateLimitError)
|
|
assert result.retry_after_ms == 10000
|
|
|
|
def test_server_error_extracts_url_from_cause_aiohttp(self) -> None:
|
|
"""Should extract URL from __cause__ (aiohttp pattern)."""
|
|
exc = DummyTransportServerError("Error", code=500)
|
|
|
|
# aiohttp style: request_info.url
|
|
class FakeRequestInfo:
|
|
url = "https://api.github.com/graphql"
|
|
method = "POST"
|
|
|
|
cause = Exception("inner")
|
|
cause.request_info = FakeRequestInfo() # type: ignore
|
|
exc.__cause__ = cause
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, UpstreamError)
|
|
assert result.extra is not None
|
|
assert result.extra.get("endpoint") == "https://api.github.com/graphql"
|
|
assert result.extra.get("http_method") == "POST"
|
|
|
|
def test_server_error_extracts_url_from_cause_httpx(self) -> None:
|
|
"""Should extract URL from __cause__ (httpx/requests pattern)."""
|
|
exc = DummyTransportServerError("Error", code=500)
|
|
|
|
# httpx style: response.request.url
|
|
class FakeRequest:
|
|
url = "https://api.stripe.com/graphql"
|
|
method = "POST"
|
|
|
|
class FakeResponse:
|
|
request = FakeRequest()
|
|
|
|
cause = Exception("inner")
|
|
cause.response = FakeResponse() # type: ignore
|
|
exc.__cause__ = cause
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, UpstreamError)
|
|
assert result.extra is not None
|
|
assert result.extra.get("endpoint") == "https://api.stripe.com/graphql"
|
|
assert result.extra.get("http_method") == "POST"
|
|
|
|
# --- Connection/Protocol error tests ---
|
|
|
|
def test_connection_failed_maps_to_network_transport_unreachable(self) -> None:
|
|
"""Connection failures never reached upstream — NetworkTransportError."""
|
|
exc = DummyTransportConnectionFailed("Connection refused")
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, NetworkTransportError)
|
|
assert result.kind == ErrorKind.NETWORK_TRANSPORT_RUNTIME_UNREACHABLE
|
|
assert result.can_retry is True
|
|
assert result.status_code is None
|
|
assert result.extra["error_type"] == "DummyTransportConnectionFailed"
|
|
|
|
def test_protocol_error_maps_to_network_transport_unreachable(self) -> None:
|
|
"""Protocol errors (incomplete / malformed exchange) → NetworkTransportError."""
|
|
exc = DummyTransportProtocolError("Invalid response")
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, NetworkTransportError)
|
|
assert result.kind == ErrorKind.NETWORK_TRANSPORT_RUNTIME_UNREACHABLE
|
|
assert result.can_retry is True
|
|
assert result.status_code is None
|
|
assert result.extra["error_type"] == "DummyTransportProtocolError"
|
|
|
|
# --- Generic TransportError catch-all ---
|
|
|
|
def test_generic_transport_error_handled(self) -> None:
|
|
"""Unknown TransportError subclasses should be caught."""
|
|
exc = DummyTransportError("Unknown error", code=503)
|
|
|
|
with _patch_loader():
|
|
result = gql_adapter.GraphQLErrorAdapter().from_exception(exc)
|
|
|
|
assert isinstance(result, UpstreamError)
|
|
assert result.status_code == 503
|
|
assert result.message == "Upstream GraphQL request failed with status code 503."
|
|
assert result.developer_message == "Unknown error"
|
|
|
|
# --- Edge cases ---
|
|
|
|
def test_extract_message_handles_bad_str(self) -> None:
|
|
"""Should handle objects that fail str()."""
|
|
|
|
class BadStr:
|
|
def __str__(self) -> str:
|
|
raise ValueError("nope")
|
|
|
|
assert gql_adapter._extract_error_message(BadStr()) == "Unknown GraphQL error"
|
|
|
|
def test_extract_message_handles_empty(self) -> None:
|
|
"""Should handle empty/None messages."""
|
|
assert gql_adapter._extract_error_message(None) == "Unknown GraphQL error"
|
|
assert gql_adapter._extract_error_message("") == "Unknown GraphQL error"
|