## Summary

Routes HTTP adapter exceptions to the right error class instead of shoe-horning everything into `UpstreamError`. Addresses Eric's earlier feedback that several exceptions this PR was wrapping as `UpstreamError` didn't satisfy the "something happened with the upstream" claim (local pool exhaustion, client-side request construction, local TLS failures).

### Scope

- `UpstreamError` (unchanged): upstream responded with an HTTP status code.
- **`NetworkTransportError`** (new sibling in `arcade-core`): no complete response was received. `status_code=None`. Three kinds: `NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, `_UNREACHABLE`, `_UNMAPPED`.
- **`FatalToolError`** (existing): client construction bugs (`InvalidURL`, `UnsupportedProtocol`, `MissingSchema`, `InvalidHeader`, `LocalProtocolError`, …) and local TLS/cert config failures. Never retried.

---

## Before / After (per Eric's request)

Shows the error payload a tool produces for each exception, before this PR vs. after. "Before" = current `main` (exceptions without real HTTP responses fall through to the generic `@tool` `FatalToolError` catch-all with `message=str(exc)`).

### No-response transport failures

| Exception | Before (class / message / kind) | After (class / message / kind) |
|---|---|---|
| `httpx.PoolTimeout` | `FatalToolError` / `str(exc)` leaks raw detail / `TOOL_RUNTIME_FATAL`, not retryable | `NetworkTransportError` / `"HTTP request timed out before a complete response was received."` / `NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, **retryable** |
| `httpx.ConnectTimeout` | same as above | same as PoolTimeout: `TIMEOUT`, retryable |
| `httpx.ConnectError` (refused / DNS) | `FatalToolError` / `str(exc)` | `NetworkTransportError` / `"HTTP request failed before reaching the upstream service."` / `UNREACHABLE`, retryable |
| `httpx.RemoteProtocolError` (upstream sent bad HTTP) | `FatalToolError` / `str(exc)` | `NetworkTransportError` / same message as ConnectError / `UNREACHABLE`, retryable |
| `httpx.DecodingError` | `FatalToolError` / `str(exc)` | `NetworkTransportError` / `"HTTP response from upstream could not be decoded."` / `UNMAPPED`, retryable |
| `httpx.TooManyRedirects` | `FatalToolError` / `str(exc)` | `NetworkTransportError` / `"HTTP redirect limit exceeded before a final response was received."` / `UNMAPPED`, **not** retryable |

### Client construction / local env bugs

| Exception | Before | After |
|---|---|---|
| `httpx.UnsupportedProtocol`, `httpx.InvalidURL`, `httpx.LocalProtocolError` | `FatalToolError` with `message=str(exc)` (may leak scheme / URL content) | `FatalToolError` / `"Tool constructed an invalid HTTP request — likely a tool-authoring bug."` / `TOOL_RUNTIME_FATAL`, not retryable |
| `requests.MissingSchema`, `InvalidURL`, `InvalidHeader`, `InvalidSchema`, `InvalidProxyURL`, `URLRequired` | same as above | same as above |
| `requests.SSLError` | `FatalToolError` / `str(exc)` often contains raw cert chain detail | `FatalToolError` / `"TLS handshake failed — likely a local certificate or trust configuration issue."` / `TOOL_RUNTIME_FATAL`, not retryable |

### Real HTTP response errors (UNCHANGED, same behavior)

| Exception | Class | Message | Kind | Retryable |
|---|---|---|---|---|
| `httpx.HTTPStatusError` 404 | `UpstreamError` | `"Upstream HTTP request failed (Not Found, client error)."` | `UPSTREAM_RUNTIME_NOT_FOUND` | No |
| `httpx.HTTPStatusError` 429 (w/ Retry-After: 60) | `UpstreamRateLimitError` | `"Upstream HTTP request failed (Too Many Requests, client error). Retry after 60 second(s)."` | `UPSTREAM_RUNTIME_RATE_LIMIT` | Yes |
| `httpx.HTTPStatusError` 500 | `UpstreamError` | `"Upstream HTTP request failed (Internal Server Error, server error)."` | `UPSTREAM_RUNTIME_SERVER_ERROR` | Yes |

### What's no longer in the message

- Raw exception `str(exc)` output (which frequently includes the full URL with query-string tokens, connection pool details, or cert chains) is **no longer the agent-facing `message`**. It's preserved in `developer_message` for server-side diagnostics.
- The misleading "Upstream HTTP…" prefix is gone from network-transport and construction-bug messages. Those messages now honestly describe what happened on the tool side.
- For 429s without a `Retry-After` header, we still show "Retry after N seconds." (pre-existing behavior; see follow-up notes).

---

## Companion PRs

- [ArcadeAI/arcade-mcp#823](https://github.com/ArcadeAI/arcade-mcp/pull/823): introduces `NetworkTransportError` in `arcade-core`
- [ArcadeAI/monorepo#911](https://github.com/ArcadeAI/monorepo/pull/911): adds the 3 `ErrorKind` constants to the Go engine and Datadog dashboards
- [ArcadeAI/docs#920](https://github.com/ArcadeAI/docs/pull/920): documents the new hierarchy and adapter routing

## Follow-ups (out of scope for this PR)

A short investigation surfaced several pre-existing issues that are worth fixing separately. A full list is in `NETWORK_TRANSPORT_ERROR_FOLLOWUPS.md` (shared offline). Summary:

1. `requests.HTTPError` with `response is None` returns `None` from the adapter; should fall through to the `NetworkTransportError(UNMAPPED)` fallback instead of becoming a generic `FatalToolError`.
2. `developer_message` can leak URL query strings (and therefore tokens) since it stores raw `str(exc)`.
3. `_sanitize_uri` does not strip userinfo (credentials embedded in the URL authority).
4. `_parse_retry_ms` misinterprets epoch-style `x-ratelimit-reset` headers.
5. 429 responses without `Retry-After` synthesize a fabricated "Retry after 1 second(s)." suffix.
6. `UPSTREAM_RUNTIME_VALIDATION_ERROR` is defined but never emitted.
7. `UpstreamError` silently accepts out-of-range status codes.
8. `requests.HTTPError` branch re-extracts `request_url` / `request_method` inconsistently (dead work).

## Test plan

- [x] Existing `libs/tests/sdk/test_httpx_adapter.py` + `test_graphql_adapter.py` updated; every no-response / construction-bug test asserts the new class + kind + `can_retry`.
- [x] Full test suite passes locally.
- [x] mypy clean on `arcade-core`, `arcade-tdk`, `arcade-mcp-server`.
- [x] Smoke-tested 21 exception routing cases end-to-end against real httpx / requests exceptions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes core error classification and retryability for `httpx`/`requests`/GraphQL transport failures, which can affect tool retry behavior and telemetry. Risk is mitigated by extensive new/updated tests covering the new mappings and privacy expectations.
>
> **Overview**
> **Improves error adapter behavior to be more semantically correct and privacy-safe.** The HTTP adapter now distinguishes real HTTP responses (`UpstreamError`/`UpstreamRateLimitError`) from no-response failures (`NetworkTransportError` with `ErrorKind` + retryability) and from client construction/local TLS issues (`FatalToolError`).
>
> **Reduces sensitive data exposure in agent-facing messages.** Status-based errors now emit standardized messages derived from status phrase/class, while preserving raw exception detail in `developer_message`; Google/Microsoft/Slack fallback paths similarly switch to `unhandled <ExceptionType>` messages and move `str(exc)` into `developer_message`. GraphQL transport connection/protocol errors are reclassified from `UpstreamError` (502) to `NetworkTransportError`, and transport/server messages are standardized.
>
> Bumps `arcade-tdk` version to `3.8.0` and expands/updates the SDK test suite to assert new classes, `kind`, `can_retry`, request metadata extraction, and privacy behavior.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 1041cb1bec4fa3b0bae3e7c6b860b84cf376cf9a. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
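
The routing described in the no-response table can be sketched as a lookup keyed by exception class name. This is a minimal illustration, not the adapter's actual code: the exception classes below are hypothetical stand-ins named after the `httpx` exceptions (so the sketch runs without `httpx` installed), and `Routed`, `_ROUTES`, and `route` are invented names. The messages, short kind labels, and retryability flags are copied from the table above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the httpx exceptions named in the table.
class PoolTimeout(Exception): ...
class ConnectError(Exception): ...
class TooManyRedirects(Exception): ...


@dataclass(frozen=True)
class Routed:
    error_class: str        # error class name from the PR description
    kind: str               # short kind label as used in the table
    retryable: bool
    message: str            # fixed, agent-facing text
    developer_message: str  # raw str(exc), kept for server-side diagnostics


# (error_class, kind, retryable, message) keyed by exception class name.
_ROUTES = {
    "PoolTimeout": (
        "NetworkTransportError", "TIMEOUT", True,
        "HTTP request timed out before a complete response was received.",
    ),
    "ConnectError": (
        "NetworkTransportError", "UNREACHABLE", True,
        "HTTP request failed before reaching the upstream service.",
    ),
    "TooManyRedirects": (
        "NetworkTransportError", "UNMAPPED", False,
        "HTTP redirect limit exceeded before a final response was received.",
    ),
}


def route(exc: Exception) -> Optional[Routed]:
    """Return the routed payload, or None when the exception is not mapped."""
    entry = _ROUTES.get(type(exc).__name__)
    if entry is None:
        return None
    error_class, kind, retryable, message = entry
    # The raw exception text never reaches the agent-facing message.
    return Routed(error_class, kind, retryable, message, developer_message=str(exc))


routed = route(PoolTimeout("pool exhausted for https://api.example.com/?token=abc"))
print(routed.message)
```

The point of the fixed message table is that nothing from `str(exc)` can reach the agent-facing field; the raw text travels only in `developer_message`.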
257 lines
9.4 KiB
Python
import logging
from datetime import datetime, timezone
from typing import Any
from urllib.parse import urlparse

from arcade_core.errors import (
    ToolRuntimeError,
    UpstreamError,
    UpstreamRateLimitError,
)

logger = logging.getLogger(__name__)


class SlackErrorAdapter:
    """Error adapter for Slack SDK (slack-sdk)."""

    slug = "_slack_sdk"

    def from_exception(self, exc: Exception) -> ToolRuntimeError | None:
        """
        Translate a Slack SDK exception into a ToolRuntimeError.
        """
        # Lazy import the Slack SDK errors module to avoid import errors for toolkits that don't use slack-sdk
        try:
            from slack_sdk import errors
        except ImportError:
            logger.info(
                f"'slack-sdk' is not installed in the toolkit's environment, "
                f"so the '{self.slug}' adapter was not used to handle the upstream error"
            )
            return None

        result = self._handle_api_errors(exc, errors)
        if result:
            return result

        result = self._handle_other_errors(exc, errors)
        if result:
            return result

        # Failsafe for any unhandled Slack SDK errors that are not mapped above
        if hasattr(exc, "__module__") and exc.__module__ and "slack_sdk" in exc.__module__:
            logger.warning(
                "Unknown Slack SDK error encountered: %r. Falling back to generic UpstreamError.",
                exc,
                exc_info=True,
            )
            return UpstreamError(
                message=f"Upstream Slack SDK error: unhandled {exc.__class__.__name__}.",
                status_code=500,
                developer_message=str(exc),
                extra={
                    "service": self.slug,
                    "error_type": exc.__class__.__name__,
                },
            )

        # Not a Slack SDK error
        return None

    def _sanitize_uri(self, uri: str) -> str:
        """Strip query params and fragments from URI for privacy."""

        try:
            parsed = urlparse(uri)
            return f"{parsed.scheme}://{parsed.netloc.strip('/')}/{parsed.path.strip('/')}"
        except Exception:
            return uri

    def _parse_retry_after(self, error: Any) -> int:
        """
        Extract retry-after from Slack API errors.
        Returns milliseconds to wait before retry.
        Defaults to 1000ms if not found.

        Args:
            error: The Slack API error to parse

        Returns:
            The number of milliseconds to wait before retry
        """
        if hasattr(error, "response") and hasattr(error.response, "headers"):
            headers = error.response.headers

            retry_after = headers.get("Retry-After", headers.get("retry-after"))
            if retry_after:
                try:
                    # If it's a number, it's seconds
                    if retry_after.isdigit():
                        return int(retry_after) * 1000
                    # Otherwise try to parse as date. strptime's %Z yields a naive
                    # datetime, so attach UTC explicitly; subtracting an aware
                    # datetime from a naive one would raise TypeError.
                    dt = datetime.strptime(retry_after, "%a, %d %b %Y %H:%M:%S %Z").replace(
                        tzinfo=timezone.utc
                    )
                    # Clamp to zero so a date in the past never yields a negative wait
                    return max(0, int((dt - datetime.now(timezone.utc)).total_seconds() * 1000))
                except Exception:
                    logger.warning(
                        f"Failed to parse retry-after header: {retry_after}. Defaulting to 1000ms."
                    )
                    return 1000

        return 1000

    def _map_api_error(self, error: Any) -> ToolRuntimeError | None:
        """Map Slack SlackApiError to appropriate ToolRuntimeError."""
        # Extract error code from Slack API response
        error_code = "unknown_error"
        if hasattr(error, "response") and error.response:
            error_code = error.response.get("error", "unknown_error")

        status_code = 500  # Default to server error
        if (
            hasattr(error, "response")
            and hasattr(error.response, "status_code")
            and isinstance(error.response.status_code, int)
        ):
            status_code = error.response.status_code

        reason = error_code if error_code != "unknown_error" else "Unknown Slack SDK error"

        message = f"Upstream Slack API error: {reason}"

        # Build developer message with additional details
        developer_message = self._build_developer_message(error, error_code)

        # Build extra metadata
        extra = {
            "service": self.slug,
        }

        # Try to extract request details if available
        if hasattr(error, "api_url") and error.api_url:
            extra["endpoint"] = self._sanitize_uri(str(error.api_url))

        extra["error_code"] = error_code

        # Special case for rate limiting
        if status_code == 429:
            return UpstreamRateLimitError(
                retry_after_ms=self._parse_retry_after(error),
                message=message,
                developer_message=developer_message,
                extra=extra,
            )

        return UpstreamError(
            message=message,
            status_code=status_code,
            developer_message=developer_message,
            extra=extra,
        )

    def _build_developer_message(self, error: Any, error_code: str) -> str:
        """Build developer message with additional details from Slack API error."""
        developer_details = [f"Slack error code: {error_code}"]

        if not (hasattr(error, "response") and error.response):
            return developer_details[0]

        warning = self._extract_response_field(error.response, "warning")
        if warning:
            developer_details.append(f"warning: {warning}")

        response_metadata = self._extract_response_field(error.response, "response_metadata")
        if response_metadata and isinstance(response_metadata, dict):
            warnings = response_metadata.get("warnings", [])
            if warnings:
                developer_details.append(f"warnings: {', '.join(warnings)}")

        return " - ".join(developer_details)

    def _extract_response_field(self, response: Any, field: str) -> Any:
        """Safely extract a field from Slack API response."""
        try:
            if hasattr(response, "get"):
                return response.get(field)
            elif hasattr(response, "__getitem__") and field in response:
                return response[field]
        except (TypeError, KeyError):
            pass
        return None

    def _handle_api_errors(self, exc: Exception, errors_module: Any) -> ToolRuntimeError | None:
        """Handle SlackApiError and its subclasses."""
        if isinstance(exc, errors_module.SlackApiError):
            return self._map_api_error(exc)

        return None

    def _handle_other_errors(self, exc: Exception, errors_module: Any) -> ToolRuntimeError | None:
        """Handle non-API Slack SDK errors."""
        if isinstance(exc, errors_module.SlackRequestError):
            return UpstreamError(
                message="Upstream Slack SDK error: Problem with the request being submitted",
                status_code=502,
                developer_message=str(exc),
                extra={
                    "service": self.slug,
                    "error_type": errors_module.SlackRequestError.__name__,
                },
            )

        if isinstance(exc, errors_module.SlackTokenRotationError):
            return UpstreamError(
                message="Upstream Slack SDK error: Token rotation failed",
                status_code=401,
                developer_message=str(exc),
                extra={
                    "service": self.slug,
                    "error_type": errors_module.SlackTokenRotationError.__name__,
                },
            )

        if isinstance(exc, errors_module.BotUserAccessError):
            return UpstreamError(
                message="Upstream Slack SDK error: Bot token used for user-only API method",
                status_code=403,
                developer_message=str(exc),
                extra={
                    "service": self.slug,
                    "error_type": errors_module.BotUserAccessError.__name__,
                },
            )

        if isinstance(exc, errors_module.SlackClientConfigurationError):
            return UpstreamError(
                message="Upstream Slack SDK error: Invalid client configuration",
                status_code=400,
                developer_message=str(exc),
                extra={
                    "service": self.slug,
                    "error_type": errors_module.SlackClientConfigurationError.__name__,
                },
            )

        if isinstance(exc, errors_module.SlackClientNotConnectedError):
            return UpstreamError(
                message="Upstream Slack SDK error: WebSocket connection is closed",
                status_code=503,
                developer_message=str(exc),
                extra={
                    "service": self.slug,
                    "error_type": errors_module.SlackClientNotConnectedError.__name__,
                },
            )

        if isinstance(exc, errors_module.SlackObjectFormationError):
            return UpstreamError(
                message="Upstream Slack SDK error: Invalid or malformed object",
                status_code=400,
                developer_message=str(exc),
                extra={
                    "service": self.slug,
                    "error_type": errors_module.SlackObjectFormationError.__name__,
                },
            )

        return None
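
The `_parse_retry_after` method above accepts a `Retry-After` header as either a number of seconds or an HTTP date. As a standalone illustration of that conversion, here is a minimal sketch that is not part of the adapter: `retry_after_ms` and `DEFAULT_RETRY_MS` are hypothetical names, the `now` parameter exists only to make the result deterministic, and the date parsing uses the standard library's `email.utils.parsedate_to_datetime` rather than the `strptime` call in the method.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

DEFAULT_RETRY_MS = 1000  # same fallback value the adapter above uses


def retry_after_ms(value: Optional[str], now: Optional[datetime] = None) -> int:
    """Convert a Retry-After header value to milliseconds (hypothetical helper)."""
    if not value:
        return DEFAULT_RETRY_MS
    value = value.strip()
    # Form 1: delay in whole seconds, e.g. "120".
    if value.isdigit():
        return int(value) * 1000
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_RETRY_MS
    if when.tzinfo is None:
        # Treat a date without a usable zone as UTC so the subtraction is valid.
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # A date already in the past means no wait is needed.
    return max(0, int((when - current).total_seconds() * 1000))


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_ms("120"))
print(retry_after_ms("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))
```

Keeping both datetimes timezone-aware matters here: Python refuses to subtract an aware datetime from a naive one, so a parser that returns naive values would always end up in the fallback branch.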