arcade-mcp/libs/arcade-tdk/arcade_tdk/providers/microsoft/error_adapter.py
Francisco Or Something 8f5d0ff54e
Improve typed httpx error mapping and adapter guidance (#820)
## Summary

Routes HTTP adapter exceptions to the right error class instead of
shoe-horning everything into `UpstreamError`. Addresses Eric's earlier
feedback that several exceptions this PR was wrapping as `UpstreamError`
didn't satisfy the "something happened with the upstream" claim (local
pool exhaustion, client-side request construction, local TLS failures).

### Scope

- `UpstreamError` (unchanged) — upstream responded with an HTTP status
code.
- **`NetworkTransportError`** (new sibling in `arcade-core`) — no
complete response was received. `status_code=None`. Three kinds:
`NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, `_UNREACHABLE`, `_UNMAPPED`.
- **`FatalToolError`** (existing) — client construction bugs
(`InvalidURL`, `UnsupportedProtocol`, `MissingSchema`, `InvalidHeader`,
`LocalProtocolError`, …) and local TLS/cert config failures. Never
retried.
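
The three-way split above can be sketched as a small classifier. This is a minimal illustration, not the adapter's actual code: `ErrorClass`, `classify`, and the name sets are hypothetical, and matching on the exception class name is a simplification that avoids importing `httpx` or `requests` (a real adapter would more likely use `isinstance` checks).

```python
from enum import Enum


class ErrorClass(Enum):
    """Hypothetical labels for the three target error classes."""

    UPSTREAM = "UpstreamError"
    NETWORK_TRANSPORT = "NetworkTransportError"
    FATAL = "FatalToolError"


# Exception class names listed in the scope above as client construction bugs.
CONSTRUCTION_BUG_NAMES = frozenset(
    {"InvalidURL", "UnsupportedProtocol", "MissingSchema", "InvalidHeader", "LocalProtocolError"}
)
# Local TLS / certificate failures are also treated as fatal.
LOCAL_TLS_NAMES = frozenset({"SSLError"})


def classify(exc: Exception) -> ErrorClass:
    """Pick an error class for an HTTP-client exception (illustrative only)."""
    # A response carrying a status code means the upstream actually answered.
    response = getattr(exc, "response", None)
    if getattr(response, "status_code", None) is not None:
        return ErrorClass.UPSTREAM
    # Construction bugs and local TLS problems are never retried.
    name = type(exc).__name__
    if name in CONSTRUCTION_BUG_NAMES or name in LOCAL_TLS_NAMES:
        return ErrorClass.FATAL
    # Otherwise no complete response was received.
    return ErrorClass.NETWORK_TRANSPORT
```

The order of the checks matters in this sketch: an exception that carries a real response is classified as upstream before any name-based rule is consulted.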

---

## Before / After (per Eric's request)

Shows the error payload a tool produces for each exception, before this
PR vs. after. "Before" = current `main` (exceptions without real HTTP
responses fall through to the generic `@tool` `FatalToolError` catch-all
with `message=str(exc)`).

### No-response transport failures

| Exception | Before: class / message / kind | After: class / message / kind |
|---|---|---|
| `httpx.PoolTimeout` | `FatalToolError`; `str(exc)` leaks raw detail; `TOOL_RUNTIME_FATAL`, not retryable | `NetworkTransportError`; `"HTTP request timed out before a complete response was received."`; `NETWORK_TRANSPORT_RUNTIME_TIMEOUT`, **retryable** |
| `httpx.ConnectTimeout` | same as above | same as `httpx.PoolTimeout`; `TIMEOUT`, retryable |
| `httpx.ConnectError` (refused / DNS) | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP request failed before reaching the upstream service."`; `UNREACHABLE`, retryable |
| `httpx.RemoteProtocolError` (upstream sent bad HTTP) | `FatalToolError`; `str(exc)` | `NetworkTransportError`; same message as `httpx.ConnectError`; `UNREACHABLE`, retryable |
| `httpx.DecodingError` | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP response from upstream could not be decoded."`; `UNMAPPED`, retryable |
| `httpx.TooManyRedirects` | `FatalToolError`; `str(exc)` | `NetworkTransportError`; `"HTTP redirect limit exceeded before a final response was received."`; `UNMAPPED`, **not** retryable |
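
One way to read the table above is as a lookup from exception type to kind, message, and retryability. The sketch below encodes exactly the rows listed; `TransportMapping`, `NO_RESPONSE_MAPPINGS`, and `lookup` are hypothetical names, the kinds use the short forms from the table, and keying on the class name is an assumption made to keep the example free of third-party imports.

```python
from typing import NamedTuple, Optional


class TransportMapping(NamedTuple):
    kind: str        # short kind name as written in the table
    message: str     # agent-facing message
    can_retry: bool


_TIMEOUT_MSG = "HTTP request timed out before a complete response was received."
_UNREACHABLE_MSG = "HTTP request failed before reaching the upstream service."

# Keys are httpx exception class names from the table above.
NO_RESPONSE_MAPPINGS = {
    "PoolTimeout": TransportMapping("TIMEOUT", _TIMEOUT_MSG, True),
    "ConnectTimeout": TransportMapping("TIMEOUT", _TIMEOUT_MSG, True),
    "ConnectError": TransportMapping("UNREACHABLE", _UNREACHABLE_MSG, True),
    "RemoteProtocolError": TransportMapping("UNREACHABLE", _UNREACHABLE_MSG, True),
    "DecodingError": TransportMapping(
        "UNMAPPED", "HTTP response from upstream could not be decoded.", True
    ),
    "TooManyRedirects": TransportMapping(
        "UNMAPPED",
        "HTTP redirect limit exceeded before a final response was received.",
        False,
    ),
}


def lookup(exc: Exception) -> Optional[TransportMapping]:
    """Return the table row for this exception, or None if it is not listed."""
    return NO_RESPONSE_MAPPINGS.get(type(exc).__name__)
```

Note that `UNMAPPED` does not imply a single retry policy: `DecodingError` is retryable while `TooManyRedirects` is not, so retryability has to be stored per row rather than derived from the kind.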

### Client construction / local env bugs

| Exception | Before | After |
|---|---|---|
| `httpx.UnsupportedProtocol`, `httpx.InvalidURL`, `httpx.LocalProtocolError` | `FatalToolError` with `message=str(exc)` (may leak scheme / URL content) | `FatalToolError`; `"Tool constructed an invalid HTTP request — likely a tool-authoring bug."`; `TOOL_RUNTIME_FATAL`, not retryable |
| `requests.MissingSchema`, `InvalidURL`, `InvalidHeader`, `InvalidSchema`, `InvalidProxyURL`, `URLRequired` | same as above | same as above |
| `requests.SSLError` | `FatalToolError`; `str(exc)` often contains raw cert chain detail | `FatalToolError`; `"TLS handshake failed — likely a local certificate or trust configuration issue."`; `TOOL_RUNTIME_FATAL`, not retryable |

### Real HTTP response errors (UNCHANGED — same behavior)

| Exception | Class | Message | Kind | Retryable |
|---|---|---|---|---|
| `httpx.HTTPStatusError` 404 | `UpstreamError` | `"Upstream HTTP request failed (Not Found, client error)."` | `UPSTREAM_RUNTIME_NOT_FOUND` | No |
| `httpx.HTTPStatusError` 429 (w/ Retry-After: 60) | `UpstreamRateLimitError` | `"Upstream HTTP request failed (Too Many Requests, client error). Retry after 60 second(s)."` | `UPSTREAM_RUNTIME_RATE_LIMIT` | Yes |
| `httpx.HTTPStatusError` 500 | `UpstreamError` | `"Upstream HTTP request failed (Internal Server Error, server error)."` | `UPSTREAM_RUNTIME_SERVER_ERROR` | Yes |
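
The messages in this table follow a fixed pattern built from the status phrase and status class. A minimal sketch of how such a message could be derived with the standard library is shown below; `status_message` is a hypothetical helper, and the fallback wording for unknown or non-error codes is an assumption rather than the adapter's behavior.

```python
from http import HTTPStatus


def status_message(status_code: int) -> str:
    """Build a standardized message from the status phrase and class."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Assumption: codes unknown to the standard library get a generic phrase.
        phrase = "Unknown Status"
    if 400 <= status_code < 500:
        status_class = "client error"
    elif 500 <= status_code < 600:
        status_class = "server error"
    else:
        # Assumption: anything else is labeled generically.
        status_class = "unexpected status"
    return f"Upstream HTTP request failed ({phrase}, {status_class})."
```

Because the message depends only on the numeric status, nothing from the request URL or the response body can reach the agent-facing text.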

### What's no longer in the message

- Raw exception `str(exc)` output (which frequently includes the full
URL with query-string tokens, connection pool details, or cert chains)
is **no longer the agent-facing `message`**. It's preserved in
`developer_message` for server-side diagnostics.
- The misleading "Upstream HTTP…" prefix is gone from network-transport
and construction-bug messages. Those messages now honestly describe what
happened on the tool side.
- For 429s without a `Retry-After` header, we still show "Retry after N
seconds." (pre-existing behavior; see follow-up notes).
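
The split between the agent-facing `message` and `developer_message` can be illustrated with a small sketch. `ErrorPayload` and `build_payload` are hypothetical stand-ins for the real error classes; only the two field names come from this PR.

```python
from dataclasses import dataclass


@dataclass
class ErrorPayload:
    message: str            # agent-facing, fixed wording chosen by the adapter
    developer_message: str  # raw exception text, kept for server-side diagnostics


def build_payload(exc: Exception, safe_message: str) -> ErrorPayload:
    """Keep raw exception detail out of the agent-facing message."""
    return ErrorPayload(message=safe_message, developer_message=str(exc))


# Example: the token in the URL stays in developer_message only.
raw = ConnectionError("GET https://api.example.com/items?token=secret failed")
payload = build_payload(raw, "HTTP request failed before reaching the upstream service.")
```

As follow-up item 2 notes, `developer_message` still holds the raw string in this arrangement, so it needs its own sanitization before being logged or stored.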

---

## Companion PRs

- [ArcadeAI/arcade-mcp#823](https://github.com/ArcadeAI/arcade-mcp/pull/823): introduces `NetworkTransportError` in `arcade-core`
- [ArcadeAI/monorepo#911](https://github.com/ArcadeAI/monorepo/pull/911): adds the 3 `ErrorKind` constants to the Go engine and Datadog dashboards
- [ArcadeAI/docs#920](https://github.com/ArcadeAI/docs/pull/920): documents the new hierarchy and adapter routing

## Follow-ups (out of scope for this PR)

A short investigation surfaced several pre-existing issues that are
worth fixing separately. A full list is in
`NETWORK_TRANSPORT_ERROR_FOLLOWUPS.md` (shared offline). Summary:

1. `requests.HTTPError` with `response is None` returns `None` from the
adapter; should fall through to the `NetworkTransportError(UNMAPPED)`
fallback instead of becoming a generic `FatalToolError`.
2. `developer_message` can leak URL query strings (and therefore tokens)
since it stores raw `str(exc)`.
3. `_sanitize_uri` does not strip userinfo (credentials in the URL authority, e.g. `user:pass@host`).
4. `_parse_retry_ms` misinterprets epoch-style `x-ratelimit-reset`
headers.
5. 429 responses without `Retry-After` synthesize a fabricated "Retry
after 1 second(s)." suffix.
6. `UPSTREAM_RUNTIME_VALIDATION_ERROR` is defined but never emitted.
7. `UpstreamError` silently accepts out-of-range status codes.
8. `requests.HTTPError` branch re-extracts `request_url` /
`request_method` inconsistently (dead work).
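
As an illustration of follow-up 3, the sketch below shows one possible way to drop userinfo along with the query string and fragment. `sanitize_uri` is a hypothetical standalone function, not the adapter's `_sanitize_uri`, and it assumes the input is an absolute URL with a scheme and host.

```python
from urllib.parse import urlsplit


def sanitize_uri(uri: str) -> str:
    """Strip userinfo, query params, and fragment from a URL."""
    parts = urlsplit(uri)
    # .hostname excludes any "user:pass@" prefix and the port.
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literal: .hostname drops the brackets, so restore them.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = parts.path.strip("/")
    return f"{parts.scheme}://{host}/{path}"
```

Rebuilding the authority from `hostname` and `port` rather than reusing `netloc` is what removes the credentials, since `netloc` includes the userinfo component verbatim.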

## Test plan

- [x] Existing `libs/tests/sdk/test_httpx_adapter.py` +
`test_graphql_adapter.py` updated; every no-response / construction-bug
test asserts the new class + kind + `can_retry`.
- [x] Full test suite passes locally.
- [x] mypy clean on `arcade-core`, `arcade-tdk`, `arcade-mcp-server`.
- [x] Smoke-tested 21 exception routing cases end-to-end against real
httpx / requests exceptions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes core error classification and retryability for
`httpx`/`requests`/GraphQL transport failures, which can affect tool
retry behavior and telemetry. Risk is mitigated by extensive new/updated
tests covering the new mappings and privacy expectations.
> 
> **Overview**
> **Improves error adapter behavior to be more semantically correct and
privacy-safe.** The HTTP adapter now distinguishes real HTTP responses
(`UpstreamError`/`UpstreamRateLimitError`) from no-response failures
(`NetworkTransportError` with `ErrorKind` + retryability) and from
client construction/local TLS issues (`FatalToolError`).
> 
> **Reduces sensitive data exposure in agent-facing messages.**
Status-based errors now emit standardized messages derived from status
phrase/class, while preserving raw exception detail in
`developer_message`; Google/Microsoft/Slack fallback paths similarly
switch to `unhandled <ExceptionType>` messages and move `str(exc)` into
`developer_message`. GraphQL transport connection/protocol errors are
reclassified from `UpstreamError` (502) to `NetworkTransportError`, and
transport/server messages are standardized.
> 
> Bumps `arcade-tdk` version to `3.8.0` and expands/updates the SDK test
suite to assert new classes, `kind`, `can_retry`, request metadata
extraction, and privacy behavior.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
1041cb1bec4fa3b0bae3e7c6b860b84cf376cf9a. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 20:32:17 -03:00

218 lines
8 KiB
Python

import logging
from datetime import datetime, timezone
from typing import Any
from urllib.parse import urlparse

from arcade_core.errors import (
    ToolRuntimeError,
    UpstreamError,
    UpstreamRateLimitError,
)

logger = logging.getLogger(__name__)


class MicrosoftGraphErrorAdapter:
    """Error adapter for Microsoft Graph SDK (msgraph-sdk)."""

    slug = "_microsoft_graph"

    def from_exception(self, exc: Exception) -> ToolRuntimeError | None:
        """
        Translate a Microsoft Graph SDK exception into a ToolRuntimeError.
        """
        # Lazy import kiota abstractions to avoid import errors for toolkits that don't use msgraph-sdk
        try:
            from kiota_abstractions import api_error
        except ImportError:
            logger.info(
                f"'kiota-abstractions' is not installed in the toolkit's environment, "
                f"so the '{self.slug}' adapter was not used to handle the upstream error"
            )
            return None

        # Try API errors first
        result = self._handle_api_errors(exc, api_error)
        if result:
            return result

        # Failsafe for any unhandled Microsoft Graph SDK errors that are not mapped above
        if (
            hasattr(exc, "__module__")
            and exc.__module__
            and ("msgraph" in exc.__module__ or "kiota" in exc.__module__)
        ):
            logger.warning(
                "Unknown Microsoft Graph SDK error encountered: %r. "
                "Falling back to generic UpstreamError.",
                exc,
                exc_info=True,
            )
            return UpstreamError(
                message=f"Upstream Microsoft Graph error: unhandled {exc.__class__.__name__}.",
                status_code=500,
                developer_message=str(exc),
                extra={
                    "service": self.slug,
                    "error_type": exc.__class__.__name__,
                },
            )

        # Not a Microsoft Graph SDK error
        return None

    def _sanitize_uri(self, uri: str) -> str:
        """Strip query params and fragments from URI for privacy."""
        parsed = urlparse(uri)
        return f"{parsed.scheme}://{parsed.netloc.strip('/')}/{parsed.path.strip('/')}"

    def _get_retry_after_milliseconds(self, error: Any) -> int:
        """
        Extract retry-after from Microsoft Graph API errors.

        Returns milliseconds to wait before retry.
        Defaults to 1000ms if not found.

        Args:
            error: The APIError to parse

        Returns:
            The number of milliseconds to wait before retry
        """
        if hasattr(error, "response") and hasattr(error.response, "headers"):
            headers = error.response.headers
            retry_after = headers.get("Retry-After", headers.get("retry-after"))
            if retry_after:
                try:
                    # If it's a number, it's seconds
                    if retry_after.isdigit():
                        return int(retry_after) * 1000
                    # Otherwise try to parse as an HTTP-date. strptime returns a
                    # naive datetime even when %Z matches, so attach UTC
                    # (HTTP-dates are always GMT) before subtracting from the
                    # timezone-aware "now"; mixing naive and aware raises TypeError.
                    dt = datetime.strptime(retry_after, "%a, %d %b %Y %H:%M:%S %Z").replace(
                        tzinfo=timezone.utc
                    )
                    delta_ms = int((dt - datetime.now(timezone.utc)).total_seconds() * 1000)
                    # A date in the past means the wait is already over.
                    return max(0, delta_ms)
                except Exception:
                    logger.warning(
                        f"Failed to parse retry-after header: {retry_after}. Defaulting to 1000ms."
                    )
                    return 1000
        return 1000

    def _extract_error_details(self, error: Any) -> tuple[str, str | None]:
        """
        Extract error message and developer details from Microsoft Graph APIError.

        Microsoft Graph errors always have this structure:
        {
            "error": {
                "code": "string",
                "message": "string",
                "innerError": {
                    "code": "string",
                    "request-id": "string",
                    "date": "string"
                }
            }
        }

        Args:
            error: The APIError to extract details from

        Returns:
            Tuple of (user_message, developer_message)
        """
        message = "Unknown Microsoft Graph error"
        code = "UnknownError"
        inner_error = None

        # Extract error details
        if hasattr(error, "error") and error.error:
            if hasattr(error.error, "message"):
                message = error.error.message or message
            if hasattr(error.error, "code"):
                code = error.error.code or code
            if hasattr(error.error, "inner_error"):
                inner_error = error.error.inner_error

        user_message = f"Upstream Microsoft Graph API error: {message}"
        developer_message = f"Microsoft Graph error code: {code}"

        # Add inner error details if present
        if inner_error:
            inner_error_details = self._format_inner_error_details(inner_error)
            if inner_error_details:
                developer_message += f" - Inner error: {inner_error_details}"

        return user_message, developer_message

    def _format_inner_error_details(self, inner_error: Any) -> str:
        """Format inner error details into a readable string."""
        inner_details = []
        if hasattr(inner_error, "code") and inner_error.code:
            inner_details.append(f"code: {inner_error.code}")
        if getattr(inner_error, "request-id", None):
            inner_details.append(f"request-id: {getattr(inner_error, 'request-id')}")
        elif hasattr(inner_error, "request_id") and inner_error.request_id:
            inner_details.append(f"request-id: {inner_error.request_id}")
        if hasattr(inner_error, "client_request_id") and inner_error.client_request_id:
            inner_details.append(f"client-request-id: {inner_error.client_request_id}")
        if hasattr(inner_error, "date") and inner_error.date:
            inner_details.append(f"date: {inner_error.date}")
        return ", ".join(inner_details)

    def _map_api_error(self, error: Any) -> ToolRuntimeError | None:
        """Map Microsoft Graph APIError to appropriate ToolRuntimeError."""
        status_code = 500  # Default to server error
        if hasattr(error, "response") and error.response and hasattr(error.response, "status_code"):
            status_code = error.response.status_code
        elif hasattr(error, "response_status_code") and isinstance(
            getattr(error, "response_status_code", None), int
        ):
            status_code = error.response_status_code

        message, developer_message = self._extract_error_details(error)

        extra = {
            "service": self.slug,
        }

        # Try to extract request details if available
        if (
            hasattr(error, "response")
            and error.response
            and hasattr(error.response, "url")
            and error.response.url
        ):
            extra["endpoint"] = self._sanitize_uri(str(error.response.url))

        error_code = "UnknownError"
        if hasattr(error, "error") and error.error and hasattr(error.error, "code"):
            error_code = error.error.code
        extra["error_code"] = error_code

        # Special case for rate limiting (429) and quota exceeded (503 with specific error codes)
        if status_code == 429 or (
            status_code == 503 and error_code in ["TooManyRequests", "ServiceUnavailable"]
        ):
            return UpstreamRateLimitError(
                retry_after_ms=self._get_retry_after_milliseconds(error),
                message=message,
                developer_message=developer_message,
                extra=extra,
            )

        return UpstreamError(
            message=message,
            status_code=status_code,
            developer_message=developer_message,
            extra=extra,
        )

    def _handle_api_errors(self, exc: Exception, api_error_module: Any) -> ToolRuntimeError | None:
        """Handle APIError and its subclasses."""
        if isinstance(exc, api_error_module.APIError):
            return self._map_api_error(exc)
        return None