TOO-627: Improve error messages for agents and Datadog (#814)
## Summary - Improve tool call error messages across 4 libraries (arcade-core, arcade-tdk, arcade-mcp-server, arcade-serve) so agents can self-correct and Datadog can facet on structured fields - Guard empty error messages, enrich input validation errors with field-level detail, fix `@tool` decorator fallback formatting, surface `additional_prompt_content` in MCP responses, and add structured log extras for Datadog - Addresses the 3 worst error patterns: generic "Error in tool input deserialization", bare `KeyError` values, and empty `FatalToolError` messages **Linear:** TOO-627 **Plan:** `docs/plans/2026-04-08-improve-error-messages-handoff.md` ## Tasks - [ ] Task 1: Guard empty error messages (arcade-core) - [ ] Task 2: Enrich input validation error messages (arcade-core) - [ ] Task 3: Improve `@tool` decorator error fallback (arcade-tdk) - [ ] Task 4: Fix MCP agent-facing error response (arcade-mcp-server) - [ ] Task 5: Add structured log extras in BaseWorker (arcade-serve) - [ ] Task 6: Add structured log extras in MCP server (arcade-mcp-server) ## Test plan - [ ] Each task has dedicated unit tests verifying the new behavior - [ ] `make test` passes after all tasks - [ ] `make check` (ruff + mypy) passes - [ ] Verify the 3 worst error patterns now produce actionable messages 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Touches cross-library error formatting and logging behavior used in production tool execution paths; while mostly additive/guardrails, it changes agent-visible messages and Datadog log facets, which could impact client expectations and alerting. > > **Overview** > Improves tool-call error handling across core/runtime, MCP transport, worker transport, and the TDK to make agent-visible failures more actionable while *reducing sensitive-data leakage*. > > In `arcade-core`, empty error messages now get placeholders, `ToolOutputFactory.fail*` defaults blank messages, and input validation errors are rewritten as field-level summaries that intentionally omit rejected values (avoiding Pydantic echo of secrets). The `@tool` fallback in `arcade-tdk` no longer surfaces `str(exception)` to agents; it returns exception *type-only* in `message` while preserving full detail in `developer_message`. > > Adds a shared `build_tool_error_log_extra` helper and updates `arcade-serve` + `arcade-mcp-server` to emit consistent structured WARNING logs (`error_*`, `tool_name`, optional toolkit/version) for Datadog, while MCP error responses now append `additional_prompt_content` and force `structuredContent=None` on failures per spec. Includes extensive new tests and bumps package versions (`arcade-core` 4.6.2, `arcade-tdk` 3.6.1, `arcade-mcp-server` 1.19.3, `arcade-serve` 3.2.3). > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit e5c7ebcaf56176cfbd8e6d1f2b6295352abd0ec0. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
05682d54fe
commit
1492c80fc5
20 changed files with 1102 additions and 29 deletions
|
|
@ -69,7 +69,10 @@ class ToolkitError(Exception, ABC):
|
|||
The error with the context added to the message.
|
||||
"""
|
||||
prefix = self.create_message_prefix(name)
|
||||
self.message = f"{prefix}{self.message}" # type: ignore[has-type]
|
||||
# Substitute placeholder when message is empty so the prefixed result
|
||||
# always has a non-empty diagnostic body.
|
||||
body = self.message if self.message and self.message.strip() else "(no details provided)" # type: ignore[has-type]
|
||||
self.message = f"{prefix}{body}"
|
||||
if hasattr(self, "developer_message") and self.developer_message: # type: ignore[has-type]
|
||||
self.developer_message = f"{prefix}{self.developer_message}" # type: ignore[has-type]
|
||||
|
||||
|
|
@ -153,6 +156,13 @@ class ToolRuntimeError(ToolError, RuntimeError):
|
|||
Any failure starting from when the tool call begins until the tool call returns.
|
||||
|
||||
Note: This class should typically not be instantiated directly, but rather subclassed.
|
||||
|
||||
Runtime error payload contract:
|
||||
- ``message`` is agent/client-facing. It is returned to tool consumers and can be shown to
|
||||
agents or end users, so it should contain only intentional, safe context.
|
||||
- ``developer_message`` is for server-side diagnostics. It is preserved in structured
|
||||
error payloads and logs for debugging, and may contain more implementation detail than
|
||||
``message``.
|
||||
"""
|
||||
|
||||
kind: ErrorKind = ErrorKind.TOOL_RUNTIME_FATAL
|
||||
|
|
@ -167,6 +177,16 @@ class ToolRuntimeError(ToolError, RuntimeError):
|
|||
*,
|
||||
extra: dict[str, Any] | None = None,
|
||||
):
|
||||
"""
|
||||
Args:
|
||||
message: Safe summary intended for the agent/client. This value may be surfaced
|
||||
directly in tool-call error responses, so do not include secrets, tokens, raw API
|
||||
responses, or other sensitive details unless you explicitly want them exposed.
|
||||
developer_message: Additional server-side debugging context. Use this for verbose
|
||||
diagnostics such as raw upstream errors, stack-oriented clues, or internal
|
||||
identifiers that should not be the primary message shown to the agent/client.
|
||||
extra: Structured metadata to preserve alongside the error.
|
||||
"""
|
||||
super().__init__(message)
|
||||
self.message = message
|
||||
self.developer_message = developer_message # type: ignore[assignment]
|
||||
|
|
|
|||
|
|
@ -101,9 +101,23 @@ class ToolExecutor:
|
|||
inputs = input_model(**kwargs)
|
||||
|
||||
except ValidationError as e:
|
||||
# IMPORTANT: do NOT include err["input"] or str(e) anywhere in the
|
||||
# surfaced message/developer_message. Pydantic's ``str(e)`` and
|
||||
# ``err["input"]`` echo the offending input value verbatim — which
|
||||
# may contain user secrets (passwords, tokens, PII). Both fields
|
||||
# below intentionally carry only field path + reason + Pydantic
|
||||
# error type code, never the rejected value itself.
|
||||
summary = "; ".join(
|
||||
f"{'.'.join(str(loc) for loc in err['loc']) or '<root>'}: {err['msg']}"
|
||||
for err in e.errors()
|
||||
)
|
||||
developer_summary = "; ".join(
|
||||
f"{'.'.join(str(loc) for loc in err['loc']) or '<root>'}[{err['type']}]"
|
||||
for err in e.errors()
|
||||
)
|
||||
raise ToolInputError(
|
||||
message="Error in tool input deserialization",
|
||||
developer_message=str(e),
|
||||
message=f"Invalid input: {summary}",
|
||||
developer_message=f"Pydantic validation failed: {developer_summary}",
|
||||
) from e
|
||||
|
||||
return inputs
|
||||
|
|
|
|||
60
libs/arcade-core/arcade_core/log_extras.py
Normal file
60
libs/arcade-core/arcade_core/log_extras.py
Normal file
|
|
@ -0,0 +1,60 @@
|
|||
"""Shared helper for tool-call-error structured-log extras.
|
||||
|
||||
Field names (``error_*``, ``tool_name``, ``toolkit_version``) are load-bearing
|
||||
for Datadog dashboards — renaming is a breaking change for downstream alerts.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any
|
||||
|
||||
from arcade_core.schema import ToolCallError
|
||||
|
||||
|
||||
def build_tool_error_log_extra(
|
||||
error: ToolCallError,
|
||||
*,
|
||||
tool_name: str,
|
||||
toolkit_name: str | None = None,
|
||||
toolkit_version: str | None = None,
|
||||
**additional: Any,
|
||||
) -> dict[str, Any]:
|
||||
"""Build the structured ``extra`` dict for a failed-tool-call WARNING log.
|
||||
|
||||
Args:
|
||||
error: The ``ToolCallError``. ``kind`` is normalized via ``.value``
|
||||
so Datadog facets on a stable string, not ``"ErrorKind.X"``.
|
||||
tool_name: Resolved tool identifier — callers should pass the same
|
||||
value OTel spans / metrics use (e.g. ``tool_fqname.name``) so
|
||||
logs correlate across signals.
|
||||
toolkit_name: Optional toolkit identifier.
|
||||
toolkit_version: Optional resolved toolkit version.
|
||||
**additional: Caller-specific extras (e.g. ``execution_id``). Cannot
|
||||
override canonical fields.
|
||||
|
||||
Returns:
|
||||
Dict suitable for ``logger.warning(..., extra=<dict>)``. Optional
|
||||
*kwargs* are omitted when not supplied so Datadog can distinguish
|
||||
"unset" from "set-to-None"; canonical ``error_*`` fields are always
|
||||
present (``None`` when unset on the error).
|
||||
"""
|
||||
kind_value = error.kind.value if hasattr(error.kind, "value") else str(error.kind)
|
||||
|
||||
extra: dict[str, Any] = {
|
||||
"error_kind": kind_value,
|
||||
"error_message": error.message,
|
||||
"error_developer_message": error.developer_message,
|
||||
"error_status_code": error.status_code,
|
||||
"error_can_retry": error.can_retry,
|
||||
"tool_name": tool_name,
|
||||
}
|
||||
if toolkit_name is not None:
|
||||
extra["toolkit_name"] = toolkit_name
|
||||
if toolkit_version is not None:
|
||||
extra["toolkit_version"] = toolkit_version
|
||||
|
||||
for k, v in additional.items():
|
||||
if k in extra:
|
||||
continue
|
||||
extra[k] = v
|
||||
return extra
|
||||
|
|
@ -71,6 +71,8 @@ class ToolOutputFactory:
|
|||
status_code: int | None = None,
|
||||
extra: dict[str, Any] | None = None,
|
||||
) -> ToolCallOutput:
|
||||
if not message or not message.strip():
|
||||
message = "Unspecified error during tool execution"
|
||||
return ToolCallOutput(
|
||||
error=ToolCallError(
|
||||
message=message,
|
||||
|
|
@ -103,7 +105,8 @@ class ToolOutputFactory:
|
|||
DEPRECATED: Use ToolOutputFactory.fail instead.
|
||||
This method will be removed in version 3.0.0
|
||||
"""
|
||||
|
||||
if not message or not message.strip():
|
||||
message = "Unspecified error during tool execution"
|
||||
return ToolCallOutput(
|
||||
error=ToolCallError(
|
||||
message=message,
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
[project]
|
||||
name = "arcade-core"
|
||||
version = "4.6.1"
|
||||
version = "4.6.2"
|
||||
description = "Arcade Core - Core library for Arcade platform"
|
||||
readme = "README.md"
|
||||
license = { text = "MIT" }
|
||||
|
|
|
|||
|
|
@ -2,6 +2,7 @@
|
|||
|
||||
import logging
|
||||
import sys
|
||||
from typing import ClassVar
|
||||
|
||||
from loguru import logger
|
||||
|
||||
|
|
@ -11,15 +12,80 @@ class LoguruInterceptHandler(logging.Handler):
|
|||
|
||||
This handler bridges the standard Python logging module with Loguru,
|
||||
ensuring all logs (from both systems) use the same formatting.
|
||||
Preserves custom 'extra' fields (used for structured logging).
|
||||
"""
|
||||
|
||||
# Standard fields populated on every ``logging.LogRecord`` by the stdlib
|
||||
# logging module itself. They are NOT custom extras passed via
|
||||
# ``logger.warning(..., extra={...})`` and forwarding them to loguru would
|
||||
# both clobber loguru's own keys and pollute the structured log payload
|
||||
# with redundant data.
|
||||
#
|
||||
# The set is the union of:
|
||||
# * Attributes assigned in ``logging.LogRecord.__init__`` (CPython source:
|
||||
# Lib/logging/__init__.py — ``LogRecord.__init__``). This covers
|
||||
# ``name, msg, args, levelname, levelno, pathname, filename, module,
|
||||
# exc_info, exc_text, stack_info, lineno, funcName, created, msecs,
|
||||
# relativeCreated, thread, threadName, processName, process``.
|
||||
# * ``taskName`` — added in Python 3.12 by ``LogRecord.__init__`` for
|
||||
# asyncio task identification. Included unconditionally so the set
|
||||
# stays correct on 3.10-3.11 (key just won't be present at runtime).
|
||||
# * ``message`` — set lazily by ``LogRecord.getMessage()``; appears on
|
||||
# ``record.__dict__`` after formatting.
|
||||
# * ``asctime`` — set lazily by formatters that reference ``%(asctime)s``.
|
||||
#
|
||||
# Anything else in ``record.__dict__`` is, by stdlib convention, a custom
|
||||
# extra passed via the ``extra={...}`` kwarg, and IS forwarded to loguru.
|
||||
_STANDARD_FIELDS: ClassVar[set[str]] = {
|
||||
"name",
|
||||
"msg",
|
||||
"args",
|
||||
"created",
|
||||
"msecs",
|
||||
"levelname",
|
||||
"levelno",
|
||||
"pathname",
|
||||
"filename",
|
||||
"module",
|
||||
"exc_info",
|
||||
"exc_text",
|
||||
"stack_info",
|
||||
"lineno",
|
||||
"funcName",
|
||||
"relativeCreated",
|
||||
"thread",
|
||||
"threadName",
|
||||
"processName",
|
||||
"process",
|
||||
"taskName",
|
||||
"message",
|
||||
"asctime",
|
||||
}
|
||||
|
||||
def emit(self, record: logging.LogRecord) -> None:
|
||||
# Resolve the loguru level: prefer the named level (DEBUG/INFO/WARNING/
|
||||
# ERROR/CRITICAL — all built into loguru), and for custom or otherwise
|
||||
# unknown levels fall back to the integer ``levelno``. We pass an int
|
||||
# (not ``str(levelno)``) because loguru only resolves stringified
|
||||
# levels via name lookup — passing ``"42"`` raises ValueError, while
|
||||
# passing the integer ``42`` is accepted directly per loguru's API.
|
||||
level: str | int
|
||||
try:
|
||||
level = logger.level(record.levelname).name
|
||||
except ValueError:
|
||||
level = str(record.levelno)
|
||||
level = record.levelno
|
||||
|
||||
logger.opt(exception=record.exc_info).log(level, record.getMessage())
|
||||
# Extract custom extra fields from the log record.
|
||||
# These are fields added via logger.warning(..., extra={...}).
|
||||
# By preserving them and passing to loguru via bind(), they become
|
||||
# part of the log record's extra dict, enabling Datadog faceting.
|
||||
extras = {k: v for k, v in record.__dict__.items() if k not in self._STANDARD_FIELDS}
|
||||
|
||||
# Bind extras to loguru context, then log the message.
|
||||
# logger.bind() merges extras into the record["extra"] dict,
|
||||
# making them available to sinks (e.g., Datadog, ELK) that support
|
||||
# structured logging.
|
||||
logger.bind(**extras).opt(exception=record.exc_info).log(level, record.getMessage())
|
||||
|
||||
|
||||
def intercept_standard_logging() -> None:
|
||||
|
|
|
|||
|
|
@ -22,8 +22,9 @@ from typing import Any, Callable, cast
|
|||
from arcade_core.auth_tokens import get_valid_access_token
|
||||
from arcade_core.catalog import MaterializedTool, ToolCatalog
|
||||
from arcade_core.executor import ToolExecutor
|
||||
from arcade_core.log_extras import build_tool_error_log_extra
|
||||
from arcade_core.network.org_transport import build_org_scoped_async_http_client
|
||||
from arcade_core.schema import ToolAuthorizationContext, ToolContext
|
||||
from arcade_core.schema import ToolAuthorizationContext, ToolCallError, ToolContext
|
||||
from arcade_core.schema import ToolAuthRequirement as CoreToolAuthRequirement
|
||||
from arcadepy import ArcadeError, AsyncArcade
|
||||
from arcadepy.types.auth_authorize_params import AuthRequirement, AuthRequirementOauth2
|
||||
|
|
@ -927,10 +928,21 @@ class MCPServer:
|
|||
),
|
||||
)
|
||||
else:
|
||||
error = result.error or "Error calling tool"
|
||||
content = convert_to_mcp_content(str(error))
|
||||
error = result.error
|
||||
if error:
|
||||
error_text = error.message
|
||||
if error.additional_prompt_content:
|
||||
error_text += f"\n\n{error.additional_prompt_content}"
|
||||
content = convert_to_mcp_content(error_text)
|
||||
self._log_tool_call_error(tool_name, error)
|
||||
else:
|
||||
content = convert_to_mcp_content("Error calling tool")
|
||||
|
||||
self._tracker.track_tool_call(False, "error during tool execution")
|
||||
# NOTE: structuredContent must be None on error responses.
|
||||
# Per the MCP spec, structuredContent MUST validate against outputSchema —
|
||||
# but error payloads will violate a tool's declared output schema.
|
||||
# The error message is conveyed via ``content`` (TextContent) instead.
|
||||
return JSONRPCResponse(
|
||||
id=message.id,
|
||||
result=CallToolResult(
|
||||
|
|
@ -980,6 +992,13 @@ class MCPServer:
|
|||
},
|
||||
)
|
||||
|
||||
def _log_tool_call_error(self, tool_name: str, error: ToolCallError) -> None:
|
||||
"""Emit a structured WARNING log for a failed tool call."""
|
||||
logger.warning(
|
||||
f"Tool {tool_name} error: {error.message}",
|
||||
extra=build_tool_error_log_extra(error, tool_name=tool_name),
|
||||
)
|
||||
|
||||
def _create_error_response(
|
||||
self, message: CallToolRequest, tool_response: dict[str, Any]
|
||||
) -> JSONRPCResponse[CallToolResult]:
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ build-backend = "hatchling.build"
|
|||
|
||||
[project]
|
||||
name = "arcade-mcp-server"
|
||||
version = "1.19.2"
|
||||
version = "1.19.3"
|
||||
description = "Model Context Protocol (MCP) server framework for Arcade.dev"
|
||||
readme = "README.md"
|
||||
authors = [{ name = "Arcade.dev" }]
|
||||
|
|
|
|||
|
|
@ -6,6 +6,7 @@ from typing import Any, Callable, ClassVar
|
|||
|
||||
from arcade_core.catalog import ToolCatalog, Toolkit
|
||||
from arcade_core.executor import ToolExecutor
|
||||
from arcade_core.log_extras import build_tool_error_log_extra
|
||||
from arcade_core.schema import (
|
||||
ToolCallRequest,
|
||||
ToolCallResponse,
|
||||
|
|
@ -109,8 +110,9 @@ class BaseWorker(Worker):
|
|||
try:
|
||||
materialized_tool = self.catalog.get_tool(tool_fqname)
|
||||
except KeyError:
|
||||
# Use resolved fqname components to match OTel span / metric labels.
|
||||
raise ValueError(
|
||||
f"Tool {tool_fqname} not found in catalog with toolkit version {tool_request.tool.version}."
|
||||
f"Tool {tool_fqname.name} not found in catalog with toolkit version {tool_fqname.toolkit_version}."
|
||||
)
|
||||
|
||||
start_time = time.time()
|
||||
|
|
@ -127,7 +129,8 @@ class BaseWorker(Worker):
|
|||
)
|
||||
execution_id = tool_request.execution_id or ""
|
||||
logger.info(
|
||||
f"{execution_id} | Calling tool: {tool_fqname} version: {tool_request.tool.version}"
|
||||
f"{execution_id} | Calling tool: {tool_fqname.name} "
|
||||
f"version: {tool_fqname.toolkit_version}"
|
||||
)
|
||||
logger.debug(f"{execution_id} | Tool inputs: {tool_request.inputs}")
|
||||
|
||||
|
|
@ -151,21 +154,31 @@ class BaseWorker(Worker):
|
|||
duration_ms = (end_time - start_time) * 1000 # Convert to milliseconds
|
||||
|
||||
if output.error:
|
||||
logger.warning(
|
||||
f"{execution_id} | Tool {tool_fqname} version {tool_request.tool.version} failed"
|
||||
log_extra = build_tool_error_log_extra(
|
||||
output.error,
|
||||
tool_name=str(tool_fqname.name),
|
||||
toolkit_name=str(tool_fqname.toolkit_name),
|
||||
toolkit_version=str(tool_fqname.toolkit_version),
|
||||
execution_id=execution_id,
|
||||
)
|
||||
logger.warning(f"{execution_id} | Tool error: {output.error.message}")
|
||||
logger.warning(
|
||||
f"{execution_id} | Tool developer message: {output.error.developer_message}"
|
||||
f"{execution_id} | Tool {tool_fqname.name} version {tool_fqname.toolkit_version} failed: {output.error.message}",
|
||||
extra=log_extra,
|
||||
)
|
||||
if output.error.developer_message:
|
||||
logger.warning(
|
||||
f"{execution_id} | Developer message: {output.error.developer_message}",
|
||||
)
|
||||
logger.debug(
|
||||
f"{execution_id} | duration: {duration_ms}ms | Tool output: {output.value}"
|
||||
)
|
||||
if output.error.stacktrace:
|
||||
logger.debug(f"{execution_id} | Tool traceback: {output.error.stacktrace}")
|
||||
else:
|
||||
# Match the failure-path identifiers for log correlation.
|
||||
logger.info(
|
||||
f"{execution_id} | Tool {tool_fqname} version {tool_request.tool.version} success"
|
||||
f"{execution_id} | Tool {tool_fqname.name} "
|
||||
f"version {tool_fqname.toolkit_version} success"
|
||||
)
|
||||
logger.debug(
|
||||
f"{execution_id} | duration: {duration_ms}ms | Tool output: {output.value}"
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
[project]
|
||||
name = "arcade-serve"
|
||||
version = "3.2.2"
|
||||
version = "3.2.3"
|
||||
description = "Arcade Serve - Serving infrastructure for Arcade tools and workers"
|
||||
readme = "README.md"
|
||||
license = {text = "MIT"}
|
||||
|
|
|
|||
|
|
@ -88,6 +88,22 @@ def _raise_as_arcade_error(
|
|||
|
||||
Raises:
|
||||
ToolRuntimeError or some subclass thereof
|
||||
|
||||
Data-leak policy (strict):
|
||||
The fallback path below NEVER places ``str(exception)`` content into
|
||||
the agent-facing ``message`` field. Tool authors commonly embed user
|
||||
input in raised exception messages
|
||||
(e.g. ``raise ValueError(f"Bad password: {password}")``) and the
|
||||
framework cannot reliably distinguish secrets/PII from safe context.
|
||||
|
||||
Resolution:
|
||||
* ``message`` (agent-facing) — exception **type only**, no
|
||||
``str(exception)`` content.
|
||||
* ``developer_message`` (server-side logs only — ``error_developer_message``
|
||||
Datadog facet, never returned to the client) — carries the
|
||||
full ``f"{ExceptionType}: {str(exception)}"`` so on-call engineers
|
||||
retain debugging context. Authorized log access is the security
|
||||
boundary, not the agent transport.
|
||||
"""
|
||||
for adapter in adapter_chain:
|
||||
try:
|
||||
|
|
@ -100,9 +116,17 @@ def _raise_as_arcade_error(
|
|||
if isinstance(mapped, ToolRuntimeError):
|
||||
raise mapped from exception
|
||||
|
||||
exc_type = type(exception).__name__
|
||||
exc_str = str(exception).strip()
|
||||
# Agent-facing: type only — never str(exception). See "Data-leak policy".
|
||||
message = f"An unhandled {exc_type} was raised by the tool."
|
||||
# Server-side debugging: full content goes to logs only.
|
||||
developer_message = (
|
||||
f"{exc_type}: {exc_str}" if exc_str else f"{exc_type} (no exception message)"
|
||||
)
|
||||
raise FatalToolError(
|
||||
message=f"{exception!s}",
|
||||
developer_message=f"{exception!s}",
|
||||
message=message,
|
||||
developer_message=developer_message,
|
||||
) from exception
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
[project]
|
||||
name = "arcade-tdk"
|
||||
version = "3.6.0"
|
||||
version = "3.6.1"
|
||||
description = "Arcade TDK - Toolkit Development Kit for building Arcade tools"
|
||||
readme = "README.md"
|
||||
license = { text = "MIT" }
|
||||
|
|
|
|||
140
libs/tests/arcade_mcp_server/test_logging_utils.py
Normal file
140
libs/tests/arcade_mcp_server/test_logging_utils.py
Normal file
|
|
@ -0,0 +1,140 @@
|
|||
"""Integration tests for ``LoguruInterceptHandler``.
|
||||
|
||||
Exercises the full stdlib-logging → handler → loguru → sink path with a real
|
||||
loguru sink (no mocks). The contract under test:
|
||||
|
||||
* ``extra={"foo": "bar"}`` from a stdlib ``logger.warning`` call must reach
|
||||
the loguru sink as ``record["extra"]["foo"] == "bar"`` so downstream
|
||||
structured-log shippers (Datadog, ELK, etc.) can facet on it.
|
||||
* Stdlib-internal LogRecord fields (``levelname``, ``pathname``, ``thread``,
|
||||
``message``, ``asctime``, …) must NOT be forwarded — that would both
|
||||
clobber loguru's own keys and bloat the payload.
|
||||
* Exception info must be preserved and rendered by loguru.
|
||||
* The renderable message body (``record.getMessage()``) must reach the sink
|
||||
intact.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
|
||||
import pytest
|
||||
from arcade_mcp_server.logging_utils import LoguruInterceptHandler
|
||||
from loguru import logger as loguru_logger
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def loguru_sink():
|
||||
"""Capture every loguru record into a list and tear down at end."""
|
||||
captured: list[dict] = []
|
||||
|
||||
def sink(message): # message is loguru's "Message" wrapping a record dict
|
||||
captured.append(dict(message.record))
|
||||
|
||||
sink_id = loguru_logger.add(sink, level="DEBUG", format="{message}")
|
||||
yield captured
|
||||
loguru_logger.remove(sink_id)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def stdlib_logger_with_intercept():
|
||||
"""A real stdlib logger wired through LoguruInterceptHandler — no mocks."""
|
||||
log = logging.getLogger("arcade_mcp_server.tests.intercept_integration")
|
||||
log.handlers.clear()
|
||||
log.addHandler(LoguruInterceptHandler())
|
||||
log.setLevel(logging.DEBUG)
|
||||
log.propagate = False
|
||||
yield log
|
||||
log.handlers.clear()
|
||||
|
||||
|
||||
def test_extra_fields_reach_loguru_sink(stdlib_logger_with_intercept, loguru_sink):
|
||||
"""The whole point of the handler: extras must survive the bridge."""
|
||||
stdlib_logger_with_intercept.warning(
|
||||
"tool failure",
|
||||
extra={
|
||||
"error_kind": "TOOL_RUNTIME_FATAL",
|
||||
"tool_name": "MyToolkit.MyTool",
|
||||
"error_status_code": 500,
|
||||
},
|
||||
)
|
||||
|
||||
assert len(loguru_sink) == 1
|
||||
extra = loguru_sink[0]["extra"]
|
||||
assert extra["error_kind"] == "TOOL_RUNTIME_FATAL"
|
||||
assert extra["tool_name"] == "MyToolkit.MyTool"
|
||||
assert extra["error_status_code"] == 500
|
||||
|
||||
|
||||
def test_stdlib_internal_fields_not_forwarded(stdlib_logger_with_intercept, loguru_sink):
|
||||
"""``record.__dict__`` carries dozens of stdlib-internal attributes
|
||||
(``levelname``, ``pathname``, ``thread``, ``module``, …). Forwarding any
|
||||
of them via ``logger.bind`` would clobber loguru's own equivalents and
|
||||
pollute the structured payload."""
|
||||
stdlib_logger_with_intercept.warning("hello", extra={"my_extra": "kept"})
|
||||
|
||||
assert len(loguru_sink) == 1
|
||||
extra = loguru_sink[0]["extra"]
|
||||
# The legitimate user-supplied extra survives.
|
||||
assert extra.get("my_extra") == "kept"
|
||||
# None of the stdlib internals leak into the bound extras.
|
||||
for forbidden in (
|
||||
"levelname",
|
||||
"levelno",
|
||||
"pathname",
|
||||
"filename",
|
||||
"module",
|
||||
"lineno",
|
||||
"funcName",
|
||||
"thread",
|
||||
"threadName",
|
||||
"processName",
|
||||
"process",
|
||||
"msg",
|
||||
"args",
|
||||
"exc_info",
|
||||
"exc_text",
|
||||
"stack_info",
|
||||
"created",
|
||||
"msecs",
|
||||
"relativeCreated",
|
||||
"name",
|
||||
):
|
||||
assert forbidden not in extra, f"stdlib field {forbidden!r} leaked into loguru extra"
|
||||
|
||||
|
||||
def test_message_body_preserved(stdlib_logger_with_intercept, loguru_sink):
|
||||
"""``record.getMessage()`` (which performs ``%`` interpolation of args)
|
||||
must reach the sink — losing the renderable body would defeat the bridge."""
|
||||
stdlib_logger_with_intercept.warning("user %s did %s", "alice", "logout")
|
||||
|
||||
assert loguru_sink[0]["message"] == "user alice did logout"
|
||||
|
||||
|
||||
def test_exception_info_preserved(stdlib_logger_with_intercept, loguru_sink):
|
||||
"""``exc_info`` from ``logger.exception(...)`` must propagate so loguru
|
||||
can render the traceback."""
|
||||
try:
|
||||
raise ValueError("a deliberately raised test exception")
|
||||
except ValueError:
|
||||
stdlib_logger_with_intercept.exception("handled exception in test")
|
||||
|
||||
assert len(loguru_sink) == 1
|
||||
exc = loguru_sink[0]["exception"]
|
||||
assert exc is not None
|
||||
# ValueError is a builtin so loguru's exception record carries the type.
|
||||
assert exc.type is ValueError
|
||||
assert "deliberately raised" in str(exc.value)
|
||||
|
||||
|
||||
def test_unknown_level_falls_back_to_numeric(stdlib_logger_with_intercept, loguru_sink):
|
||||
"""A custom numeric level not registered with loguru must still be
|
||||
accepted (handler falls back to ``str(record.levelno)`` when
|
||||
``logger.level(name)`` raises ValueError)."""
|
||||
custom_level_no = 42
|
||||
logging.addLevelName(custom_level_no, "CUSTOM_TEST_LEVEL")
|
||||
stdlib_logger_with_intercept.log(custom_level_no, "custom level emit")
|
||||
|
||||
# The fallback path goes through ``logger.log(str(record.levelno), ...)``,
|
||||
# which loguru accepts as a numeric level. The record reaches the sink.
|
||||
assert any(r["message"] == "custom level emit" for r in loguru_sink)
|
||||
|
|
@ -4,16 +4,18 @@ import asyncio
|
|||
import contextlib
|
||||
import json
|
||||
from typing import Annotated
|
||||
from unittest.mock import AsyncMock, Mock
|
||||
from unittest.mock import AsyncMock, Mock, patch
|
||||
|
||||
import pytest
|
||||
from arcade_core.auth import OAuth2
|
||||
from arcade_core.catalog import MaterializedTool, ToolMeta, create_func_models
|
||||
from arcade_core.errors import ToolRuntimeError
|
||||
from arcade_core.errors import ErrorKind, ToolRuntimeError
|
||||
from arcade_core.schema import (
|
||||
InputParameter,
|
||||
OAuth2Requirement,
|
||||
ToolAuthRequirement,
|
||||
ToolCallError,
|
||||
ToolCallOutput,
|
||||
ToolContext,
|
||||
ToolDefinition,
|
||||
ToolInput,
|
||||
|
|
@ -1799,3 +1801,188 @@ class TestToolMetaExtensionEdgeCases:
|
|||
assert "skipped: tool not found" in caplog.text
|
||||
finally:
|
||||
await server.stop()
|
||||
|
||||
|
||||
class TestToolErrorResponse:
|
||||
"""Test that tool error responses surface structured error data to agents."""
|
||||
|
||||
def _make_error_output(
|
||||
self,
|
||||
message: str = "Something went wrong",
|
||||
developer_message: str | None = None,
|
||||
additional_prompt_content: str | None = None,
|
||||
kind: ErrorKind = ErrorKind.TOOL_RUNTIME_FATAL,
|
||||
) -> ToolCallOutput:
|
||||
return ToolCallOutput(
|
||||
error=ToolCallError(
|
||||
message=message,
|
||||
developer_message=developer_message,
|
||||
additional_prompt_content=additional_prompt_content,
|
||||
kind=kind,
|
||||
)
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tool_error_content_includes_additional_prompt(self, mcp_server):
|
||||
error_output = self._make_error_output(
|
||||
message="Spreadsheet not found",
|
||||
additional_prompt_content="Available options: X, Y",
|
||||
)
|
||||
message = CallToolRequest(
|
||||
jsonrpc="2.0",
|
||||
id=1,
|
||||
method="tools/call",
|
||||
params={"name": "TestToolkit.test_tool", "arguments": {"text": "test"}},
|
||||
)
|
||||
|
||||
with patch(
|
||||
"arcade_mcp_server.server.ToolExecutor.run",
|
||||
new_callable=AsyncMock,
|
||||
return_value=error_output,
|
||||
):
|
||||
response = await mcp_server._handle_call_tool(message)
|
||||
|
||||
assert response.result.isError is True
|
||||
text = response.result.content[0].text
|
||||
assert "Available options: X, Y" in text
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tool_error_structured_content_is_none(self, mcp_server):
|
||||
"""Per MCP spec, structuredContent must be None on error responses so
|
||||
consumers don't attempt to validate an error payload against the tool's
|
||||
declared outputSchema. Error details are conveyed via content text."""
|
||||
error_output = self._make_error_output(message="fail")
|
||||
message = CallToolRequest(
|
||||
jsonrpc="2.0",
|
||||
id=1,
|
||||
method="tools/call",
|
||||
params={"name": "TestToolkit.test_tool", "arguments": {"text": "test"}},
|
||||
)
|
||||
|
||||
with patch(
|
||||
"arcade_mcp_server.server.ToolExecutor.run",
|
||||
new_callable=AsyncMock,
|
||||
return_value=error_output,
|
||||
):
|
||||
response = await mcp_server._handle_call_tool(message)
|
||||
|
||||
assert response.result.isError is True
|
||||
assert response.result.structuredContent is None
|
||||
assert "fail" in response.result.content[0].text
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tool_error_content_no_pydantic_repr(self, mcp_server):
|
||||
error_output = self._make_error_output(message="fail")
|
||||
message = CallToolRequest(
|
||||
jsonrpc="2.0",
|
||||
id=1,
|
||||
method="tools/call",
|
||||
params={"name": "TestToolkit.test_tool", "arguments": {"text": "test"}},
|
||||
)
|
||||
|
||||
with patch(
|
||||
"arcade_mcp_server.server.ToolExecutor.run",
|
||||
new_callable=AsyncMock,
|
||||
return_value=error_output,
|
||||
):
|
||||
response = await mcp_server._handle_call_tool(message)
|
||||
|
||||
text = response.result.content[0].text
|
||||
assert "kind=<ErrorKind" not in text
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tool_error_developer_message_not_in_client_content(self, mcp_server):
|
||||
"""developer_message can contain stack frames/file paths/sensitive data
|
||||
and must never appear in agent-facing content. It is logged structurally
|
||||
for Datadog instead."""
|
||||
error_output = self._make_error_output(
|
||||
message="Something failed",
|
||||
developer_message="Traceback: /home/user/secret/foo.py line 42",
|
||||
)
|
||||
message = CallToolRequest(
|
||||
jsonrpc="2.0",
|
||||
id=1,
|
||||
method="tools/call",
|
||||
params={"name": "TestToolkit.test_tool", "arguments": {"text": "test"}},
|
||||
)
|
||||
|
||||
with patch(
|
||||
"arcade_mcp_server.server.ToolExecutor.run",
|
||||
new_callable=AsyncMock,
|
||||
return_value=error_output,
|
||||
):
|
||||
response = await mcp_server._handle_call_tool(message)
|
||||
|
||||
text = response.result.content[0].text
|
||||
assert "Traceback" not in text
|
||||
assert "/home/user/secret" not in text
|
||||
assert "Details:" not in text
|
||||
|
||||
|
||||
class TestLogToolCallError:
|
||||
"""Direct unit tests for MCPServer._log_tool_call_error.
|
||||
|
||||
The structured ``extra`` dict is the contract Datadog facets on; tests here
|
||||
lock the field names and value sources so accidental renames can't silently
|
||||
break ops dashboards. Tested in isolation (no full request flow needed)."""
|
||||
|
||||
def test_extra_fields_match_contract(self, mcp_server, caplog):
|
||||
import logging
|
||||
|
||||
err = ToolCallError(
|
||||
message="Spreadsheet not found",
|
||||
developer_message="dev: ssn=123",
|
||||
kind=ErrorKind.TOOL_RUNTIME_FATAL,
|
||||
status_code=404,
|
||||
can_retry=False,
|
||||
)
|
||||
with caplog.at_level(logging.WARNING, logger="arcade.mcp"):
|
||||
mcp_server._log_tool_call_error("MyToolkit.MyTool", err)
|
||||
|
||||
record = next(r for r in caplog.records if "MyToolkit.MyTool error" in r.getMessage())
|
||||
# Renderable text is the human-readable summary.
|
||||
assert "Spreadsheet not found" in record.getMessage()
|
||||
# Structured fields — the Datadog contract.
|
||||
assert record.tool_name == "MyToolkit.MyTool"
|
||||
assert record.error_kind == "TOOL_RUNTIME_FATAL"
|
||||
assert record.error_message == "Spreadsheet not found"
|
||||
assert record.error_developer_message == "dev: ssn=123"
|
||||
assert record.error_status_code == 404
|
||||
assert record.error_can_retry is False
|
||||
|
||||
def test_kind_value_used_when_available(self, mcp_server, caplog):
|
||||
import logging
|
||||
|
||||
err = ToolCallError(message="x", kind=ErrorKind.UPSTREAM_RUNTIME_RATE_LIMIT)
|
||||
with caplog.at_level(logging.WARNING, logger="arcade.mcp"):
|
||||
mcp_server._log_tool_call_error("t", err)
|
||||
|
||||
record = next(r for r in caplog.records if "t error" in r.getMessage())
|
||||
# Enum's .value (the string code) is what Datadog facets on, NOT repr().
|
||||
assert record.error_kind == "UPSTREAM_RUNTIME_RATE_LIMIT"
|
||||
assert "ErrorKind." not in record.error_kind
|
||||
|
||||
def test_emits_warning_level(self, mcp_server, caplog):
|
||||
import logging
|
||||
|
||||
err = ToolCallError(message="boom", kind=ErrorKind.TOOL_RUNTIME_FATAL)
|
||||
with caplog.at_level(logging.DEBUG, logger="arcade.mcp"):
|
||||
mcp_server._log_tool_call_error("t", err)
|
||||
|
||||
record = next(r for r in caplog.records if "t error" in r.getMessage())
|
||||
# WARNING (30) is the load-bearing level for ops alerting.
|
||||
assert record.levelno == logging.WARNING
|
||||
|
||||
def test_optional_fields_propagate_none(self, mcp_server, caplog):
|
||||
"""status_code / developer_message default to None and must propagate
|
||||
as None (not be dropped, not be coerced) so Datadog can distinguish
|
||||
'unset' from 'set to falsy'."""
|
||||
import logging
|
||||
|
||||
err = ToolCallError(message="x", kind=ErrorKind.TOOL_RUNTIME_FATAL)
|
||||
with caplog.at_level(logging.WARNING, logger="arcade.mcp"):
|
||||
mcp_server._log_tool_call_error("t", err)
|
||||
|
||||
record = next(r for r in caplog.records if "t error" in r.getMessage())
|
||||
assert record.error_developer_message is None
|
||||
assert record.error_status_code is None
|
||||
|
|
|
|||
47
libs/tests/core/test_errors.py
Normal file
47
libs/tests/core/test_errors.py
Normal file
|
|
@ -0,0 +1,47 @@
|
|||
"""Tests for arcade_core.errors.
|
||||
|
||||
Covers the empty-message guard in ``ToolkitError.with_context()`` — without it,
|
||||
``raise FatalToolError("")`` produces prefixed text like ``"...tool 'foo': "``
|
||||
that carries no diagnostic payload in logs/agent output.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from arcade_core.errors import FatalToolError, RetryableToolError, ToolkitLoadError
|
||||
|
||||
|
||||
@pytest.mark.parametrize("empty_message", ["", " ", "\t", "\n \n"])
|
||||
def test_with_context_empty_message_substitutes_placeholder(empty_message):
|
||||
err = FatalToolError(empty_message).with_context("my_tool")
|
||||
# Prefix is preserved — kind, error type and tool name are still in the message.
|
||||
assert "[TOOL_RUNTIME_FATAL]" in err.message
|
||||
assert "my_tool" in err.message
|
||||
# And the empty body is replaced with a recognizable placeholder so the
|
||||
# message ends with diagnostic content rather than ``": "``.
|
||||
assert "(no details provided)" in err.message
|
||||
assert not err.message.endswith(": ")
|
||||
|
||||
|
||||
def test_with_context_nonempty_message_unchanged():
|
||||
err = FatalToolError("Spreadsheet not found").with_context("get_sheet")
|
||||
assert err.message.endswith(": Spreadsheet not found")
|
||||
assert "(no details provided)" not in err.message
|
||||
|
||||
|
||||
def test_with_context_developer_message_with_empty_message_still_works():
|
||||
# A non-empty developer_message is preserved alongside the placeholder body.
|
||||
err = FatalToolError("", developer_message="trace: foo.py:42").with_context("my_tool")
|
||||
assert "(no details provided)" in err.message
|
||||
assert err.developer_message is not None
|
||||
assert err.developer_message.endswith(": trace: foo.py:42")
|
||||
|
||||
|
||||
def test_with_context_retryable_error_empty_message():
|
||||
err = RetryableToolError(" ").with_context("flaky_tool")
|
||||
assert "[TOOL_RUNTIME_RETRY]" in err.message
|
||||
assert "(no details provided)" in err.message
|
||||
|
||||
|
||||
def test_with_context_toolkit_load_error_empty_message():
|
||||
err = ToolkitLoadError("").with_context("broken_toolkit")
|
||||
assert "broken_toolkit" in err.message
|
||||
assert "(no details provided)" in err.message
|
||||
|
|
@ -185,9 +185,15 @@ for tool_func in tools:
|
|||
{},
|
||||
ToolCallOutput(
|
||||
error=ToolCallError(
|
||||
message="[TOOL_RUNTIME_FATAL] FatalToolError during execution of tool 'unexpected_error_tool': test",
|
||||
message=(
|
||||
"[TOOL_RUNTIME_FATAL] FatalToolError during execution of tool "
|
||||
"'unexpected_error_tool': An unhandled RuntimeError was raised by the tool."
|
||||
),
|
||||
kind=ErrorKind.TOOL_RUNTIME_FATAL,
|
||||
developer_message="[TOOL_RUNTIME_FATAL] FatalToolError during execution of tool 'unexpected_error_tool': test",
|
||||
developer_message=(
|
||||
"[TOOL_RUNTIME_FATAL] FatalToolError during execution of tool "
|
||||
"'unexpected_error_tool': RuntimeError: test"
|
||||
),
|
||||
can_retry=False,
|
||||
status_code=500,
|
||||
)
|
||||
|
|
@ -198,10 +204,10 @@ for tool_func in tools:
|
|||
{"inp": {"test": "test"}}, # takes in a string not a dict
|
||||
ToolCallOutput(
|
||||
error=ToolCallError(
|
||||
message="[TOOL_RUNTIME_BAD_INPUT_VALUE] ToolInputError during execution of tool 'simple_tool': Error in tool input deserialization",
|
||||
message="[TOOL_RUNTIME_BAD_INPUT_VALUE] ToolInputError during execution of tool 'simple_tool': Invalid input: inp:",
|
||||
kind=ErrorKind.TOOL_RUNTIME_BAD_INPUT_VALUE,
|
||||
status_code=400,
|
||||
developer_message=None, # can't gaurantee this will be the same
|
||||
developer_message=None, # can't guarantee this will be the same
|
||||
)
|
||||
),
|
||||
),
|
||||
|
|
@ -323,7 +329,12 @@ async def test_tool_executor(tool_func, inputs, expected_output):
|
|||
|
||||
|
||||
def check_output_error(output_error: ToolCallError, expected_error: ToolCallError):
|
||||
assert output_error.message == expected_error.message, "message mismatch"
|
||||
if "Invalid input:" in expected_error.message:
|
||||
assert output_error.message.startswith(
|
||||
expected_error.message
|
||||
), f"message mismatch: {output_error.message!r} does not start with {expected_error.message!r}"
|
||||
else:
|
||||
assert output_error.message == expected_error.message, "message mismatch"
|
||||
assert output_error.kind == expected_error.kind, "kind mismatch"
|
||||
if expected_error.developer_message:
|
||||
assert (
|
||||
|
|
@ -357,3 +368,70 @@ def check_output(output: ToolCallOutput, expected_output: ToolCallOutput):
|
|||
assert output_log.message == expected_log.message
|
||||
assert output_log.level == expected_log.level
|
||||
assert output_log.subtype == expected_log.subtype
|
||||
|
||||
|
||||
@tool
|
||||
def multi_field_tool(
|
||||
name: Annotated[str, "a name"],
|
||||
age: Annotated[int, "an age"],
|
||||
) -> Annotated[str, "output"]:
|
||||
"""Tool with multiple required fields"""
|
||||
return f"{name} is {age}"
|
||||
|
||||
|
||||
catalog.add_tool(multi_field_tool, "MultiFieldToolkit")
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_multiple_bad_fields_in_input_error():
|
||||
tool_definition = catalog.find_tool_by_func(multi_field_tool)
|
||||
full_tool = catalog.get_tool(tool_definition.get_fully_qualified_name())
|
||||
dummy_context = ToolContext()
|
||||
|
||||
output = await ToolExecutor.run(
|
||||
func=multi_field_tool,
|
||||
definition=tool_definition,
|
||||
input_model=full_tool.input_model,
|
||||
output_model=full_tool.output_model,
|
||||
context=dummy_context,
|
||||
name=123, # wrong type
|
||||
age="not_an_int", # wrong type
|
||||
)
|
||||
|
||||
assert output.error is not None
|
||||
assert "Invalid input:" in output.error.message
|
||||
assert "name" in output.error.message
|
||||
assert "age" in output.error.message
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_input_validation_error_does_not_leak_input_values():
|
||||
"""Pydantic's ``str(e)`` and ``err["input"]`` echo offending values back
|
||||
verbatim — which may contain user secrets (passwords, tokens, PII).
|
||||
Neither ``message`` (agent-facing) nor ``developer_message`` (Datadog log
|
||||
facet) must contain the rejected input values."""
|
||||
tool_definition = catalog.find_tool_by_func(multi_field_tool)
|
||||
full_tool = catalog.get_tool(tool_definition.get_fully_qualified_name())
|
||||
dummy_context = ToolContext()
|
||||
|
||||
secret = "SECRET_PASSWORD_DO_NOT_LEAK_42"
|
||||
output = await ToolExecutor.run(
|
||||
func=multi_field_tool,
|
||||
definition=tool_definition,
|
||||
input_model=full_tool.input_model,
|
||||
output_model=full_tool.output_model,
|
||||
context=dummy_context,
|
||||
name=12345, # wrong type, ignored for the leak check
|
||||
age=secret, # wrong type AND a "secret" value we want to ensure is not echoed
|
||||
)
|
||||
|
||||
assert output.error is not None
|
||||
# Field path + reason must be present (so agents can self-correct).
|
||||
assert "age" in output.error.message
|
||||
# But the actual rejected input value must NOT be anywhere in either field.
|
||||
assert secret not in output.error.message
|
||||
assert output.error.developer_message is not None
|
||||
assert secret not in output.error.developer_message
|
||||
# The integer wrong-type value also must not appear.
|
||||
assert "12345" not in output.error.message
|
||||
assert "12345" not in output.error.developer_message
|
||||
|
|
|
|||
126
libs/tests/core/test_log_extras.py
Normal file
126
libs/tests/core/test_log_extras.py
Normal file
|
|
@ -0,0 +1,126 @@
|
|||
"""Tests for ``build_tool_error_log_extra``.
|
||||
|
||||
Locks the Datadog-facet contract shared by the MCP and worker transports.
|
||||
Field names are load-bearing for dashboards.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from arcade_core.errors import ErrorKind
|
||||
from arcade_core.log_extras import build_tool_error_log_extra
|
||||
from arcade_core.schema import ToolCallError
|
||||
|
||||
|
||||
def _err(**kw):
|
||||
defaults = {
|
||||
"message": "Spreadsheet not found",
|
||||
"kind": ErrorKind.TOOL_RUNTIME_FATAL,
|
||||
"developer_message": None,
|
||||
"status_code": 404,
|
||||
"can_retry": False,
|
||||
}
|
||||
defaults.update(kw)
|
||||
return ToolCallError(**defaults)
|
||||
|
||||
|
||||
def test_canonical_field_contract():
|
||||
"""Locks the canonical field names every Datadog dashboard depends on."""
|
||||
extra = build_tool_error_log_extra(_err(developer_message="dev: x"), tool_name="MyTool")
|
||||
assert extra == {
|
||||
"error_kind": "TOOL_RUNTIME_FATAL",
|
||||
"error_message": "Spreadsheet not found",
|
||||
"error_developer_message": "dev: x",
|
||||
"error_status_code": 404,
|
||||
"error_can_retry": False,
|
||||
"tool_name": "MyTool",
|
||||
}
|
||||
|
||||
|
||||
def test_kind_enum_value_used_not_repr():
|
||||
"""``error_kind`` must be the stable string code (Datadog facets on
|
||||
``"TOOL_RUNTIME_FATAL"``, NOT the Python repr ``"ErrorKind.TOOL_RUNTIME_FATAL"``)."""
|
||||
extra = build_tool_error_log_extra(
|
||||
_err(kind=ErrorKind.UPSTREAM_RUNTIME_RATE_LIMIT), tool_name="t"
|
||||
)
|
||||
assert extra["error_kind"] == "UPSTREAM_RUNTIME_RATE_LIMIT"
|
||||
assert "ErrorKind." not in extra["error_kind"]
|
||||
|
||||
|
||||
def test_kind_string_fallback_when_not_enum():
|
||||
"""If something passes a raw string in ``kind`` (not a ToolCallError that
|
||||
would normalize it), the helper still produces a sensible value."""
|
||||
|
||||
class FakeError:
|
||||
kind = "CUSTOM_KIND"
|
||||
message = "x"
|
||||
developer_message = None
|
||||
status_code = None
|
||||
can_retry = False
|
||||
|
||||
extra = build_tool_error_log_extra(FakeError(), tool_name="t")
|
||||
assert extra["error_kind"] == "CUSTOM_KIND"
|
||||
|
||||
|
||||
def test_optional_toolkit_fields_omitted_by_default():
|
||||
"""Optional toolkit fields are only present when supplied — Datadog can
|
||||
distinguish 'not set' from 'set to None'."""
|
||||
extra = build_tool_error_log_extra(_err(), tool_name="MyTool")
|
||||
assert "toolkit_name" not in extra
|
||||
assert "toolkit_version" not in extra
|
||||
|
||||
|
||||
def test_optional_toolkit_fields_included_when_supplied():
|
||||
extra = build_tool_error_log_extra(
|
||||
_err(),
|
||||
tool_name="MyTool",
|
||||
toolkit_name="MyKit",
|
||||
toolkit_version="1.2.3",
|
||||
)
|
||||
assert extra["toolkit_name"] == "MyKit"
|
||||
assert extra["toolkit_version"] == "1.2.3"
|
||||
|
||||
|
||||
def test_additional_extras_merged():
|
||||
"""Worker passes ``execution_id``; MCP server may add others. The helper
|
||||
accepts arbitrary additions via **kwargs."""
|
||||
extra = build_tool_error_log_extra(
|
||||
_err(),
|
||||
tool_name="t",
|
||||
execution_id="exec_42",
|
||||
)
|
||||
assert extra["execution_id"] == "exec_42"
|
||||
|
||||
|
||||
def test_additional_extras_cannot_clobber_canonical_fields():
|
||||
"""A caller passing a key that collides with the canonical contract must
|
||||
NOT be able to override the canonical value — that would let one call site
|
||||
silently change the Datadog facet for every dashboard."""
|
||||
extra = build_tool_error_log_extra(
|
||||
_err(message="canonical"),
|
||||
tool_name="t",
|
||||
error_message="OVERRIDE_ATTEMPT",
|
||||
tool_name_override="ignored too",
|
||||
)
|
||||
assert extra["error_message"] == "canonical"
|
||||
# Non-canonical-name additions still pass through.
|
||||
assert extra["tool_name_override"] == "ignored too"
|
||||
|
||||
|
||||
def test_developer_message_none_propagates():
|
||||
"""``error_developer_message`` is present and explicitly None when the
|
||||
error has no developer_message — Datadog needs to distinguish 'unset'
|
||||
from 'set to None' on this field specifically."""
|
||||
extra = build_tool_error_log_extra(_err(developer_message=None), tool_name="t")
|
||||
assert "error_developer_message" in extra
|
||||
assert extra["error_developer_message"] is None
|
||||
|
||||
|
||||
def test_status_code_none_propagates():
|
||||
extra = build_tool_error_log_extra(_err(status_code=None), tool_name="t")
|
||||
assert "error_status_code" in extra
|
||||
assert extra["error_status_code"] is None
|
||||
|
||||
|
||||
@pytest.mark.parametrize("can_retry", [True, False])
|
||||
def test_can_retry_propagates_as_bool(can_retry):
|
||||
extra = build_tool_error_log_extra(_err(can_retry=can_retry), tool_name="t")
|
||||
assert extra["error_can_retry"] is can_retry
|
||||
|
|
@ -108,6 +108,36 @@ def test_fail(output_factory, message, developer_message):
|
|||
assert output.error.can_retry is False
|
||||
|
||||
|
||||
def test_fail_empty_message_gets_default(output_factory):
|
||||
output = output_factory.fail(message="")
|
||||
assert output.error is not None
|
||||
assert output.error.message == "Unspecified error during tool execution"
|
||||
|
||||
|
||||
def test_fail_whitespace_message_gets_default(output_factory):
|
||||
output = output_factory.fail(message=" ")
|
||||
assert output.error is not None
|
||||
assert output.error.message == "Unspecified error during tool execution"
|
||||
|
||||
|
||||
def test_fail_nonempty_message_unchanged(output_factory):
|
||||
output = output_factory.fail(message="real error")
|
||||
assert output.error is not None
|
||||
assert output.error.message == "real error"
|
||||
|
||||
|
||||
def test_fail_retry_empty_message_gets_default(output_factory):
|
||||
output = output_factory.fail_retry(message="")
|
||||
assert output.error is not None
|
||||
assert output.error.message == "Unspecified error during tool execution"
|
||||
|
||||
|
||||
def test_fail_retry_whitespace_message_gets_default(output_factory):
|
||||
output = output_factory.fail_retry(message=" ")
|
||||
assert output.error is not None
|
||||
assert output.error.message == "Unspecified error during tool execution"
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"message, developer_message, additional_prompt_content, retry_after_ms",
|
||||
[
|
||||
|
|
|
|||
141
libs/tests/tool/test_error_fallback.py
Normal file
141
libs/tests/tool/test_error_fallback.py
Normal file
|
|
@ -0,0 +1,141 @@
|
|||
from typing import Annotated
|
||||
|
||||
import pytest
|
||||
from arcade_core.catalog import ToolCatalog
|
||||
from arcade_core.executor import ToolExecutor
|
||||
from arcade_core.schema import ToolContext
|
||||
from arcade_tdk import tool
|
||||
|
||||
|
||||
catalog = ToolCatalog()
|
||||
|
||||
|
||||
@tool
|
||||
def keyerror_tool() -> Annotated[str, "output"]:
|
||||
"""Tool that raises a bare KeyError"""
|
||||
raise KeyError("x")
|
||||
|
||||
|
||||
@tool
|
||||
def empty_exception_tool() -> Annotated[str, "output"]:
|
||||
"""Tool that raises an exception with no message"""
|
||||
raise Exception()
|
||||
|
||||
|
||||
@tool
|
||||
def fallback_with_secret_in_exception_tool() -> Annotated[str, "output"]:
|
||||
"""Tool that raises an exception whose ``str()`` contains a fake secret.
|
||||
|
||||
Used to verify the strict data-leak policy: the secret must NEVER appear
|
||||
in the agent-facing ``message``. It is allowed in ``developer_message``,
|
||||
which is server-side only (Datadog facet, never returned to the MCP client).
|
||||
"""
|
||||
raise RuntimeError("Bad credentials for alice: password=hunter2_SECRET_PII")
|
||||
|
||||
|
||||
catalog.add_tool(keyerror_tool, "ErrorFallbackToolkit")
|
||||
catalog.add_tool(empty_exception_tool, "ErrorFallbackToolkit")
|
||||
catalog.add_tool(fallback_with_secret_in_exception_tool, "ErrorFallbackToolkit")
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_fallback_message_contains_type_but_not_exception_string():
|
||||
"""The agent-facing ``message`` must include the exception **type**
|
||||
(so agents know what class of failure occurred) but must NOT include
|
||||
``str(exception)`` content — that's what the data-leak policy reserves
|
||||
for ``developer_message``."""
|
||||
tool_definition = catalog.find_tool_by_func(keyerror_tool)
|
||||
full_tool = catalog.get_tool(tool_definition.get_fully_qualified_name())
|
||||
|
||||
output = await ToolExecutor.run(
|
||||
func=keyerror_tool,
|
||||
definition=tool_definition,
|
||||
input_model=full_tool.input_model,
|
||||
output_model=full_tool.output_model,
|
||||
context=ToolContext(),
|
||||
)
|
||||
|
||||
assert output.error is not None
|
||||
# Type IS present (safe: class names are source-defined).
|
||||
assert "KeyError" in output.error.message
|
||||
# The hint to use FatalToolError is present.
|
||||
assert "FatalToolError" in output.error.message
|
||||
# The exception string content (``str(KeyError('x'))`` == ``"'x'"``) is NOT.
|
||||
# Stricter: no quoted argument representation should appear.
|
||||
assert "'x'" not in output.error.message
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_fallback_message_never_leaks_exception_str_content():
|
||||
"""Strict data-leak policy: even if the tool author embeds secrets in
|
||||
``str(exception)``, the agent-facing ``message`` MUST NOT contain them.
|
||||
The full content is preserved in ``developer_message`` (server-side
|
||||
logs only) so on-call engineers retain debugging context."""
|
||||
tool_definition = catalog.find_tool_by_func(fallback_with_secret_in_exception_tool)
|
||||
full_tool = catalog.get_tool(tool_definition.get_fully_qualified_name())
|
||||
|
||||
output = await ToolExecutor.run(
|
||||
func=fallback_with_secret_in_exception_tool,
|
||||
definition=tool_definition,
|
||||
input_model=full_tool.input_model,
|
||||
output_model=full_tool.output_model,
|
||||
context=ToolContext(),
|
||||
)
|
||||
|
||||
assert output.error is not None
|
||||
|
||||
# ── Agent-facing channel: nothing from str(exception) leaks ──
|
||||
assert "hunter2_SECRET_PII" not in output.error.message
|
||||
assert "alice" not in output.error.message
|
||||
assert "password=" not in output.error.message
|
||||
assert "Bad credentials" not in output.error.message
|
||||
# Type still present so the agent knows what failed.
|
||||
assert "RuntimeError" in output.error.message
|
||||
|
||||
# ── Server-side log channel: full content preserved for debugging ──
|
||||
assert output.error.developer_message is not None
|
||||
assert "hunter2_SECRET_PII" in output.error.developer_message
|
||||
assert "RuntimeError" in output.error.developer_message
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_fallback_developer_message_carries_full_exception_content():
|
||||
"""``developer_message`` (server-side log only) is where verbose exception
|
||||
content lives — type + str(exception) — so engineers can debug."""
|
||||
tool_definition = catalog.find_tool_by_func(keyerror_tool)
|
||||
full_tool = catalog.get_tool(tool_definition.get_fully_qualified_name())
|
||||
|
||||
output = await ToolExecutor.run(
|
||||
func=keyerror_tool,
|
||||
definition=tool_definition,
|
||||
input_model=full_tool.input_model,
|
||||
output_model=full_tool.output_model,
|
||||
context=ToolContext(),
|
||||
)
|
||||
|
||||
assert output.error is not None
|
||||
assert output.error.developer_message is not None
|
||||
assert "KeyError" in output.error.developer_message
|
||||
# The KeyError argument IS present in developer_message (vs absent in message).
|
||||
assert "'x'" in output.error.developer_message
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_fallback_empty_exception_developer_message_marks_no_details():
|
||||
"""Empty exceptions (``raise Exception()``) get a ``(no exception message)``
|
||||
marker in ``developer_message`` so on-call engineers can distinguish
|
||||
'nothing to log' from 'log was lost'."""
|
||||
tool_definition = catalog.find_tool_by_func(empty_exception_tool)
|
||||
full_tool = catalog.get_tool(tool_definition.get_fully_qualified_name())
|
||||
|
||||
output = await ToolExecutor.run(
|
||||
func=empty_exception_tool,
|
||||
definition=tool_definition,
|
||||
input_model=full_tool.input_model,
|
||||
output_model=full_tool.output_model,
|
||||
context=ToolContext(),
|
||||
)
|
||||
|
||||
assert output.error is not None
|
||||
assert output.error.developer_message is not None
|
||||
assert "Exception (no exception message)" in output.error.developer_message
|
||||
|
|
@ -124,6 +124,52 @@ async def test_call_tool_success(base_worker_no_auth):
|
|||
assert response.output.value == 8
|
||||
assert response.output.error is None
|
||||
assert response.execution_id == "test_exec_id"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_call_tool_success_and_error_logs_use_same_tool_identifiers(
|
||||
base_worker_no_auth, caplog
|
||||
):
|
||||
"""Success and error log lines must use identical tool identifier strings
|
||||
so logs can be correlated with a single grep pattern."""
|
||||
import logging
|
||||
|
||||
base_worker_no_auth.register_tool(sample_tool, toolkit_name="test_kit")
|
||||
base_worker_no_auth.register_tool(error_tool, toolkit_name="test_kit")
|
||||
|
||||
success_req = ToolCallRequest(
|
||||
execution_id="exec_consistency_ok",
|
||||
tool=ToolReference(toolkit="TestKit", name="SampleTool"),
|
||||
inputs={"a": 1, "b": 2},
|
||||
)
|
||||
error_req = ToolCallRequest(
|
||||
execution_id="exec_consistency_err",
|
||||
tool=ToolReference(toolkit="TestKit", name="ErrorTool"),
|
||||
inputs={},
|
||||
)
|
||||
|
||||
with caplog.at_level(logging.DEBUG, logger="arcade_serve.core.base"):
|
||||
await base_worker_no_auth.call_tool(success_req)
|
||||
await base_worker_no_auth.call_tool(error_req)
|
||||
|
||||
success_line = next(
|
||||
r for r in caplog.records if "exec_consistency_ok" in r.getMessage() and "success" in r.getMessage()
|
||||
)
|
||||
error_line = next(
|
||||
r for r in caplog.records if "exec_consistency_err" in r.getMessage() and "failed:" in r.getMessage()
|
||||
)
|
||||
# Both must use the bare tool name (".name"), NOT the full ``Toolkit.Tool`` fqname.
|
||||
assert "Tool SampleTool " in success_line.getMessage()
|
||||
assert "Tool ErrorTool " in error_line.getMessage()
|
||||
# Neither line should contain the full-fqname form ``TestKit.SampleTool``.
|
||||
assert "TestKit.SampleTool" not in success_line.getMessage()
|
||||
assert "TestKit.ErrorTool" not in error_line.getMessage()
|
||||
# Both must use the same "version <X>" word — proves the same source
|
||||
# (``tool_fqname.toolkit_version``) is read on both paths.
|
||||
assert "version " in success_line.getMessage()
|
||||
assert "version " in error_line.getMessage()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_call_tool_execution_error(base_worker_no_auth):
|
||||
# Tool is now defined at module level
|
||||
|
|
@ -147,6 +193,65 @@ async def test_call_tool_execution_error(base_worker_no_auth):
|
|||
assert response.output.error is not None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_call_tool_error_log_text_matches_structured_extras(base_worker_no_auth, caplog):
|
||||
"""The primary failure warning's f-string must use the same resolved
|
||||
``tool_fqname.name`` / ``tool_fqname.toolkit_version`` values that
|
||||
``log_extra`` exposes — otherwise the human-readable text and the
|
||||
Datadog facets disagree on which tool/version produced the error.
|
||||
Previously the f-string used ``tool_request.tool.version`` (the *requested*
|
||||
version, often ``None``) while the extras used the resolved version."""
|
||||
base_worker_no_auth.register_tool(error_tool, toolkit_name="error_kit")
|
||||
tool_request = ToolCallRequest(
|
||||
execution_id="exec_log_check",
|
||||
tool=ToolReference(toolkit="ErrorKit", name="ErrorTool"),
|
||||
inputs={},
|
||||
)
|
||||
|
||||
with caplog.at_level("WARNING", logger="arcade_serve.core.base"):
|
||||
await base_worker_no_auth.call_tool(tool_request)
|
||||
|
||||
primary = next(
|
||||
r for r in caplog.records if "exec_log_check" in r.getMessage() and "failed:" in r.getMessage()
|
||||
)
|
||||
# Text and structured extra must agree on name + version.
|
||||
assert "Tool ErrorTool " in primary.getMessage()
|
||||
assert getattr(primary, "tool_name", None) == "ErrorTool"
|
||||
extra_version = getattr(primary, "toolkit_version", None)
|
||||
assert f"version {extra_version}" in primary.getMessage()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_call_tool_error_secondary_log_carries_full_exception_content(
|
||||
base_worker_no_auth, caplog
|
||||
):
|
||||
"""Under the strict data-leak policy, the @tool fallback puts the verbose
|
||||
``str(exception)`` content into ``developer_message`` (server-side only,
|
||||
never returned to the MCP client). The secondary ``"Developer message: ..."``
|
||||
warning must therefore fire and carry that full content so on-call
|
||||
engineers retain debugging context — the channel where leakage WOULD
|
||||
matter (agent-facing ``message``) is covered by the dedicated leak tests
|
||||
in ``libs/tests/tool/test_error_fallback.py``."""
|
||||
base_worker_no_auth.register_tool(error_tool, toolkit_name="error_kit")
|
||||
tool_request = ToolCallRequest(
|
||||
execution_id="exec_dev_msg",
|
||||
tool=ToolReference(toolkit="ErrorKit", name="ErrorTool"),
|
||||
inputs={},
|
||||
)
|
||||
|
||||
with caplog.at_level("WARNING", logger="arcade_serve.core.base"):
|
||||
await base_worker_no_auth.call_tool(tool_request)
|
||||
|
||||
secondary = [
|
||||
r for r in caplog.records
|
||||
if "exec_dev_msg" in r.getMessage() and "Developer message:" in r.getMessage()
|
||||
]
|
||||
assert len(secondary) == 1, "secondary 'Developer message:' log should fire once"
|
||||
# The full exception content is in the secondary log (and in Datadog facets).
|
||||
assert "ValueError" in secondary[0].getMessage()
|
||||
assert "Something went wrong" in secondary[0].getMessage()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_call_tool_not_found(base_worker_no_auth):
|
||||
# Use ToolReference without version for lookup consistency
|
||||
|
|
|
|||
Loading…
Reference in a new issue