arcade-mcp

Author SHA1 Message Date


Eric Gustin

113d0d3086

CLI Usage (#593)

TL;DR

The philosophy of CLI usage tracking is "fire and forget" and "best
effort". You can opt out by setting `ARCADE_USAGE_TRACKING=0`.

We capture two events: `CLI execution succeeded` and
`CLI execution failed`. Reporting to PostHog happens in a short-lived
(maximum 10 seconds) subprocess that does not block the main CLI process.

`~/.arcade/usage.json` persists two values: `anon_id` and
`linked_principal_id`. The login status of the CLI user determines
which ID is used. Upon `arcade login`, the `anon_id` is aliased to the
user's `principal_id`, which is then stored as `linked_principal_id`.
Upon `arcade logout`, the `linked_principal_id` is removed and the
`anon_id` is rotated.

## CLI Usage Tracking - How It Works

The usage tracking system implements an identity management and event
tracking pipeline. Here's how the pieces work together:

### **Identity State Management (`usage.json`)**

The system maintains a persistent identity file at
`~/.arcade/usage.json` with this structure:
```json
{
  "anon_id": "uuid",
  "linked_principal_id": "uuid" | null
}
```

**Key mechanics:**
- **`anon_id`**: Generated once on first CLI use and persists across
sessions. This UUID tracks all anonymous activity.
- **`linked_principal_id`**: Initially `null`. Once the user logs in and
we successfully alias their identity, this field stores their
`principal_id` to indicate this `anon_id` has been linked.
- **Atomic writes**: All updates use a temp file + atomic rename pattern
to prevent corruption from concurrent CLI processes
- **File locking**: Uses `fcntl` (Unix) to coordinate reads/writes
across multiple simultaneous CLI invocations
- **In-memory cache**: The `UsageIdentity` class caches the loaded data
to avoid repeated file I/O within a single CLI invocation
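
The temp file + atomic rename pattern mentioned above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `write_identity_atomically` and the temp-file prefix are hypothetical, chosen only for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def write_identity_atomically(path: Path, data: dict) -> None:
    """Write `data` as JSON to `path` via a temp file and an atomic rename.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".usage-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in one step: readers see the old or the
        # new content, never a partially written file.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Writing the temp file in the same directory as the target matters here, because a rename across filesystems is not atomic.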

### **Identity Resolution Flow**

When tracking an event, the system determines the `distinct_id` (who to
attribute the event to) via this waterfall:

1. **Check `linked_principal_id`** in `usage.json`
   - If present → use it (user was previously aliased)
   - This is the fastest path and avoids API calls

2. **Fetch `principal_id` from Arcade Cloud API**
   - Makes HTTP request to `/api/v1/auth/validate` with the user's API key
     from `~/.arcade/credentials.yaml`
   - If authenticated → returns `principal_id`
   - Has 2s timeout for responsiveness

3. **Fall back to `anon_id`**
   - If not authenticated or API call fails → use anonymous ID
   - Marks event with `is_anon=True` flag
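
The waterfall above can be sketched as a single function. This is an illustrative sketch under assumptions: `resolve_distinct_id` is a hypothetical name, and the API call is abstracted behind a `fetch_principal_id` callable rather than a real HTTP request.

```python
from typing import Callable, Optional, Tuple


def resolve_distinct_id(
    identity: dict,
    fetch_principal_id: Callable[[], Optional[str]],
) -> Tuple[str, bool]:
    """Return (distinct_id, is_anon) following the three-step waterfall.

    `identity` mirrors the usage.json structure; `fetch_principal_id` stands in
    for the call to the Arcade Cloud API (hypothetical signature).
    """
    # 1. Fast path: a previously aliased principal needs no API call.
    linked = identity.get("linked_principal_id")
    if linked:
        return linked, False

    # 2. Ask the API; any failure (timeout, network error) is treated as "unknown".
    try:
        principal_id = fetch_principal_id()
    except Exception:
        principal_id = None
    if principal_id:
        return principal_id, False

    # 3. Fall back to the anonymous ID.
    return identity["anon_id"], True
```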

### **The Aliasing Lifecycle**

PostHog aliasing links anonymous activity to authenticated users. Here's
the state machine:

#### **Stage 1: Anonymous User**
```
usage.json: { "anon_id": "abc-123", "linked_principal_id": null }
All events → sent with distinct_id="abc-123" and is_anon=True
```

#### **Stage 2: Login Event**
1. User runs `arcade login`
2. Command completes successfully (auth token saved)
3. `CommandTracker` detects successful login
4. Fetches `principal_id` from API
5. Checks `should_alias()` → returns `True` because
`linked_principal_id` is `null`
6. **Calls `alias()` synchronously** (blocking):
   ```python
   posthog.alias(previous_id="abc-123", distinct_id="zyx-321")
   ```
7. Updates `usage.json`:
   ```json
   { "anon_id": "abc-123", "linked_principal_id": "zyx-321" }
   ```
8. PostHog backend merges all events with `distinct_id="abc-123"` into
the user profile for `"zyx-321"`

#### **Stage 3: Authenticated User**
```
usage.json: { "anon_id": "abc-123", "linked_principal_id": "zyx-321" }
All events → sent with distinct_id="zyx-321" and is_anon=False
```
- Events are directly attributed to the authenticated user
- No more API calls needed (uses cached `linked_principal_id`)

#### **Stage 4: Logout Event**
1. User runs `arcade logout`
2. Logout event is sent with the authenticated `distinct_id`
3. `CommandTracker` detects successful logout
4. **Rotates identity** by calling `reset_to_anonymous()`:
   ```json
   { "anon_id": "xyz-789", "linked_principal_id": null }
   ```
5. New `anon_id` prevents cross-contamination if another user logs in
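
The four stages can be traced with plain dictionaries. The sketch below is an assumption-laden illustration: the names `should_alias` and `reset_to_anonymous` come from the description above, but their signatures here are guesses, and `link_principal` is a hypothetical helper.

```python
import uuid


def should_alias(identity: dict, principal_id: str) -> bool:
    """True when this anon_id has not yet been linked to `principal_id` (assumed signature)."""
    return identity.get("linked_principal_id") != principal_id


def link_principal(identity: dict, principal_id: str) -> dict:
    """Hypothetical helper: record that the alias succeeded (stage 2, step 7)."""
    return {**identity, "linked_principal_id": principal_id}


def reset_to_anonymous() -> dict:
    """Rotate to a fresh anonymous identity on logout (stage 4)."""
    return {"anon_id": str(uuid.uuid4()), "linked_principal_id": None}


# Stage 1: anonymous user.
identity = {"anon_id": "abc-123", "linked_principal_id": None}

# Stage 2: login. An alias is needed because nothing is linked yet.
if should_alias(identity, "zyx-321"):
    identity = link_principal(identity, "zyx-321")

# Stage 3: authenticated. No further alias calls are needed.
needs_alias_again = should_alias(identity, "zyx-321")

# Stage 4: logout. The old anon_id is discarded along with the link.
rotated = reset_to_anonymous()
```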

### **Critical Constraint: Alias Timing**

PostHog requires that `alias()` is called **BEFORE** any events are sent
with the new `distinct_id`. This is why:
- **`alias()` is synchronous (blocking)**: Guarantees it completes
before the login success event is sent
- **Subsequent events use `linked_principal_id`**: Once aliased, all
future events use the authenticated ID
- **Lazy aliasing**: If a user authenticates via another mechanism (not
through `arcade login`), the system detects this on the next command and
performs aliasing before sending that command's event
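
The ordering constraint can be illustrated with a stand-in client that records calls instead of sending them. Everything here is a sketch: `RecordingClient` and `track_with_lazy_alias` are hypothetical names, and only the `alias(previous_id=..., distinct_id=...)` call shape is taken from the description above.

```python
class RecordingClient:
    """Stand-in for an analytics client; records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def alias(self, previous_id, distinct_id):
        self.calls.append(("alias", previous_id, distinct_id))

    def capture(self, distinct_id, event, properties=None):
        self.calls.append(("capture", distinct_id, event))


def track_with_lazy_alias(client, identity, principal_id, event):
    """Send `event`, aliasing first if the principal is not linked yet.

    Returns the (possibly updated) identity dict. Hypothetical helper.
    """
    if principal_id is None:
        # Not authenticated: attribute the event to the anonymous ID.
        client.capture(identity["anon_id"], event, {"is_anon": True})
        return identity
    if identity.get("linked_principal_id") != principal_id:
        # alias() must finish before any event uses the new distinct_id.
        client.alias(previous_id=identity["anon_id"], distinct_id=principal_id)
        identity = {**identity, "linked_principal_id": principal_id}
    client.capture(principal_id, event, {"is_anon": False})
    return identity
```

Because the alias call is made inline rather than handed to a background process, it cannot be overtaken by the capture that follows it.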

### **Event Capture Pipeline**

When `CommandTracker.track_command_execution()` is called:

1. **Resolve identity** → determines `distinct_id` and `is_anon` flag
2. **Build event properties**:
   ```python
   {
     "command_name": "toolkit.run",
     "cli_version": "1.2.3",
     "python_version": "3.11.0",
     "os_type": "Darwin",
     "os_release": "23.4.0",
     "duration": 1250.42,  # milliseconds
     "error_message": "..."  # if failed
   }
   ```
3. **Call `UsageService.capture()`**:
   - Serializes event data to JSON
   - Spawns detached subprocess: `python -m arcade_cli.usage`
   - Passes data via `ARCADE_USAGE_EVENT_DATA` env var
   - **Returns immediately** (non-blocking)

4. **Detached subprocess (`__main__.py`)**:
   - Runs independently, survives parent CLI exit
   - Deserializes event data
   - If `is_anon=True`, sets `$process_person_profile=False` (tells PostHog
     not to create a full profile)
   - Sends event to PostHog with 5s timeout
   - Exits (hard exit after 10s max via timeout thread)
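
Steps 3 and 4 can be sketched as follows. This is a rough illustration, not the PR's implementation: `spawn_detached` and `posthog_properties` are hypothetical names, and the payload layout (an `is_anon` flag next to a `properties` dict) is an assumption.

```python
import json
import os
import subprocess
import sys


def spawn_detached(argv: list, event: dict) -> subprocess.Popen:
    """Start `argv` as a detached child and pass `event` through an env var."""
    env = {**os.environ, "ARCADE_USAGE_EVENT_DATA": json.dumps(event)}
    if os.name == "posix":
        # A new session detaches the child from the parent's terminal and process group.
        detach = {"start_new_session": True}
    else:
        # Windows equivalent; these constants only exist on Windows.
        detach = {
            "creationflags": subprocess.DETACHED_PROCESS
            | subprocess.CREATE_NEW_PROCESS_GROUP
        }
    # Popen returns as soon as the child is started; nothing here waits on it.
    return subprocess.Popen(
        argv,
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        close_fds=True,
        **detach,
    )


def posthog_properties(event: dict) -> dict:
    """Child side: build the properties to send (assumed payload layout)."""
    props = dict(event.get("properties", {}))
    if event.get("is_anon"):
        # Ask PostHog not to create a full person profile for anonymous events.
        props["$process_person_profile"] = False
    return props
```

Under these assumptions, the parent would call something like `spawn_detached([sys.executable, "-m", "arcade_cli.usage"], event)` and continue without waiting for the child.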

### **Concurrency Handling**

Multiple CLI processes can run simultaneously. The system handles this
via:
- **File locking** on `usage.json` (shared lock for reads, exclusive for
writes)
- **Atomic writes** via temp files ensure incomplete writes never
corrupt the file
- **Idempotent aliasing**: `should_alias()` prevents redundant alias
calls
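
A shared-for-reads, exclusive-for-writes scheme with `fcntl` might look like the sketch below (Unix only). One deliberate difference from the description: this sketch locks a sidecar `.lock` file instead of `usage.json` itself, since an atomic rename replaces the file that a lock would be attached to. That choice, and the names `identity_lock`, `read_identity`, and `update_identity`, are assumptions for illustration.

```python
import fcntl
import json
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def identity_lock(path: Path, exclusive: bool):
    """Hold a shared or exclusive advisory lock for the duration of the block."""
    lock_path = path.with_suffix(path.suffix + ".lock")
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def read_identity(path: Path) -> dict:
    # Many readers may hold the shared lock at the same time.
    with identity_lock(path, exclusive=False):
        return json.loads(path.read_text())


def update_identity(path: Path, **changes) -> dict:
    # Writers take the exclusive lock, then read-modify-write atomically.
    with identity_lock(path, exclusive=True):
        data = json.loads(path.read_text()) if path.exists() else {}
        data.update(changes)
        tmp = path.with_suffix(path.suffix + ".tmp")
        tmp.write_text(json.dumps(data))
        os.replace(tmp, path)
        return data
```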

### **Edge Cases Handled**

1. **Side-channel authentication**: User authenticates outside of
`arcade login` (e.g., manually editing credentials)
   - Detected via "lazy aliasing" check on every command
   - Performs alias if `linked_principal_id` doesn't match current
     `principal_id`

2. **API failures during identity fetch**: Falls back to anonymous
tracking
   - 2s timeout prevents hanging
   - Silent failure doesn't disrupt CLI

3. **PostHog merge restrictions**: Can't alias returning users who
already have a profile
   - System stores `linked_principal_id` to avoid retrying impossible
     aliases
   - New users (never logged in before) get full history stitched

4. **Multiple accounts on same machine**: Logout rotates `anon_id`
   - User A's anonymous activity won't leak into User B's profile

### **Privacy & Performance**

- **Opt-out**: `ARCADE_USAGE_TRACKING=0` disables all tracking
- **Non-blocking**: Events never slow down CLI (detached subprocess)
- **Anonymous profiles**: `$process_person_profile=False` for `anon_id`
events minimizes data collection
- **Silent failures**: Network issues or PostHog errors never surface to
users
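
The opt-out check could be as small as the sketch below. The function name `usage_tracking_enabled` is hypothetical, and treating only the exact value `0` as "off" is an assumption; the description above only states that `ARCADE_USAGE_TRACKING=0` disables tracking.

```python
import os
from typing import Mapping, Optional


def usage_tracking_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when ARCADE_USAGE_TRACKING is set to "0" (assumed rule)."""
    env = os.environ if environ is None else environ
    return env.get("ARCADE_USAGE_TRACKING", "1").strip() != "0"
```

Checking this before any identity resolution or subprocess spawn would keep an opted-out CLI from touching `usage.json` or the network at all.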

2025-10-03 10:15:08 -07:00

Sterling Dreyer

f4480c3945

Fix `arcade worker list` endpoints (#504)

We weren't checking that the worker's engine version matched the cloud
version we were comparing against, so the command incorrectly reported
that the URL was wrong.

Before
<img width="1447" height="340" alt="Screenshot 2025-07-21 at 1 55 13 PM"
src="https://github.com/user-attachments/assets/cf39ce9f-0c86-45fd-a68e-c92369876292"
/>

After
<img width="1454" height="308" alt="Screenshot 2025-07-21 at 1 55 07 PM"
src="https://github.com/user-attachments/assets/efcfe6c8-b892-45f7-bf4c-71edc66c8325"
/>

2025-07-21 14:43:58 -07:00

Sam Partee

b6b4cd0a4c

🏗️ Restructure: Multi-Package Architecture + uv Migration (#412)

### Overview
Major restructuring from monolithic `arcade-ai` package to modular
library architecture with standardized uv-based dependency management.

![arcade-ai Monorepo (2)](https://github.com/user-attachments/assets/25f102b0-bb87-4a04-9701-d227d05664b1)

### New Package Structure
- **`arcade-tdk`** - Lightweight toolkit development kit (core
decorators, auth)
- **`arcade-core`** - Core execution engine and catalog functionality  
- **`arcade-serve`** - FastAPI/MCP server components
- **`arcade-ai`** - Meta package that includes CLI functionality.
Optionally include evals via the `evals` extra. Optionally include all
packages via the `all` extra.

### Key Benefits
- **Lighter Dependencies**: Toolkits now depend only on `arcade-tdk` (~2
deps) vs full `arcade-ai` (~30+ deps)
- **Faster Builds**: uv provides 10-100x faster dependency resolution
and installation
- **Better Modularity**: Clear separation of concerns, consumers import
only what they need
- **Standard Tooling**: Eliminates custom poetry scripts, uses standard
Python packaging

### Migration Impact
- All 20 toolkits converted from poetry → uv with `arcade-tdk`
dependencies plus `arcade-ai[evals]` and `arcade-serve` dev
dependencies. When developing locally, devs should install toolkits via
`make install-local`.
- Modern Python 3.10+ type hints throughout
- Standardized build system with hatchling backend
- Enhanced Makefile with robust toolkit management commands
- Removed `arcade dev` CLI command
- Reduced the number of files created by `arcade new` and added an option
to skip generating the tests and evals folders.

This foundation enables faster development cycles and cleaner dependency
chains for the growing toolkit ecosystem.

### Todo After this PR is merged
- [ ] Post-merge workflow(s) (release & publish containers, etc)
- [ ] Release order plan. @EricGustin suggests releasing in the
following order:
    1. `arcade-core` version 0.1.0
    2. `arcade-serve` version 0.1.0 and `arcade-tdk` version 0.1.0
    3. `arcade-ai` version 2.0.0
    4. Patch release for all toolkits (all changes in toolkits are internal
       refactors)
- [ ] [Update docs](https://github.com/ArcadeAI/docs/pull/318)

---------

Co-authored-by: Eric Gustin <eric@arcade.dev>
Co-authored-by: Eric Gustin <34000337+EricGustin@users.noreply.github.com>

2025-06-11 16:48:17 -07:00

Renamed from arcade/arcade/cli/worker.py

3 commits