TL;DR

- Usage tracking is "fire and forget" and "best effort". You can opt out
  by setting `ARCADE_USAGE_TRACKING=0`.
- We capture two events: `CLI execution succeeded` and `CLI execution
  failed`.
- Reporting to PostHog happens in a short-lived (at most 10 seconds)
  subprocess that does not block the main CLI process.
- `~/.arcade/usage.json` persists two values, `anon_id` and
  `linked_principal_id`. The CLI user's logged-in status determines which
  ID is used.
- Upon `arcade login`, the `anon_id` is aliased to the user's
  `principal_id`, which is then stored as `linked_principal_id`. Upon
  `arcade logout`, `linked_principal_id` is cleared and `anon_id` is
  rotated.
## CLI Usage Tracking - How It Works
The usage tracking system implements an identity management and event
tracking pipeline. Here's how the pieces work together:
### **Identity State Management (`usage.json`)**
The system maintains a persistent identity file at
`~/.arcade/usage.json` with this structure:
```json
{
  "anon_id": "uuid",
  "linked_principal_id": "uuid" | null
}
```
**Key mechanics:**
- **`anon_id`**: Generated once on first CLI use and persists across
sessions. This UUID tracks all anonymous activity.
- **`linked_principal_id`**: Initially `null`. Once the user logs in and
we successfully alias their identity, this field stores their
`principal_id` to indicate this `anon_id` has been linked.
- **Atomic writes**: All updates are written to a temp file in the same
  directory and then atomically renamed over `usage.json`, so concurrent
  CLI processes never observe a partially written file.
- **File locking**: On Unix, `fcntl.flock` takes a shared lock while
  reading `usage.json` and an exclusive lock while writing the temp file.
  No locking is performed on Windows.
- **In-memory cache**: The `UsageIdentity` class caches the loaded data
  to avoid repeated file I/O within a single CLI invocation.
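A self-contained sketch of these mechanics using only the standard library. `CachedUsageFile` is a hypothetical class that takes the file path as a parameter instead of reading `ARCADE_CONFIG_PATH`, and it omits `fcntl` locking so it also runs on Windows:

```python
import contextlib
import json
import os
import tempfile
import uuid
from typing import Any


class CachedUsageFile:
    """Hypothetical loader: atomic writes plus a per-process in-memory cache."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._data: dict[str, Any] | None = None

    def load_or_create(self) -> dict[str, Any]:
        if self._data is not None:  # cache hit: no file I/O
            return self._data
        try:
            with open(self.path) as f:
                data = json.load(f)
            if isinstance(data, dict) and "anon_id" in data:
                self._data = data
                return data
        except (OSError, ValueError):
            pass  # missing or corrupt file: fall through and recreate it
        new_data: dict[str, Any] = {"anon_id": str(uuid.uuid4()), "linked_principal_id": None}
        self.write(new_data)
        return new_data

    def write(self, data: dict[str, Any]) -> None:
        directory = os.path.dirname(os.path.abspath(self.path))
        os.makedirs(directory, exist_ok=True)
        # Temp file in the same directory, so the rename never crosses filesystems
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".usage_", suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(data, f, indent=2)
                f.flush()
                os.fsync(f.fileno())  # make sure bytes hit the disk before the rename
            os.replace(tmp_path, self.path)  # atomic swap into place
        except BaseException:
            with contextlib.suppress(OSError):
                os.unlink(tmp_path)
            raise
        self._data = data
```

The rename is what makes the write safe: a reader sees either the old file or the complete new one, never a half-written mix.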
### **Identity Resolution Flow**
When tracking an event, the system determines the `distinct_id` (who to
attribute the event to) via this waterfall:
1. **Check `linked_principal_id`** in `usage.json`
   - If present → use it (the user was previously aliased)
   - This is the fastest path and avoids API calls
2. **Fetch `principal_id` from the Arcade Cloud API**
   - Makes an HTTP GET request to `/api/v1/auth/validate` with the user's
     API key from `~/.arcade/credentials.yaml`
   - If authenticated → returns `principal_id`
   - Uses a 2s timeout for responsiveness
3. **Fall back to `anon_id`**
   - If not authenticated or the API call fails → use the anonymous ID
   - Marks the event with the `is_anon=True` flag
### **The Aliasing Lifecycle**
PostHog aliasing links anonymous activity to authenticated users. Here's
the state machine:
#### **Stage 1: Anonymous User**
```
usage.json: { "anon_id": "abc-123", "linked_principal_id": null }
All events → sent with distinct_id="abc-123" and is_anon=True
```
#### **Stage 2: Login Event**
1. User runs `arcade login`
2. Command completes successfully (auth token saved)
3. `CommandTracker` detects successful login
4. Fetches `principal_id` from API
5. Checks `should_alias()` → returns `True` because
   `linked_principal_id` is `null`
6. **Calls `alias()` synchronously** (blocking):
   ```python
   posthog.alias(previous_id="abc-123", distinct_id="zyx-321")
   ```
7. Updates `usage.json`:
   ```json
   { "anon_id": "abc-123", "linked_principal_id": "zyx-321" }
   ```
8. PostHog backend merges all events with `distinct_id="abc-123"` into
   the user profile for `"zyx-321"`
#### **Stage 3: Authenticated User**
```
usage.json: { "anon_id": "abc-123", "linked_principal_id": "zyx-321" }
All events → sent with distinct_id="zyx-321" and is_anon=False
```
- Events are directly attributed to the authenticated user
- Identity resolution needs no API call (`get_distinct_id()` returns the
  cached `linked_principal_id`)
#### **Stage 4: Logout Event**
1. User runs `arcade logout`
2. Logout event is sent with the authenticated `distinct_id`
3. `CommandTracker` detects successful logout
4. **Rotates identity** by calling `reset_to_anonymous()`:
   ```json
   { "anon_id": "xyz-789", "linked_principal_id": null }
   ```
5. New `anon_id` prevents cross-contamination if another user logs in
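Stage 4 as a sketch: the logout event goes out under the old identity, and only then is the state replaced. `handle_logout_succeeded` and the `send_event(distinct_id, event_name)` callable are hypothetical; the returned dict has the same shape `reset_to_anonymous()` writes:

```python
import uuid
from collections.abc import Callable
from typing import Any


def handle_logout_succeeded(
    usage: dict[str, Any],
    send_event: Callable[[str, str], None],
) -> dict[str, Any]:
    """Send the logout event as the old identity, then rotate to a fresh anon_id."""
    # Step 2: credentials may already be gone, so rely on the persisted link
    distinct_id = usage.get("linked_principal_id") or usage["anon_id"]
    send_event(str(distinct_id), "CLI execution succeeded")

    # Step 4: same shape as reset_to_anonymous(); the old anon_id is never reused
    return {"anon_id": str(uuid.uuid4()), "linked_principal_id": None}
```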
### **Critical Constraint: Alias Timing**
PostHog requires that `alias()` is called **BEFORE** any events are sent
with the new `distinct_id`. This is why:
- **`alias()` is synchronous (blocking)**: Guarantees it completes
before the login success event is sent
- **Subsequent events use `linked_principal_id`**: Once aliased, all
future events use the authenticated ID
- **Lazy aliasing**: If a user authenticates via another mechanism (not
through `arcade login`), the system detects this on the next command and
performs aliasing before sending that command's event
### **Event Capture Pipeline**
When `CommandTracker.track_command_execution()` is called:
1. **Resolve identity** → determines `distinct_id` and `is_anon` flag
2. **Build event properties**:
   ```python
   {
       "command_name": "toolkit.run",
       "cli_version": "1.2.3",
       "python_version": "3.11.0",
       "os_type": "Darwin",
       "os_release": "23.4.0",
       "duration": 1250.42,  # milliseconds
       "error_message": "...",  # if failed
   }
   ```
3. **Call `UsageService.capture()`**:
   - Serializes event data to JSON
   - Spawns detached subprocess: `python -m arcade_cli.usage`
   - Passes data via `ARCADE_USAGE_EVENT_DATA` env var
   - **Returns immediately** (non-blocking)
4. **Detached subprocess (`__main__.py`)**:
   - Runs independently, survives parent CLI exit
   - Deserializes event data
   - If `is_anon=True`, sets `$process_person_profile=False` (tells PostHog
     not to create a full profile)
   - Sends event to PostHog with 5s timeout
   - Exits (hard exit after 10s max via timeout thread)
### **Concurrency Handling**
Multiple CLI processes can run simultaneously. The system handles this
via:
- **File locking** (Unix only): a shared `fcntl` lock while reading
  `usage.json` and an exclusive lock while writing the temp file
- **Atomic writes** via temp files ensure incomplete writes never
corrupt the file
- **Idempotent aliasing**: `should_alias()` prevents redundant alias
calls
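Atomic rename protects against torn files, but two processes that both read, modify, and write can still overwrite each other's change (last writer wins). If that ever matters, one option is to hold an exclusive lock on a separate lock file across the whole read-modify-write. A hedged sketch of that alternative, not what `UsageIdentity` does; `update_json` and the `.lock` sidecar file are hypothetical, and `fcntl` limits it to Unix:

```python
import contextlib
import fcntl  # Unix-only, like the locking in UsageIdentity
import json
import os
import tempfile
from collections.abc import Callable, Iterator
from typing import Any


@contextlib.contextmanager
def exclusive_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive flock on a sidecar file for the duration of the block."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)  # blocks until free
        try:
            yield
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)


def update_json(path: str, mutate: Callable[[dict[str, Any]], None]) -> dict[str, Any]:
    """Read, modify, and atomically rewrite `path` while holding the sidecar lock."""
    with exclusive_lock(path + ".lock"):
        try:
            with open(path) as f:
                data = json.load(f)
        except (OSError, ValueError):
            data = {}  # missing or corrupt file starts empty
        mutate(data)
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(data, f)
            os.replace(tmp_path, path)  # readers still see old-or-new, never partial
        except BaseException:
            with contextlib.suppress(OSError):
                os.unlink(tmp_path)
            raise
    return data
```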
### **Edge Cases Handled**
1. **Side-channel authentication**: User authenticates outside of
   `arcade login` (e.g., manually editing credentials)
   - Detected via "lazy aliasing" check on every command
   - Performs alias if `linked_principal_id` doesn't match current
     `principal_id`
2. **API failures during identity fetch**: Falls back to anonymous
   tracking
   - 2s timeout prevents hanging
   - Silent failure doesn't disrupt CLI
3. **PostHog merge restrictions**: Can't alias returning users who
   already have a profile
   - System stores `linked_principal_id` to avoid retrying impossible
     aliases
   - New users (never logged in before) get full history stitched
4. **Multiple accounts on same machine**: Logout rotates `anon_id`
   - User A's anonymous activity won't leak into User B's profile
### **Privacy & Performance**
- **Opt-out**: `ARCADE_USAGE_TRACKING=0` disables all tracking
- **Non-blocking**: Event capture never slows down the CLI (detached
  subprocess); the one deliberate exception is the synchronous `alias()`
  call after login
- **Anonymous profiles**: `$process_person_profile=False` for `anon_id`
events minimizes data collection
- **Silent failures**: Network issues or PostHog errors never surface to
users
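A minimal sketch of the opt-out and silent-failure rules, assuming that only the exact value `0` disables tracking (the real check may accept other spellings); `usage_tracking_enabled` and `capture_if_enabled` are hypothetical helpers:

```python
import os
from collections.abc import Callable, Mapping
from typing import Any


def usage_tracking_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Tracking is on unless ARCADE_USAGE_TRACKING is set to "0" (assumed rule)."""
    return environ.get("ARCADE_USAGE_TRACKING", "").strip() != "0"


def capture_if_enabled(
    send: Callable[[dict[str, Any]], None],
    event: dict[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Send the event unless opted out; swallow every error. Returns True if sent."""
    if not usage_tracking_enabled(environ):
        return False
    try:
        send(event)
    except Exception:
        return False  # silent failure: never surface analytics errors to the user
    return True
```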
```python
"""
Identity management for PostHog analytics tracking.

Handles anonymous/authenticated identity tracking with PostHog aliasing,
supporting pre-login anonymous tracking, post-login identity stitching,
and logout identity rotation.
"""

import contextlib
import json
import os
import tempfile
import uuid
from typing import Any

import httpx
import yaml

from arcade_cli.constants import ARCADE_CONFIG_PATH, CREDENTIALS_FILE_PATH
from arcade_cli.usage.constants import (
    KEY_ANON_ID,
    KEY_LINKED_PRINCIPAL_ID,
    TIMEOUT_ARCADE_API,
    USAGE_FILE_NAME,
)

# fcntl only exists on Unix; an unconditional import fails on Windows
if os.name != "nt":
    import fcntl


class UsageIdentity:
    """Manages user identity for PostHog analytics tracking."""

    def __init__(self) -> None:
        self.usage_file_path = os.path.join(ARCADE_CONFIG_PATH, USAGE_FILE_NAME)
        self._data: dict[str, Any] | None = None

    def load_or_create(self) -> dict[str, Any]:
        """Load or create usage.json file with atomic writes and file locking.

        Returns:
            dict: The usage data containing anon_id and linked_principal_id
        """
        if self._data is not None:
            return self._data

        os.makedirs(ARCADE_CONFIG_PATH, exist_ok=True)

        if os.path.exists(self.usage_file_path):
            try:
                with open(self.usage_file_path) as f:
                    # Shared lock: other readers may proceed concurrently (Unix only)
                    if os.name != "nt":
                        fcntl.flock(f.fileno(), fcntl.LOCK_SH)
                    try:
                        data = json.load(f)
                        if isinstance(data, dict) and KEY_ANON_ID in data:
                            self._data = data
                            return self._data
                    finally:
                        if os.name != "nt":
                            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
            except Exception:  # noqa: S110
                # Unreadable or corrupt file: fall through and recreate it
                pass

        new_data = {KEY_ANON_ID: str(uuid.uuid4()), KEY_LINKED_PRINCIPAL_ID: None}

        self._write_atomic(new_data)
        self._data = new_data
        return self._data

    def _write_atomic(self, data: dict[str, Any]) -> None:
        """Write data atomically to the usage.json file.

        Args:
            data: The data to write to the usage file
        """
        # Create temp file in the same directory so the rename stays atomic
        temp_fd, temp_path = tempfile.mkstemp(
            dir=ARCADE_CONFIG_PATH, prefix=".usage_", suffix=".tmp"
        )

        try:
            with os.fdopen(temp_fd, "w") as f:
                # Exclusive lock on the temp file while writing (Unix only)
                if os.name != "nt":
                    fcntl.flock(f.fileno(), fcntl.LOCK_EX)
                try:
                    json.dump(data, f, indent=2)
                    f.flush()
                    os.fsync(f.fileno())  # ensure data is written to disk
                finally:
                    if os.name != "nt":
                        fcntl.flock(f.fileno(), fcntl.LOCK_UN)

            # os.replace() is atomic and, unlike os.rename(), also overwrites
            # an existing usage.json on Windows
            os.replace(temp_path, self.usage_file_path)
        except Exception:
            # Clean up the temp file, then propagate the error
            with contextlib.suppress(OSError):
                os.unlink(temp_path)
            raise

    def get_distinct_id(self) -> str:
        """Get distinct_id based on authentication state.

        We use principal_id for authenticated users and anon_id for anonymous users.

        Returns:
            str: Principal ID if authenticated, otherwise anon_id
        """
        data = self.load_or_create()

        # Check if we have a persisted principal_id first
        linked_principal_id = data.get(KEY_LINKED_PRINCIPAL_ID)
        if linked_principal_id:
            return str(linked_principal_id)

        # Try to fetch principal_id from API if not persisted
        principal_id = self.get_principal_id()
        if principal_id:
            return principal_id

        # Fall back to anon_id if not authenticated
        return str(data[KEY_ANON_ID])

    def get_principal_id(self) -> str | None:
        """Fetch principal_id from Arcade Cloud API.

        Returns:
            str | None: Principal ID if authenticated and API call succeeds, None otherwise
        """
        if not os.path.exists(CREDENTIALS_FILE_PATH):
            return None

        try:
            with open(CREDENTIALS_FILE_PATH) as f:
                config = yaml.safe_load(f) or {}  # an empty file loads as None

            cloud_config = config.get("cloud", {})
            api_key = cloud_config.get("api", {}).get("key")

            if not api_key:
                return None

            response = httpx.get(
                "https://cloud.arcade.dev/api/v1/auth/validate",
                headers={"accept": "application/json", "Authorization": f"Bearer {api_key}"},
                timeout=TIMEOUT_ARCADE_API,
            )

            if response.status_code == 200:
                data = response.json()
                principal_id = data.get("data", {}).get("principal_id")
                return str(principal_id) if principal_id else None

        except Exception:  # noqa: S110
            # Silent failure: don't disrupt the CLI
            pass

        return None

    def should_alias(self) -> bool:
        """Check if PostHog alias is needed.

        Alias is needed when the user is authenticated,
        but the retrieved principal_id doesn't match the persisted linked_principal_id.

        Returns:
            bool: True if user is authenticated but not yet aliased
        """
        data = self.load_or_create()
        principal_id = self.get_principal_id()

        return principal_id is not None and principal_id != data.get(KEY_LINKED_PRINCIPAL_ID)

    def reset_to_anonymous(self) -> None:
        """Generate new anonymous ID and clear linked principal_id.

        Used after logout to prevent cross-contamination between multiple
        accounts on the same machine.
        """
        # Create fresh data with a new anon_id and no linked principal
        new_data = {KEY_ANON_ID: str(uuid.uuid4()), KEY_LINKED_PRINCIPAL_ID: None}

        self._write_atomic(new_data)
        self._data = new_data

    def set_linked_principal_id(self, principal_id: str) -> None:
        """Update linked_principal_id in usage.json.

        Args:
            principal_id: The principal_id to link to the current anon_id
        """
        data = self.load_or_create()
        data[KEY_LINKED_PRINCIPAL_ID] = principal_id

        self._write_atomic(data)
        self._data = data

    @property
    def anon_id(self) -> str:
        """Get the current anonymous ID.

        Returns:
            str: The anonymous ID
        """
        data = self.load_or_create()
        return str(data[KEY_ANON_ID])
```