TL;DR
CLI usage tracking follows a "fire and forget", "best effort" philosophy.
You can opt out by setting `ARCADE_USAGE_TRACKING=0`.
We capture two events: `CLI execution succeeded` and `CLI execution failed`.
Reporting to PostHog runs in a short-lived subprocess (10 seconds maximum)
that does not block the main CLI process.
`~/.arcade/usage.json` persists two values: `anon_id` and
`linked_principal_id`. The CLI user's login status determines which ID is
used. On `arcade login`, the `anon_id` is aliased to the
`linked_principal_id`. On `arcade logout`, the `linked_principal_id` is
removed and the `anon_id` is rotated.
## CLI Usage Tracking - How It Works
The usage tracking system implements an identity management and event
tracking pipeline. Here's how the pieces work together:
### **Identity State Management (`usage.json`)**
The system maintains a persistent identity file at
`~/.arcade/usage.json` with this structure:
```json
{
"anon_id": "uuid",
"linked_principal_id": "uuid" | null
}
```
**Key mechanics:**
- **`anon_id`**: Generated once on first CLI use and persists across
sessions. This UUID tracks all anonymous activity.
- **`linked_principal_id`**: Initially `null`. Once the user logs in and
we successfully alias their identity, this field stores their
`principal_id` to indicate this `anon_id` has been linked.
- **Atomic writes**: All updates use a temp file + atomic rename pattern
to prevent corruption from concurrent CLI processes.
- **File locking**: Uses `fcntl` (Unix) to coordinate reads/writes
across multiple simultaneous CLI invocations.
- **In-memory cache**: The `UsageIdentity` class caches the loaded data
to avoid repeated file I/O within a single CLI invocation.
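
The temp file + atomic rename pattern described above can be sketched roughly as follows. This is a minimal illustration and not the actual `UsageIdentity._write_atomic` implementation; the function name `write_json_atomic` and the `.usage_` temp-file prefix are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write `data` to `path` so readers never observe a partial file."""
    # Create the temp file next to the target so the final rename stays on
    # the same filesystem, which is what lets os.replace() be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".usage_", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        # Swap the finished temp file into place in one step.
        os.replace(tmp_name, path)
    except BaseException:
        # Best-effort cleanup so a failed write leaves no temp file behind.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A concurrent reader sees either the old file or the new one, never a half-written mix, because the rename is the only step that touches the real path.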
### **Identity Resolution Flow**
When tracking an event, the system determines the `distinct_id` (who to
attribute the event to) via this waterfall:
1. **Check `linked_principal_id`** in `usage.json`
   - If present → use it (the user was previously aliased)
   - This is the fastest path and avoids API calls
2. **Fetch `principal_id` from the Arcade Cloud API**
   - Makes an HTTP request to `/api/v1/auth/validate` with the user's API key from `~/.arcade/credentials.yaml`
   - If authenticated → returns `principal_id`
   - Uses a 2s timeout for responsiveness
3. **Fall back to `anon_id`**
   - If not authenticated or the API call fails → use the anonymous ID
   - Marks the event with the `is_anon=True` flag
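
The waterfall above can be sketched as a single function. This is an illustrative sketch, not the project's code: `resolve_distinct_id` and its `fetch_principal_id` callback are hypothetical names, and the callback stands in for the HTTP call to the Arcade Cloud API.

```python
from typing import Callable, Optional, Tuple


def resolve_distinct_id(
    usage_data: dict,
    fetch_principal_id: Callable[[], Optional[str]],
) -> Tuple[str, bool]:
    """Return (distinct_id, is_anon) following the three-step waterfall."""
    # 1. A persisted link wins and needs no network call.
    linked = usage_data.get("linked_principal_id")
    if linked:
        return linked, False

    # 2. Otherwise ask the API; any failure is treated as "not authenticated".
    try:
        principal_id = fetch_principal_id()
    except Exception:
        principal_id = None
    if principal_id:
        return principal_id, False

    # 3. Fall back to the anonymous ID.
    return usage_data["anon_id"], True
```

Passing the fetch step in as a callable keeps the ordering logic testable without any network access.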
### **The Aliasing Lifecycle**
PostHog aliasing links anonymous activity to authenticated users. Here's
the state machine:
#### **Stage 1: Anonymous User**
```
usage.json: { "anon_id": "abc-123", "linked_principal_id": null }
All events → sent with distinct_id="abc-123" and is_anon=True
```
#### **Stage 2: Login Event**
1. User runs `arcade login`
2. Command completes successfully (auth token saved)
3. `CommandTracker` detects successful login
4. Fetches `principal_id` from API
5. Checks `should_alias()` → returns `True` because `linked_principal_id` is `null`
6. **Calls `alias()` synchronously** (blocking):
   ```python
   posthog.alias(previous_id="abc-123", distinct_id="zyx-321")
   ```
7. Updates `usage.json`:
   ```json
   { "anon_id": "abc-123", "linked_principal_id": "zyx-321" }
   ```
8. PostHog backend merges all events with `distinct_id="abc-123"` into the user profile for `"zyx-321"`
#### **Stage 3: Authenticated User**
```
usage.json: { "anon_id": "abc-123", "linked_principal_id": "zyx-321" }
All events → sent with distinct_id="zyx-321" and is_anon=False
```
- Events are directly attributed to the authenticated user
- No more API calls needed (uses cached `linked_principal_id`)
#### **Stage 4: Logout Event**
1. User runs `arcade logout`
2. Logout event is sent with the authenticated `distinct_id`
3. `CommandTracker` detects successful logout
4. **Rotates identity** by calling `reset_to_anonymous()`:
   ```json
   { "anon_id": "xyz-789", "linked_principal_id": null }
   ```
5. New `anon_id` prevents cross-contamination if another user logs in
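
The four stages amount to a small set of transitions over the contents of `usage.json`. The sketch below models them as pure functions on a dict; the function names `link_after_login` and `rotate_on_logout` are hypothetical, and the `should_alias` shown here is a simplified stand-in that takes the `principal_id` as an argument instead of fetching it.

```python
import uuid
from typing import Optional


def should_alias(state: dict, principal_id: Optional[str]) -> bool:
    """Alias only when authenticated and this principal is not linked yet."""
    return principal_id is not None and state.get("linked_principal_id") != principal_id


def link_after_login(state: dict, principal_id: str) -> dict:
    """Stage 2 -> 3: keep the anon_id and record the linked principal."""
    return {"anon_id": state["anon_id"], "linked_principal_id": principal_id}


def rotate_on_logout() -> dict:
    """Stage 4: drop the link and issue a fresh anon_id."""
    return {"anon_id": str(uuid.uuid4()), "linked_principal_id": None}
```

Because logout builds a new state from scratch, nothing from the previous user's identity carries over to whoever logs in next.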
### **Critical Constraint: Alias Timing**
PostHog requires that `alias()` is called **BEFORE** any events are sent
with the new `distinct_id`. This is why:
- **`alias()` is synchronous (blocking)**: Guarantees it completes
before the login success event is sent
- **Subsequent events use `linked_principal_id`**: Once aliased, all
future events use the authenticated ID
- **Lazy aliasing**: If a user authenticates via another mechanism (not
through `arcade login`), the system detects this on the next command and
performs aliasing before sending that command's event
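
To make the ordering constraint concrete, here is a small sketch that records the order of calls against a stand-in client. `RecordingClient` and `track_with_lazy_alias` are hypothetical names for illustration only; a real client would perform network calls where this one appends to a list.

```python
from typing import List, Optional, Tuple


class RecordingClient:
    """Stand-in for an analytics client; it only records the call order."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def alias(self, previous_id: str, distinct_id: str) -> None:
        self.calls.append(("alias", distinct_id))

    def capture(self, distinct_id: str, event: str) -> None:
        self.calls.append(("capture", distinct_id))


def track_with_lazy_alias(
    client: RecordingClient, state: dict, principal_id: Optional[str], event: str
) -> None:
    """Alias first (if needed), then send the event under the resolved ID."""
    if principal_id and state.get("linked_principal_id") != principal_id:
        # Blocking call: it must finish before any event uses the new ID.
        client.alias(previous_id=state["anon_id"], distinct_id=principal_id)
        state["linked_principal_id"] = principal_id
    distinct_id = state.get("linked_principal_id") or state["anon_id"]
    client.capture(distinct_id=distinct_id, event=event)
```

On the first authenticated command the recorded order is alias then capture; later commands skip the alias because the link is already stored.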
### **Event Capture Pipeline**
When `CommandTracker.track_command_execution()` is called:
1. **Resolve identity** → determines `distinct_id` and `is_anon` flag
2. **Build event properties**:
   ```python
   {
       "command_name": "toolkit.run",
       "cli_version": "1.2.3",
       "python_version": "3.11.0",
       "os_type": "Darwin",
       "os_release": "23.4.0",
       "duration": 1250.42,  # milliseconds
       "error_message": "...",  # if failed
   }
   ```
3. **Call `UsageService.capture()`**:
   - Serializes event data to JSON
   - Spawns detached subprocess: `python -m arcade_cli.usage`
   - Passes data via `ARCADE_USAGE_EVENT_DATA` env var
   - **Returns immediately** (non-blocking)
4. **Detached subprocess (`__main__.py`)**:
   - Runs independently, survives parent CLI exit
   - Deserializes event data
   - If `is_anon=True`, sets `$process_person_profile=False` (tells PostHog not to create a full profile)
   - Sends event to PostHog with 5s timeout
   - Exits (hard exit after 10s max via timeout thread)
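
The spawn-and-forget step and the child's hard-exit guard can be sketched as below. This is a rough illustration under stated assumptions, not the actual `UsageService` code: the helper names `spawn_detached`, `read_event_from_env`, and `arm_hard_exit` are hypothetical, and `start_new_session=True` only detaches the child on POSIX systems.

```python
import json
import os
import subprocess
import threading
from typing import List


def spawn_detached(cmd: List[str], event: dict) -> subprocess.Popen:
    """Parent side: start `cmd` without waiting; the event travels via an env var."""
    env = dict(os.environ, ARCADE_USAGE_EVENT_DATA=json.dumps(event))
    return subprocess.Popen(
        cmd,
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: own session, so it can outlive the parent
    )


def read_event_from_env() -> dict:
    """Child side: recover the event the parent serialized."""
    return json.loads(os.environ["ARCADE_USAGE_EVENT_DATA"])


def arm_hard_exit(timeout_s: float = 10.0) -> threading.Timer:
    """Child side: force the process to exit if sending hangs too long."""
    timer = threading.Timer(timeout_s, os._exit, args=(1,))
    timer.daemon = True
    timer.start()
    return timer
```

In this sketch the parent would call something like `spawn_detached([sys.executable, "-m", "arcade_cli.usage"], event)` and return right away, while the child calls `arm_hard_exit()` first so that a stalled network request cannot keep it alive past the limit.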
### **Concurrency Handling**
Multiple CLI processes can run simultaneously. The system handles this
via:
- **File locking** on `usage.json` (shared lock for reads, exclusive for
writes)
- **Atomic writes** via temp files ensure incomplete writes never
corrupt the file
- **Idempotent aliasing**: `should_alias()` prevents redundant alias
calls
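
A shared/exclusive locking scheme of this kind can be sketched with `fcntl.flock`. The sketch is Unix-only and makes one assumption that the description above does not confirm: it locks a separate sidecar file (hypothetically `usage.json.lock`) so that the lock survives the atomic rename of `usage.json` itself. The helper name `usage_lock` is also hypothetical.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def usage_lock(lock_path: Path, exclusive: bool) -> Iterator[None]:
    """Hold a shared (read) or exclusive (write) lock for the with-block."""
    # "a" creates the lock file if needed and never truncates it.
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Readers would wrap their load in `usage_lock(path, exclusive=False)` and writers would wrap their atomic write in `usage_lock(path, exclusive=True)`, so several readers can proceed together while a writer waits for exclusive access.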
### **Edge Cases Handled**
1. **Side-channel authentication**: User authenticates outside of `arcade login` (e.g., manually editing credentials)
   - Detected via "lazy aliasing" check on every command
   - Performs alias if `linked_principal_id` doesn't match current `principal_id`
2. **API failures during identity fetch**: Falls back to anonymous tracking
   - 2s timeout prevents hanging
   - Silent failure doesn't disrupt CLI
3. **PostHog merge restrictions**: Can't alias returning users who already have a profile
   - System stores `linked_principal_id` to avoid retrying impossible aliases
   - New users (never logged in before) get full history stitched
4. **Multiple accounts on same machine**: Logout rotates `anon_id`
   - User A's anonymous activity won't leak into User B's profile
### **Privacy & Performance**
- **Opt-out**: `ARCADE_USAGE_TRACKING=0` disables all tracking
- **Non-blocking**: Events never slow down CLI (detached subprocess)
- **Anonymous profiles**: `$process_person_profile=False` for `anon_id`
events minimizes data collection
- **Silent failures**: Network issues or PostHog errors never surface to
users
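
The opt-out check could look something like the sketch below. Only the value `0` is documented as disabling tracking, so this sketch treats every other value (including an unset variable) as enabled; that default and the helper name `is_tracking_enabled` are assumptions.

```python
import os
from typing import Mapping, Optional


def is_tracking_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when ARCADE_USAGE_TRACKING is explicitly set to 0."""
    env = os.environ if environ is None else environ
    return env.get("ARCADE_USAGE_TRACKING", "1").strip() != "0"
```

Checking this once at the top of the tracking path means an opted-out user never spawns the reporting subprocess at all.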
383 lines
14 KiB
Python
import json
import uuid
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest
import yaml
from arcade_cli.usage.identity import UsageIdentity


@pytest.fixture
def temp_config_path(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
    """Set up a temporary config directory for testing."""
    config_dir = tmp_path / ".arcade"
    config_dir.mkdir()
    credentials_file = config_dir / "credentials.yaml"

    monkeypatch.setattr("arcade_cli.usage.identity.ARCADE_CONFIG_PATH", str(config_dir))
    monkeypatch.setattr("arcade_cli.usage.identity.CREDENTIALS_FILE_PATH", str(credentials_file))

    return config_dir


@pytest.fixture
def identity(temp_config_path: Path) -> UsageIdentity:
    """Create a UsageIdentity instance with temp config path."""
    # NOTE: Although temp_config_path is not directly used, it's required to ensure that
    # this fixture depends on the temp_config_path fixture to apply the monkeypatch
    # before creating the UsageIdentity instance
    return UsageIdentity()


class TestLoadOrCreate:
    """Tests for load_or_create() method."""

    def test_creates_new_file_when_not_exists(
        self, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that load_or_create creates a new usage.json file when it doesn't exist."""
        data = identity.load_or_create()

        assert "anon_id" in data
        assert data["linked_principal_id"] is None
        assert uuid.UUID(data["anon_id"])  # Validate UUID format

        # Verify file was created
        usage_file = temp_config_path / "usage.json"
        assert usage_file.exists()

    def test_loads_existing_file(self, identity: UsageIdentity, temp_config_path: Path) -> None:
        """Test that load_or_create loads existing usage.json file."""
        existing_data = {"anon_id": str(uuid.uuid4()), "linked_principal_id": "user-123"}
        usage_file = temp_config_path / "usage.json"
        usage_file.write_text(json.dumps(existing_data))

        data = identity.load_or_create()

        assert data["anon_id"] == existing_data["anon_id"]
        assert data["linked_principal_id"] == existing_data["linked_principal_id"]

    def test_caches_data_after_first_load(self, identity: UsageIdentity) -> None:
        """Test that load_or_create caches data after first load."""
        first_data = identity.load_or_create()
        second_data = identity.load_or_create()

        # Should return the same object b/c it's cached
        assert first_data is second_data

    def test_creates_new_data_on_corrupted_json(
        self, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that load_or_create creates new data if JSON is corrupted."""
        usage_file = temp_config_path / "usage.json"
        usage_file.write_text("{ invalid json }")

        data = identity.load_or_create()

        assert "anon_id" in data
        assert data["linked_principal_id"] is None

    def test_creates_new_data_on_missing_anon_id(
        self, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that load_or_create creates new data if anon_id is missing."""
        usage_file = temp_config_path / "usage.json"
        usage_file.write_text(json.dumps({"some_other_key": "value"}))

        data = identity.load_or_create()

        assert "anon_id" in data
        assert data["linked_principal_id"] is None

    def test_creates_new_data_on_non_dict_json(
        self, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that load_or_create creates new data if JSON is not a dict."""
        usage_file = temp_config_path / "usage.json"
        usage_file.write_text(json.dumps(["not", "a", "dict"]))

        data = identity.load_or_create()

        assert "anon_id" in data
        assert isinstance(data, dict)


class TestWriteAtomic:
    """Tests for _write_atomic() method."""

    def test_writes_data_to_file(self, identity: UsageIdentity, temp_config_path: Path) -> None:
        """Test that _write_atomic writes data to usage.json."""
        test_data = {"anon_id": str(uuid.uuid4()), "linked_principal_id": "user-456"}

        identity._write_atomic(test_data)

        usage_file = temp_config_path / "usage.json"
        assert usage_file.exists()

        with usage_file.open() as f:
            loaded_data = json.load(f)

        assert loaded_data == test_data

    def test_atomic_write_cleans_up_on_failure(
        self, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that _write_atomic cleans up temp file on failure."""
        with (
            patch(
                "tempfile.mkstemp", return_value=(999, str(temp_config_path / ".usage_temp.tmp"))
            ),
            patch("os.fdopen", side_effect=Exception("Write failed")),
        ):
            with pytest.raises(Exception, match="Write failed"):
                identity._write_atomic({"anon_id": "test"})

        # Verify no temp files are left behind
        temp_files = list(temp_config_path.glob(".usage_*.tmp"))
        assert len(temp_files) == 0


class TestGetDistinctId:
    """Tests for get_distinct_id() method."""

    def test_returns_linked_principal_id_when_persisted(
        self, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that get_distinct_id returns persisted linked_principal_id."""
        usage_file = temp_config_path / "usage.json"
        usage_file.write_text(
            json.dumps({"anon_id": str(uuid.uuid4()), "linked_principal_id": "persisted-user-123"})
        )

        distinct_id = identity.get_distinct_id()

        assert distinct_id == "persisted-user-123"

    @patch("arcade_cli.usage.identity.UsageIdentity.get_principal_id")
    def test_returns_principal_id_from_api_when_not_persisted(
        self, mock_get_principal: MagicMock, identity: UsageIdentity
    ) -> None:
        """Test that get_distinct_id fetches principal_id from API when not persisted."""
        mock_get_principal.return_value = "api-user-456"

        distinct_id = identity.get_distinct_id()

        assert distinct_id == "api-user-456"
        mock_get_principal.assert_called_once()

    @patch("arcade_cli.usage.identity.UsageIdentity.get_principal_id")
    def test_returns_anon_id_when_not_authenticated(
        self, mock_get_principal: MagicMock, identity: UsageIdentity
    ) -> None:
        """Test that get_distinct_id returns anon_id when not authenticated."""
        mock_get_principal.return_value = None

        distinct_id = identity.get_distinct_id()
        data = identity.load_or_create()

        assert distinct_id == data["anon_id"]


class TestGetPrincipalId:
    """Tests for get_principal_id() method."""

    def test_returns_none_when_credentials_file_not_exists(self, identity: UsageIdentity) -> None:
        """Test that get_principal_id returns None when credentials file doesn't exist."""
        principal_id = identity.get_principal_id()

        assert principal_id is None

    @patch("httpx.get")
    def test_returns_principal_id_on_successful_api_call(
        self, mock_get: MagicMock, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that get_principal_id returns principal_id from API."""
        # Create credentials file
        credentials_file = temp_config_path / "credentials.yaml"
        credentials_file.write_text(yaml.dump({"cloud": {"api": {"key": "test-api-key"}}}))

        # Mock API response
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {"data": {"principal_id": "api-principal-123"}}
        mock_get.return_value = mock_response

        principal_id = identity.get_principal_id()

        assert principal_id == "api-principal-123"
        mock_get.assert_called_once_with(
            "https://cloud.arcade.dev/api/v1/auth/validate",
            headers={"accept": "application/json", "Authorization": "Bearer test-api-key"},
            timeout=2.0,
        )

    @patch("httpx.get")
    def test_returns_none_on_api_failure(
        self, mock_get: MagicMock, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that get_principal_id returns None on API failure."""
        credentials_file = temp_config_path / "credentials.yaml"
        credentials_file.write_text(yaml.dump({"cloud": {"api": {"key": "test-api-key"}}}))

        mock_get.side_effect = Exception("Network error")

        principal_id = identity.get_principal_id()

        assert principal_id is None

    def test_returns_none_when_api_key_missing(
        self, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that get_principal_id returns None when API key is missing."""
        credentials_file = temp_config_path / "credentials.yaml"
        credentials_file.write_text(yaml.dump({"cloud": {}}))

        principal_id = identity.get_principal_id()

        assert principal_id is None

    @patch("httpx.get")
    def test_returns_none_on_non_200_status(
        self, mock_get: MagicMock, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that get_principal_id returns None on non-200 status code."""
        credentials_file = temp_config_path / "credentials.yaml"
        credentials_file.write_text(yaml.dump({"cloud": {"api": {"key": "test-api-key"}}}))

        mock_response = MagicMock()
        mock_response.status_code = 401
        mock_get.return_value = mock_response

        principal_id = identity.get_principal_id()

        assert principal_id is None


class TestShouldAlias:
    """Tests for should_alias() method."""

    @patch("arcade_cli.usage.identity.UsageIdentity.get_principal_id")
    def test_returns_true_when_authenticated_but_not_linked(
        self, mock_get_principal: MagicMock, identity: UsageIdentity
    ) -> None:
        """Test that should_alias returns True when user is authenticated but not yet aliased."""
        mock_get_principal.return_value = "new-principal-id"

        should_alias = identity.should_alias()

        assert should_alias is True

    @patch("arcade_cli.usage.identity.UsageIdentity.get_principal_id")
    def test_returns_false_when_already_linked(
        self, mock_get_principal: MagicMock, identity: UsageIdentity, temp_config_path: Path
    ) -> None:
        """Test that should_alias returns False when principal_id already linked."""
        principal_id = "already-linked-123"
        mock_get_principal.return_value = principal_id

        usage_file = temp_config_path / "usage.json"
        usage_file.write_text(
            json.dumps({"anon_id": str(uuid.uuid4()), "linked_principal_id": principal_id})
        )

        should_alias = identity.should_alias()

        assert should_alias is False

    @patch("arcade_cli.usage.identity.UsageIdentity.get_principal_id")
    def test_returns_false_when_not_authenticated(
        self, mock_get_principal: MagicMock, identity: UsageIdentity
    ) -> None:
        """Test that should_alias returns False when not authenticated."""
        mock_get_principal.return_value = None

        should_alias = identity.should_alias()

        assert should_alias is False


class TestResetToAnonymous:
    """Tests for reset_to_anonymous() method."""

    def test_generates_new_anon_id(self, identity: UsageIdentity) -> None:
        """Test that reset_to_anonymous generates a new anon_id."""
        original_data = identity.load_or_create()
        original_anon_id = original_data["anon_id"]

        identity.reset_to_anonymous()

        new_data = identity.load_or_create()
        assert new_data["anon_id"] != original_anon_id
        assert uuid.UUID(new_data["anon_id"])  # Validates UUID format

    def test_clears_linked_principal_id(self, identity: UsageIdentity) -> None:
        """Test that reset_to_anonymous clears linked_principal_id."""
        identity.set_linked_principal_id("user-to-clear")

        identity.reset_to_anonymous()

        data = identity.load_or_create()
        assert data["linked_principal_id"] is None

    def test_persists_to_file(self, identity: UsageIdentity, temp_config_path: Path) -> None:
        """Test that reset_to_anonymous persists changes to file."""
        identity.reset_to_anonymous()

        usage_file = temp_config_path / "usage.json"
        assert usage_file.exists()

        with usage_file.open() as f:
            file_data = json.load(f)

        assert "anon_id" in file_data
        assert file_data["linked_principal_id"] is None


class TestSetLinkedPrincipalId:
    """Tests for set_linked_principal_id() method."""

    def test_updates_linked_principal_id(self, identity: UsageIdentity) -> None:
        """Test that set_linked_principal_id updates the linked_principal_id."""
        identity.load_or_create()  # Initialize

        identity.set_linked_principal_id("new-user-789")

        data = identity.load_or_create()
        assert data["linked_principal_id"] == "new-user-789"

    def test_persists_to_file(self, identity: UsageIdentity, temp_config_path: Path) -> None:
        """Test that set_linked_principal_id persists changes to file."""
        identity.set_linked_principal_id("persisted-user-999")

        usage_file = temp_config_path / "usage.json"
        with usage_file.open() as f:
            file_data = json.load(f)

        assert file_data["linked_principal_id"] == "persisted-user-999"

    def test_updates_cache(self, identity: UsageIdentity) -> None:
        """Test that set_linked_principal_id updates the cached data."""
        identity.load_or_create()

        identity.set_linked_principal_id("cached-user-111")

        # Access _data directly to verify cache updated
        assert identity._data is not None
        assert identity._data["linked_principal_id"] == "cached-user-111"


class TestAnonIdProperty:
    """Tests for anon_id property."""

    def test_returns_anon_id(self, identity: UsageIdentity) -> None:
        """Test that anon_id property returns the anon_id."""
        data = identity.load_or_create()

        assert identity.anon_id == data["anon_id"]

    def test_returns_valid_uuid(self, identity: UsageIdentity) -> None:
        """Test that anon_id property returns a valid UUID."""
        anon_id = identity.anon_id

        assert uuid.UUID(anon_id)  # Validate UUID format