francylisboacharuto cb66ad17cc feat: Add skill staleness detection (review tracking, dependency health, schema drift)

Skills go stale as APIs change and data sources move. This adds a three-layer
staleness detection system: review date tracking with git fallback, HTTP health
checks for declared dependencies, and top-level key comparison for schema drift.
All new frontmatter fields are optional — existing skills work unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-27 09:09:48 -03:00


Mandatory Quality Standards

Fundamental Principles

Production-Ready, Not Prototype

Code must work without modifications
Doesn't need "now implement X"
Can be used immediately

Functional, Not Placeholder

Complete code in all functions
No TODO, pass, NotImplementedError
Robust error handling

Useful, Not Generic

Specific and detailed content
Concrete examples, not abstract
Not just external links

Current, Not Stale

Include metadata.created and metadata.last_reviewed dates in frontmatter
Set metadata.review_interval_days (default: 90 days)
Declare external dependencies in metadata.dependencies so health can be checked
Declare expected API response shapes in metadata.schema_expectations for drift detection
Run python3 scripts/staleness_check.py path/to/skill/ periodically to detect stale skills
When publishing to a registry, use python3 scripts/skill_registry.py stale to audit all skills
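The review-date and schema-drift checks described above can be sketched in a few lines. This is a minimal illustration, not the actual `scripts/staleness_check.py`: the function names are hypothetical, `last_reviewed` is assumed to be an ISO date string, and `schema_expectations` is assumed to reduce to a set of expected top-level keys.

```python
from datetime import date, timedelta
from typing import Any, Dict, Optional, Set


def is_review_overdue(
    last_reviewed: str,
    review_interval_days: int = 90,
    today: Optional[date] = None,
) -> bool:
    """Return True when the review window has elapsed.

    Assumes last_reviewed is an ISO date string such as "2026-01-15".
    """
    reviewed = date.fromisoformat(last_reviewed)
    today = today or date.today()
    return today > reviewed + timedelta(days=review_interval_days)


def missing_top_level_keys(response: Dict[str, Any], expected: Set[str]) -> Set[str]:
    """Return expected top-level keys that are absent from an API response."""
    return expected - set(response)
```

A skill reviewed exactly `review_interval_days` ago is not yet flagged in this sketch; it becomes overdue the following day.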

Standards by File Type

Python Scripts

✅ MANDATORY

1. Complete structure:

#!/usr/bin/env python3
"""Module docstring"""

# Imports
import ...

# Constants
CONST = value

# Classes/Functions
class/def ...

# Main
def main():
    ...

if __name__ == "__main__":
    main()

2. Docstrings:

Module docstring: 3-5 lines
Class docstring: Description + Example
Method docstring: Args, Returns, Raises, Example

3. Type hints:

from typing import Any, Dict

def function(param1: str, param2: int = 10) -> Dict[str, Any]:
    ...

4. Error handling:

try:
    result = risky_operation()
except SpecificError as e:
    # Handle specifically, then re-raise with context (keeps the original traceback)
    log_error(e)
    raise CustomError(f"Context: {e}") from e

5. Validations:

def process(data: Dict) -> pd.DataFrame:
    # Validate input
    if not data:
        raise ValueError("Data cannot be empty")

    if 'required_field' not in data:
        raise ValueError("Missing required field")

    # Process
    ...

    # Validate output
    assert len(result) > 0, "Result cannot be empty"
    assert result['value'].notna().all(), "No null values allowed"

    return result

6. Appropriate logging:

import logging

logger = logging.getLogger(__name__)

def fetch_data():
    logger.info("Fetching data from API...")
    # ...
    logger.debug(f"Received {len(data)} records")
    # ...
    logger.error(f"API error: {e}")

❌ FORBIDDEN

# ❌ DON'T DO THIS:

def analyze():
    # TODO: implement analysis
    pass

def process(data):  # ❌ No type hints
    # ❌ No docstring
    result = data  # ❌ No real logic
    return result  # ❌ No validation

def fetch_api(url):
    response = requests.get(url)  # ❌ No timeout
    return response.json()  # ❌ No error handling

✅ DO THIS:

def analyze_yoy(df: pd.DataFrame, commodity: str, year1: int, year2: int) -> Dict:
    """
    Perform year-over-year analysis

    Args:
        df: DataFrame with parsed data
        commodity: Commodity name (e.g., "CORN")
        year1: Current year
        year2: Previous year

    Returns:
        Dict with keys:
            - commodity: str
            - production_current: float
            - production_previous: float
            - change_absolute: float
            - change_percent: float
            - interpretation: str

    Raises:
        ValueError: If the commodity or data for either year is not found

    Example:
        >>> analyze_yoy(df, "CORN", 2023, 2022)
        {'production_current': 15.3, 'change_percent': 11.7, ...}
    """
    # Validate inputs
    if commodity not in df['commodity'].unique():
        raise ValueError(f"Commodity {commodity} not found in data")

    # Filter data
    df1 = df[(df['commodity'] == commodity) & (df['year'] == year1)]
    df2 = df[(df['commodity'] == commodity) & (df['year'] == year2)]

    if len(df1) == 0 or len(df2) == 0:
        raise ValueError(f"Data not found for {commodity} in {year1} or {year2}")

    # Extract values
    prod1 = df1['production'].iloc[0]
    prod2 = df2['production'].iloc[0]

    # Calculate (guard against division by zero)
    if prod2 == 0:
        raise ValueError(f"Previous production for {commodity} in {year2} is zero")
    change = prod1 - prod2
    change_pct = (change / prod2) * 100

    # Interpret: under 2% is stable, above 10% is significant
    if abs(change_pct) < 2:
        interpretation = "stable"
    elif change_pct > 10:
        interpretation = "significant_increase"
    elif change_pct > 0:
        interpretation = "moderate_increase"
    elif change_pct < -10:
        interpretation = "significant_decrease"
    else:
        interpretation = "moderate_decrease"

    # Return
    return {
        "commodity": commodity,
        "production_current": round(prod1, 1),
        "production_previous": round(prod2, 1),
        "change_absolute": round(change, 1),
        "change_percent": round(change_pct, 1),
        "interpretation": interpretation
    }

SKILL.md

✅ MANDATORY

1. Valid frontmatter:

---
name: agent-name
description: [150-250 words with keywords]
---

2. Size: 5000-7000 words

3. Mandatory sections:

When to use (specific triggers)
Data source (detailed API)
Workflows (complete step-by-step)
Scripts (each one explained)
Analyses (methodologies)
Errors (complete handling)
Validations (mandatory)
Keywords (complete list)
Examples (5+ complete)

4. Detailed workflows:

✅ GOOD:

### Workflow: YoY Comparison

1. **Identify question parameters**
   - Commodity: [extract from question]
   - Years: Current vs previous (or specified)

2. **Fetch data**
   ```bash
   python scripts/fetch_nass.py \
     --commodity CORN \
     --years 2023,2022 \
     --output data/raw/corn_2023_2022.json
   ```

3. **Parse**
   ```bash
   python scripts/parse_nass.py \
     --input data/raw/corn_2023_2022.json \
     --output data/processed/corn.csv
   ```

4. **Analyze**
   ```bash
   python scripts/analyze_nass.py \
     --input data/processed/corn.csv \
     --analysis yoy \
     --commodity CORN \
     --year1 2023 \
     --year2 2022 \
     --output data/analysis/corn_yoy.json
   ```

5. **Interpret results**

   File `data/analysis/corn_yoy.json` contains:
   ```json
   {
     "production_current": 15.3,
     "change_percent": 11.7,
     "interpretation": "significant_increase"
   }
   ```

   Respond to user: "Corn production grew 11.7% in 2023..."

❌ BAD:

### Workflow: Comparison

1. Get data
2. Compare
3. Return result

5. Complete examples:

✅ GOOD:

### Example 1: YoY Comparison

**Question**: "How's corn production compared to last year?"

**Executed flow**:
[Specific commands with outputs]

**Generated answer**:
"Corn production in 2023 is 15.3 billion bushels,
growth of 11.7% vs 2022 (13.7 billion). Growth
comes mainly from area increase (+8%) with stable yield."

❌ BAD:

### Example: Comparison

User asks about comparison. Agent compares and responds.

❌ FORBIDDEN

Empty sections
"See documentation"
Workflows without specific commands
Generic examples

References

✅ MANDATORY

1. Useful and self-contained content:

✅ GOOD (references/api-guide.md):

## Endpoint: Get Production Data

**URL**: `GET https://quickstats.nass.usda.gov/api/api_GET/`

**Parameters**:
- `commodity_desc`: Commodity name
  - Example: "CORN", "SOYBEANS"
  - Case-sensitive
- `year`: Desired year
  - Example: 2023
  - Range: 1866-present

**Complete request example**:
```bash
curl -H "X-Api-Key: YOUR_KEY" \
  "https://quickstats.nass.usda.gov/api/api_GET/?commodity_desc=CORN&year=2023&format=JSON"
```

**Expected response**:
```json
{
  "data": [
    {
      "year": 2023,
      "commodity_desc": "CORN",
      "value": "15,300,000,000",
      "unit_desc": "BU"
    }
  ]
}
```

**Important fields**:
- `value`: Comes as STRING with commas
  - Solution: `value.replace(',', '')`
  - Convert to float after

❌ BAD:

## API Endpoint

For details on how to use the API, consult the official documentation at:
https://quickstats.nass.usda.gov/api

[End of file]

2. Adequate size:

API guide: 1500-2000 words
Analysis methods: 2000-3000 words
Troubleshooting: 1000-1500 words

3. Concrete examples:

Always include examples with real values
Executable code blocks
Expected outputs

❌ FORBIDDEN

"For more information, see [link]"
Sections with only 2-3 lines
Lists without details
Circular references ("see other doc that sees other doc")

Assets (Configs)

✅ MANDATORY

1. Syntactically valid JSON:

# ALWAYS validate:
python -c "import json; json.load(open('config.json'))"

2. Real values:

✅ GOOD:

{
  "api": {
    "base_url": "https://quickstats.nass.usda.gov/api",
    "api_key_env": "NASS_API_KEY",
    "_instructions": "Get free API key from: https://quickstats.nass.usda.gov/api#registration",
    "rate_limit_per_day": 1000,
    "timeout_seconds": 30
  }
}

❌ BAD:

{
  "api": {
    "base_url": "YOUR_API_URL_HERE",
    "api_key": "YOUR_KEY_HERE"
  }
}

3. Inline comments (using _comment or _note):

{
  "_comment": "Differentiated TTL by data type",
  "cache": {
    "ttl_historical_days": 365,
    "_note_historical": "Historical data doesn't change",
    "ttl_current_days": 7,
    "_note_current": "Current year data may be revised"
  }
}
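One possible way to consume configs written in this style is sketched below: the API key is read from the environment variable named by `api_key_env`, and keys that start with an underscore are treated as notes and dropped. `strip_notes` and `load_config` are hypothetical names for this example, not functions from an existing script.

```python
import json
import os
from typing import Any, Dict


def strip_notes(node: Any) -> Any:
    """Recursively drop keys that start with "_" (e.g. _comment, _note_current)."""
    if isinstance(node, dict):
        return {k: strip_notes(v) for k, v in node.items() if not k.startswith("_")}
    if isinstance(node, list):
        return [strip_notes(item) for item in node]
    return node


def load_config(text: str) -> Dict[str, Any]:
    """Parse config JSON, remove note keys, and resolve the API key from the environment."""
    config = strip_notes(json.loads(text))
    api = config.get("api", {})
    env_name = api.get("api_key_env")
    if env_name:
        key = os.environ.get(env_name)
        if not key:
            raise RuntimeError(f"{env_name} environment variable not found")
        api["api_key"] = key  # resolved at load time, never stored in the file
    return config
```

Keeping only the variable name in the file means the config can be committed without exposing a secret.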

README.md

✅ MANDATORY

1. Complete installation instructions:

✅ GOOD:

## Installation

### 1. Get API Key (Free)

1. Access https://quickstats.nass.usda.gov/api#registration
2. Fill form:
   - Name: [your name]
   - Email: [your email]
   - Purpose: "Personal research"
3. Click "Submit"
4. You'll receive email with API key in ~1 minute
5. Key format: `A1B2C3D4-E5F6-G7H8-I9J0-K1L2M3N4O5P6`

### 2. Configure Environment

**Option A - Export** (temporary):
```bash
export NASS_API_KEY="your_key_here"
```

**Option B - .bashrc/.zshrc** (permanent):
```bash
echo 'export NASS_API_KEY="your_key_here"' >> ~/.bashrc
source ~/.bashrc
```

**Option C - .env file** (per project):
```bash
echo "NASS_API_KEY=your_key_here" > .env
```

### 3. Install Dependencies

```bash
cd nass-usda-agriculture
pip install -r requirements.txt
```

Requirements:
- requests
- pandas
- numpy

❌ BAD:

## Installation

1. Get API key from the official website
2. Configure environment
3. Install dependencies
4. Done!

2. Concrete usage examples:

✅ GOOD:

## Examples

### Example 1: Current Production

You: "What's US corn production in 2023?"

Claude: "Corn production in 2023 was 15.3 billion bushels (389 million metric tons)..."


### Example 2: YoY Comparison

You: "Compare soybeans this year vs last year"

Claude: "Soybean production in 2023 is 2.6% below 2022:

2023: 4.165 billion bushels
2022: 4.276 billion bushels
Drop from area (-4.5%), yield improved (+0.8%)"


[3-5 more examples]

❌ BAD:

## Usage

Ask questions about agriculture and the agent will respond.

3. Specific troubleshooting:

✅ GOOD:

### Error: "NASS_API_KEY environment variable not found"

**Cause**: API key not configured

**Step-by-step solution**:
1. Verify key was obtained: https://...
2. Configure environment:
   ```bash
   export NASS_API_KEY="your_key_here"
   ```
3. Verify:
   ```bash
   echo $NASS_API_KEY
   ```
   Should show your key
4. If it doesn't work, restart the terminal

**Still not working?**
- Check for extra spaces in key
- Verify key hasn't expired (validity: 1 year)
- Re-generate key if needed


---

## Quality Checklist

### Per Python Script

- [ ] Shebang: `#!/usr/bin/env python3`
- [ ] Module docstring (3-5 lines)
- [ ] Organized imports (stdlib, 3rd party, local)
- [ ] Constants at top (if applicable)
- [ ] Type hints in all public functions
- [ ] Docstrings in classes (description + attributes + example)
- [ ] Docstrings in methods (Args, Returns, Raises, Example)
- [ ] Error handling for risky operations
- [ ] Input validations
- [ ] Output validations
- [ ] Appropriate logging
- [ ] Main function with argparse
- [ ] if __name__ == "__main__"
- [ ] Functional code (no TODO/pass)
- [ ] Valid syntax (test: `python -m py_compile script.py`)

### Per SKILL.md

- [ ] Frontmatter with name and description
- [ ] Description 150-250 characters with keywords
- [ ] Size 5000+ words
- [ ] "When to Use" section with specific triggers
- [ ] "Data Source" section detailed
- [ ] Step-by-step workflows with commands
- [ ] Scripts explained individually
- [ ] Analyses documented (objective, methodology)
- [ ] Errors handled (all expected)
- [ ] Validations listed
- [ ] Performance/cache explained
- [ ] Complete keywords
- [ ] Complete examples (5+)

### Per Reference File

- [ ] 1000+ words
- [ ] Useful content (not just links)
- [ ] Concrete examples with real values
- [ ] Executable code blocks
- [ ] Well structured (headings, lists)
- [ ] No empty sections
- [ ] No "TODO: write"

### Per Asset (Config)

- [ ] Syntactically valid JSON (validate!)
- [ ] Real values (not "YOUR_X_HERE" without context)
- [ ] Inline comments (_comment, _note)
- [ ] Instructions for values user must fill
- [ ] Logical and organized structure

### Per README.md

- [ ] Step-by-step installation
- [ ] How to get API key (detailed)
- [ ] How to configure (3 options)
- [ ] How to install dependencies
- [ ] How to install in Claude Code
- [ ] Usage examples (5+)
- [ ] Troubleshooting (10+ problems)
- [ ] License
- [ ] Contact/contribution (if applicable)

### Complete Agent

- [ ] DECISIONS.md documents all choices
- [ ] **VERSION** file created (e.g. 1.0.0)
- [ ] **CHANGELOG.md** created with complete v1.0.0 entry
- [ ] **INSTALACAO.md** with complete didactic tutorial
- [ ] **comprehensive_{domain}_report()** implemented
- [ ] SKILL.md with version in frontmatter metadata
- [ ] 18+ files created
- [ ] ~1500+ lines of Python code
- [ ] ~10,000+ words of documentation
- [ ] 2+ configs
- [ ] requirements.txt
- [ ] .gitignore (if needed)
- [ ] No placeholder/TODO
- [ ] Valid syntax (Python, JSON, YAML)
- [ ] Ready to use (production-ready)

---

## Quality Examples

### Example: Error Handling

❌ BAD:

def fetch(url):
    return requests.get(url).json()

✅ GOOD:

def fetch(url: str, timeout: int = 30) -> Dict:
    """
    Fetch data from URL with error handling

    Args:
        url: URL to fetch
        timeout: Timeout in seconds

    Returns:
        JSON response as dict

    Raises:
        NetworkError: If connection fails
        TimeoutError: If request times out
        RateLimitError: If API responds with HTTP 429
        APIError: If API returns error
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()

        data = response.json()

        if 'error' in data:
            raise APIError(f"API error: {data['error']}")

        return data

    except requests.Timeout as e:
        raise TimeoutError(f"Request timed out after {timeout}s") from e

    except requests.ConnectionError as e:
        raise NetworkError(f"Connection failed: {e}") from e

    except requests.HTTPError as e:
        if e.response.status_code == 429:
            raise RateLimitError("Rate limit exceeded") from e
        raise APIError(f"HTTP {e.response.status_code}: {e}") from e

Example: Validations

❌ BAD:

def parse(data):
    df = pd.DataFrame(data)
    return df

✅ GOOD:

from datetime import datetime
from typing import Dict, List

import pandas as pd

def parse(data: List[Dict]) -> pd.DataFrame:
    """Parse and validate data"""

    # Validate input
    if not data:
        raise ValueError("Data cannot be empty")

    if not isinstance(data, list):
        raise TypeError(f"Expected list, got {type(data)}")

    # Parse
    df = pd.DataFrame(data)

    # Validate schema
    required_cols = ['year', 'commodity', 'value']
    missing = set(required_cols) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Validate types
    df['year'] = pd.to_numeric(df['year'], errors='raise')
    df['value'] = pd.to_numeric(df['value'], errors='raise')

    # Validate ranges
    current_year = datetime.now().year
    if (df['year'] > current_year).any():
        raise ValueError(f"Future years found (max allowed: {current_year})")

    if (df['value'] < 0).any():
        raise ValueError("Negative values found")

    # Validate no duplicates
    if df.duplicated(subset=['year', 'commodity']).any():
        raise ValueError("Duplicate records found")

    return df

Example: Docstrings

❌ BAD:

def analyze(df, commodity):
    """Analyze data"""
    # ...

✅ GOOD:

def analyze_yoy(
    df: pd.DataFrame,
    commodity: str,
    year1: int,
    year2: int
) -> Dict[str, Any]:
    """
    Perform year-over-year comparison analysis

    Compares production, area, and yield between two years
    and decomposes growth into area vs yield contributions.

    Args:
        df: DataFrame with columns ['year', 'commodity', 'production', 'area', 'yield']
        commodity: Commodity name (e.g., "CORN", "SOYBEANS")
        year1: Current year to compare
        year2: Previous year to compare against

    Returns:
        Dict containing:
            - production_current (float): Production in year1 (million units)
            - production_previous (float): Production in year2
            - change_absolute (float): Absolute change
            - change_percent (float): Percent change
            - decomposition (dict): Area vs yield contribution
            - interpretation (str): "increase", "decrease", or "stable"

    Raises:
        ValueError: If commodity not found in data
        ValueError: If either year not found in data
        DataQualityError: If production != area * yield (tolerance > 1%)

    Example:
        >>> df = pd.DataFrame([
        ...     {'year': 2023, 'commodity': 'CORN', 'production': 15.3, 'area': 94.6, 'yield': 177},
        ...     {'year': 2022, 'commodity': 'CORN', 'production': 13.7, 'area': 89.2, 'yield': 173}
        ... ])
        >>> result = analyze_yoy(df, "CORN", 2023, 2022)
        >>> result['change_percent']
        11.7
    """
    # [Complete implementation]

Dependency Management

Decision Framework

Skills should minimize external dependencies. Every dependency is a maintenance burden, a security surface, and a compatibility risk. Use this decision tree:

Can stdlib do it?
  → Yes: Use stdlib. Done.
  → No: Is there a lightweight pure-Python package (<1MB)?
      → Yes: Use it. Add to requirements.txt.
      → No: Is there a well-maintained popular package?
          → Yes: Use it only if the domain requires it.
          → No: Implement it yourself or redesign the approach.

Stdlib vs. Third-Party Decision Table

| Task | Stdlib Solution | When to Use Third-Party |
|------|-----------------|-------------------------|
| HTTP requests | `urllib.request` | Use `requests` when: complex auth, session management, multipart uploads, or retry logic would require 100+ lines of urllib code |
| JSON handling | `json` | Never; stdlib is sufficient |
| CSV parsing | `csv` | Use `pandas` only when: statistical analysis, complex transformations, or DataFrame operations are core to the skill |
| File paths | `pathlib` | Never; stdlib is sufficient |
| Date/time | `datetime` | Never; stdlib is sufficient |
| Regex | `re` | Never; stdlib is sufficient |
| Hashing | `hashlib` | Never; stdlib is sufficient |
| Caching | File-based (json + pathlib) | Never for skills; the FileCache pattern in architecture-guide.md is sufficient |
| Data analysis | Manual calculations | Use `pandas`/`numpy` when: skill is primarily analytical (10+ statistical operations, pivots, aggregations) |
| PDF generation | Not available | Use `reportlab` or `fpdf2` when PDF output is a core requirement |
| Web scraping | `urllib` + `html.parser` | Use `beautifulsoup4` when parsing complex/malformed HTML |
| CLI arguments | `argparse` | Never; stdlib is sufficient |
| YAML parsing | Manual (the `_parse_frontmatter` pattern) | Use `pyyaml` only if skill needs to parse arbitrary YAML files (not just SKILL.md frontmatter) |

requirements.txt Rules

When third-party packages are needed:

# requirements.txt

# Pin major.minor, allow patch updates
requests>=2.31,<3.0
pandas>=2.0,<3.0

# For stdlib-only skills, create an empty requirements.txt with a comment:
# No external dependencies required — this skill uses Python stdlib only.

Rules:

Always create requirements.txt even if empty (document the stdlib-only decision)
Pin major.minor version to avoid breaking changes
Never pin exact patch versions (allows security updates)
Never include dev dependencies (pytest, ruff) — those are for contributors, not users
List only direct dependencies, not transitive ones
Include a comment explaining why each package is needed
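Several of these rules can be checked mechanically. The sketch below is one hypothetical linter for `requirements.txt` content; `lint_requirements` and the `DEV_ONLY` set are names invented for this example, and the parsing is deliberately simple (it does not handle every requirement specifier form).

```python
import re
from typing import List

# Hypothetical set of dev-only tools; extend as needed.
DEV_ONLY = {"pytest", "ruff"}


def lint_requirements(text: str) -> List[str]:
    """Return human-readable problems found in requirements.txt content."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        # Package name is everything before the first specifier character
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0].lower()
        if name in DEV_ONLY:
            problems.append(f"{name}: dev dependency should not be listed")
        elif "==" in line:
            problems.append(f"{name}: exact pin blocks patch updates")
        elif ">=" not in line or "<" not in line:
            problems.append(f"{name}: missing version range (expected >=X.Y,<Z)")
    return problems
```

An empty result means the file follows the pinning rules; a stdlib-only file containing just a comment also passes.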

Common Dependency Patterns by Skill Type

| Skill Type | Typical Dependencies |
|------------|----------------------|
| Data analysis (stocks, agriculture, climate) | `requests`, `pandas`, `numpy` |
| Report generation | `requests`, `fpdf2` or `reportlab` |
| Web scraping | `requests`, `beautifulsoup4` |
| API wrapper | `requests` (or stdlib `urllib`) |
| Text processing | Stdlib only (`re`, `json`, `csv`) |
| File format conversion | Stdlib only (or single specialized package) |
| Database interaction | Stdlib `sqlite3` (or `psycopg2`/`pymysql` for specific DBs) |

Testing Strategy

Why Test Generated Skills

Skills are opinionated software that teams rely on daily. A skill that produces wrong calculations, misparses API responses, or silently drops data is worse than no skill at all. Tests catch these issues before the skill reaches users.

What to Test

Focus tests on the parts most likely to break or produce wrong results:

| Priority | What to Test | Why |
|----------|--------------|-----|
| High | Analysis/calculation functions | Wrong math = wrong decisions |
| High | Data parsing (API response → structured data) | APIs change formats, edge cases in real data |
| High | Input validation | Bad input should fail clearly, not silently produce garbage |
| Medium | Output formatting | Reports and summaries should be consistent |
| Medium | Error handling paths | Verify graceful degradation on API failures, missing data |
| Low | Cache logic | Only if custom caching is complex |
| Low | Config loading | Usually trivial |

Test Directory Structure

skill-name/
├── scripts/
│   ├── analyze.py
│   ├── fetch.py
│   └── parse.py
├── tests/
│   ├── test_analyze.py       # Unit tests for analysis functions
│   ├── test_parse.py         # Unit tests for parsing logic
│   ├── fixtures/
│   │   ├── sample_api_response.json    # Real API response (anonymized)
│   │   └── sample_parsed_data.csv      # Expected parsed output
│   └── conftest.py           # Shared pytest fixtures

Test Patterns

Pattern 1: Test analysis functions with known inputs/outputs

"""Tests for analyze.py — core calculation functions."""
import pytest
from scripts.analyze import analyze_yoy, calculate_trend

def test_yoy_increase():
    """YoY comparison should detect an increase."""
    result = analyze_yoy(
        current_value=150.0,
        previous_value=100.0,
    )
    assert result["change_percent"] == pytest.approx(50.0)
    assert result["interpretation"] == "significant_increase"

def test_yoy_stable():
    """Changes under 2% should be interpreted as stable."""
    result = analyze_yoy(current_value=101.0, previous_value=100.0)
    assert result["interpretation"] == "stable"

def test_yoy_zero_previous():
    """Division by zero should raise ValueError, not crash."""
    with pytest.raises(ValueError, match="previous value cannot be zero"):
        analyze_yoy(current_value=100.0, previous_value=0.0)
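For reference, a function that these three tests could exercise might look like the sketch below. It assumes the two-argument signature used in the tests (not the DataFrame-based `analyze_yoy` shown earlier) and reuses the 2% and 10% thresholds from this document; the return shape is limited to the two keys the tests inspect.

```python
from typing import Dict, Union


def analyze_yoy(current_value: float, previous_value: float) -> Dict[str, Union[float, str]]:
    """Compare two values and classify the percent change."""
    if previous_value == 0:
        # A percent change against zero is undefined, so fail clearly
        raise ValueError("previous value cannot be zero")

    change_percent = (current_value - previous_value) / previous_value * 100

    if abs(change_percent) < 2:
        interpretation = "stable"
    elif change_percent > 10:
        interpretation = "significant_increase"
    elif change_percent > 0:
        interpretation = "moderate_increase"
    elif change_percent < -10:
        interpretation = "significant_decrease"
    else:
        interpretation = "moderate_decrease"

    return {"change_percent": change_percent, "interpretation": interpretation}
```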

Pattern 2: Test parsing with fixture data

"""Tests for parse.py — API response parsing."""
import json
from pathlib import Path
from scripts.parse import parse_api_response

FIXTURES = Path(__file__).parent / "fixtures"

def test_parse_normal_response():
    """Standard API response should parse to expected structure."""
    raw = json.loads((FIXTURES / "sample_api_response.json").read_text())
    result = parse_api_response(raw)
    assert len(result) > 0
    assert "year" in result[0]
    assert "value" in result[0]

def test_parse_empty_response():
    """Empty API response should return empty list, not crash."""
    result = parse_api_response({"data": []})
    assert result == []

def test_parse_malformed_values():
    """Values with commas (e.g., '15,300,000') should be cleaned."""
    raw = {"data": [{"value": "15,300,000", "year": "2023"}]}
    result = parse_api_response(raw)
    assert result[0]["value"] == 15300000.0
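A parser that would satisfy the tests above could be as small as the following sketch. It assumes the payload shape used in the fixtures (a `data` list whose items carry `year` and `value`) and drops any other fields; a real skill would likely keep more of the record.

```python
from typing import Any, Dict, List


def parse_api_response(raw: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a raw API payload into a list of records with numeric fields."""
    records = []
    for item in raw.get("data", []):
        records.append({
            "year": int(item["year"]),
            # Values arrive as strings with thousands separators, e.g. "15,300,000"
            "value": float(str(item["value"]).replace(",", "")),
        })
    return records
```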

Pattern 3: Mock external API calls

"""Tests for fetch.py — API interaction (mocked)."""
from unittest.mock import patch, MagicMock

import pytest
from scripts.fetch import fetch_data

@patch("scripts.fetch.urllib.request.urlopen")
def test_fetch_success(mock_urlopen):
    """Successful API call should return parsed JSON."""
    mock_response = MagicMock()
    mock_response.read.return_value = b'{"data": [{"value": "100"}]}'
    mock_response.__enter__ = lambda s: s
    mock_response.__exit__ = MagicMock(return_value=False)
    mock_urlopen.return_value = mock_response

    result = fetch_data(commodity="CORN", year=2023)
    assert "data" in result

@patch("scripts.fetch.urllib.request.urlopen")
def test_fetch_rate_limited(mock_urlopen):
    """429 response should raise RateLimitError."""
    from urllib.error import HTTPError
    mock_urlopen.side_effect = HTTPError(
        url="", code=429, msg="Too Many Requests", hdrs={}, fp=None
    )
    with pytest.raises(Exception, match="[Rr]ate limit"):
        fetch_data(commodity="CORN", year=2023)
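The mocks above target `urllib.request.urlopen`, so the function under test has to call it through the module at call time. A stdlib-only sketch of such a `fetch_data` follows; `BASE_URL` is a placeholder and the query parameter names are assumptions for illustration, not a real endpoint.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Dict

# Placeholder endpoint for illustration only; not a real service.
BASE_URL = "https://api.example.invalid/data"


class RateLimitError(Exception):
    """Raised when the API answers with HTTP 429."""


def fetch_data(commodity: str, year: int, timeout: int = 30) -> Dict[str, Any]:
    """Fetch JSON for one commodity and year using only the standard library."""
    query = urllib.parse.urlencode({"commodity": commodity, "year": year})
    try:
        # Looked up on the module at call time, so patching urlopen in tests works
        with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as e:
        if e.code == 429:
            raise RateLimitError("Rate limit exceeded") from e
        raise
```

Binding `urlopen` as a default argument instead would capture the real function at definition time and the patch would have no effect.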

Running Tests

# Run all tests
cd skill-name/
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=scripts --cov-report=term-missing

When to Generate Tests

Tests are generated during Phase 5 (Implementation) after the scripts are written:

Write all scripts first (steps 1-5 of Phase 5)
Create tests/ directory with test files for core functions
Create tests/fixtures/ with sample data
Run tests to verify they pass
Include test instructions in README.md

Note: Tests are recommended but not mandatory for v1.0 of a skill. The validation and security scan gates are always mandatory. Tests become critical when:

The skill performs financial calculations (wrong math = real cost)
The skill processes sensitive data (parsing errors = data loss)
Multiple people will maintain the skill (tests prevent regressions)
The skill is being published to the team registry (quality expectation is higher)

Anti-Patterns

Anti-Pattern 1: Partial Implementation

❌ NO:

def yoy_comparison(df, commodity, year1, year2):
    # Implement YoY comparison
    pass

def state_ranking(df, commodity):
    # TODO: implement ranking
    raise NotImplementedError()

✅ YES:

# [Complete and functional code for BOTH functions]

Anti-Pattern 2: Empty References

❌ NO:

# Analysis Methods

## YoY Comparison

This method compares two years.

## Ranking

This method ranks states.

✅ YES:

# Analysis Methods

## YoY Comparison

### Objective
Compare metrics between current and previous year...

### Detailed Methodology

**Formulas**:

ΔX = X(t) - X(t-1)
ΔX% = (ΔX / X(t-1)) × 100


**Decomposition** (for production):
[Complete mathematics]

**Interpretation**:
- |Δ| < 2%: Stable
- Δ > 10%: Significant increase
[...]

### Validations
[List]

### Complete Numerical Example
[With real values]

Anti-Pattern 3: Useless Configs

❌ NO:

{
  "api_url": "INSERT_URL",
  "api_key": "INSERT_KEY"
}

✅ YES:

{
  "_comment": "Configuration for NASS USDA Agent",
  "api": {
    "base_url": "https://quickstats.nass.usda.gov/api",
    "_note": "This is the official USDA NASS API base URL",
    "api_key_env": "NASS_API_KEY",
    "_key_instructions": "Get free API key from: https://quickstats.nass.usda.gov/api#registration"
  }
}

Final Validation

Before delivering to user, verify:

Sanity Test

# 1. Python syntax
find scripts -name "*.py" -exec python -m py_compile {} \;

# 2. JSON syntax
python -c "import json; json.load(open('assets/config.json'))"

# 3. Imports make sense
grep -r "^import\|^from" scripts/*.py | sort | uniq
# Verify all libs are: stdlib, requests, pandas, numpy
# No imports of uninstalled libs

# 4. SKILL.md has frontmatter
head -5 SKILL.md | grep "^---$"

# 5. SKILL.md size
wc -w SKILL.md
# Should be > 5000 words

Final Checklist

Syntax check passed (Python, JSON)
No import of non-existent lib
No TODO or pass
SKILL.md > 5000 words
References with content
README with complete instructions
DECISIONS.md created
requirements.txt created
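The frontmatter and word-count items can also be checked from Python rather than the shell. The following is a minimal sketch under the thresholds stated above; `check_skill_md` is a hypothetical name, and the frontmatter test only confirms that the `---` delimiters are present, not that the YAML inside is valid.

```python
from typing import List


def check_skill_md(text: str, min_words: int = 5000) -> List[str]:
    """Return problems found in SKILL.md content (an empty list means it passed)."""
    problems = []
    lines = [line.strip() for line in text.splitlines()]
    # Frontmatter must open on the first line and be closed by a second '---'
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        problems.append("missing or unterminated frontmatter")
    words = len(text.split())
    if words <= min_words:
        problems.append(f"only {words} words (expected more than {min_words})")
    return problems
```

Returning a list of problems instead of raising on the first one lets a single run report every failed item.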
