agent-skill-creator/references/quality-standards.md
francylisboacharuto cb66ad17cc feat: Add skill staleness detection (review tracking, dependency health, schema drift)
Skills go stale as APIs change and data sources move. This adds a three-layer
staleness detection system: review date tracking with git fallback, HTTP health
checks for declared dependencies, and top-level key comparison for schema drift.
All new frontmatter fields are optional — existing skills work unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 09:09:48 -03:00


Mandatory Quality Standards

Fundamental Principles

Production-Ready, Not Prototype

  • Code must work without modifications
  • Requires no follow-up such as "now implement X"
  • Can be used immediately

Functional, Not Placeholder

  • Complete code in all functions
  • No TODO, pass, NotImplementedError
  • Robust error handling

Useful, Not Generic

  • Specific and detailed content
  • Concrete examples, not abstract
  • Not just external links

Current, Not Stale

  • Include metadata.created and metadata.last_reviewed dates in frontmatter
  • Set metadata.review_interval_days (default: 90 days)
  • Declare external dependencies in metadata.dependencies so health can be checked
  • Declare expected API response shapes in metadata.schema_expectations for drift detection
  • Run python3 scripts/staleness_check.py path/to/skill/ periodically to detect stale skills
  • When publishing to a registry, use python3 scripts/skill_registry.py stale to audit all skills
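The review-date rule can be sketched as follows. This is a minimal illustration, not the logic of scripts/staleness_check.py: the function name `is_stale` and its parameters are assumptions, and only the 90-day default comes from the list above.

```python
from datetime import date, timedelta
from typing import Optional

DEFAULT_REVIEW_INTERVAL_DAYS = 90  # default stated in the standard above


def is_stale(
    last_reviewed: date,
    review_interval_days: int = DEFAULT_REVIEW_INTERVAL_DAYS,
    today: Optional[date] = None,
) -> bool:
    """Return True when more than review_interval_days have passed since last_reviewed."""
    if review_interval_days <= 0:
        raise ValueError("review_interval_days must be positive")
    today = today or date.today()
    return today - last_reviewed > timedelta(days=review_interval_days)
```

Passing `today` explicitly keeps the check deterministic in tests; a real checker would also need a fallback when `metadata.last_reviewed` is missing.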

Standards by File Type

Python Scripts

MANDATORY

1. Complete structure:

#!/usr/bin/env python3
"""Module docstring"""

# Imports
import ...

# Constants
CONST = value

# Classes/Functions
class/def ...

# Main
def main():
    ...

if __name__ == "__main__":
    main()
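A filled-in version of this skeleton might look like the sketch below. The script's purpose, the function `bushels_to_metric_tons`, and the constant `KG_PER_BUSHEL_CORN` are hypothetical and only illustrate the layout.

```python
#!/usr/bin/env python3
"""Convert a corn bushel count to metric tons.

Hypothetical example script that illustrates the mandatory layout:
docstring, imports, constants, functions, main(), and entry guard.
"""

# Imports
import argparse
import sys
from typing import List, Optional

# Constants
KG_PER_BUSHEL_CORN = 25.4  # one bushel of corn weighs 56 lb, about 25.4 kg


# Functions
def bushels_to_metric_tons(bushels: float) -> float:
    """Convert bushels of corn to metric tons."""
    if bushels < 0:
        raise ValueError("bushels cannot be negative")
    return bushels * KG_PER_BUSHEL_CORN / 1000.0


# Main
def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments, run the conversion, and return an exit code."""
    parser = argparse.ArgumentParser(description="Convert corn bushels to metric tons")
    parser.add_argument("bushels", type=float, nargs="?", default=1.0)
    args = parser.parse_args(argv)
    try:
        tons = bushels_to_metric_tons(args.bushels)
    except ValueError as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1
    print(f"{tons:.1f}")
    return 0


if __name__ == "__main__":
    main()
```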

2. Docstrings:

  • Module docstring: 3-5 lines
  • Class docstring: Description + Example
  • Method docstring: Args, Returns, Raises, Example

3. Type hints:

from typing import Any, Dict

def function(param1: str, param2: int = 10) -> Dict[str, Any]:
    ...

4. Error handling:

try:
    result = risky_operation()
except SpecificError as e:
    # Handle specifically, then re-raise with context
    log_error(e)
    # "from e" keeps the original exception attached as __cause__
    raise CustomError(f"Context: {e}") from e
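A runnable instance of this pattern, as a sketch: `ConfigError` and `load_config` are hypothetical names chosen for illustration, and JSON parsing stands in for the risky operation.

```python
import json
import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Raised when a configuration document cannot be used."""


def load_config(text: str) -> dict:
    """Parse JSON config text, wrapping parse failures in ConfigError."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as e:
        logger.error("Invalid JSON in config: %s", e)
        # Chain the original error so the traceback shows the root cause
        raise ConfigError(f"Could not parse config: {e}") from e
    if not isinstance(config, dict):
        raise ConfigError("Config must be a JSON object")
    return config
```

Catching only `json.JSONDecodeError` keeps unrelated bugs visible instead of hiding them behind a generic handler.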

5. Validations:

def process(data: Dict) -> pd.DataFrame:
    # Validate input
    if not data:
        raise ValueError("Data cannot be empty")

    if 'required_field' not in data:
        raise ValueError("Missing required field")

    # Process
    ...

    # Validate output
    assert len(result) > 0, "Result cannot be empty"
    assert result['value'].notna().all(), "No null values allowed"

    return result
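Python removes `assert` statements when run with `-O`, so output checks that must always run can be written as explicit exceptions instead. The sketch below shows that variant using only the stdlib; `summarize` and its return keys are hypothetical.

```python
import math
from typing import Dict, List


def summarize(data: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute the mean of data['required_field'] with input and output checks."""
    # Validate input
    if not data:
        raise ValueError("Data cannot be empty")
    if "required_field" not in data:
        raise ValueError("Missing required field")
    values = data["required_field"]
    if not values:
        raise ValueError("required_field has no values")

    # Process
    result = {"count": float(len(values)), "mean": sum(values) / len(values)}

    # Validate output with an explicit exception (still enforced under python -O)
    if math.isnan(result["mean"]):
        raise ValueError("Result contains NaN")
    return result
```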

6. Appropriate logging:

import logging

logger = logging.getLogger(__name__)

def fetch_data():
    logger.info("Fetching data from API...")
    # ...
    logger.debug(f"Received {len(data)} records")
    # ...
    logger.error(f"API error: {e}")
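The fragment above can be made runnable roughly as follows. This is a sketch: the `fetcher` callable is a hypothetical stand-in for the real API call, and `OSError` is assumed to be the failure type.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def fetch_data(fetcher: Callable[[], List[dict]]) -> List[dict]:
    """Call fetcher() and log progress at the appropriate levels."""
    logger.info("Fetching data from API...")
    try:
        data = fetcher()
    except OSError as e:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("API error: %s", e)
        raise
    # %-style arguments are only formatted if DEBUG logging is enabled
    logger.debug("Received %d records", len(data))
    return data
```

Passing values as arguments rather than pre-formatting with f-strings avoids the formatting cost when the level is disabled.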

FORBIDDEN

# ❌ DON'T DO THIS:

def analyze():
    # TODO: implement analysis
    pass

def process(data):  # ❌ No type hints
    # ❌ No docstring
    result = data  # ❌ No real logic
    return result  # ❌ No validation

def fetch_api(url):
    response = requests.get(url)  # ❌ No timeout
    return response.json()  # ❌ No error handling

DO THIS:

def analyze_yoy(df: pd.DataFrame, commodity: str, year1: int, year2: int) -> Dict:
    """
    Perform year-over-year analysis

    Args:
        df: DataFrame with parsed data
        commodity: Commodity name (e.g., "CORN")
        year1: Current year
        year2: Previous year

    Returns:
        Dict with keys:
            - commodity: str
            - production_current: float
            - production_previous: float
            - change_absolute: float
            - change_percent: float
            - interpretation: str

    Raises:
        ValueError: If data not found for specified years
        DataQualityError: If data fails validation

    Example:
        >>> analyze_yoy(df, "CORN", 2023, 2022)
        {'production_current': 15.3, 'change_percent': 11.7, ...}
    """
    # Validate inputs
    if commodity not in df['commodity'].unique():
        raise ValueError(f"Commodity {commodity} not found in data")

    # Filter data
    df1 = df[(df['commodity'] == commodity) & (df['year'] == year1)]
    df2 = df[(df['commodity'] == commodity) & (df['year'] == year2)]

    if len(df1) == 0 or len(df2) == 0:
        raise ValueError(f"Data not found for {commodity} in {year1} or {year2}")

    # Extract values
    prod1 = df1['production'].iloc[0]
    prod2 = df2['production'].iloc[0]

    # Calculate (guard against division by zero)
    if prod2 == 0:
        raise ValueError("previous value cannot be zero")
    change = prod1 - prod2
    change_pct = (change / prod2) * 100

    # Interpret
    if abs(change_pct) < 2:
        interpretation = "stable"
    elif change_pct > 10:
        interpretation = "significant_increase"
    elif change_pct >= 2:
        interpretation = "moderate_increase"
    elif change_pct < -10:
        interpretation = "significant_decrease"
    else:
        interpretation = "moderate_decrease"

    # Return
    return {
        "commodity": commodity,
        "production_current": round(prod1, 1),
        "production_previous": round(prod2, 1),
        "change_absolute": round(change, 1),
        "change_percent": round(change_pct, 1),
        "interpretation": interpretation
    }

SKILL.md

MANDATORY

1. Valid frontmatter:

---
name: agent-name
description: [150-250 characters with keywords]
---

2. Size: 5000-7000 words

3. Mandatory sections:

  • When to use (specific triggers)
  • Data source (detailed API)
  • Workflows (complete step-by-step)
  • Scripts (each one explained)
  • Analyses (methodologies)
  • Errors (complete handling)
  • Validations (mandatory)
  • Keywords (complete list)
  • Examples (5+ complete)

4. Detailed workflows:

GOOD:

### Workflow: YoY Comparison

1. **Identify question parameters**
   - Commodity: [extract from question]
   - Years: Current vs previous (or specified)

2. **Fetch data**
   ```bash
   python scripts/fetch_nass.py \
     --commodity CORN \
     --years 2023,2022 \
     --output data/raw/corn_2023_2022.json
   ```

3. **Parse**
   ```bash
   python scripts/parse_nass.py \
     --input data/raw/corn_2023_2022.json \
     --output data/processed/corn.csv
   ```

4. **Analyze**
   ```bash
   python scripts/analyze_nass.py \
     --input data/processed/corn.csv \
     --analysis yoy \
     --commodity CORN \
     --year1 2023 \
     --year2 2022 \
     --output data/analysis/corn_yoy.json
   ```

5. **Interpret results**

   File `data/analysis/corn_yoy.json` contains:
   ```json
   {
     "production_current": 15.3,
     "change_percent": 11.7,
     "interpretation": "significant_increase"
   }
   ```

   Respond to user: "Corn production grew 11.7% in 2023..."

BAD:

### Workflow: Comparison

1. Get data
2. Compare
3. Return result

5. Complete examples:

GOOD:

### Example 1: YoY Comparison

**Question**: "How's corn production compared to last year?"

**Executed flow**:
[Specific commands with outputs]

**Generated answer**:
"Corn production in 2023 is 15.3 billion bushels,
growth of 11.7% vs 2022 (13.7 billion). Growth
comes mainly from area increase (+8%) with stable yield."

BAD:

### Example: Comparison

User asks about comparison. Agent compares and responds.

FORBIDDEN

  • Empty sections
  • "See documentation"
  • Workflows without specific commands
  • Generic examples

References

MANDATORY

1. Useful and self-contained content:

GOOD (references/api-guide.md):

## Endpoint: Get Production Data

**URL**: `GET https://quickstats.nass.usda.gov/api/api_GET/`

**Parameters**:
- `commodity_desc`: Commodity name
  - Example: "CORN", "SOYBEANS"
  - Case-sensitive
- `year`: Desired year
  - Example: 2023
  - Range: 1866-present

**Complete request example**:
```bash
curl -H "X-Api-Key: YOUR_KEY" \
  "https://quickstats.nass.usda.gov/api/api_GET/?commodity_desc=CORN&year=2023&format=JSON"
```

**Expected response**:
```json
{
  "data": [
    {
      "year": 2023,
      "commodity_desc": "CORN",
      "value": "15,300,000,000",
      "unit_desc": "BU"
    }
  ]
}
```

**Important fields**:
- `value`: Comes as STRING with commas
  - Solution: `value.replace(',', '')`
  - Convert to float after

BAD:

## API Endpoint

For details on how to use the API, consult the official documentation at:
https://quickstats.nass.usda.gov/api

[End of file]

2. Adequate size:

  • API guide: 1500-2000 words
  • Analysis methods: 2000-3000 words
  • Troubleshooting: 1000-1500 words

3. Concrete examples:

  • Always include examples with real values
  • Executable code blocks
  • Expected outputs
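As an instance of an executable block with an expected output, the comma-formatted `value` field from the API guide example above could be documented like this. The helper name `parse_value` is hypothetical.

```python
def parse_value(raw: str) -> float:
    """Convert a comma-formatted numeric string such as '15,300,000,000' to float."""
    cleaned = raw.replace(",", "").strip()
    if not cleaned:
        raise ValueError("value is empty")
    return float(cleaned)  # raises ValueError for non-numeric text


print(parse_value("15,300,000,000"))  # 15300000000.0
```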

FORBIDDEN

  • "For more information, see [link]"
  • Sections with only 2-3 lines
  • Lists without details
  • Circular references ("see other doc that sees other doc")

Assets (Configs)

MANDATORY

1. Syntactically valid JSON:

# ALWAYS validate:
python -c "import json; json.load(open('config.json'))"

2. Real values:

GOOD:

{
  "api": {
    "base_url": "https://quickstats.nass.usda.gov/api",
    "api_key_env": "NASS_API_KEY",
    "_instructions": "Get free API key from: https://quickstats.nass.usda.gov/api#registration",
    "rate_limit_per_day": 1000,
    "timeout_seconds": 30
  }
}

BAD:

{
  "api": {
    "base_url": "YOUR_API_URL_HERE",
    "api_key": "YOUR_KEY_HERE"
  }
}

3. Inline comments (using _comment or _note):

{
  "_comment": "Differentiated TTL by data type",
  "cache": {
    "ttl_historical_days": 365,
    "_note_historical": "Historical data doesn't change",
    "ttl_current_days": 7,
    "_note_current": "Current year data may be revised"
  }
}
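Scripts that load such configs need to ignore the `_comment`/`_note` keys, and the "real values" rule can be checked mechanically. A minimal sketch, assuming comment keys always start with an underscore and that placeholders look like the BAD examples in this document (`YOUR_...`, `INSERT_...`); both function names are hypothetical.

```python
import json
from typing import Any, List

PLACEHOLDER_MARKERS = ("YOUR_", "INSERT_")  # prefixes taken from the BAD examples


def strip_comments(node: Any) -> Any:
    """Recursively drop keys starting with '_' (e.g. _comment, _note_current)."""
    if isinstance(node, dict):
        return {k: strip_comments(v) for k, v in node.items() if not k.startswith("_")}
    if isinstance(node, list):
        return [strip_comments(v) for v in node]
    return node


def find_placeholders(node: Any, path: str = "") -> List[str]:
    """Return dotted paths of string values that look like unfilled placeholders."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += find_placeholders(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            found += find_placeholders(value, f"{path}[{i}]")
    elif isinstance(node, str) and node.startswith(PLACEHOLDER_MARKERS):
        found.append(path)
    return found


raw = json.loads('{"_comment": "TTL settings", "cache": {"ttl_current_days": 7}}')
config = strip_comments(raw)  # {'cache': {'ttl_current_days': 7}}
```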

README.md

MANDATORY

1. Complete installation instructions:

GOOD:

## Installation

### 1. Get API Key (Free)

1. Access https://quickstats.nass.usda.gov/api#registration
2. Fill form:
   - Name: [your name]
   - Email: [your email]
   - Purpose: "Personal research"
3. Click "Submit"
4. You'll receive email with API key in ~1 minute
5. Key format: `A1B2C3D4-E5F6-G7H8-I9J0-K1L2M3N4O5P6`

### 2. Configure Environment

**Option A - Export** (temporary):
```bash
export NASS_API_KEY="your_key_here"
```

**Option B - .bashrc/.zshrc** (permanent):
```bash
echo 'export NASS_API_KEY="your_key_here"' >> ~/.bashrc
source ~/.bashrc
```

**Option C - .env file** (per project):
```bash
echo "NASS_API_KEY=your_key_here" > .env
```

### 3. Install Dependencies

```bash
cd nass-usda-agriculture
pip install -r requirements.txt
```

Requirements:
- requests
- pandas
- numpy

BAD:

## Installation

1. Get API key from the official website
2. Configure environment
3. Install dependencies
4. Done!

2. Concrete usage examples:

GOOD:

## Examples

### Example 1: Current Production

You: "What's US corn production in 2023?"

Claude: "Corn production in 2023 was 15.3 billion bushels (389 million metric tons)..."


### Example 2: YoY Comparison

You: "Compare soybeans this year vs last year"

Claude: "Soybean production in 2023 is 2.6% below 2022:

  • 2023: 4.165 billion bushels
  • 2022: 4.276 billion bushels
  • Drop from area (-4.5%), yield improved (+0.8%)"

[3-5 more examples]

BAD:

## Usage

Ask questions about agriculture and the agent will respond.

3. Specific troubleshooting:

GOOD:

### Error: "NASS_API_KEY environment variable not found"

**Cause**: API key not configured

**Step-by-step solution**:
1. Verify key was obtained: https://...
2. Configure environment:
   ```bash
   export NASS_API_KEY="your_key_here"
   ```
3. Verify:
   ```bash
   echo $NASS_API_KEY
   ```
4. The command should print your key
5. If it still doesn't work, restart the terminal

**Still not working?**
- Check for extra spaces in key
- Verify key hasn't expired (validity: 1 year)
- Re-generate key if needed

Quality Checklist

Per Python Script

  • Shebang: #!/usr/bin/env python3
  • Module docstring (3-5 lines)
  • Organized imports (stdlib, 3rd party, local)
  • Constants at top (if applicable)
  • Type hints in all public functions
  • Docstrings in classes (description + attributes + example)
  • Docstrings in methods (Args, Returns, Raises, Example)
  • Error handling for risky operations
  • Input validations
  • Output validations
  • Appropriate logging
  • Main function with argparse
  • if __name__ == "__main__"
  • Functional code (no TODO/pass)
  • Valid syntax (test: python -m py_compile script.py)

Per SKILL.md

  • Frontmatter with name and description
  • Description 150-250 characters with keywords
  • Size 5000+ words
  • "When to Use" section with specific triggers
  • "Data Source" section detailed
  • Step-by-step workflows with commands
  • Scripts explained individually
  • Analyses documented (objective, methodology)
  • Errors handled (all expected)
  • Validations listed
  • Performance/cache explained
  • Complete keywords
  • Complete examples (5+)

Per Reference File

  • 1000+ words
  • Useful content (not just links)
  • Concrete examples with real values
  • Executable code blocks
  • Well structured (headings, lists)
  • No empty sections
  • No "TODO: write"

Per Asset (Config)

  • Syntactically valid JSON (validate!)
  • Real values (not "YOUR_X_HERE" without context)
  • Inline comments (_comment, _note)
  • Instructions for values user must fill
  • Logical and organized structure

Per README.md

  • Step-by-step installation
  • How to get API key (detailed)
  • How to configure (3 options)
  • How to install dependencies
  • How to install in Claude Code
  • Usage examples (5+)
  • Troubleshooting (10+ problems)
  • License
  • Contact/contribution (if applicable)

Complete Agent

  • DECISIONS.md documents all choices
  • VERSION file created (e.g. 1.0.0)
  • CHANGELOG.md created with complete v1.0.0 entry
  • INSTALACAO.md with complete didactic tutorial
  • comprehensive_{domain}_report() implemented
  • SKILL.md with version in frontmatter metadata
  • 18+ files created
  • ~1500+ lines of Python code
  • ~10,000+ words of documentation
  • 2+ configs
  • requirements.txt
  • .gitignore (if needed)
  • No placeholder/TODO
  • Valid syntax (Python, JSON, YAML)
  • Ready to use (production-ready)

Quality Examples

Example: Error Handling

BAD:

def fetch(url):
    return requests.get(url).json()

GOOD:

def fetch(url: str, timeout: int = 30) -> Dict:
    """
    Fetch data from URL with error handling

    Args:
        url: URL to fetch
        timeout: Timeout in seconds

    Returns:
        JSON response as dict

    Raises:
        NetworkError: If connection fails
        TimeoutError: If request times out
        RateLimitError: If API returns HTTP 429
        APIError: If API returns error
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()

        data = response.json()

        if 'error' in data:
            raise APIError(f"API error: {data['error']}")

        return data

    except requests.Timeout as e:
        raise TimeoutError(f"Request timed out after {timeout}s") from e

    except requests.ConnectionError as e:
        raise NetworkError(f"Connection failed: {e}") from e

    except requests.HTTPError as e:
        if e.response.status_code == 429:
            raise RateLimitError("Rate limit exceeded") from e
        raise APIError(f"HTTP {e.response.status_code}: {e}") from e
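The GOOD example raises `NetworkError`, `APIError`, and `RateLimitError` without defining them. One possible definition is sketched below; the base class `SkillError` and the choice to make `RateLimitError` a subclass of `APIError` are assumptions, not something this document prescribes.

```python
class SkillError(Exception):
    """Base class for errors raised by a skill's scripts (hypothetical name)."""


class NetworkError(SkillError):
    """The connection to the API failed."""


class APIError(SkillError):
    """The API returned an error payload or an unexpected HTTP status."""


class RateLimitError(APIError):
    """The API rejected the request because the rate limit was exceeded (HTTP 429)."""
```

With a shared base class, callers can catch `SkillError` to handle any skill failure, or a specific subclass when they can recover (for example, waiting and retrying after `RateLimitError`).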

Example: Validations

BAD:

def parse(data):
    df = pd.DataFrame(data)
    return df

GOOD:

def parse(data: List[Dict]) -> pd.DataFrame:
    """Parse and validate data"""

    # Validate input
    if not data:
        raise ValueError("Data cannot be empty")

    if not isinstance(data, list):
        raise TypeError(f"Expected list, got {type(data)}")

    # Parse
    df = pd.DataFrame(data)

    # Validate schema
    required_cols = ['year', 'commodity', 'value']
    missing = set(required_cols) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Validate types
    df['year'] = pd.to_numeric(df['year'], errors='raise')
    df['value'] = pd.to_numeric(df['value'], errors='raise')

    # Validate ranges
    current_year = datetime.now().year
    if (df['year'] > current_year).any():
        raise ValueError(f"Future years found (max allowed: {current_year})")

    if (df['value'] < 0).any():
        raise ValueError("Negative values found")

    # Validate no duplicates
    if df.duplicated(subset=['year', 'commodity']).any():
        raise ValueError("Duplicate records found")

    return df

Example: Docstrings

BAD:

def analyze(df, commodity):
    """Analyze data"""
    # ...

GOOD:

def analyze_yoy(
    df: pd.DataFrame,
    commodity: str,
    year1: int,
    year2: int
) -> Dict[str, Any]:
    """
    Perform year-over-year comparison analysis

    Compares production, area, and yield between two years
    and decomposes growth into area vs yield contributions.

    Args:
        df: DataFrame with columns ['year', 'commodity', 'production', 'area', 'yield']
        commodity: Commodity name (e.g., "CORN", "SOYBEANS")
        year1: Current year to compare
        year2: Previous year to compare against

    Returns:
        Dict containing:
            - production_current (float): Production in year1 (million units)
            - production_previous (float): Production in year2
            - change_absolute (float): Absolute change
            - change_percent (float): Percent change
            - decomposition (dict): Area vs yield contribution
            - interpretation (str): "stable", "moderate_increase", "significant_increase",
              "moderate_decrease", or "significant_decrease"

    Raises:
        ValueError: If commodity not found in data
        ValueError: If either year not found in data
        DataQualityError: If production differs from area * yield by more than 1%

    Example:
        >>> df = pd.DataFrame([
        ...     {'year': 2023, 'commodity': 'CORN', 'production': 15.3, 'area': 94.6, 'yield': 177},
        ...     {'year': 2022, 'commodity': 'CORN', 'production': 13.7, 'area': 89.2, 'yield': 173}
        ... ])
        >>> result = analyze_yoy(df, "CORN", 2023, 2022)
        >>> result['change_percent']
        11.7
    """
    # [Complete implementation]
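The docstring mentions decomposing growth into area and yield contributions but the implementation is elided. One common way to do this, shown here only as a sketch, uses log growth rates: if production = area × yield, then ln(P1/P0) = ln(A1/A0) + ln(Y1/Y0). The function name `decompose_growth` and its return keys are hypothetical, and the original skill may use a different method.

```python
import math
from typing import Dict


def decompose_growth(
    area_current: float,
    area_previous: float,
    yield_current: float,
    yield_previous: float,
) -> Dict[str, float]:
    """Split log production growth into area and yield shares.

    Assumes production = area * yield, so the log growth of production
    is the sum of the log growth of area and the log growth of yield.
    """
    values = (area_current, area_previous, yield_current, yield_previous)
    if any(v <= 0 for v in values):
        raise ValueError("area and yield must be positive")
    area_growth = math.log(area_current / area_previous)
    yield_growth = math.log(yield_current / yield_previous)
    total = area_growth + yield_growth
    if total == 0:
        raise ValueError("no net growth to decompose")
    return {
        "area_share": area_growth / total,
        "yield_share": yield_growth / total,
        "total_log_growth": total,
    }
```

The two shares always sum to 1, which makes the result easy to validate.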

Dependency Management

Decision Framework

Skills should minimize external dependencies. Every dependency is a maintenance burden, a security surface, and a compatibility risk. Use this decision tree:

Can stdlib do it?
  → Yes: Use stdlib. Done.
  → No: Is there a lightweight pure-Python package (<1MB)?
      → Yes: Use it. Add to requirements.txt.
      → No: Is there a well-maintained popular package?
          → Yes: Use it only if the domain requires it.
          → No: Implement it yourself or redesign the approach.
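For the first branch of the tree, a simple JSON GET needs only the stdlib. The sketch below assumes a plain unauthenticated endpoint; `build_url` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_url(base_url: str, params: Dict[str, Any]) -> str:
    """Append URL-encoded query parameters to base_url."""
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """GET a URL with the stdlib and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    # urlopen raises urllib.error.HTTPError / URLError on failure
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Once retries, sessions, or complex authentication are needed, the third-party branch of the tree becomes the better choice.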

Stdlib vs. Third-Party Decision Table

| Task | Stdlib Solution | When to Use Third-Party |
|---|---|---|
| HTTP requests | urllib.request | Use requests when: complex auth, session management, multipart uploads, or retry logic would require 100+ lines of urllib code |
| JSON handling | json | Never; stdlib is sufficient |
| CSV parsing | csv | Use pandas only when: statistical analysis, complex transformations, or DataFrame operations are core to the skill |
| File paths | pathlib | Never; stdlib is sufficient |
| Date/time | datetime | Never; stdlib is sufficient |
| Regex | re | Never; stdlib is sufficient |
| Hashing | hashlib | Never; stdlib is sufficient |
| Caching | File-based (json + pathlib) | Never for skills; the FileCache pattern in architecture-guide.md is sufficient |
| Data analysis | Manual calculations | Use pandas/numpy when: skill is primarily analytical (10+ statistical operations, pivots, aggregations) |
| PDF generation | Not available | Use reportlab or fpdf2 when PDF output is a core requirement |
| Web scraping | urllib + html.parser | Use beautifulsoup4 when parsing complex/malformed HTML |
| CLI arguments | argparse | Never; stdlib is sufficient |
| YAML parsing | Manual (the _parse_frontmatter pattern) | Use pyyaml only if skill needs to parse arbitrary YAML files (not just SKILL.md frontmatter) |

requirements.txt Rules

When third-party packages are needed:

# requirements.txt

# Set a tested minimum (major.minor) and cap below the next major release
requests>=2.31,<3.0  # HTTP client: sessions and retry logic
pandas>=2.0,<3.0     # DataFrame operations for analysis

# For stdlib-only skills, create a requirements.txt that contains only a comment:
# No external dependencies required; this skill uses Python stdlib only.

Rules:

  • Always create requirements.txt even if empty (document the stdlib-only decision)
  • Set a minimum major.minor version and cap below the next major release to avoid breaking changes
  • Never pin exact patch versions (so security updates can still be installed)
  • Never include dev dependencies (pytest, ruff) — those are for contributors, not users
  • List only direct dependencies, not transitive ones
  • Include a comment explaining why each package is needed
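Some of these rules can be checked automatically. The sketch below is a rough linter, not a full requirement-specifier parser: `check_requirements` is a hypothetical name, and it only looks for `==`, `>=`, `~=`, and `<` in each line.

```python
import re
from typing import List


def check_requirements(text: str) -> List[str]:
    """Return warnings for requirement lines that break the version rules."""
    warnings: List[str] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        spec = line.split("#", 1)[0].strip()  # ignore comments
        if not spec:
            continue
        # Package name is everything before the first specifier character
        name = re.split(r"[<>=!~\[; ]", spec, maxsplit=1)[0]
        if "==" in spec:
            warnings.append(f"line {line_number}: {name} is pinned to an exact version")
        elif ">=" not in spec and "~=" not in spec:
            warnings.append(f"line {line_number}: {name} has no minimum version")
        elif ">=" in spec and "<" not in spec.replace(">=", ""):
            warnings.append(f"line {line_number}: {name} has no upper bound")
    return warnings
```

A stdlib-only requirements.txt that contains just a comment produces no warnings.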

Common Dependency Patterns by Skill Type

| Skill Type | Typical Dependencies |
|---|---|
| Data analysis (stocks, agriculture, climate) | requests, pandas, numpy |
| Report generation | requests, fpdf2 or reportlab |
| Web scraping | requests, beautifulsoup4 |
| API wrapper | requests (or stdlib urllib) |
| Text processing | Stdlib only (re, json, csv) |
| File format conversion | Stdlib only (or single specialized package) |
| Database interaction | Stdlib sqlite3 (or psycopg2/pymysql for specific DBs) |

Testing Strategy

Why Test Generated Skills

Skills are opinionated software that teams rely on daily. A skill that produces wrong calculations, misparses API responses, or silently drops data is worse than no skill at all. Tests catch these issues before the skill reaches users.

What to Test

Focus tests on the parts most likely to break or produce wrong results:

| Priority | What to Test | Why |
|---|---|---|
| High | Analysis/calculation functions | Wrong math = wrong decisions |
| High | Data parsing (API response → structured data) | APIs change formats, edge cases in real data |
| High | Input validation | Bad input should fail clearly, not silently produce garbage |
| Medium | Output formatting | Reports and summaries should be consistent |
| Medium | Error handling paths | Verify graceful degradation on API failures, missing data |
| Low | Cache logic | Only if custom caching is complex |
| Low | Config loading | Usually trivial |

Test Directory Structure

skill-name/
├── scripts/
│   ├── analyze.py
│   ├── fetch.py
│   └── parse.py
├── tests/
│   ├── test_analyze.py       # Unit tests for analysis functions
│   ├── test_parse.py         # Unit tests for parsing logic
│   ├── fixtures/
│   │   ├── sample_api_response.json    # Real API response (anonymized)
│   │   └── sample_parsed_data.csv      # Expected parsed output
│   └── conftest.py           # Shared pytest fixtures

Test Patterns

Pattern 1: Test analysis functions with known inputs/outputs

"""Tests for analyze.py — core calculation functions."""
import pytest
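# Note: these tests assume a scalar variant of analyze_yoy that takes
# current_value and previous_value instead of a DataFrame.
from scripts.analyze import analyze_yoy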

def test_yoy_increase():
    """YoY comparison should detect an increase."""
    result = analyze_yoy(
        current_value=150.0,
        previous_value=100.0,
    )
    assert result["change_percent"] == pytest.approx(50.0)
    assert result["interpretation"] == "significant_increase"

def test_yoy_stable():
    """Changes under 2% should be interpreted as stable."""
    result = analyze_yoy(current_value=101.0, previous_value=100.0)
    assert result["interpretation"] == "stable"

def test_yoy_zero_previous():
    """Division by zero should raise ValueError, not crash."""
    with pytest.raises(ValueError, match="previous value cannot be zero"):
        analyze_yoy(current_value=100.0, previous_value=0.0)

Pattern 2: Test parsing with fixture data

"""Tests for parse.py — API response parsing."""
import json
from pathlib import Path
from scripts.parse import parse_api_response

FIXTURES = Path(__file__).parent / "fixtures"

def test_parse_normal_response():
    """Standard API response should parse to expected structure."""
    raw = json.loads((FIXTURES / "sample_api_response.json").read_text())
    result = parse_api_response(raw)
    assert len(result) > 0
    assert "year" in result[0]
    assert "value" in result[0]

def test_parse_empty_response():
    """Empty API response should return empty list, not crash."""
    result = parse_api_response({"data": []})
    assert result == []

def test_parse_malformed_values():
    """Values with commas (e.g., '15,300,000') should be cleaned."""
    raw = {"data": [{"value": "15,300,000", "year": "2023"}]}
    result = parse_api_response(raw)
    assert result[0]["value"] == 15300000.0

Pattern 3: Mock external API calls

"""Tests for fetch.py — API interaction (mocked)."""
from unittest.mock import patch, MagicMock

import pytest
from scripts.fetch import fetch_data

@patch("scripts.fetch.urllib.request.urlopen")
def test_fetch_success(mock_urlopen):
    """Successful API call should return parsed JSON."""
    mock_response = MagicMock()
    mock_response.read.return_value = b'{"data": [{"value": "100"}]}'
    mock_response.__enter__ = lambda s: s
    mock_response.__exit__ = MagicMock(return_value=False)
    mock_urlopen.return_value = mock_response

    result = fetch_data(commodity="CORN", year=2023)
    assert "data" in result

@patch("scripts.fetch.urllib.request.urlopen")
def test_fetch_rate_limited(mock_urlopen):
    """429 response should raise RateLimitError."""
    from urllib.error import HTTPError
    mock_urlopen.side_effect = HTTPError(
        url="", code=429, msg="Too Many Requests", hdrs={}, fp=None
    )
    with pytest.raises(Exception, match="[Rr]ate limit"):
        fetch_data(commodity="CORN", year=2023)

Running Tests

# Run all tests
cd skill-name/
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=scripts --cov-report=term-missing

When to Generate Tests

Tests are generated during Phase 5 (Implementation) after the scripts are written:

  1. Write all scripts first (steps 1-5 of Phase 5)
  2. Create tests/ directory with test files for core functions
  3. Create tests/fixtures/ with sample data
  4. Run tests to verify they pass
  5. Include test instructions in README.md

Note: Tests are recommended but not mandatory for v1.0 of a skill. The validation and security scan gates are always mandatory. Tests become critical when:

  • The skill performs financial calculations (wrong math = real cost)
  • The skill processes sensitive data (parsing errors = data loss)
  • Multiple people will maintain the skill (tests prevent regressions)
  • The skill is being published to the team registry (quality expectation is higher)

Anti-Patterns

Anti-Pattern 1: Partial Implementation

NO:

def yoy_comparison(df, commodity, year1, year2):
    # Implement YoY comparison
    pass

def state_ranking(df, commodity):
    # TODO: implement ranking
    raise NotImplementedError()

YES:

# [Complete and functional code for BOTH functions]

Anti-Pattern 2: Empty References

NO:

# Analysis Methods

## YoY Comparison

This method compares two years.

## Ranking

This method ranks states.

YES:

# Analysis Methods

## YoY Comparison

### Objective
Compare metrics between current and previous year...

### Detailed Methodology

**Formulas**:

ΔX = X(t) - X(t-1)
ΔX% = (ΔX / X(t-1)) × 100


**Decomposition** (for production):
[Complete mathematics]

**Interpretation**:
- |Δ| < 2%: Stable
- Δ > 10%: Significant increase
[...]

### Validations
[List]

### Complete Numerical Example
[With real values]

Anti-Pattern 3: Useless Configs

NO:

{
  "api_url": "INSERT_URL",
  "api_key": "INSERT_KEY"
}

YES:

{
  "_comment": "Configuration for NASS USDA Agent",
  "api": {
    "base_url": "https://quickstats.nass.usda.gov/api",
    "_note": "This is the official USDA NASS API base URL",
    "api_key_env": "NASS_API_KEY",
    "_key_instructions": "Get free API key from: https://quickstats.nass.usda.gov/api#registration"
  }
}

Final Validation

Before delivering to user, verify:

Sanity Test

# 1. Python syntax
find scripts -name "*.py" -exec python -m py_compile {} \;

# 2. JSON syntax
python -c "import json; json.load(open('assets/config.json'))"

# 3. Imports make sense
grep -r "^import\|^from" scripts/*.py | sort | uniq
# Verify all libs are: stdlib, requests, pandas, numpy
# No imports of uninstalled libs

# 4. SKILL.md has frontmatter
head -5 SKILL.md | grep "^---$"

# 5. SKILL.md size
wc -w SKILL.md
# Should be > 5000 words
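Checks 4 and 5 can also be run from Python, which makes them easy to reuse in other scripts. This is a minimal sketch that assumes flat `key: value` frontmatter; `split_frontmatter` and `check_skill_md` are hypothetical names, and nested keys such as metadata fields are skipped rather than parsed.

```python
from typing import Dict, List, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split SKILL.md text into top-level frontmatter fields and the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("Frontmatter is not closed with '---'") from None
    fields: Dict[str, str] = {}
    for line in lines[1:end]:
        # Only top-level "key: value" lines; indented (nested) lines are skipped
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:])


def check_skill_md(text: str, min_words: int = 5000) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    fields, _body = split_frontmatter(text)
    problems = [
        f"missing frontmatter field: {key}"
        for key in ("name", "description")
        if key not in fields
    ]
    word_count = len(text.split())  # whole file, like `wc -w SKILL.md`
    if word_count <= min_words:
        problems.append(f"file has {word_count} words, expected more than {min_words}")
    return problems
```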

Final Checklist

  • Syntax check passed (Python, JSON)
  • No import of non-existent lib
  • No TODO or pass
  • SKILL.md > 5000 words
  • References with content
  • README with complete instructions
  • DECISIONS.md created
  • requirements.txt created