Mandatory Quality Standards
Fundamental Principles
Production-Ready, Not Prototype
- Code must work without modifications
- Requires no follow-up request such as "now implement X"
- Can be used immediately
Functional, Not Placeholder
- Complete code in all functions
- No TODO, pass, NotImplementedError
- Robust error handling
Useful, Not Generic
- Specific and detailed content
- Concrete examples, not abstract
- Not just external links
Current, Not Stale
- Include `metadata.created` and `metadata.last_reviewed` dates in frontmatter
- Set `metadata.review_interval_days` (default: 90 days)
- Declare external dependencies in `metadata.dependencies` so health can be checked
- Declare expected API response shapes in `metadata.schema_expectations` for drift detection
- Run `python3 scripts/staleness_check.py path/to/skill/` periodically to detect stale skills
- When publishing to a registry, use `python3 scripts/skill_registry.py stale` to audit all skills
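To make the review-interval and drift checks above concrete, here is a minimal sketch. It is not the actual logic of `scripts/staleness_check.py`: the function names `is_stale` and `missing_top_level_keys`, the ISO `YYYY-MM-DD` date format, and the idea that schema expectations reduce to a list of top-level keys are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Optional


def is_stale(last_reviewed: str, review_interval_days: int = 90,
             today: Optional[date] = None) -> bool:
    """Return True when more than review_interval_days have passed since review."""
    reviewed = date.fromisoformat(last_reviewed)  # assumes YYYY-MM-DD
    today = today or date.today()
    return today - reviewed > timedelta(days=review_interval_days)


def missing_top_level_keys(response: Dict[str, Any],
                           expected_keys: List[str]) -> List[str]:
    """Return expected top-level keys that are absent from an API response."""
    return [key for key in expected_keys if key not in response]
```

In this sketch a skill reviewed exactly `review_interval_days` ago is still treated as current; a stricter policy would use `>=`.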
Standards by File Type
Python Scripts
✅ MANDATORY
1. Complete structure:
#!/usr/bin/env python3
"""Module docstring"""
# Imports
import ...
# Constants
CONST = value
# Classes/Functions
class/def ...
# Main
def main():
    ...
if __name__ == "__main__":
    main()
2. Docstrings:
- Module docstring: 3-5 lines
- Class docstring: Description + Example
- Method docstring: Args, Returns, Raises, Example
3. Type hints:
from typing import Any, Dict

def function(param1: str, param2: int = 10) -> Dict[str, Any]:
    ...
4. Error handling:
try:
    result = risky_operation()
except SpecificError as e:
    # Handle specifically
    log_error(e)
    # Chain the original exception so the traceback keeps its cause
    raise CustomError(f"Context: {e}") from e
5. Validations:
def process(data: Dict) -> pd.DataFrame:
    # Validate input
    if not data:
        raise ValueError("Data cannot be empty")
    if 'required_field' not in data:
        raise ValueError("Missing required field")
    # Process
    ...
    # Validate output
    assert len(result) > 0, "Result cannot be empty"
    assert result['value'].notna().all(), "No null values allowed"
    return result
6. Appropriate logging:
import logging
logger = logging.getLogger(__name__)
def fetch_data():
    logger.info("Fetching data from API...")
    # ...
    logger.debug(f"Received {len(data)} records")
    # ...
    logger.error(f"API error: {e}")
❌ FORBIDDEN
# ❌ DON'T DO THIS:
def analyze():
    # TODO: implement analysis
    pass

def process(data):  # ❌ No type hints
    # ❌ No docstring
    result = data  # ❌ No real logic
    return result  # ❌ No validation

def fetch_api(url):
    response = requests.get(url)  # ❌ No timeout
    return response.json()  # ❌ No error handling
✅ DO THIS:
def analyze_yoy(df: pd.DataFrame, commodity: str, year1: int, year2: int) -> Dict:
    """
    Perform year-over-year analysis

    Args:
        df: DataFrame with parsed data
        commodity: Commodity name (e.g., "CORN")
        year1: Current year
        year2: Previous year

    Returns:
        Dict with keys:
            - commodity: str
            - production_current: float
            - production_previous: float
            - change_absolute: float
            - change_percent: float
            - interpretation: str

    Raises:
        ValueError: If data not found for specified years
        ValueError: If previous-year production is zero
        DataQualityError: If data fails validation

    Example:
        >>> analyze_yoy(df, "CORN", 2023, 2022)
        {'production_current': 15.3, 'change_percent': 11.7, ...}
    """
    # Validate inputs
    if commodity not in df['commodity'].unique():
        raise ValueError(f"Commodity {commodity} not found in data")
    # Filter data
    df1 = df[(df['commodity'] == commodity) & (df['year'] == year1)]
    df2 = df[(df['commodity'] == commodity) & (df['year'] == year2)]
    if len(df1) == 0 or len(df2) == 0:
        raise ValueError(f"Data not found for {commodity} in {year1} or {year2}")
    # Extract values
    prod1 = df1['production'].iloc[0]
    prod2 = df2['production'].iloc[0]
    # Guard against division by zero in the percent change
    if prod2 == 0:
        raise ValueError(f"Production for {year2} is zero; percent change is undefined")
    # Calculate
    change = prod1 - prod2
    change_pct = (change / prod2) * 100
    # Interpret (>= 2 so that exactly +2% is not classified as a decrease)
    if abs(change_pct) < 2:
        interpretation = "stable"
    elif change_pct > 10:
        interpretation = "significant_increase"
    elif change_pct >= 2:
        interpretation = "moderate_increase"
    elif change_pct < -10:
        interpretation = "significant_decrease"
    else:
        interpretation = "moderate_decrease"
    # Return
    return {
        "commodity": commodity,
        "production_current": round(prod1, 1),
        "production_previous": round(prod2, 1),
        "change_absolute": round(change, 1),
        "change_percent": round(change_pct, 1),
        "interpretation": interpretation
    }
SKILL.md
✅ MANDATORY
1. Valid frontmatter:
---
name: agent-name
description: [150-250 words with keywords]
---
2. Size: 5000-7000 words
3. Mandatory sections:
- When to use (specific triggers)
- Data source (detailed API)
- Workflows (complete step-by-step)
- Scripts (each one explained)
- Analyses (methodologies)
- Errors (complete handling)
- Validations (mandatory)
- Keywords (complete list)
- Examples (5+ complete)
4. Detailed workflows:
✅ GOOD:
### Workflow: YoY Comparison
1. **Identify question parameters**
- Commodity: [extract from question]
- Years: Current vs previous (or specified)
2. **Fetch data**
   ```bash
   python scripts/fetch_nass.py \
     --commodity CORN \
     --years 2023,2022 \
     --output data/raw/corn_2023_2022.json
   ```
3. **Parse**
   ```bash
   python scripts/parse_nass.py \
     --input data/raw/corn_2023_2022.json \
     --output data/processed/corn.csv
   ```
4. **Analyze**
   ```bash
   python scripts/analyze_nass.py \
     --input data/processed/corn.csv \
     --analysis yoy \
     --commodity CORN \
     --year1 2023 \
     --year2 2022 \
     --output data/analysis/corn_yoy.json
   ```
5. **Interpret results**
   File `data/analysis/corn_yoy.json` contains:
   ```json
   { "production_current": 15.3, "change_percent": 11.7, "interpretation": "significant_increase" }
   ```
   Respond to user: "Corn production grew 11.7% in 2023..."
❌ **BAD**:
```markdown
### Workflow: Comparison
1. Get data
2. Compare
3. Return result
```
5. Complete examples:
✅ GOOD:
### Example 1: YoY Comparison
**Question**: "How's corn production compared to last year?"
**Executed flow**:
[Specific commands with outputs]
**Generated answer**:
"Corn production in 2023 is 15.3 billion bushels,
growth of 11.7% vs 2022 (13.7 billion). Growth
comes mainly from area increase (+8%) with stable yield."
❌ BAD:
### Example: Comparison
User asks about comparison. Agent compares and responds.
❌ FORBIDDEN
- Empty sections
- "See documentation"
- Workflows without specific commands
- Generic examples
References
✅ MANDATORY
1. Useful and self-contained content:
✅ GOOD (references/api-guide.md):
## Endpoint: Get Production Data
**URL**: `GET https://quickstats.nass.usda.gov/api/api_GET/`
**Parameters**:
- `commodity_desc`: Commodity name
- Example: "CORN", "SOYBEANS"
- Case-sensitive
- `year`: Desired year
- Example: 2023
- Range: 1866-present
**Complete request example**:
```bash
curl -H "X-Api-Key: YOUR_KEY" \
  "https://quickstats.nass.usda.gov/api/api_GET/?commodity_desc=CORN&year=2023&format=JSON"
```
**Expected response**:
```json
{
  "data": [
    {
      "year": 2023,
      "commodity_desc": "CORN",
      "value": "15,300,000,000",
      "unit_desc": "BU"
    }
  ]
}
```
**Important fields**:
- `value`: Comes as STRING with commas
  - Solution: `value.replace(',', '')`
  - Convert to float after
❌ **BAD**:
```markdown
## API Endpoint
For details on how to use the API, consult the official documentation at:
https://quickstats.nass.usda.gov/api
[End of file]
```
2. Adequate size:
- API guide: 1500-2000 words
- Analysis methods: 2000-3000 words
- Troubleshooting: 1000-1500 words
3. Concrete examples:
- Always include examples with real values
- Executable code blocks
- Expected outputs
❌ FORBIDDEN
- "For more information, see [link]"
- Sections with only 2-3 lines
- Lists without details
- Circular references ("see other doc that sees other doc")
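The `value` handling described in the GOOD API guide example (numbers delivered as strings with thousands separators) can be sketched as a small helper. The name `parse_value` is hypothetical, and how a real skill should treat non-numeric markers is an open assumption; here they simply raise.

```python
def parse_value(raw: str) -> float:
    """Convert a comma-formatted numeric string such as '15,300,000,000' to float."""
    cleaned = raw.replace(",", "").strip()
    if not cleaned:
        raise ValueError("Value cannot be empty")
    try:
        return float(cleaned)
    except ValueError as exc:
        # Surface the offending input instead of float()'s generic message
        raise ValueError(f"Not a numeric value: {raw!r}") from exc
```

Raising on unparseable input, rather than returning `None` or `0.0`, keeps bad records from silently entering later calculations.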
Assets (Configs)
✅ MANDATORY
1. Syntactically valid JSON:
# ALWAYS validate:
python -c "import json; json.load(open('config.json'))"
2. Real values:
✅ GOOD:
{
"api": {
"base_url": "https://quickstats.nass.usda.gov/api",
"api_key_env": "NASS_API_KEY",
"_instructions": "Get free API key from: https://quickstats.nass.usda.gov/api#registration",
"rate_limit_per_day": 1000,
"timeout_seconds": 30
}
}
❌ BAD:
{
"api": {
"base_url": "YOUR_API_URL_HERE",
"api_key": "YOUR_KEY_HERE"
}
}
3. Inline comments (using _comment or _note):
{
"_comment": "Differentiated TTL by data type",
"cache": {
"ttl_historical_days": 365,
"_note_historical": "Historical data doesn't change",
"ttl_current_days": 7,
"_note_current": "Current year data may be revised"
}
}
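The two config rules above (valid JSON, real values) lend themselves to an automated check. The following is a sketch only: the placeholder pattern is an assumption based on the BAD examples in this guide (`YOUR_..._HERE`, `INSERT_...`), and the function names are hypothetical.

```python
import json
import re
from typing import Any, List

# Assumed placeholder shapes, e.g. "YOUR_KEY_HERE" or "INSERT_URL"
PLACEHOLDER = re.compile(r"^(YOUR_\w+_HERE|INSERT_\w+)$")


def find_placeholders(node: Any, path: str = "") -> List[str]:
    """Return dotted paths of string values that look like unfilled placeholders."""
    hits: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_placeholders(value, f"{path}.{key}" if path else key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_placeholders(value, f"{path}[{index}]"))
    elif isinstance(node, str) and PLACEHOLDER.match(node):
        hits.append(path)
    return hits


def validate_config_text(text: str) -> List[str]:
    """Parse JSON (raises json.JSONDecodeError if invalid) and list placeholders."""
    return find_placeholders(json.loads(text))
```

An empty result means the file parsed and no value matched the assumed placeholder pattern; it does not prove the values are correct.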
README.md
✅ MANDATORY
1. Complete installation instructions:
✅ GOOD:
## Installation
### 1. Get API Key (Free)
1. Access https://quickstats.nass.usda.gov/api#registration
2. Fill form:
- Name: [your name]
- Email: [your email]
- Purpose: "Personal research"
3. Click "Submit"
4. You'll receive email with API key in ~1 minute
5. Key format: `A1B2C3D4-E5F6-G7H8-I9J0-K1L2M3N4O5P6`
### 2. Configure Environment
**Option A - Export** (temporary):
```bash
export NASS_API_KEY="your_key_here"
```
**Option B - .bashrc/.zshrc** (permanent):
```bash
echo 'export NASS_API_KEY="your_key_here"' >> ~/.bashrc
source ~/.bashrc
```
**Option C - .env file** (per project):
```bash
echo "NASS_API_KEY=your_key_here" > .env
```
### 3. Install Dependencies
```bash
cd nass-usda-agriculture
pip install -r requirements.txt
```
Requirements:
- requests
- pandas
- numpy
❌ **BAD**:
```markdown
## Installation
1. Get API key from the official website
2. Configure environment
3. Install dependencies
4. Done!
```
2. Concrete usage examples:
✅ GOOD:
## Examples
### Example 1: Current Production
You: "What's US corn production in 2023?"
Claude: "Corn production in 2023 was 15.3 billion bushels (389 million metric tons)..."
### Example 2: YoY Comparison
You: "Compare soybeans this year vs last year"
Claude: "Soybean production in 2023 is 2.6% below 2022:
- 2023: 4.165 billion bushels
- 2022: 4.276 billion bushels
- Drop from area (-4.5%), yield improved (+0.8%)"
[3-5 more examples]
❌ BAD:
## Usage
Ask questions about agriculture and the agent will respond.
3. Specific troubleshooting:
✅ GOOD:
### Error: "NASS_API_KEY environment variable not found"
**Cause**: API key not configured
**Step-by-step solution**:
1. Verify key was obtained: https://...
2. Configure environment:
   ```bash
   export NASS_API_KEY="your_key_here"
   ```
3. Verify:
   ```bash
   echo $NASS_API_KEY
   ```
   Should show your key
4. If that doesn't work, restart the terminal

**Still not working?**
- Check for extra spaces in key
- Verify key hasn't expired (validity: 1 year)
- Re-generate key if needed
---
## Quality Checklist
### Per Python Script
- [ ] Shebang: `#!/usr/bin/env python3`
- [ ] Module docstring (3-5 lines)
- [ ] Organized imports (stdlib, 3rd party, local)
- [ ] Constants at top (if applicable)
- [ ] Type hints in all public functions
- [ ] Docstrings in classes (description + attributes + example)
- [ ] Docstrings in methods (Args, Returns, Raises, Example)
- [ ] Error handling for risky operations
- [ ] Input validations
- [ ] Output validations
- [ ] Appropriate logging
- [ ] Main function with argparse
- [ ] if __name__ == "__main__"
- [ ] Functional code (no TODO/pass)
- [ ] Valid syntax (test: `python -m py_compile script.py`)
### Per SKILL.md
- [ ] Frontmatter with name and description
- [ ] Description 150-250 characters with keywords
- [ ] Size 5000+ words
- [ ] "When to Use" section with specific triggers
- [ ] "Data Source" section detailed
- [ ] Step-by-step workflows with commands
- [ ] Scripts explained individually
- [ ] Analyses documented (objective, methodology)
- [ ] Errors handled (all expected)
- [ ] Validations listed
- [ ] Performance/cache explained
- [ ] Complete keywords
- [ ] Complete examples (5+)
### Per Reference File
- [ ] 1000+ words
- [ ] Useful content (not just links)
- [ ] Concrete examples with real values
- [ ] Executable code blocks
- [ ] Well structured (headings, lists)
- [ ] No empty sections
- [ ] No "TODO: write"
### Per Asset (Config)
- [ ] Syntactically valid JSON (validate!)
- [ ] Real values (not "YOUR_X_HERE" without context)
- [ ] Inline comments (_comment, _note)
- [ ] Instructions for values user must fill
- [ ] Logical and organized structure
### Per README.md
- [ ] Step-by-step installation
- [ ] How to get API key (detailed)
- [ ] How to configure (3 options)
- [ ] How to install dependencies
- [ ] How to install in Claude Code
- [ ] Usage examples (5+)
- [ ] Troubleshooting (10+ problems)
- [ ] License
- [ ] Contact/contribution (if applicable)
### Complete Agent
- [ ] DECISIONS.md documents all choices
- [ ] **VERSION** file created (e.g. 1.0.0)
- [ ] **CHANGELOG.md** created with complete v1.0.0 entry
- [ ] **INSTALACAO.md** with complete didactic tutorial
- [ ] **comprehensive_{domain}_report()** implemented
- [ ] SKILL.md with version in frontmatter metadata
- [ ] 18+ files created
- [ ] ~1500+ lines of Python code
- [ ] ~10,000+ words of documentation
- [ ] 2+ configs
- [ ] requirements.txt
- [ ] .gitignore (if needed)
- [ ] No placeholder/TODO
- [ ] Valid syntax (Python, JSON, YAML)
- [ ] Ready to use (production-ready)
---
## Quality Examples
### Example: Error Handling
❌ **BAD**:
```python
def fetch(url):
    return requests.get(url).json()
```
✅ **GOOD**:
def fetch(url: str, timeout: int = 30) -> Dict:
    """
    Fetch data from URL with error handling

    Args:
        url: URL to fetch
        timeout: Timeout in seconds

    Returns:
        JSON response as dict

    Raises:
        NetworkError: If connection fails
        TimeoutError: If request times out
        RateLimitError: If API responds with HTTP 429
        APIError: If API returns error
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        data = response.json()
        if 'error' in data:
            raise APIError(f"API error: {data['error']}")
        return data
    except requests.Timeout as e:
        raise TimeoutError(f"Request timed out after {timeout}s") from e
    except requests.ConnectionError as e:
        raise NetworkError(f"Connection failed: {e}") from e
    except requests.HTTPError as e:
        if e.response.status_code == 429:
            raise RateLimitError("Rate limit exceeded") from e
        raise APIError(f"HTTP {e.response.status_code}: {e}") from e
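The GOOD example raises `NetworkError`, `APIError`, and `RateLimitError` without defining them. One possible way to define them is sketched below; the base class `SkillError`, the `status_code` attribute, and the helper `error_for_status` are assumptions for illustration, not part of any existing skill.

```python
from typing import Optional


class SkillError(Exception):
    """Base class for errors raised by a skill's scripts (hypothetical name)."""


class NetworkError(SkillError):
    """The remote host could not be reached."""


class APIError(SkillError):
    """The API answered, but with an error."""

    def __init__(self, message: str, status_code: Optional[int] = None) -> None:
        super().__init__(message)
        self.status_code = status_code


class RateLimitError(APIError):
    """The API rejected the request because a rate limit was exceeded."""


def error_for_status(status_code: int) -> APIError:
    """Map an HTTP status code to the most specific exception instance."""
    if status_code == 429:
        return RateLimitError("Rate limit exceeded", status_code)
    return APIError(f"HTTP {status_code}", status_code)
```

Making `RateLimitError` a subclass of `APIError` lets callers catch rate limits specifically (for example, to back off and retry) while a broader `except APIError` still covers them.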
### Example: Validations
❌ **BAD**:
def parse(data):
    df = pd.DataFrame(data)
    return df
✅ GOOD:
from datetime import datetime
from typing import Dict, List

import pandas as pd

def parse(data: List[Dict]) -> pd.DataFrame:
    """Parse and validate data"""
    # Validate input (type first, so a wrong type is not reported as "empty")
    if not isinstance(data, list):
        raise TypeError(f"Expected list, got {type(data)}")
    if not data:
        raise ValueError("Data cannot be empty")
    # Parse
    df = pd.DataFrame(data)
    # Validate schema
    required_cols = ['year', 'commodity', 'value']
    missing = set(required_cols) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Validate types
    df['year'] = pd.to_numeric(df['year'], errors='raise')
    df['value'] = pd.to_numeric(df['value'], errors='raise')
    # Validate ranges
    current_year = datetime.now().year
    if (df['year'] > current_year).any():
        raise ValueError(f"Future years found (max allowed: {current_year})")
    if (df['value'] < 0).any():
        raise ValueError("Negative values found")
    # Validate no duplicates
    if df.duplicated(subset=['year', 'commodity']).any():
        raise ValueError("Duplicate records found")
    return df
### Example: Docstrings
❌ **BAD**:
def analyze(df, commodity):
    """Analyze data"""
    # ...
✅ GOOD:
def analyze_yoy(
    df: pd.DataFrame,
    commodity: str,
    year1: int,
    year2: int
) -> Dict[str, Any]:
    """
    Perform year-over-year comparison analysis

    Compares production, area, and yield between two years
    and decomposes growth into area vs yield contributions.

    Args:
        df: DataFrame with columns ['year', 'commodity', 'production', 'area', 'yield']
        commodity: Commodity name (e.g., "CORN", "SOYBEANS")
        year1: Current year to compare
        year2: Previous year to compare against

    Returns:
        Dict containing:
            - production_current (float): Production in year1 (million units)
            - production_previous (float): Production in year2
            - change_absolute (float): Absolute change
            - change_percent (float): Percent change
            - decomposition (dict): Area vs yield contribution
            - interpretation (str): "increase", "decrease", or "stable"

    Raises:
        ValueError: If commodity not found in data
        ValueError: If either year not found in data
        DataQualityError: If production != area * yield (tolerance > 1%)

    Example:
        >>> df = pd.DataFrame([
        ...     {'year': 2023, 'commodity': 'CORN', 'production': 15.3, 'area': 94.6, 'yield': 177},
        ...     {'year': 2022, 'commodity': 'CORN', 'production': 13.7, 'area': 89.2, 'yield': 173}
        ... ])
        >>> result = analyze_yoy(df, "CORN", 2023, 2022)
        >>> result['change_percent']
        11.7
    """
    # [Complete implementation]
Dependency Management
Decision Framework
Skills should minimize external dependencies. Every dependency is a maintenance burden, a security surface, and a compatibility risk. Use this decision tree:
Can stdlib do it?
→ Yes: Use stdlib. Done.
→ No: Is there a lightweight pure-Python package (<1MB)?
→ Yes: Use it. Add to requirements.txt.
→ No: Is there a well-maintained popular package?
→ Yes: Use it only if the domain requires it.
→ No: Implement it yourself or redesign the approach.
Stdlib vs. Third-Party Decision Table
| Task | Stdlib Solution | When to Use Third-Party |
|---|---|---|
| HTTP requests | `urllib.request` | Use `requests` when: complex auth, session management, multipart uploads, or retry logic would require 100+ lines of urllib code |
| JSON handling | `json` | Never; stdlib is sufficient |
| CSV parsing | `csv` | Use `pandas` only when: statistical analysis, complex transformations, or DataFrame operations are core to the skill |
| File paths | `pathlib` | Never; stdlib is sufficient |
| Date/time | `datetime` | Never; stdlib is sufficient |
| Regex | `re` | Never; stdlib is sufficient |
| Hashing | `hashlib` | Never; stdlib is sufficient |
| Caching | File-based (json + pathlib) | Never for skills; the FileCache pattern in architecture-guide.md is sufficient |
| Data analysis | Manual calculations | Use pandas/numpy when: skill is primarily analytical (10+ statistical operations, pivots, aggregations) |
| PDF generation | Not available | Use reportlab or fpdf2 when PDF output is a core requirement |
| Web scraping | urllib + `html.parser` | Use `beautifulsoup4` when parsing complex/malformed HTML |
| CLI arguments | `argparse` | Never; stdlib is sufficient |
| YAML parsing | Manual (the `_parse_frontmatter` pattern) | Use `pyyaml` only if skill needs to parse arbitrary YAML files (not just SKILL.md frontmatter) |
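For the HTTP row of the table, a stdlib-only fetch might look like the sketch below. The function name `fetch_json` and the choice to map failures onto `RuntimeError` and `ValueError` are assumptions for illustration; a real skill would likely raise its own exception types.

```python
import json
import urllib.error
import urllib.request
from typing import Any


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """Fetch a URL with the standard library only and decode the body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it must be caught first
        raise RuntimeError(f"HTTP {exc.code} for {url}") from exc
    except urllib.error.URLError as exc:
        raise RuntimeError(f"Connection failed for {url}: {exc.reason}") from exc
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response from {url} is not valid JSON") from exc
```

This covers a simple GET with a timeout in about twenty lines. Once sessions, retries, or multipart uploads are needed, the table's guidance to switch to `requests` applies.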
requirements.txt Rules
When third-party packages are needed:
# requirements.txt
# Pin major.minor, allow patch updates
requests>=2.31,<3.0
pandas>=2.0,<3.0
# For stdlib-only skills, create an empty requirements.txt with a comment:
# No external dependencies required — this skill uses Python stdlib only.
Rules:
- Always create `requirements.txt` even if empty (document the stdlib-only decision)
- Set a minimum major.minor version and cap the next major version to avoid breaking changes
- Never pin exact patch versions (allows security updates)
- Never include dev dependencies (pytest, ruff); those are for contributors, not users
- List only direct dependencies, not transitive ones
- Include a comment explaining why each package is needed
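The version rules above can be checked mechanically. This is a rough sketch under stated assumptions: `check_requirements` is a hypothetical helper, it only understands simple `name<specifiers>` lines, and it does not handle extras, environment markers, or URLs.

```python
import re
from typing import List

# Matches "name<specifiers>" with an optional trailing "# comment"
LINE = re.compile(r"^(?P<name>[A-Za-z0-9._-]+)\s*(?P<spec>[^#]*?)\s*(#.*)?$")


def check_requirements(text: str) -> List[str]:
    """Return one message per requirement line that breaks the pinning rules."""
    problems: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and full-line comments are fine
        match = LINE.match(line)
        if not match:
            problems.append(f"unparseable line: {line}")
            continue
        name, spec = match.group("name"), match.group("spec")
        if "==" in spec:
            problems.append(f"{name}: exact pin")
        elif ">=" not in spec or "<" not in spec:
            problems.append(f"{name}: missing lower or upper bound")
    return problems
```

A comment-only file, as recommended for stdlib-only skills, produces no findings.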
Common Dependency Patterns by Skill Type
| Skill Type | Typical Dependencies |
|---|---|
| Data analysis (stocks, agriculture, climate) | requests, pandas, numpy |
| Report generation | requests, fpdf2 or reportlab |
| Web scraping | requests, beautifulsoup4 |
| API wrapper | requests (or stdlib urllib) |
| Text processing | Stdlib only (re, json, csv) |
| File format conversion | Stdlib only (or single specialized package) |
| Database interaction | Stdlib sqlite3 (or psycopg2/pymysql for specific DBs) |
Testing Strategy
Why Test Generated Skills
Skills are opinionated software that teams rely on daily. A skill that produces wrong calculations, misparses API responses, or silently drops data is worse than no skill at all. Tests catch these issues before the skill reaches users.
What to Test
Focus tests on the parts most likely to break or produce wrong results:
| Priority | What to Test | Why |
|---|---|---|
| High | Analysis/calculation functions | Wrong math = wrong decisions |
| High | Data parsing (API response → structured data) | APIs change formats, edge cases in real data |
| High | Input validation | Bad input should fail clearly, not silently produce garbage |
| Medium | Output formatting | Reports and summaries should be consistent |
| Medium | Error handling paths | Verify graceful degradation on API failures, missing data |
| Low | Cache logic | Only if custom caching is complex |
| Low | Config loading | Usually trivial |
Test Directory Structure
skill-name/
├── scripts/
│ ├── analyze.py
│ ├── fetch.py
│ └── parse.py
├── tests/
│ ├── test_analyze.py # Unit tests for analysis functions
│ ├── test_parse.py # Unit tests for parsing logic
│ ├── fixtures/
│ │ ├── sample_api_response.json # Real API response (anonymized)
│ │ └── sample_parsed_data.csv # Expected parsed output
│ └── conftest.py # Shared pytest fixtures
Test Patterns
Pattern 1: Test analysis functions with known inputs/outputs
"""Tests for analyze.py: core calculation functions."""
import pytest
from scripts.analyze import analyze_yoy, calculate_trend

def test_yoy_increase():
    """YoY comparison should detect an increase."""
    result = analyze_yoy(
        current_value=150.0,
        previous_value=100.0,
    )
    assert result["change_percent"] == pytest.approx(50.0)
    assert result["interpretation"] == "significant_increase"

def test_yoy_stable():
    """Changes under 2% should be interpreted as stable."""
    result = analyze_yoy(current_value=101.0, previous_value=100.0)
    assert result["interpretation"] == "stable"

def test_yoy_zero_previous():
    """Division by zero should raise ValueError, not crash."""
    with pytest.raises(ValueError, match="previous value cannot be zero"):
        analyze_yoy(current_value=100.0, previous_value=0.0)
Pattern 2: Test parsing with fixture data
"""Tests for parse.py: API response parsing."""
import json
from pathlib import Path
from scripts.parse import parse_api_response

FIXTURES = Path(__file__).parent / "fixtures"

def test_parse_normal_response():
    """Standard API response should parse to expected structure."""
    raw = json.loads((FIXTURES / "sample_api_response.json").read_text())
    result = parse_api_response(raw)
    assert len(result) > 0
    assert "year" in result[0]
    assert "value" in result[0]

def test_parse_empty_response():
    """Empty API response should return empty list, not crash."""
    result = parse_api_response({"data": []})
    assert result == []

def test_parse_malformed_values():
    """Values with commas (e.g., '15,300,000') should be cleaned."""
    raw = {"data": [{"value": "15,300,000", "year": "2023"}]}
    result = parse_api_response(raw)
    assert result[0]["value"] == 15300000.0
Pattern 3: Mock external API calls
"""Tests for fetch.py: API interaction (mocked)."""
from unittest.mock import patch, MagicMock
import pytest  # needed for pytest.raises below
from scripts.fetch import fetch_data

@patch("scripts.fetch.urllib.request.urlopen")
def test_fetch_success(mock_urlopen):
    """Successful API call should return parsed JSON."""
    mock_response = MagicMock()
    mock_response.read.return_value = b'{"data": [{"value": "100"}]}'
    mock_response.__enter__ = lambda s: s
    mock_response.__exit__ = MagicMock(return_value=False)
    mock_urlopen.return_value = mock_response
    result = fetch_data(commodity="CORN", year=2023)
    assert "data" in result

@patch("scripts.fetch.urllib.request.urlopen")
def test_fetch_rate_limited(mock_urlopen):
    """429 response should raise RateLimitError."""
    from urllib.error import HTTPError
    mock_urlopen.side_effect = HTTPError(
        url="", code=429, msg="Too Many Requests", hdrs={}, fp=None
    )
    with pytest.raises(Exception, match="[Rr]ate limit"):
        fetch_data(commodity="CORN", year=2023)
Running Tests
# Run all tests
cd skill-name/
python -m pytest tests/ -v
# Run with coverage
python -m pytest tests/ --cov=scripts --cov-report=term-missing
When to Generate Tests
Tests are generated during Phase 5 (Implementation) after the scripts are written:
- Write all scripts first (steps 1-5 of Phase 5)
- Create `tests/` directory with test files for core functions
- Create `tests/fixtures/` with sample data
- Run tests to verify they pass
- Include test instructions in README.md
Note: Tests are recommended but not mandatory for v1.0 of a skill. The validation and security scan gates are always mandatory. Tests become critical when:
- The skill performs financial calculations (wrong math = real cost)
- The skill processes sensitive data (parsing errors = data loss)
- Multiple people will maintain the skill (tests prevent regressions)
- The skill is being published to the team registry (quality expectation is higher)
Anti-Patterns
Anti-Pattern 1: Partial Implementation
❌ NO:
def yoy_comparison(df, commodity, year1, year2):
    # Implement YoY comparison
    pass

def state_ranking(df, commodity):
    # TODO: implement ranking
    raise NotImplementedError()
✅ YES:
# [Complete and functional code for BOTH functions]
Anti-Pattern 2: Empty References
❌ NO:
# Analysis Methods
## YoY Comparison
This method compares two years.
## Ranking
This method ranks states.
✅ YES:
# Analysis Methods
## YoY Comparison
### Objective
Compare metrics between current and previous year...
### Detailed Methodology
**Formulas**:
Δ X = X(t) - X(t-1)
Δ X% = (Δ X / X(t-1)) × 100
**Decomposition** (for production):
[Complete mathematics]
**Interpretation**:
- |Δ| < 2%: Stable
- Δ > 10%: Significant increase
[...]
### Validations
[List]
### Complete Numerical Example
[With real values]
Anti-Pattern 3: Useless Configs
❌ NO:
{
"api_url": "INSERT_URL",
"api_key": "INSERT_KEY"
}
✅ YES:
{
"_comment": "Configuration for NASS USDA Agent",
"api": {
"base_url": "https://quickstats.nass.usda.gov/api",
"_note": "This is the official USDA NASS API base URL",
"api_key_env": "NASS_API_KEY",
"_key_instructions": "Get free API key from: https://quickstats.nass.usda.gov/api#registration"
}
}
Final Validation
Before delivering to user, verify:
Sanity Test
# 1. Python syntax
find scripts -name "*.py" -exec python -m py_compile {} \;
# 2. JSON syntax
python -c "import json; json.load(open('assets/config.json'))"
# 3. Imports make sense
grep -r "^import\|^from" scripts/*.py | sort | uniq
# Verify all libs are: stdlib, requests, pandas, numpy
# No imports of uninstalled libs
# 4. SKILL.md has frontmatter
head -5 SKILL.md | grep "^---$"
# 5. SKILL.md size
wc -w SKILL.md
# Should be > 5000 words
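Checks 4 and 5 can also be run from Python, which makes them easier to reuse in tests. This is a sketch under assumptions: `check_skill_md` is a hypothetical helper, the frontmatter is assumed to use simple top-level `key: value` lines, and words are counted by whitespace splitting (similar to, but not guaranteed identical with, `wc -w`).

```python
import re
from typing import List

# Frontmatter: a '---' line at the very start, content, then a closing '---' line
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def check_skill_md(text: str, min_words: int = 5000) -> List[str]:
    """Return a list of problems found in a SKILL.md document."""
    problems: List[str] = []
    match = FRONTMATTER.match(text)
    if not match:
        problems.append("missing frontmatter")
    else:
        # Collect top-level keys only (indented lines belong to nested values)
        keys = {
            line.split(":", 1)[0].strip()
            for line in match.group(1).splitlines()
            if ":" in line and not line.startswith((" ", "\t"))
        }
        for required in ("name", "description"):
            if required not in keys:
                problems.append(f"frontmatter missing '{required}'")
    word_count = len(text.split())
    if word_count <= min_words:
        problems.append(f"only {word_count} words (need more than {min_words})")
    return problems
```

The `min_words` default mirrors the "> 5000 words" threshold in the shell version above.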
Final Checklist
- Syntax check passed (Python, JSON)
- No import of non-existent lib
- No TODO or pass
- SKILL.md > 5000 words
- References with content
- README with complete instructions
- DECISIONS.md created
- requirements.txt created