- Rename "Tier 3 评估" (Tier 3 evaluation) to "综合评估" (comprehensive
evaluation) and describe its dimensions directly (tone variance, density
rhythm, pacing, readability) without referencing the anti-detection framework
- Reframe composite_score from "0=human, 100=AI" to "0=high quality,
100=issues found"
- Change 5.3 role from "gate control" to "supplementary verification"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add article self-check ("检查一下", i.e. "check it over"): generation report + quality advice
- Record enhance_strategy in history.yaml
- Replace Zhuque test data with persona style descriptions in README
- Update descriptions: anti-AI focus → content quality focus
- Remove stale parameter optimization references
- Sync all trigger words across README, auxiliary functions, and Step 8.3
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem: AI articles scored MORE human (avg 26.2) than actual human
articles (avg 44.0), the opposite of Zhuque's (朱雀) judgment. The AI was
gaming the linear scoring by over-optimizing broken sentences,
self-correction, paragraph variance, etc.
Fix: Two calibration layers added after raw scoring:
1. Bell-curve scoring for 5 over-optimizable dimensions (broken_sentences,
self_correction, sentence_length_range, paragraph_length_variance,
banned_words). Score peaks at human article average, penalizes both
too-low AND too-high values.
2. Over-optimization penalty: 15% global penalty when 60%+ of checks
score above 0.8, indicating suspiciously "perfect" articles.
Results:
Before: Human avg=44.0, AI avg=26.2 (WRONG direction)
After: Human avg=42.5, AI avg=44.0 (CORRECT direction)
A/B test now agrees with Zhuque (exemplar version scores better)
Baselines derived from 15 human articles tested on 2026-03-30.
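
A minimal Python sketch of the two calibration layers described above. The Gaussian shape, the function names, and the way the 15% penalty is applied (scaling the mean of the 0-1 check scores by 0.85) are assumptions for illustration; the commit only states that the score peaks at the human average and that a 15% global penalty applies when 60%+ of checks score above 0.8.

```python
import math


def bell_curve_score(value, human_mean, human_std):
    """Score in (0, 1]: 1.0 at the human average, lower on both sides.

    The Gaussian falloff is an assumed shape, not the project's formula.
    """
    if human_std <= 0:
        raise ValueError("human_std must be positive")
    z = (value - human_mean) / human_std
    return math.exp(-0.5 * z * z)


def calibrated_mean(check_scores, high=0.8, trigger_ratio=0.6, penalty=0.15):
    """Mean of 0-1 check scores, cut by `penalty` when too many look 'perfect'."""
    if not check_scores:
        raise ValueError("check_scores must not be empty")
    mean = sum(check_scores) / len(check_scores)
    n_high = sum(1 for s in check_scores if s > high)
    if n_high / len(check_scores) >= trigger_ratio:
        mean *= 1 - penalty  # global over-optimization penalty
    return mean
```

With this shape, a value far above the human baseline is penalized just as much as one equally far below it, which is what removes the incentive to over-optimize a single dimension.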
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Step 4.6: add quick self-check after writing (banned words, sentence
variance, negative emotion) to fix obvious issues before Step 5
- Step 5.2: tighten rewrite scope to specific sentences only, max 3
fixes per round, reduce max rounds from 3 to 2
- Step 5.3: reduce scoring rewrite from 3 rounds to 2, mark
DONE_WITH_CONCERNS instead of infinite loops when score stays >50
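
The bounded Step 5.3 loop could be sketched roughly as follows. `score_fn` and `rewrite_fn` are hypothetical callables standing in for the scorer and the agent's rewrite, and the "DONE" status name is an assumption; only the 2-round cap, the >50 threshold, and DONE_WITH_CONCERNS come from this change.

```python
def scoring_rewrite_loop(article, score_fn, rewrite_fn, max_rounds=2, threshold=50):
    """Rewrite at most `max_rounds` times while the score stays above `threshold`.

    Returns (article, score, status, rounds_used).
    """
    score = score_fn(article)
    rounds = 0
    while score > threshold and rounds < max_rounds:
        article = rewrite_fn(article)
        score = score_fn(article)
        rounds += 1
    # Stop with a flag rather than looping forever on a stubborn score.
    status = "DONE" if score <= threshold else "DONE_WITH_CONCERNS"
    return article, score, status, rounds
```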
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename closing_style → closing_tendency in all 5 personas, making it
a soft preference rather than a hard constraint
- Add closing variation rule + 6 closing patterns table to writing-guide.md
- Step 4.5: LLM judges best closing from content; checks history.yaml
last 3 articles to avoid repeating the same closing_type
- Step 8.1: record closing_type in history.yaml for dedup
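
One way the closing dedup check might look, as a sketch only. It assumes history.yaml has already been parsed into a list of dicts ordered oldest to newest, that the LLM supplies closing candidates ranked by fit, and that any type used in the last 3 articles is skipped; the closing type names in the usage example are hypothetical.

```python
def pick_closing(ranked_candidates, history, window=3):
    """Return the best-ranked closing type not used in the last `window` articles.

    Falls back to the top candidate when every option was used recently.
    """
    if not ranked_candidates:
        raise ValueError("ranked_candidates must not be empty")
    recent = {entry.get("closing_type") for entry in history[-window:]}
    for candidate in ranked_candidates:
        if candidate not in recent:
            return candidate
    return ranked_candidates[0]


# Hypothetical closing types, for illustration only.
history = [{"closing_type": "question"}, {"closing_type": "abrupt"},
           {"closing_type": "callback"}, {"closing_type": "abrupt"}]
choice = pick_closing(["abrupt", "question", "callback"], history)
```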
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add transition segment to user exemplar injection (was 3 segments,
now 4 to match seeds path)
- Clarify priority chain: playbook > persona > exemplar > writing-guide
- Add exemplar fallback row to error handling table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seeds demonstrate anti-AI structural patterns (sentence variance, real
negative emotion, self-correction, abrupt closings) without imposing a
specific writing style. Step 4.4 falls back to seeds when the user's
exemplar library is empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prompts users to import articles when exemplar library is empty,
without blocking the pipeline.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New script: scripts/extract_exemplar.py
  Extracts style fingerprints from human-written articles (opening hook,
  emotional peak, transition/self-correction, closing) with statistical
  analysis (sentence stddev, vocab temperature, negative ratio, paragraph CV).
  Auto-detects category, supports batch import.
- SKILL.md: Add Step 4.4 exemplar injection
  Loads matching exemplars by category before writing, injects segments
  as few-shot style examples in the prompt.
- learn_edits.py: Auto-grow exemplar library
  After user edits, auto-extracts the final version into the exemplar
  library if humanness_score <= 50.
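
Two of the statistics named above (sentence stddev and paragraph CV) can be illustrated with a short sketch. This is not the code in scripts/extract_exemplar.py; the function names, the punctuation-based sentence split, and the use of character counts as length are assumptions.

```python
import re
import statistics


def sentence_lengths(text):
    """Character length of each sentence, split on CJK and ASCII end marks."""
    parts = re.split(r"[。！？!?.]+", text)
    return [len(p.strip()) for p in parts if p.strip()]


def sentence_length_stddev(text):
    """Population standard deviation of sentence lengths (0.0 if under 2)."""
    lengths = sentence_lengths(text)
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0


def paragraph_cv(text):
    """Coefficient of variation (stddev / mean) of paragraph lengths."""
    lengths = [len(p.strip()) for p in re.split(r"\n\s*\n", text) if p.strip()]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.fmean(lengths)
    return statistics.pstdev(lengths) / mean if mean else 0.0
```

Higher values on both measures indicate more uneven rhythm, which is the property the fingerprint is meant to capture.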
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Renumber all sub-steps to consistent X.Y format (1a-2→1.2, 4a-0→4.1, 5b-2→5.3)
- Add TaskCreate directive: create 8 tasks at pipeline start, update status per step
- Clean up internal references (Step 3b→3.2, Step 4b→4.3, etc.)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
learn_edits.py: patterns now have type/key/description/rule fields,
confidence auto-computed from occurrences + recency with 30-day decay.
--summarize --json outputs aggregated patterns sorted by confidence.
learn-edits.md: playbook.md format changed from free text to structured
YAML rules with confidence levels. Rules with confidence ≥ 5 become
hard constraints in Step 4, rules with confidence from 2 up to (but not
including) 5 are soft references, and rules below 2 get pruned.
SKILL.md Step 4: playbook priority now confidence-gated.
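
A rough sketch of how decayed confidence and the gating thresholds could fit together. The formula is an assumption: each occurrence contributes a weight that halves every 30 days, which is one reading of "30-day decay"; learn_edits.py may compute it differently. The thresholds (≥ 5 hard, < 2 pruned) are the ones stated above.

```python
from datetime import date

HALF_LIFE_DAYS = 30  # assumed interpretation of the 30-day decay


def confidence(occurrence_dates, today):
    """Sum of per-occurrence weights; each weight halves every 30 days."""
    return sum(
        0.5 ** (max((today - d).days, 0) / HALF_LIFE_DAYS)
        for d in occurrence_dates
    )


def classify(conf):
    """Map a confidence value to how Step 4 would treat the rule."""
    if conf >= 5:
        return "hard"
    if conf >= 2:
        return "soft"
    return "prune"
```

Under this reading, six edits made today yield a hard constraint, while the same six edits from 30 days ago decay to 3.0 and become a soft reference.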
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 8a: write composite_score + writing_config_snapshot to history.yaml,
recording which parameters produced which anti-AI score.
Step 4a-0: before writing, read history for the best-scoring article's
parameter combination and use it as reference for the current article.
This closes the feedback loop: write → score → record → learn → write better.
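
The history lookup might be sketched like this, assuming history.yaml is already parsed into a list of dicts and that a lower composite_score is better (the score direction used elsewhere in this log). The function name and the keys inside the snapshot are hypothetical.

```python
def best_config(history):
    """Return the writing_config_snapshot of the best-scoring article, or None.

    Entries missing a numeric score or a snapshot are ignored.
    """
    scored = [
        e for e in history
        if isinstance(e.get("composite_score"), (int, float))
        and e.get("writing_config_snapshot")
    ]
    if not scored:
        return None
    # Assumption: lower composite_score means a better article.
    return min(scored, key=lambda e: e["composite_score"])["writing_config_snapshot"]
```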
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
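Update references from "7层去AI" (7-layer de-AI) to "3层反检测" (3-layer
anti-detection) and from "9项自检" (9-item self-check) to "14项" (14 items);
add diagnose.py to the directory tree and "优化参数" (optimize parameters)
to the quick start.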
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
optimize_loop.py was framework-only (needed external LLM API). The
optimization is now an auxiliary function in SKILL.md driven by the
already-running agent. All references updated across README, CLAUDE.md,
diagnose.py, and writing-config.example.yaml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- 11 checks across 2 tiers (6 statistical + 5 pattern), up from 6
- Continuous 0-1 scores instead of pass/fail booleans
- Each check maps to a writing-config parameter via param field
- New checks: negative emotion ratio, adverb density, vocabulary richness,
sentence length range, self-correction patterns
- New --tier3 flag for agent to pass LLM structural analysis score
- param_scores in JSON output: flat param→score map for optimization
- Standalone mode redistributes weights (T1=62.5%, T2=37.5%)
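
The standalone redistribution is consistent with simple renormalization over the tiers that are present. The sketch below assumes base weights of 50/30/20 for Tiers 1 to 3; those base weights are an inference from the 62.5%/37.5% split and are not stated in this change, and the function name is hypothetical.

```python
# Assumed base weights; only the 62.5% / 37.5% result is from the commit.
BASE_WEIGHTS = {"tier1": 0.5, "tier2": 0.3, "tier3": 0.2}


def redistribute(weights, available):
    """Renormalize the weights of the available tiers so they sum to 1."""
    kept = {k: w for k, w in weights.items() if k in available}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no weight left to redistribute")
    return {k: w / total for k, w in kept.items()}


standalone = redistribute(BASE_WEIGHTS, {"tier1", "tier2"})
```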
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reorganize anti-AI rules into 3 tiers mapped to detector signals:
- Tier 1 (Statistical): sentence variance, vocab temperature, paragraph
rhythm, emotion polarity, adverb density, style drift
- Tier 2 (Linguistic): banned words, broken sentences, unexpected words,
coherence breaking
- Tier 3 (Content): real data anchoring, specificity, density waves,
dimension randomization
New rules added: emotion polarity distribution (1.4), adverb density
control (1.5), inter-paragraph style drift (1.6), unexpected word
usage (2.3). Each rule now references the detection signal it counters.
writing-config.example.yaml updated with corresponding new parameters.
SKILL.md Step 5 checklist aligned to new structure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add VERSION file (1.2.0)
- SKILL.md Step 1: auto-check for updates on each run
- SKILL.md: add "更新" (update) auxiliary function (git pull)
- README: install via git clone instead of cp/ln
- build_openclaw.py: include VERSION in dist
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>