wewrite

Author	SHA1	Message	Date
wangzhuc	02f5e6d93b	fix: calibrate humanness_score with bell-curve and over-optimization penalty Problem: AI articles scored MORE human (avg 26.2) than actual human articles (avg 44.0) — opposite of 朱雀's judgment. AI was gaming the linear scoring by over-optimizing broken sentences, self-correction, paragraph variance, etc. Fix: Two calibration layers added after raw scoring: 1. Bell-curve scoring for 5 over-optimizable dimensions (broken_sentences, self_correction, sentence_length_range, paragraph_length_variance, banned_words). Score peaks at human article average, penalizes both too-low AND too-high values. 2. Over-optimization penalty: 15% global penalty when 60%+ of checks score above 0.8, indicating suspiciously "perfect" articles. Results: Before: Human avg=44.0, AI avg=26.2 (WRONG direction) After: Human avg=42.5, AI avg=44.0 (CORRECT direction) A/B test now agrees with 朱雀 (exemplar version scores better) Baselines derived from 15 human articles tested on 2026-03-30. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 00:09:14 +08:00
wangzhuc	f7fe44c152	fix: expand negative markers and vocabulary temperature word lists NEGATIVE_MARKERS: 26 → 51 words Added: despair (绝望/迷茫/心累), deception (骗/忽悠/割韭菜/套路), failure (白费/黄了/凉了), self-deprecation (傻/天真/自嗨), sarcasm (呵呵/行吧/真服了), complaint (受够了/苦哈哈) COLD_WORDS: 7 → 25 (技术栈/标准化/护城河/飞轮/底层逻辑/PMF/ROI...) WARM_WORDS: 7 → 15 (老实说/这么说吧/你想啊/有意思的是...) HOT_WORDS: 8 → 19 (凡尔赛/标题党/躺平/摆烂/破防/上头/内耗...) WILD_WORDS: 7 → 17 (苦哈哈/傻乎乎/交学费/踩坑/翻车...) Impact on 15 exemplar articles: neg score avg: 0.15 → 0.27 (+80%) temp_mix: still low on short segments, but full articles now score 0.33-1.00 vs previously 0.00 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 23:23:51 +08:00
wangzhuc	df72e51ea1	feat: rewrite humanness_score.py with continuous scoring and param mapping - 11 checks across 2 tiers (6 statistical + 5 pattern), up from 6 - Continuous 0-1 scores instead of pass/fail booleans - Each check maps to a writing-config parameter via param field - New checks: negative emotion ratio, adverb density, vocabulary richness, sentence length range, self-correction patterns - New --tier3 flag for agent to pass LLM structural analysis score - param_scores in JSON output: flat param→score map for optimization - Standalone mode redistributes weights (T1=62.5%, T2=37.5%) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 19:54:11 +08:00
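wangzhuc	8e16c70ead	Add optimization loop framework: humanness_score.py + optimize_loop.py. Borrows the change→score→keep/rollback pattern from Karpathy's autoresearch: - humanness_score.py: fixed scorer with two scoring layers (objective checklist + subjective reader impression). 6 objective checks: banned words / real citations / broken sentences / sentence-length variance / paragraph-length variance / vocabulary temperature. 1 subjective LLM judge (stub, requires API configuration). Composite score 0-100 (lower is more human-like). - optimize_loop.py: iteration framework that modifies writing-config.yaml parameters to automatically generate an article → score it → keep or roll back → log to results.tsv Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 23:18:55 +08:00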
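
The change→score→keep/rollback pattern named in commit 8e16c70ead can be sketched roughly as follows. This is a minimal illustration, not the contents of optimize_loop.py: the names `optimize`, `propose`, and `score` are hypothetical, the config is a plain dict standing in for writing-config.yaml, and the history list stands in for results.tsv. It assumes, as that commit states, that a lower composite score means more human-like.

```python
import copy


def optimize(config, propose, score, rounds=10):
    """Greedy change -> score -> keep/rollback loop (hypothetical sketch).

    config  : dict of writing parameters (stand-in for writing-config.yaml)
    propose : function taking a config copy and returning a modified config
    score   : function mapping a config to a number; lower is assumed better
    """
    best_config = copy.deepcopy(config)
    best_score = score(best_config)
    history = [(0, best_score, "baseline")]  # stand-in for results.tsv rows

    for step in range(1, rounds + 1):
        # Change: mutate a copy so the best config is never touched directly.
        candidate = propose(copy.deepcopy(best_config))
        candidate_score = score(candidate)

        if candidate_score < best_score:
            # Keep: the candidate becomes the new best.
            best_config, best_score = candidate, candidate_score
            history.append((step, candidate_score, "keep"))
        else:
            # Rollback: discard the candidate, best_config stays as it was.
            history.append((step, candidate_score, "rollback"))

    return best_config, best_score, history


if __name__ == "__main__":
    # Toy scorer: distance of a single made-up parameter "x" from 3.
    toy_score = lambda cfg: abs(cfg["x"] - 3)
    toy_propose = lambda cfg: {**cfg, "x": cfg["x"] + 1}
    print(optimize({"x": 0}, toy_propose, toy_score, rounds=5))
```

In a real run the `score` callable would generate an article from the candidate config and pass it to the scorer; here it is a toy function so the control flow can be followed in isolation.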

4 commits
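
The two calibration layers described in commit 02f5e6d93b (a bell curve that peaks at the human average, plus a 15% penalty when 60% or more of checks score above 0.8) might look something like the sketch below. The commit message does not give the curve's formula, the baseline values, or how the penalty is applied, so the Gaussian shape, the `spread` parameter, the function names, and the choice to multiply an averaged score by 0.85 are all assumptions made for illustration. The sketch also assumes per-check scores where higher means more human-like.

```python
import math


def bell_curve_score(value, human_mean, spread):
    """Return a 0-1 score that peaks when value equals human_mean.

    Assumed Gaussian shape: values below or above the human average are
    penalized symmetrically. `spread` is a hypothetical width parameter.
    """
    if spread <= 0:
        raise ValueError("spread must be positive")
    z = (value - human_mean) / spread
    return math.exp(-0.5 * z * z)


def apply_over_optimization_penalty(check_scores, high=0.8, share=0.6, penalty=0.15):
    """Average the check scores, then cut the result by `penalty` when at
    least `share` of the checks score above `high` (suspiciously perfect)."""
    if not check_scores:
        raise ValueError("check_scores must not be empty")
    mean_score = sum(check_scores) / len(check_scores)
    high_share = sum(1 for s in check_scores if s > high) / len(check_scores)
    if high_share >= share:
        mean_score *= 1.0 - penalty
    return mean_score


if __name__ == "__main__":
    # Made-up baseline: human articles average 5 broken sentences, spread 2.
    for count in (1, 5, 9):
        print(count, round(bell_curve_score(count, human_mean=5, spread=2), 3))
    print(apply_over_optimization_penalty([0.9, 0.9, 0.9, 0.5, 0.5]))
```

The point of the bell curve is that a linear score rewards pushing a feature ever higher, which a generator can exploit, while a peaked score rewards matching the human baseline instead.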