feat: structured edit learning with typed patterns and confidence scoring
learn_edits.py: patterns now have type/key/description/rule fields, confidence auto-computed from occurrences + recency with 30-day decay. --summarize --json outputs aggregated patterns sorted by confidence.

learn-edits.md: playbook.md format changed from free text to structured YAML rules with confidence levels. Rules with confidence ≥ 5 become hard constraints in Step 4, < 5 are soft references, < 2 get pruned.

SKILL.md Step 4: playbook priority now confidence-gated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 2d3d8e5f54
commit 344f7509f1
3 changed files with 245 additions and 115 deletions
2
SKILL.md
|
|
@ -198,7 +198,7 @@ WebSearch: "{选题关键词} 数据 报告 2025 2026"
|
||||||
|
|
||||||
人格文件定义了:语气浓度、数据呈现方式、情绪弧线、段落节奏、不确定性表达模板等。作为 Step 4c 的硬性约束执行。
|
人格文件定义了:语气浓度、数据呈现方式、情绪弧线、段落节奏、不确定性表达模板等。作为 Step 4c 的硬性约束执行。
|
||||||
|
|
||||||
**优先级**:playbook.md > persona > writing-guide.md。writing-guide 是底线(禁用词等),persona 在此基础上特化风格参数,playbook 是用户个性化的最终覆盖。
|
**优先级**:playbook.md(confidence ≥ 5 的规则)> persona > writing-guide.md。writing-guide 是底线(禁用词等),persona 在此基础上特化风格参数,playbook 中高置信度规则是用户个性化的最终覆盖。playbook 中 confidence < 5 的规则作为软性参考。
|
||||||
|
|
||||||
**4c. 写文章**:
|
**4c. 写文章**:
|
||||||
- H1 标题(20-28 字) + H2 结构,1500-2500 字
|
- H1 标题(20-28 字) + H2 结构,1500-2500 字
|
||||||
|
|
|
||||||
|
|
@ -1,15 +1,15 @@
|
||||||
# 学习人工修改(核心飞轮)
|
# 学习人工修改(核心飞轮)
|
||||||
|
|
||||||
这是 WeWrite 最重要的长期价值。每次用户编辑文章后让系统学习,下一次的初稿就会更接近用户的风格,需要的编辑量越来越少。
|
这是 WeWrite 最重要的长期价值。每次用户编辑文章后让系统学习,下一次的初稿就会更接近用户的风格,需要的编辑量越来越少。
|
||||||
|
|
||||||
**飞轮效应**:初稿需要改 30% → 学习 5 次后只需改 15% → 学习 20 次后只需改 5%
|
**飞轮效应**:初稿需要改 30% → 学习 5 次后只需改 15% → 学习 20 次后只需改 5%
|
||||||
|
|
||||||
**触发**:用户说"我改了,学习一下"、"学习我的修改"
|
**触发**:用户说"我改了,学习一下"、"学习我的修改"
|
||||||
|
|
||||||
## 1. 获取 draft 和 final
|
## 1. 获取 draft 和 final
|
||||||
|
|
||||||
- **draft**:`output/` 下最新的 .md 文件(按修改时间排序,`ls -t output/*.md | head -1`)
|
- **draft**:`output/` 下最新的 .md 文件(按修改时间排序,`ls -t output/*.md | head -1`)
|
||||||
- **final**:用户提供修改后的版本。主动引导用户:"请把你改好的文章全文粘贴给我,或者告诉我文件路径。如果你是在微信后台编辑器里改的,可以全选复制后直接粘贴到这里。"
|
- **final**:用户提供修改后的版本。主动引导用户:"请把你改好的文章全文粘贴给我,或者告诉我文件路径。如果你是在微信后台编辑器里改的,可以全选复制后直接粘贴到这里。"
|
||||||
|
|
||||||
## 2. 运行 diff 分析
|
## 2. 运行 diff 分析
|
||||||
|
|
||||||
|
|
@ -17,35 +17,72 @@
|
||||||
python3 {skill_dir}/scripts/learn_edits.py --draft {draft_path} --final {final_path}
|
python3 {skill_dir}/scripts/learn_edits.py --draft {draft_path} --final {final_path}
|
||||||
```
|
```
|
||||||
|
|
||||||
## 3. 分析并记录
|
## 3. 分析并记录 pattern
|
||||||
|
|
||||||
读取脚本输出的 diff 数据,对每个有意义的修改分类:
|
读取脚本输出的 diff 数据和 INSTRUCTIONS FOR AGENT,对每个有意义的修改写入 pattern。
|
||||||
|
|
||||||
- **用词替换**:AI 用了"讲真",人工改成"坦白说"
|
**每个 pattern 必须包含**:
|
||||||
- **段落删除**:人工觉得某段多余
|
- `type`:`word_sub` / `para_delete` / `para_add` / `structure` / `title` / `tone` / `expression`
|
||||||
- **段落新增**:人工补充了 AI 没写的内容
|
- `key`:短唯一标识(英文,如 `avoid_jiangzhen`、`shorter_paragraphs`、`more_negative_emotion`)
|
||||||
- **结构调整**:H2 顺序或分段方式的变化
|
- `description`:这次修改是什么(如"把'讲真'替换为'坦白说'")
|
||||||
- **标题修改**:标题风格偏好
|
- `rule`:可执行的写作指令(**必须是祈使句,不是描述句**)
|
||||||
- **语气调整**:整体语气的偏移方向
|
|
||||||
|
|
||||||
将分类结果写入 `lessons/` 下的 diff YAML 文件的 edits 和 patterns 字段。
|
**key 的复用**:如果这次的修改和之前某个 lesson 里的 pattern 是同一种偏好(比如又一次把段落改短了),使用**相同的 key**。这样 `--summarize` 时 occurrences 会累加,confidence 自动提升。
|
||||||
|
|
||||||
## 4. 自动触发 Playbook 更新
|
编辑 lesson YAML 文件中的 `patterns` 列表,写入分析结果。
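
The key-reuse rule above can be illustrated with a minimal sketch. It assumes each lesson is a plain dict with a `patterns` list, as in the lesson YAML; `count_occurrences` is a hypothetical helper written for this illustration and is not the script's own aggregation function.

```python
from collections import Counter


def count_occurrences(lessons):
    """Count how often each pattern key appears across all lessons."""
    counts = Counter()
    for lesson in lessons:
        # `patterns` may be missing or empty in a freshly created lesson.
        for pattern in lesson.get("patterns") or []:
            key = pattern.get("key")
            if key:  # patterns without a key cannot be aggregated
                counts[key] += 1
    return dict(counts)


# Two lessons; the second one reuses the key "shorter_paragraphs".
lessons = [
    {"date": "2026-03-20",
     "patterns": [{"key": "shorter_paragraphs", "type": "expression"}]},
    {"date": "2026-03-28",
     "patterns": [{"key": "shorter_paragraphs", "type": "expression"},
                  {"key": "avoid_jiangzhen", "type": "word_sub"}]},
]
print(count_occurrences(lessons))
```

If the second lesson had used a different key for the same preference (say `short_paras`), both keys would stay at one occurrence and neither would gain confidence.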
|
||||||
|
|
||||||
每积累 5 次 lessons,自动触发 playbook 更新:
|
## 4. Playbook 更新
|
||||||
|
|
||||||
|
每积累 5 次 lessons,触发 playbook 更新:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python3 {skill_dir}/scripts/learn_edits.py --summarize
|
python3 {skill_dir}/scripts/learn_edits.py --summarize --json
|
||||||
```
|
```
|
||||||
|
|
||||||
脚本输出所有 lessons 的汇总数据。**Agent 必须执行以下步骤完成闭环**:
|
读取 JSON 输出,按以下规则更新 `{skill_dir}/playbook.md`:
|
||||||
|
|
||||||
1. 读取 summarize 输出,找出反复出现的 pattern(≥2 次)
|
### playbook.md 格式
|
||||||
2. 读取当前 `{skill_dir}/playbook.md`(如果不存在则从零创建)
|
|
||||||
3. **将 pattern 转化为可执行的写作规则**写入 playbook.md:
|
|
||||||
- 不要写"用户偏好简短段落"(描述性,不可执行)
|
|
||||||
- 要写"段落不超过 80 字,长段必须在 3 句内换行"(指令性,可执行)
|
|
||||||
- 每条规则必须是写作时能直接遵循的具体指令
|
|
||||||
4. 保存 playbook.md
|
|
||||||
|
|
||||||
**验证闭环**:playbook.md 更新后,下次写作时"Playbook 优先"规则会自动加载新 pattern,初稿会反映用户偏好。
|
playbook.md 是 YAML 格式,每条规则带 confidence 和元数据:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# WeWrite Playbook — 从用户编辑中学习的写作规则
|
||||||
|
# 由 Agent 自动维护,不要手动编辑
|
||||||
|
# confidence ≥ 5 的规则在 Step 4 写作时作为硬性约束执行
|
||||||
|
# confidence < 5 的规则作为软性参考
|
||||||
|
|
||||||
|
rules:
|
||||||
|
- key: "shorter_paragraphs"
|
||||||
|
type: "expression"
|
||||||
|
rule: "段落不超过 80 字,长段必须在 3 句内换行"
|
||||||
|
confidence: 7.0
|
||||||
|
occurrences: 4
|
||||||
|
last_seen: "2026-03-28"
|
||||||
|
|
||||||
|
- key: "avoid_jiangzhen"
|
||||||
|
type: "word_sub"
|
||||||
|
rule: "不要使用'讲真',用'坦白说'代替"
|
||||||
|
confidence: 5.0
|
||||||
|
occurrences: 2
|
||||||
|
last_seen: "2026-03-30"
|
||||||
|
```
|
||||||
|
|
||||||
|
### 更新规则
|
||||||
|
|
||||||
|
1. **新增**:summarize 中出现了 playbook 里没有的 key → 直接添加
|
||||||
|
2. **更新**:summarize 中的 confidence/occurrences/rule 比 playbook 里的新 → 用新值覆盖
|
||||||
|
3. **保留**:playbook 中有但 summarize 中没有的规则 → 保留不动(可能是早期学到的,仍然有效)
|
||||||
|
4. **衰减淘汰**:confidence < 2 的规则 → 删除(太旧或不再相关)
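
The four update rules can be sketched as a small merge function. This is only an illustration under stated assumptions: rules are plain dicts shaped like the playbook entries above, "newer" is interpreted as "the summarized `last_seen` date is not older than the playbook's", and `merge_playbook` is a hypothetical name, not a function in `learn_edits.py`.

```python
def merge_playbook(current_rules, summarized, prune_below=2.0):
    """Apply the add / update / keep / prune rules to playbook rule dicts."""
    merged = {rule["key"]: dict(rule) for rule in current_rules}
    for pattern in summarized:
        existing = merged.get(pattern["key"])
        # Rule 1 (add): the key is not in the playbook yet.
        # Rule 2 (update): the summary is at least as recent (assumed meaning
        # of "newer"); only the YYYY-MM-DD prefix is compared.
        if existing is None or pattern["last_seen"][:10] >= existing["last_seen"][:10]:
            merged[pattern["key"]] = dict(pattern)
    # Rule 3 (keep) is implicit: keys absent from the summary stay untouched.
    # Rule 4 (prune): drop rules whose confidence fell below the threshold.
    return [rule for rule in merged.values() if rule["confidence"] >= prune_below]


current = [
    {"key": "shorter_paragraphs", "confidence": 7.0, "occurrences": 4, "last_seen": "2026-03-28"},
    {"key": "stale_rule", "confidence": 1.0, "occurrences": 1, "last_seen": "2025-10-01"},
]
summary = [
    {"key": "shorter_paragraphs", "confidence": 8.0, "occurrences": 5, "last_seen": "2026-04-02"},
    {"key": "avoid_jiangzhen", "confidence": 5.0, "occurrences": 2, "last_seen": "2026-03-30"},
]
result = merge_playbook(current, summary)
print([(rule["key"], rule["confidence"]) for rule in result])
```

Here `shorter_paragraphs` is overwritten with the newer values, `avoid_jiangzhen` is added, and the hypothetical `stale_rule` is pruned because its confidence is below 2.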
|
||||||
|
|
||||||
|
## 5. Step 4 如何使用 playbook
|
||||||
|
|
||||||
|
Step 4 写作时读取 playbook.md:
|
||||||
|
|
||||||
|
- **confidence ≥ 5 的规则**:作为硬性约束执行(和 persona 同级)
|
||||||
|
- **confidence 3-5 的规则**:作为软性参考(倾向遵循但不强制)
|
||||||
|
- **confidence < 3 的规则**:忽略(可能已过时)
|
||||||
|
|
||||||
|
这确保:
|
||||||
|
- 用户反复确认的偏好(高 confidence)被严格执行
|
||||||
|
- 只出现过一次的偏好(低 confidence)不会过度影响
|
||||||
|
- 用户风格变化时,旧规则自然衰减退出
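
The confidence gate described in this section could be sketched as follows. The thresholds 5 and 3 come from the list above; treating exactly 3 as "soft" and exactly 5 as "hard" is an assumption about the boundaries, and `split_by_confidence` is a hypothetical helper name.

```python
def split_by_confidence(rules, hard_at=5.0, soft_at=3.0):
    """Split playbook rules into hard constraints, soft references, and ignored."""
    hard = [r for r in rules if r["confidence"] >= hard_at]
    soft = [r for r in rules if soft_at <= r["confidence"] < hard_at]
    ignored = [r for r in rules if r["confidence"] < soft_at]
    return hard, soft, ignored


rules = [
    {"key": "shorter_paragraphs", "confidence": 7.0},
    {"key": "avoid_jiangzhen", "confidence": 5.0},
    {"key": "more_negative_emotion", "confidence": 4.0},
    {"key": "one_off_edit", "confidence": 2.0},  # hypothetical low-confidence rule
]
hard, soft, ignored = split_by_confidence(rules)
print("hard:", [r["key"] for r in hard])
print("soft:", [r["key"] for r in soft])
print("ignored:", [r["key"] for r in ignored])
```

Only the `hard` list would be enforced like persona constraints; the `soft` list would be passed along as guidance.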
|
||||||
|
|
|
||||||
|
|
@ -3,16 +3,21 @@
|
||||||
Learn from human edits by diffing AI draft vs published final.
|
Learn from human edits by diffing AI draft vs published final.
|
||||||
|
|
||||||
Compares the original AI-generated article with the human-edited version,
|
Compares the original AI-generated article with the human-edited version,
|
||||||
categorizes the changes, and saves lessons to lessons/.
|
computes structured diffs, and saves typed lessons to lessons/.
|
||||||
|
|
||||||
When 5+ lessons accumulate, outputs a prompt for the Agent to update playbook.md.
|
Each lesson has:
|
||||||
|
- type: word_sub / para_delete / para_add / structure / title / tone
|
||||||
|
- occurrences: how many times this pattern has been seen across all lessons
|
||||||
|
- first_seen / last_seen: timestamps for confidence decay
|
||||||
|
- confidence: auto-computed from occurrences + recency
|
||||||
|
|
||||||
|
When summarizing, outputs all patterns with aggregated confidence scores.
|
||||||
|
The Agent uses this to write structured playbook.md rules.
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
python3 learn_edits.py --draft path/to/draft.md --final path/to/final.md
|
python3 learn_edits.py --draft path/to/draft.md --final path/to/final.md
|
||||||
python3 learn_edits.py --summarize # summarize all lessons
|
python3 learn_edits.py --summarize # all lessons with confidence
|
||||||
|
python3 learn_edits.py --summarize --json # JSON output for agent
|
||||||
The script does structural analysis; the Agent (LLM) interprets the diffs
|
|
||||||
and writes the lesson YAML + playbook updates.
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
|
|
@ -20,13 +25,24 @@ import difflib
|
||||||
import json
|
import json
|
||||||
import re
|
import re
|
||||||
import sys
|
import sys
|
||||||
from datetime import datetime
|
from datetime import datetime, timedelta
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
import yaml
|
import yaml
|
||||||
|
|
||||||
SKILL_DIR = Path(__file__).parent.parent
|
SKILL_DIR = Path(__file__).parent.parent
|
||||||
|
|
||||||
|
# Pattern types with descriptions
|
||||||
|
PATTERN_TYPES = {
|
||||||
|
"word_sub": "用词替换",
|
||||||
|
"para_delete": "段落删除",
|
||||||
|
"para_add": "段落新增",
|
||||||
|
"structure": "结构调整",
|
||||||
|
"title": "标题修改",
|
||||||
|
"tone": "语气调整",
|
||||||
|
"expression": "表达偏好",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
def load_text(path: str) -> str:
|
def load_text(path: str) -> str:
|
||||||
return Path(path).read_text(encoding="utf-8")
|
return Path(path).read_text(encoding="utf-8")
|
||||||
|
|
@ -36,7 +52,6 @@ def split_sections(text: str) -> list[dict]:
|
||||||
"""Split markdown into sections by H2 headers."""
|
"""Split markdown into sections by H2 headers."""
|
||||||
sections = []
|
sections = []
|
||||||
current = {"header": "(intro)", "lines": []}
|
current = {"header": "(intro)", "lines": []}
|
||||||
|
|
||||||
for line in text.split("\n"):
|
for line in text.split("\n"):
|
||||||
if line.strip().startswith("## "):
|
if line.strip().startswith("## "):
|
||||||
if current["lines"] or current["header"] != "(intro)":
|
if current["lines"] or current["header"] != "(intro)":
|
||||||
|
|
@ -44,7 +59,6 @@ def split_sections(text: str) -> list[dict]:
|
||||||
current = {"header": line.strip(), "lines": []}
|
current = {"header": line.strip(), "lines": []}
|
||||||
else:
|
else:
|
||||||
current["lines"].append(line)
|
current["lines"].append(line)
|
||||||
|
|
||||||
sections.append(current)
|
sections.append(current)
|
||||||
return sections
|
return sections
|
||||||
|
|
||||||
|
|
@ -61,44 +75,30 @@ def compute_diff(draft: str, final: str) -> dict:
|
||||||
draft_lines = draft.split("\n")
|
draft_lines = draft.split("\n")
|
||||||
final_lines = final.split("\n")
|
final_lines = final.split("\n")
|
||||||
|
|
||||||
# Line-level diff
|
|
||||||
differ = difflib.unified_diff(draft_lines, final_lines, lineterm="")
|
differ = difflib.unified_diff(draft_lines, final_lines, lineterm="")
|
||||||
diff_lines = list(differ)
|
diff_lines = list(differ)
|
||||||
|
|
||||||
# Categorize changes
|
additions = [l[1:].strip() for l in diff_lines
|
||||||
additions = []
|
if l.startswith("+") and not l.startswith("+++") and l[1:].strip()]
|
||||||
deletions = []
|
deletions = [l[1:].strip() for l in diff_lines
|
||||||
for line in diff_lines:
|
if l.startswith("-") and not l.startswith("---") and l[1:].strip()]
|
||||||
if line.startswith("+") and not line.startswith("+++"):
|
|
||||||
additions.append(line[1:].strip())
|
|
||||||
elif line.startswith("-") and not line.startswith("---"):
|
|
||||||
deletions.append(line[1:].strip())
|
|
||||||
|
|
||||||
# Filter empty lines
|
|
||||||
additions = [l for l in additions if l]
|
|
||||||
deletions = [l for l in deletions if l]
|
|
||||||
|
|
||||||
# Title change
|
|
||||||
draft_title = extract_title(draft)
|
draft_title = extract_title(draft)
|
||||||
final_title = extract_title(final)
|
final_title = extract_title(final)
|
||||||
title_changed = draft_title != final_title
|
|
||||||
|
|
||||||
# Section-level analysis
|
|
||||||
draft_sections = split_sections(draft)
|
draft_sections = split_sections(draft)
|
||||||
final_sections = split_sections(final)
|
final_sections = split_sections(final)
|
||||||
draft_h2s = [s["header"] for s in draft_sections if s["header"] != "(intro)"]
|
draft_h2s = [s["header"] for s in draft_sections if s["header"] != "(intro)"]
|
||||||
final_h2s = [s["header"] for s in final_sections if s["header"] != "(intro)"]
|
final_h2s = [s["header"] for s in final_sections if s["header"] != "(intro)"]
|
||||||
structure_changed = draft_h2s != final_h2s
|
|
||||||
|
|
||||||
# Word count change
|
|
||||||
draft_chars = len(draft.replace("\n", "").replace(" ", ""))
|
draft_chars = len(draft.replace("\n", "").replace(" ", ""))
|
||||||
final_chars = len(final.replace("\n", "").replace(" ", ""))
|
final_chars = len(final.replace("\n", "").replace(" ", ""))
|
||||||
|
|
||||||
return {
|
return {
|
||||||
"title_changed": title_changed,
|
"title_changed": draft_title != final_title,
|
||||||
"draft_title": draft_title,
|
"draft_title": draft_title,
|
||||||
"final_title": final_title,
|
"final_title": final_title,
|
||||||
"structure_changed": structure_changed,
|
"structure_changed": draft_h2s != final_h2s,
|
||||||
"draft_h2s": draft_h2s,
|
"draft_h2s": draft_h2s,
|
||||||
"final_h2s": final_h2s,
|
"final_h2s": final_h2s,
|
||||||
"lines_added": len(additions),
|
"lines_added": len(additions),
|
||||||
|
|
@ -111,22 +111,22 @@ def compute_diff(draft: str, final: str) -> dict:
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
def save_diff_for_analysis(diff_result: dict, draft_path: str, final_path: str):
|
def save_lesson(diff_result: dict, draft_path: str, final_path: str) -> Path:
|
||||||
"""Save diff data for Agent to analyze and write lessons."""
|
"""Save structured lesson data for Agent to analyze."""
|
||||||
lessons_dir = SKILL_DIR / "lessons"
|
lessons_dir = SKILL_DIR / "lessons"
|
||||||
lessons_dir.mkdir(parents=True, exist_ok=True)
|
lessons_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
date_str = datetime.now().strftime("%Y-%m-%d")
|
date_str = datetime.now().strftime("%Y-%m-%d")
|
||||||
diff_file = lessons_dir / f"{date_str}-diff.yaml"
|
lesson_file = lessons_dir / f"{date_str}-diff.yaml"
|
||||||
|
|
||||||
# If file exists, append a counter
|
|
||||||
counter = 1
|
counter = 1
|
||||||
while diff_file.exists():
|
while lesson_file.exists():
|
||||||
diff_file = lessons_dir / f"{date_str}-diff-{counter}.yaml"
|
lesson_file = lessons_dir / f"{date_str}-diff-{counter}.yaml"
|
||||||
counter += 1
|
counter += 1
|
||||||
|
|
||||||
data = {
|
data = {
|
||||||
"date": date_str,
|
"date": date_str,
|
||||||
|
"timestamp": datetime.now().isoformat(),
|
||||||
"draft_file": str(draft_path),
|
"draft_file": str(draft_path),
|
||||||
"final_file": str(final_path),
|
"final_file": str(final_path),
|
||||||
"diff_summary": {
|
"diff_summary": {
|
||||||
|
|
@ -138,45 +138,138 @@ def save_diff_for_analysis(diff_result: dict, draft_path: str, final_path: str):
|
||||||
"lines_deleted": diff_result["lines_deleted"],
|
"lines_deleted": diff_result["lines_deleted"],
|
||||||
"char_diff": diff_result["char_diff"],
|
"char_diff": diff_result["char_diff"],
|
||||||
},
|
},
|
||||||
"edits": [], # Agent fills this after analysis
|
# Agent fills these after analyzing the draft and final:
|
||||||
"patterns": [], # Agent fills this after analysis
|
"patterns": [],
|
||||||
|
# Pattern format (Agent writes):
|
||||||
|
# - type: "word_sub" # one of PATTERN_TYPES keys
|
||||||
|
# key: "avoid_jiangzhen" # short unique identifier
|
||||||
|
# description: "把'讲真'替换为'坦白说'"
|
||||||
|
# rule: "不要使用'讲真',用'坦白说'代替" # imperative, executable
|
||||||
}
|
}
|
||||||
|
|
||||||
with open(diff_file, "w", encoding="utf-8") as f:
|
with open(lesson_file, "w", encoding="utf-8") as f:
|
||||||
yaml.dump(data, f, allow_unicode=True, default_flow_style=False)
|
yaml.dump(data, f, allow_unicode=True, default_flow_style=False)
|
||||||
|
|
||||||
return diff_file
|
return lesson_file
|
||||||
|
|
||||||
|
|
||||||
def count_lessons() -> int:
|
def load_all_lessons() -> list[dict]:
|
||||||
"""Count existing lesson files."""
|
"""Load all lesson files."""
|
||||||
lessons_dir = SKILL_DIR / "lessons"
|
lessons_dir = SKILL_DIR / "lessons"
|
||||||
if not lessons_dir.exists():
|
if not lessons_dir.exists():
|
||||||
return 0
|
return []
|
||||||
return len(list(lessons_dir.glob("*-diff*.yaml")))
|
lessons = []
|
||||||
|
for f in sorted(lessons_dir.glob("*-diff*.yaml")):
|
||||||
|
|
||||||
def summarize_lessons():
|
|
||||||
"""Load all lessons and output for Agent to update playbook."""
|
|
||||||
lessons_dir = SKILL_DIR / "lessons"
|
|
||||||
if not lessons_dir.exists():
|
|
||||||
print("No lessons directory found.")
|
|
||||||
return
|
|
||||||
|
|
||||||
lesson_files = sorted(lessons_dir.glob("*-diff*.yaml"))
|
|
||||||
if not lesson_files:
|
|
||||||
print("No lessons found.")
|
|
||||||
return
|
|
||||||
|
|
||||||
all_lessons = []
|
|
||||||
for f in lesson_files:
|
|
||||||
with open(f, "r", encoding="utf-8") as fh:
|
with open(f, "r", encoding="utf-8") as fh:
|
||||||
data = yaml.safe_load(fh)
|
data = yaml.safe_load(fh)
|
||||||
if data:
|
if data:
|
||||||
all_lessons.append(data)
|
lessons.append(data)
|
||||||
|
return lessons
|
||||||
|
|
||||||
print(f"Total lessons: {len(all_lessons)}")
|
|
||||||
print(json.dumps(all_lessons, ensure_ascii=False, indent=2))
|
def compute_confidence(occurrences: int, first_seen: str, last_seen: str) -> float:
|
||||||
|
"""Compute confidence score from frequency and recency.
|
||||||
|
|
||||||
|
Confidence = base_from_occurrences + recency_bonus - age_decay.
|
||||||
|
|
||||||
|
- 1 occurrence = 3 (low, might be one-off)
|
||||||
|
- 2 occurrences = 5 (moderate, likely a preference)
|
||||||
|
- 3+ occurrences = 7+ (high, confirmed preference)
|
||||||
|
- Recency bonus: +1 if last_seen within 7 days
|
||||||
|
- Age decay: -1 per 30 days since last_seen (user style evolves)
|
||||||
|
- Clamped to 1-10
|
||||||
|
"""
|
||||||
|
base = min(8, 2 + occurrences * 2)
|
||||||
|
|
||||||
|
try:
|
||||||
|
last = datetime.fromisoformat(last_seen)
|
||||||
|
days_since = (datetime.now() - last).days
|
||||||
|
except (ValueError, TypeError):
|
||||||
|
days_since = 0
|
||||||
|
|
||||||
|
recency_bonus = 1.0 if days_since <= 7 else 0.0
|
||||||
|
age_decay = max(0, days_since // 30)
|
||||||
|
|
||||||
|
return max(1.0, min(10.0, base + recency_bonus - age_decay))
|
||||||
|
|
||||||
|
|
||||||
|
def aggregate_patterns(lessons: list[dict]) -> list[dict]:
|
||||||
|
"""Aggregate patterns across all lessons. Returns sorted by confidence."""
|
||||||
|
pattern_map = {} # key → aggregated data
|
||||||
|
|
||||||
|
for lesson in lessons:
|
||||||
|
date = lesson.get("date", "")
|
||||||
|
timestamp = lesson.get("timestamp", date)
|
||||||
|
for p in lesson.get("patterns", []):
|
||||||
|
key = p.get("key", "")
|
||||||
|
if not key:
|
||||||
|
continue
|
||||||
|
if key not in pattern_map:
|
||||||
|
pattern_map[key] = {
|
||||||
|
"key": key,
|
||||||
|
"type": p.get("type", "expression"),
|
||||||
|
"description": p.get("description", ""),
|
||||||
|
"rule": p.get("rule", ""),
|
||||||
|
"occurrences": 0,
|
||||||
|
"first_seen": timestamp,
|
||||||
|
"last_seen": timestamp,
|
||||||
|
}
|
||||||
|
entry = pattern_map[key]
|
||||||
|
entry["occurrences"] += 1
|
||||||
|
# Keep the most recent description/rule (may evolve)
|
||||||
|
if p.get("description"):
|
||||||
|
entry["description"] = p["description"]
|
||||||
|
if p.get("rule"):
|
||||||
|
entry["rule"] = p["rule"]
|
||||||
|
# Update timestamps
|
||||||
|
if timestamp < entry["first_seen"]:
|
||||||
|
entry["first_seen"] = timestamp
|
||||||
|
if timestamp > entry["last_seen"]:
|
||||||
|
entry["last_seen"] = timestamp
|
||||||
|
|
||||||
|
# Compute confidence for each
|
||||||
|
results = []
|
||||||
|
for entry in pattern_map.values():
|
||||||
|
entry["confidence"] = round(compute_confidence(
|
||||||
|
entry["occurrences"], entry["first_seen"], entry["last_seen"]
|
||||||
|
), 1)
|
||||||
|
results.append(entry)
|
||||||
|
|
||||||
|
# Sort by confidence descending
|
||||||
|
results.sort(key=lambda x: x["confidence"], reverse=True)
|
||||||
|
return results
|
||||||
|
|
||||||
|
|
||||||
|
def summarize_lessons(as_json: bool = False):
|
||||||
|
"""Load all lessons, aggregate patterns, output with confidence scores."""
|
||||||
|
lessons = load_all_lessons()
|
||||||
|
if not lessons:
|
||||||
|
print("No lessons found.")
|
||||||
|
return
|
||||||
|
|
||||||
|
patterns = aggregate_patterns(lessons)
|
||||||
|
|
||||||
|
if as_json:
|
||||||
|
print(json.dumps({
|
||||||
|
"total_lessons": len(lessons),
|
||||||
|
"total_patterns": len(patterns),
|
||||||
|
"patterns": patterns,
|
||||||
|
}, ensure_ascii=False, indent=2))
|
||||||
|
return
|
||||||
|
|
||||||
|
print(f"Total lessons: {len(lessons)}")
|
||||||
|
print(f"Unique patterns: {len(patterns)}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
for p in patterns:
|
||||||
|
type_label = PATTERN_TYPES.get(p["type"], p["type"])
|
||||||
|
conf_bar = "█" * int(p["confidence"]) + "░" * (10 - int(p["confidence"]))
|
||||||
|
print(f" {conf_bar} {p['confidence']:4.1f} [{type_label}] {p['key']}")
|
||||||
|
print(f" {p['description']}")
|
||||||
|
if p["rule"]:
|
||||||
|
print(f" → {p['rule']}")
|
||||||
|
print(f" seen {p['occurrences']}x, first {p['first_seen'][:10]}, last {p['last_seen'][:10]}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
|
||||||
def main():
|
def main():
|
||||||
|
|
@ -184,21 +277,19 @@ def main():
|
||||||
parser.add_argument("--draft", help="Path to AI draft")
|
parser.add_argument("--draft", help="Path to AI draft")
|
||||||
parser.add_argument("--final", help="Path to human-edited final")
|
parser.add_argument("--final", help="Path to human-edited final")
|
||||||
parser.add_argument("--summarize", action="store_true", help="Summarize all lessons")
|
parser.add_argument("--summarize", action="store_true", help="Summarize all lessons")
|
||||||
|
parser.add_argument("--json", action="store_true", help="JSON output (with --summarize)")
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
if args.summarize:
|
if args.summarize:
|
||||||
summarize_lessons()
|
summarize_lessons(as_json=args.json)
|
||||||
return
|
return
|
||||||
|
|
||||||
if not args.draft or not args.final:
|
if not args.draft or not args.final:
|
||||||
print("Error: --draft and --final required", file=sys.stderr)
|
print("Error: --draft and --final required", file=sys.stderr)
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|
||||||
# Load texts
|
|
||||||
draft = load_text(args.draft)
|
draft = load_text(args.draft)
|
||||||
final = load_text(args.final)
|
final = load_text(args.final)
|
||||||
|
|
||||||
# Compute diff
|
|
||||||
diff_result = compute_diff(draft, final)
|
diff_result = compute_diff(draft, final)
|
||||||
|
|
||||||
# Print summary
|
# Print summary
|
||||||
|
|
@ -230,43 +321,45 @@ def main():
|
||||||
for line in diff_result["additions_sample"][:10]:
|
for line in diff_result["additions_sample"][:10]:
|
||||||
print(f" + {line[:80]}")
|
print(f" + {line[:80]}")
|
||||||
|
|
||||||
# Save for Agent analysis
|
# Save lesson
|
||||||
diff_file = save_diff_for_analysis(diff_result, args.draft, args.final)
|
lesson_file = save_lesson(diff_result, args.draft, args.final)
|
||||||
print(f"\nDiff saved to: {diff_file}")
|
print(f"\nLesson saved to: {lesson_file}")
|
||||||
|
|
||||||
# Check if playbook update should be triggered
|
lesson_count = len(load_all_lessons())
|
||||||
lesson_count = count_lessons()
|
|
||||||
print(f"Total lessons: {lesson_count}")
|
print(f"Total lessons: {lesson_count}")
|
||||||
|
|
||||||
if lesson_count >= 5 and lesson_count % 5 == 0:
|
if lesson_count >= 5 and lesson_count % 5 == 0:
|
||||||
print(f"\n{'='*60}")
|
print(f"\n{'=' * 60}")
|
||||||
print("PLAYBOOK UPDATE TRIGGERED")
|
print("PLAYBOOK UPDATE TRIGGERED")
|
||||||
print(f"{'='*60}")
|
print(f"{'=' * 60}")
|
||||||
print(f"{lesson_count} lessons accumulated. Agent should:")
|
print(f"{lesson_count} lessons. Agent should run:")
|
||||||
print(f"1. Read all lessons: python3 learn_edits.py --summarize")
|
print(f" python3 scripts/learn_edits.py --summarize --json")
|
||||||
print(f"2. Read current playbook: playbook.md")
|
print(f"Then update playbook.md with high-confidence patterns.")
|
||||||
print(f"3. Update playbook with recurring patterns from lessons")
|
|
||||||
|
|
||||||
# Output instructions for Agent
|
# Instructions for Agent
|
||||||
print(f"""
|
print(f"""
|
||||||
{'='*60}
|
{'=' * 60}
|
||||||
INSTRUCTIONS FOR AGENT
|
INSTRUCTIONS FOR AGENT
|
||||||
{'='*60}
|
{'=' * 60}
|
||||||
|
|
||||||
Read the draft and final versions, then analyze the edits:
|
Read the draft and final versions, then for each meaningful edit:
|
||||||
|
|
||||||
1. Read: {args.draft}
|
1. Read: {args.draft}
|
||||||
2. Read: {args.final}
|
2. Read: {args.final}
|
||||||
3. For each meaningful edit, classify it:
|
3. For each edit, add a pattern entry to {lesson_file}:
|
||||||
- type: "用词替换" / "段落删除" / "段落新增" / "结构调整" / "标题修改" / "语气调整"
|
|
||||||
- before: (original text)
|
|
||||||
- after: (edited text)
|
|
||||||
- pattern: (what this tells us about the user's preference)
|
|
||||||
|
|
||||||
4. Update {diff_file} with the edits and patterns lists.
|
patterns:
|
||||||
|
- type: "word_sub" # {' / '.join(PATTERN_TYPES.keys())}
|
||||||
|
key: "short_unique_id" # e.g. "avoid_jiangzhen", "shorter_paragraphs"
|
||||||
|
description: "把'讲真'替换为'坦白说'"
|
||||||
|
rule: "不要使用'讲真',用'坦白说'代替" # imperative, executable
|
||||||
|
|
||||||
5. If this is a recurring pattern (seen in previous lessons too),
|
4. Rules must be imperative (可执行的指令), not descriptive.
|
||||||
consider updating playbook.md.
|
BAD: "用户偏好简短段落"
|
||||||
|
GOOD: "段落不超过 80 字,长段必须在 3 句内换行"
|
||||||
|
|
||||||
|
5. If pattern already exists in previous lessons (same key),
|
||||||
|
confidence will auto-increase on next --summarize.
|
||||||
""")
|
""")
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
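
As a worked example of the scoring in `compute_confidence`, the sketch below evaluates the expression exactly as it appears in the diff (`min(8, 2 + occurrences * 2)`, a +1 bonus within 7 days, minus 1 per full 30 days, clamped to 1-10). The `now` parameter and the name `confidence_at` are additions for this illustration so the result is reproducible; the script itself uses `datetime.now()` and also falls back to zero days when the timestamp cannot be parsed, which is omitted here.

```python
from datetime import datetime


def confidence_at(occurrences, last_seen, now):
    """Evaluate the diff's confidence formula at a fixed point in time."""
    base = min(8, 2 + occurrences * 2)
    days_since = (now - datetime.fromisoformat(last_seen)).days
    recency_bonus = 1.0 if days_since <= 7 else 0.0
    age_decay = max(0, days_since // 30)  # one point per full 30 days
    return max(1.0, min(10.0, base + recency_bonus - age_decay))


now = datetime(2026, 4, 1)
samples = [
    (1, "2026-03-30"),  # seen once, two days ago
    (2, "2026-03-30"),  # seen twice, two days ago
    (4, "2026-01-15"),  # seen four times, 76 days ago
    (1, "2025-09-01"),  # seen once, 212 days ago
]
for occurrences, last_seen in samples:
    print(occurrences, last_seen, confidence_at(occurrences, last_seen, now))
```

With this expression the base values for 1, 2, and 3 occurrences are 4, 6, and 8 before the recency terms are applied, and a rule seen once decays to the floor of 1.0 after roughly three months without reappearing.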