Dream Self-improving: Nightly Memory Distillation and Self-Evolution

🧠 v4.x integrates Long-Term RAG: MetaGPT-style merging of short-term and long-term memory, in which aged entries are automatically promoted to the RAG layer

Current Status

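Phase 1 (✅ implemented): the OpenClaw hippocampus hook listens to every message and writes it to memory/logs/ in real time
Phase 2 (✅ implemented): dream.py v4.x scheduled distillation + M-FLOW Bundle Search retrieval
Phase 3 (✅ implemented): dream.py v4.x Long-Term RAG layer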

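Core Upgrade: Long-Term RAG

Modeled on MetaGPT's RoleZeroLongTermMemory design, v4.x adds a mechanism that merges short-term and long-term memory:

short-term-recall.json  ←  active recall entries (capped at 200)
memory/.rag/longterm.jsonl  ←  RAG store for aged entries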

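Promotion conditions:

Entry age > 30 days (measured from the last recall time)
and recallCount < 3 (not recalled frequently)

Recall flow:

Before distillation, extract keywords from the day's high-weight entries
Query the RAG layer with those keywords to recall related old memories
Inject the old memories into the distillation context so the AI knows "what happened before"

Effect: each distillation builds on earlier context instead of starting from scratch, so memory becomes more precise over time.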

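End-to-End Pipeline

User conversation
   ↓
OpenClaw Hook: message:preprocessed
   ↓
Thalamus filtering → Amygdala tagging → Hippocampus storage (memory/logs/)
   ↓
cron trigger (7 AM / 10 PM)
dream.py v4.x
   ↓
[4.5] RAG query: extract keywords from today's entries → query memory/.rag/longterm.jsonl → inject into the distillation context
   ↓
Bundle Search retrieval (replaces plain grep)
   ↓
Amygdala tag fusion → Auditor review → analytical-cortex pattern recognition → prefrontal distillation planning
   ↓
[4.6] RAG promotion: entries older than 30 days with recallCount < 3 → written to longterm.jsonl
   ↓
Archive → truth-file write-back → dream report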

M-FLOW Core Architecture

Inverted-Cone Knowledge Graph

All memories are organized into a four-layer directed graph that forms an inverted cone:

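          Cone tip (easy to hit precisely)
             ↓
    ┌─────────────────────────┐
    │  L4 Entity              │  ← entity nodes such as user / project / system
    │  L3 FacetPoint          │  ← specific attributes, features, tags
    │  L2 Facet               │  ← a group of related features
    │  L1 Episode (cone base) │  ← the knowledge unit ultimately returned
    └─────────────────────────┘
          Cone base (returned to the user)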

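Search logic (Bundle Search):

Cast a wide net at the cone tip: the query is vectorized and searched in all four layers at once; each collection returns at most 100 candidates
Project into the graph: each hit serves as an entry point, and the surrounding subgraph (edges + neighbors + connections) is extracted
Cost propagation: costs propagate along edges from the cone tip toward the cone base; an Episode's score = the minimum cost over all paths that reach it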

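Three core design principles:

Principle	Description	Effect
Edges carry semantics	Every edge has a natural-language description that takes part in retrieval	Edges act as active semantic filters, not passive links
Minimum path cost	One strong evidence chain is enough to establish relevance	Scores are not diluted by irrelevant paths
Penalize direct Episode hits	A direct match on an Episode summary incurs a penalty	Favors precise anchor paths and prevents overly broad matches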
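
The cost-propagation step and the "minimum path cost" and "penalize direct hits" principles can be sketched as a multi-source shortest-path search. This is a minimal illustration, not the code in bundle-search.py: the function name, the toy node ids, the cost values, and the DIRECT_HIT_PENALTY constant are all assumptions, and each edge cost stands in for how poorly the query matches that edge's semantic description.

```python
import heapq

# Hypothetical penalty for matching an Episode summary directly.
DIRECT_HIT_PENALTY = 0.3


def propagate_costs(hits, edges, episodes):
    """Return {episode: minimum path cost} using multi-source Dijkstra.

    hits     -- {node: cost} entry points from the per-layer vector search
    edges    -- [(src, dst, cost)] directed from cone tip toward cone base
    episodes -- set of L1 Episode node ids
    """
    graph = {}
    for src, dst, cost in edges:
        graph.setdefault(src, []).append((dst, cost))

    heap = []
    for node, cost in hits.items():
        # A direct Episode hit starts with an extra penalty.
        start = cost + (DIRECT_HIT_PENALTY if node in episodes else 0.0)
        heap.append((start, node))
    heapq.heapify(heap)

    best = {}
    while heap:
        cost, node = heapq.heappop(heap)
        if node in best:
            continue  # already settled at a lower cost
        best[node] = cost
        for nxt, edge_cost in graph.get(node, []):
            if nxt not in best:
                heapq.heappush(heap, (cost + edge_cost, nxt))
    return {ep: best[ep] for ep in episodes if ep in best}


# Toy graph: two FacetPoints lead to one Episode through a shared Facet.
edges = [
    ("fp:error.timeout", "facet:correction_group", 0.05),
    ("fp:user.pref", "facet:correction_group", 0.60),
    ("facet:correction_group", "ep:demo-log", 0.05),
]
hits = {"fp:error.timeout": 0.10, "fp:user.pref": 0.40, "ep:demo-log": 0.15}
scores = propagate_costs(hits, edges, {"ep:demo-log"})
```

In this toy case the Episode is reachable three ways: through error.timeout (about 0.20), through user.pref (1.05), and by direct hit (0.45 after the penalty). The strong anchored path wins, and the weak path does not dilute the score.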

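Brain-Region Collaboration Architecture

① Thalamus: Attention Gating

Filters out pure greetings and simple confirmations so that only meaningful events are logged.
Tag types: event / decision / correction / completed / insight / error

② Amygdala: Emotional Tagging

correction / error / decision / completed / insight entries carry HIGH weight and are distilled first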

③ Hippocampus: M-FLOW Graph Storage + RAG

Phase 1: append logs to memory/logs/ (Episode layer)

Phase 2: build the M-FLOW graph structure:

Episode (L1)              ← daily log / topic file
  ↓ semantic edge
Facet (L2)                ← grouping: correction_group, project_xxx
  ↓ semantic edge  
FacetPoint (L3)           ← specific tag: error.timeout, user.pref
  ↓ semantic edge
Entity (L4)               ← user, project, tool, skill

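FacetPoint = a vector description built from type + topic + keywords (vectorized, then used in Bundle Search)
Semantic edge description = a natural-language explanation of "why this FacetPoint belongs to this Episode"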
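
As a rough illustration of the node and edge records described above, the sketch below models them with dataclasses. The class names, field names, the `facetpoint_text` helper, and the sample values are assumptions made for this example; the actual schema of the files under graph/ may differ.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    layer: str   # "Episode", "Facet", "FacetPoint", or "Entity"
    text: str    # the text that would be vectorized for Bundle Search


@dataclass
class Edge:
    src: str
    dst: str
    description: str  # natural-language reason for the link


def facetpoint_text(type_, topic, keywords):
    """Join type + topic + keywords into one descriptor string."""
    return f"{type_} | {topic} | {' '.join(keywords)}"


episode = Node("ep:demo-log", "Episode", "daily log (illustrative)")
facet = Node("facet:correction_group", "Facet", "corrections")
point = Node(
    "fp:error.timeout",
    "FacetPoint",
    facetpoint_text("error", "timeout", ["gateway", "retry"]),
)
edges = [
    Edge(episode.id, facet.id, "The log contains a correction from the user."),
    Edge(facet.id, point.id, "The correction concerned a gateway timeout."),
]
```

Keeping the description on the edge itself is what lets retrieval score the link, not just the two endpoints.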

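④ Prefrontal Cortex: Bundle Search + RAG Recall + Distillation Planning

Bundle Search replaces plain grep:

query → vectorize → four-layer cone search → cost propagation → minimum-cost-path Episode

RAG recall (new in v4.x):

today's keywords → query longterm.jsonl → recall related old memories → inject into the distillation context

⑤ Locus Coeruleus: Alertness and Freshness Signal

Freshness score: memories mentioned more recently carry higher weight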
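
The source does not give the formula behind the freshness score. One plausible shape, shown here purely as an assumption, is exponential decay with a half-life: a memory mentioned today scores 1.0 and the score halves every HALF_LIFE_DAYS. The constant and the function name are hypothetical.

```python
from datetime import date

HALF_LIFE_DAYS = 30  # hypothetical; not taken from dream.py


def freshness(last_mentioned, today, half_life_days=HALF_LIFE_DAYS):
    """Score in (0, 1]: 1.0 if mentioned today, halving every half-life."""
    days = max(0, (today - last_mentioned).days)
    return 0.5 ** (days / half_life_days)


today = date(2024, 3, 1)
recent = freshness(date(2024, 2, 28), today)
stale = freshness(date(2024, 1, 1), today)
```

Any monotonically decreasing function of elapsed time would satisfy the stated requirement; exponential decay is simply a common choice.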

Long-Term RAG Layer in Detail

Storage Layout

memory/
├── .dreams/
│   └── short-term-recall.json   # active recall entries (capped at 200)
└── .rag/
    └── longterm.jsonl          # RAG store for aged entries (JSONL format)

Promotion Mechanism

# Promotion condition
if age_days > 30 and recall_count < 3:
    promote_to_longterm_rag(entry)
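
The rule above can be turned into a small, self-contained sketch that partitions entries into those kept in short-term recall and those promoted. The `recallCount` field comes from the promotion conditions; the `lastRecalledAt` field name, the function name, and the sample entries are assumptions for illustration only.

```python
import json
from datetime import datetime

MAX_AGE_DAYS = 30
MIN_RECALLS = 3


def split_for_promotion(entries, now):
    """Partition entries into (kept, promoted) by age and recall count."""
    kept, promoted = [], []
    for entry in entries:
        # Age is measured from the last recall time.
        last = datetime.fromisoformat(entry["lastRecalledAt"])
        age_days = (now - last).days
        if age_days > MAX_AGE_DAYS and entry["recallCount"] < MIN_RECALLS:
            promoted.append(entry)
        else:
            kept.append(entry)
    return kept, promoted


entries = [
    {"id": "a", "lastRecalledAt": "2024-01-01T00:00:00", "recallCount": 1},
    {"id": "b", "lastRecalledAt": "2024-01-01T00:00:00", "recallCount": 5},
    {"id": "c", "lastRecalledAt": "2024-02-25T00:00:00", "recallCount": 0},
]
kept, promoted = split_for_promotion(entries, now=datetime(2024, 3, 1))

# One JSON object per line, as it would be appended to longterm.jsonl.
jsonl_lines = [json.dumps(entry, ensure_ascii=False) for entry in promoted]
```

Only entry "a" is promoted: "b" is old but frequently recalled, and "c" was recalled too recently.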

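Recall Mechanism

# Before distillation: build a query from today's tagged entries
keywords = [v['snippet'][:100] for v in tagged.values()][:20]
query = ' '.join(keywords[:5])
rag_results = query_longterm_rag(query, k=5)

# Inject the recalled results into the distillation context
rag_text = '\n'.join(str(r) for r in rag_results)  # assumed rendering; dream.py may format results differently
learnings['LEARNINGS.md'] += f"\n\n## Long-Term Memory (RAG)\n{rag_text}"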
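
For a sense of what a dependency-free query over JSONL might look like, here is a minimal bag-of-words sketch. It is not the implementation in longterm_rag.py: the function names, the `snippet` field, the overlap scoring, and the sample records are assumptions.

```python
import json
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"\w+", text.lower())


def query_longterm_sketch(lines, query, k=5):
    """Rank JSONL records by bag-of-words overlap with the query."""
    query_bag = Counter(tokenize(query))
    scored = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        bag = Counter(tokenize(record.get("snippet", "")))
        overlap = sum((query_bag & bag).values())
        if overlap:
            scored.append((overlap, record))
    # Stable sort: ties keep file order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]


lines = [
    '{"id": "a", "snippet": "gateway timeout fixed by retry"}',
    '{"id": "b", "snippet": "user prefers short answers"}',
    '{"id": "c", "snippet": "retry budget for gateway calls"}',
]
results = query_longterm_sketch(lines, "gateway timeout retry", k=2)
```

Note that `\w+` treats an unbroken run of CJK characters as a single token, so Chinese snippets would need a proper word segmenter before this kind of overlap scoring is useful.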

Manual Commands

# Show short-term / long-term memory status
python skills/dream-selfimproving/scripts/longterm_rag.py --status

# Manually promote aged entries
python skills/dream-selfimproving/scripts/longterm_rag.py --promote

# Search long-term memory
python skills/dream-selfimproving/scripts/longterm_rag.py --query "keyword"

Pattern Library

Patterns are reusable response templates extracted from recurring learnings:

memory/patterns/
└── p-xxx.md           # Pattern files with trigger + response

Pattern format (with M-FLOW metadata):

---
name: pattern name
trigger: when this pattern fires
response: how to respond
examples: [example 1, example 2]
created: YYYY-MM-DD
updated: YYYY-MM-DD
# M-FLOW metadata
entity: pattern          # L4 Entity
facets: [tag1, tag2]    # L3 FacetPoints
episode_id: p-xxx        # L1 Episode
---
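
Reading this front matter does not require a YAML library as long as the values stay as simple as in the template. The sketch below is a deliberately small parser written for this example; the function name and the sample pattern are hypothetical, and it would truncate any value that contains a `#` character.

```python
def parse_front_matter(text):
    """Parse simple `key: value` front matter between two `---` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as [tag1, tag2]
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key] = value
    return meta


sample = """---
name: retry-on-timeout
trigger: a gateway call times out
response: retry once, then report the error
# M-FLOW metadata
entity: pattern          # L4 Entity
facets: [error.timeout, tool.gateway]    # L3 FacetPoints
episode_id: p-xxx        # L1 Episode
---
body text
"""
pattern = parse_front_matter(sample)
```

The `facets` list is what would link a pattern file to L3 FacetPoints in the graph.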

Memory Taxonomy & M-FLOW Mapping

Memory Type	L4 Entity	L3 FacetPoints	L1 Episode
user	user.luyi	role, pref, goal, communication_style	topics/user_*.md
feedback	feedback	correction, error, insight, confirmation	topics/feedback_*.md
project	project.{name}	decision, tool, deadline, context	topics/project_*.md
reference	reference	credential, link, skill, system	topics/reference_*.md
longterm	(RAG)	aged, promoted	.rag/longterm.jsonl

Directory Structure (v4.x)

memory/
├── graph/                         # M-FLOW knowledge graph
│   ├── entities.json              # L4 Entity node list
│   ├── facetpoints.json           # L3 FacetPoint node list
│   ├── facets.json                # L2 Facet node list
│   ├── episodes.json              # L1 Episode node list
│   ├── edges.json                 # semantic edges (with description text)
│   └── index.json                 # graph index + vector anchors
├── logs/
│   └── YYYY/MM/YYYY-MM-DD.md     # Daily append-only logs (Episode)
├── topics/                        # Distilled topic memories
│   ├── user_xxx.md
│   ├── feedback_xxx.md
│   ├── project_xxx.md
│   └── reference_xxx.md
├── patterns/                      # Pattern Library
│   └── p-xxx.md
├── episodes/                      # Project narratives
├── .dreams/
│   └── short-term-recall.json     # active recall entries (capped at 200)
├── .rag/
│   └── longterm.jsonl             # Long-Term RAG (new in v4.x)
├── procedures.md                  # Workflow preferences
├── archive.md                     # Compressed old entries
├── dream-log.md                   # Dream cycle reports
└── MEMORY.md                      # INDEX only

.learnings/                        # self-improving-agent
├── LEARNINGS.md
├── ERRORS.md
└── FEATURE_REQUESTS.md

Health Score (v4.x)

Metric	Weight	Formula
Freshness	0.20	entries_referenced_last_30_days / total
Coverage	0.20	categories_updated_last_14_days / 10
Coherence	0.20	entries_with_semantic_edges / total
Graph Connectivity	0.20	connected_components_ratio
Efficiency	0.10	max(0, 1 - line_count/500)
Reachability	0.10	Bundle Search path coverage
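
Assuming the overall score is the weighted sum of the six metrics (the table lists weights that add up to 1.0 but does not spell out the aggregation), the calculation could look like the sketch below. The metric keys, function names, and input values are illustrative assumptions.

```python
WEIGHTS = {
    "freshness": 0.20,
    "coverage": 0.20,
    "coherence": 0.20,
    "graph_connectivity": 0.20,
    "efficiency": 0.10,
    "reachability": 0.10,
}


def efficiency(line_count):
    """Efficiency formula from the table: max(0, 1 - line_count/500)."""
    return max(0.0, 1 - line_count / 500)


def health_score(metrics):
    """Weighted sum of the metrics, each clamped to [0, 1]."""
    return sum(
        weight * min(1.0, max(0.0, metrics[name]))
        for name, weight in WEIGHTS.items()
    )


metrics = {
    "freshness": 0.5,            # entries referenced in last 30 days / total
    "coverage": 7 / 10,          # categories updated in last 14 days / 10
    "coherence": 0.5,            # entries with semantic edges / total
    "graph_connectivity": 1.0,
    "efficiency": efficiency(250),
    "reachability": 0.5,
}
score = health_score(metrics)
```

With these sample inputs the score comes to about 0.64. Clamping each metric keeps a single out-of-range value from pushing the total outside [0, 1].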

Dream Distillation Steps (v4.x)

When cron triggers:

Bundle Search warm-up: build a temporary graph from today's logs to quickly verify graph connectivity
Read memory/logs/{date}.md
Read .learnings/LEARNINGS.md, .learnings/ERRORS.md, .learnings/FEATURE_REQUESTS.md
Read MEMORY.md, topic files, graph/index.json, procedures.md for context
Snapshot BEFORE: count entries, decisions, lessons, procedures
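[4.5] RAG recall: extract keywords from today's entries → query longterm.jsonl → inject into the distillation context
Graph-augmented retrieval: run Bundle Search for each learnings entry to find related Episodes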
Distillation Agent: Run sub-agent on raw entries + learnings + RAG results → produce:
- 3-5 genuine insights ("I learned that...")
- 1-3 tomorrow action items
- 0-3 topic files to write to memory/topics/
- Health metric interpretation
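[4.6] RAG promotion: entries older than 30 days with recallCount < 3 → write to longterm.jsonl
Update the graph structure:
- write new Episodes to graph/episodes.json
- write new FacetPoints to graph/facetpoints.json
- write new edges to graph/edges.json (with semantic descriptions)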
Write topic files (from Distillation Agent output)
Update truth files (user_state.md, pending.md)
Update graph/index.json entry metadata + recompute vector anchors
Compute health metrics → update graph/index.json stats
Archive eligible entries → append to archive.md
Update MEMORY.md index (max 200 lines)
Snapshot AFTER: calculate deltas
Write dream report to memory/dreams/{date}.md and dream-log.md
[Optional SwarmRecall]: if an API key is configured, run cloud graph sync
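
The "Snapshot BEFORE" and "Snapshot AFTER" steps amount to counting the same categories twice and subtracting. A minimal sketch, assuming each snapshot is a plain dict of counts (the function name and the numbers are illustrative only):

```python
def snapshot_delta(before, after):
    """Per-category change between two count snapshots."""
    keys = sorted(set(before) | set(after))
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


before = {"entries": 40, "decisions": 6, "lessons": 9, "procedures": 3}
after = {"entries": 43, "decisions": 7, "lessons": 9, "procedures": 3}
deltas = snapshot_delta(before, after)
```

Categories missing from one snapshot count as zero, so a category that first appears during the cycle shows up as a positive delta.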

Dream Report Format (v4.x)

# 🌙 Dream Report — {date}

## M-FLOW Graph Status
- Entities: N | FacetPoints: N | Episodes: N | Edges: N
- Graph Connectivity: {score}% | Avg Path Cost: {cost:.3f}

## RAG Status
- Short-term recall: N entries | Long-term: M entries
- Promoted this cycle: N entries

## Health Insights
- {insight based on graph connectivity / Bundle Search coverage}

## Insights ("I Learned")
- {genuine insight 1}
- {genuine insight 2}

## Tomorrow's Focus
- {actionable item 1}
- {actionable item 2}

## Topic Files Written
- {filename}: {title}

## Graph Updates
- New episodes: N
- New semantic edges: N
- Pruned nodes: N

## Analysis
- Recurring errors found: {list}
- Root causes identified: {analysis}
- Bundle Search paths evaluated: {count}

## Patterns Updated
- {pattern_name}: {change}

User Prompts

"dream report" / "梦境报告" → read and display latest dream report
"dream" / "做梦" → run distillation now
"/dream status" → show M-FLOW graph stats, health score, pattern count
"/dream search {query}" → run Bundle Search and show top results
"/dream rag status" → show RAG status (from longterm_rag.py)

Scripts

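dream.py: Phase 2 distillation script (v4.x; M-FLOW Bundle Search + RAG recall/promotion)
update-cron-date.py: injects the date into the daily cron job
graph-builder.py: builds the M-FLOW graph structure from logs
bundle-search.py: Bundle Search retrieval implementation
longterm_rag.py: Long-Term RAG management script (new in v4.x)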

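Enabling Phase 1 (hippocampus hook)

Hook directory: ~/.openclaw/hooks/hippocampus/
Configured in openclaw.json: hooks.internal.entries.hippocampus: enabled: true

Function: listens for the message:preprocessed event and automatically records conversations to memory/logs/YYYY/MM/YYYY-MM-DD.md

Thalamus filtering rules:

Pure greetings / simple confirmations (< 20 characters) are not logged
High-weight tags: correction / error / decision / completed / insight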
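
The two filtering rules and the log path layout can be sketched in a few lines. This is an illustration under assumptions, not the hook's actual code: the TRIVIAL_PHRASES list, the "NORMAL" weight label, and all function names are hypothetical, and the real hook may detect greetings differently.

```python
from datetime import date
from pathlib import PurePosixPath

MIN_LENGTH = 20
TRIVIAL_PHRASES = {"hi", "hello", "thanks", "ok", "got it"}  # hypothetical list
HIGH_WEIGHT_TAGS = {"correction", "error", "decision", "completed", "insight"}


def should_log(message):
    """Thalamus gate: skip short greetings and simple confirmations."""
    text = message.strip()
    if not text:
        return False
    normalized = text.lower().rstrip("!.?")
    return not (len(text) < MIN_LENGTH and normalized in TRIVIAL_PHRASES)


def weight_for(tag):
    """Amygdala tagging: the listed tags get HIGH weight."""
    return "HIGH" if tag in HIGH_WEIGHT_TAGS else "NORMAL"


def log_path(day):
    """Daily log path in the memory/logs/YYYY/MM/YYYY-MM-DD.md layout."""
    return PurePosixPath("memory/logs") / f"{day:%Y}" / f"{day:%m}" / f"{day:%Y-%m-%d}.md"


path = log_path(date(2024, 1, 5))
```

A short message that is not on the phrase list still passes the gate here; whether the real filter is stricter is not stated in the source.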

Takes effect after restarting the gateway:

schtasks /run /tn "OpenClaw Gateway"

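M-FLOW vs. the Old Architecture

Dimension	Old architecture (flat retrieval)	M-FLOW (inverted-cone graph routing)
Storage structure	Flat file list	Four-layer directed graph
Retrieval method	grep / vector similarity	Bundle Search cost propagation
Relationship representation	Simple link references	Edges with semantic descriptions
Short-/long-term memory	No tiering	Entries age out after 30 days and are promoted to RAG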

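Comparison with MetaGPT

Dimension	MetaGPT RoleZeroLongTermMemory	Dream Long-Term RAG
RAG engine	Chroma + LLMRanker	JSONL + keyword matching
Recall trigger	memory_k overflow or user request	Before every distillation
Promotion condition	count > memory_k	age > 30 days and recallCount < 3
Vectorization	Embedding model	Bag-of-words model (simplified)
Complexity	Depends on Chroma / llama-index	Pure Python, no external dependencies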