memories: tighten consolidation prompt schema and indexing guidance (#12653)

## Summary
- tighten the Phase 2 consolidation prompt for task-oriented `MEMORY.md`
generation
- address Phase 2 under-coverage / "laziness" with stronger workflow +
final-pass checks
- improve recency/ordering behavior for `MEMORY.md` and
`memory_summary.md`
- rewrite `## What's in Memory` as a clearer routing index with explicit
recent-3-day structure

## Key Changes
- `MEMORY.md` schema cleanup:
- align on `## Task <n>` task sections (remove stale `task:`
rule/example references)
  - include `thread_id` in rollout provenance examples
  - compact comma-separated `### keywords` format
- Phase 2 completeness guardrails:
  - chunked INIT coverage pass over `raw_memories.md`
  - incremental net-new indexing / routing steps
- stronger final checks (day ordering, topic coverage, keyword
searchability, accidental duplication)
- Recency / ordering rules:
- clearer scan-order guidance for raw memories (newest-first bias in
incremental mode)
- utility+recency ordering guidance for `MEMORY.md` task groups and
summary topics
  - rebuild recent active window from current `updated_at` coverage
- `## What's in Memory` rewrite:
  - index/routing-layer framing (not a mini-handbook)
  - explicit recent 3 distinct memory-day layout
  - richer recent-topic entries + compact lower-priority routing entries
- clearer `desc` / `learnings` expectations and separation from `##
General Tips`
- Explicitly allow rollout-summary reuse across multiple tasks/blocks
when it supports distinct task angles (with distinct task-local value)

## Notes
- Prompt-template only:
`codex-rs/core/templates/memories/consolidation.md`
- No runtime/code changes

## Validation
- Manual diff review only
This commit is contained in:
zuxin-oai
2026-02-24 01:41:20 -08:00
committed by GitHub
parent 68a7d98363
commit 15f6cfb047

View File

@@ -99,9 +99,11 @@ Phase 2 has two operating styles:
Primary inputs (always read these, if exists):
Under `{{ memory_root }}/`:
- `raw_memories.md`
- mechanical merge of `raw_memories` from Phase 1;
- ordered latest-first; use this recency ordering as a major heuristic when choosing
what to promote, expand, or deprecate;
- mechanical merge of `raw_memories` from Phase 1; ordered latest-first.
- Use this recency ordering as a major heuristic when choosing what to promote, expand, or deprecate.
- Default scan order: top-to-bottom. In INCREMENTAL UPDATE mode, bias attention toward the newest
portion first, then expand to older entries with enough coverage to avoid missing important older
context.
- source of rollout-level metadata needed for MEMORY.md `### rollout_summary_files`
annotations;
you should be able to find `cwd` and `updated_at` there.
@@ -133,6 +135,8 @@ Rules:
signal determine the granularity and depth.
- Quality objective: for high-signal task families, `MEMORY.md` should be materially more
useful than `raw_memories.md` while remaining easy to navigate.
- Ordering objective: surface the most useful and most recently-updated validated memories
near the top of `MEMORY.md` and `memory_summary.md`.
============================================================
1) `MEMORY.md` FORMAT (STRICT)
@@ -166,15 +170,12 @@ Required task-oriented body shape (strict):
## Task 1: <task description, outcome>
task: <specific, searchable task signature; avoid fluff>
### rollout_summary_files
- <rollout_summaries/file1.md> (cwd=<path>, updated_at=<timestamp>, <optional status/usefulness note>)
- <rollout_summaries/file1.md> (cwd=<path>, updated_at=<timestamp>, thread_id=<thread_id>, <optional status/usefulness note>)
### keywords
- <task-local retrieval handles: tool names, error strings, repo concepts, APIs/contracts>
- <keyword1>, <keyword2>, <keyword3>, ... (single comma-separated line; task-local retrieval handles like tool names, error strings, repo concepts, APIs/contracts)
### learnings
@@ -187,8 +188,6 @@ task: <specific, searchable task signature; avoid fluff>
## Task 2: <task description, outcome>
task: <specific, searchable task signature; avoid fluff>
### rollout_summary_files
- ...
@@ -215,7 +214,7 @@ Schema rules (strict):
`## General Tips`.
- Keep all tasks and tips inside the task family implied by the block header.
- Keep entries retrieval-friendly, but not shallow.
- Do not emit placeholder values (`task: task`, `# Task Group: misc`, `scope: general`, etc.).
- Do not emit placeholder values (`# Task Group: misc`, `scope: general`, `## Task 1: task`, etc.).
- B) Task boundaries and clustering
- Primary organization unit is the task (`## Task <n>`), not the rollout file.
- Default mapping: one coherent rollout summary -> one MEMORY block -> one `## Task 1`.
@@ -226,6 +225,11 @@ Schema rules (strict):
task group and the task intent, technical context, and outcome pattern align.
- A single `## Task <n>` section may cite multiple rollout summaries when they are
iterative attempts or follow-up runs for the same task.
- A rollout summary file may appear in multiple `## Task <n>` sections (including across
different `# Task Group` blocks) when the same rollout contains reusable evidence for
distinct task angles; this is allowed.
- If a rollout summary is reused across tasks/blocks, each placement should add distinct
task-local learnings or routing value (not copy-pasted repetition).
- Do not cluster on keyword overlap alone.
- When in doubt, preserve boundaries (separate tasks/blocks) rather than over-cluster.
- C) Provenance and metadata
@@ -237,7 +241,6 @@ Schema rules (strict):
- Major learnings should be traceable to rollout summaries listed in the same task section.
- Order rollout references by freshness and practical usefulness.
- D) Retrieval and references
- `task:` lines must be specific and searchable.
- `### keywords` should be discriminative and task-local (tool names, error strings,
repo concepts, APIs/contracts).
- Put task-specific detail in `## Task <n>` and only deduplicated cross-task guidance in
@@ -246,8 +249,15 @@ Schema rules (strict):
`- Related skill: skills/<skill-name>/SKILL.md`).
- Use lowercase, hyphenated skill folder names.
- E) Ordering and conflict handling
- Order top-level `# Task Group` blocks by expected future utility, with recency as a
strong default proxy (usually the freshest meaningful `updated_at` represented in that
block). The top of `MEMORY.md` should contain the highest-utility / freshest task families.
- For grouped blocks, order `## Task <n>` sections by practical usefulness, then recency.
- Treat `updated_at` as a first-class signal: fresher validated evidence usually wins.
- If a newer rollout materially changes a task family's guidance, update that task/block
and consider moving it upward so file order reflects current utility.
- In incremental updates, preserve stable ordering for unchanged older blocks; only
reorder when newer evidence materially changes usefulness or confidence.
- If evidence conflicts and validation is unclear, preserve the uncertainty explicitly.
- In `## General Tips`, cite task references (`[Task 1]`, `[Task 2]`, etc.) when
merging, deduplicating, or resolving evidence.
@@ -261,7 +271,11 @@ What to write:
`memory_summary.md`.
- `MEMORY.md` should support related-but-not-identical tasks: slightly more general than a
rollout summary, but still operational and concrete.
- Use `raw_memories.md` as the routing layer; deep-dive into `rollout_summaries/*.md` when:
- Use `raw_memories.md` as the routing layer and task inventory.
- Before writing `MEMORY.md`, build a scratch mapping of `rollout_summary_file -> target
task group/task` from the full raw inventory so you can have a better overview.
Note that each rollout summary file can belong to multiple tasks.
- Then deep-dive into `rollout_summaries/*.md` when:
- the task is high-value and needs richer detail,
- multiple rollouts overlap and need conflict/staleness resolution,
- raw memory wording is too terse/ambiguous to consolidate confidently,
@@ -319,12 +333,63 @@ For example, include (when known):
## What's in Memory
This is a compact index to help future agents quickly find details in `MEMORY.md`,
`skills/`, and `rollout_summaries/`.
Organize by topic. Each bullet must include: topic, keywords, and a clear description.
Ordered by utility - which is the most likely to be useful for a future agent.
Do not target a fixed topic count. Cover the real high-signal areas and omit low-signal noise.
Prefer grouping by task family / workflow intent, not by incidental tools alone.
Treat it as a routing/index layer, not a mini-handbook:
- tell future agents what to search first,
- preserve enough specificity to route into the right `MEMORY.md` block quickly.
Recommended format:
Topic selection and quality rules:
- Organize by topic and split the index into a recent high-utility window and older topics.
- Do not target a fixed topic count. Include informative topics and omit low-signal noise.
- Prefer grouping by task family / workflow intent, not by incidental tool overlap alone.
- Order topics by utility, using `updated_at` recency as a strong default proxy unless there is
strong contrary evidence.
- Each topic bullet must include: topic, keywords, and a clear description.
- Keywords must be representative and directly searchable in `MEMORY.md`.
Prefer exact strings that a future agent can grep for (repo/project names, user query phrases,
tool names, error strings, commands, file paths, APIs/contracts). Avoid vague synonyms.
Required subsection structure (in this order):
### <most recent memory day: YYYY-MM-DD>
Recent Active Memory Window behavior (day-ordered):
- Define a "memory day" as a calendar date (derived from `updated_at`) that has at least one
represented memory/rollout in the current memory set.
- Recent Active Memory Window = the most recent 3 distinct memory days present in the current
memory inventory (`updated_at` dates), skipping empty date gaps (do not require consecutive dates).
- If fewer than 3 memory days exist, include all available memory days.
- For each recent-day subsection, prioritize informative, likely-to-recur topics and make
those entries richer (better keywords, clearer descriptions, and useful recent learnings);
do not spend much space on trivial tasks touched that day.
- Preserve routing coverage for `MEMORY.md` in the overall index. If a recent day includes
less useful topics, include shorter/compact entries for routing rather than dropping them.
- If a topic spans multiple recent days, list it under the most recent day it appears; do not
duplicate it under multiple day sections.
- Recent-day entries should be richer than older-topic entries: stronger keywords, clearer
descriptions, and concise recent learnings/change notes.
- Group similar tasks/topics together when it improves routing clarity.
- Do not over cluster topics together, especially when they contain distinct task intents.
Recent-topic format:
- <topic>: <keyword1>, <keyword2>, <keyword3>, ...
- desc: <clear and specific description of what tasks are inside this topic; what future task/user goal this helps with; what kinds of outcomes/artifacts/procedures are covered; and when to search this topic first>
- learnings: <some concise, topic-local recent takeaways / decision triggers / updates worth checking first; include useful specifics, but avoid overlap with `## General Tips` (cross-topic, broadly reusable guidance belongs there)>
### <2nd most recent memory day: YYYY-MM-DD>
Use the same format and keep it informative.
### <3rd most recent memory day: YYYY-MM-DD>
Use the same format and keep it informative.
### Older Memory Topics
All remaining high-signal topics not placed in the recent day subsections.
Avoid duplicating recent topics. Keep these compact and retrieval-oriented.
Older-topic format (compact):
- <topic>: <keyword1>, <keyword2>, <keyword3>, ...
- desc: <clear and specific description of what is inside this topic and when to use it>
@@ -332,10 +397,16 @@ Notes:
- Do not include large snippets; push details into MEMORY.md and rollout summaries.
- Prefer topics/keywords that help a future agent search MEMORY.md efficiently.
- Prefer clear topic taxonomy over verbose drill-down pointers.
- Keep descriptions explicit enough that a future model can decide which keyword cluster
to search first for a new user query.
- Topic descriptions should mention what is inside, when to use it, and what kind of
outcome/procedure depth is available (for example: runbook, diagnostics, reporting, recovery).
- This section is primarily an index to `MEMORY.md`; mention `skills/` / `rollout_summaries/`
only when they materially improve routing.
- Separation rule: recent-topic `learnings` should emphasize topic-local recent deltas,
caveats, and decision triggers; move cross-topic, stable, broadly reusable guidance to
`## General Tips`.
- Coverage guardrail: ensure every top-level `# Task Group` in `MEMORY.md` is represented by
at least one topic bullet in this index (either directly or via a clearly subsuming topic).
- Keep descriptions explicit: what is inside, when to use it, and what kind of
outcome/procedure depth is available (for example: runbook, diagnostics, reporting, recovery),
so a future agent can quickly choose which topic/keyword cluster to search first.
============================================================
3) `skills/` FORMAT (optional)
@@ -413,6 +484,10 @@ WORKFLOW
2) INIT phase behavior:
- Read `raw_memories.md` first, then rollout summaries carefully.
- In INIT mode, do a chunked coverage pass over `raw_memories.md` (top-to-bottom; do not stop
after only the first chunk).
- Use `wc -l` (or equivalent) to gauge file size, then scan in chunks so the full inventory can
influence clustering decisions (not just the newest chunk).
- Build Phase 2 artifacts from scratch:
- produce/refresh `MEMORY.md`
- create initial `skills/*` (optional but highly recommended)
@@ -424,17 +499,32 @@ WORKFLOW
3) INCREMENTAL UPDATE behavior:
- Treat `raw_memories.md` as the primary source of NEW signal.
- Read existing memory files first for continuity.
- Build an index of rollout references already present in existing `MEMORY.md` before
scanning raw memories so you can route net-new evidence into the right blocks.
- Compute net-new candidates from the raw-memory inventory (threads / rollout summaries /
updated evidence not already represented in `MEMORY.md`).
- Integrate new signal into existing artifacts by:
- scanning new raw memories in recency order and identifying which existing blocks they should update
- updating existing knowledge with better/newer evidence
- updating stale or contradicting guidance
- expanding terse old blocks when new summaries/raw memories make the task family clearer
- doing light clustering and merging if needed
- refreshing `MEMORY.md` top-of-file ordering so recent high-utility task families stay easy to find
- rebuilding the `memory_summary.md` recent active window (last 3 memory days) from current `updated_at` coverage
- updating existing skills or adding new skills only when there is clear new reusable procedure
- update `memory_summary.md` last to reflect the final state of the memory folder
- Minimize churn in incremental mode: if an existing `MEMORY.md` block or `## What's in Memory`
topic still reflects the current evidence and points to the same task family / retrieval
target, keep its wording, label, and relative order mostly stable. Rewrite/reorder/rename/
split/merge only when fixing a real problem (staleness, ambiguity, schema drift, wrong
boundaries) or when meaningful new evidence materially improves retrieval clarity/searchability.
4) Evidence deep-dive rule (both modes):
- `raw_memories.md` is the routing layer, not always the final authority for detail.
- Start by inventorying the real files on disk (`rg --files rollout_summaries` or
equivalent) and only open/cite rollout summaries from that set.
- If raw memory mentions a rollout summary file that is missing on disk, do not invent or
guess the file path in `MEMORY.md`; treat it as missing evidence and low confidence.
- When a task family is important, ambiguous, or duplicated across multiple rollouts,
open the relevant `rollout_summaries/*.md` files and extract richer procedural detail,
validation signals, and user feedback before finalizing `MEMORY.md`.
@@ -449,11 +539,22 @@ WORKFLOW
- if multiple summaries overlap for the same thread, keep the best one
7) Final pass:
- remove duplication in memory_summary, skills/, and MEMORY.md
- ensure any referenced skills/summaries actually exist
- ensure MEMORY blocks and "What's in Memory" use a consistent task-oriented taxonomy
- ensure recent important task families are easy to find (description + keywords + topic wording)
- if there is no net-new or higher-quality signal to add, keep changes minimal (no
- remove duplication in memory_summary, skills/, and MEMORY.md
- remove stale or low-signal blocks that are less likely to be useful in the future
- run a global rollout-reference audit on final `MEMORY.md` and fix accidental duplicate
entries / redundant repetition, while preserving intentional multi-task or multi-block
reuse when it adds distinct task-local value
- ensure any referenced skills/summaries actually exist
- ensure MEMORY blocks and "What's in Memory" use a consistent task-oriented taxonomy
- ensure recent important task families are easy to find (description + keywords + topic wording)
- verify `MEMORY.md` block order and `What's in Memory` section order reflect current
utility/recency priorities (especially the recent active memory window)
- verify `## What's in Memory` quality checks:
- recent-day headings are correctly day-ordered
- no accidental duplicate topic bullets across recent-day sections and `### Older Memory Topics`
- topic coverage still represents all top-level `# Task Group` blocks in `MEMORY.md`
- topic keywords are grep-friendly and likely searchable in `MEMORY.md`
- if there is no net-new or higher-quality signal to add, keep changes minimal (no
churn for its own sake).
You should dive deep and make sure you didn't miss any important information that might