Mirror of https://github.com/openai/codex.git (synced 2026-04-24 06:35:50 +00:00)
memories: tighten consolidation prompt schema and indexing guidance (#12653)
## Summary
- tighten the Phase 2 consolidation prompt for task-oriented `MEMORY.md` generation
- address Phase 2 under-coverage / "laziness" with stronger workflow + final-pass checks
- improve recency/ordering behavior for `MEMORY.md` and `memory_summary.md`
- rewrite `## What's in Memory` as a clearer routing index with explicit recent-3-day structure

## Key Changes
- `MEMORY.md` schema cleanup:
  - align on `## Task <n>` task sections (remove stale `task:` rule/example references)
  - include `thread_id` in rollout provenance examples
  - compact comma-separated `### keywords` format
- Phase 2 completeness guardrails:
  - chunked INIT coverage pass over `raw_memories.md`
  - incremental net-new indexing / routing steps
  - stronger final checks (day ordering, topic coverage, keyword searchability, accidental duplication)
- Recency / ordering rules:
  - clearer scan-order guidance for raw memories (newest-first bias in incremental mode)
  - utility+recency ordering guidance for `MEMORY.md` task groups and summary topics
  - rebuild recent active window from current `updated_at` coverage
- `## What's in Memory` rewrite:
  - index/routing-layer framing (not a mini-handbook)
  - explicit recent 3 distinct memory-day layout
  - richer recent-topic entries + compact lower-priority routing entries
  - clearer `desc` / `learnings` expectations and separation from `## General Tips`
- Explicitly allow rollout-summary reuse across multiple tasks/blocks when it supports distinct task angles (with distinct task-local value)

## Notes
- Prompt-template only: `codex-rs/core/templates/memories/consolidation.md`
- No runtime/code changes

## Validation
- Manual diff review only
@@ -99,9 +99,11 @@ Phase 2 has two operating styles:
Primary inputs (always read these, if exists):
Under `{{ memory_root }}/`:
- `raw_memories.md`
- mechanical merge of `raw_memories` from Phase 1;
- ordered latest-first; use this recency ordering as a major heuristic when choosing
what to promote, expand, or deprecate;
- mechanical merge of `raw_memories` from Phase 1; ordered latest-first.
- Use this recency ordering as a major heuristic when choosing what to promote, expand, or deprecate.
- Default scan order: top-to-bottom. In INCREMENTAL UPDATE mode, bias attention toward the newest
portion first, then expand to older entries with enough coverage to avoid missing important older
context.
- source of rollout-level metadata needed for MEMORY.md `### rollout_summary_files`
annotations;
you should be able to find `cwd` and `updated_at` there.
@@ -133,6 +135,8 @@ Rules:
signal determine the granularity and depth.
- Quality objective: for high-signal task families, `MEMORY.md` should be materially more
useful than `raw_memories.md` while remaining easy to navigate.
- Ordering objective: surface the most useful and most recently-updated validated memories
near the top of `MEMORY.md` and `memory_summary.md`.

============================================================
1) `MEMORY.md` FORMAT (STRICT)
@@ -166,15 +170,12 @@ Required task-oriented body shape (strict):

## Task 1: <task description, outcome>

task: <specific, searchable task signature; avoid fluff>

### rollout_summary_files

- <rollout_summaries/file1.md> (cwd=<path>, updated_at=<timestamp>, <optional status/usefulness note>)
- <rollout_summaries/file1.md> (cwd=<path>, updated_at=<timestamp>, thread_id=<thread_id>, <optional status/usefulness note>)

### keywords

- <task-local retrieval handles: tool names, error strings, repo concepts, APIs/contracts>
- <keyword1>, <keyword2>, <keyword3>, ... (single comma-separated line; task-local retrieval handles like tool names, error strings, repo concepts, APIs/contracts)

### learnings

@@ -187,8 +188,6 @@ task: <specific, searchable task signature; avoid fluff>

## Task 2: <task description, outcome>

task: <specific, searchable task signature; avoid fluff>

### rollout_summary_files

- ...
@@ -215,7 +214,7 @@ Schema rules (strict):
`## General Tips`.
- Keep all tasks and tips inside the task family implied by the block header.
- Keep entries retrieval-friendly, but not shallow.
- Do not emit placeholder values (`task: task`, `# Task Group: misc`, `scope: general`, etc.).
- Do not emit placeholder values (`# Task Group: misc`, `scope: general`, `## Task 1: task`, etc.).
- B) Task boundaries and clustering
- Primary organization unit is the task (`## Task <n>`), not the rollout file.
- Default mapping: one coherent rollout summary -> one MEMORY block -> one `## Task 1`.
@@ -226,6 +225,11 @@ Schema rules (strict):
task group and the task intent, technical context, and outcome pattern align.
- A single `## Task <n>` section may cite multiple rollout summaries when they are
iterative attempts or follow-up runs for the same task.
- A rollout summary file may appear in multiple `## Task <n>` sections (including across
different `# Task Group` blocks) when the same rollout contains reusable evidence for
distinct task angles; this is allowed.
- If a rollout summary is reused across tasks/blocks, each placement should add distinct
task-local learnings or routing value (not copy-pasted repetition).
- Do not cluster on keyword overlap alone.
- When in doubt, preserve boundaries (separate tasks/blocks) rather than over-cluster.
- C) Provenance and metadata
@@ -237,7 +241,6 @@ Schema rules (strict):
- Major learnings should be traceable to rollout summaries listed in the same task section.
- Order rollout references by freshness and practical usefulness.
- D) Retrieval and references
- `task:` lines must be specific and searchable.
- `### keywords` should be discriminative and task-local (tool names, error strings,
repo concepts, APIs/contracts).
- Put task-specific detail in `## Task <n>` and only deduplicated cross-task guidance in
@@ -246,8 +249,15 @@ Schema rules (strict):
`- Related skill: skills/<skill-name>/SKILL.md`).
- Use lowercase, hyphenated skill folder names.
- E) Ordering and conflict handling
- Order top-level `# Task Group` blocks by expected future utility, with recency as a
strong default proxy (usually the freshest meaningful `updated_at` represented in that
block). The top of `MEMORY.md` should contain the highest-utility / freshest task families.
- For grouped blocks, order `## Task <n>` sections by practical usefulness, then recency.
- Treat `updated_at` as a first-class signal: fresher validated evidence usually wins.
- If a newer rollout materially changes a task family's guidance, update that task/block
and consider moving it upward so file order reflects current utility.
- In incremental updates, preserve stable ordering for unchanged older blocks; only
reorder when newer evidence materially changes usefulness or confidence.
- If evidence conflicts and validation is unclear, preserve the uncertainty explicitly.
- In `## General Tips`, cite task references (`[Task 1]`, `[Task 2]`, etc.) when
merging, deduplicating, or resolving evidence.
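The recency proxy described in rule E above (order `# Task Group` blocks by the freshest `updated_at` they contain) might be sketched as below. This is an illustration only and not part of the commit, which changes the prompt template alone. The function name `order_task_groups`, the group names, and the timestamps are hypothetical, and the sketch covers only the recency default, not the utility judgment the template asks for.

```python
from datetime import datetime


def order_task_groups(groups: dict[str, list[str]]) -> list[str]:
    """Return group names with the freshest `updated_at` first."""

    def freshest(name: str) -> datetime:
        # Assumes ISO 8601 timestamps that all carry a UTC offset.
        return max(datetime.fromisoformat(ts) for ts in groups[name])

    return sorted(groups, key=freshest, reverse=True)


# Hypothetical task groups mapped to the updated_at values of their rollouts.
groups = {
    "ci triage": ["2026-02-11T08:30:00+00:00", "2026-02-03T22:05:00+00:00"],
    "release workflow": ["2026-02-20T09:15:00+00:00"],
    "docs cleanup": ["2026-02-17T11:00:00+00:00"],
}
print(order_task_groups(groups))
```

A real consolidation pass would still move a block up or down when its expected usefulness clearly differs from what recency alone suggests.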
@@ -261,7 +271,11 @@ What to write:
`memory_summary.md`.
- `MEMORY.md` should support related-but-not-identical tasks: slightly more general than a
rollout summary, but still operational and concrete.
- Use `raw_memories.md` as the routing layer; deep-dive into `rollout_summaries/*.md` when:
- Use `raw_memories.md` as the routing layer and task inventory.
- Before writing `MEMORY.md`, build a scratch mapping of `rollout_summary_file -> target
task group/task` from the full raw inventory so you can have a better overview.
Note that each rollout summary file can belong to multiple tasks.
- Then deep-dive into `rollout_summaries/*.md` when:
- the task is high-value and needs richer detail,
- multiple rollouts overlap and need conflict/staleness resolution,
- raw memory wording is too terse/ambiguous to consolidate confidently,
@@ -319,12 +333,63 @@ For example, include (when known):
## What's in Memory
This is a compact index to help future agents quickly find details in `MEMORY.md`,
`skills/`, and `rollout_summaries/`.
Organize by topic. Each bullet must include: topic, keywords, and a clear description.
Ordered by utility - which is the most likely to be useful for a future agent.
Do not target a fixed topic count. Cover the real high-signal areas and omit low-signal noise.
Prefer grouping by task family / workflow intent, not by incidental tools alone.
Treat it as a routing/index layer, not a mini-handbook:
- tell future agents what to search first,
- preserve enough specificity to route into the right `MEMORY.md` block quickly.

Recommended format:
Topic selection and quality rules:
- Organize by topic and split the index into a recent high-utility window and older topics.
- Do not target a fixed topic count. Include informative topics and omit low-signal noise.
- Prefer grouping by task family / workflow intent, not by incidental tool overlap alone.
- Order topics by utility, using `updated_at` recency as a strong default proxy unless there is
strong contrary evidence.
- Each topic bullet must include: topic, keywords, and a clear description.
- Keywords must be representative and directly searchable in `MEMORY.md`.
Prefer exact strings that a future agent can grep for (repo/project names, user query phrases,
tool names, error strings, commands, file paths, APIs/contracts). Avoid vague synonyms.

Required subsection structure (in this order):

### <most recent memory day: YYYY-MM-DD>

Recent Active Memory Window behavior (day-ordered):
- Define a "memory day" as a calendar date (derived from `updated_at`) that has at least one
represented memory/rollout in the current memory set.
- Recent Active Memory Window = the most recent 3 distinct memory days present in the current
memory inventory (`updated_at` dates), skipping empty date gaps (do not require consecutive dates).
- If fewer than 3 memory days exist, include all available memory days.
- For each recent-day subsection, prioritize informative, likely-to-recur topics and make
those entries richer (better keywords, clearer descriptions, and useful recent learnings);
do not spend much space on trivial tasks touched that day.
- Preserve routing coverage for `MEMORY.md` in the overall index. If a recent day includes
less useful topics, include shorter/compact entries for routing rather than dropping them.
- If a topic spans multiple recent days, list it under the most recent day it appears; do not
duplicate it under multiple day sections.
- Recent-day entries should be richer than older-topic entries: stronger keywords, clearer
descriptions, and concise recent learnings/change notes.
- Group similar tasks/topics together when it improves routing clarity.
- Do not over cluster topics together, especially when they contain distinct task intents.

Recent-topic format:
- <topic>: <keyword1>, <keyword2>, <keyword3>, ...
- desc: <clear and specific description of what tasks are inside this topic; what future task/user goal this helps with; what kinds of outcomes/artifacts/procedures are covered; and when to search this topic first>
- learnings: <some concise, topic-local recent takeaways / decision triggers / updates worth checking first; include useful specifics, but avoid overlap with `## General Tips` (cross-topic, broadly reusable guidance belongs there)>


### <2nd most recent memory day: YYYY-MM-DD>

Use the same format and keep it informative.

### <3rd most recent memory day: YYYY-MM-DD>

Use the same format and keep it informative.

### Older Memory Topics

All remaining high-signal topics not placed in the recent day subsections.
Avoid duplicating recent topics. Keep these compact and retrieval-oriented.

Older-topic format (compact):
- <topic>: <keyword1>, <keyword2>, <keyword3>, ...
- desc: <clear and specific description of what is inside this topic and when to use it>

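The Recent Active Memory Window rule in the hunk above (the most recent 3 distinct memory days, with empty date gaps skipped) can be illustrated with a short Python sketch. It is not part of the commit, which only edits the prompt template. The function name `recent_memory_days` and the sample timestamps are hypothetical, and the sketch assumes `updated_at` values are ISO 8601 strings whose calendar date is taken as written, without timezone normalization.

```python
from datetime import date, datetime


def recent_memory_days(updated_at_values: list[str], window: int = 3) -> list[date]:
    """Return up to `window` distinct calendar dates, newest first."""
    # A "memory day" is any date with at least one represented rollout,
    # so collapsing timestamps into a set of dates skips empty gaps.
    days = {datetime.fromisoformat(ts).date() for ts in updated_at_values}
    return sorted(days, reverse=True)[:window]


# Hypothetical updated_at values gathered from the memory inventory.
stamps = [
    "2026-02-20T09:15:00+00:00",
    "2026-02-20T18:40:00+00:00",
    "2026-02-17T11:00:00+00:00",
    "2026-02-11T08:30:00+00:00",
    "2026-02-03T22:05:00+00:00",
]

for day in recent_memory_days(stamps):
    print(f"### {day.isoformat()}")
```

With fewer than three distinct dates in the input, the slice simply returns all of them, which matches the "include all available memory days" rule.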
@@ -332,10 +397,16 @@ Notes:
- Do not include large snippets; push details into MEMORY.md and rollout summaries.
- Prefer topics/keywords that help a future agent search MEMORY.md efficiently.
- Prefer clear topic taxonomy over verbose drill-down pointers.
- Keep descriptions explicit enough that a future model can decide which keyword cluster
to search first for a new user query.
- Topic descriptions should mention what is inside, when to use it, and what kind of
outcome/procedure depth is available (for example: runbook, diagnostics, reporting, recovery).
- This section is primarily an index to `MEMORY.md`; mention `skills/` / `rollout_summaries/`
only when they materially improve routing.
- Separation rule: recent-topic `learnings` should emphasize topic-local recent deltas,
caveats, and decision triggers; move cross-topic, stable, broadly reusable guidance to
`## General Tips`.
- Coverage guardrail: ensure every top-level `# Task Group` in `MEMORY.md` is represented by
at least one topic bullet in this index (either directly or via a clearly subsuming topic).
- Keep descriptions explicit: what is inside, when to use it, and what kind of
outcome/procedure depth is available (for example: runbook, diagnostics, reporting, recovery),
so a future agent can quickly choose which topic/keyword cluster to search first.

============================================================
3) `skills/` FORMAT (optional)
@@ -413,6 +484,10 @@ WORKFLOW

2) INIT phase behavior:
- Read `raw_memories.md` first, then rollout summaries carefully.
- In INIT mode, do a chunked coverage pass over `raw_memories.md` (top-to-bottom; do not stop
after only the first chunk).
- Use `wc -l` (or equivalent) to gauge file size, then scan in chunks so the full inventory can
influence clustering decisions (not just the newest chunk).
- Build Phase 2 artifacts from scratch:
- produce/refresh `MEMORY.md`
- create initial `skills/*` (optional but highly recommended)
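The chunked coverage pass described for INIT mode above amounts to walking the whole file in fixed-size slices instead of stopping after the first one. A minimal Python sketch follows, assuming the file is read as a list of lines. The helper name `iter_chunks`, the chunk size of 200, and the stand-in data are hypothetical and do not come from the template.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `chunk_size` lines, top to bottom."""
    it = iter(lines)
    # islice returns an empty list once the input is exhausted, ending the loop.
    while chunk := list(islice(it, chunk_size)):
        yield chunk


# Hypothetical stand-in for the lines of raw_memories.md.
lines = [f"entry {i}" for i in range(450)]
chunk_sizes = [len(chunk) for chunk in iter_chunks(lines, chunk_size=200)]
print(chunk_sizes, sum(chunk_sizes) == len(lines))
```

Checking that the chunk sizes sum to the total line count (the figure `wc -l` would report) is a cheap way to confirm that no part of the inventory was skipped.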
@@ -424,17 +499,32 @@ WORKFLOW
3) INCREMENTAL UPDATE behavior:
- Treat `raw_memories.md` as the primary source of NEW signal.
- Read existing memory files first for continuity.
- Build an index of rollout references already present in existing `MEMORY.md` before
scanning raw memories so you can route net-new evidence into the right blocks.
- Compute net-new candidates from the raw-memory inventory (threads / rollout summaries /
updated evidence not already represented in `MEMORY.md`).
- Integrate new signal into existing artifacts by:
- scanning new raw memories in recency order and identifying which existing blocks they should update
- updating existing knowledge with better/newer evidence
- updating stale or contradicting guidance
- expanding terse old blocks when new summaries/raw memories make the task family clearer
- doing light clustering and merging if needed
- refreshing `MEMORY.md` top-of-file ordering so recent high-utility task families stay easy to find
- rebuilding the `memory_summary.md` recent active window (last 3 memory days) from current `updated_at` coverage
- updating existing skills or adding new skills only when there is clear new reusable procedure
- update `memory_summary.md` last to reflect the final state of the memory folder
- Minimize churn in incremental mode: if an existing `MEMORY.md` block or `## What's in Memory`
topic still reflects the current evidence and points to the same task family / retrieval
target, keep its wording, label, and relative order mostly stable. Rewrite/reorder/rename/
split/merge only when fixing a real problem (staleness, ambiguity, schema drift, wrong
boundaries) or when meaningful new evidence materially improves retrieval clarity/searchability.

4) Evidence deep-dive rule (both modes):
- `raw_memories.md` is the routing layer, not always the final authority for detail.
- Start by inventorying the real files on disk (`rg --files rollout_summaries` or
equivalent) and only open/cite rollout summaries from that set.
- If raw memory mentions a rollout summary file that is missing on disk, do not invent or
guess the file path in `MEMORY.md`; treat it as missing evidence and low confidence.
- When a task family is important, ambiguous, or duplicated across multiple rollouts,
open the relevant `rollout_summaries/*.md` files and extract richer procedural detail,
validation signals, and user feedback before finalizing `MEMORY.md`.
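The net-new indexing step and the on-disk inventory rule above combine into simple set arithmetic. The Python sketch below is illustrative only and makes several assumptions: rollout references appear in both files as literal `rollout_summaries/<name>.md` paths (the real layout of `raw_memories.md` is not shown in this diff), the regular expression and function names are hypothetical, and `on_disk` stands for whatever a listing such as `rg --files rollout_summaries` returns.

```python
import re

# Assumed reference shape: a relative path under rollout_summaries/ ending in .md.
ROLLOUT_REF = re.compile(r"rollout_summaries/[\w.\-]+\.md")


def referenced_rollouts(text: str) -> set[str]:
    """Collect every rollout summary path mentioned in `text`."""
    return set(ROLLOUT_REF.findall(text))


def net_new_candidates(
    memory_md: str, raw_memories: str, on_disk: set[str]
) -> tuple[list[str], list[str]]:
    """Return (net-new files to route, mentioned files missing on disk)."""
    indexed = referenced_rollouts(memory_md)
    mentioned = referenced_rollouts(raw_memories)
    net_new = (mentioned - indexed) & on_disk  # only cite files that exist
    missing = mentioned - on_disk              # low-confidence evidence
    return sorted(net_new), sorted(missing)


# Hypothetical inputs.
memory_md = "- rollout_summaries/a.md (cwd=/repo, updated_at=2026-02-17T11:00:00+00:00)"
raw_memories = "rollout_summaries/a.md\nrollout_summaries/b.md\nrollout_summaries/c.md"
on_disk = {"rollout_summaries/a.md", "rollout_summaries/b.md"}

print(net_new_candidates(memory_md, raw_memories, on_disk))
```

In this example `b.md` is net-new and exists, so it can be routed into a block, while `c.md` is mentioned but absent and would be treated as missing evidence rather than cited.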
@@ -449,11 +539,22 @@ WORKFLOW
- if multiple summaries overlap for the same thread, keep the best one

7) Final pass:
- remove duplication in memory_summary, skills/, and MEMORY.md
- ensure any referenced skills/summaries actually exist
- ensure MEMORY blocks and "What's in Memory" use a consistent task-oriented taxonomy
- ensure recent important task families are easy to find (description + keywords + topic wording)
- if there is no net-new or higher-quality signal to add, keep changes minimal (no
- remove duplication in memory_summary, skills/, and MEMORY.md
- remove stale or low-signal blocks that are less likely to be useful in the future
- run a global rollout-reference audit on final `MEMORY.md` and fix accidental duplicate
entries / redundant repetition, while preserving intentional multi-task or multi-block
reuse when it adds distinct task-local value
- ensure any referenced skills/summaries actually exist
- ensure MEMORY blocks and "What's in Memory" use a consistent task-oriented taxonomy
- ensure recent important task families are easy to find (description + keywords + topic wording)
- verify `MEMORY.md` block order and `What's in Memory` section order reflect current
utility/recency priorities (especially the recent active memory window)
- verify `## What's in Memory` quality checks:
- recent-day headings are correctly day-ordered
- no accidental duplicate topic bullets across recent-day sections and `### Older Memory Topics`
- topic coverage still represents all top-level `# Task Group` blocks in `MEMORY.md`
- topic keywords are grep-friendly and likely searchable in `MEMORY.md`
- if there is no net-new or higher-quality signal to add, keep changes minimal (no
churn for its own sake).

You should dive deep and make sure you didn't miss any important information that might
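Two of the `## What's in Memory` quality checks listed in the final pass above (day-ordered headings and no duplicate topic bullets) are mechanical enough to sketch in Python. This is a hedged illustration, not something the commit ships. The function name `check_index`, the regular expressions, and the sample section are hypothetical, and the sketch assumes topic bullets start at column zero while `desc` / `learnings` bullets are indented.

```python
import re
from collections import Counter

DAY_HEADING = re.compile(r"^### (\d{4}-\d{2}-\d{2})\s*$")
TOPIC_BULLET = re.compile(r"^- ([^:]+):")  # top-level bullets only


def check_index(section: str) -> list[str]:
    """Return a list of problems found in a `## What's in Memory` section."""
    days: list[str] = []
    topics: Counter[str] = Counter()
    for line in section.splitlines():
        if m := DAY_HEADING.match(line):
            days.append(m.group(1))
        elif m := TOPIC_BULLET.match(line):
            topics[m.group(1).strip().lower()] += 1

    problems = []
    # ISO dates sort lexicographically, so string comparison is enough here.
    if days != sorted(set(days), reverse=True):
        problems.append("recent-day headings are not distinct and newest-first")
    problems += [f"duplicate topic bullet: {t}" for t, n in topics.items() if n > 1]
    return problems


# Hypothetical index section with one topic repeated under Older Memory Topics.
sample = """\
### 2026-02-20
- release checklist: cargo publish, CHANGELOG.md
  - desc: steps for cutting a release
### 2026-02-17
- flaky ci triage: retry, timeout
  - desc: diagnosing intermittent CI failures
### Older Memory Topics
- release checklist: cargo publish
  - desc: older release notes
"""
print(check_index(sample))
```

The remaining checks in that list (task-group coverage and keyword searchability) depend on the contents of `MEMORY.md` and are left out of this sketch.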