codex/codex-rs/memories
ningyi-oai bee78806a9 [codex] add compaction metadata to turn headers (#24368)
## Summary
- Add `request_kind` values for foreground turn, startup prewarm,
compaction, and detached memory model requests.
- Attach compaction dispatch metadata to local Responses, legacy
`/v1/responses/compact`, and remote v2 compact requests.
- Add the existing logical context-window identifier as `window_id` on
turn-owned model request metadata.
- Keep identity fields optional for detached memory requests, while
still emitting `request_kind="memory"` in non-git/no-sandbox workspaces.

## Root Cause
`x-codex-turn-metadata` has more than one producer. Foreground turns and
compaction requests own a real turn and should carry that turn identity.
Detached memory stage-one requests do not own a foreground turn, so
absent identity fields are valid rather than missing data. Startup
websocket prewarm is also a model request, but it has `generate=false`
and must not be counted as a foreground turn.

`thread_source` or session source identifies where a thread came from
(for example review, guardian, or another subagent). `request_kind`
identifies what the current outbound model request is doing (`turn`,
`prewarm`, `compaction`, or `memory`). A review or guardian thread can
issue either a normal turn request or a compaction request, so source
cannot replace request kind.
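
As a rough illustration of why the two dimensions stay separate, the sketch below models them as independent enums. The type names (`RequestKind`, `ThreadSource`, `OutboundRequest`) and the `ThreadSource` variants are assumptions made for this example, not the actual types in the codebase; only the four wire values come from the description above.

```rust
/// What the current outbound model request is doing.
/// The wire strings mirror the `request_kind` values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestKind {
    Turn,
    Prewarm,
    Compaction,
    Memory,
}

impl RequestKind {
    /// Value as it would appear in the metadata header.
    pub fn as_str(self) -> &'static str {
        match self {
            RequestKind::Turn => "turn",
            RequestKind::Prewarm => "prewarm",
            RequestKind::Compaction => "compaction",
            RequestKind::Memory => "memory",
        }
    }

    /// Only ordinary turn requests are counted as foreground turns;
    /// prewarm is a model request but must not be counted.
    pub fn counts_as_foreground_turn(self) -> bool {
        matches!(self, RequestKind::Turn)
    }
}

/// Hypothetical: where the thread came from. Variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadSource {
    Review,
    Guardian,
    SubAgent,
    Other,
}

/// Hypothetical pairing: the same source can issue different kinds of request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OutboundRequest {
    pub source: ThreadSource,
    pub kind: RequestKind,
}
```

A review thread that triggers compaction would be `OutboundRequest { source: ThreadSource::Review, kind: RequestKind::Compaction }`, which the source alone could not express.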

## Behavior / Impact
- Ordinary foreground requests send `request_kind="turn"`, their real
identity fields, and `window_id="<thread_id>:<window_generation>"`.
- Startup websocket warmup requests send `request_kind="prewarm"` so
they are not counted as foreground turns.
- Compaction requests send `request_kind="compaction"`, their real
owning turn identity, the existing `window_id`, and
`compaction.{trigger,reason,implementation,phase,strategy}`.
- Detached memory stage-one requests send `request_kind="memory"`
without `session_id`, `thread_id`, `turn_id`, or `window_id`; when no
workspace metadata exists, the kind-only header is still emitted.
- `session_id`, `thread_id`, `turn_id`, and `window_id` remain optional
in the header schema because detached memory requests do not own a
foreground turn or context window.
- `window_id` is not a new ID system: it is copied from the already-sent
`x-codex-window-id` / WS client metadata value at model-request dispatch
time.
- Existing `x-codex-window-id` HTTP/WS emission, value format,
generation advancement, resume behavior, and fork reset behavior are
unchanged.
- `request_kind`, `window_id`, and upstream turn-owned identity fields
remain schema-owned; input `responsesapi_client_metadata` cannot replace
their canonical values.
- No table, DAG, export, app-server API, or MCP `_meta` schema changes
are included.
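
The optional identity fields and the override protection described above could be sketched roughly as follows. This is a minimal model, not the PR's implementation: `TurnIdentity`, `build_metadata`, and the flat string map are hypothetical, and `window_id` is passed in as an already-computed value because the PR copies it rather than deriving it.

```rust
use std::collections::BTreeMap;

/// Keys whose canonical values are schema-owned in this sketch.
const SCHEMA_OWNED: [&str; 5] = [
    "request_kind",
    "window_id",
    "session_id",
    "thread_id",
    "turn_id",
];

/// Hypothetical identity bundle for requests that own a turn.
/// `window_id` is copied from the existing window identifier, not recomputed.
pub struct TurnIdentity {
    pub session_id: String,
    pub thread_id: String,
    pub turn_id: String,
    pub window_id: String,
}

/// Build header metadata. `identity` is `None` for detached memory requests,
/// which then carry only `request_kind` plus any non-reserved client keys.
pub fn build_metadata(
    kind: &str,
    identity: Option<&TurnIdentity>,
    client: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    // Client metadata is copied first, skipping schema-owned keys so it
    // can never replace a canonical value.
    for (key, value) in client {
        if !SCHEMA_OWNED.iter().any(|owned| *owned == key.as_str()) {
            out.insert(key.clone(), value.clone());
        }
    }
    out.insert("request_kind".to_string(), kind.to_string());
    if let Some(id) = identity {
        out.insert("session_id".to_string(), id.session_id.clone());
        out.insert("thread_id".to_string(), id.thread_id.clone());
        out.insert("turn_id".to_string(), id.turn_id.clone());
        out.insert("window_id".to_string(), id.window_id.clone());
    }
    out
}
```

Dropping reserved client keys before inserting canonical values keeps a detached memory request from picking up a client-supplied `thread_id` when no real identity exists.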

A compaction attempt stopped by a pre-compact hook issues no model
request and therefore has no request header; its outcome remains in
analytics events. Status, error, duration, and token deltas also remain
analytics fields rather than request-header fields.

Future detached-memory attribution using a real initiating turn ID as
`trigger_turn_id` is intentionally not part of this PR.

## Sync With Main
- Final pushed head `716342e79` is rebased onto `origin/main@0d37db4b2`.
- The metadata conflict came from upstream `#24160`, which added
`forked_from_thread_id` on the same `turn_metadata` surface. Resolution
preserves that field and its protection from client metadata override
alongside this PR's request-kind, compaction, and window-id fields.
- While resolving the overlapping commits, I removed an accidental
recursive model-request overlay and a duplicate detached-memory header
builder before completing the rebase.

## Latency / User Experience Boundary
- Foreground turns perform no new filesystem, git, or network work. New
fields are inserted into metadata already serialized for outgoing
requests.
- Compaction issues the same model/HTTP requests with the same prompt,
model, service tier, and sampling settings; only metadata bytes change.
- Startup prewarm already sent metadata; it is now correctly classified
as `prewarm`.
- Non-git detached memory now sends a small kind-only metadata header
rather than no header.
- This client diff adds no user-visible latency mechanism beyond
negligible serialization and header bytes on already-existing requests.

## Validation
On conflict-resolved head `1d35c2cfb` based on `origin/main@487521733`:
- `just fmt` (passed)
- `just fix -p codex-core` (passed)
- `git diff --check origin/main...HEAD` (passed)
- `just test -p codex-core -E 'test(turn_metadata)
| test(websocket_first_turn_uses_startup_prewarm_and_create)
| test(responses_stream_includes_turn_metadata_header_for_git_workspace_e2e)
| test(responses_websocket_forwards_turn_metadata_on_initial_and_incremental_create)
| test(remote_compact_v2_retries_failures_with_stream_retry_budget)
| test(window_id_advances_after_compact_persists_on_resume_and_resets_on_fork)'`
(`23 passed`; `bench-smoke` passed)
- `just test -p codex-app-server -E
'test(turn_start_forwards_client_metadata_to_responses_request_v2)
| test(turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2)
| test(auto_compaction_remote_emits_started_and_completed_items)'`
(`3 passed`; `bench-smoke` passed)
- `just test -p codex-memories-write` (`29 passed`; `bench-smoke` passed)
2026-05-27 11:09:33 -07:00

Memories

This directory owns reusable memory crates and the memory pipeline documentation.

Runtime orchestration for Phase 1 and Phase 2 still lives in codex-core under codex-rs/core/src/memories/.

Crates

  • codex-rs/memories/read (codex-memories-read) owns the read path: memory developer-instruction injection, memory citation parsing, and read-usage telemetry classification.
  • codex-rs/memories/write (codex-memories-write) owns the write path: Phase 1 and Phase 2 prompt rendering, filesystem artifact helpers, workspace diff helpers, and extension resource pruning.

Prompt Templates

Memory prompt templates live with the crate that uses them:

  • The undated template files are the canonical latest versions used at runtime:
    • read/templates/memories/read_path.md
    • write/templates/memories/stage_one_system.md
    • write/templates/memories/stage_one_input.md
    • write/templates/memories/consolidation.md
  • In codex, edit those undated template files in place.
  • The dated snapshot-copy workflow is used in the separate openai/project/agent_memory/write harness repo, not here.

When it runs

The pipeline is triggered when a root session starts, and only if:

  • the session is not ephemeral
  • the memory feature is enabled
  • the session is not a sub-agent session
  • the state DB is available

It runs asynchronously in the background and executes two phases in order: Phase 1, then Phase 2.
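
The start conditions above amount to a single gate. A minimal sketch, assuming a hypothetical `SessionInfo` struct whose field names are invented for illustration:

```rust
/// Hypothetical summary of the session properties the gate looks at.
#[derive(Debug, Clone, Copy)]
pub struct SessionInfo {
    pub ephemeral: bool,
    pub memory_feature_enabled: bool,
    pub is_sub_agent: bool,
    pub state_db_available: bool,
}

/// Returns true only when every startup condition listed above holds.
pub fn should_start_memory_pipeline(session: &SessionInfo) -> bool {
    !session.ephemeral
        && session.memory_feature_enabled
        && !session.is_sub_agent
        && session.state_db_available
}
```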

Phase 1: Rollout Extraction (per-thread)

Phase 1 finds recent eligible rollouts and extracts a structured memory from each one.

Eligible rollouts are selected from the state DB using startup claim rules. In practice this means the pipeline only considers rollouts that are:

  • from allowed interactive session sources
  • within the configured age window
  • idle long enough (to avoid summarizing still-active/fresh rollouts)
  • not already owned by another in-flight phase-1 worker
  • within startup scan/claim limits (bounded work per startup)
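
The eligibility rules above can be read as a filter followed by bounded claiming. The sketch below is an in-memory approximation only: the real selection happens in the state DB, and `RolloutCandidate`, `ClaimRules`, and their fields are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical view of one rollout row read from the state DB.
pub struct RolloutCandidate {
    pub id: u32,
    pub from_interactive_source: bool,
    pub age: Duration,
    pub idle_for: Duration,
    pub leased_by_other_worker: bool,
}

/// Hypothetical startup claim limits; real configuration names may differ.
pub struct ClaimRules {
    pub max_age: Duration,
    pub min_idle: Duration,
    pub max_scanned: usize,
    pub max_claimed: usize,
}

/// A rollout is eligible when it passes every per-rollout rule.
pub fn is_eligible(c: &RolloutCandidate, rules: &ClaimRules) -> bool {
    c.from_interactive_source
        && c.age <= rules.max_age
        && c.idle_for >= rules.min_idle
        && !c.leased_by_other_worker
}

/// Scan at most `max_scanned` rows and claim at most `max_claimed` of the
/// eligible ones, so the work done per startup stays bounded.
pub fn claim_batch(candidates: &[RolloutCandidate], rules: &ClaimRules) -> Vec<u32> {
    candidates
        .iter()
        .take(rules.max_scanned)
        .filter(|c| is_eligible(c, rules))
        .take(rules.max_claimed)
        .map(|c| c.id)
        .collect()
}
```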

What it does:

  • claims a bounded set of rollout jobs from the state DB (startup claim)
  • filters rollout content down to memory-relevant response items
  • sends each rollout to a model (in parallel, with a concurrency cap)
  • expects structured output containing:
    • a detailed raw_memory
    • a compact rollout_summary
    • an optional rollout_slug
  • redacts secrets from the generated memory fields
  • stores successful outputs back into the state DB as stage-1 outputs

Concurrency / coordination:

  • Phase 1 runs multiple extraction jobs in parallel (with a fixed concurrency cap) so startup memory generation can process several rollouts at once.
  • Each job is leased/claimed in the state DB before processing, which prevents duplicate work across concurrent workers/startups.
  • Failed jobs are marked with retry backoff, so they are retried later instead of hot-looping.

Job outcomes:

  • succeeded (memory produced)
  • succeeded_no_output (valid run but nothing useful generated)
  • failed (with retry backoff/lease handling in DB)
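
One way to picture the outcomes and retry backoff is shown below. The README does not specify the backoff schedule, so the capped exponential delay here is purely an assumption, as are the `JobOutcome` and `retry_delay` names.

```rust
use std::time::Duration;

/// The three Phase 1 job outcomes listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum JobOutcome {
    /// A memory was produced.
    Succeeded,
    /// The run was valid but generated nothing useful.
    SucceededNoOutput,
    /// The run failed; `attempts` counts failures so far (hypothetical field).
    Failed { attempts: u32 },
}

/// Delay before a failed job may be retried, or `None` if no retry is needed.
/// Assumption: the delay doubles per failed attempt and is capped at `cap`.
pub fn retry_delay(outcome: &JobOutcome, base: Duration, cap: Duration) -> Option<Duration> {
    match outcome {
        JobOutcome::Failed { attempts } => {
            // Clamp the exponent so the shift cannot overflow a u32.
            let shift = attempts.saturating_sub(1).min(16);
            let delay = base.saturating_mul(1u32 << shift);
            Some(delay.min(cap))
        }
        JobOutcome::Succeeded | JobOutcome::SucceededNoOutput => None,
    }
}
```

Whatever the actual schedule is, the point is that a failed job gets a future retry time instead of being picked up again immediately.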

Phase 1 is the stage that turns individual rollouts into DB-backed memory records.

Phase 2: Global Consolidation

Phase 2 consolidates the latest stage-1 outputs into the filesystem memory artifacts and then runs a dedicated consolidation agent.

What it does:

  • claims a single global phase-2 lock before touching the memories root (so only one consolidation inspects or mutates the workspace at a time)
  • loads a bounded set of stage-1 outputs from the state DB using phase-2 selection rules:
    • ignores memories whose last_usage falls outside the configured max_unused_days window
    • for memories with no last_usage, falls back to generated_at so fresh never-used memories can still be selected
    • ranks eligible memories by usage_count first, then by the most recent last_usage / generated_at
  • computes a completion watermark from the claimed watermark + newest input timestamps
  • syncs local memory artifacts under the memories root:
    • raw_memories.md (merged raw memories, stable ascending thread-id order)
    • rollout_summaries/ (one summary file per selected rollout)
  • keeps the memories root itself as a git-baseline directory, initialized under ~/.codex/memories/.git by codex-git-utils
  • prunes stale rollout summaries that are no longer selected
  • prunes memory extension resource files older than the extension retention window, so cleanup appears in the workspace diff
  • writes phase2_workspace_diff.md in the memories root with the git-style diff from the previous successful Phase 2 baseline to the current worktree
  • if the memory workspace has no changes after artifact sync/pruning, marks the job successful and exits
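
The phase-2 selection rules in the list above (usage window, `generated_at` fallback, ranking) might look roughly like this in memory. The real selection runs against the state DB; `Stage1Output`, `select_for_phase2`, and the use of Unix-second timestamps are assumptions for illustration.

```rust
/// Hypothetical in-memory view of a stage-1 output row.
/// Timestamps are assumed to be Unix seconds.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Stage1Output {
    pub thread_id: String,
    pub usage_count: u64,
    pub last_usage: Option<i64>,
    pub generated_at: i64,
}

impl Stage1Output {
    /// `last_usage` when present, otherwise `generated_at`, so fresh
    /// never-used memories can still be selected.
    fn recency(&self) -> i64 {
        self.last_usage.unwrap_or(self.generated_at)
    }
}

/// Drop memories outside the `max_unused_days` window, rank by usage count
/// and then by recency (both descending), and keep the top `top_n`.
pub fn select_for_phase2(
    mut all: Vec<Stage1Output>,
    now: i64,
    max_unused_days: i64,
    top_n: usize,
) -> Vec<Stage1Output> {
    let cutoff = now - max_unused_days * 86_400;
    all.retain(|m| m.recency() >= cutoff);
    all.sort_by(|a, b| {
        b.usage_count
            .cmp(&a.usage_count)
            .then_with(|| b.recency().cmp(&a.recency()))
    });
    all.truncate(top_n);
    all
}
```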

If the memory workspace has changes, it then:

  • spawns an internal consolidation sub-agent
  • builds the Phase 2 prompt with the path to the generated workspace diff, pointing the agent at phase2_workspace_diff.md for the detailed diff context
  • runs it with no approvals, no network, and local write access only
  • disables collab for that agent (to prevent recursive delegation)
  • watches the agent status and heartbeats the global job lease while it runs
  • resets the memory git baseline after the agent completes successfully; the generated diff file is removed before this reset so deleted content is not kept in the prompt artifact or unreachable git objects
  • marks the phase-2 job as succeeded or failed in the state DB when the agent finishes

Selection and workspace-diff behavior:

  • successful Phase 2 runs mark the exact stage-1 snapshots they consumed with selected_for_phase2 = 1 and persist the matching selected_for_phase2_source_updated_at
  • Phase 1 upserts preserve the previous selected_for_phase2 baseline until the next successful Phase 2 run rewrites it
  • Phase 2 loads only the current top-N selected stage-1 inputs, syncs rollout_summaries/ directly to that selection, renders raw_memories.md in stable ascending thread-id order to avoid usage-rank churn, then lets the git-style workspace diff surface additions, modifications, and deletions against the previous successful memory baseline
  • when the selected input set is empty, stale rollout_summaries/ files are removed and raw_memories.md is rewritten to the empty-input placeholder; consolidated outputs such as MEMORY.md, memory_summary.md, and skills/ are left for the agent to update
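
The artifact sync described above has two mechanical parts: rendering raw_memories.md in a stable order and finding summary files that are no longer selected. The sketch below approximates both; the section layout, the placeholder text, and the function names are hypothetical and do not reflect the actual file format.

```rust
use std::collections::BTreeSet;

/// Hypothetical placeholder written when no stage-1 inputs are selected.
pub const EMPTY_PLACEHOLDER: &str = "(no raw memories)\n";

/// Render `(thread_id, raw_memory)` pairs in ascending thread-id order, so a
/// change in usage rank alone does not reorder the file and show up as a diff.
pub fn render_raw_memories(selected: &[(&str, &str)]) -> String {
    if selected.is_empty() {
        return EMPTY_PLACEHOLDER.to_string();
    }
    let mut ordered: Vec<&(&str, &str)> = selected.iter().collect();
    ordered.sort_by_key(|entry| entry.0);
    let mut out = String::new();
    for (thread_id, raw_memory) in ordered {
        // Illustrative layout only: one heading per thread, then its memory.
        out.push_str(&format!("## {thread_id}\n{raw_memory}\n"));
    }
    out
}

/// Summary files present on disk but absent from the current selection;
/// these are the stale files to prune from rollout_summaries/.
pub fn stale_summaries(on_disk: &BTreeSet<String>, selected: &BTreeSet<String>) -> Vec<String> {
    on_disk.difference(selected).cloned().collect()
}
```

Because the output depends only on the selected set and not on its rank order, re-running the sync with the same selection leaves the worktree clean, which is what lets the git-style diff act as the dirty check.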

Watermark behavior:

  • The global phase-2 lock does not use DB watermarks as a dirty check; git workspace dirtiness decides whether an agent needs to run.
  • The global phase-2 job row still tracks an input watermark as bookkeeping for the latest DB input timestamp known when the job was claimed.
  • Phase 2 recomputes a new_watermark using the max of:
    • the claimed watermark
    • the newest source_updated_at timestamp in the stage-1 inputs it actually loaded
  • On success, Phase 2 stores that completion watermark in the DB.
  • This avoids moving the recorded completion watermark backwards, but does not decide whether Phase 2 has work.
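
The watermark rule above reduces to taking a maximum. A minimal sketch, assuming integer timestamps and a hypothetical function name:

```rust
/// Completion watermark: the max of the claimed watermark and the newest
/// `source_updated_at` among the stage-1 inputs that were actually loaded.
/// With no loaded inputs the claimed watermark is returned unchanged, so the
/// recorded value never moves backwards.
pub fn completion_watermark(claimed: i64, loaded_source_updated_at: &[i64]) -> i64 {
    loaded_source_updated_at
        .iter()
        .copied()
        .fold(claimed, |newest, ts| newest.max(ts))
}
```

For example, a claimed watermark of 100 with loaded inputs at 90 and 120 yields 120, while inputs that are all older than 100 leave it at 100.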

In practice, this phase is responsible for refreshing the on-disk memory workspace and producing/updating the higher-level consolidated memory outputs.

Why it is split into two phases

  • Phase 1 scales across many rollouts and produces normalized per-rollout memory records.
  • Phase 2 serializes global consolidation so the shared memory artifacts are updated safely and consistently.