mirror of https://github.com/openai/codex.git synced 2026-05-28 15:00:16 +00:00

ningyi-oai bee78806a9 [codex] add compaction metadata to turn headers (#24368)

## Summary
- Add `request_kind` values for foreground turn, startup prewarm,
compaction, and detached memory model requests.
- Attach compaction dispatch metadata to local Responses, legacy
`/v1/responses/compact`, and remote v2 compact requests.
- Add the existing logical context-window identifier as `window_id` on
turn-owned model request metadata.
- Keep identity fields optional for detached memory requests, while
still emitting `request_kind="memory"` in non-git/no-sandbox workspaces.

## Root Cause
`x-codex-turn-metadata` has more than one producer. Foreground turns and
compaction requests own a real turn and should carry that turn identity.
Detached memory stage-one requests do not own a foreground turn, so
absent identity fields are valid rather than missing data. Startup
websocket prewarm is also a model request, but it has `generate=false`
and must not be counted as a foreground turn.

`thread_source` or session source identifies where a thread came from
(for example review, guardian, or another subagent). `request_kind`
identifies what the current outbound model request is doing (`turn`,
`prewarm`, `compaction`, or `memory`). A review or guardian thread can
issue either a normal turn request or a compaction request, so source
cannot replace request kind.
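
The distinction between thread source and request kind can be sketched as two independent enums. This is a minimal illustration, assuming hypothetical type names (`RequestKind`, `ThreadSource`, `label`); only the four string values and the review/guardian/subagent sources come from the description above.

```rust
/// What the current outbound model request is doing.
/// The four wire values are taken from the PR description; the Rust
/// type itself is a hypothetical illustration, not the codebase's type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RequestKind {
    Turn,
    Prewarm,
    Compaction,
    Memory,
}

impl RequestKind {
    /// String value carried as `request_kind` in the metadata header.
    pub fn as_str(self) -> &'static str {
        match self {
            RequestKind::Turn => "turn",
            RequestKind::Prewarm => "prewarm",
            RequestKind::Compaction => "compaction",
            RequestKind::Memory => "memory",
        }
    }
}

/// Where a thread came from. Hypothetical stand-in for `thread_source`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ThreadSource {
    Review,
    Guardian,
    OtherSubagent,
}

/// Source and kind vary independently, so a request is labelled with both.
pub fn label(source: ThreadSource, kind: RequestKind) -> String {
    format!("{:?}/{}", source, kind.as_str())
}
```

Under these assumptions, `label(ThreadSource::Review, RequestKind::Turn)` and `label(ThreadSource::Review, RequestKind::Compaction)` give different labels for the same source, which is why one field cannot stand in for the other.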

## Behavior / Impact
- Ordinary foreground requests send `request_kind="turn"`, their real
identity fields, and `window_id="<thread_id>:<window_generation>"`.
- Startup websocket warmup requests send `request_kind="prewarm"` so
they are not counted as foreground turns.
- Compaction requests send `request_kind="compaction"`, their real
owning turn identity, the existing `window_id`, and
`compaction.{trigger,reason,implementation,phase,strategy}`.
- Detached memory stage-one requests send `request_kind="memory"`
without `session_id`, `thread_id`, `turn_id`, or `window_id`; when no
workspace metadata exists, the kind-only header is still emitted.
- `session_id`, `thread_id`, `turn_id`, and `window_id` remain optional
in the header schema because detached memory requests do not own a
foreground turn or context window.
- `window_id` is not a new ID system: it is copied from the already-sent
`x-codex-window-id` / WS client metadata value at model-request dispatch
time.
- Existing `x-codex-window-id` HTTP/WS emission, value format,
generation advancement, resume behavior, and fork reset behavior are
unchanged.
- `request_kind`, `window_id`, and upstream turn-owned identity fields
remain schema-owned; input `responsesapi_client_metadata` cannot replace
their canonical values.
- No table, DAG, export, app-server API, or MCP `_meta` schema changes
are included.
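
The field rules above can be sketched as a small builder. This is an illustration under stated assumptions, not the PR's code: `TurnIdentity`, `build_turn_metadata`, and the flat string map are hypothetical, and the map stands in for whatever serialization the header actually uses. The `compaction.{...}` sub-fields are not modeled.

```rust
use std::collections::BTreeMap;

/// Identity of the turn that owns a request. Hypothetical helper type.
pub struct TurnIdentity<'a> {
    pub session_id: &'a str,
    pub thread_id: &'a str,
    pub turn_id: &'a str,
    pub window_generation: u64,
}

/// Keys described above as schema-owned: client metadata may not replace them.
const SCHEMA_OWNED_KEYS: [&str; 5] =
    ["request_kind", "session_id", "thread_id", "turn_id", "window_id"];

/// `window_id` format from the description: "<thread_id>:<window_generation>".
pub fn window_id(thread_id: &str, window_generation: u64) -> String {
    format!("{}:{}", thread_id, window_generation)
}

/// Builds the metadata fields for one outbound model request.
/// `identity` is `None` for detached memory requests, which then carry
/// only `request_kind` plus any non-protected client metadata.
pub fn build_turn_metadata(
    request_kind: &str,
    identity: Option<&TurnIdentity<'_>>,
    client_metadata: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    // Start from client metadata, dropping any attempt to set protected keys.
    let mut fields: BTreeMap<String, String> = client_metadata
        .iter()
        .filter(|(key, _)| !SCHEMA_OWNED_KEYS.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect();

    // Canonical values are always written by the builder itself.
    fields.insert("request_kind".to_string(), request_kind.to_string());
    if let Some(id) = identity {
        fields.insert("session_id".to_string(), id.session_id.to_string());
        fields.insert("thread_id".to_string(), id.thread_id.to_string());
        fields.insert("turn_id".to_string(), id.turn_id.to_string());
        fields.insert(
            "window_id".to_string(),
            window_id(id.thread_id, id.window_generation),
        );
    }
    fields
}
```

Filtering protected keys before inserting canonical values means a client-supplied `thread_id` cannot leak into a detached memory request, where the identity fields are legitimately absent.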

A compaction attempt stopped by a pre-compact hook issues no model
request and therefore has no request header; its outcome remains in
analytics events. Status, error, duration, and token deltas also remain
analytics fields rather than request-header fields.

Future detached-memory attribution using a real initiating turn ID as
`trigger_turn_id` is intentionally not part of this PR.

## Sync With Main
- Final pushed head `716342e79` is rebased onto `origin/main@0d37db4b2`.
- The metadata conflict came from upstream `#24160`, which added
`forked_from_thread_id` on the same `turn_metadata` surface. Resolution
preserves that field and its protection from client metadata override
alongside this PR's request-kind, compaction, and window-id fields.
- While resolving the overlapping commits, I removed an accidental
recursive model-request overlay and a duplicate detached-memory header
builder before completing the rebase.

## Latency / User Experience Boundary
- Foreground turns perform no new filesystem, git, or network work. New
fields are inserted into metadata already serialized for outgoing
requests.
- Compaction issues the same model/HTTP requests with the same prompt,
model, service tier, and sampling settings; only metadata bytes change.
- Startup prewarm already sent metadata; it is now correctly classified
as `prewarm`.
- Non-git detached memory now sends a small kind-only metadata header
rather than no header.
- This client diff adds no user-visible latency mechanism beyond
negligible serialization and header bytes on already-existing requests.

## Validation
On conflict-resolved head `1d35c2cfb` based on `origin/main@487521733`:
- `just fmt` (passed)
- `just fix -p codex-core` (passed)
- `git diff --check origin/main...HEAD` (passed)
- `just test -p codex-core -E 'test(turn_metadata) |
test(websocket_first_turn_uses_startup_prewarm_and_create) |
test(responses_stream_includes_turn_metadata_header_for_git_workspace_e2e)
|
test(responses_websocket_forwards_turn_metadata_on_initial_and_incremental_create)
| test(remote_compact_v2_retries_failures_with_stream_retry_budget) |
test(window_id_advances_after_compact_persists_on_resume_and_resets_on_fork)'`
(`23 passed`; `bench-smoke` passed)
- `just test -p codex-app-server -E
'test(turn_start_forwards_client_metadata_to_responses_request_v2) |
test(turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2)
| test(auto_compaction_remote_emits_started_and_completed_items)'` (`3
passed`; `bench-smoke` passed)
- `just test -p codex-memories-write` (`29 passed`; `bench-smoke`
passed)

2026-05-27 11:09:33 -07:00

| Name | Latest commit | Date |
| --- | --- | --- |
| read | chore: move memory prompt builder into extension (#24558) | 2026-05-26 11:53:47 +02:00 |
| write | [codex] add compaction metadata to turn headers (#24368) | 2026-05-27 11:09:33 -07:00 |
| README.md | chore: drop orphaned codex memories MCP crate (#24555) | 2026-05-26 11:29:37 +02:00 |

README.md

Memories

This directory owns reusable memory crates and the memory pipeline documentation.

Runtime orchestration for Phase 1 and Phase 2 still lives in codex-core under codex-rs/core/src/memories/.

Crates

- codex-rs/memories/read (codex-memories-read) owns the read path: memory developer-instruction injection, memory citation parsing, and read-usage telemetry classification.
- codex-rs/memories/write (codex-memories-write) owns the write path: Phase 1 and Phase 2 prompt rendering, filesystem artifact helpers, workspace diff helpers, and extension resource pruning.

Prompt Templates

Memory prompt templates live with the crate that uses them:

- The undated template files are the canonical latest versions used at runtime:
  - read/templates/memories/read_path.md
  - write/templates/memories/stage_one_system.md
  - write/templates/memories/stage_one_input.md
  - write/templates/memories/consolidation.md
- In codex, edit those undated template files in place.
- The dated snapshot-copy workflow is used in the separate openai/project/agent_memory/write harness repo, not here.

When it runs

The pipeline is triggered when a root session starts, and only if:

- the session is not ephemeral
- the memory feature is enabled
- the session is not a sub-agent session
- the state DB is available

It runs asynchronously in the background and executes two phases in order: Phase 1, then Phase 2.
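
The four startup conditions amount to a simple gate. The sketch below is illustrative only: `SessionInfo`, `SkipReason`, and `memory_pipeline_gate` are hypothetical names, and the real check in codex-core may be structured differently.

```rust
/// Startup facts the gate needs. Hypothetical struct for illustration.
#[derive(Clone, Copy)]
pub struct SessionInfo {
    pub is_ephemeral: bool,
    pub memory_feature_enabled: bool,
    pub is_sub_agent: bool,
    pub state_db_available: bool,
}

/// Why the pipeline was not started, one variant per listed condition.
#[derive(Debug, PartialEq, Eq)]
pub enum SkipReason {
    EphemeralSession,
    MemoryFeatureDisabled,
    SubAgentSession,
    StateDbUnavailable,
}

/// Returns `Ok(())` only when all four conditions hold.
pub fn memory_pipeline_gate(session: &SessionInfo) -> Result<(), SkipReason> {
    if session.is_ephemeral {
        return Err(SkipReason::EphemeralSession);
    }
    if !session.memory_feature_enabled {
        return Err(SkipReason::MemoryFeatureDisabled);
    }
    if session.is_sub_agent {
        return Err(SkipReason::SubAgentSession);
    }
    if !session.state_db_available {
        return Err(SkipReason::StateDbUnavailable);
    }
    Ok(())
}
```

Returning a reason rather than a bare `bool` is a design choice of this sketch; it makes a skipped startup easy to log.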

Phase 1: Rollout Extraction (per-thread)

Phase 1 finds recent eligible rollouts and extracts a structured memory from each one.

Eligible rollouts are selected from the state DB using startup claim rules. In practice this means the pipeline only considers rollouts that are:

- from allowed interactive session sources
- within the configured age window
- idle long enough (to avoid summarizing still-active/fresh rollouts)
- not already owned by another in-flight phase-1 worker
- within startup scan/claim limits (bounded work per startup)
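
As a rough sketch, the claim rules above behave like a bounded filter over candidate rollouts. The real selection happens in the state DB; the in-memory types here (`RolloutCandidate`, `ClaimLimits`, `select_for_claim`) and their field names are assumptions made for illustration.

```rust
use std::time::Duration;

/// One candidate rollout as seen at startup. Hypothetical shape.
pub struct RolloutCandidate {
    pub thread_id: String,
    pub source_allowed: bool,
    pub age: Duration,
    pub idle: Duration,
    pub claimed_by_other_worker: bool,
}

/// Configured bounds. Field names are illustrative.
pub struct ClaimLimits {
    pub max_age: Duration,
    pub min_idle: Duration,
    pub max_scanned: usize,
    pub max_claimed: usize,
}

/// Applies the eligibility rules with bounded scan and claim counts.
pub fn select_for_claim<'a>(
    candidates: &'a [RolloutCandidate],
    limits: &ClaimLimits,
) -> Vec<&'a RolloutCandidate> {
    candidates
        .iter()
        // Bound the work done per startup before any filtering.
        .take(limits.max_scanned)
        .filter(|c| {
            c.source_allowed
                && c.age <= limits.max_age
                && c.idle >= limits.min_idle
                && !c.claimed_by_other_worker
        })
        // Bound how many jobs a single startup may claim.
        .take(limits.max_claimed)
        .collect()
}
```

Applying the scan limit before the filter and the claim limit after it keeps both the inspection cost and the amount of claimed work bounded.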

What it does:

- claims a bounded set of rollout jobs from the state DB (startup claim)
- filters rollout content down to memory-relevant response items
- sends each rollout to a model (in parallel, with a concurrency cap)
- expects structured output containing:
  - a detailed raw_memory
  - a compact rollout_summary
  - an optional rollout_slug
- redacts secrets from the generated memory fields
- stores successful outputs back into the state DB as stage-1 outputs

Concurrency / coordination:

- Phase 1 runs multiple extraction jobs in parallel (with a fixed concurrency cap) so startup memory generation can process several rollouts at once.
- Each job is leased/claimed in the state DB before processing, which prevents duplicate work across concurrent workers/startups.
- Failed jobs are marked with retry backoff, so they are retried later instead of hot-looping.
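
The lease and backoff behavior can be illustrated with an in-memory stand-in for the job rows. Everything here is an assumption made for the sketch: the `JobTable` type, plain integer timestamps in seconds, and in particular the capped exponential backoff, since the text above only says that failed jobs get a retry backoff.

```rust
use std::collections::HashMap;

/// Per-job bookkeeping. Times are plain seconds to keep the sketch deterministic.
#[derive(Default)]
struct JobState {
    lease_until: u64,
    retry_at: u64,
    failures: u32,
}

/// In-memory stand-in for the state DB's job rows (hypothetical).
pub struct JobTable {
    jobs: HashMap<String, JobState>,
    lease_secs: u64,
    base_backoff_secs: u64,
}

impl JobTable {
    pub fn new(lease_secs: u64, base_backoff_secs: u64) -> Self {
        JobTable { jobs: HashMap::new(), lease_secs, base_backoff_secs }
    }

    /// Claims the job unless another worker holds a live lease or the job
    /// is still waiting out its retry backoff.
    pub fn try_claim(&mut self, thread_id: &str, now: u64) -> bool {
        let job = self.jobs.entry(thread_id.to_string()).or_default();
        if now < job.lease_until || now < job.retry_at {
            return false;
        }
        job.lease_until = now + self.lease_secs;
        true
    }

    /// Releases the lease and schedules a retry. The doubling schedule
    /// (capped at 64x the base) is an assumption of this sketch.
    pub fn mark_failed(&mut self, thread_id: &str, now: u64) {
        let job = self.jobs.entry(thread_id.to_string()).or_default();
        job.failures += 1;
        let exponent = (job.failures - 1).min(6);
        job.retry_at = now + self.base_backoff_secs * (1u64 << exponent);
        job.lease_until = 0;
    }
}
```

A second worker calling `try_claim` for the same thread while the lease is live gets `false`, which is the property that prevents duplicate extraction across concurrent startups.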

Job outcomes:

- succeeded (memory produced)
- succeeded_no_output (valid run but nothing useful generated)
- failed (with retry backoff/lease handling in DB)

Phase 1 is the stage that turns individual rollouts into DB-backed memory records.

Phase 2: Global Consolidation

Phase 2 consolidates the latest stage-1 outputs into the filesystem memory artifacts and then runs a dedicated consolidation agent.

What it does:

- claims a single global phase-2 lock before touching the memories root (so only one consolidation inspects or mutates the workspace at a time)
- loads a bounded set of stage-1 outputs from the state DB using phase-2 selection rules:
  - ignores memories whose last_usage falls outside the configured max_unused_days window
  - for memories with no last_usage, falls back to generated_at so fresh never-used memories can still be selected
  - ranks eligible memories by usage_count first, then by the most recent last_usage / generated_at
- computes a completion watermark from the claimed watermark + newest input timestamps
- syncs local memory artifacts under the memories root:
  - raw_memories.md (merged raw memories, stable ascending thread-id order)
  - rollout_summaries/ (one summary file per selected rollout)
- keeps the memories root itself as a git-baseline directory, initialized under ~/.codex/memories/.git by codex-git-utils
- prunes stale rollout summaries that are no longer selected
- prunes memory extension resource files older than the extension retention window, so cleanup appears in the workspace diff
- writes phase2_workspace_diff.md in the memories root with the git-style diff from the previous successful Phase 2 baseline to the current worktree
- if the memory workspace has no changes after artifact sync/pruning, marks the job successful and exits
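
The phase-2 selection rules in the list above can be sketched as a filter, a two-key sort, and a truncation. This is an illustration under assumptions: the `Stage1Output` shape, Unix-second timestamps, the `top_n` parameter, and the choice that higher `usage_count` ranks first are not confirmed by the text.

```rust
/// One stage-1 output row. Timestamps are Unix seconds (assumed).
pub struct Stage1Output {
    pub thread_id: String,
    pub usage_count: u64,
    pub last_usage: Option<i64>,
    pub generated_at: i64,
}

/// Recency used for both the window check and the tie-break:
/// `last_usage` when present, otherwise `generated_at`.
fn recency(output: &Stage1Output) -> i64 {
    output.last_usage.unwrap_or(output.generated_at)
}

/// Keeps memories inside the `max_unused_days` window, ranks them by
/// usage count and then recency, and returns at most `top_n`.
pub fn select_phase2_inputs(
    mut outputs: Vec<Stage1Output>,
    now: i64,
    max_unused_days: i64,
    top_n: usize,
) -> Vec<Stage1Output> {
    let cutoff = now - max_unused_days * 86_400;
    outputs.retain(|o| recency(o) >= cutoff);
    outputs.sort_by(|a, b| {
        b.usage_count
            .cmp(&a.usage_count)
            .then_with(|| recency(b).cmp(&recency(a)))
    });
    outputs.truncate(top_n);
    outputs
}

/// Order used when rendering raw_memories.md: ascending thread id,
/// independent of usage rank.
pub fn raw_memories_order(selected: &[Stage1Output]) -> Vec<&str> {
    let mut ids: Vec<&str> = selected.iter().map(|o| o.thread_id.as_str()).collect();
    ids.sort_unstable();
    ids
}
```

Rendering in thread-id order rather than rank order means a change in usage counts alone does not reshuffle raw_memories.md, which keeps the workspace diff focused on real content changes.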

If the memory workspace has changes, it then:

- spawns an internal consolidation sub-agent
- builds the Phase 2 prompt with the path to the generated workspace diff
- points the agent at phase2_workspace_diff.md for the detailed diff context
- runs it with no approvals, no network, and local write access only
- disables collab for that agent (to prevent recursive delegation)
- watches the agent status and heartbeats the global job lease while it runs
- resets the memory git baseline after the agent completes successfully; the generated diff file is removed before this reset so deleted content is not kept in the prompt artifact or unreachable git objects
- marks the phase-2 job success/failure in the state DB when the agent finishes

Selection and workspace-diff behavior:

- successful Phase 2 runs mark the exact stage-1 snapshots they consumed with selected_for_phase2 = 1 and persist the matching selected_for_phase2_source_updated_at
- Phase 1 upserts preserve the previous selected_for_phase2 baseline until the next successful Phase 2 run rewrites it
- Phase 2 loads only the current top-N selected stage-1 inputs, syncs rollout_summaries/ directly to that selection, renders raw_memories.md in stable ascending thread-id order to avoid usage-rank churn, then lets the git-style workspace diff surface additions, modifications, and deletions against the previous successful memory baseline
- when the selected input set is empty, stale rollout_summaries/ files are removed and raw_memories.md is rewritten to the empty-input placeholder; consolidated outputs such as MEMORY.md, memory_summary.md, and skills/ are left for the agent to update

Watermark behavior:

- The global phase-2 lock does not use DB watermarks as a dirty check; git workspace dirtiness decides whether an agent needs to run.
- The global phase-2 job row still tracks an input watermark as bookkeeping for the latest DB input timestamp known when the job was claimed.
- Phase 2 recomputes a new_watermark using the max of:
  - the claimed watermark
  - the newest source_updated_at timestamp in the stage-1 inputs it actually loaded
- On success, Phase 2 stores that completion watermark in the DB.
- This avoids moving the recorded completion watermark backwards, but does not decide whether Phase 2 has work.
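
The watermark rule reduces to taking a maximum, kept separate from the decision about whether the agent runs. A minimal sketch, assuming integer timestamps and hypothetical function names:

```rust
/// Completion watermark: the max of the claimed watermark and the newest
/// `source_updated_at` among the stage-1 inputs actually loaded.
/// Starting the fold from `claimed` means the result never moves backwards.
pub fn completion_watermark(claimed: i64, loaded_source_updated_at: &[i64]) -> i64 {
    loaded_source_updated_at
        .iter()
        .copied()
        .fold(claimed, i64::max)
}

/// Whether the consolidation agent needs to run. Only workspace dirtiness
/// matters; the watermark is bookkeeping and is deliberately not an input.
pub fn phase2_needs_agent(workspace_has_changes: bool) -> bool {
    workspace_has_changes
}
```

For example, a claimed watermark of 100 with loaded inputs at 90 and 120 yields 120, while loaded inputs that are all older than 100, or no inputs at all, leave it at 100.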

In practice, this phase is responsible for refreshing the on-disk memory workspace and producing/updating the higher-level consolidated memory outputs.

Why it is split into two phases

Phase 1 scales across many rollouts and produces normalized per-rollout memory records.
Phase 2 serializes global consolidation so the shared memory artifacts are updated safely and consistently.