Commit Graph

63 Commits

Author SHA1 Message Date
Eric Traut
0ee737cea6 Add goal persistence foundation (1 / 5) (#18073)
Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.

## Why

Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.

## What changed

- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.

## Verification

- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.
2026-04-24 20:51:38 -07:00
Ruslan Nigmatullin
19badb0be2 app-server: persist device key bindings in sqlite (#19206)
## Why

Device-key providers should only own platform key material. The
account/client binding used to authorize a signing payload is app-server
state, and keeping that state in provider-specific metadata makes the
same check harder to audit and harder to share across platform
implementations.

Persisting the binding in the shared state database gives the device-key
crate a platform-neutral source of truth before it asks a provider to
sign. It also lets app-server move potentially blocking key operations
off the main message processor path, which matters once providers may
wait for OS authentication prompts.

## What changed

- Add a `device_key_bindings` state migration plus `StateRuntime`
helpers keyed by `key_id`.
- Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
- Keep provider calls behind async store methods and run the synchronous
provider work through `spawn_blocking`.
- Wire app-server device-key RPC handling to the SQLite-backed binding
store and spawn response/error delivery tasks for device-key requests.
- Run the turn-start tracing test on the existing larger current-thread
test harness after the larger async surface made the default test stack
too small locally.

## Validation

- `cargo test -p codex-device-key`
- `cargo test -p codex-state device_key`
- `cargo test -p codex-state`
- `cargo test -p codex-app-server device_key`
- `cargo test -p codex-app-server
message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
- `cargo test -p codex-app-server`
- `just fix -p codex-device-key`
- `just fix -p codex-state`
- `just fix -p codex-app-server`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `git diff --check`
2026-04-23 21:55:56 -07:00
jif-oai
e1c289e11b feat: log client use min log level (#18661)
In the log client, use the log level filter as a minimum severity
instead of exact match

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-20 14:40:39 +01:00
David de Regt
eaf78e43f2 Add sorting/backwardsCursor to thread/list and new thread/turns/list api (#17305)
To improve performance of UI loads from the app, add two main
improvements:
1. The `thread/list` api now gets a `sortDirection` request field and a
`backwardsCursor` to the response, which lets you paginate forwards and
backwards from a window. This lets you fetch the first few items to
display immediately while you paginate to fill in history, then can
paginate "backwards" on future loads to catch up with any changes since
the last UI load without a full reload of the entire data set.
2. Added a new `thread/turns/list` api which also has sortDirection and
backwardsCursor for the same behavior as `thread/list`, allowing you the
same small-fetch for immediate display followed by background fill-in
and resync catchup.
2026-04-17 11:49:02 -07:00
David de Regt
6adba99f4d Stabilize Bazel tests (timeout tweaks and flake fixes) (#17791) 2026-04-16 07:57:51 -07:00
David de Regt
4f2fc3e3fa Moving updated-at timestamps to unique millisecond times (#17489)
To allow the ability to have guaranteed-unique cursors, we make two
important updates:
* Add new updated_at_ms and created_at_ms columns that are in
millisecond precision
* Guarantee uniqueness -- if multiple items are inserted at the same
millisecond, bump the new one by one millisecond until it becomes unique

This lets us use single-number cursors for forwards and backwards paging
through resultsets and guarantee that the cursor is a fixed point to do
(timestamp > cursor) and get new items only.

This updated implementation is backwards-compatible since multiple
appservers can be running and won't handle the previous method well.
2026-04-14 11:55:34 -04:00
Ruslan Nigmatullin
73dab2046f app-server: Add transport for remote control (#15951) 2026-04-06 14:55:59 -07:00
Owen Lin
9bb813353e fix(sqlite): don't hard fail migrator if DB is newer (#16924)
## Description

This PR makes the SQLite state runtime tolerate databases that have
already been migrated by a newer Codex binary.

Today, if an older CLI sees migration versions in `_sqlx_migrations`
that it doesn't know about, startup fails. This change relaxes that
check for the runtime migrators we use in `codex-state` so older
binaries can keep opening the DB in that case.

## Why

We can end up with mixed-version CLIs running against the same local
state DB. In that setup, treating "the database is ahead of me" as a
hard error is unnecessarily strict and breaks the older client even when
the migration history is otherwise fine.

## Follow-up

We still clean up versioned `state_*.sqlite` and `logs_*.sqlite` files
during init, so older binaries can treat newer DB files as legacy. That
should probably be tightened separately if we want mixed-version local
usage to be fully safe.
2026-04-06 12:16:31 -07:00
jif-oai
f839f3ff2e feat: auto vaccum state DB (#16434)
Start with a full vaccum the first time, then auto-vaccum incremental
2026-04-01 16:46:21 +02:00
jif-oai
868ac158d7 feat: log db better maintenance (#16330)
Run a DB clean-up more frequently with an incremental `VACCUM` in it
2026-03-31 19:15:44 +02:00
Charley Cunningham
ebbbc52ce4 Align SQLite feedback logs with feedback formatter (#13494)
## Summary
- store a pre-rendered `feedback_log_body` in SQLite so `/feedback`
exports keep span prefixes and structured event fields
- render SQLite feedback exports with timestamps and level prefixes to
match the old in-memory feedback formatter, while preserving existing
trailing newlines
- count `feedback_log_body` in the SQLite retention budget so structured
or span-prefixed rows still prune correctly
- bound `/feedback` row loading in SQL with the retention estimate, then
apply exact whole-line truncation in Rust so uploads stay capped without
splitting lines

## Details
- add a `feedback_log_body` column to `logs` and backfill it from
`message` for existing rows
- capture span names plus formatted span and event fields at write time,
since SQLite does not retain enough structure to reconstruct the old
formatter later
- keep SQLite feedback queries scoped to the requested thread plus
same-process threadless rows
- restore a SQL-side cumulative `estimated_bytes` cap for feedback
export queries so over-retained partitions do not load every matching
row before truncation
- add focused formatting coverage for exported feedback lines and parity
coverage against `tracing_subscriber`

## Testing
- cargo test -p codex-state
- just fix -p codex-state
- just fmt

codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-18 22:44:31 +00:00
jif-oai
cf143bf71e feat: simplify DB further (#13771) 2026-03-07 03:48:36 -08:00
Owen Lin
289ed549cf chore(otel): rename OtelManager to SessionTelemetry (#13808)
## Summary
This is a purely mechanical refactor of `OtelManager` ->
`SessionTelemetry` to better convey what the struct is doing. No
behavior change.

## Why

`OtelManager` ended up sounding much broader than what this type
actually does. It doesn't manage OTEL globally; it's the session-scoped
telemetry surface for emitting log/trace events and recording metrics
with consistent session metadata (`app_version`, `model`, `slug`,
`originator`, etc.).

`SessionTelemetry` is a more accurate name, and updating the call sites
makes that boundary a lot easier to follow.

## Validation

- `just fmt`
- `cargo test -p codex-otel`
- `cargo test -p codex-core`
2026-03-06 16:23:30 -08:00
Charley Cunningham
4e6c6193a1 Move sqlite logs to a dedicated database (#13772)
## Summary
- move sqlite log reads and writes onto a dedicated `logs_1.sqlite`
database to reduce lock contention with the main state DB
- add a dedicated logs migrator and route `codex-state-logs` to the new
database path
- leave the old `logs` table in the existing state DB untouched for now

## Testing
- just fmt
- cargo test -p codex-state

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-06 10:54:20 -08:00
jif-oai
c8f4b5bc1e feat: limit number of rows per log (#13763)
avoid DB explosion. This is a temp solution
2026-03-06 18:51:42 +01:00
jif-oai
79d6f80e41 chore: clean DB runtime (#12905) 2026-02-26 14:11:10 +00:00
jif-oai
382fa338b3 feat: memories forgetting (#12900)
Add diff based memory forgetting
2026-02-26 13:19:57 +00:00
jif-oai
e4bfa763f6 feat: record memory usage (#12761) 2026-02-25 13:48:40 +00:00
jif-oai
f46b767b7e feat: add search term to thread list (#12578)
Add `searchTerm` to `thread/list` that will search for a match in the
titles (the condition being `searchTerm` $$\in$$ `title`)
2026-02-25 09:59:41 +00:00
daveaitel-openai
dcab40123f Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
## Summary
- Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
auto-export, and store results in SQLite.
- Simplify workflow: remove run/resume/get-status/export tools; spawn is
deterministic and completes in one call.
- Improve exec UX: stable, single-line progress bar with ETA; suppress
sub-agent chatter in exec.

## Why
Enables map-reduce style workflows over arbitrarily large repos using
the existing Codex orchestrator. This addresses review feedback about
overly complex job controls and non-deterministic monitoring.

## Demo (progress bar)
```
./codex-rs/target/debug/codex exec \
  --enable collab \
  --enable sqlite \
  --full-auto \
  --progress-cursor \
  -c agents.max_threads=16 \
  -C /Users/daveaitel/code/codex \
  - <<'PROMPT'
Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
path = item-01..item-30, area = test.

Then call spawn_agents_on_csv with:
- csv_path: /tmp/agent_job_progress_demo.csv
- instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
- output_csv_path: /tmp/agent_job_progress_demo_out.csv
PROMPT
```

## Review feedback addressed
- Auto-start jobs on spawn; removed run/resume/status/export tools.
- Auto-export on success.
- More descriptive tool spec + clearer prompts.
- Avoid deadlocks on spawn failure; pending/running handled safely.
- Progress bar no longer scrolls; stable single-line redraw.

## Tests
- `cd codex-rs && cargo test -p codex-exec`
- `cd codex-rs && cargo build -p codex-cli`
2026-02-24 21:00:19 +00:00
jif-oai
fd67aba114 feat: do not enqueue phase 2 if not necessary (#12344) 2026-02-20 17:21:45 +00:00
jif-oai
0f9eed3a6f feat: add nick name to sub-agents (#12320)
Adding random nick name to sub-agents. Used for UX

At the same time, also storing and wiring the role of the sub-agent
2026-02-20 14:39:49 +00:00
Charley Cunningham
7f3dbaeb25 state: enforce 10 MiB log caps for thread and threadless process logs (#12038)
## Summary
- enforce a 10 MiB cap per `thread_id` in state log storage
- enforce a 10 MiB cap per `process_uuid` for threadless (`thread_id IS
NULL`) logs
- scope pruning to only keys affected by the current insert batch
- add a cheap per-key `SUM(...)` precheck so windowed prune queries only
run for keys that are currently over the cap
- add SQLite indexes used by the pruning queries
- add focused runtime tests covering both pruning behaviors

## Why
This keeps log growth bounded by the intended partition semantics while
preserving a small, readable implementation localized to the existing
insert path.

## Local Latency Snapshot (No Truncation-Pressure Run)
Collected from session `019c734f-1d16-7002-9e00-c966c9fbbcae` using
local-only (uncommitted) instrumentation, while not specifically
benchmarking the truncation-heavy regime.

### Percentiles By Query (ms)
| query | count | p50 | p90 | p95 | p99 | max |
|---|---:|---:|---:|---:|---:|---:|
| `insert_logs.insert_batch` | 110 | 0.332 | 0.999 | 1.811 | 2.978 |
3.493 |
| `insert_logs.precheck.process` | 106 | 0.074 | 0.152 | 0.206 | 0.258 |
0.426 |
| `insert_logs.precheck.thread` | 73 | 0.118 | 0.206 | 0.253 | 1.025 |
1.025 |
| `insert_logs.prune.process` | 58 | 0.291 | 0.576 | 0.607 | 1.088 |
1.088 |
| `insert_logs.prune.thread` | 44 | 0.318 | 0.467 | 0.728 | 0.797 |
0.797 |
| `insert_logs.prune_total` | 110 | 0.488 | 0.976 | 1.237 | 1.593 |
1.684 |
| `insert_logs.total` | 110 | 1.315 | 2.889 | 3.623 | 5.739 | 5.961 |
| `insert_logs.tx_begin` | 110 | 0.133 | 0.235 | 0.282 | 0.412 | 0.546 |
| `insert_logs.tx_commit` | 110 | 0.259 | 0.689 | 0.772 | 1.065 | 1.080
|

### `insert_logs.total` Histogram (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 7 |
| `<= 1.000` | 33 |
| `<= 2.000` | 40 |
| `<= 5.000` | 28 |
| `<= 10.000` | 2 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

## Local Latency Snapshot (Truncation-Heavy / Cap-Hit Regime)
Collected from a run where cap-hit behavior was frequent (`135/180`
insert calls), using local-only (uncommitted) instrumentation and a
temporary local cap of `10_000` bytes for stress testing (not the merged
`10 MiB` cap).

### Percentiles By Query (ms)
| query | count | p50 | p90 | p95 | p99 | max |
|---|---:|---:|---:|---:|---:|---:|
| `insert_logs.insert_batch` | 180 | 0.524 | 1.645 | 2.163 | 3.424 |
3.777 |
| `insert_logs.precheck.process` | 171 | 0.086 | 0.235 | 0.373 | 0.758 |
1.147 |
| `insert_logs.precheck.thread` | 100 | 0.105 | 0.251 | 0.291 | 1.176 |
1.622 |
| `insert_logs.prune.process` | 109 | 0.386 | 0.839 | 1.146 | 1.548 |
2.588 |
| `insert_logs.prune.thread` | 56 | 0.253 | 0.550 | 1.148 | 2.484 |
2.484 |
| `insert_logs.prune_total` | 180 | 0.511 | 1.221 | 1.695 | 4.548 |
5.512 |
| `insert_logs.total` | 180 | 1.631 | 3.902 | 5.103 | 8.901 | 9.095 |
| `insert_logs.total_cap_hit` | 135 | 1.876 | 4.501 | 5.547 | 8.902 |
9.096 |
| `insert_logs.total_no_cap_hit` | 45 | 0.520 | 1.700 | 2.079 | 3.294 |
3.294 |
| `insert_logs.tx_begin` | 180 | 0.109 | 0.253 | 0.287 | 1.088 | 1.406 |
| `insert_logs.tx_commit` | 180 | 0.267 | 0.813 | 1.170 | 2.497 | 2.574
|

### `insert_logs.total` Histogram (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 16 |
| `<= 1.000` | 39 |
| `<= 2.000` | 60 |
| `<= 5.000` | 54 |
| `<= 10.000` | 11 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

### `insert_logs.total` Histogram When Cap Was Hit (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 0 |
| `<= 1.000` | 22 |
| `<= 2.000` | 51 |
| `<= 5.000` | 51 |
| `<= 10.000` | 11 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

### Performance Takeaways
- Even in a cap-hit-heavy run (`75%` cap-hit calls), `insert_logs.total`
stays sub-10ms at p99 (`8.901ms`) and max (`9.095ms`).
- Calls that did **not** hit the cap are materially cheaper
(`insert_logs.total_no_cap_hit` p95 `2.079ms`) than cap-hit calls
(`insert_logs.total_cap_hit` p95 `5.547ms`).
- Compared to the earlier non-truncation-pressure run, overall
`insert_logs.total` rose from p95 `3.623ms` to p95 `5.103ms`
(+`1.48ms`), indicating bounded overhead when pruning is active.
- This truncation-heavy run used an intentionally low local cap for
stress testing; with the real 10 MiB cap, cap-hit frequency should be
much lower in normal sessions.

## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-state` (in `codex-rs`)
2026-02-18 17:08:08 -08:00
jif-oai
31d4bfdde0 feat: add --search to just log (#11995)
Summary
- extend the log client to accept an optional `--search` substring
filter when querying codex-state logs
- propagate the filter through `LogQuery` and apply it in
`push_log_filters` via `INSTR(message, ...)`
- add an integration test that exercises the new search filtering
behavior

Testing
- Not run (not requested)
2026-02-17 14:19:52 +00:00
Charley Cunningham
fce4ad9cf4 Add process_uuid to sqlite logs (#11534)
## Summary
This PR is the first slice of the per-session `/feedback` logging work:
it adds a process-unique identifier to SQLite log rows.

It does **not** change `/feedback` sourcing behavior yet.

## Changes
- Add migration `0009_logs_process_id.sql` to extend `logs` with:
  - `process_uuid TEXT`
  - `idx_logs_process_uuid` index
- Extend state log models:
  - `LogEntry.process_uuid: Option<String>`
  - `LogRow.process_uuid: Option<String>`
- Stamp each log row with a stable per-process UUID in the sqlite log
layer:
  - generated once per process as `pid:<pid>:<uuid>`
- Update sqlite log insert/query paths to persist and read
`process_uuid`:
  - `INSERT INTO logs (..., process_uuid, ...)`
  - `SELECT ..., process_uuid, ... FROM logs`

## Why
App-server runs many sessions in one process. This change provides a
process-scoping primitive we need for follow-up `/feedback` work, so
threadless/process-level logs can be associated with the emitting
process without mixing across processes.

## Non-goals in this PR
- No `/feedback` transport/source changes
- No attachment size changes
- No sqlite retention/trim policy changes

## Testing
- `just fmt`
- CI will run the full checks
2026-02-14 17:27:22 -08:00
jif-oai
db66d827be feat: add slug in name (#11739) 2026-02-13 15:24:03 +00:00
Wendy Jiao
88c5ca2573 Add cwd to memory files (#11591)
Add cwd to memory files so that model can deal with multi cwd memory
better.

---------

Co-authored-by: jif-oai <jif@openai.com>
2026-02-12 17:46:49 +00:00
jif-oai
2a409ca67c nit: upgrade DB version (#11581) 2026-02-12 13:16:28 +00:00
jif-oai
19ab038488 fix: db stuff mem (#11575)
* Documenting DB functions
* Fixing 1 nit where stage-2 was sorting the stage 1 in the wrong
direction
* Added some tests
2026-02-12 12:53:47 +00:00
jif-oai
adad23f743 Ensure list_threads drops stale rollout files (#11572)
Summary
- trim `state_db::list_threads_db` results to entries whose rollout
files still exist, logging and recording a discrepancy for dropped rows
- delete stale metadata rows from the SQLite store so future calls don’t
surface invalid paths
- add regression coverage in `recorder.rs` to verify stale DB paths are
dropped when the file is missing
2026-02-12 12:49:31 +00:00
Michael Bolin
abbd74e2be feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.

It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.

## What

- Added `ReadOnlyAccess` in protocol with:
  - `Restricted { include_platform_defaults, readable_roots }`
  - `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
  - `ReadOnly { access: ReadOnlyAccess }`
  - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.

## Compatibility / rollout

- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.
2026-02-11 18:31:14 -08:00
jif-oai
f5d4a21098 feat: new memory prompts (#11439)
* Update prompt
* Wire CWD in the prompt
* Handle the no-output case
2026-02-11 13:57:52 +00:00
jif-oai
2c5eeb6b1f fix: flaky test (#11428)
stage1_concurrent_claims_respect_running_cap was flaky due to SQLite
lock contention, not cap logic correctness. The claim flow used deferred
transactions (BEGIN) with read-then-write behavior, which can fail under
concurrency with SQLITE_BUSY_SNAPSHOT/database is locked when upgrading
a read transaction to a write transaction. We fixed this by using BEGIN
IMMEDIATE for stage1 and phase2 claim paths, so lock acquisition happens
up front and contenders serialize cleanly instead of failing during
upgrade. After the change, codex-state tests pass and stress reruns of
the flaky path no longer reproduced the failure.
2026-02-11 10:23:18 +00:00
pakrym-oai
bfd4e2112c Disable very flaky tests (#11394)
Collected from last 20 builds of main in
https://github.com/openai/codex/commits/main/.
2026-02-10 18:50:11 -08:00
jif-oai
87bbfc50a1 feat: prevent double backfill (#11377)
## Summary

Add a DB-backed lease to prevent duplicate `.sqlite` backfill workers
from running concurrently.

### What changed
- Added StateRuntime::try_claim_backfill(lease_seconds) that atomically
claims backfill only when:
  - backfill is not complete, and
  - no fresh running worker currently owns it.
- Updated backfill_sessions to use the claim API and exit early when
another worker already holds the lease.
- Added runtime tests covering:
  - singleton claim behavior,
  - stale lease takeover,
  - claim blocked after complete.
- Set backfill lease to 900s in production and 1s in tests.

### Why

This avoids duplicate backfill work and reduces backfill status churn
under concurrent startup, while preserving
current best-effort fallback behavior.
2026-02-11 00:24:20 +00:00
jif-oai
674799d356 feat: mem v2 - PR6 (consolidation) (#11374) 2026-02-11 00:02:57 +00:00
jif-oai
2c9be54c9a feat: mem v2 - PR5 (#11372) 2026-02-10 23:22:55 +00:00
jif-oai
623d3f4071 feat: mem v2 - PR4 (#11369)
# Memories migration plan (simplified global workflow)

## Target behavior

- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.

## Current code map

- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.

## PR plan

## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)

- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.

Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.

## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)

- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.

Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.

## PR 3: Remove per-cwd memories and move to one global memory root

- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).

Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.

## PR 4: Phase-2 global lock simplification and cleanup

- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.

Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.

## PR 5: Final cleanup and docs

- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.

Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.

## Notes on sequencing

- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
2026-02-10 23:10:35 +00:00
jif-oai
3419660767 feat: mem v2 - PR3 (#11366)
# Memories migration plan (simplified global workflow)

## Target behavior

- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.

## Current code map

- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.

## PR plan

## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)

- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.

Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.

## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)

- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.

Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.

## PR 3: Remove per-cwd memories and move to one global memory root

- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).

Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.

## PR 4: Phase-2 global lock simplification and cleanup

- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.

Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.

## PR 5: Final cleanup and docs

- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.

Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.

## Notes on sequencing

- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
2026-02-10 22:12:50 +00:00
jif-oai
07da740c8a feat: mem v2 - PR1 (#11364)
# Memories migration plan (simplified global workflow)

## Target behavior

- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.

## Current code map

- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.

## PR plan

## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)

- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.

Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.

## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)

- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.

Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.

## PR 3: Remove per-cwd memories and move to one global memory root

- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).

Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.

## PR 4: Phase-2 global lock simplification and cleanup

- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.

Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.

## PR 5: Final cleanup and docs

- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.

Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.

## Notes on sequencing

- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
2026-02-10 21:29:06 +00:00
jif-oai
a6e9469fa4 chore: unify memory job flow (#11334) 2026-02-10 20:26:39 +00:00
jif-oai
e57892b211 feat: phase 2 consolidation (#11306)
Consolidation phase of memories

Cleaning and better handling of concurrency
2026-02-10 14:31:16 +00:00
jif-oai
1d5eba0090 feat: align memory phase 1 and make it stronger (#11300)
## Align with the new phase-1 design

Basically we know run phase 1 in parallel by considering:
* Max 64 rollouts
* Max 1 month old
* Consider the most recent first

This PR also adds stronger parallelization capabilities by detecting
stale jobs, retry policies, ownership of computation to prevent double
computations etc etc
2026-02-10 13:42:09 +00:00
jif-oai
6049ff02a0 memories: add extraction and prompt module foundation (#11200)
## Summary
- add the new `core/src/memories` module (phase-one parsing, rollout
filtering, storage, selection, prompts)
- add Askama-backed memory templates for stage-one input/system and
consolidation prompts
- add module tests for parsing, filtering, path bucketing, and summary
maintenance

## Testing
- just fmt
- cargo test -p codex-core --lib memories::
2026-02-10 10:10:24 +00:00
jif-oai
74ecd6e3b2 state: add memory consolidation lock primitives (#11199)
## Summary
- add a migration for memory_consolidation_locks
- add acquire/release lock primitives to codex-state runtime
- add core/state_db wrappers and cwd normalization for memory queries
and lock keys

## Testing
- cargo test -p codex-state memory_consolidation_lock_
- cargo test -p codex-core --lib state_db::
2026-02-09 21:04:20 +00:00
jif-oai
3800173459 feat: backfill async again (#10894) 2026-02-06 15:41:52 +01:00
jif-oai
9ee746afd6 Leverage state DB metadata for thread summaries (#10621)
Summary:
- read conversation summaries and cwd info from the state DB when
possible so we no longer rely on rollout files for metadata and avoid
extra I/O
- persist CLI version in thread metadata, surface it through summary
builders, and add the necessary DB migration hooks
- simplify thread listing by using enriched state DB data directly
rather than reading rollout heads

Testing:
- Not run (not requested)
2026-02-05 16:39:11 +00:00
jif-oai
68e82e5dc9 nit: add DB version is discrepancy recording (#10762) 2026-02-05 16:24:18 +00:00
jif-oai
4033f905c6 feat: resumable backfill (#10745)
## Summary

This PR makes SQLite rollout backfill resumable and repeatable instead
of one-shot-on-db-create.

## What changed

- Added a persisted backfill state table:
  - state/migrations/0008_backfill_state.sql
- Tracks status (pending|running|complete), last_watermark, and
last_success_at.
- Added backfill state model/types in codex-state:
  - BackfillState, BackfillStatus (state/src/model/backfill_state.rs)
- Added runtime APIs to manage backfill lifecycle/progress:
  - get_backfill_state
  - mark_backfill_running
  - checkpoint_backfill
  - mark_backfill_complete
- Updated core startup behavior:
- Backfill now runs whenever state is not Complete (not only when DB
file is newly created).
- Reworked backfill execution:
- Collect rollout files, derive deterministic watermark per path, sort,
resume from last_watermark.
- Process in batches (BACKFILL_BATCH_SIZE = 200), checkpoint after each
batch.
  - Mark complete with last_success_at at the end.

## Why

Previous behavior could leave users permanently partially backfilled if
the process exited during initial async backfill. This change allows
safe continuation across restarts and avoids restarting from scratch.
2026-02-05 14:34:34 +00:00
jif-oai
4922b3e571 feat: add phase 1 mem db (#10634)
- Schema: thread_id (PK, FK to threads.id with cascade delete),
trace_summary, memory_summary, updated_at.
- Migration: creates the table and an index on (updated_at DESC,
thread_id DESC) for efficient recent-first reads.
  - Runtime API (DB-only):
      - `get_thread_memory(thread_id)`: fetch one memory row.
- `upsert_thread_memory(thread_id, trace_summary, memory_summary)`:
insert/update by thread id and always advance updated_at.
- `get_last_n_thread_memories_for_cwd(cwd, n)`: join thread_memory with
threads and return newest n rows for an exact cwd match.
- Model layer: introduced ThreadMemory and row conversion types to keep
query decoding typed and consistent with existing state models.
2026-02-04 21:38:39 +00:00