## Why
Users and support need a single command that captures the local Codex
runtime, configuration, auth, terminal, network, and state shape without
asking the user to know which diagnostic depth to choose first. `codex
doctor` runs the useful checks by default and defaults to the detailed
human-readable output, since the command is usually run when someone
already needs context.
The command also targets concrete support failure modes we have seen
while iterating on the design:
- update-target mismatches like #21956, where the installed package
manager target can differ from the running executable
- terminal and multiplexer issues that depend on `TERM`, tmux/zellij
state, color handling, and TTY metadata
- provider-specific HTTP/WebSocket connectivity, including ChatGPT
WebSocket handshakes and API-key/provider endpoint reachability
- local state/log SQLite integrity problems and large rollout
directories
- feedback reports that need an attached, redacted diagnostic snapshot
without asking the user to run a second command
## What Changed
- Adds `codex doctor` as a grouped CLI diagnostic report with default
detailed output and `--summary` for the compact view.
- Adds stable report sections for Environment, Configuration, Updates,
Connectivity, and Background Server, plus a top Notes block that
promotes anomalies such as available updates, large rollout directories,
optional MCP issues, and mixed auth signals.
- Adds runtime provenance, install consistency, bundled/system search
readiness, terminal/multiplexer metadata, `config.toml` parse status,
auth mode details, sandbox details, feature flag summaries, update
cache/latest-version state, app-server daemon state, SQLite integrity
checks, rollout statistics, and provider-aware network diagnostics.
- Adds ChatGPT WebSocket diagnostics that report the negotiated HTTP
upgrade as `HTTP 101 Switching Protocols` and include timeout, DNS,
auth, and provider context in detailed output.
- Makes reachability provider-aware: API-key OpenAI setups check the API
endpoint, ChatGPT auth checks the ChatGPT path, and custom/AWS/local
providers check configured HTTP endpoints when available.
- Adds structured, redacted JSON output where `checks` is keyed by check
id and `details` is a key/value object for support tooling.
- Integrates doctor with feedback uploads by attaching a best-effort
`codex-doctor-report.json` report and adding derived Sentry tags for
overall status and failing/warning checks.
- Updates the TUI feedback consent copy so users can see that the doctor
report is included when logs/diagnostics are uploaded.
- Updates the CLI bug issue template to ask reporters for `codex doctor
--json` and render pasted reports as JSON.
## Example Output
The examples below are sanitized from local smoke runs with `--no-color`
so the structure is reviewable in plain text.
### `codex doctor`
```text
Codex Doctor v0.0.0 · macos-aarch64
Notes
↑ updates 0.130.0 available (current 0.0.0, dismissed 0.128.0)
⚠ rollouts 1,526 active files · 2.53 GB on disk
⚠ mcp MCP configuration has optional issues
⚠ auth mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
─────────────────────────────────────────────────────────────
Environment
✓ runtime local debug build
version 0.0.0
install method other
commit unknown
executable ~/code/codex.fcoury-doct…x-rs/target/debug/codex
✓ install consistent
context other
managed by npm: no · bun: no · package root —
PATH entries (2) ~/.local/share/mise/installs/node/24/bin/codex
~/.local/share/mise/shims/codex
✓ search ripgrep 15.1.0 (system, `rg`)
✓ terminal Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
terminal Ghostty
TERM_PROGRAM ghostty
terminal version 1.3.2-main-+b0f827665
TERM xterm-256color
multiplexer tmux 3.6a
tmux extended-keys on
tmux allow-passthrough on
tmux set-clipboard on
✓ state databases healthy
CODEX_HOME ~/.codex (dir)
state DB ~/.codex/state_5.sqlite (file) · integrity ok
log DB ~/.codex/logs_2.sqlite (file) · integrity ok
active rollouts 1,526 files · 2.53 GB (avg 1.70 MB)
archived rollouts 8 files · 3.84 MB (avg 491.11 KB)
Configuration
✓ config loaded
model gpt-5.5 · openai
cwd ~/code/codex.fcoury-doctor/codex-rs
config.toml ~/.codex/config.toml
config.toml parse ok
MCP servers 1
feature flags 36 enabled · 7 overridden (full list with --all)
overrides code_mode, code_mode_only, memories, chronicle, goals, remote_control, prevent_idle_sleep
✓ auth auth is configured
auth storage mode File
auth file ~/.codex/auth.json
auth env vars present OPENAI_API_KEY
stored auth mode chatgpt
stored API key false
stored ChatGPT tokens true
stored agent identity false
⚠ mcp MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
configured servers 1
disabled servers 0
streamable_http servers 1
optional reachability openaiDeveloperDocs: https://developers.openai.com/mcp (HEAD connect failed; GET connect failed)
✓ sandbox restricted fs + restricted network · approval OnRequest
approval policy OnRequest
filesystem sandbox restricted
network sandbox restricted
Connectivity
✓ network network-related environment looks readable
✓ websocket connected (HTTP 101 Switching Protocols) · 15s timeout
model provider openai
provider name OpenAI
wire API responses
supports websockets true
connect timeout 15000 ms
auth mode chatgpt
endpoint wss://chatgpt.com/backend-api/<redacted>
DNS 2 IPv4, 2 IPv6, first IPv6
handshake result HTTP 101 Switching Protocols
✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
reachability mode API key auth
openai API https://api.openai.com/v1 connect failed (required)
Background Server
○ app-server not running (ephemeral mode)
─────────────────────────────────────────────────────────────
11 ok · 1 idle · 4 notes · 1 warn · 1 fail failed
--summary compact output --all expand truncated lists
--json redacted report
```
### `codex doctor --summary`
```text
Codex Doctor v0.0.0 · macos-aarch64
Notes
↑ updates 0.130.0 available (current 0.0.0, dismissed 0.128.0)
⚠ rollouts 1,526 active files · 2.53 GB on disk
⚠ mcp MCP configuration has optional issues
⚠ auth mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
─────────────────────────────────────────────────────────────
Environment
✓ runtime local debug build
✓ install consistent
✓ search ripgrep 15.1.0 (system, `rg`)
✓ terminal Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
✓ state databases healthy
Configuration
✓ config loaded
✓ auth auth is configured
⚠ mcp MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
✓ sandbox restricted fs + restricted network · approval OnRequest
Updates
✓ updates update configuration is locally consistent
Connectivity
✓ network network-related environment looks readable
✓ websocket connected (HTTP 101 Switching Protocols) · 15s timeout
✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
Background Server
○ app-server not running (ephemeral mode)
─────────────────────────────────────────────────────────────
11 ok · 1 idle · 4 notes · 1 warn · 1 fail failed
Run codex doctor without --summary for detailed diagnostics.
--all expand truncated lists --json redacted report
```
### `codex doctor --json` shape
```json
{
"schema_version": 1,
"overall_status": "fail",
"checks": {
"runtime.provenance": {
"id": "runtime.provenance",
"category": "Environment",
"status": "ok",
"summary": "local debug build",
"details": {
"version": "0.0.0",
"install method": "other",
"commit": "unknown"
}
},
"sandbox.helpers": {
"id": "sandbox.helpers",
"category": "Configuration",
"status": "ok",
"summary": "restricted fs + restricted network · approval OnRequest",
"details": {
"approval policy": "OnRequest",
"filesystem sandbox": "restricted",
"network sandbox": "restricted"
}
}
}
}
```
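For orientation, here is a hedged sketch of how support tooling might deserialize this shape in Rust; the struct and field names below simply mirror the JSON keys above and are not the actual codex-cli types.

```rust
// Illustrative only: these types mirror the JSON shape above, not real codex-cli structs.
use std::collections::BTreeMap;

use serde::Deserialize;

#[derive(Deserialize)]
struct DoctorReport {
    schema_version: u32,
    overall_status: String,
    // `checks` is keyed by check id, e.g. "runtime.provenance".
    checks: BTreeMap<String, DoctorCheck>,
}

#[derive(Deserialize)]
struct DoctorCheck {
    id: String,
    category: String,
    status: String,
    summary: String,
    // `details` is a flat key/value object in the report.
    #[serde(default)]
    details: BTreeMap<String, String>,
}

fn failing_check_ids(report: &DoctorReport) -> Vec<&str> {
    report
        .checks
        .values()
        .filter(|check| check.status == "fail")
        .map(|check| check.id.as_str())
        .collect()
}
```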
### `/feedback` new Sentry attachment
<img width="938" height="798" alt="CleanShot 2026-05-13 at 15 36 14"
src="https://github.com/user-attachments/assets/715e62e0-d7b4-4fea-a35a-fd5d5d33c4c0"
/>
### New section in CLI issue template
<img width="1164" height="435" alt="CleanShot 2026-05-13 at 15 47 24"
src="https://github.com/user-attachments/assets/9081dc25-a28c-4afa-8ba1-e299c2b4031d"
/>
## How to Test
1. Run `cargo run --bin codex -- doctor --no-color`.
2. Confirm the detailed report is the default and includes promoted
Notes, grouped sections, terminal details, state DB integrity, rollout
stats, provider reachability, WebSocket diagnostics, and app-server
status.
3. Run `cargo run --bin codex -- doctor --summary --no-color`.
4. Confirm the compact view keeps the same sections and summary counts
but omits detailed key/value rows.
5. Run `cargo run --bin codex -- doctor --json`.
6. Confirm the output is redacted JSON, `checks` is an object keyed by
check id, and each check's `details` is a key/value object.
7. Preview the CLI bug issue template and confirm the `Codex doctor
report` field appears after the terminal field, asks for `codex doctor
--json`, and renders pasted output as JSON.
8. Start a feedback flow that includes logs.
9. Confirm the upload consent copy lists `codex-doctor-report.json`
alongside the log attachments.
Targeted tests:
- `cargo test -p codex-cli doctor`
- `cargo test -p codex-app-server
doctor_report_tags_summarize_status_counts`
- `cargo test -p codex-feedback`
- `cargo test -p codex-tui feedback_view`
- `just argument-comment-lint`
- `git diff --check`
## Summary
- add SQLite init, backfill-gate, and fallback telemetry without
introducing a cross-cutting state-db access wrapper
- install one process-scoped telemetry sink after OTEL startup and let
low-level state/rollout paths emit through it directly
- add process-start metrics for the process owners that initialize
SQLite
---------
Co-authored-by: Owen Lin <owen@openai.com>
## Why
Users have requested the ability to edit a goal's objective after a goal
has been created. This PR exposes a new `/goal edit` command in the TUI
to address this request.
In the process of implementing this, I also noticed an existing bug in
the goal runtime. When a goal's objective is updated through the
`thread/goal/set` app server API, the goal runtime didn't emit a new
steering prompt to tell the agent about the new objective. This PR also
fixes this hole.
## What Changed
- Adds `/goal edit` in the TUI, opening an edit box prefilled with the
current goal objective.
- Keeps active and paused goals in their current state, resets completed
goals to active, keeps budget-limited goals budget-limited, and
preserves the existing token budget.
- Changes the existing `thread/goal/set` behavior so editing an
objective preserves goal accounting instead of resetting it. The older
reset-on-new-objective behavior was left over from before
`thread/goal/clear`; clients that need to reset accounting can now clear
the existing goal and create a new one.
- Reuses the existing goal set API path; this does not add or change
app-server protocol surface area.
- Adds a dedicated goal runtime steering prompt when an externally
persisted goal mutation changes the objective, so active turns receive
the updated objective.
## Validation
- Make sure `/goal edit` returns an error if no goal currently exists
- Make sure `/goal edit` displays an edit box that can be optionally
canceled with no side effects
- Make sure that an edited goal results in a steer so the agent starts
pursuing the new objective
- Make sure the new objective is reflected in the goal if you use
`/goal` to display the goal summary
- Make sure that `/goal edit` doesn't reset the token budget, time/token
accounting on the updated goal
Fixes #20792
## Why
`/goal`-first threads are valid resumable threads, but they can be
missing from `codex resume` and app recents because discovery depends on
metadata derived from a normal first user message.
PR #21489 attempted to fix this by using the goal objective as
`first_user_message`. Review feedback pointed out that
`first_user_message` does more than provide visible text today: it gates
listing, supplies preview text, and participates in deciding whether a
later title should surface as a distinct thread name. Reusing it for the
goal objective could leave a `/goal`-first thread with
`first_user_message=<goal>` and `title=<later prompt>`, even though the
goal should only provide the initial visible preview.
This PR follows that feedback: it keeps `first_user_message` as is but
introduces a new `preview` field to separate concerns. The `preview`
field is populated from the first user message or the goal objective,
and we can extend it in the future to include other sources.
## What Changed
- Added internal thread `preview` metadata in `codex-state`, including a
SQLite migration that backfills from `first_user_message` and from
existing `thread_goals` objectives when needed.
- Treated `ThreadGoalUpdated` as preview-bearing metadata so goal-first
threads can be listed and searched without mutating
`first_user_message`.
- Updated rollout listing, state queries, thread-store conversion, and
app-server mapping to use preview metadata while continuing to expose
the existing public `preview` field.
- Preserved title/name distinctness behavior around literal
`first_user_message`, so a later normal prompt after `/goal` does not
surface as a separate name just because the goal supplied the initial
preview.
- Preserved compatibility for older/internal metadata writes by deriving
preview from `first_user_message` when explicit preview metadata is
absent.
## Verification
- Manually verified that a thread that starts with a `/goal <objective>`
shows up in the resume picker.
## Why
We'd like SQLite state to become required and load-bearing. As a first
step, let's remove the mechanism that allows us to blow away the SQLite
DB on a version bump, and instead rely on graceful migrations.
The original motivation
([PR](https://github.com/openai/codex/pull/10623)) behind this mechanism
was to care less about backwards compatibility while SQLite was being
landed, but I'd say it's quite important now to keep the data in it.
## What changed
- Make `STATE_DB_FILENAME` and `LOGS_DB_FILENAME` the full canonical
filenames: `state_5.sqlite` and `logs_2.sqlite`.
- Remove `STATE_DB_VERSION` / `LOGS_DB_VERSION` and the helper that
constructed filenames from versions.
- Stop `StateRuntime::init` from scanning for or deleting older SQLite
DB filenames at startup.
- Delete the tests that encoded legacy state/logs DB deletion behavior.
## Verification
- `cargo test -p codex-state`
## Summary
- make `thread_source` an explicit optional thread-level field on
`thread/start`, `thread/fork`, and returned thread payloads
- persist `thread_source` in rollout/session metadata so resumed live
threads retain the original value
- replace the old best-effort `session_source` -> `thread_source`
mapping with an explicit caller-supplied analytics classification
## Why
Before this change, analytics `thread_source` was populated by a
best-effort mapping from `session_source`. `session_source` describes
the runtime/client surface, not the actual thread-level origin, so that
projection was not accurate enough to distinguish cases such as `user`,
`subagent`, `memory_consolidation`, and future thread origins reliably.
Making `thread_source` explicit keeps one thread-level analytics field
while letting callers provide the real classification directly instead
of recovering it indirectly from `session_source`.
## Impact
For new analytics events, `thread_source` now reflects the explicit
thread-level classification supplied by the caller rather than an
inferred value derived from `session_source`. Existing protocol fields
remain optional; callers that omit `threadSource` now produce `null`
instead of a best-effort inferred value.
## Validation
- `just write-app-server-schema`
- `cargo test -p codex-analytics -p codex-core -p
codex-app-server-protocol --no-run`
- `cargo test -p codex-app-server-protocol
generated_ts_optional_nullable_fields_only_in_params`
- `cargo test -p codex-analytics
thread_initialized_event_serializes_expected_shape`
- `cargo test -p codex-core
resume_stopped_thread_from_rollout_preserves_thread_source`
## Why
Thread names are app-server metadata now, backed by the thread store and
sqlite state database. Keeping a core `SetThreadName` op plus a rollout
`thread_name_updated` event made rename persistence live in the wrong
layer and required historical replay support for an event that new
app-server flows should not write.
## What changed
- Removed `Op::SetThreadName` and `EventMsg::ThreadNameUpdated` from the
core protocol and deleted the core handler path that appended rename
events to rollouts.
- Updated app-server `thread/name/set` so both loaded and unloaded
threads write through thread-store metadata and app-server emits
`thread/name/updated` notifications.
- Updated local thread-store name metadata updates to write sqlite title
metadata and the legacy thread-name index without appending rollout
events.
- Removed state extraction and rollout handling for the deleted
thread-name event.
## Validation
- `cargo test -p codex-app-server thread_name_updated_broadcasts`
- `cargo test -p codex-app-server
thread_name_set_is_reflected_in_read_list_and_resume`
- `cargo test -p codex-thread-store
update_thread_metadata_sets_name_on_active_rollout_and_indexes_name`
- `cargo test -p codex-state`
- `cargo check -p codex-mcp-server -p codex-rollout-trace`
- `just fix -p codex-app-server -p codex-thread-store -p codex-state -p
codex-mcp-server -p codex-rollout-trace`
## Docs
No external documentation update is expected for this internal ownership
change.
- Route `thread/metadata/update` through
`ThreadStore::update_thread_metadata`.
- Add `LocalThreadStore` git metadata patch support for set, partial
update, and clear semantics.
- Add some unit tests for the new thread store code
- Remove a lot of dead code/tests!
## Summary
Persisted subagent parent/child topology currently leaks through
`StateRuntime`'s SQLite-specific thread-spawn helpers. This PR
introduces a narrow `AgentGraphStore` boundary so follow-up work can
route graph operations through a local or remote store without coupling
orchestration code directly to the state DB graph API.
## Changes
- Adds the new `codex-agent-graph-store` crate.
- Defines a flat `AgentGraphStore` trait for the v1 graph surface:
upsert edge, set edge status, list direct children, and list
descendants.
- Adds public graph types for `ThreadSpawnEdgeStatus`,
`AgentGraphStoreError`, and `AgentGraphStoreResult`.
- Implements `LocalAgentGraphStore` on top of an existing
`codex_state::StateRuntime`, preserving today's SQLite-backed
`thread_spawn_edges` behavior.
- Registers the crate in Cargo/Bazel metadata.
This PR only adds the local contract and implementation; call-site
migration and the remote gRPC store are left to the follow-up PRs in the
stack.
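For orientation, a rough sketch of what that flat v1 trait surface could look like; the thread-id type, error shape, and exact method signatures are assumptions, not the real crate API.

```rust
// Rough sketch only; the real trait lives in codex-agent-graph-store and may differ.
use async_trait::async_trait;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadSpawnEdgeStatus {
    Open,
    Closed,
}

#[derive(Debug)]
pub struct AgentGraphStoreError(pub String);

pub type AgentGraphStoreResult<T> = Result<T, AgentGraphStoreError>;

#[async_trait]
pub trait AgentGraphStore: Send + Sync {
    /// Insert or update a parent -> child spawn edge.
    async fn upsert_edge(
        &self,
        parent_thread_id: &str,
        child_thread_id: &str,
        status: ThreadSpawnEdgeStatus,
    ) -> AgentGraphStoreResult<()>;

    /// Change the status of an existing edge.
    async fn set_edge_status(
        &self,
        parent_thread_id: &str,
        child_thread_id: &str,
        status: ThreadSpawnEdgeStatus,
    ) -> AgentGraphStoreResult<()>;

    /// List the direct children of a thread, optionally filtered by status.
    async fn list_direct_children(
        &self,
        parent_thread_id: &str,
        status: Option<ThreadSpawnEdgeStatus>,
    ) -> AgentGraphStoreResult<Vec<String>>;

    /// List all descendants in stable breadth-first order.
    async fn list_descendants(
        &self,
        root_thread_id: &str,
        status: Option<ThreadSpawnEdgeStatus>,
    ) -> AgentGraphStoreResult<Vec<String>>;
}
```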
## Testing
- `cargo test -p codex-agent-graph-store`
The new unit tests cover local parity with the existing `StateRuntime`
graph methods, `Open`/`Closed` filtering, status updates, and stable
breadth-first descendant ordering.
## Why
The log DB writer batches tracing events before inserting them into
SQLite, but `tokio::time::interval` produces an immediate first tick.
That meant the inserter could flush the first accepted log entry before
`batch_size` was reached, making
`configured_batch_size_flushes_without_explicit_flush` timing-sensitive
in CI.
## What Changed
- Consume the interval's startup tick before entering the inserter loop,
so interval flushing starts after the configured delay.
- Remove the test's startup sleep, which was masking the race instead of
proving the batch-size behavior.
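A minimal sketch of the loop change, assuming a tokio-based inserter; the channel payload and `flush_batch` helper are placeholders rather than the real `LogEntry` handling.

```rust
use std::time::Duration;

use tokio::sync::mpsc;

async fn inserter_loop(mut rx: mpsc::Receiver<String>, batch_size: usize) {
    let mut interval = tokio::time::interval(Duration::from_secs(2));
    // `tokio::time::interval` yields an immediate first tick; consume it here so
    // interval-driven flushes only start after the configured delay instead of
    // racing the first accepted entry.
    interval.tick().await;

    let mut batch: Vec<String> = Vec::new();
    loop {
        tokio::select! {
            maybe_entry = rx.recv() => match maybe_entry {
                Some(entry) => {
                    batch.push(entry);
                    if batch.len() >= batch_size {
                        flush_batch(&mut batch).await;
                    }
                }
                None => {
                    flush_batch(&mut batch).await;
                    break;
                }
            },
            _ = interval.tick() => flush_batch(&mut batch).await,
        }
    }
}

async fn flush_batch(batch: &mut Vec<String>) {
    // Placeholder for the SQLite insert performed by the real inserter.
    batch.clear();
}
```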
## Validation
- `cargo test -p codex-state`
- `cargo test -p codex-state
configured_batch_size_flushes_without_explicit_flush` passed 3
consecutive focused runs
- PR checks passed across `rust-ci`, Bazel, `ci`, `sdk`, `cargo-deny`,
Codespell, blob-size policy, and CLA
## Why
Memory startup was tied to thread lifecycle events such as create, load,
and fork. That can run memory work before a thread receives real user
input, and it makes startup cost scale with thread management instead of
actual turns. Moving the trigger to `thread/sendInput` keeps memory
startup aligned with the first real user turn and lets it use the
current thread config at turn time.
The idea is to prevent ghost cost from pre-warm work triggered by the app.
Turn-based startup can also make global phase-2 consolidation easier to
request repeatedly, so this adds a success cooldown and tightens the
default startup scan window.
## What Changed
- Start `codex_memories_write::start_memories_startup_task` after a
non-empty `thread/sendInput` turn is submitted, instead of from thread
create/load/fork paths:
d4a6885b78/codex-rs/app-server/src/codex_message_processor.rs (L6477-L6487)
- Expose `CodexThread::config()` so app-server can pass the live config
into memory startup at turn time.
- Add a six-hour successful-run cooldown for global phase-2
consolidation via `SkippedCooldown`:
d4a6885b78/codex-rs/state/src/runtime/memories.rs (L963-L966)
- Reduce memory startup defaults to at most 2 rollouts over 10 days:
d4a6885b78/codex-rs/config/src/types.rs (L31-L34)
## Verification
Updated the memory runtime coverage around phase-2 reclaim behavior,
including `phase2_global_lock_respects_success_cooldown`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Phase 2 still needs to choose the most relevant stage-1 memory outputs
by usage and recency, but exposing that ranking as the rendered
`raw_memories.md` order creates unnecessarily large diffs. Usage-count or
timestamp changes can reshuffle otherwise unchanged memories, making the
workspace diff noisy and giving the consolidation prompt a misleading
recency signal from file position.
This fix also reduces token consumption.
## What Changed
- Keep the existing top-N Phase 2 selection ranking by `usage_count`,
`last_usage`, `source_updated_at`, and `thread_id`.
- Return the selected rows in stable ascending `thread_id` order before
syncing Phase 2 filesystem inputs.
- Update the memory README, raw memories header, and consolidation
prompt so they describe the stable order and tell the prompt to use
metadata and workspace diffs instead of file order as the recency
signal.
- Adjust the memory runtime tests to use deterministic thread IDs and
assert the stable return order separately from the ranked selection
semantics.
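A simplified sketch of the two-step ordering, with a hypothetical row type and field names; the real selection lives in the memory runtime and ranks over more state.

```rust
#[derive(Clone)]
struct Stage1Row {
    thread_id: String,
    usage_count: u64,
    last_usage: i64,
    source_updated_at: i64,
}

fn select_phase2_inputs(mut rows: Vec<Stage1Row>, top_n: usize) -> Vec<Stage1Row> {
    // Step 1: keep the existing relevance ranking to pick the top-N rows.
    rows.sort_by(|a, b| {
        b.usage_count
            .cmp(&a.usage_count)
            .then(b.last_usage.cmp(&a.last_usage))
            .then(b.source_updated_at.cmp(&a.source_updated_at))
            .then(a.thread_id.cmp(&b.thread_id))
    });
    rows.truncate(top_n);
    // Step 2: return the selection in stable ascending thread_id order, so
    // usage-count or timestamp changes alone do not reshuffle raw_memories.md.
    rows.sort_by(|a, b| a.thread_id.cmp(&b.thread_id));
    rows
}
```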
## Test Coverage
- Existing memory runtime tests in
`codex-rs/state/src/runtime/memories.rs` now cover the stable returned
ordering for Phase 2 inputs.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Phase 2 can now claim the global consolidation lock on startup even when
the git-backed memory workspace is already clean. The clean-workspace
path still finalized through the normal Phase 2 success path, which
clears and re-marks `selected_for_phase2` rows. That made no-op startups
perform avoidable writes to `stage1_outputs`, creating unnecessary DB
I/O and contention when no memory files changed.
## What Changed
- Added a preserving-selection Phase 2 finalizer in `codex-state` that
only marks the global job row as succeeded.
- Kept the existing `mark_global_phase2_job_succeeded` behavior for real
consolidation runs, where the selected Phase 2 snapshot must be
rewritten.
- Switched the `succeeded_no_workspace_changes` branch in
`core/src/memories/phase2.rs` to use the preserving-selection finalizer.
- Added a regression test that installs a SQLite trigger on
`stage1_outputs` and verifies the clean finalizer performs zero updates
there.
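A hedged sketch of that trigger-based check, assuming an sqlx SQLite pool; the audit-table name and surrounding harness are illustrative, and only the `stage1_outputs` table name comes from the change itself.

```rust
async fn assert_clean_finalizer_skips_stage1_outputs(
    pool: &sqlx::SqlitePool,
) -> Result<(), sqlx::Error> {
    // Record every UPDATE against stage1_outputs in an audit table.
    sqlx::query("CREATE TABLE stage1_update_audit (n INTEGER NOT NULL)")
        .execute(pool)
        .await?;
    sqlx::query(
        "CREATE TRIGGER count_stage1_updates AFTER UPDATE ON stage1_outputs
         BEGIN INSERT INTO stage1_update_audit (n) VALUES (1); END",
    )
    .execute(pool)
    .await?;

    // ... run the preserving-selection finalizer for a clean workspace here ...

    let (updates,): (i64,) = sqlx::query_as("SELECT COUNT(*) FROM stage1_update_audit")
        .fetch_one(pool)
        .await?;
    assert_eq!(updates, 0, "clean finalizer must not rewrite stage1_outputs");
    Ok(())
}
```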
## Testing
- `cargo test -p codex-state`
- `cargo test -p codex-core memories::tests::phase2`
## Why
The Phase 2 memories job row is only the global lock for the git-backed
memory workspace. Manual memory edits do not enqueue new Stage 1 work,
so a Phase 2 row with `retry_remaining = 0` could be skipped before the
worker ever claimed the lock and generated `phase2_workspace_diff.md`.
That left workspace-only changes unconsolidated after repeated failures,
even when retry backoff had elapsed and the filesystem had real diffable
work.
## What Changed
- Allow `try_claim_global_phase2_job` to claim the Phase 2 lock after
the retry budget is exhausted, while still respecting active `retry_at`
backoff and fresh running leases.
- Treat `SkippedRetryUnavailable` for Phase 2 as backoff-only, and
update the outcome docs to match.
- Clamp Phase 2 retry bookkeeping at zero when failed attempts are
recorded.
## Verification
- Added
`phase2_global_lock_can_be_claimed_after_retry_budget_is_exhausted` to
cover the exhausted-budget lock claim path.
- Ran `cargo test -p codex-state`.
## Why
This PR makes the `morpheus` agent (memory phase 2) use a git diff to
start its consolidation. The workflow is the following:
1. The agent acquires a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select max N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff in `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a new `.git` and make one single commit on it. We do this
because we don't want to preserve history through `.git` and this is
cheap anyway
9. We release the lock
On top of this, we keep the existing retry policies.
The goals of this new workflow are:
* Better support of any memory extensions such as `chronicle`
* Allow the user to manually edit memories so those edits are considered
by the phase 2 agent
As a follow-up we will need to add support for user edits while
`morpheus` is running.
## What Changed
- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.
## Verification
- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.
---------
Co-authored-by: Codex <noreply@openai.com>
Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.
## Why
Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.
## What changed
- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.
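A minimal sketch of the stale-update guard, assuming an sqlx-backed helper; the column names and helper signature are hypothetical, and only the `thread_goals` table comes from this PR.

```rust
async fn update_goal_objective(
    pool: &sqlx::SqlitePool,
    thread_id: &str,
    goal_id: &str,
    objective: &str,
) -> Result<bool, sqlx::Error> {
    // The goal_id in the WHERE clause must still match the persisted goal, so an
    // update issued against an old goal cannot overwrite its replacement.
    let result = sqlx::query(
        "UPDATE thread_goals SET objective = ?1 WHERE thread_id = ?2 AND goal_id = ?3",
    )
    .bind(objective)
    .bind(thread_id)
    .bind(goal_id)
    .execute(pool)
    .await?;
    Ok(result.rows_affected() > 0)
}
```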
## Verification
- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.
## Why
This prepares feedback log capture for a future remote app-server hook
sink without changing the current local SQLite upload path. The
important boundary is now intentionally small: a log sink is a tracing
`Layer` that can also flush entries it has accepted.
That keeps the existing SQLite implementation simple while giving the
upcoming gRPC sink a place to fit beside it. SQLite and gRPC have
different worker/write semantics, so this PR avoids introducing a shared
buffered-sink abstraction and instead lets each `LogWriter` own the
buffering mechanics it needs.
## What Changed
- Added `LogSinkQueueConfig` with the existing local defaults: queue
capacity `512`, batch size `128`, and flush interval `2s`.
- Added `LogDbLayer::start_with_config(...)` while preserving
`LogDbLayer::start(...)` and `log_db::start(...)` defaults.
- Introduced the `LogWriter` trait as the minimal shared interface:
`tracing_subscriber::Layer` plus `flush()`.
- Made `LogDbLayer` implement `LogWriter`.
- Kept tracing event formatting inside `LogDbLayer`; it still creates
one `LogEntry` per tracing event before queueing it for SQLite.
- Kept normal event capture best-effort and non-blocking via bounded
`try_send`.
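As a rough illustration of how small that `LogWriter` boundary is (pinned to `Registry` here for simplicity; the real bounds, and whether `flush` is async, may differ):

```rust
use tracing_subscriber::registry::Registry;
use tracing_subscriber::Layer;

/// A log sink is a tracing layer that can also flush entries it has accepted.
pub trait LogWriter: Layer<Registry> + Send + Sync + 'static {
    /// Block until everything accepted so far has reached the backing store.
    fn flush(&self);
}
```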
## Behavior Notes
This does not change the SQLite schema, retention behavior,
`/feedback/upload`, or Sentry upload behavior. Normal log events still
drop when the queue is full; explicit `flush()` still waits for queue
capacity and receiver processing before returning.
## Verification
- `cargo test -p codex-state log_db`
- `cargo test -p codex-state`
- `just fix -p codex-state`
The added tests cover configured batch-size flushing, configured
interval flushing, queue-full drops, and the flush barrier semantics.
## Why
Device-key providers should only own platform key material. The
account/client binding used to authorize a signing payload is app-server
state, and keeping that state in provider-specific metadata makes the
same check harder to audit and harder to share across platform
implementations.
Persisting the binding in the shared state database gives the device-key
crate a platform-neutral source of truth before it asks a provider to
sign. It also lets app-server move potentially blocking key operations
off the main message processor path, which matters once providers may
wait for OS authentication prompts.
## What changed
- Add a `device_key_bindings` state migration plus `StateRuntime`
helpers keyed by `key_id`.
- Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
- Keep provider calls behind async store methods and run the synchronous
provider work through `spawn_blocking`.
- Wire app-server device-key RPC handling to the SQLite-backed binding
store and spawn response/error delivery tasks for device-key requests.
- Run the turn-start tracing test on the existing larger current-thread
test harness after the larger async surface made the default test stack
too small locally.
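A sketch of the `spawn_blocking` pattern described above, with a simplified, hypothetical provider trait; the real provider and store signatures live in `codex-device-key`.

```rust
use std::sync::Arc;

trait DeviceKeyProvider: Send + Sync + 'static {
    /// Synchronous: may block on an OS authentication prompt.
    fn sign(&self, key_id: &str, payload: &[u8]) -> Result<Vec<u8>, String>;
}

async fn sign_async<P: DeviceKeyProvider>(
    provider: Arc<P>,
    key_id: String,
    payload: Vec<u8>,
) -> Result<Vec<u8>, String> {
    // Run the potentially blocking provider call off the async runtime so a
    // pending OS prompt cannot stall the message-processor path.
    tokio::task::spawn_blocking(move || provider.sign(&key_id, &payload))
        .await
        .map_err(|join_err| join_err.to_string())?
}
```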
## Validation
- `cargo test -p codex-device-key`
- `cargo test -p codex-state device_key`
- `cargo test -p codex-state`
- `cargo test -p codex-app-server device_key`
- `cargo test -p codex-app-server
message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
- `cargo test -p codex-app-server`
- `just fix -p codex-device-key`
- `just fix -p codex-state`
- `just fix -p codex-app-server`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `git diff --check`
## Why
Resume and reconstruction need to preserve the permissions that were
active for each user turn. If rollouts only keep legacy sandbox fields,
replay cannot faithfully represent profile-shaped overrides introduced
earlier in the stack.
## What changed
This records `permission_profile` on user-turn rollout events,
reconstructs it through history/state extraction, and updates rollout
reconstruction and related fixtures to keep the field explicit.
## Verification
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
- `cargo test -p codex-core --test all request_permissions --
--nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18281).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* __->__ #18281
## Summary
- Teach app-server `thread/list` to accept either a single `cwd` or an
array of cwd filters, returning threads whose recorded session cwd
matches any requested path
- Add `useStateDbOnly` as an explicit opt-in fast path for callers that
want to answer `thread/list` from SQLite without scanning JSONL rollout
files
- Preserve backwards compatibility: by default, `thread/list` still
scans JSONL rollouts and repairs SQLite state
- Wire the new cwd array and SQLite-only options through app-server,
local/remote thread-store, rollout listing, generated TypeScript/schema
fixtures, proto output, and docs
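A hedged sketch of the single-or-array cwd parameter using an untagged serde enum; the enum and matching logic are illustrative, not the actual protocol types.

```rust
use std::path::{Path, PathBuf};

use serde::Deserialize;

#[derive(Deserialize)]
#[serde(untagged)]
enum CwdFilter {
    One(PathBuf),
    Many(Vec<PathBuf>),
}

impl CwdFilter {
    /// A thread matches when its recorded session cwd equals any requested path.
    fn matches(&self, session_cwd: &Path) -> bool {
        match self {
            CwdFilter::One(cwd) => cwd.as_path() == session_cwd,
            CwdFilter::Many(cwds) => cwds.iter().any(|cwd| cwd.as_path() == session_cwd),
        }
    }
}
```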
## Test Plan
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-rollout`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server thread_list`
- `just fmt`
- `just fix -p codex-app-server-protocol -p codex-rollout -p
codex-thread-store -p codex-app-server`
- `cargo build -p codex-cli --bin codex`
## Summary
This PR fully reverts the previously merged Agent Identity runtime
integration from the old stack:
https://github.com/openai/codex/pull/17387/changes
It removes the Codex-side task lifecycle wiring, rollout/session
persistence, feature flag plumbing, lazy `auth.json` mutation,
background task auth paths, and request callsite changes introduced by
that stack.
This leaves the repo in a clean pre-AgentIdentity integration state so
the follow-up PRs can reintroduce the pieces in smaller reviewable
layers.
## Stack
1. This PR: full revert
2. https://github.com/openai/codex/pull/18871: move Agent Identity
business logic into a crate
3. https://github.com/openai/codex/pull/18785: add explicit
AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate auth callsites
through AuthProvider
## Testing
Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.
This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.
It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.
Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`
---------
Co-authored-by: Sayan Sisodiya <sayan@openai.com>
## Summary
- persist registered agent tasks in the session state update stream so
the thread can reuse them
- prewarm task registration once identity registration succeeds, while
keeping startup failures best-effort
- isolate the session-side task lifecycle into a dedicated module so
AgentIdentityManager and RegisteredAgentTask do not leak across as many
core layers
## Testing
- cargo test -p codex-core startup_agent_task_prewarm
- cargo test -p codex-core
cached_agent_task_for_current_identity_clears_stale_task
- cargo test -p codex-core record_initial_history_
To improve performance of UI loads from the app, this adds two main
improvements:
1. The `thread/list` API now gets a `sortDirection` request field and a
`backwardsCursor` field in the response, which lets you paginate forwards and
backwards from a window. This lets you fetch the first few items to
display immediately while you paginate to fill in history, then can
paginate "backwards" on future loads to catch up with any changes since
the last UI load without a full reload of the entire data set.
2. Added a new `thread/turns/list` API which also has `sortDirection` and
`backwardsCursor` for the same behavior as `thread/list`, giving you the
same small initial fetch for immediate display followed by background
fill-in and resync catch-up.
To guarantee unique cursors, we make two
important updates:
* Add new updated_at_ms and created_at_ms columns that are in
millisecond precision
* Guarantee uniqueness -- if multiple items are inserted at the same
millisecond, bump the new one by one millisecond until it becomes unique
This lets us use single-number cursors for forwards and backwards paging
through resultsets and guarantee that the cursor is a fixed point to do
(timestamp > cursor) and get new items only.
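A small sketch of the uniqueness bump and the cursor comparisons; the column names match the description above, while the function and queries are illustrative.

```rust
/// Bump a timestamp forward until it is strictly greater than the last one we
/// inserted, so every row gets a unique millisecond value usable as a cursor.
fn next_unique_ms(requested_ms: i64, last_inserted_ms: Option<i64>) -> i64 {
    match last_inserted_ms {
        Some(last) if requested_ms <= last => last + 1,
        _ => requested_ms,
    }
}

// Forward page (ascending): rows strictly newer than the cursor.
//   SELECT ... FROM threads WHERE updated_at_ms > :cursor ORDER BY updated_at_ms ASC  LIMIT :n;
// Backward page (descending): rows strictly older than the cursor.
//   SELECT ... FROM threads WHERE updated_at_ms < :cursor ORDER BY updated_at_ms DESC LIMIT :n;
```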
This updated implementation stays backwards-compatible, since multiple
app-servers can be running at once and older ones would not handle a
changed scheme well.
## Summary
App-server v2 already receives turn-scoped `clientMetadata`, but the
Rust app-server was dropping it before the outbound Responses request.
This change keeps the fix lightweight by threading that metadata through
the existing turn-metadata path rather than inventing a new transport.
## What we're trying to do and why
We want turn-scoped metadata from the app-server protocol layer,
especially fields like Hermes/GAAS run IDs, to survive all the way to
the actual Responses API request so it is visible in downstream
websocket request logging and analytics.
The specific bug was:
- app-server protocol uses camelCase `clientMetadata`
- Responses transport already has an existing turn metadata carrier:
`x-codex-turn-metadata`
- websocket transport already rewrites that header into
`request.request_body.client_metadata["x-codex-turn-metadata"]`
- but the Rust app-server never parsed or stored `clientMetadata`, so
nothing from the app-server request was making it into that existing
path
This PR fixes that without adding a new header or a second metadata
channel.
## How we did it
### Protocol surface
- Add optional `clientMetadata` to v2 `TurnStartParams` and
`TurnSteerParams`
- Regenerate the JSON schema / TypeScript fixtures
- Update app-server docs to describe the field and its behavior
### Runtime plumbing
- Add a dedicated core op for app-server user input carrying turn-scoped
metadata: `Op::UserInputWithClientMetadata`
- Wire `turn/start` and `turn/steer` through that op / signature path
instead of dropping the metadata at the message-processor boundary
- Store the metadata in `TurnMetadataState`
### Transport behavior
- Reuse the existing serialized `x-codex-turn-metadata` payload
- Merge the new app-server `clientMetadata` into that JSON additively
- Do **not** replace built-in reserved fields already present in the
turn metadata payload
- Keep websocket behavior unchanged at the outer shape level: it still
sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
string now contains the merged fields
- Keep HTTP fallback behavior unchanged except that the existing
`x-codex-turn-metadata` header now includes the merged fields too
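A sketch of the additive merge, assuming `serde_json` maps; the function name and reserved-field handling are illustrative of the rule above, not the exact code.

```rust
use serde_json::{Map, Value};

/// Merge caller-supplied clientMetadata into the existing turn-metadata payload
/// without replacing built-in reserved fields such as session_id or turn_id.
fn merge_client_metadata(
    turn_metadata: &mut Map<String, Value>,
    client_metadata: Map<String, Value>,
) {
    for (key, value) in client_metadata {
        // Only add keys the payload does not already define; reserved fields win.
        turn_metadata.entry(key).or_insert(value);
    }
}
```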
### Request shape before / after
Before, a websocket `response.create` looked like:
```json
{
"type": "response.create",
"client_metadata": {
"x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
}
}
```
Even if the app-server caller supplied `clientMetadata`, it was not
represented there.
After, the same request shape is preserved, but the serialized payload
now includes the new turn-scoped fields:
```json
{
"type": "response.create",
"client_metadata": {
"x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
}
}
```
## Validation
### Targeted tests added / updated
- protocol round-trip coverage for `clientMetadata` on `turn/start` and
`turn/steer`
- protocol round-trip coverage for `Op::UserInputWithClientMetadata`
- `TurnMetadataState` merge test proving client metadata is added
without overwriting reserved built-in fields
- websocket request-shape test proving outbound `response.create`
contains merged metadata inside
`client_metadata["x-codex-turn-metadata"]`
- app-server integration tests proving:
- `turn/start` forwards `clientMetadata` into the outbound Responses
request path
- websocket warmup + real turn request both behave correctly
- `turn/steer` updates the follow-up request metadata
### Commands run
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
--lib`
- `cargo test -p codex-core --test all
responses_websocket_preserves_custom_turn_metadata_fields`
- `cargo test -p codex-app-server --test all client_metadata`
- `cargo test -p codex-app-server --test all
turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
-- --nocapture`
- `just fmt`
- `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
-p codex-app-server`
- `just fix -p codex-exec -p codex-tui-app-server`
- `just argument-comment-lint`
### Full suite note
`cargo test` in `codex-rs` still fails in:
-
`suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`
I verified that same failure on a clean detached `HEAD` worktree with an
isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.
## Description
This PR makes the SQLite state runtime tolerate databases that have
already been migrated by a newer Codex binary.
Today, if an older CLI sees migration versions in `_sqlx_migrations`
that it doesn't know about, startup fails. This change relaxes that
check for the runtime migrators we use in `codex-state` so older
binaries can keep opening the DB in that case.
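One way this can look with the sqlx migrator, assuming codex-state drives sqlx's `_sqlx_migrations` machinery directly; the migration path and setup here are illustrative.

```rust
use sqlx::migrate::{MigrateError, Migrator};
use sqlx::SqlitePool;

async fn run_migrations(pool: &SqlitePool) -> Result<(), MigrateError> {
    let mut migrator = Migrator::new(std::path::PathBuf::from("./migrations")).await?;
    // Tolerate versions recorded in _sqlx_migrations that this binary does not
    // know about, so an older CLI can still open a database migrated by a newer one.
    migrator.set_ignore_missing(true);
    migrator.run(pool).await
}
```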
## Why
We can end up with mixed-version CLIs running against the same local
state DB. In that setup, treating "the database is ahead of me" as a
hard error is unnecessarily strict and breaks the older client even when
the migration history is otherwise fine.
## Follow-up
We still clean up versioned `state_*.sqlite` and `logs_*.sqlite` files
during init, so older binaries can treat newer DB files as legacy. That
should probably be tightened separately if we want mixed-version local
usage to be fully safe.
## Why
`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.
## What changed
- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`
That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.
## Validation
- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`
## Follow-up
- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.
- create `codex-git-utils` and move the shared git helpers into it with
file moves preserved for diff readability
- move the `GitInfo` helpers out of `core` so stacked rollout work can
depend on the shared crate without carrying its own git info module
---------
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
This PR adds a URI-based system to reference agents within a tree. This
comes from a sync between research and engineering.
The main agent (the one manually spawned by a user) is always called
`/root`. Any sub-agent spawned by it will be `/root/agent_1` for example
where `agent_1` is chosen by the model.
Any agent can contact any other agent using its path.
Paths can be used either as absolute paths or relative to the calling agent.
Resume is not supported on these new paths for now.
Add a representation of the agent graph. This is now used for:
* Cascading close (closing a parent closes its children)
* Cascading resume (the opposite)
Later, this will also be used for post-compaction stuffing of the
context
Direct fix for: https://github.com/openai/codex/issues/14458