## Why
Users and support need a single command that captures the local Codex
runtime, configuration, auth, terminal, network, and state shape without
asking the user to know which diagnostic depth to choose first. `codex
doctor` runs the useful checks by default and defaults to the detailed
human-readable output, since the command is usually run when someone
already needs context.
The command also targets concrete support failure modes we have seen
while iterating on the design:
- update-target mismatches like #21956, where the installed package
manager target can differ from the running executable
- terminal and multiplexer issues that depend on `TERM`, tmux/zellij
state, color handling, and TTY metadata
- provider-specific HTTP/WebSocket connectivity, including ChatGPT
WebSocket handshakes and API-key/provider endpoint reachability
- local state/log SQLite integrity problems and large rollout
directories
- feedback reports that need an attached, redacted diagnostic snapshot
without asking the user to run a second command
## What Changed
- Adds `codex doctor` as a grouped CLI diagnostic report with default
detailed output and `--summary` for the compact view.
- Adds stable report sections for Environment, Configuration, Updates,
Connectivity, and Background Server, plus a top Notes block that
promotes anomalies such as available updates, large rollout directories,
optional MCP issues, and mixed auth signals.
- Adds runtime provenance, install consistency, bundled/system search
readiness, terminal/multiplexer metadata, `config.toml` parse status,
auth mode details, sandbox details, feature flag summaries, update
cache/latest-version state, app-server daemon state, SQLite integrity
checks, rollout statistics, and provider-aware network diagnostics.
- Adds ChatGPT WebSocket diagnostics that report the negotiated HTTP
upgrade as `HTTP 101 Switching Protocols` and include timeout, DNS,
auth, and provider context in detailed output.
- Makes reachability provider-aware: API-key OpenAI setups check the API
endpoint, ChatGPT auth checks the ChatGPT path, and custom/AWS/local
providers check configured HTTP endpoints when available.
- Adds structured, redacted JSON output where `checks` is keyed by check
id and `details` is a key/value object for support tooling.
- Integrates doctor with feedback uploads by attaching a best-effort
`codex-doctor-report.json` report and adding derived Sentry tags for
overall status and failing/warning checks.
- Updates the TUI feedback consent copy so users can see that the doctor
report is included when logs/diagnostics are uploaded.
- Updates the CLI bug issue template to ask reporters for `codex doctor
--json` and render pasted reports as JSON.
## Example Output
The examples below are sanitized from local smoke runs with `--no-color`
so the structure is reviewable in plain text.
### `codex doctor`
```text
Codex Doctor v0.0.0 · macos-aarch64
Notes
↑ updates 0.130.0 available (current 0.0.0, dismissed 0.128.0)
⚠ rollouts 1,526 active files · 2.53 GB on disk
⚠ mcp MCP configuration has optional issues
⚠ auth mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
─────────────────────────────────────────────────────────────
Environment
✓ runtime local debug build
version 0.0.0
install method other
commit unknown
executable ~/code/codex.fcoury-doct…x-rs/target/debug/codex
✓ install consistent
context other
managed by npm: no · bun: no · package root —
PATH entries (2) ~/.local/share/mise/installs/node/24/bin/codex
~/.local/share/mise/shims/codex
✓ search ripgrep 15.1.0 (system, `rg`)
✓ terminal Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
terminal Ghostty
TERM_PROGRAM ghostty
terminal version 1.3.2-main-+b0f827665
TERM xterm-256color
multiplexer tmux 3.6a
tmux extended-keys on
tmux allow-passthrough on
tmux set-clipboard on
✓ state databases healthy
CODEX_HOME ~/.codex (dir)
state DB ~/.codex/state_5.sqlite (file) · integrity ok
log DB ~/.codex/logs_2.sqlite (file) · integrity ok
active rollouts 1,526 files · 2.53 GB (avg 1.70 MB)
archived rollouts 8 files · 3.84 MB (avg 491.11 KB)
Configuration
✓ config loaded
model gpt-5.5 · openai
cwd ~/code/codex.fcoury-doctor/codex-rs
config.toml ~/.codex/config.toml
config.toml parse ok
MCP servers 1
feature flags 36 enabled · 7 overridden (full list with --all)
overrides code_mode, code_mode_only, memories, chronicle, goals, remote_control, prevent_idle_sleep
✓ auth auth is configured
auth storage mode File
auth file ~/.codex/auth.json
auth env vars present OPENAI_API_KEY
stored auth mode chatgpt
stored API key false
stored ChatGPT tokens true
stored agent identity false
⚠ mcp MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
configured servers 1
disabled servers 0
streamable_http servers 1
optional reachability openaiDeveloperDocs: https://developers.openai.com/mcp (HEAD connect failed; GET connect failed)
✓ sandbox restricted fs + restricted network · approval OnRequest
approval policy OnRequest
filesystem sandbox restricted
network sandbox restricted
Connectivity
✓ network network-related environment looks readable
✓ websocket connected (HTTP 101 Switching Protocols) · 15s timeout
model provider openai
provider name OpenAI
wire API responses
supports websockets true
connect timeout 15000 ms
auth mode chatgpt
endpoint wss://chatgpt.com/backend-api/<redacted>
DNS 2 IPv4, 2 IPv6, first IPv6
handshake result HTTP 101 Switching Protocols
✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
reachability mode API key auth
openai API https://api.openai.com/v1 connect failed (required)
Background Server
○ app-server not running (ephemeral mode)
─────────────────────────────────────────────────────────────
11 ok · 1 idle · 4 notes · 1 warn · 1 fail failed
--summary compact output --all expand truncated lists
--json redacted report
```
### `codex doctor --summary`
```text
Codex Doctor v0.0.0 · macos-aarch64
Notes
↑ updates 0.130.0 available (current 0.0.0, dismissed 0.128.0)
⚠ rollouts 1,526 active files · 2.53 GB on disk
⚠ mcp MCP configuration has optional issues
⚠ auth mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
─────────────────────────────────────────────────────────────
Environment
✓ runtime local debug build
✓ install consistent
✓ search ripgrep 15.1.0 (system, `rg`)
✓ terminal Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
✓ state databases healthy
Configuration
✓ config loaded
✓ auth auth is configured
⚠ mcp MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
✓ sandbox restricted fs + restricted network · approval OnRequest
Updates
✓ updates update configuration is locally consistent
Connectivity
✓ network network-related environment looks readable
✓ websocket connected (HTTP 101 Switching Protocols) · 15s timeout
✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
Background Server
○ app-server not running (ephemeral mode)
─────────────────────────────────────────────────────────────
11 ok · 1 idle · 4 notes · 1 warn · 1 fail failed
Run codex doctor without --summary for detailed diagnostics.
--all expand truncated lists --json redacted report
```
### `codex doctor --json` shape
```json
{
"schema_version": 1,
"overall_status": "fail",
"checks": {
"runtime.provenance": {
"id": "runtime.provenance",
"category": "Environment",
"status": "ok",
"summary": "local debug build",
"details": {
"version": "0.0.0",
"install method": "other",
"commit": "unknown"
}
},
"sandbox.helpers": {
"id": "sandbox.helpers",
"category": "Configuration",
"status": "ok",
"summary": "restricted fs + restricted network · approval OnRequest",
"details": {
"approval policy": "OnRequest",
"filesystem sandbox": "restricted",
"network sandbox": "restricted"
}
}
}
}
```
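For orientation, here is a hedged sketch of how support tooling might deserialize this shape in Rust; the struct and field names below simply mirror the JSON keys above and are not the actual codex-cli types.

```rust
// Illustrative only: these types mirror the JSON shape above, not real codex-cli structs.
use std::collections::BTreeMap;

use serde::Deserialize;

#[derive(Deserialize)]
struct DoctorReport {
    schema_version: u32,
    overall_status: String,
    // `checks` is keyed by check id, e.g. "runtime.provenance".
    checks: BTreeMap<String, DoctorCheck>,
}

#[derive(Deserialize)]
struct DoctorCheck {
    id: String,
    category: String,
    status: String,
    summary: String,
    // `details` is a flat key/value object in the report.
    #[serde(default)]
    details: BTreeMap<String, String>,
}

fn failing_check_ids(report: &DoctorReport) -> Vec<&str> {
    report
        .checks
        .values()
        .filter(|check| check.status == "fail")
        .map(|check| check.id.as_str())
        .collect()
}
```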
### `/feedback` new Sentry attachment
<img width="938" height="798" alt="CleanShot 2026-05-13 at 15 36 14"
src="https://github.com/user-attachments/assets/715e62e0-d7b4-4fea-a35a-fd5d5d33c4c0"
/>
### New section in CLI issue template
<img width="1164" height="435" alt="CleanShot 2026-05-13 at 15 47 24"
src="https://github.com/user-attachments/assets/9081dc25-a28c-4afa-8ba1-e299c2b4031d"
/>
## How to Test
1. Run `cargo run --bin codex -- doctor --no-color`.
2. Confirm the detailed report is the default and includes promoted
Notes, grouped sections, terminal details, state DB integrity, rollout
stats, provider reachability, WebSocket diagnostics, and app-server
status.
3. Run `cargo run --bin codex -- doctor --summary --no-color`.
4. Confirm the compact view keeps the same sections and summary counts
but omits detailed key/value rows.
5. Run `cargo run --bin codex -- doctor --json`.
6. Confirm the output is redacted JSON, `checks` is an object keyed by
check id, and each check's `details` is a key/value object.
7. Preview the CLI bug issue template and confirm the `Codex doctor
report` field appears after the terminal field, asks for `codex doctor
--json`, and renders pasted output as JSON.
8. Start a feedback flow that includes logs.
9. Confirm the upload consent copy lists `codex-doctor-report.json`
alongside the log attachments.
Targeted tests:
- `cargo test -p codex-cli doctor`
- `cargo test -p codex-app-server
doctor_report_tags_summarize_status_counts`
- `cargo test -p codex-feedback`
- `cargo test -p codex-tui feedback_view`
- `just argument-comment-lint`
- `git diff --check`
## Summary
- add SQLite init, backfill-gate, and fallback telemetry without
introducing a cross-cutting state-db access wrapper
- install one process-scoped telemetry sink after OTEL startup and let
low-level state/rollout paths emit through it directly
- add process-start metrics for the process owners that initialize
SQLite
---------
Co-authored-by: Owen Lin <owen@openai.com>
## Why
Users have requested the ability to edit a goal's objective after a goal
has been created. This PR exposes a new `/goal edit` command in the TUI
to address this request.
In the process of implementing this, I also noticed an existing bug in
the goal runtime. When a goal's objective is updated through the
`thread/goal/set` app server API, the goal runtime didn't emit a new
steering prompt to tell the agent about the new objective. This PR also
fixes this hole.
## What Changed
- Adds `/goal edit` in the TUI, opening an edit box prefilled with the
current goal objective.
- Keeps active and paused goals in their current state, resets completed
goals to active, keeps budget-limited goals budget-limited, and
preserves the existing token budget.
- Changes the existing `thread/goal/set` behavior so editing an
objective preserves goal accounting instead of resetting it. The older
reset-on-new-objective behavior was left over from before
`thread/goal/clear`; clients that need to reset accounting can now clear
the existing goal and create a new one.
- Reuses the existing goal set API path; this does not add or change
app-server protocol surface area.
- Adds a dedicated goal runtime steering prompt when an externally
persisted goal mutation changes the objective, so active turns receive
the updated objective.
## Validation
- Make sure `/goal edit` returns an error if no goal currently exists
- Make sure `/goal edit` displays an edit box that can be optionally
canceled with no side effects
- Make sure that an edited goal results in a steer so the agent starts
pursuing the new objective
- Make sure the new objective is reflected in the goal if you use
`/goal` to display the goal summary
- Make sure that `/goal edit` doesn't reset the token budget, time/token
accounting on the updated goal
Fixes #20792
## Why
`/goal`-first threads are valid resumable threads, but they can be
missing from `codex resume` and app recents because discovery depends on
metadata derived from a normal first user message.
PR #21489 attempted to fix this by using the goal objective as
`first_user_message`. Review feedback pointed out that
`first_user_message` does more than provide visible text today: it gates
listing, supplies preview text, and participates in deciding whether a
later title should surface as a distinct thread name. Reusing it for the
goal objective could leave a `/goal`-first thread with
`first_user_message=<goal>` and `title=<later prompt>`, even though the
goal should only provide the initial visible preview.
This PR follows that feedback: it keeps `first_user_message` as is but
introduces a new `preview` field to separate concerns. The `preview`
field is populated from the first user message or the goal objective,
and we can extend it in the future to include other sources.
## What Changed
- Added internal thread `preview` metadata in `codex-state`, including a
SQLite migration that backfills from `first_user_message` and from
existing `thread_goals` objectives when needed.
- Treated `ThreadGoalUpdated` as preview-bearing metadata so goal-first
threads can be listed and searched without mutating
`first_user_message`.
- Updated rollout listing, state queries, thread-store conversion, and
app-server mapping to use preview metadata while continuing to expose
the existing public `preview` field.
- Preserved title/name distinctness behavior around literal
`first_user_message`, so a later normal prompt after `/goal` does not
surface as a separate name just because the goal supplied the initial
preview.
- Preserved compatibility for older/internal metadata writes by deriving
preview from `first_user_message` when explicit preview metadata is
absent.
## Verification
- Manually verified that a thread that starts with a `/goal <objective>`
shows up in the resume picker.
## Why
We'd like SQLite state to become required and load-bearing. As a first
step, let's remove the mechanism that allows us to blow away the SQLite
DB on a version bump, and instead rely on graceful migrations.
The original motivation
([PR](https://github.com/openai/codex/pull/10623)) behind this mechanism
was to care less about backwards compatibility while SQLite was being
landed, but I'd say it's quite important now to keep the data in it.
## What changed
- Make `STATE_DB_FILENAME` and `LOGS_DB_FILENAME` the full canonical
filenames: `state_5.sqlite` and `logs_2.sqlite`.
- Remove `STATE_DB_VERSION` / `LOGS_DB_VERSION` and the helper that
constructed filenames from versions.
- Stop `StateRuntime::init` from scanning for or deleting older SQLite
DB filenames at startup.
- Delete the tests that encoded legacy state/logs DB deletion behavior.
## Verification
- `cargo test -p codex-state`
## Summary
- make `thread_source` an explicit optional thread-level field on
`thread/start`, `thread/fork`, and returned thread payloads
- persist `thread_source` in rollout/session metadata so resumed live
threads retain the original value
- replace the old best-effort `session_source` -> `thread_source`
mapping with an explicit caller-supplied analytics classification
## Why
Before this change, analytics `thread_source` was populated by a
best-effort mapping from `session_source`. `session_source` describes
the runtime/client surface, not the actual thread-level origin, so that
projection was not accurate enough to distinguish cases such as `user`,
`subagent`, `memory_consolidation`, and future thread origins reliably.
Making `thread_source` explicit keeps one thread-level analytics field
while letting callers provide the real classification directly instead
of recovering it indirectly from `session_source`.
## Impact
For new analytics events, `thread_source` now reflects the explicit
thread-level classification supplied by the caller rather than an
inferred value derived from `session_source`. Existing protocol fields
remain optional; callers that omit `threadSource` now produce `null`
instead of a best-effort inferred value.
## Validation
- `just write-app-server-schema`
- `cargo test -p codex-analytics -p codex-core -p
codex-app-server-protocol --no-run`
- `cargo test -p codex-app-server-protocol
generated_ts_optional_nullable_fields_only_in_params`
- `cargo test -p codex-analytics
thread_initialized_event_serializes_expected_shape`
- `cargo test -p codex-core
resume_stopped_thread_from_rollout_preserves_thread_source`
## Why
Thread names are app-server metadata now, backed by the thread store and
sqlite state database. Keeping a core `SetThreadName` op plus a rollout
`thread_name_updated` event made rename persistence live in the wrong
layer and required historical replay support for an event that new
app-server flows should not write.
## What changed
- Removed `Op::SetThreadName` and `EventMsg::ThreadNameUpdated` from the
core protocol and deleted the core handler path that appended rename
events to rollouts.
- Updated app-server `thread/name/set` so both loaded and unloaded
threads write through thread-store metadata and app-server emits
`thread/name/updated` notifications.
- Updated local thread-store name metadata updates to write sqlite title
metadata and the legacy thread-name index without appending rollout
events.
- Removed state extraction and rollout handling for the deleted
thread-name event.
## Validation
- `cargo test -p codex-app-server thread_name_updated_broadcasts`
- `cargo test -p codex-app-server
thread_name_set_is_reflected_in_read_list_and_resume`
- `cargo test -p codex-thread-store
update_thread_metadata_sets_name_on_active_rollout_and_indexes_name`
- `cargo test -p codex-state`
- `cargo check -p codex-mcp-server -p codex-rollout-trace`
- `just fix -p codex-app-server -p codex-thread-store -p codex-state -p
codex-mcp-server -p codex-rollout-trace`
## Docs
No external documentation update is expected for this internal ownership
change.
- Route `thread/metadata/update` through
`ThreadStore::update_thread_metadata`.
- Add `LocalThreadStore` git metadata patch support for set, partial
update, and clear semantics.
- Add some unit tests for the new thread store code
- Remove a lot of dead code/tests!
## Summary
Persisted subagent parent/child topology currently leaks through
`StateRuntime`'s SQLite-specific thread-spawn helpers. This PR
introduces a narrow `AgentGraphStore` boundary so follow-up work can
route graph operations through a local or remote store without coupling
orchestration code directly to the state DB graph API.
## Changes
- Adds the new `codex-agent-graph-store` crate.
- Defines a flat `AgentGraphStore` trait for the v1 graph surface:
upsert edge, set edge status, list direct children, and list
descendants.
- Adds public graph types for `ThreadSpawnEdgeStatus`,
`AgentGraphStoreError`, and `AgentGraphStoreResult`.
- Implements `LocalAgentGraphStore` on top of an existing
`codex_state::StateRuntime`, preserving today's SQLite-backed
`thread_spawn_edges` behavior.
- Registers the crate in Cargo/Bazel metadata.
This PR only adds the local contract and implementation; call-site
migration and the remote gRPC store are left to the follow-up PRs in the
stack.
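For orientation, a rough sketch of what that flat v1 trait surface could look like; the thread-id type, error shape, and exact method signatures are assumptions, not the real crate API.

```rust
// Rough sketch only; the real trait lives in codex-agent-graph-store and may differ.
use async_trait::async_trait;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadSpawnEdgeStatus {
    Open,
    Closed,
}

#[derive(Debug)]
pub struct AgentGraphStoreError(pub String);

pub type AgentGraphStoreResult<T> = Result<T, AgentGraphStoreError>;

#[async_trait]
pub trait AgentGraphStore: Send + Sync {
    /// Insert or update a parent -> child spawn edge.
    async fn upsert_edge(
        &self,
        parent_thread_id: &str,
        child_thread_id: &str,
        status: ThreadSpawnEdgeStatus,
    ) -> AgentGraphStoreResult<()>;

    /// Change the status of an existing edge.
    async fn set_edge_status(
        &self,
        parent_thread_id: &str,
        child_thread_id: &str,
        status: ThreadSpawnEdgeStatus,
    ) -> AgentGraphStoreResult<()>;

    /// List the direct children of a thread, optionally filtered by status.
    async fn list_direct_children(
        &self,
        parent_thread_id: &str,
        status: Option<ThreadSpawnEdgeStatus>,
    ) -> AgentGraphStoreResult<Vec<String>>;

    /// List all descendants in stable breadth-first order.
    async fn list_descendants(
        &self,
        root_thread_id: &str,
        status: Option<ThreadSpawnEdgeStatus>,
    ) -> AgentGraphStoreResult<Vec<String>>;
}
```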
## Testing
- `cargo test -p codex-agent-graph-store`
The new unit tests cover local parity with the existing `StateRuntime`
graph methods, `Open`/`Closed` filtering, status updates, and stable
breadth-first descendant ordering.
## Why
The log DB writer batches tracing events before inserting them into
SQLite, but `tokio::time::interval` produces an immediate first tick.
That meant the inserter could flush the first accepted log entry before
`batch_size` was reached, making
`configured_batch_size_flushes_without_explicit_flush` timing-sensitive
in CI.
## What Changed
- Consume the interval's startup tick before entering the inserter loop,
so interval flushing starts after the configured delay.
- Remove the test's startup sleep, which was masking the race instead of
proving the batch-size behavior.
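A minimal sketch of the loop change, assuming a tokio-based inserter; the channel payload and `flush_batch` helper are placeholders rather than the real `LogEntry` handling.

```rust
use std::time::Duration;

use tokio::sync::mpsc;

async fn inserter_loop(mut rx: mpsc::Receiver<String>, batch_size: usize) {
    let mut interval = tokio::time::interval(Duration::from_secs(2));
    // `tokio::time::interval` yields an immediate first tick; consume it here so
    // interval-driven flushes only start after the configured delay instead of
    // racing the first accepted entry.
    interval.tick().await;

    let mut batch: Vec<String> = Vec::new();
    loop {
        tokio::select! {
            maybe_entry = rx.recv() => match maybe_entry {
                Some(entry) => {
                    batch.push(entry);
                    if batch.len() >= batch_size {
                        flush_batch(&mut batch).await;
                    }
                }
                None => {
                    flush_batch(&mut batch).await;
                    break;
                }
            },
            _ = interval.tick() => flush_batch(&mut batch).await,
        }
    }
}

async fn flush_batch(batch: &mut Vec<String>) {
    // Placeholder for the SQLite insert performed by the real inserter.
    batch.clear();
}
```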
## Validation
- `cargo test -p codex-state`
- `cargo test -p codex-state
configured_batch_size_flushes_without_explicit_flush` passed 3
consecutive focused runs
- PR checks passed across `rust-ci`, Bazel, `ci`, `sdk`, `cargo-deny`,
Codespell, blob-size policy, and CLA
## Why
Memory startup was tied to thread lifecycle events such as create, load,
and fork. That can run memory work before a thread receives real user
input, and it makes startup cost scale with thread management instead of
actual turns. Moving the trigger to `thread/sendInput` keeps memory
startup aligned with the first real user turn and lets it use the
current thread config at turn time.
The idea is to prevent ghost cost from pre-warm work triggered by the app.
Turn-based startup can also make global phase-2 consolidation easier to
request repeatedly, so this adds a success cooldown and tightens the
default startup scan window.
## What Changed
- Start `codex_memories_write::start_memories_startup_task` after a
non-empty `thread/sendInput` turn is submitted, instead of from thread
create/load/fork paths:
d4a6885b78/codex-rs/app-server/src/codex_message_processor.rs (L6477-L6487)
- Expose `CodexThread::config()` so app-server can pass the live config
into memory startup at turn time.
- Add a six-hour successful-run cooldown for global phase-2
consolidation via `SkippedCooldown`:
d4a6885b78/codex-rs/state/src/runtime/memories.rs (L963-L966)
- Reduce memory startup defaults to at most 2 rollouts over 10 days:
d4a6885b78/codex-rs/config/src/types.rs (L31-L34)
## Verification
Updated the memory runtime coverage around phase-2 reclaim behavior,
including `phase2_global_lock_respects_success_cooldown`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Phase 2 still needs to choose the most relevant stage-1 memory outputs
by usage and recency, but exposing that ranking as the rendered
`raw_memories.md` order creates unnecessarily large diffs. Usage-count or
timestamp changes can reshuffle otherwise unchanged memories, making the
workspace diff noisy and giving the consolidation prompt a misleading
recency signal from file position.
This fix also reduces token consumption.
## What Changed
- Keep the existing top-N Phase 2 selection ranking by `usage_count`,
`last_usage`, `source_updated_at`, and `thread_id`.
- Return the selected rows in stable ascending `thread_id` order before
syncing Phase 2 filesystem inputs.
- Update the memory README, raw memories header, and consolidation
prompt so they describe the stable order and tell the prompt to use
metadata and workspace diffs instead of file order as the recency
signal.
- Adjust the memory runtime tests to use deterministic thread IDs and
assert the stable return order separately from the ranked selection
semantics.
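A simplified sketch of the two-step ordering, with a hypothetical row type and field names; the real selection lives in the memory runtime and ranks over more state.

```rust
#[derive(Clone)]
struct Stage1Row {
    thread_id: String,
    usage_count: u64,
    last_usage: i64,
    source_updated_at: i64,
}

fn select_phase2_inputs(mut rows: Vec<Stage1Row>, top_n: usize) -> Vec<Stage1Row> {
    // Step 1: keep the existing relevance ranking to pick the top-N rows.
    rows.sort_by(|a, b| {
        b.usage_count
            .cmp(&a.usage_count)
            .then(b.last_usage.cmp(&a.last_usage))
            .then(b.source_updated_at.cmp(&a.source_updated_at))
            .then(a.thread_id.cmp(&b.thread_id))
    });
    rows.truncate(top_n);
    // Step 2: return the selection in stable ascending thread_id order, so
    // usage-count or timestamp changes alone do not reshuffle raw_memories.md.
    rows.sort_by(|a, b| a.thread_id.cmp(&b.thread_id));
    rows
}
```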
## Test Coverage
- Existing memory runtime tests in
`codex-rs/state/src/runtime/memories.rs` now cover the stable returned
ordering for Phase 2 inputs.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Phase 2 can now claim the global consolidation lock on startup even when
the git-backed memory workspace is already clean. The clean-workspace
path still finalized through the normal Phase 2 success path, which
clears and re-marks `selected_for_phase2` rows. That made no-op startups
perform avoidable writes to `stage1_outputs`, creating unnecessary DB
I/O and contention when no memory files changed.
## What Changed
- Added a preserving-selection Phase 2 finalizer in `codex-state` that
only marks the global job row as succeeded.
- Kept the existing `mark_global_phase2_job_succeeded` behavior for real
consolidation runs, where the selected Phase 2 snapshot must be
rewritten.
- Switched the `succeeded_no_workspace_changes` branch in
`core/src/memories/phase2.rs` to use the preserving-selection finalizer.
- Added a regression test that installs a SQLite trigger on
`stage1_outputs` and verifies the clean finalizer performs zero updates
there.
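A hedged sketch of that trigger-based check, assuming an sqlx SQLite pool; the audit-table name and surrounding harness are illustrative, and only the `stage1_outputs` table name comes from the change itself.

```rust
async fn assert_clean_finalizer_skips_stage1_outputs(
    pool: &sqlx::SqlitePool,
) -> Result<(), sqlx::Error> {
    // Record every UPDATE against stage1_outputs in an audit table.
    sqlx::query("CREATE TABLE stage1_update_audit (n INTEGER NOT NULL)")
        .execute(pool)
        .await?;
    sqlx::query(
        "CREATE TRIGGER count_stage1_updates AFTER UPDATE ON stage1_outputs
         BEGIN INSERT INTO stage1_update_audit (n) VALUES (1); END",
    )
    .execute(pool)
    .await?;

    // ... run the preserving-selection finalizer for a clean workspace here ...

    let (updates,): (i64,) = sqlx::query_as("SELECT COUNT(*) FROM stage1_update_audit")
        .fetch_one(pool)
        .await?;
    assert_eq!(updates, 0, "clean finalizer must not rewrite stage1_outputs");
    Ok(())
}
```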
## Testing
- `cargo test -p codex-state`
- `cargo test -p codex-core memories::tests::phase2`
## Why
The Phase 2 memories job row is only the global lock for the git-backed
memory workspace. Manual memory edits do not enqueue new Stage 1 work,
so a Phase 2 row with `retry_remaining = 0` could be skipped before the
worker ever claimed the lock and generated `phase2_workspace_diff.md`.
That left workspace-only changes unconsolidated after repeated failures,
even when retry backoff had elapsed and the filesystem had real diffable
work.
## What Changed
- Allow `try_claim_global_phase2_job` to claim the Phase 2 lock after
the retry budget is exhausted, while still respecting active `retry_at`
backoff and fresh running leases.
- Treat `SkippedRetryUnavailable` for Phase 2 as backoff-only, and
update the outcome docs to match.
- Clamp Phase 2 retry bookkeeping at zero when failed attempts are
recorded.
## Verification
- Added
`phase2_global_lock_can_be_claimed_after_retry_budget_is_exhausted` to
cover the exhausted-budget lock claim path.
- Ran `cargo test -p codex-state`.
## Why
This PR makes the `morpheus` agent (memory phase 2) use a git diff to
start its consolidation. The workflow is the following:
1. The agent acquires a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select max N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff in `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a new `.git` and make one single commit on it. We do this
because we don't want to preserve history through `.git` and this is
cheap anyway
9. We release the lock
On top of this, we keep the existing retry policies.
The goals of this new workflow are:
* Better support of any memory extensions such as `chronicle`
* Allow the user to manually edit memories so those edits are considered
by the phase 2 agent
As a follow-up we will need to add support for user edits while
`morpheus` is running.
## What Changed
- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.
## Verification
- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.
---------
Co-authored-by: Codex <noreply@openai.com>
Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.
## Why
Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.
## What changed
- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.
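A minimal sketch of the stale-update guard, assuming an sqlx-backed helper; the column names and helper signature are hypothetical, and only the `thread_goals` table comes from this PR.

```rust
async fn update_goal_objective(
    pool: &sqlx::SqlitePool,
    thread_id: &str,
    goal_id: &str,
    objective: &str,
) -> Result<bool, sqlx::Error> {
    // The goal_id in the WHERE clause must still match the persisted goal, so an
    // update issued against an old goal cannot overwrite its replacement.
    let result = sqlx::query(
        "UPDATE thread_goals SET objective = ?1 WHERE thread_id = ?2 AND goal_id = ?3",
    )
    .bind(objective)
    .bind(thread_id)
    .bind(goal_id)
    .execute(pool)
    .await?;
    Ok(result.rows_affected() > 0)
}
```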
## Verification
- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.
## Why
This prepares feedback log capture for a future remote app-server hook
sink without changing the current local SQLite upload path. The
important boundary is now intentionally small: a log sink is a tracing
`Layer` that can also flush entries it has accepted.
That keeps the existing SQLite implementation simple while giving the
upcoming gRPC sink a place to fit beside it. SQLite and gRPC have
different worker/write semantics, so this PR avoids introducing a shared
buffered-sink abstraction and instead lets each `LogWriter` own the
buffering mechanics it needs.
## What Changed
- Added `LogSinkQueueConfig` with the existing local defaults: queue
capacity `512`, batch size `128`, and flush interval `2s`.
- Added `LogDbLayer::start_with_config(...)` while preserving
`LogDbLayer::start(...)` and `log_db::start(...)` defaults.
- Introduced the `LogWriter` trait as the minimal shared interface:
`tracing_subscriber::Layer` plus `flush()`.
- Made `LogDbLayer` implement `LogWriter`.
- Kept tracing event formatting inside `LogDbLayer`; it still creates
one `LogEntry` per tracing event before queueing it for SQLite.
- Kept normal event capture best-effort and non-blocking via bounded
`try_send`.
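As a rough illustration of how small that `LogWriter` boundary is (pinned to `Registry` here for simplicity; the real bounds, and whether `flush` is async, may differ):

```rust
use tracing_subscriber::registry::Registry;
use tracing_subscriber::Layer;

/// A log sink is a tracing layer that can also flush entries it has accepted.
pub trait LogWriter: Layer<Registry> + Send + Sync + 'static {
    /// Block until everything accepted so far has reached the backing store.
    fn flush(&self);
}
```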
## Behavior Notes
This does not change the SQLite schema, retention behavior,
`/feedback/upload`, or Sentry upload behavior. Normal log events still
drop when the queue is full; explicit `flush()` still waits for queue
capacity and receiver processing before returning.
## Verification
- `cargo test -p codex-state log_db`
- `cargo test -p codex-state`
- `just fix -p codex-state`
The added tests cover configured batch-size flushing, configured
interval flushing, queue-full drops, and the flush barrier semantics.
## Why
Device-key providers should only own platform key material. The
account/client binding used to authorize a signing payload is app-server
state, and keeping that state in provider-specific metadata makes the
same check harder to audit and harder to share across platform
implementations.
Persisting the binding in the shared state database gives the device-key
crate a platform-neutral source of truth before it asks a provider to
sign. It also lets app-server move potentially blocking key operations
off the main message processor path, which matters once providers may
wait for OS authentication prompts.
## What changed
- Add a `device_key_bindings` state migration plus `StateRuntime`
helpers keyed by `key_id`.
- Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
- Keep provider calls behind async store methods and run the synchronous
provider work through `spawn_blocking`.
- Wire app-server device-key RPC handling to the SQLite-backed binding
store and spawn response/error delivery tasks for device-key requests.
- Run the turn-start tracing test on the existing larger current-thread
test harness after the larger async surface made the default test stack
too small locally.
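A sketch of the `spawn_blocking` pattern described above, with a simplified, hypothetical provider trait; the real provider and store signatures live in `codex-device-key`.

```rust
use std::sync::Arc;

trait DeviceKeyProvider: Send + Sync + 'static {
    /// Synchronous: may block on an OS authentication prompt.
    fn sign(&self, key_id: &str, payload: &[u8]) -> Result<Vec<u8>, String>;
}

async fn sign_async<P: DeviceKeyProvider>(
    provider: Arc<P>,
    key_id: String,
    payload: Vec<u8>,
) -> Result<Vec<u8>, String> {
    // Run the potentially blocking provider call off the async runtime so a
    // pending OS prompt cannot stall the message-processor path.
    tokio::task::spawn_blocking(move || provider.sign(&key_id, &payload))
        .await
        .map_err(|join_err| join_err.to_string())?
}
```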
## Validation
- `cargo test -p codex-device-key`
- `cargo test -p codex-state device_key`
- `cargo test -p codex-state`
- `cargo test -p codex-app-server device_key`
- `cargo test -p codex-app-server
message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
- `cargo test -p codex-app-server`
- `just fix -p codex-device-key`
- `just fix -p codex-state`
- `just fix -p codex-app-server`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `git diff --check`
## Why
Resume and reconstruction need to preserve the permissions that were
active for each user turn. If rollouts only keep legacy sandbox fields,
replay cannot faithfully represent profile-shaped overrides introduced
earlier in the stack.
## What changed
This records `permission_profile` on user-turn rollout events,
reconstructs it through history/state extraction, and updates rollout
reconstruction and related fixtures to keep the field explicit.
## Verification
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
- `cargo test -p codex-core --test all request_permissions --
--nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18281).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* __->__ #18281
## Summary
- Teach app-server `thread/list` to accept either a single `cwd` or an
array of cwd filters, returning threads whose recorded session cwd
matches any requested path
- Add `useStateDbOnly` as an explicit opt-in fast path for callers that
want to answer `thread/list` from SQLite without scanning JSONL rollout
files
- Preserve backwards compatibility: by default, `thread/list` still
scans JSONL rollouts and repairs SQLite state
- Wire the new cwd array and SQLite-only options through app-server,
local/remote thread-store, rollout listing, generated TypeScript/schema
fixtures, proto output, and docs
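A hedged sketch of the single-or-array cwd parameter using an untagged serde enum; the enum and matching logic are illustrative, not the actual protocol types.

```rust
use std::path::{Path, PathBuf};

use serde::Deserialize;

#[derive(Deserialize)]
#[serde(untagged)]
enum CwdFilter {
    One(PathBuf),
    Many(Vec<PathBuf>),
}

impl CwdFilter {
    /// A thread matches when its recorded session cwd equals any requested path.
    fn matches(&self, session_cwd: &Path) -> bool {
        match self {
            CwdFilter::One(cwd) => cwd.as_path() == session_cwd,
            CwdFilter::Many(cwds) => cwds.iter().any(|cwd| cwd.as_path() == session_cwd),
        }
    }
}
```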
## Test Plan
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-rollout`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server thread_list`
- `just fmt`
- `just fix -p codex-app-server-protocol -p codex-rollout -p
codex-thread-store -p codex-app-server`
- `cargo build -p codex-cli --bin codex`
## Summary
This PR fully reverts the previously merged Agent Identity runtime
integration from the old stack:
https://github.com/openai/codex/pull/17387/changes
It removes the Codex-side task lifecycle wiring, rollout/session
persistence, feature flag plumbing, lazy `auth.json` mutation,
background task auth paths, and request callsite changes introduced by
that stack.
This leaves the repo in a clean pre-AgentIdentity integration state so
the follow-up PRs can reintroduce the pieces in smaller reviewable
layers.
## Stack
1. This PR: full revert
2. https://github.com/openai/codex/pull/18871: move Agent Identity
business logic into a crate
3. https://github.com/openai/codex/pull/18785: add explicit
AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate auth callsites
through AuthProvider
## Testing
Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.
This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.
It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.
Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`
---------
Co-authored-by: Sayan Sisodiya <sayan@openai.com>
## Summary
- persist registered agent tasks in the session state update stream so
the thread can reuse them
- prewarm task registration once identity registration succeeds, while
keeping startup failures best-effort
- isolate the session-side task lifecycle into a dedicated module so
AgentIdentityManager and RegisteredAgentTask do not leak across as many
core layers
## Testing
- cargo test -p codex-core startup_agent_task_prewarm
- cargo test -p codex-core
cached_agent_task_for_current_identity_clears_stale_task
- cargo test -p codex-core record_initial_history_
To improve performance of UI loads from the app, this adds two main
improvements:
1. The `thread/list` API now gets a `sortDirection` request field and a
`backwardsCursor` field in the response, which lets you paginate forwards and
backwards from a window. This lets you fetch the first few items to
display immediately while you paginate to fill in history, then can
paginate "backwards" on future loads to catch up with any changes since
the last UI load without a full reload of the entire data set.
2. Added a new `thread/turns/list` API which also has `sortDirection` and
`backwardsCursor` for the same behavior as `thread/list`, giving you the
same small initial fetch for immediate display followed by background
fill-in and resync catch-up.
To guarantee unique cursors, we make two
important updates:
* Add new updated_at_ms and created_at_ms columns that are in
millisecond precision
* Guarantee uniqueness -- if multiple items are inserted at the same
millisecond, bump the new one by one millisecond until it becomes unique
This lets us use single-number cursors for forwards and backwards paging
through resultsets and guarantee that the cursor is a fixed point to do
(timestamp > cursor) and get new items only.
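A small sketch of the uniqueness bump and the cursor comparisons; the column names match the description above, while the function and queries are illustrative.

```rust
/// Bump a timestamp forward until it is strictly greater than the last one we
/// inserted, so every row gets a unique millisecond value usable as a cursor.
fn next_unique_ms(requested_ms: i64, last_inserted_ms: Option<i64>) -> i64 {
    match last_inserted_ms {
        Some(last) if requested_ms <= last => last + 1,
        _ => requested_ms,
    }
}

// Forward page (ascending): rows strictly newer than the cursor.
//   SELECT ... FROM threads WHERE updated_at_ms > :cursor ORDER BY updated_at_ms ASC  LIMIT :n;
// Backward page (descending): rows strictly older than the cursor.
//   SELECT ... FROM threads WHERE updated_at_ms < :cursor ORDER BY updated_at_ms DESC LIMIT :n;
```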
This updated implementation stays backwards-compatible, since multiple
app-servers can be running at once and older ones would not handle a
changed scheme well.
## Summary
App-server v2 already receives turn-scoped `clientMetadata`, but the
Rust app-server was dropping it before the outbound Responses request.
This change keeps the fix lightweight by threading that metadata through
the existing turn-metadata path rather than inventing a new transport.
## What we're trying to do and why
We want turn-scoped metadata from the app-server protocol layer,
especially fields like Hermes/GAAS run IDs, to survive all the way to
the actual Responses API request so it is visible in downstream
websocket request logging and analytics.
The specific bug was:
- app-server protocol uses camelCase `clientMetadata`
- Responses transport already has an existing turn metadata carrier:
`x-codex-turn-metadata`
- websocket transport already rewrites that header into
`request.request_body.client_metadata["x-codex-turn-metadata"]`
- but the Rust app-server never parsed or stored `clientMetadata`, so
nothing from the app-server request was making it into that existing
path
This PR fixes that without adding a new header or a second metadata
channel.
## How we did it
### Protocol surface
- Add optional `clientMetadata` to v2 `TurnStartParams` and
`TurnSteerParams`
- Regenerate the JSON schema / TypeScript fixtures
- Update app-server docs to describe the field and its behavior
### Runtime plumbing
- Add a dedicated core op for app-server user input carrying turn-scoped
metadata: `Op::UserInputWithClientMetadata`
- Wire `turn/start` and `turn/steer` through that op / signature path
instead of dropping the metadata at the message-processor boundary
- Store the metadata in `TurnMetadataState`
### Transport behavior
- Reuse the existing serialized `x-codex-turn-metadata` payload
- Merge the new app-server `clientMetadata` into that JSON additively
- Do **not** replace built-in reserved fields already present in the
turn metadata payload
- Keep websocket behavior unchanged at the outer shape level: it still
sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
string now contains the merged fields
- Keep HTTP fallback behavior unchanged except that the existing
`x-codex-turn-metadata` header now includes the merged fields too
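A sketch of the additive merge, assuming `serde_json` maps; the function name and reserved-field handling are illustrative of the rule above, not the exact code.

```rust
use serde_json::{Map, Value};

/// Merge caller-supplied clientMetadata into the existing turn-metadata payload
/// without replacing built-in reserved fields such as session_id or turn_id.
fn merge_client_metadata(
    turn_metadata: &mut Map<String, Value>,
    client_metadata: Map<String, Value>,
) {
    for (key, value) in client_metadata {
        // Only add keys the payload does not already define; reserved fields win.
        turn_metadata.entry(key).or_insert(value);
    }
}
```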
### Request shape before / after
Before, a websocket `response.create` looked like:
```json
{
"type": "response.create",
"client_metadata": {
"x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
}
}
```
Even if the app-server caller supplied `clientMetadata`, it was not
represented there.
After, the same request shape is preserved, but the serialized payload
now includes the new turn-scoped fields:
```json
{
"type": "response.create",
"client_metadata": {
"x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
}
}
```
## Validation
### Targeted tests added / updated
- protocol round-trip coverage for `clientMetadata` on `turn/start` and
`turn/steer`
- protocol round-trip coverage for `Op::UserInputWithClientMetadata`
- `TurnMetadataState` merge test proving client metadata is added
without overwriting reserved built-in fields
- websocket request-shape test proving outbound `response.create`
contains merged metadata inside
`client_metadata["x-codex-turn-metadata"]`
- app-server integration tests proving:
- `turn/start` forwards `clientMetadata` into the outbound Responses
request path
- websocket warmup + real turn request both behave correctly
- `turn/steer` updates the follow-up request metadata
### Commands run
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
--lib`
- `cargo test -p codex-core --test all
responses_websocket_preserves_custom_turn_metadata_fields`
- `cargo test -p codex-app-server --test all client_metadata`
- `cargo test -p codex-app-server --test all
turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
-- --nocapture`
- `just fmt`
- `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
-p codex-app-server`
- `just fix -p codex-exec -p codex-tui-app-server`
- `just argument-comment-lint`
### Full suite note
`cargo test` in `codex-rs` still fails in:
-
`suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`
I verified that same failure on a clean detached `HEAD` worktree with an
isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.
## Description
This PR makes the SQLite state runtime tolerate databases that have
already been migrated by a newer Codex binary.
Today, if an older CLI sees migration versions in `_sqlx_migrations`
that it doesn't know about, startup fails. This change relaxes that
check for the runtime migrators we use in `codex-state` so older
binaries can keep opening the DB in that case.
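One way this can look with the sqlx migrator, assuming codex-state drives sqlx's `_sqlx_migrations` machinery directly; the migration path and setup here are illustrative.

```rust
use sqlx::migrate::{MigrateError, Migrator};
use sqlx::SqlitePool;

async fn run_migrations(pool: &SqlitePool) -> Result<(), MigrateError> {
    let mut migrator = Migrator::new(std::path::PathBuf::from("./migrations")).await?;
    // Tolerate versions recorded in _sqlx_migrations that this binary does not
    // know about, so an older CLI can still open a database migrated by a newer one.
    migrator.set_ignore_missing(true);
    migrator.run(pool).await
}
```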
## Why
We can end up with mixed-version CLIs running against the same local
state DB. In that setup, treating "the database is ahead of me" as a
hard error is unnecessarily strict and breaks the older client even when
the migration history is otherwise fine.
## Follow-up
We still clean up versioned `state_*.sqlite` and `logs_*.sqlite` files
during init, so older binaries can treat newer DB files as legacy. That
should probably be tightened separately if we want mixed-version local
usage to be fully safe.
## Why
`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.
## What changed
- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`
That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.
## Validation
- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`
## Follow-up
- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.
- create `codex-git-utils` and move the shared git helpers into it with
file moves preserved for diff readability
- move the `GitInfo` helpers out of `core` so stacked rollout work can
depend on the shared crate without carrying its own git info module
---------
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
This PR adds a URI-based system to reference agents within a tree. This
comes from a sync between research and engineering.
The main agent (the one manually spawned by a user) is always called
`/root`. Any sub-agent spawned by it will be `/root/agent_1` for example
where `agent_1` is chosen by the model.
Any agent can contact any other agent using its path.
Paths can be used either as absolute paths or relative to the calling agent.
Resume is not supported on these new paths for now.
Add a representation of the agent graph. This is now used for:
* Cascading close (closing a parent closes its children)
* Cascading resume (the opposite)
Later, this will also be used for post-compaction stuffing of the
context
Direct fix for: https://github.com/openai/codex/issues/14458