## Why
We want the agent graph store to be passed down the stack as a real
dependency, the same way we already treat the thread store. That lets us
support implementations other than the local SQLite-backed one. Right
now most code instantiates a state DB and an agent graph store
just-in-time. Ideally, code would not depend on the state DB directly
and would only read through the higher-level interfaces.
This change makes the dependency boundaries explicit and moves state DB
initialization to process bootstrap instead of hiding it inside local
store implementations.
## What changed
- `ThreadManager` now requires a `StateDbHandle` and an
`AgentGraphStore` at construction time instead of treating them as
optional internals.
- The local store constructors no longer lazily initialize SQLite.
Callers now initialize the state DB once per process and use that shared
handle to build:
- `LocalThreadStore`
- `LocalAgentGraphStore`
- App bootstraps (`app-server`, `mcp-server`, `prompt_debug`, and the
thread-manager sample) now initialize the state DB up front and inject
the resulting handle down the stack.
- `app-server` now consistently uses its process-scoped state DB handle
instead of reopening SQLite or trying to recover it from loaded threads.
- Device-key storage now reuses the shared state DB handle instead of
maintaining its own lazy opener.
- The thread archive / descendant traversal paths now use the injected
`AgentGraphStore` instead of reaching through local
thread-store-specific state.
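The bootstrap order described above can be sketched roughly as follows. This is an illustration only, not the actual codex-rs code: the type names come from this PR, but every field, constructor signature, and trait method here is an assumption.

```rust
use std::sync::Arc;

/// Stand-in for the process-scoped state DB handle (the field is hypothetical).
#[derive(Debug)]
pub struct StateDbHandle {
    pub path: String,
}

/// Hypothetical minimal interface; the real trait surface is not shown in this PR.
pub trait AgentGraphStore {
    fn backend_name(&self) -> &'static str;
}

pub struct LocalThreadStore {
    db: Arc<StateDbHandle>,
}

pub struct LocalAgentGraphStore {
    db: Arc<StateDbHandle>,
}

impl LocalThreadStore {
    /// No lazy SQLite init here: the caller must hand in an opened handle.
    pub fn new(db: Arc<StateDbHandle>) -> Self {
        Self { db }
    }

    pub fn db_path(&self) -> &str {
        &self.db.path
    }
}

impl LocalAgentGraphStore {
    pub fn new(db: Arc<StateDbHandle>) -> Self {
        Self { db }
    }
}

impl AgentGraphStore for LocalAgentGraphStore {
    fn backend_name(&self) -> &'static str {
        // Touch the handle so the sketch shows the store reading through it.
        let _ = &self.db.path;
        "local-sqlite"
    }
}

pub struct ThreadManager {
    state_db: Arc<StateDbHandle>,
    thread_store: LocalThreadStore,
    agent_graph_store: Arc<dyn AgentGraphStore>,
}

impl ThreadManager {
    /// Both dependencies are required at construction time.
    pub fn new(
        state_db: Arc<StateDbHandle>,
        thread_store: LocalThreadStore,
        agent_graph_store: Arc<dyn AgentGraphStore>,
    ) -> Self {
        Self { state_db, thread_store, agent_graph_store }
    }

    pub fn state_db_ref_count(&self) -> usize {
        Arc::strong_count(&self.state_db)
    }

    pub fn thread_store_db_path(&self) -> &str {
        self.thread_store.db_path()
    }

    pub fn agent_graph_backend(&self) -> &'static str {
        self.agent_graph_store.backend_name()
    }
}

/// Process bootstrap: open the state DB once, then inject it down the stack.
pub fn bootstrap(db_path: &str) -> ThreadManager {
    let state_db = Arc::new(StateDbHandle { path: db_path.to_string() });
    let thread_store = LocalThreadStore::new(Arc::clone(&state_db));
    let graph_store: Arc<dyn AgentGraphStore> =
        Arc::new(LocalAgentGraphStore::new(Arc::clone(&state_db)));
    ThreadManager::new(state_db, thread_store, graph_store)
}
```

Because `ThreadManager` only sees `Arc<dyn AgentGraphStore>`, a non-SQLite implementation could be swapped in at bootstrap without touching the manager.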
## Verification
- `cargo check -p codex-core -p codex-thread-store -p codex-app-server
-p codex-mcp-server -p codex-thread-manager-sample --tests`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-core
thread_manager_accepts_separate_agent_graph_store_and_thread_store --
--nocapture`
- `cargo test -p codex-app-server
thread_archive_archives_spawned_descendants -- --nocapture`
## Summary
This is the first PR in the V8 in-process sandboxing rollout.
It adds the build-system and Rust feature plumbing needed to support
sandboxed V8 builds, then enables sandboxing by default for the
source-built Bazel V8 path that we control directly. It deliberately
keeps the published `rusty_v8` artifact workflows on their current
non-sandboxed contract so this PR can land and ship independently before
we change any released artifacts.
## Rollout plan
- [x] **PR 1: land sandbox plumbing and default source-built Bazel V8 to
sandboxed mode**
- [ ] **PR 2: publish sandbox-enabled release artifacts and add
compatibility validation**
- Produce sandboxed artifact pairs for every released Cargo target that
does not already use the source-built Bazel path.
- Add CI coverage that consumes those sandboxed artifacts and verifies:
- `codex-v8-poc` reports sandbox enabled
- `codex-code-mode` builds/tests against the sandboxed path
- [ ] **PR 3: switch release consumers to sandboxed artifacts by
default**
- Update released artifact selectors/checksums.
- Enable the Rust `v8_enable_sandbox` feature in the default release
path.
- Make the sandboxed artifact family the normal path for published
builds.
- [ ] **PR 4: remove rollout-only compatibility paths**
- Remove the temporary non-sandbox release compatibility config once the
new default has shipped and baked.
- Keep the invariant tests permanently.
## Summary
Adds the required `items_view` field to the three session picker `Turn`
test fixtures that populate full turn item lists.
## Root Cause
`#21063` added `Turn.items_view` to the app-server protocol type. The
later session picker merge added three test-only
`codex_app_server_protocol::Turn` literals without the new field, which
broke Bazel compilation on `main` with `E0063: missing field
items_view`.
## Validation
- `just fmt`
- `cargo test -p codex-tui resume_picker --no-fail-fast`
- `just argument-comment-lint`
I also ran `cargo test -p codex-tui`; it compiled and ran the suite, but
this local machine failed two pre-existing status permission-profile
tests because `/etc/codex/requirements.toml` disallows
`DangerFullAccess`.
## Summary
Xcode 26.4 was built against app-server behavior from before MCP
elicitation requests became client-visible in CLI 0.120.0 via #17043.
That client line does not expect the new events/messages, so this PR
restores the old behavior for exactly that client/version combination.
The compatibility handling stays in the app-server layer: when the
initialized client is `Xcode` and its version starts with `26.4`, the
app server marks the live Codex thread so MCP elicitations are
auto-denied. The flag is applied on thread start/resume/fork/turn
attachment, carried through `Codex`/`CodexThread`, and stored on
`McpConnectionManager` so refreshed MCP managers preserve the behavior.
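The client/version match described above amounts to a small predicate. A minimal sketch, assuming the client name and version arrive as plain strings; the function name is hypothetical and not taken from the codebase:

```rust
/// Returns true when MCP elicitations should be auto-denied for this client.
/// Hypothetical helper: only the `Xcode` name and the `26.4` prefix come from the PR text.
pub fn auto_deny_mcp_elicitations(client_name: &str, client_version: &str) -> bool {
    client_name == "Xcode" && client_version.starts_with("26.4")
}
```

Note that a literal prefix check like this would also match a version string such as `26.40`; whether the real check guards against that is not stated here.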
## Notes
This is intentionally narrow and includes a TODO to remove the
compatibility path once Xcode 26.4 ages out.
## Why
Tool registration used to bind a tool name to a handler externally,
which left ownership split between the registry plan and the handler
implementation. Some built-in handlers also multiplexed multiple in-core
tools by switching on the invoked tool name internally.
This moves the registry identity onto the handler itself and makes
built-in multi-tool areas use separate concrete handlers, so each
registered handler instance owns exactly one tool name and one dispatch
path.
## What Changed
- Added `ToolHandler::tool_name()` and changed
`ToolRegistryBuilder::register_handler` to derive the registry key from
the handler.
- Split built-in multiplexed handlers into concrete per-tool handlers
for unified exec, shell/local shell/container exec, MCP resources, goal
tools, and agent job tools.
- Kept name-carrying handler instances only where the runtime target is
inherently external or dynamic, such as MCP tools, dynamic tools, and
unavailable placeholders.
- Updated `ToolHandlerKind` and registry-plan construction so plan
entries map directly to concrete handler registrations.
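To illustrate the ownership change, here is a simplified sketch of a registry that derives its key from the handler. Only `ToolHandler::tool_name()` and `ToolRegistryBuilder::register_handler` are named in this PR; the `handle` method, the two example handlers, the duplicate-name check, and `dispatch` are assumptions made for the example.

```rust
use std::collections::HashMap;

pub trait ToolHandler {
    /// The handler owns its registry identity.
    fn tool_name(&self) -> String;
    /// Hypothetical dispatch method for this sketch.
    fn handle(&self, arguments: &str) -> String;
}

/// A concrete built-in handler: exactly one tool name, one dispatch path.
pub struct ShellHandler;

impl ToolHandler for ShellHandler {
    fn tool_name(&self) -> String {
        "shell".to_string()
    }
    fn handle(&self, arguments: &str) -> String {
        format!("shell ran: {arguments}")
    }
}

/// A name-carrying handler for targets that are inherently dynamic (e.g. MCP tools).
pub struct McpToolHandler {
    pub name: String,
}

impl ToolHandler for McpToolHandler {
    fn tool_name(&self) -> String {
        self.name.clone()
    }
    fn handle(&self, arguments: &str) -> String {
        format!("{} called with: {arguments}", self.name)
    }
}

#[derive(Default)]
pub struct ToolRegistryBuilder {
    handlers: HashMap<String, Box<dyn ToolHandler>>,
}

impl ToolRegistryBuilder {
    /// The registry key comes from the handler, not from the caller.
    /// Rejecting duplicates is an assumption of this sketch.
    pub fn register_handler(&mut self, handler: Box<dyn ToolHandler>) -> Result<(), String> {
        let name = handler.tool_name();
        if self.handlers.contains_key(&name) {
            return Err(format!("duplicate tool name: {name}"));
        }
        self.handlers.insert(name, handler);
        Ok(())
    }

    pub fn dispatch(&self, tool_name: &str, arguments: &str) -> Option<String> {
        self.handlers.get(tool_name).map(|handler| handler.handle(arguments))
    }
}
```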
## Verification
- `cargo test -p codex-tools tool_registry_plan`
- `cargo test -p codex-core --lib tools::registry_tests`
- `just fix -p codex-tools`
- `just fix -p codex-core`
## Summary
- make the Linux sandbox synthetic mount registry path unique per
effective UID
- keep same-user coordination intact while avoiding collisions between
users sharing `/tmp`
- add a regression test for the registry path contract
## Why
Issue #21192 reports that the Linux sandbox currently uses one global
temp path at `/tmp/codex-bwrap-synthetic-mount-targets`. If another user
creates that directory first, later users can fail to open the shared
lock file with `Permission denied`.
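One way to picture the path contract is a pure function from effective UID to registry path. This is a sketch under assumptions: the `-<uid>` suffix scheme and the function name are hypothetical, and the caller is assumed to pass in the effective UID (for example from `geteuid()`).

```rust
use std::path::{Path, PathBuf};

const REGISTRY_PREFIX: &str = "codex-bwrap-synthetic-mount-targets";

/// Builds a per-user registry directory under the shared temp dir.
/// The exact naming scheme is an assumption; the PR only states that the
/// path is unique per effective UID.
pub fn synthetic_mount_registry_path(temp_dir: &Path, effective_uid: u32) -> PathBuf {
    temp_dir.join(format!("{REGISTRY_PREFIX}-{effective_uid}"))
}
```

Processes of the same user still agree on one directory, so same-user coordination through the lock file is unaffected, while two users sharing `/tmp` never contend for the same directory.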
## Validation
- `just fmt`
- `cargo test -p codex-linux-sandbox`
- `cargo clippy -p codex-linux-sandbox --all-targets`
Fixes #21192
## Summary
- Propagate Linux bubblewrap argument-construction failures instead of
panicking in the helper
- Keep mutable-symlink carveouts fail-closed while reporting them as
ordinary sandbox build failures
- Add regression coverage for a protected `.codex` symlink inside a
writable workspace root
## Root cause
Linux bubblewrap intentionally rejects read-only carveouts that cross a
symlink the sandboxed process can still rewrite. That is the correct
security behavior for protected metadata paths such as `.codex`.
The bug was one layer higher: `linux_run_main` treated the expected
build failure as impossible and panicked while constructing the
bubblewrap argv. For issue #20716, that turned a normal fail-closed
sandbox outcome into a noisy panic in the transcript.
## User impact
Users with a project-local `.codex` symlink inside a writable workspace
still get the conservative sandbox decision, but they no longer see a
Rust panic for that condition. The helper now exits with the concise
sandbox-build error so the normal denial / escalation path can handle
it.
Fixes #20716
## Why
The resume/fork picker is becoming the main way users recover previous
work, but the old fixed table made sessions hard to scan once thread
names, branches, working directories, and timestamps all mattered. This
redesign makes the picker denser by default, easier to search, and safer
to inspect before resuming or forking.
<table>
<tr>
<td>
<img width="1660" height="1103" alt="CleanShot 2026-05-03 at 12 34 10"
src="https://github.com/user-attachments/assets/313ede1d-1da4-4863-acd2-56b3e27e9703"
/>
</td>
<td>
<img width="1662" height="1100" alt="CleanShot 2026-05-03 at 12 34 15"
src="https://github.com/user-attachments/assets/cfde7d5c-bab0-4994-a807-254e53f344ea"
/>
</td>
</tr>
<tr>
<td>
<img width="1664" height="1107" alt="CleanShot 2026-05-03 at 12 39 22"
src="https://github.com/user-attachments/assets/e1ee58ca-4dc5-4a35-ae0f-47562da3974c"
/>
</td>
<td>
<img width="1662" height="1100" alt="CleanShot 2026-05-03 at 12 35 09"
src="https://github.com/user-attachments/assets/9c888072-eedf-4f45-985c-0c14df28bcc7"
/>
</td>
</tr>
</table>
## What Changed
- Replaces the old session table with responsive session rows that
prioritize the session name or preview, then show timestamp, cwd, and
branch metadata.
- Makes dense view the default while keeping comfortable view available
through `Ctrl+O`.
- Persists the picker view preference in `[tui].session_picker_view`,
including active profile-scoped config.
- Adds sort/filter controls for updated time, created time, cwd, and all
sessions.
- Expands search matching across session name, preview, thread id,
branch, and cwd.
- Makes `Esc` safer in search mode: it clears an active query before
starting a new session.
- Adds lazy transcript inspection:
- `Space` expands recent transcript context inline.
- `Ctrl+T` opens a transcript overlay.
- raw reasoning visibility follows `show_raw_agent_reasoning`.
- Keeps remote cwd filtering server-side for remote app-server sessions
so local path normalization does not incorrectly hide remote results.
- Updates snapshots and config schema for the new picker states and
config option.
## How to Test
1. Start Codex in a repo with several saved sessions.
2. Open the resume picker (`Ctrl+R` or another resume picker entry point).
3. Confirm the picker opens in dense mode and shows session name or
preview, timestamp, cwd, and branch metadata.
4. Press `Ctrl+O` and confirm it switches between dense and comfortable
views.
5. Restart Codex and confirm the selected view persists.
6. Type a query that matches a branch, cwd, thread id, or session name;
confirm matching sessions appear.
7. Press `Esc` while the query is non-empty and confirm it clears search
instead of starting a new session.
8. Select a session and press `Space`; confirm recent transcript context
expands inline.
9. Press `Ctrl+T`; confirm the transcript overlay opens and respects
raw-reasoning visibility settings.
Targeted tests:
- `cargo test -p codex-tui resume_picker --no-fail-fast`
- `cargo test -p codex-core
runtime_config_resolves_session_picker_view_default_and_override`
- `cargo test -p codex-core profile_tui_rejects_unsupported_settings`
- `cargo check -p codex-thread-manager-sample`
- `cargo insta pending-snapshots`
Stacked on #20892.
## Why
#20892 adds the TUI workspace command abstraction so branch status
metadata can run through app-server instead of assuming the CLI process
has the active workspace locally. `/diff` still used direct local
process execution, which means remote app-server sessions could compute
the diff against the wrong machine or fail to see the active workspace
at all.
This PR moves `/diff` onto that same app-server-backed command path so
Git runs wherever the active workspace lives.
## What Changed
- Route `/diff` through the TUI `WorkspaceCommandExecutor` using the
active chat cwd.
- Replace direct `tokio::process::Command` usage in `get_git_diff` with
argv-based workspace command requests.
- Preserve the existing `/diff` behavior: tracked diff output, untracked
file diffs, treating Git diff exit code `1` as success, and showing the
existing non-git-repository message.
- Extend `WorkspaceCommand` with caller-set timeouts and an explicit
uncapped-output opt-out. Metadata probes remain capped by default;
`/diff` opts out because its full output is the user-visible payload.
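The request shape and exit-code handling above could look roughly like this. It is a sketch, not the real `WorkspaceCommand`: the field names, builder methods, and default timeout and cap values are all assumptions.

```rust
use std::time::Duration;

// Hypothetical defaults for metadata probes.
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(5);
const DEFAULT_OUTPUT_CAP_BYTES: usize = 64 * 1024;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkspaceCommand {
    pub argv: Vec<String>,
    pub cwd: String,
    pub timeout: Duration,
    /// `None` means the caller explicitly opted out of output capping.
    pub max_output_bytes: Option<usize>,
}

impl WorkspaceCommand {
    /// Commands are capped by default.
    pub fn new(argv: &[&str], cwd: &str) -> Self {
        Self {
            argv: argv.iter().map(|arg| arg.to_string()).collect(),
            cwd: cwd.to_string(),
            timeout: DEFAULT_TIMEOUT,
            max_output_bytes: Some(DEFAULT_OUTPUT_CAP_BYTES),
        }
    }

    pub fn with_timeout(mut self, timeout: Duration) -> Self {
        self.timeout = timeout;
        self
    }

    /// Explicit opt-out for callers whose full output is the payload.
    pub fn uncapped_output(mut self) -> Self {
        self.max_output_bytes = None;
        self
    }
}

/// `/diff` treats exit code 1 from Git diff as success, alongside 0.
pub fn git_diff_succeeded(exit_code: i32) -> bool {
    matches!(exit_code, 0 | 1)
}
```

Making the uncapped case an explicit method call keeps the safe default for metadata probes and makes the `/diff` exception visible at the call site.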
## How to Test
Manual reviewer path:
1. Start the Codex TUI from a Git worktree with one tracked file change
and one untracked file.
2. Run `/diff`.
3. Confirm the rendered diff includes both the tracked diff and the
untracked file diff.
4. Start the TUI outside a Git worktree, or switch to a non-git cwd,
then run `/diff`.
5. Confirm it shows the existing `/diff` not-inside-a-git-repository
message.
Targeted tests run:
- `cargo test -p codex-tui get_git_diff -- --nocapture`
- `cargo test -p codex-tui branch_summary -- --nocapture`
- `cargo test -p codex-tui`
## Why
`Turn.items` currently overloads an empty array to mean either that no
items exist or that the server intentionally did not load them for this
response. That ambiguity blocks future lazy-loading work where clients
need to distinguish unloaded, summary, and fully hydrated turn payloads.
## What changed
- add a new `TurnItemsView` enum with `notLoaded`, `summary`, and `full`
variants
- add required `itemsView` metadata to app-server `Turn` payloads
- mark reconstructed persisted history as `full` and live shell-style
turn payloads as `notLoaded`
- keep current `thread/turns/list` behavior unchanged and document that
it still returns `full` turns today
- regenerate the JSON and TypeScript protocol fixtures
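The ambiguity this removes can be shown with a reduced model of the payload. The wire names `notLoaded`, `summary`, and `full` come from this PR; the Rust variant names, the `String` item type, and the helper functions are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnItemsView {
    NotLoaded,
    Summary,
    Full,
}

impl TurnItemsView {
    /// Wire-format names as listed in this PR.
    pub fn wire_name(self) -> &'static str {
        match self {
            TurnItemsView::NotLoaded => "notLoaded",
            TurnItemsView::Summary => "summary",
            TurnItemsView::Full => "full",
        }
    }
}

/// Reduced stand-in for the protocol `Turn`; items are plain strings here.
#[derive(Debug, Clone)]
pub struct Turn {
    pub items: Vec<String>,
    pub items_view: TurnItemsView,
}

/// An empty `items` array only proves "no items exist" when the view is `full`.
pub fn is_known_empty(turn: &Turn) -> bool {
    turn.items_view == TurnItemsView::Full && turn.items.is_empty()
}

/// A client that needs the items must fetch them when they were not loaded.
pub fn needs_fetch(turn: &Turn) -> bool {
    turn.items_view == TurnItemsView::NotLoaded
}
```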
## Verification
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server thread_read_can_include_turns`
- `cargo test -p codex-app-server
thread_turns_list_can_page_backward_and_forward`
- `cargo test -p codex-app-server
thread_resume_rejects_history_when_thread_is_running`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fmt`
## Why
App-server had repeated hand-built JSON-RPC error objects for standard
error shapes. Using the shared helpers keeps the common
`invalid_request`, `invalid_params`, and `internal_error` construction
in one place and reduces the chance of new call sites drifting from the
common error payload shape.
## What changed
- Replaced manual standard JSON-RPC error object creation with
`internal_error(...)`, `invalid_request(...)`, and `invalid_params(...)`
across app-server request processors and runtime paths.
- Removed local duplicate helper definitions from search and review
request handling.
- Preserved existing structured `data` payloads by creating the shared
helper error first and then attaching the existing metadata.
- Left custom non-standard errors and raw error-code assertions intact.
## Validation
- `cargo test -p codex-app-server`
# Why
We want shared hook trust that both the app and the TUI can build on,
but the metadata is only useful if runtime behavior agrees with it. This
PR adds a single backend trust model for hooks so unmanaged hooks cannot
run until the current definition has been reviewed, while managed hooks
remain runnable and non-configurable.
# What
- persist `trusted_hash` alongside hook state in `config.toml`
- expose `currentHash` and derived `trustStatus` through `hooks/list`
- derive trust from normalized hook definitions so equivalent hooks from
`config.toml` and `hooks.json` share the same trust identity
- gate unmanaged hooks on trust before they enter the runnable handler
set
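The trust gate can be sketched as "hash the normalized definition, compare with the stored hash". Everything below is illustrative: the struct fields, the normalization rule, the status variant names, and the use of `DefaultHasher` (a stand-in; a real implementation would want a stable, persisted-safe hash) are assumptions rather than the code in `discovery.rs`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Reduced hook definition; the real one has more fields.
#[derive(Debug, Clone)]
pub struct HookDefinition {
    pub event: String,
    pub command: String,
}

/// Hypothetical status names; the PR only says `trustStatus` is derived.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustStatus {
    Managed,
    Trusted,
    Untrusted,
}

/// Hash the normalized definition so equivalent hooks from different
/// config sources share one trust identity.
pub fn current_hash(def: &HookDefinition) -> String {
    // Assumed normalization: trim the event, collapse whitespace in the command.
    let normalized_command = def.command.split_whitespace().collect::<Vec<_>>().join(" ");
    let mut hasher = DefaultHasher::new();
    def.event.trim().hash(&mut hasher);
    normalized_command.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

pub fn trust_status(
    is_managed: bool,
    def: &HookDefinition,
    trusted_hash: Option<&str>,
) -> TrustStatus {
    if is_managed {
        return TrustStatus::Managed;
    }
    match trusted_hash {
        Some(hash) if hash == current_hash(def) => TrustStatus::Trusted,
        _ => TrustStatus::Untrusted,
    }
}

/// Only managed or reviewed hooks enter the runnable handler set.
pub fn is_runnable(status: TrustStatus) -> bool {
    !matches!(status, TrustStatus::Untrusted)
}
```

Because the comparison is against the hash of the current definition, editing a previously trusted hook drops it back to untrusted until it is reviewed again.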
# Reviewer Notes
- key file to review is `codex-rs/hooks/src/engine/discovery.rs`
- the only **core** change is schema related
## Why
When a turn exposes multiple selected environments, shell-style tools
need a model-facing way to identify the intended target environment and
handlers need to resolve that target before parsing cwd-relative
permission fields or launching processes.
This PR scopes that rollout to process tools. Filesystem-oriented tools
such as `apply_patch`, `view_image`, and `list_dir` are intentionally
left for follow-up slices.
## What Changed
- Adds an `include_environment_id` option to shell-style tool schema
builders.
- Exposes optional `environment_id` on `shell`, `shell_command`, and
`exec_command` only when `ToolEnvironmentMode::Multiple` is active.
- Adds a shared handler helper that parses `environment_id` and
`workdir` from JSON function-call arguments and returns the selected
`Environment` plus effective absolute cwd.
- Uses that helper in `shell`, `shell_command`, and `exec_command`
handling so process execution uses the selected environment filesystem
and cwd.
- Changes `ExecCommandRequest` to carry a required resolved `cwd`,
removing the process-manager fallback to the primary turn cwd for new
exec commands.
- Leaves `write_stdin` unchanged because it targets an existing process
id, not a new environment.
## Testing
- Added unit coverage for process-tool schema exposure, selected
environment resolution, primary fallback, no-environment handling,
unknown environment ids, and resolving cwd-relative permission paths
against the selected environment cwd.
- Added remote-suite e2e coverage for `exec_command` routing across
explicit zero environments, one local environment, and local+remote
environments.
- Ran `just fmt` and `git diff --check`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Tool analytics need stable, typed payloads before the later lifecycle
reducer starts emitting them. Keeping the event schema definitions
isolated in their own PR makes the emitted surface reviewable separately
from the reducer logic that produces those events.
## What changed
- Adds the common tool-item analytics event base plus event payload
types for command execution, file changes, MCP calls, dynamic tools,
collaboration tools, web search, and image generation.
- Extends `TrackEventRequest` with the corresponding tool-item variants.
- Adds serialization coverage for the command-execution event shape.
## Verification
- `cargo test -p codex-analytics`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17089).
* #18748
* #18747
* #17090
* __->__ #17089
* #20514
## Why
This takes a step toward removing the `persistExtendedHistory` field.
Persisting that much data in the rollout file and returning it in the
thread history does not scale.
When a client explicitly sends `true`, the server now tells that client
the parameter is deprecated and ignored so the caller has a clear
migration signal via the `deprecationNotice` notification.
## What changed
- Keep the `persist_extended_history` / `persistExtendedHistory` field
in the v2 protocol for compatibility, but document it as deprecated and
ignored.
- Ignore the parameter in app-server `thread/start`, `thread/resume`,
and `thread/fork`; those paths always use limited history persistence
now.
- Stop treating `persistExtendedHistory` as a running-thread resume
override mismatch.
- Emit a connection-scoped `deprecationNotice` when a request explicitly
sets `persist_extended_history: true`.
## Verification
- Added `thread_start_deprecates_persist_extended_history_true` to cover
the deprecation notice.
- `cargo test -p codex-app-server`
- `cargo test -p codex-app-server-protocol`
## Why
Granular copy is particularly difficult with the current output. The
`/copy` command solved part of the problem, but when you only need to
copy parts of a response, you still run into some issues:
- When you copy a paragraph, the result is a sequence of separate lines
instead of one correctly joined paragraph.
- When a word wraps, part of it stays on the original line and the rest
appears at the start of the next line.
- When you copy a long command, extra line breaks are often inserted,
and command arguments can be split across multiple lines.
https://github.com/user-attachments/assets/0ef85c84-9363-4aad-b43a-15fce062a443
## Solution
Now that we own the scrollback and re-create it on resize, we can
toggle between the raw text and the rich text we see today.
- Add TUI raw scrollback mode with `tui.raw_output_mode`, `/raw
[on|off]`, and the configurable `tui.keymap.global.toggle_raw_output`
action.
- Render transcript cells through rich/raw-aware paths so raw mode
preserves source text and lets the terminal soft-wrap selection-friendly
output.
- Bind raw-mode toggle to `alt-r` by default, with the keybinding path
toggling silently while `/raw` continues to emit confirmation messages.
## Related Issues
Likely addressed by raw mode:
- #12200: clean copy for multiline and soft-wrapped output. Raw mode
removes Codex-inserted wrapping/indentation and lets the terminal
soft-wrap logical lines.
- #9252: command suggestions gain unwanted leading spaces when copied.
Raw mode renders transcript text without the rich-mode left
padding/gutter.
- #8258: prompt output is hard to copy because of leading indentation.
Raw mode renders user/source-backed transcript text without that
decorative indentation.
Partially or conditionally addressed:
- #2880: copy/export message as Markdown. Raw mode exposes raw Markdown
for terminal selection, but this PR does not add a dedicated
export/copy-message command.
- #19820: mouse drag selection + copy in the TUI. Raw mode improves
terminal-native selection of output/history text, but this PR does not
implement in-TUI mouse selection, highlighting, auto-copy, or composer
selection.
- #18979: copied content is divided into two parts. This should improve
cases caused by Codex-inserted wraps/padding in rendered output; if the
report is about pasting into the composer/input path, that remains
outside this PR.
## Validation
- `just write-config-schema`
- `just fmt`
- `cargo test -p codex-config`
- `cargo test -p codex-tui`
- `just fix -p codex-tui`
- `just argument-comment-lint`
- `cargo test -p codex-tui
raw_output_mode_can_change_without_inserting_notice -- --nocapture`
- `cargo test -p codex-tui
raw_slash_command_toggles_and_accepts_on_off_args -- --nocapture`
- `cargo test -p codex-tui raw_output_toggle -- --nocapture`
- `git diff --check`
- `cargo insta pending-snapshots`
## Why
Recent Auto Review reports show Git traffic hanging through the local
proxy on both SSH and HTTPS paths. Today the support bundle does not
make it obvious whether a request is stuck before upstream dialing,
during the proxy hop, or after the upstream response begins, which slows
down root-cause triage.
This adds a small amount of runtime visibility at the existing proxy
boundaries without changing routing or policy behavior.
## What changed
- log whether HTTP and CONNECT traffic take the direct or upstream-proxy
route
- log start / success / failure timings for CONNECT, HTTP, and SOCKS5
upstream dials
- log CONNECT forwarding lifecycle events
- describe HTTP success at the response-header boundary that is actually
observed, rather than implying the full body finished
## Verification
- `cargo test -p codex-network-proxy`
- `cargo clippy -p codex-network-proxy --all-targets -- -D warnings`
## Why
Linux startup runs an advisory system `bwrap` warning probe on each
launch. On hosts with NFS or autofs mounts, its `--ro-bind / /` probe
can take tens of seconds before Codex prints anything, matching #19828.
Because this probe only decides whether to surface a warning, it should
not be allowed to stall startup.
Relevant pre-change path:
`codex-rs/sandboxing/src/bwrap.rs`, lines 64-80 at revision `de2ccf9473`
## What changed
- Bound the advisory system `bwrap` probe to 500 ms.
- Preserve the existing warning behavior when `bwrap` promptly reports a
known user-namespace failure.
- Kill and reap the probe child on timeout, then suppress the advisory
warning instead of blocking startup.
- Read probe stderr with a bounded nonblocking drain so descendants that
inherit the pipe cannot extend startup after the probe child exits.
- Add regression coverage for both a deliberately slow fake `bwrap`
process and a fake probe whose descendant keeps stderr open.
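The timeout-then-kill-and-reap behavior can be approximated with the standard library alone. This sketch polls `try_wait` against a deadline, which is a simplification chosen for the example and not necessarily how the PR implements it; it also discards stderr rather than performing the bounded nonblocking drain described above. The function name and poll interval are assumptions.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Runs an advisory probe, returning `Ok(None)` if it did not finish in time.
/// On timeout the child is killed and reaped so no zombie is left behind.
pub fn run_probe_with_timeout(
    mut command: Command,
    timeout: Duration,
) -> io::Result<Option<ExitStatus>> {
    let mut child = command
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check: has the probe exited yet?
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // Ignore kill errors: the child may have exited in the meantime.
            let _ = child.kill();
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

A caller would treat `Ok(None)` as "suppress the advisory warning" and continue startup, passing `Duration::from_millis(500)` to match the bound in this PR.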
## Security
This only bounds the advisory startup probe. It does not change the
command execution path or add a fail-open sandbox fallback. The related
command-side hang in #20017 remains separate from this PR.
## Verification
- Added `system_bwrap_probe_times_out_without_reporting_a_warning`.
- Added
`system_bwrap_probe_does_not_wait_for_descendants_holding_stderr_open`.
- `cargo test -p codex-sandboxing`
- `cargo clippy -p codex-sandboxing --all-targets -- -D warnings`
Fixes #19828
Related: #20017
## Why
We found this while reviewing #21091, but confirmed it is not introduced
by that PR: the order-sensitive `current_text_with_pending()`
replacement loop already existed, and `main` already allowed active
same-size large pastes to use prefix-overlapping labels such as `[Pasted
Content N chars]` and `[Pasted Content N chars] #2`.
#21091 fixes placeholder numbering after a draft is cleared, so a fresh
same-size paste can reuse the base label. This PR fixes a different
path: when a draft already contains multiple active same-size large
pastes, the placeholders can overlap by prefix, for example `[Pasted
Content N chars]` and `[Pasted Content N chars] #2`.
That overlap breaks `current_text_with_pending()` when the composer
materializes the draft text for the external editor. Replacing the base
placeholder first can partially rewrite the `#2` placeholder, leaving
the external editor seeded with corrupted text instead of both paste
payloads.
| Before | After |
|---|---|
| <img width="1230" height="1008" alt="CleanShot 2026-05-05 at 10 18 09"
src="https://github.com/user-attachments/assets/88a2936c-cf00-4adc-8567-8fd8f398b4a8"
/> | <img width="1230" height="1008" alt="CleanShot 2026-05-05 at 10 20
31"
src="https://github.com/user-attachments/assets/119cff52-43c8-432a-9367-418d82f4ed82"
/> |
| <img width="1230" height="1008" alt="CleanShot 2026-05-05 at 10 18 57"
src="https://github.com/user-attachments/assets/026031bb-839b-4252-a0fd-9ba9616435fe"
/> | <img width="1230" height="1008" alt="CleanShot 2026-05-05 at 10 21
31"
src="https://github.com/user-attachments/assets/8cb6f2c8-3a5d-411b-8623-dca666ee3c08"
/> |
## What Changed
- Changed `current_text_with_pending()` to expand pending pastes through
the existing element-range based `expand_pending_pastes()` helper
instead of global string replacement.
- Added a regression test with two different same-length large pastes to
ensure both overlapping placeholders expand to their original payloads.
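The failure mode and the fix can be reproduced in isolation. The sketch below is not the composer code: `expand_by_replace`, `PlaceholderRange`, and `expand_by_ranges` are hypothetical names, and the payloads are shortened to four characters so the example stays readable.

```rust
/// Order-sensitive expansion: global string replacement per pending paste.
/// Replacing the base label first also rewrites the prefix of the `#2` label.
pub fn expand_by_replace(text: &str, pending: &[(String, String)]) -> String {
    let mut out = text.to_string();
    for (placeholder, payload) in pending {
        out = out.replace(placeholder, payload);
    }
    out
}

/// A pending paste tracked by the byte range its placeholder occupies.
pub struct PlaceholderRange {
    pub start: usize,
    pub end: usize,
    pub payload: String,
}

/// Range-based expansion: each placeholder is replaced at its own position,
/// so overlapping label text cannot interfere.
pub fn expand_by_ranges(text: &str, ranges: &[PlaceholderRange]) -> String {
    let mut sorted: Vec<&PlaceholderRange> = ranges.iter().collect();
    sorted.sort_by_key(|range| range.start);
    let mut out = String::new();
    let mut cursor = 0;
    for range in sorted {
        out.push_str(&text[cursor..range.start]);
        out.push_str(&range.payload);
        cursor = range.end;
    }
    out.push_str(&text[cursor..]);
    out
}
```

With a draft of `[Pasted Content 4 chars] [Pasted Content 4 chars] #2`, the replace-based version yields `AAAA AAAA #2`, while the range-based version yields `AAAA BBBB`.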
## How to Test
1. Start Codex TUI.
2. Paste a large string, for example 1004 `A` characters.
```shell
perl -e 'print "A" x 1004' | pbcopy
```
3. Paste a second large string with the same length, for example 1004
`B` characters.
```shell
perl -e 'print "B" x 1004' | pbcopy
```
4. Open the external editor from the composer.
5. Confirm the editor is seeded with the full `A...` payload followed by
the full `B...` payload, with no literal `#2` left behind.
Targeted tests:
- `cargo test -p codex-tui
current_text_with_pending_expands_overlapping_placeholders`
- `just argument-comment-lint-from-source -p codex-tui`
I also ran `cargo test -p codex-tui`; it reached the full crate suite
but failed two unrelated local status tests because this machine's
`/etc/codex/requirements.toml` rejects `DangerFullAccess`.
# Why
Revert #20524 for now because the computer use plugin has not migrated
off legacy `notify` yet. Keeping the deprecation in place today would
show users a warning before the plugin path is ready to move, so this
rolls the change back until that migration is complete.
# What
- revert the legacy `notify` deprecation change from #20524
- restore the prior `notify` behavior and remove the temporary
deprecation metrics/docs from that change
Once the computer use plugin has migrated, we can land the same
deprecation again.
Fixes #20945.
This keeps `codex fork --last` aligned with the neighboring
latest-session lookup flows. The local fork path now uses the same
cwd-scope helper as `resume --last`, which is also a small code cleanup
around how this selection logic is shared.
Credit to @chanwooyang1 for the report and for pointing out the narrow
fix direction.
What changed:
- Route `fork --last` through the shared latest-session cwd filter.
- Preserve `--all` as the explicit opt-in for global latest-session
selection.
- Keep remote cwd override behavior unchanged.
- Add focused coverage for local default, `--all`, and remote override
filter semantics.
Validation:
- Ran `just fmt`.
- Ran `git diff --check`.
- Reviewed the `fork --last`, `resume --last`, and fork picker selection
paths against the issue report.
Fixes #19940.
Large-paste placeholder numbering was backed by a per-size counter, so
clearing a draft with `Ctrl+C` left numbering state behind even though
the active pending paste state was gone. This updates the composer to
derive the next placeholder suffix from active pending pastes instead,
which keeps simultaneous same-size pastes distinct while letting fresh
drafts reuse the base label. This is also a small code cleanup: pending
paste state is now the source of truth instead of maintaining a separate
counter.
Credit to @Sungyoun-Kim for the issue report, root-cause notes, and fork
with the proposed fix, and to @charley-oai for the earlier related
#10032 proposal.
Changes:
- Remove the monotonic large-paste counter from the composer.
- Compute suffixes from currently active pending paste placeholders.
- Document large-paste placeholder behavior in the composer module docs.
- Add regression coverage for `Ctrl+C` clearing and deletion/reset
behavior.
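Deriving the label from active placeholders, rather than from a counter, might look like the sketch below. The function name is hypothetical, and the "smallest unused suffix" policy is an assumption; the PR only states that suffixes are computed from currently active pending pastes.

```rust
/// Picks the next placeholder label for a paste of `char_count` characters,
/// given the labels of pending pastes that are still active in the draft.
pub fn next_placeholder(active: &[String], char_count: usize) -> String {
    let base = format!("[Pasted Content {char_count} chars]");
    if !active.iter().any(|label| label == &base) {
        // Nothing active uses the base label, so a fresh draft reuses it.
        return base;
    }
    // Otherwise take the first free `#N` suffix, starting at 2.
    let mut suffix = 2;
    loop {
        let candidate = format!("{base} #{suffix}");
        if !active.iter().any(|label| label == &candidate) {
            return candidate;
        }
        suffix += 1;
    }
}
```

Since no state outlives the active list, clearing the draft with `Ctrl+C` (which empties that list) automatically resets numbering.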
Testing:
- `just fmt`
- `git diff --check`
# Why
`PreToolUse` already exposes `hookSpecificOutput.additionalContext` in
the generated hook schema, but the runtime still rejected it as
unsupported. That leaves `PreToolUse` out of step with the other
context-injecting hooks and prevents hook authors from attaching
model-visible guidance to a pending tool call before it runs.
# What
- Parse `PreToolUse.additionalContext` and carry it through the hook
event pipeline.
- Record `PreToolUse` context at the hook boundary so successful context
is preserved for both allowed and blocked calls without widening the
tool registry surface.
- Preserve existing deny behavior when context is combined with either
`permissionDecision: "deny"` or the legacy `decision: "block"` shape.
## Why
The desktop app on Windows needs a read-only way to tell, before the
next tool call, whether the local Windows sandbox setup is in a state
that should block the user and ask for setup again.
The main case we want to cover is the elevated sandbox setup version
bump. Today, if the app is configured for elevated Windows sandboxing
and the installed setup is stale, the next sandboxed shell/exec path can
end up triggering the elevated setup flow directly. That means the user
can see an unexpected UAC prompt with no UI explanation.
This change adds a small app-server preflight so the desktop app can ask
“is Windows sandbox ready, not configured, or update-required?” during
startup and show the appropriate blocking UI before the user hits a tool
call.
## What changed
- Added a new read-only app-server RPC: `windowsSandbox/readiness`
- Added a new protocol enum and response type:
- `WindowsSandboxReadiness`
- `WindowsSandboxReadinessResponse`
- Added core readiness logic in `core/src/windows_sandbox.rs`:
- `ready`
- `notConfigured`
- `updateRequired`
- Wired the new request through `codex_message_processor`
- Regenerated the vendored app-server schema fixtures
## Readiness semantics
This is intentionally a coarse startup/version-bump readiness check, not
a full predictor of every runtime repair case.
For now, readiness is determined from:
- the configured Windows sandbox level
- `sandbox_setup_is_complete()` for elevated mode
That means:
- `disabled` maps to `notConfigured`
- `restricted token` maps to `ready`
- `elevated` maps to `ready` or `updateRequired` depending on
`sandbox_setup_is_complete()`
This is deliberate for the first UI integration because the common case
we want to catch is “the app updated, the elevated setup version bumped,
and the user should see an update-required blocker instead of a surprise
UAC prompt”.
It does not attempt to model every case where the deeper runtime path
might decide to repair or re-run setup.
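The mapping above can be sketched as a pure function. This is a minimal illustration, not the code in `core/src/windows_sandbox.rs`: the `SandboxLevel` enum and the `readiness` function name are hypothetical, and the boolean stands in for the result of `sandbox_setup_is_complete()`.

```rust
/// Hypothetical stand-in for the configured Windows sandbox level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxLevel {
    Disabled,
    RestrictedToken,
    Elevated,
}

/// Mirrors the three readiness states named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WindowsSandboxReadiness {
    Ready,
    NotConfigured,
    UpdateRequired,
}

/// Pure mapping from the configured level plus the elevated-setup check.
/// `setup_is_complete` is only consulted for the elevated level.
pub fn readiness(level: SandboxLevel, setup_is_complete: bool) -> WindowsSandboxReadiness {
    match level {
        SandboxLevel::Disabled => WindowsSandboxReadiness::NotConfigured,
        SandboxLevel::RestrictedToken => WindowsSandboxReadiness::Ready,
        SandboxLevel::Elevated if setup_is_complete => WindowsSandboxReadiness::Ready,
        SandboxLevel::Elevated => WindowsSandboxReadiness::UpdateRequired,
    }
}
```

Keeping the mapping free of I/O is what makes it easy to unit test without a Windows host.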
## Testing
- Ran `cargo fmt --all -- app-server-protocol/src/protocol/common.rs
app-server-protocol/src/protocol/v2.rs
app-server/src/codex_message_processor.rs core/src/windows_sandbox.rs
core/src/windows_sandbox_tests.rs`
- Added unit tests for the pure readiness mapping in
`core/src/windows_sandbox_tests.rs`
- Regenerated vendored schema fixtures with `cargo run -p
codex-app-server-protocol --bin write_schema_fixtures -- --schema-root
app-server-protocol/schema`
- Did not run the full cargo test suite
## Why
Long `/goal` definitions currently reach lower-level goal validation and
can produce an opaque failure. This bug was reported by a user. Pasted
instruction blocks are especially confusing because the composer can
still contain a paste placeholder before expansion, which may otherwise
fall into the generic prompt-size error path.
There was also a related paste edge case where `/goal ` followed by a
multiline block whose first pasted line was blank looked like a bare
`/goal` command. That showed the goal usage/summary instead of setting
the pasted objective.
## What Changed
This adds TUI-side preflight validation for `/goal <objective>` using
the shared `MAX_THREAD_GOAL_OBJECTIVE_CHARS` limit. Oversized typed,
queued, and pasted goal objectives now fail locally with a goal-specific
message that recommends putting longer instructions in a file and
referencing that file from the goal.
The TUI now also lets inline-argument slash commands consume later-line
arguments before treating the first line as a bare command, so `/goal `
followed by blank lines and then objective text sets the goal instead of
opening the bare `/goal` flow.
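A rough sketch of the preflight check, under stated assumptions: the limit value of 4,000 is inferred from the manual test steps, the objective is counted in Unicode scalar values, and `validate_goal_objective` and `with_thousands` are hypothetical names. The real TUI code may count or trim differently.

```rust
/// Assumed value, inferred from the 4,000-character limit in the manual test steps.
pub const MAX_THREAD_GOAL_OBJECTIVE_CHARS: usize = 4_000;

/// Formats a count with comma thousands separators, e.g. 4001 -> "4,001".
fn with_thousands(n: usize) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, ch) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// Hypothetical preflight: accept objectives at or under the limit and
/// return a goal-specific message for longer ones.
pub fn validate_goal_objective(objective: &str) -> Result<(), String> {
    let count = objective.chars().count();
    if count <= MAX_THREAD_GOAL_OBJECTIVE_CHARS {
        return Ok(());
    }
    Err(format!(
        "Goal objective is too long: {} characters. Limit: {} characters.",
        with_thousands(count),
        with_thousands(MAX_THREAD_GOAL_OBJECTIVE_CHARS)
    ))
}
```

For pasted content, the check would run on the expanded text rather than on the placeholder.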
## Manual Testing
1. Start the TUI with goals enabled and an active session.
2. Submit `/goal ` followed by exactly 4,000 objective characters. It
should continue through the normal goal-setting path.
3. Submit `/goal ` followed by 4,001 objective characters. It should not
set a goal, and should show `Goal objective is too long: 4,001
characters. Limit: 4,000 characters.` followed by the guidance to put
longer instructions in a file and reference that file from the goal.
4. Type `/goal `, paste a large block that becomes a `[Pasted Content
... chars]` placeholder, then submit. It should validate the expanded
pasted text and show the goal-specific file guidance rather than the
generic prompt-size error.
5. Type `/goal `, paste a multiline block whose first line is blank,
then submit. It should set the objective from the non-blank pasted
content instead of showing `Usage: /goal <objective>` or the bare goal
summary.
6. While a turn is running, queue an oversized `/goal` command. When the
queue drains, it should show the same goal-specific error and should not
emit a goal-setting request.
## Why
Adding goal metrics makes it possible to track how often goals are
created, completed, and stopped by budget limits, plus the final token
and wall-clock usage for terminal outcomes.
## What Changed
- Added OpenTelemetry metric constants for goal lifecycle tracking:
- `codex.goal.created`: increments each time a new persisted goal is
created or an existing goal is replaced with a new objective.
- `codex.goal.completed`: increments when a goal transitions to
`complete`.
- `codex.goal.budget_limited`: increments when a goal transitions to
`budget_limited` because its token budget has been reached.
- `codex.goal.token_count`: records the final persisted token count when
a goal transitions to `complete` or `budget_limited`.
- `codex.goal.duration_s`: records the final persisted elapsed
wall-clock time, in seconds, when a goal transitions to `complete` or
`budget_limited`.
- Emitted creation metrics when a goal is created or replaced.
- Emitted terminal outcome counters and final usage histograms when a
goal transitions to `complete` or `budget_limited`, avoiding
double-counting later in-flight accounting for already budget-limited
goals.
- Added focused `codex-core` tests for create/complete metrics and
one-time budget-limit metrics.
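One way to picture the one-time emission rule is a function that only returns a counter name when the status actually changes into a terminal state. This is a sketch: `GoalStatus` and `terminal_counter` are hypothetical names, and only the metric names come from the list above.

```rust
/// Hypothetical goal status; only the terminal states matter here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalStatus {
    Active,
    Complete,
    BudgetLimited,
}

/// Returns the counter to increment for a status transition, if any.
/// Re-accounting a goal that is already in the same terminal state
/// yields `None`, which is what avoids double-counting.
pub fn terminal_counter(previous: GoalStatus, next: GoalStatus) -> Option<&'static str> {
    if previous == next {
        return None;
    }
    match next {
        GoalStatus::Complete => Some("codex.goal.completed"),
        GoalStatus::BudgetLimited => Some("codex.goal.budget_limited"),
        GoalStatus::Active => None,
    }
}
```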
## Summary
- prefer tmux's native clipboard integration for `/copy` when running
inside tmux
- fall back to OSC 52 when tmux clipboard copy is unavailable
- add coverage for tmux-preferred, fallback, and combined-failure paths
## Why
Inside tmux, `/copy` previously relied on DCS-wrapped OSC 52 when `TMUX`
was set. That only reaches the outer terminal when tmux passthrough is
enabled, so Codex could report success even though the system clipboard
never changed.
## User impact
`/copy` now works inside tmux even when `allow-passthrough` is off, as
long as tmux clipboard integration is available. If tmux cannot handle
the copy, Codex still keeps the existing OSC 52 fallback path.
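The preference order can be sketched with two injected copy functions, which is also roughly how the tmux-preferred, fallback, and combined-failure paths can be tested without a real terminal. `CopyPath` and `copy_with_fallback` are hypothetical names, not the TUI's actual API.

```rust
/// Which mechanism ended up handling the copy (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CopyPath {
    Tmux,
    Osc52,
}

/// Tries the tmux clipboard path first and falls back to OSC 52.
/// Both backends are passed in as closures so they can be faked in tests.
pub fn copy_with_fallback<T, O>(text: &str, tmux_copy: T, osc52_copy: O) -> Result<CopyPath, String>
where
    T: FnOnce(&str) -> Result<(), String>,
    O: FnOnce(&str) -> Result<(), String>,
{
    let tmux_err = match tmux_copy(text) {
        Ok(()) => return Ok(CopyPath::Tmux),
        Err(err) => err,
    };
    match osc52_copy(text) {
        Ok(()) => Ok(CopyPath::Osc52),
        // Report both failures so neither cause is hidden.
        Err(osc_err) => Err(format!(
            "tmux copy failed: {tmux_err}; OSC 52 copy failed: {osc_err}"
        )),
    }
}
```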
## Validation
- `cargo test -p codex-tui`
- `just fmt`
- `just fix -p codex-tui`
- `just argument-comment-lint`
- manually verified `/copy` inside tmux with `allow-passthrough off`
Fixes #19926
## Why
Memory search currently treats separators literally, so callers need to
know whether a stored term uses spaces, hyphens, or no separators at
all. That makes recall brittle for terms such as `MultiAgentV2` vs.
`multi agent v2` and `cold-resume` vs. `cold resume`.
## What changed
- Add an opt-in `normalized` mode to memory search that removes
non-alphanumeric separators after any requested case folding.
- Thread the new flag through the MCP `search` tool into the local
backend while keeping existing literal matching as the default.
- Reject queries that normalize to an empty string, and add regression
coverage for both normalized matching and that validation path.
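A minimal sketch of the normalization described above, assuming that "separators" means every non-alphanumeric character and that case folding is a plain lowercase pass. `normalize` and `normalized_contains` are hypothetical names; the real backend may differ.

```rust
/// Applies optional case folding, then drops every non-alphanumeric character.
pub fn normalize(text: &str, case_insensitive: bool) -> String {
    let folded = if case_insensitive {
        text.to_lowercase()
    } else {
        text.to_string()
    };
    folded.chars().filter(|c| c.is_alphanumeric()).collect()
}

/// Substring match on normalized text. Queries that normalize to an
/// empty string are rejected instead of matching everything.
pub fn normalized_contains(
    haystack: &str,
    query: &str,
    case_insensitive: bool,
) -> Result<bool, String> {
    let needle = normalize(query, case_insensitive);
    if needle.is_empty() {
        return Err("query normalizes to an empty string".to_string());
    }
    Ok(normalize(haystack, case_insensitive).contains(&needle))
}
```

With this shape, `multi agent v2` finds `MultiAgentV2` only when case folding is also requested, which matches the "after any requested case folding" ordering.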
## Testing
- `cargo test -p codex-memories-mcp`
## Why
Memory search currently supports either independent substring matches or
requiring every query to appear on the same line. That is too
restrictive for memory files where related terms often land on nearby
lines in the same note or bullet block.
## What changed
- Replace the old `all` match mode with explicit tagged modes:
`all_on_same_line` and `all_within_lines { line_count }`.
- Add windowed matching in `codex-rs/memories/mcp/src/local.rs` so
callers can require every query to appear within a bounded line range
while returning only the minimal qualifying windows.
- Reject invalid zero-width windows and update the MCP tool description
plus argument parsing to expose the new mode.
- Add coverage for same-line matching, windowed matching, and invalid
`line_count` input.
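The windowed mode can be sketched as follows. This is an illustrative brute-force version, not the implementation in `local.rs`: it assumes plain substring matching, 0-based inclusive `(start, end)` line indices, and a hypothetical `all_within_lines` function name.

```rust
/// True when every query appears somewhere in lines[start..=end].
fn covers(lines: &[&str], start: usize, end: usize, queries: &[&str]) -> bool {
    queries
        .iter()
        .all(|q| lines[start..=end].iter().any(|line| line.contains(*q)))
}

/// Returns the minimal windows of at most `line_count` lines that contain
/// every query, as 0-based inclusive (start, end) pairs.
pub fn all_within_lines(
    lines: &[&str],
    queries: &[&str],
    line_count: usize,
) -> Result<Vec<(usize, usize)>, String> {
    if line_count == 0 {
        return Err("line_count must be at least 1".to_string());
    }
    let mut windows = Vec::new();
    for start in 0..lines.len() {
        let last = (start + line_count - 1).min(lines.len() - 1);
        // Smallest end for this start, so the window cannot shrink from the bottom.
        let Some(end) = (start..=last).find(|&end| covers(lines, start, end, queries)) else {
            continue;
        };
        // If the window still qualifies without its first line, it is not minimal.
        let shrinkable = start < end && covers(lines, start + 1, end, queries);
        if !shrinkable {
            windows.push((start, end));
        }
    }
    Ok(windows)
}
```

For lines `["alpha", "x", "beta", "alpha beta"]` and queries `alpha`, `beta` with `line_count` 3, this yields the windows `(0, 2)` and `(3, 3)`; the overlapping candidates `(1, 3)` and `(2, 3)` are dropped because `(3, 3)` already qualifies inside them.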
## Verification
- Added targeted coverage in `codex-rs/memories/mcp/src/local_tests.rs`
for `search_supports_all_within_lines_match_mode` and
`search_rejects_zero_line_window`.
- Added server-side parsing coverage in
`codex-rs/memories/mcp/src/server.rs` for
`search_args_accept_windowed_all_match_mode`.
## Why
The local memories root can contain implementation details such as
`.git` plus incidental OS metadata like `.DS_Store`. Those entries are
not authored memory content, so the memories MCP should keep them
invisible instead of exposing them through normal discovery or direct
lookup.
This applies only to the local implementation.
## What changed
- Return `NotFound` for scoped `list`, `read`, and `search` requests
that include a hidden path component.
- Skip hidden files and directories while listing a directory or
recursively searching the memories tree.
- Add regression coverage for hidden files, hidden directories, and
hidden scoped requests across `list`, `read`, and `search`.
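A sketch of the hidden-path check, assuming "hidden" means any path component whose name starts with a dot (as with `.git` and `.DS_Store`). The function name `has_hidden_component` is hypothetical.

```rust
use std::path::{Component, Path};

/// True when any named component of the path starts with a dot.
/// `.` and `..` are parsed as `CurDir` / `ParentDir`, so they do not
/// count as hidden names here.
pub fn has_hidden_component(path: &Path) -> bool {
    path.components().any(|component| match component {
        Component::Normal(name) => name.to_string_lossy().starts_with('.'),
        _ => false,
    })
}
```

A scoped request would return `NotFound` when this check is true, and a directory walk would skip entries whose own name is hidden.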
## Testing
- Added focused regression tests in `memories/mcp/src/local_tests.rs`
covering hidden-path behavior across the affected APIs.
## Why
`list_dir` still carries a full spec/handler/test path, but nothing in
the current model catalog advertises it via
`experimental_supported_tools`. That leaves us maintaining an
environment-backed tool surface that is effectively unused.
## What changed
- delete the `list_dir` handler and its tests from `codex-core`
- remove the `list_dir` spec builder, handler kind, and registry wiring
from `codex-tools`
- clean up the remaining internal README and registry tests so they no
longer mention the removed tool
## Why
The model list needs to carry display-ready service tier metadata so
clients can render tier choices with stable IDs, names, and
descriptions. A raw speed-tier string list is not enough for richer UI
copy or future tier labels.
## What changed
- Added `ModelServiceTier` to shared model metadata with string `id`,
`name`, and `description` fields.
- Added `service_tiers` to `ModelInfo` and `ModelPreset`, preserving
empty defaults for older cached model payloads.
- Exposed `serviceTiers` on app-server v2 `Model` responses and threaded
it through TUI app-server model conversion.
- Marked legacy `additional_speed_tiers` / `additionalSpeedTiers`
metadata as deprecated in source and generated schema output.
- Regenerated app-server protocol JSON schema and TypeScript fixtures,
including `ModelServiceTier.ts`.
## Verification
- Ran `just write-app-server-schema`.
- Did not run local tests per repo instruction; relying on PR CI.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Large hook outputs can enter model-visible context through hook-specific
paths such as `additionalContext` and `Stop` continuation prompts.
Without a dedicated cap, one hook can inject a large blob directly into
conversation history instead of leaving a bounded preview for the model
and preserving the full text elsewhere.
## What
- spill hook text once it exceeds a fixed `2_500`-token budget,
preserving the full output on disk and leaving a head/tail preview plus
saved path in context
- add shared hook-output spilling under
`CODEX_HOME/hook_outputs/<thread_id>/<uuid>.txt`
- apply the cap to `additionalContext`, `feedback_message`, and
`Stop` continuation fragments
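The head/tail preview idea can be illustrated roughly as below. This sketch makes several assumptions: it budgets in characters rather than tokens (the PR uses a `2_500`-token budget), the marker text is invented for illustration, and `head_tail_preview` is a hypothetical name. It also leaves out writing the full output to disk.

```rust
/// Returns the text unchanged when it fits, otherwise a head/tail preview
/// that names where the full output was saved.
/// Note: budgets in characters here; the real cap is token-based.
pub fn head_tail_preview(text: &str, max_chars: usize, saved_path: &str) -> String {
    let chars: Vec<char> = text.chars().collect();
    if chars.len() <= max_chars {
        return text.to_string();
    }
    // Split the budget between the start and the end of the output.
    let head_len = max_chars / 2;
    let tail_len = max_chars - head_len;
    let head: String = chars[..head_len].iter().collect();
    let tail: String = chars[chars.len() - tail_len..].iter().collect();
    let omitted = chars.len() - max_chars;
    format!(
        "{head}\n[... {omitted} characters omitted; full output saved to {saved_path} ...]\n{tail}"
    )
}
```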
Migrate token usage replay, rollback responses, and detached review
setup (a special case of forking) to be served from `ThreadStore` reads
rather than directly from rollout files.
- replay restored token usage from already-loaded `RolloutItem` history
instead of reopening `Thread.path`
- rebuild rollback responses from loaded `ThreadStore` snapshots and
history
- start detached reviews from store-backed parent history and stored
review-thread metadata
- remove obsolete app-server rollout-summary helper code that became
dead after the store-backed migration
- preserve response/notification ordering for resume, fork, rollback,
and detached review flows
- add integration test coverage for the affected paths
- Route `thread/metadata/update` through
`ThreadStore::update_thread_metadata`.
- Add `LocalThreadStore` git metadata patch support for set, partial
update, and clear semantics.
- Add unit tests for the new thread store code.
- Remove a large amount of dead code and tests.
## Why
The external startup/login surface for this auth path should talk about
an access token instead of exposing the internal Agent Identity
terminology. Users should pass `CODEX_ACCESS_TOKEN` or pipe a token into
`codex login --with-access-token`; the old external env/flag spellings
are removed so there is only one supported user-facing path.
## What Changed
- Added `CODEX_ACCESS_TOKEN` as the supported environment variable for
this auth path.
- Added `codex login --with-access-token` as the supported stdin-based
login command.
- Removed the legacy `CODEX_AGENT_IDENTITY` env-var fallback and hidden
`--with-agent-identity` CLI alias.
- Updated CLI error, status, and stdin prompts to use access-token
language.
- Added coverage for access-token env loading, CLI login failure
behavior, and renamed login status text.
## Validation
- `cargo test -p codex-login`
- `cargo test -p codex-cli`
- `just fix -p codex-login`
- `just fix -p codex-cli`
## Summary
- Add a testable DNS lookup helper for the local or private host
precheck while preserving production `lookup_host` behavior.
- Add deterministic coverage for DNS timeout, lookup error, private
resolution, and public resolution decisions.
- Keep BUGB 15982 guarded without relying on ambient DNS timing or
resolver behavior.
## Why
BUGB 15982 was fixed by failing closed on DNS lookup errors and
timeouts. The existing regression covered lookup failure through real
DNS, but did not deterministically exercise the timeout branch. This PR
adds a small injection point so CI can cover that branch without
standing up slow authoritative DNS.
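The injection point can be pictured as a lookup closure passed into the precheck, so a test can return a timeout or error directly. This is a synchronous sketch under assumptions: `LookupFailure` and the function signature are hypothetical, the address classification is a simplified subset of what a real check would cover, and the production helper wraps async `lookup_host`.

```rust
use std::net::IpAddr;

/// Hypothetical failure type for the injected lookup.
#[derive(Debug)]
pub enum LookupFailure {
    Timeout,
    Error(String),
}

/// Simplified non-public check: loopback, private, link-local, unspecified,
/// and IPv6 unique-local (fc00::/7) addresses.
fn is_non_public(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            v6.is_loopback() || v6.is_unspecified() || (v6.segments()[0] & 0xfe00) == 0xfc00
        }
    }
}

/// Fails closed: a lookup error or timeout is treated as non-public.
pub fn host_resolves_to_non_public_ip<F>(host: &str, lookup: F) -> bool
where
    F: FnOnce(&str) -> Result<Vec<IpAddr>, LookupFailure>,
{
    match lookup(host) {
        Ok(addrs) => addrs.into_iter().any(is_non_public),
        Err(_) => true,
    }
}
```

Because the resolver is a parameter, the timeout branch becomes a one-line test case rather than something that depends on ambient DNS timing.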
## Validation
- `cargo test -p codex-network-proxy host_resolves_to_non_public_ip --
--nocapture`
- `cargo test -p codex-network-proxy
host_blocked_rejects_allowlisted_hostname_when_dns_lookup_fails --
--nocapture`
- `cargo test -p codex-network-proxy`
- `just fmt`
- `just fix -p codex-network-proxy`
- `git diff --check`
## Tickets
- BUGB 15982
- https://linear.app/openai/issue/BUGB-15982/codex-dns-timeout-fail-open-in-codex-network-proxy-bypasses
- Bugcrowd:
https://tracker.bugcrowd.com/openai/submissions/b2bf131d-db04-478f-85aa-cdd17ca8f604
## Why
App-server clients sometimes need argv-based local process execution
while sandbox policy is controlled outside Codex. Those environments can
reject sandbox-disabling paths before a command ever starts, even when
the caller intentionally wants unsandboxed execution.
This PR adds a distinct `process/*` API for that use case instead of
extending `command/exec` with another sandbox-disabling shape. Keeping
the new surface separate also makes the future removal of `command/exec`
simpler: clients that need explicit process lifecycle control can move
to the newer handle-based API without depending on `command/exec`
business logic.
## What changed
- Added v2 process lifecycle methods: `process/spawn`,
`process/writeStdin`, `process/resizePty`, and `process/kill`.
- Added process notifications: `process/outputDelta` for streamed
stdout/stderr chunks and `process/exited` for final exit status and
buffered output.
- Made `process/spawn` intentionally unsandboxed and omitted
sandbox-selection fields such as `sandboxPolicy` and
`permissionProfile`.
- Added client-supplied, connection-scoped `processHandle` values for
follow-up control requests and notification routing.
- Supported cwd, environment overrides, PTY mode and size, stdin
streaming, stdout/stderr streaming, per-stream output caps, and timeout
controls.
- Killed active process sessions when the originating app-server
connection closes.
- Wired the implementation through the modular `request_processors/`
app-server layout, with process-handle request serialization for
follow-up control calls.
- Updated generated JSON/TypeScript schema fixtures and documented the
new API in `codex-rs/app-server/README.md`.
- Added v2 app-server integration coverage in
`codex-rs/app-server/tests/suite/v2/process_exec.rs` for spawn
acknowledgement before exit, buffered output caps, and process
termination.
## Verification
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server`
---------
Co-authored-by: Owen Lin <owen@openai.com>
## Summary
Remove the hardcoded remote plugin ID prefix allow-list from app-server
uninstall routing. IDs that do not parse as local `plugin@marketplace`
IDs now flow through the remote uninstall path, where the existing
remote ID safety validation still rejects empty IDs, spaces, slashes,
and other unsafe characters before URL/cache use.
## Why
Plugin-service owns the backend remote plugin ID contract. Codex should
not require remote IDs to start with the local hardcoded prefixes
`plugins~`, `plugins_`, `app_`, `asdk_app_`, or `connector_`, because
newer backend ID families could otherwise be rejected before
plugin-service sees the request.
## Validation
- `just fmt`
- `cargo test -p codex-app-server plugin_uninstall`
- `just fix -p codex-app-server`
- `git diff --check`
## Why
Tool families already disagree on what their existing `duration` fields
mean, so lifecycle latency should live on the shared item envelope
instead of being inferred from per-tool execution fields. Carrying that
envelope through app-server notifications gives downstream consumers one
reusable timing signal without pretending every tool has the same
execution semantics.
## What changed
- Adds `started_at_ms` to core `ItemStartedEvent` values and
`completed_at_ms` to core `ItemCompletedEvent` values.
- Populates those timestamps in the shared session lifecycle emitters,
so protocol-native items get timing without each producer tracking its
own clock state.
- Exposes `startedAtMs` on app-server `item/started` notifications and
`completedAtMs` on `item/completed` notifications.
- Maps the lifecycle timestamps through the app-server boundary while
leaving legacy-converted notifications nullable when no lifecycle
timestamp exists.
- Regenerates the app-server JSON schema and TypeScript fixtures for the
notification-envelope change and updates downstream fixtures that
construct those notifications directly.
- Extends the existing web-search and image-generation integration flows
to assert the new lifecycle timestamps on the native item events.
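A downstream consumer could derive lifecycle latency from the two envelope timestamps roughly like this. The sketch assumes the timestamps are signed millisecond values and that either side may be absent for legacy-converted notifications; `lifecycle_latency_ms` is a hypothetical helper, not part of the protocol.

```rust
/// Computes item lifecycle latency from `startedAtMs` / `completedAtMs`.
/// Returns `None` when either timestamp is missing (for example on
/// legacy-converted notifications) or when the pair is out of order.
pub fn lifecycle_latency_ms(
    started_at_ms: Option<i64>,
    completed_at_ms: Option<i64>,
) -> Option<i64> {
    let started = started_at_ms?;
    let completed = completed_at_ms?;
    (completed >= started).then(|| completed - started)
}
```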
## Verification
- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-tui -p codex-exec
-p codex-app-server-client`
- `cargo test -p codex-core --test all web_search_item_is_emitted`
- `cargo test -p codex-core --test all
image_generation_call_event_is_emitted`
- `cargo test -p codex-app-server-protocol`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20514).
* #18748
* #18747
* #17090
* #17089
* __->__ #20514
## Summary
Moves the WebRTC realtime sideband websocket join out of the voice start
critical path. Call creation still posts the SDP offer and session
config synchronously so the client gets the SDP answer, but the sideband
websocket now connects asynchronously in the input task and doesn't
block conversation state installation.
This lets the normal realtime input channels buffer text, handoff
output, and audio while the WebRTC sideband websocket is connecting. If
the sideband join fails while the conversation is still active, the task
sends a `RealtimeEvent::Error` through the existing `events_tx` / fanout
path.
In other words:
* No longer blocked on sideband: the client can receive the SDP answer
earlier, set up the WebRTC peer connection, and let the media leg
progress while the sideband websocket joins.
* Still blocked on sideband: queued text, handoff output, and sideband
server events cannot flow until `connect_webrtc_sideband(...).await`
finishes and `run_realtime_input_task(...)` starts.
## Validation
- `env CODEX_SKIP_VENDORED_BWRAP=1 cargo test --manifest-path
codex-rs/Cargo.toml -p codex-core --test all
conversation_webrtc_start_posts_generated_session`
`CODEX_SKIP_VENDORED_BWRAP=1` is needed in this local environment
because `libcap.pc` is not installed for the vendored bubblewrap build.
## Testing
I tested this locally by running `cargo run -p codex-cli --bin codex --
--enable realtime_conversation` and invoking `/realtime`. The logs are
written to `~/.codex/log/codex-tui.log`.
### Before the Change
Logging commit
(c0299e6edf)
```
2026-05-04T16:06:09.251956Z INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: starting realtime conversation
2026-05-04T16:06:09.251980Z INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: creating realtime call transport="webrtc"
2026-05-04T16:06:10.365722Z INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: realtime call created; sdp answer ready transport="webrtc" call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy elapsed_ms=1113 total_elapsed_ms=1113
2026-05-04T16:06:10.365843Z INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: connecting realtime sideband websocket call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy
2026-05-04T16:06:10.784528Z INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: connected realtime sideband websocket call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy elapsed_ms=418 total_elapsed_ms=1532
2026-05-04T16:06:10.784665Z INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: realtime conversation started
```
### After the Change
Logging commit
(c8b00ac21a)
```
2026-05-04T15:41:24.080363Z INFO ... codex_core::realtime_conversation: starting realtime conversation
2026-05-04T15:41:24.080434Z INFO ... codex_core::realtime_conversation: creating realtime call transport="webrtc"
2026-05-04T15:41:25.106906Z INFO ... codex_core::realtime_conversation: realtime call created; sdp answer ready transport="webrtc" call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy elapsed_ms=1026 total_elapsed_ms=1026
2026-05-04T15:41:25.107067Z INFO ... codex_core::realtime_conversation: spawned realtime sideband connection task transport="webrtc" total_elapsed_ms=1026
2026-05-04T15:41:25.107160Z INFO ... codex_core::realtime_conversation: realtime conversation started
2026-05-04T15:41:25.107185Z INFO codex_core::realtime_conversation: connecting realtime sideband websocket call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy
2026-05-04T15:41:25.107352Z INFO ... codex_core::realtime_conversation: sent realtime sdp answer to client
2026-05-04T15:41:26.076685Z INFO codex_core::realtime_conversation: connected realtime sideband websocket call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy elapsed_ms=969 total_elapsed_ms=1996
2026-05-04T15:41:26.573893Z INFO codex_core::realtime_conversation: realtime session updated realtime_session_id=sess_u0_Dbpi8nhak5eLjQZ73yhAy
2026-05-04T15:41:26.573970Z INFO codex_core::realtime_conversation: received realtime conversation event event=SessionUpdated { ... }
```
### Conclusion
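Here we see that conversation startup is about half a second faster:
`realtime conversation started` is logged about 1532ms after the start
before the change and about 1027ms after it, because the sideband
connect (969ms in the second run) no longer blocks startup. This also
matches my sanity tests, where I saw at most a second of savings.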
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Fixes #11678 by removing the Windows-specific
`PASTE_BURST_CHAR_INTERVAL` override. Windows now uses the same `8ms`
paste-burst character interval as macOS and Linux, which removes the
extra per-character hold that made fast typing and key repeat feel
delayed on Windows.
The paste-burst heuristic itself is unchanged, and the Windows-specific
active idle timeout remains in place. This PR only restores the shared
character-to-character burst threshold that decides whether adjacent
plain character events are part of a paste.
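The threshold's role can be sketched as a single comparison on the gap between adjacent plain character events. This is an illustration only: the function name is hypothetical, the comparison being inclusive is an assumption, and the real heuristic has more state than this.

```rust
use std::time::Duration;

/// Shared character-to-character threshold after this change.
pub const PASTE_BURST_CHAR_INTERVAL: Duration = Duration::from_millis(8);

/// Previous Windows-only override, kept here only for comparison.
pub const OLD_WINDOWS_INTERVAL: Duration = Duration::from_millis(30);

/// True when the gap between two adjacent plain character events is
/// short enough to treat them as part of the same paste burst.
/// Assumption: the boundary is inclusive.
pub fn continues_paste_burst(gap: Duration, interval: Duration) -> bool {
    gap <= interval
}
```

Under this sketch, a 20ms gap between characters would have been held as a possible paste with the `30ms` override, but is flushed as ordinary typing with the `8ms` value.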
## Motivation
PR #9348 raised the Windows character interval from `8ms` to `30ms` to
protect the multiline paste behavior tracked in #2137, where pasted
newlines could be interpreted as submits in Windows terminals. That
fixed the paste failure, but it also made ordinary typing visibly laggy
because the TUI waits briefly before flushing a single typed character
while it checks whether a paste burst is forming.
The change here is to remove that Windows-only delay and
return to the cross-platform threshold. Manual Windows validation of the
critical VS Code integrated terminal path shows multiline paste still
works with the final `8ms` value, including testing on VS Code
`1.107.0`.
## Testing
- `cargo test -p codex-tui`
- Manual Windows validation in VS Code integrated PowerShell with the
final `8ms` interval
## Why
Bazel CI was not actually exercising some sharded Rust integration-test
targets on macOS. The `rules_rust` sharding wrapper expects a symlink
runfiles tree, but this repo runs Bazel with `--noenable_runfiles`. In
that configuration the wrapper could fail to find the generated test
binary, produce an empty test list, and exit successfully. That made
targets such as `//codex-rs/core:core-all-test` look green even when
Cargo CI could still catch failures in the same Rust tests.
The coverage gap appears to have been introduced by
[#18082](https://github.com/openai/codex/pull/18082), which enabled
rules_rust native sharding on `//codex-rs/core:core-all-test` and the
other large Rust test labels. The manifest-runfiles setup itself
predates that change in
[#10098](https://github.com/openai/codex/pull/10098), but #18082 is
where the affected integration tests started running through the
incompatible rules_rust sharding wrapper.
[#18913](https://github.com/openai/codex/pull/18913) fixed the same
class of issue for wrapped unit-test shards, but integration-test shards
were still going through the rules_rust wrapper until this PR.
We still do not have the V8/code-mode pieces stable under the Bazel CI
cross-compile setup, so this keeps those tests out of Bazel while
restoring coverage for the rest of the sharded Rust integration suites.
Cargo CI remains responsible for V8/code-mode coverage for now.
This change did uncover a real failing core test on `main`:
`approved_folder_write_request_permissions_unblocks_later_apply_patch`.
That fix is split into
[#21060](https://github.com/openai/codex/pull/21060), which enables the
`apply_patch` tool in the test, teaches the aggregate core test binary
to dispatch the sandboxed filesystem helper, canonicalizes the macOS
temp patch target, and isolates the core test harness from managed
local/enterprise config. Keeping that fix separate lets this PR stay
focused on restoring Bazel coverage while documenting the first failure
it exposed.
## What changed
- Build sharded Rust integration tests as manual `*-bin` binaries and
run them through the existing manifest-aware `workspace_root_test`
launcher.
- Keep Bazel sharding on the launcher target so Rust test cases are
still distributed by stable test-name hashing.
- Configure Bazel CI to skip Rust tests whose names contain
`suite::code_mode::`.
- Exclude the standalone `codex-rs/code-mode` and `codex-rs/v8-poc`
unit-test targets from `bazel.yml`.
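Stable test-name hashing and the skip filter can be sketched as below. The hash function used by the launcher is not stated in this PR, so FNV-1a is swapped in purely for illustration, and `shard_for` and `should_skip` are hypothetical names. The point is only that a deterministic hash keeps each test on the same shard across runs.

```rust
/// 64-bit FNV-1a, used here only as an example of a stable hash.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Assigns a test to a shard index in 0..total_shards by name.
pub fn shard_for(test_name: &str, total_shards: u64) -> u64 {
    assert!(total_shards > 0, "total_shards must be positive");
    fnv1a_64(test_name.as_bytes()) % total_shards
}

/// True when the test name contains any of the skip filters,
/// e.g. "suite::code_mode::".
pub fn should_skip(test_name: &str, skip_filters: &[&str]) -> bool {
    skip_filters.iter().any(|filter| test_name.contains(*filter))
}
```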
## Verification
- `bazel query --output=build //codex-rs/core:core-all-test` now shows
`workspace_root_test` wrapping `//codex-rs/core:core-all-test-bin`.
- `bazel test --test_output=all --nocache_test_results
--test_sharding_strategy=disabled //codex-rs/core:core-all-test
--test_filter=suite::request_permissions_tool::approved_folder_write_request_permissions_unblocks_later_apply_patch`
runs the actual Rust test body and passes.
- `bazel test --test_output=errors --nocache_test_results
--test_env=CODEX_BAZEL_TEST_SKIP_FILTERS=suite::code_mode::
//codex-rs/core:core-all-test` runs the sharded target with code-mode
skipped and passes overall locally, with one flaky attempt retried by
the existing `flaky = True` setting.
## Why
Fixes #21046.
Codex TUI 0.128.0 can show Backspace/Delete-related editor shortcuts in
`/keymap`, but Windows-style modified Backspace/Delete events were still
dropped by the composer because the default editor keymap did not
include those modified special-key variants. On Windows/CMD this meant
`Shift+Backspace` and `Shift+Delete` did not fall through to normal
character deletion, and `Ctrl+Backspace` / `Ctrl+Delete` did not perform
the word deletion users expect from Windows text inputs.
## What Changed
- Added default editor bindings for `shift-backspace` and `shift-delete`
so shifted delete keys keep normal grapheme deletion behavior.
- Added default editor bindings for `ctrl-backspace`,
`ctrl-shift-backspace`, `ctrl-delete`, and `ctrl-shift-delete` so
Windows-style word deletion works when terminals preserve those
modifiers.
- Added regression coverage for the resolved default keymap and textarea
behavior.
## How to Test
1. Start Codex in the TUI on Windows CMD or another terminal that
reports modified Backspace/Delete keys distinctly.
2. Type `hello world` in the composer.
3. Press `Ctrl+Backspace`; confirm `world` is removed and `hello `
remains.
4. Type `world` again, move the cursor before it, then press
`Ctrl+Delete`; confirm the next word is removed.
5. Type a few characters and press `Shift+Backspace` and `Shift+Delete`;
confirm they delete one character in the expected direction instead of
doing nothing.
6. Open `/keymap`, inspect the Editor deletion actions, and confirm the
modified Backspace/Delete aliases are visible as configurable defaults.
Targeted tests:
- `cargo test -p codex-tui keymap::tests`
- `cargo test -p codex-tui bottom_pane::textarea::tests`
- `cargo test -p codex-tui keymap_setup::tests`
## Why
The Bazel test coverage change exposed
`approved_folder_write_request_permissions_unblocks_later_apply_patch`,
and `rust-ci-full.yml` showed the same test failing on `main` on macOS.
There were two separate classes of problems here.
### Clean CI failure
The test emits an `apply_patch` tool call, but its config did not enable
the `apply_patch` tool, so the mocked response completed without an
`apply-patch-call` output. After enabling the tool, the same path also
needs the aggregate `codex-core` test binary to dispatch
`--codex-run-as-fs-helper`; sandboxed `apply_patch` uses that helper
under macOS Seatbelt.
The test now also canonicalizes the temporary patch target before
building the patch payload so the path matches normalized grants on
macOS, where `/var` paths often normalize to `/private/var`.
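The canonicalization step amounts to resolving the directory before joining the file name, so the path in the patch payload matches the normalized form. A minimal sketch, with `canonical_patch_target` as a hypothetical helper name:

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolves symlinks in `dir` (for example `/var` -> `/private/var` on
/// macOS) and then appends the file name. The directory must exist;
/// the file itself does not need to exist yet.
pub fn canonical_patch_target(dir: &Path, file_name: &str) -> io::Result<PathBuf> {
    Ok(dir.canonicalize()?.join(file_name))
}
```

Canonicalizing the directory rather than the full target avoids an error when the patch is about to create the file.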
### Local/enterprise config isolation
The core test harness now builds its default test config with managed
config disabled, so host-managed enterprise config cannot alter these
tests. The request-permissions turns in this test also explicitly use
the user reviewer path, keeping the assertions focused on
`request_permissions` behavior rather than reviewer defaults from the
host.
## What Changed
- Enable `apply_patch` in
`approved_folder_write_request_permissions_unblocks_later_apply_patch`.
- Teach the core integration test binary to dispatch
`CODEX_FS_HELPER_ARG1`, matching the existing apply-patch and
linux-sandbox dispatch paths.
- Canonicalize the tempdir-backed patch target before creating the
patch.
- Ignore managed config in default core test configs and explicitly pin
this test to `ApprovalsReviewer::User`.
## Verification
Run outside the Codex app sandbox because these macOS tests
intentionally spawn Seatbelt:
- `cargo test -p codex-core
approved_folder_write_request_permissions_unblocks_later_apply_patch`
- `cargo test -p codex-core
approved_folder_write_request_permissions_unblocks_later_exec_without_sandbox_args`
## Why
Feedback reports do not currently surface a direct pointer to the last
model call, so investigations may require searching through many
requests in a session to find the bad response. Preserve the last
model-side IDs at response-stream time so immediate feedback reports
carry that breadcrumb.
## What changed
- Record `last_model_request_id` when a Responses stream exposes an
upstream request ID.
- Record `last_model_response_id` when the model response completes.
- Add unit coverage for the emitted feedback tags.
## Verification
- `cargo test -p codex-core
client::tests::response_stream_records_last_model_feedback_ids`
## Why
MCP servers can provide `instructions` that explain what their tools are
for. Directly exposed MCP namespaces already use those instructions when
a connector description is not available, but deferred `tool_search`
results did not preserve that fallback. The direct path falls back from
connector metadata to server instructions, while the deferred path only
carried `connector_description` and otherwise fell back to generic
namespace text.
That meant a plain MCP server could provide useful model-facing guidance
and still appear as `Tools in the X namespace.` whenever it was
discovered lazily through `tool_search`.
## What changed
- Store one model-facing `namespace_description` on `ToolInfo`, using
connector descriptions for connector-backed tools and server
instructions for plain MCP servers.
- Thread that namespace description through the `tool_search` source
list, search indexing, and returned namespace metadata.
- Add an end-to-end regression test for deferred non-app MCP search
results exposing server instructions as the namespace description.
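The fallback order can be sketched as a small function. The function names are hypothetical, and treating blank strings as missing is an assumption; only the ordering (connector description, then server instructions, then generic text) comes from the description above.

```rust
/// Treats blank text as absent. This is an assumption for the sketch.
fn non_empty(text: Option<&str>) -> Option<&str> {
    text.map(str::trim).filter(|t| !t.is_empty())
}

/// Picks the model-facing namespace description: connector description
/// first, then MCP server instructions, then the generic fallback.
pub fn namespace_description(
    connector_description: Option<&str>,
    server_instructions: Option<&str>,
    namespace: &str,
) -> String {
    non_empty(connector_description)
        .or_else(|| non_empty(server_instructions))
        .map(str::to_owned)
        .unwrap_or_else(|| format!("Tools in the {namespace} namespace."))
}
```

Computing this once and storing it means the direct and deferred `tool_search` paths cannot drift apart again.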
## Verification
- `cargo test -p codex-tools
search_tool_description_lists_each_mcp_source_once --lib`
- `cargo test -p codex-core --test all
tool_search_uses_non_app_mcp_server_instructions_as_namespace_description`