## Why
`multi_agents_v2` is meant to be independently gated from the older
`collab` feature. The tool registry still treated the
collaboration-style agent tools as `collab`-only, so enabling
`multi_agents_v2` without `collab` omitted the v2 agent tools. Review
and guardian sub-sessions also need to keep agent spawning disabled even
when the outer session has `multi_agents_v2` enabled.
## What changed
- Include the collab-backed agent tools when either `multi_agents_v2` or
`collab` is enabled.
- Explicitly disable `multi_agents_v2` for review and guardian review
sub-sessions, matching the existing `spawn_csv` and `collab`
restrictions.
- Add a registry test that enables `multi_agents_v2`, disables `collab`,
and verifies the v2 agent tools are present while legacy `send_input`
and `resume_agent` remain hidden.
## Testing
- Added
`test_build_specs_multi_agent_v2_does_not_require_collab_feature`.
## Why
Fixes#20145.
`config/value/write` treats a JSON `null` value as a request to clear
the config key. Clearing a key that is already absent should be
idempotent, but clearing a nested key such as `features.personality`
from an empty `config.toml` returned `configPathNotFound` because
`clear_path` treated the missing `features` parent table as an error.
That makes app-server reset flows brittle because clients have to read
first and avoid sending a clear request unless the parent path already
exists.
## What Changed
- Updated app-server config clearing so missing intermediate tables, or
non-table parents, are treated as an unchanged no-op.
- Removed the now-unreachable `MergeError::PathNotFound` path from
config write merging.
- Added a regression test covering `features.personality = null` against
an empty user config.
## Verification
- `cargo test -p codex-app-server clear_missing_nested_config_is_noop`
- `cargo test -p codex-app-server` was run; the config manager unit
suite passed, but one unrelated integration test failed because
`turn_start_emits_thread_scoped_warning_notification_for_trimmed_skills`
expected `7` trimmed skills and observed `8`.
- `just fix -p codex-app-server`
1. Adds v2 plugin/share/save, plugin/share/list, and plugin/share/delete
RPCs.
2. Implements save by archiving a local plugin root, enforcing a size
limit, uploading through the workspace upload flow, and supporting
updates via remotePluginId.
3. Lists created workspace plugins
4. Deletes a previously uploaded/shared plugin.
## Why
#20271 increased the `90`-minute timeout in `rust-release.yml`, but it
did not update the reusable Windows workflow in
`rust-release-windows.yml`. As a result, the Windows release compile
jobs were still capped at `60` minutes and the `windows-x64` primary
build could continue timing out.
We are keeping the existing `90`-minute timeout in `rust-release.yml`.
That increase was still directionally correct because the top-level
release build benefits from extra headroom; the mistake was assuming it
also covered the reusable Windows jobs.
## What Changed
- increase the reusable Windows release workflow timeouts in
`rust-release-windows.yml` from `60` minutes to `90` minutes
- update the comment in `rust-release.yml` so it no longer implies that
the top-level timeout covers the Windows reusable jobs
## Why
After `hooks/list` exposes the hook inventory, clients need a way to
persist user hook preferences, make those changes effective in
already-open sessions, and distinguish user-controllable hooks from
managed requirements without adding another bespoke app-server write
API.
## What
- Extends `hooks/list` entries with effective `enabled` state.
- Persists user-level hook state under `hooks.state.<hook-id>` so the
model can grow beyond a single boolean over time.
- Uses the existing `config/batchWrite` path for hook state updates
instead of introducing a dedicated hook write RPC.
- Refreshes live session hook engines after config writes so
already-open threads observe updated enablement without a restart.
## Stack
1. openai/codex#19705
2. openai/codex#19778
3. This PR - openai/codex#19840
4. openai/codex#19882
## Reviewer Notes
The generated schema files account for much of the raw diff. The core
behavior is in:
- `hooks/src/config_rules.rs`, which resolves per-hook user state from
the config layer stack.
- `hooks/src/engine/discovery.rs`, which projects effective enablement
into `hooks/list` from source-derived managedness.
- `config/src/hook_config.rs`, which defines the new `hooks.state`
representation.
- `core/src/session/mod.rs`, which rebuilds live hook state after user
config reloads.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- [x] Move the allowlist out of core crate
- [x] Add Teams, SharePoint, Outlook Email, and Outlook Calendar to the
tool_suggest discoverable plugin allowlist
- [x] Add focused coverage for Microsoft curated plugin discovery
## Testing
- just fmt
- cargo test -p codex-core-plugins
- cargo test -p codex-core
list_tool_suggest_discoverable_plugins_returns_
## Why
Reused Guardian review trunks can still have older child-turn events
queued when a later review starts. The review waiter currently accepts
the first terminal event it sees from the shared child session, so a
stale `TurnComplete` can be attributed to the new review. That produces
impossible analytics combinations such as non-null TTFT with sub-10 ms
completion latency and zero token deltas on `trunk_reused` reviews.
## What changed
- Preserve the child turn id returned by the Guardian review
`Op::UserTurn` submission.
- Restrict Guardian review waiting to events correlated with that
submitted child turn.
- Restrict timeout/abort draining to terminal events for the same child
turn.
- Add regression coverage for stale prior-turn completions, stale
prior-turn errors, and interrupt draining in
`codex-rs/core/src/guardian/review_session.rs`.
## Verification
- `cargo test -p codex-core guardian::review_session::tests::`
- `cargo clippy -p codex-core --tests -- -D warnings`
## Why
Fixes#20264.
Side conversations are an ephemeral layer on top of the main chat.
Pressing `Ctrl+D` from an empty side-chat composer should unwind back to
the parent thread, matching the existing side-return behavior, instead
of falling through to the global quit shortcut and exiting Codex.
## What changed
The side-return shortcut matcher now treats `Ctrl+D` the same way it
already treats `Esc` and `Ctrl+C`. Because app-level side-return
handling runs before the chat widget's global quit handling, this
returns from `/side` while preserving normal `Ctrl+D` quit behavior
outside side conversations.
The existing shortcut coverage was updated to include lowercase and
uppercase `Ctrl+D` key events.
## Verification
- `cargo test -p codex-tui
side_return_shortcuts_match_esc_ctrl_c_and_ctrl_d`
- `cargo test -p codex-tui` starts successfully and the new shortcut
test passes, but the broader suite later aborts in the unrelated
existing test
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`
with a stack overflow.
Summary:
- Return from external agent import before session history import
finishes
- Run session import work in the background and emit the existing
completion notification when it is done
- Serialize session imports so duplicate requests do not create
duplicate imported threads
Verification:
- cargo test -p codex-app-server external_agent_config_
- cargo test -p codex-external-agent-sessions
- just fix -p codex-app-server
- just fix -p codex-external-agent-sessions
- git diff --check
## Summary
- Support Claude Code `ai-title` / `aiTitle` records when detecting and
importing external agent sessions.
- Preserve existing `custom-title` / `customTitle` precedence; only fall
back to `aiTitle` when no custom title is present.
- Add coverage for both detection and import title selection, including
the custom-title-over-ai-title case.
## Testing
- `cargo test -p codex-external-agent-sessions`
- `just fix -p codex-external-agent-sessions`
## Why
We need a way to list the available hooks to expose via the TUI and App
so users can view and manage their hooks
## What
- Adds `hooks/list` for one or more `cwd` values that returns discovered
hook metadata
## Stack
1. openai/codex#19705
2. This PR - openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882
## Review Notes
The generated schema files account for most of the raw diff, these files
have the core change:
- `hooks/src/engine/discovery.rs` builds the inventory entries during
hook discovery while leaving runtime handlers focused on execution.
- `app-server/src/codex_message_processor.rs` wires `hooks/list` into
the app-server flow for each requested `cwd`.
- `app-server-protocol/src/protocol/v2.rs` defines the new v2
request/response payloads exposed on the wire.
### Core Changes
`core/src/plugins/manager.rs` adds `plugins_for_layer_stack(...)` so
`skills/list` and `hooks/list`can resolve plugin state for each
requested `cwd`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
update the local login success page to match the Codex desktop auth UX
use theme-aware colors and an inline 20px Codex mark
keep the actual localhost success page aligned with the browser auth UX
PR
## Tests
<img width="1728" height="1117" alt="Screenshot 2026-04-29 at 12 00
34 PM"
src="https://github.com/user-attachments/assets/76a40c3f-07c3-452c-97da-e7c43717cd2c"
/>
## Summary
Enforce FileSystemSandboxPolicy protected metadata names in the Linux
bubblewrap adapter so `.git`, `.agents`, and `.codex` remain read only
inside writable workspace roots unless the policy grants an explicit
write carveout.
## Scope
1. Translate protected metadata names from FileSystemSandboxPolicy into
bubblewrap masks for existing metadata paths.
2. Represent missing protected metadata paths as guarded mount targets
so agents cannot create `.git`, `.agents`, or `.codex` under writable
roots.
3. Preserve normal git discovery for existing repos, worktrees, and
parent repos.
4. Keep explicit user write grants working when policy allows a
protected metadata path directly.
## Not in scope
1. No shell preflight UX.
2. No TUI runtime profile propagation.
3. No macOS Seatbelt changes in this PR.
## Reviewer focus
1. This should be reviewed as the Linux enforcement adapter for the
policy primitive from PR 19846.
2. macOS enforcement already landed in PR 19847.
3. The important invariant is that `FileSystemSandboxPolicy` is the
source of truth for `.git`, `.agents`, and `.codex`.
## Validation
1. `git diff` whitespace check passed.
2. `cargo fmt` check passed with the existing stable rustfmt warning
about `imports_granularity`.
3. Full Linux sandbox Cargo test suite passed on the devbox.
4. Devbox forty six case suite passed at head
`012accb703c13bd28df5b40079a9bf183036336a`.
5. Devbox summary: pass 46, fail 0.
6. The devbox suite was run through `just c sandbox linux`.
7. Focused repo test for Viyat parent repo case passed on the devbox.
## Summary
- remove the Windows-specific unified-exec environment block from tool
selection
- keep `unified_exec` default-off on Windows unless the feature is
explicitly enabled
- normalize model-provided `shell_type = unified_exec` to
`shell_command` when the feature is disabled
- drop obsolete tests tied to the removed environment gate and keep the
feature-flag regression coverage
## Why
Now that the session/long-lived process backend is implemented for the
Windows sandbox, we don't need to hard disable it anymore. We will be
rolling out slowly using a feature gate.
## Impact
This allows manual Windows opt-in in CLI and app-backed flows while
preserving the existing default-off behavior for Windows users.
---------
Co-authored-by: canvrno-oai <kbond@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Summary:
- Add a checked-in codex-core public API listing generated by
cargo-public-api.
- Add scripts/regen-public-api.sh with an embedded crate list,
auto-install for cargo-public-api 0.51.0, pinned nightly, and --check
mode.
- Add Rust CI jobs on the codex Linux x64 runner pool to verify the
listing stays up to date.
Testing:
- bash -n scripts/regen-public-api.sh
- just regen-public-api --check
- yq '.' .github/workflows/rust-ci.yml
.github/workflows/rust-ci-full.yml
- git diff --check
## Summary
Persisted subagent parent/child topology currently leaks through
`StateRuntime`'s SQLite-specific thread-spawn helpers. This PR
introduces a narrow `AgentGraphStore` boundary so follow-up work can
route graph operations through a local or remote store without coupling
orchestration code directly to the state DB graph API.
## Changes
- Adds the new `codex-agent-graph-store` crate.
- Defines a flat `AgentGraphStore` trait for the v1 graph surface:
upsert edge, set edge status, list direct children, and list
descendants.
- Adds public graph types for `ThreadSpawnEdgeStatus`,
`AgentGraphStoreError`, and `AgentGraphStoreResult`.
- Implements `LocalAgentGraphStore` on top of an existing
`codex_state::StateRuntime`, preserving today's SQLite-backed
`thread_spawn_edges` behavior.
- Registers the crate in Cargo/Bazel metadata.
This PR only adds the local contract and implementation; call-site
migration and the remote gRPC store are left to the follow-up PRs in the
stack.
## Testing
- `cargo test -p codex-agent-graph-store`
The new unit tests cover local parity with the existing `StateRuntime`
graph methods, `Open`/`Closed` filtering, status updates, and stable
breadth-first descendant ordering.
Plugin MCP servers are loaded from plugin manifests rather than
top-level `[mcp_servers]`, so their tool approval preferences need to be
stored and applied through the owning plugin config. Without this,
choosing "Always allow" for a plugin MCP tool could write a preference
that was not reliably used on later tool calls.
## Summary
- Add plugin-scoped MCP policy config under
`plugins.<plugin>.mcp_servers`, including server enablement, tool
allow/deny lists, server defaults, and per-tool approval modes.
- Overlay plugin MCP policy onto manifest-provided server configs when
plugins are loaded.
- Route persistent "Always allow" writes for plugin MCP tools back to
the owning `plugins.<plugin>.mcp_servers.<server>.tools.<tool>` config
entry.
- Reload user config after persisting an approval and make the plugin
load cache config-aware so stale plugin MCP policy is not reused after
`config.toml` changes.
- Regenerate the config schema and add coverage for plugin MCP policy
loading, approval lookup, persistence, and stale-cache prevention.
## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-core --lib plugin_mcp`
## Why
`x-codex-turn-metadata` is sent as an HTTP/WebSocket header, but Codex
was serializing the metadata JSON with raw UTF-8 string contents. When a
workspace path contains non-ASCII characters, common HTTP stacks can
reject or corrupt that header before the request reaches the provider.
Fixes#17468. Also addresses the duplicate WebSocket report in #19581.
## What changed
- Added `codex_utils_string::to_ascii_json_string`, a shared helper that
serializes JSON normally while escaping non-ASCII string content as
`\uXXXX`.
- Switched turn metadata header serialization, including merged
Responses API client metadata, to use the ASCII-safe JSON helper.
- Added coverage for non-ASCII workspace paths and non-ASCII client
metadata while preserving the same parsed JSON values.
## Verification
- `cargo test -p codex-utils-string`
- `cargo test -p codex-core turn_metadata`
- `just bazel-lock-check`
## Why
We have run into two avoidable problems when introducing async trait
APIs in Rust:
- `#[async_trait]` has caused materially worse build times in this
repository.
- `#[allow(async_fn_in_trait)]` makes it too easy to ship a public trait
without spelling out whether the returned future is `Send`, which hides
an important part of the trait contract.
We already have a good example of the preferred alternative in
[#16630](https://github.com/openai/codex/pull/16630) /
[`3c7f013f9735`](https://github.com/openai/codex/commit/3c7f013f9735),
but that guidance currently lives only as prior art in the codebase.
This PR documents the rule in `AGENTS.md` so contributors are more
likely to follow the native RPITIT pattern before these two shortcuts
spread further.
## What Changed
- added Rust guidance in `AGENTS.md` discouraging both `#[async_trait]`
and `#[allow(async_fn_in_trait)]`
- pointed contributors to the native RPITIT pattern with explicit `Send`
bounds on the returned future
- clarified that implementations may still use `async fn` when they
satisfy that trait contract
## Verification
- docs-only change; no tests run
Summary
- Add `[features.apps_mcp_path_override]` config with a `path` field for
overriding only the built-in apps MCP path.
- Keep existing host/base URL derivation unchanged and append the
configured path after that base.
- Regenerate the config schema with the custom feature-config case.
Test Plan
- Not run for latest revision; only `just fmt` and `just
write-config-schema` were run.
- Earlier revision: `cargo test -p codex-features`
- Earlier revision: `cargo test -p codex-mcp`
## Summary
- Keep the preferred ChatGPT login callback port `1455` first.
- Preserve the existing `/cancel` recovery for stale Codex login
servers.
- Fall back to the registered localhost callback port `1457` when `1455`
remains unavailable.
## Why
Cursor and Codex Desktop both use the ChatGPT account login callback
server. On Windows, Cursor can already be listening on `127.0.0.1:1455`
/ `[::1]:1455`, causing Codex Desktop sign-in to fail with:
`Local callback port 1455 is already in use on this machine.`
Codex already attempted to cancel a stale Codex login server on that
port, but if the listener does not release the port, the old behavior
was to fail. The new behavior falls back to `1457`, which matches the
fixed redirect URI being registered server-side in
`openai/openai#863817`. This keeps the OAuth `redirect_uri` inside
Hydra's exact allow-list instead of choosing an arbitrary ephemeral
port.
## Validation
- `just fmt`
- `cargo test -p codex-login`
- `git diff --check HEAD~1..HEAD`
## Why
The precursor PR keeps successful client responses typed until
app-server's outgoing response seam. This follow-up uses that seam to
move successful client-response analytics out of individual handlers and
into the shared sender path, while keeping filtering decisions inside
`codex-analytics`.
## What changed
- Emit successful client-response analytics centrally from
`OutgoingMessageSender::send_response`.
- Remove duplicate handler-local response tracking for the current
thread/turn lifecycle responses.
- Keep analytics ingestion selective inside `AnalyticsEventsClient`, so
unrelated client traffic is ignored before cloning or boxing.
- Collapse client-response analytics facts onto one typed path and
normalize payloads in the reducer.
- Add direct client-filter coverage plus sender-level coverage for the
centralized forwarding path.
## Verification
- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests --lib`
## Summary
- Fetch remote plugin detail before sending the uninstall request.
- Use the detail response to derive the marketplace namespace and plugin
name for cache cleanup.
- Stop the uninstall before the backend POST if detail lookup fails, so
backend state and local cache state do not diverge.
## Testing
- `just fmt`
- `cargo test -p codex-app-server plugin_uninstall`
- `cargo test -p codex-core-plugins`
- `git diff --check`
## Why
`pr17088` adds typed server-originated request/response plumbing, but
successful client responses are still erased into bare JSON-RPC `result`
values before app-server can make any typed decision about them.
This precursor PR keeps successful client responses typed until the
outgoing response seam. It is intentionally limited to
protocol/app-server plumbing so the analytics behavior change can review
separately on top.
## What changed
- Add `ClientResponsePayload` as the pre-serialization client response
body type.
- Route app-server successful response paths through the typed payload
seam while preserving existing handler-local analytics behavior.
- Keep `InterruptConversation` JSON-RPC-only because it has no
`ClientResponse` variant.
- Move the new payload conversion tests into a dedicated protocol test
module.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server-protocol`
## Why
[#17088](https://github.com/openai/codex/pull/17088) changed
`OutgoingMessageSender::new` to require an `AnalyticsEventsClient`, but
one `command_exec` test added earlier on `main` still called the old
one-argument constructor. That leaves current `main` failing to compile
in Bazel and argument-comment-lint jobs.
## What changed
- Pass `AnalyticsEventsClient::disabled()` to the missed
`OutgoingMessageSender::new` test call site in `command_exec.rs`.
## Verification
- `cargo test -p codex-app-server
timeout_or_cancellation_reports_cancellation_without_timeout_exit_code`
## Summary
- Tighten `tool_suggest` guidance so it prefers explicit plugin install
requests, while still allowing a connector install when the relevant
plugin is already installed and a needed connector from that plugin is
missing.
- Tell the model not to call `tool_suggest` in parallel with other
tools.
## Testing
- `cargo test -p codex-tools tool_suggest`
- `cargo test -p codex-core tool_suggest`
## Why
Codex analytics needs a typed seam for app-server-originated
request/response traffic so future tool-approval analytics can consume
those facts without adding bespoke callsite tracking each time. Server
responses arrive as JSON-RPC `id + result` payloads, so analytics has to
reconstruct the matching typed response from the original typed request
while that request context still exists in app-server.
This also puts analytics on the app-server outbound path, which needs to
avoid keeping the runtime alive during shutdown. The final ownership fix
keeps the normal strong auth-manager retention in analytics and makes
the external-auth refresh bridge hold a weak back-reference to
`OutgoingMessageSender`, breaking the runtime cycle at the bridge
boundary instead of exposing retention policy through the analytics
client API.
## What changed
- Adds typed `ServerRequest` and `ServerResponse` analytics facts, plus
`AnalyticsEventsClient::track_server_request` and
`track_server_response`.
- Renames the existing client-side facts to `ClientRequest` and
`ClientResponse` so reducers can distinguish client-to-server traffic
from server-to-client traffic.
- Adds `ServerRequest::response_from_result`, allowing a stored typed
request to decode the matching typed server response from a raw JSON-RPC
result payload.
- Threads `AnalyticsEventsClient` through `OutgoingMessageSender` and
records targeted server requests, replayed targeted requests, and
matching targeted responses with the responding connection id needed for
correlation.
- Intentionally leaves broadcast server requests/responses out of
analytics for now because the current model is per connection, while
broadcasts fan one logical request out across multiple connections.
- Breaks the app-server shutdown cycle by storing
`Weak<OutgoingMessageSender>` in `ExternalAuthRefreshBridge` and
upgrading it only when an external-auth refresh is actually requested.
- Keeps reducer ingestion of the new server-side facts as no-ops for
now; this PR is plumbing for later tool-approval analytics work.
## Verification
- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests::`
- Covers typed-response reconstruction plus the targeted, replayed,
broadcast-exclusion, and response-attribution analytics paths.
## Follow-up
This PR intentionally stops at ingestion plumbing, so `ServerRequest`
and `ServerResponse` facts are still reducer no-ops. Once a follow-up PR
adds real downstream analytics output for those facts:
- replace the temporary pre-reducer observation seam with reducer tests
for the emitted event shape;
- add end-to-end coverage in `app-server/tests/suite/v2/analytics.rs`
for the real app-server workflow and captured analytics payload;
- remove the temporary sender-level observer tests added here in favor
of the real-output coverage above.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17088).
* #18748
* #18747
* #17090
* #17089
* #20241
* #20239
* __->__ #17088
## Why
This bug is exposed by Guardian/auto-review approvals. With the managed
network proxy enabled, a blocked network request can be reported back
through the network approval service as an approval denial after the
command has already started. Before this change, the shell and unified
exec runtimes registered those network approval calls, but did not have
a way to observe an async proxy denial as a cancellation/failure signal
for the running process.
The result was confusing: Guardian/auto-review could correctly deny
network access, but the command path could keep running or unregister
the approval without surfacing the denial as the command failure.
## What Changed
- `NetworkApprovalService` now attaches a cancellation token to active
and deferred network approvals.
- Proxy-denial outcomes are recorded only for active registrations,
cancel the owning token, and are consumed when the approval is
finalized.
- The shell runtime combines the normal command timeout with the
network-denial cancellation token.
- Unified exec stores the deferred network approval object, terminates
tracked processes when the proxy denial arrives, and returns the denial
as a process failure while polling or completing the process.
- Tool orchestration passes the active network approval cancellation
token into the sandbox attempt and preserves deferred approval errors
instead of silently unregistering them.
- App-server `command/exec` now handles the combined
timeout-or-cancellation expiration variant used by the runtime.
## Verification
- `cargo test -p codex-core network_approval --lib`
- `cargo clippy -p codex-app-server --all-targets -- -D warnings`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
---------
Co-authored-by: Codex <noreply@openai.com>
- Fetches and caches remote /installed plugin state
- Lets skills/list load skills from remote-installed cached plugins
without requiring a local marketplace entry
- Routes plugin list/startup/install/uninstall changes through async
plugin cache invalidation and MCP refresh
## Summary
- include the live auto-review trunk rollout when `/feedback` uploads
logs
- upload that attachment as
`auto-review-rollout-<parent-thread-id>.jsonl` so it is distinguishable
from the parent rollout
- show the same auto-review attachment name in the TUI consent popup
## Scope
- this only covers the live cached auto-review trunk for the current
parent thread
- it does not add durable historical parent->auto-review lookup
- it does not add persisted rollout support for ephemeral parallel
review forks
## UI
<img width="599" height="185" alt="Screenshot 2026-04-28 at 1 17 18 PM"
src="https://github.com/user-attachments/assets/6a0e79c2-5d21-4702-8a89-f765778bc9e9"
/>
## Validation
- `cargo test -p codex-core
cached_guardian_subagent_exposes_its_rollout_path`
- `cargo test -p codex-feedback`
- `cargo test -p codex-app-server`
- `cargo test -p codex-tui feedback_upload_consent_popup_snapshot`
- `cargo test -p codex-tui
feedback_good_result_consent_popup_includes_connectivity_diagnostics_filename`
## Known unrelated local failures
- `cargo test -p codex-core` currently fails in the pre-existing proxy
env snapshot test
`tools::runtimes::tests::maybe_wrap_shell_lc_with_snapshot_keeps_user_proxy_env_when_proxy_inactive`
- `cargo test -p codex-tui` currently hits pre-existing `status::*`
snapshot drift unrelated to this change
## Follow-Up
- persist parallel auto-review fork sessions so /feedback can include
their rollout history too
- attach each persisted fork as its own clearly named file, for example
auto-review-rollout-<parent-thread-id>-fork <n>.jsonl, instead of
merging multiple Guardian sessions into one attachment
- keep the same live-session-only scope initially; durable historical
parent -> auto-review lookup can remain a separate decision if we later
need feedback from resumed sessions
## Summary
- add a regression test for
`InterAgentCommunication::to_response_input_item`
- assert replayed inter-agent messages keep `phase:
Some(MessagePhase::Commentary)`
## Test plan
- `cargo test -p codex-protocol`
- `just argument-comment-lint`
Summary:
- Add codex-thread-manager-sample, a one-shot binary that starts a
ThreadManager thread, submits a prompt, and prints the final assistant
output.
- Pass ThreadStore into ThreadManager::new and expose
thread_store_from_config for existing callsites.
- Build the sample Config directly with only --model and prompt inputs.
Verification:
- just fmt
- cargo check -p codex-thread-manager-sample -p codex-app-server -p
codex-mcp-server
- git diff --check
Tests: Not run per request.
### Summary
- Path-based local thread reads currently return rollout/session git
metadata directly, so `thread/resume` can disagree with persisted SQLite
metadata for the same thread.
- Merge non-null SQLite git fields over rollout-path reads while keeping
rollout values as fallbacks for fields SQLite does not know.
- Add focused regression coverage for rollout-path reads so persisted
branch updates are preserved during resume.
### Testing
- `cargo test -p codex-thread-store`
## Why
This is part 3 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.
With `AppCommand` now explicit, the internal app event bus can carry TUI
commands directly instead of bouncing through core `Op` values.
## What changed
- Changed `AppEvent::CodexOp` and `AppEvent::SubmitThreadOp` to carry
`AppCommand`.
- Updated app-event senders and direct emitters to submit `AppCommand`
values.
- Adjusted tests to match `AppCommand` or convert back through
`into_core()` where they intentionally assert legacy payload equality.
## Verification
- `cargo test -p codex-tui --no-run`
## Why
This is part 2 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.
Before the TUI event bus can stop carrying core `Op` values,
`AppCommand` needs to be an owned TUI command shape rather than a thin
wrapper around `Op`.
## What changed
- Replaced the opaque `AppCommand(Op)` wrapper with explicit owned
variants for the commands the TUI submits.
- Preserved `into_core()` so this layer does not yet change the
app/thread submission boundary.
- Kept existing core leaf types for now so this remains a mechanical
command-shape refactor.
## Verification
- `cargo check -p codex-tui`
In some setups the summary or raw content can be dropped between
requests. This triggers a check in the reducer which expects that the
messages should remain identical between requests.
This PR relaxes the checks to only focus on the encrypted ID instead. It
also changes the reducer to keep the most rich version of the message
observed during the rollout (this ensures that we don't accidentally
lose the CoT nor summary when available).
## Summary
Some improvements to Windows process-management issues from
https://github.com/openai/codex/pull/15578
- bound the elevated runner pipe-connect handshake instead of waiting
forever on blocking pipe connects
- terminate the spawned runner if that handshake fails, so timeout/error
paths do not leave a stray `codex-command-runner.exe`
- loop on partial `WriteFile` results when forwarding stdin in the
elevated runner, so input is not silently truncated
- fix the concrete HANDLE/SID cleanup paths in the runner setup code
- keep draining driver-backed stdout/stderr after exit until the backend
closes, instead of dropping the tail after a fixed 200ms grace period
- reuse `LocalSid` for SID ownership and add more explanatory comments
around the ownership/concurrency-sensitive code paths
## Why
The original PR fixed a lot of Windows session plumbing, but there were
still a few sharp process-lifecycle edges:
- some elevated runner handshakes could block forever
- the new timeout path could still orphan the spawned runner process
- stdin forwarding still assumed a single `WriteFile` consumed the whole
buffer
- a few raw HANDLE/SID error paths still leaked
- driver-backed output could still lose the last chunk of stdout/stderr
on slower backends
## Validation
- `cargo fmt -p codex-windows-sandbox -p codex-utils-pty`
- `cargo test -p codex-utils-pty`
- `cargo test -p codex-windows-sandbox finish_driver_spawn`
- `cargo test -p codex-windows-sandbox runner_`
Ran a local test matrix of unified-exec and shell_tool tests, all
passing
## Why
This is part 1 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.
This first layer reduces the size of the later `chatwidget` diff by
mechanically moving MCP startup bookkeeping out of the central widget
file without changing the event shapes or behavior.
## What changed
- Extracted MCP startup status handling into
`tui/src/chatwidget/mcp_startup.rs`.
- Kept the existing core event types in place for this purely mechanical
move.
- Updated the MCP startup tests to import the moved test-only event
types directly.
## Verification
- `cargo test -p codex-tui chatwidget::tests::mcp_startup`
## Why
The paused goal statusline currently points users at `/goal` to unpause
a goal, but bare `/goal` is the summary command and does not change the
goal state. Instead of making `/goal` mutate state only when a goal is
paused, this gives the action an explicit command that reads naturally
in the UI.
## What Changed
- Replace `/goal unpause` with `/goal resume` for reactivating a paused
goal.
- Update the paused goal statusline and `/goal` summary copy to point at
`/goal resume`.
## Why
`agents.max_depth` is a legacy multi-agent v1 guard. Multi-agent v2 uses
task-path routing and its own session/thread limits, so v2 should not
reject nested `spawn_agent` calls just because the thread-spawn depth
has reached the v1 maximum.
Keeping the v1 depth guard active in v2 prevents deeper task trees even
though the v2 path still needs the depth value only for lineage and
task-path metadata.
## What Changed
- Removed the depth-limit rejection from the multi-agent v2
`spawn_agent` handler while still computing child depth for lineage/path
metadata.
- Made the depth-based disabling of legacy `SpawnCsv`/`Collab` tools
apply only when `Feature::MultiAgentV2` is disabled.
- Added `multi_agent_v2_spawn_agent_ignores_configured_max_depth` to
cover a v2 child spawning another agent when `agent_max_depth = 1`,
while the existing v1 depth-limit tests continue to enforce the legacy
behavior.
## Verification
- `cargo test -p codex-core
multi_agent_v2_spawn_agent_ignores_configured_max_depth -- --nocapture`
- `cargo test -p codex-core depth_limit -- --nocapture`
- `cargo test -p codex-core tools::handlers::multi_agents::tests --
--nocapture`
## Summary
Fix the Windows sandbox PTY spawn path to pass the pseudoconsole handle
value directly into `UpdateProcThreadAttribute`.
## Why
Sandboxed `unified_exec` PTY sessions on Windows were failing during
child process startup with `0xc0000142` (`STATUS_DLL_INIT_FAILED`). In
practice this showed up as PowerShell DLL init popups when the sandboxed
background-terminal path tried to launch an interactive shell.
The root cause was that we were passing a pointer to a local `isize`
variable instead of the pseudoconsole handle value in the form Windows
expects for `PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE`.
## Validation
- `cargo build -p codex-windows-sandbox --bins`
- Reproduced the real sandboxed `codex exec` flow with
`windows.sandbox_private_desktop=true`
- Verified a `tty=true` interactive session launched through the normal
PowerShell wrapper, printed `READY`, accepted follow-up stdin, and
exited cleanly
- Confirmed no new `0xc0000142` / `Application Popup` events appeared
after the successful repro
## Summary
- Rewrite migrated external-agent hook commands by replacing the full
hook script path token instead of only the `.claude/hooks/` segment.
- Preserve quoting around the full rewritten target path so script names
with spaces, absolute paths, and shell operators/redirection continue to
work.
- Apply `.claude/settings.local.json` over `.claude/settings.json` for
config, MCP, and plugin migration so local scope matches Claude settings
precedence.
- Skip legacy command markdown without `description` frontmatter,
including README-style docs under `.claude/commands`.
## Root Cause
The previous hook rewrite handled `.claude/hooks/` as a substring
replacement. For absolute source commands, that left the original
project-root prefix before the newly quoted `.codex/hooks` directory,
producing invalid commands like
`project/'project/.codex/hooks'/script.sh`.
The migration also only used project `settings.json` for
config/MCP/plugin decisions, so local settings such as
`disabledMcpjsonServers` could be ignored even though Claude gives local
settings higher precedence than project settings.
## Validation
- `just fmt`
- `cargo test -p codex-external-agent-migration`
- `cargo test -p codex-app-server external_agent_config`
- `just fix -p codex-external-agent-migration`
- `just fix -p codex-app-server`
- `git diff --check`
## Why
The explicit profile path from #20117 is meant for standalone testing,
but it still inherited the
shell cwd and all managed requirements implicitly. The pre-existing
launcher path even called out
that it did not support a separate cwd yet in
[`debug_sandbox.rs`](509453f688/codex-rs/cli/src/debug_sandbox.rs (L174-L179)).
For a standalone command, the useful default is to let the caller choose
the project directory being
tested and to avoid administrator-provided constraints unless the caller
explicitly wants to test
those too.
## What changed
- Add explicit-profile-only `-C/--cd DIR`, and use that cwd for both
profile resolution and command
execution.
- Add explicit-profile-only `--include-managed-config`.
- Make explicit profile mode skip managed requirement sources by
default, including cloud
requirements, MDM requirements, `/etc/codex/requirements.toml`, and the
legacy managed-config
requirements projection.
- Preserve all existing invocations outside the explicit-profile path.
## Stack
1. #20117 `sandbox-ui-profile`
2. #20118 `sandbox-ui-config` --> this PR
Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.
## Tests ran
- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_`
- `cargo test -p codex-core
load_config_layers_can_ignore_managed_requirements`
- `cargo test -p codex-core
load_config_layers_includes_cloud_requirements`
- macOS branch-binary smoke on the rebased top of stack: `-C` changed
execution cwd, explicit
profile mode omitted managed proxy env under `env -i`, and
`--include-managed-config` restored it.
- Linux devbox branch-binary smoke on the rebased top of stack: `-C`
changed execution cwd for
built-in and user-defined explicit profiles.
Messages sent with `followup_task` already arrive at their target
recipient promptly (at message boundaries while sampling, or after the
pending tool call completes) -- having `interrupt` is not worth the
added complexity.
## Why
`codex sandbox` is useful for exercising sandbox behavior directly, but
before this stack the CLI
only picked up permission profiles indirectly from the active config.
The existing debug-sandbox path
already compiled `[permissions]` profiles through normal config loading,
as covered by the existing
profile tests in
[`debug_sandbox.rs`](de2ccf9473/codex-rs/cli/src/debug_sandbox.rs (L715-L760)).
This adds the smallest stable entry point first: an explicit profile
selector that reuses the same
config machinery as normal Codex config, so standalone testing becomes
possible without changing
current no-selector behavior.
## What changed
- Add additive `--permissions-profile NAME` support to `codex sandbox
macos|linux|windows`.
- Resolve built-in and user-defined profile names by feeding
`default_permissions` through the
existing config compilation path instead of inventing a sandbox-only
parser.
- Make an explicit selector win over an ambient active profile's legacy
`sandbox_mode`.
- Keep the existing no-selector behavior unchanged.
## Stack
1. #20117 `sandbox-ui-profile` --> this PR
2. #20118 `sandbox-ui-config`
Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.
## Tests ran
- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_parses_permissions_profile`
- `cargo test -p codex-core
cli_override_takes_precedence_over_profile_sandbox_mode`
- macOS branch-binary smoke on the rebased top of stack: built-in
`:workspace` and user-defined
profiles both executed successfully through `--permissions-profile`.
- Linux devbox branch-binary smoke on the rebased top of stack: built-in
`:workspace` and
user-defined profiles both executed successfully through
`--permissions-profile`.
## Summary
Starts the process of getting rid of `--full-auto`, with some
concessions:
1. Fully removes the command from the tui, since it just resolves to the
default permissions there, and encourages users to use the one-time
trust flow if they're not in a trusted repo.
2. Marks the command as deprecated in `codex exec`, in case users are
actively relying on this. We'll remove in an upcoming n+X release.
3. Cleans up some of the `codex sandbox` cli logic, to keep supporting
legacy sandbox policies for now.
This isn't the cleanest setup, but I think it is worthwhile to warn
users for one release before hard-removing it.
## Testing
- [x] Updated unit tests
## Summary
- Change `EnvironmentProvider` to return concrete `Environment`
instances instead of `EnvironmentConfigurations`.
- Make `DefaultEnvironmentProvider` provide the provider-visible `local`
environment plus optional `remote` environment from
`CODEX_EXEC_SERVER_URL`.
- Keep `EnvironmentManager` as the concrete cache while exposing its own
explicit local environment for `local_environment()` fallback paths.
## Validation
- `just fmt`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`PermissionProfile` is the canonical runtime permission model in the
Rust workspace, but the Linux sandbox helper still accepted a legacy
`SandboxPolicy` plus separate filesystem and network policy flags. That
translation layer made the helper interface harder to reason about and
left `linux-sandbox`-specific callers and tests coupled to the legacy
policy representation.
This change moves the helper onto `PermissionProfile` directly so the
Linux sandbox plumbing matches the rest of the permission stack.
## What changed
- changed `codex-linux-sandbox` to accept `--permission-profile` and
derive the runtime filesystem and network policies internally
- updated the in-process seccomp and legacy Landlock path in
`codex-rs/linux-sandbox` to operate on `PermissionProfile`
- updated Linux sandbox argv construction in `codex-rs/sandboxing`,
`codex-rs/core`, and the CLI debug sandbox path to pass the canonical
profile instead of serializing compatibility policy projections
- simplified the Linux sandbox tests to build the exact permission
profile under test, including the managed-proxy path and
direct-runtime-enforcement carveout coverage
- removed helper-local `SandboxPolicy` usage from `bwrap` tests where
`FileSystemSandboxPolicy` is already the value being exercised
## Testing
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-linux-sandbox` (on this macOS host, the crate
compiled cleanly and its Linux-only tests were cfg-gated)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-cli --no-run`
## Summary
Amazon Bedrock Mantle's OpenAI-compatible endpoint now lives under
`/openai/v1`, and the GPT-5.4 Mantle model ID no longer uses the `-cmb`
suffix. This updates Codex's built-in Bedrock provider configuration so
generated providers and the static Bedrock catalog use the current
endpoint and model ID.
## Changes
- Update the Bedrock Mantle base URL from
`https://bedrock-mantle.{region}.api.aws/v1` to
`https://bedrock-mantle.{region}.api.aws/openai/v1`.
- Update the Amazon Bedrock default base URL in
`codex-model-provider-info`.
- Change the Bedrock GPT-5.4 catalog slug from `openai.gpt-5.4-cmb` to
`openai.gpt-5.4`.
- Align provider and catalog tests with the new URL and model ID.
## Test Plan
- Manual smoke test:
```shell
target/debug/codex \
-m openai.gpt-5.4 \
-c 'model_provider="amazon-bedrock"' \
-c 'model_providers.amazon-bedrock.aws.region="us-west-2"'
```
follow up of #19442. The app server now exposes provider-derived bounds
through a new v2 `modelProvider/read` method. The response reports the
configured provider map key as `modelProvider` and returns the effective
capability booleans so clients can align their UI with the same
provider-owned limits used by core.
Fixes test that often fails locally when running `cargo test`
- Add an app-server test helper that combines managed-config isolation
with custom env overrides.
- Isolate `HOME` / `USERPROFILE` in plugin-list workspace settings tests
so host home marketplaces do not affect results.
Fix for #19925
Restore the `Working` indicator after a streamed final answer finishes
when a user steer message is sent.
Add regression coverage for long output plus a mid-stream steer:
`cargo test -p codex-tui
final_answer_completion_restores_status_indicator_for_pending_steer`
Duplication/testing steps:
1. Start a new thread and ask for a long response.
2. While the response is streaming, submit a steer message.
3. When the first response finishes, observe whether `Working...` is
shown while waiting for the steer message response.
## Summary
This fixes the CI regression introduced by
[#20040](https://github.com/openai/codex/pull/20040).
That PR migrated several `apply_patch_cli` tests from direct
`codex.submit(Op::UserTurn { ... })` calls to `harness.submit(...)`.
`harness.submit()` waits for `TurnComplete` before returning, which
drains the same event stream that these tests use to assert `TurnDiff`,
`PatchApplyUpdated`, and related live events. The regressed tests then
timed out waiting for events that had already been consumed.
This change restores a no-wait submit path for the event-observing
`apply_patch_cli` tests so they can watch the turn stream directly
again.
## What Changed
- added a local `submit_without_wait(...)` helper in
`codex-rs/core/tests/suite/apply_patch_cli.rs`
- switched the `apply_patch_cli` tests that assert live turn events back
to that helper
- left the profile-backed `harness.submit(...)` migration in place for
tests that only care about final filesystem or tool output state
## Why macOS Looked Green
In the failing run
[25084487331](https://github.com/openai/codex/actions/runs/25084487331),
`//codex-rs/core:core-all-test` was cached on macOS, so the regressed
tests were not rerun there. The Linux GNU, Linux MUSL, and Windows Bazel
jobs reran the target and exposed the failure.
## Verification
- `cargo test -p codex-core apply_patch_ -- --nocapture`
- previously failing local cases now pass again:
- `apply_patch_cli_move_without_content_change_has_no_turn_diff`
- `apply_patch_turn_diff_for_rename_with_content_change`
- `apply_patch_aggregates_diff_across_multiple_tool_calls`
## Why
Unsupported features must fail closed and Codex must not expose
OpenAI-hosted fallback paths when the active provider cannot support
them. In practice, Bedrock should not surface app connectors, MCP
servers, tool search/suggestions, image generation, web search, or JS
REPL until those paths are explicitly supported for that provider.
This PR moves that decision into provider-owned capability metadata
instead of scattering Bedrock-specific checks across callers.
## What changed
- Adds `ProviderCapabilities` to `codex-model-provider`, with default
support for existing providers and a Bedrock override that disables
unsupported launch surfaces.
- Adds `ToolCapabilityBounds` to `codex-tools` so provider capability
limits can clamp otherwise-enabled tool config.
- Applies capability bounds when building session and review-thread tool
config.
- Routes MCP/app connector configuration through
`McpManager::mcp_config`, which filters configured MCP servers and app
connectors based on the active provider.
- Updates app-server MCP list/read paths to use the filtered MCP config.
- Adds coverage for default provider capabilities, Bedrock disabled
capabilities, and optional tool-surface clamping.
## Testing
built locally and verified that bedrock responses api now return without
errors calling unsupported tools.
## Why
This PR expands the migration path so Codex can detect and import MCP
server config, hooks, commands, and subagents configs in a Codex-native
shape.
## What changed
- Added a `codex-external-agent-migration` crate that owns conversion
logic for external-agent MCP servers, hooks, commands, and subagents.
- Extended the app-server external-agent config detection/import API
with migration item types for MCP server config, hooks, commands, and
subagents.
## Migration strategy
The migration is intentionally conservative: Codex only imports
external-agent config that can be represented safely in Codex today.
Unsupported or ambiguous config is skipped instead of being partially
translated into behavior that may not match the source system.
- **MCP servers**: import supported stdio and HTTP MCP server
definitions into `mcp_servers`. Disabled servers and servers filtered
out by source `enabledMcpjsonServers` / `disabledMcpjsonServers` are
skipped. Project-scoped MCP entries from `.claude.json` are included
when they match the repo path.
- **Hooks**: import only supported command hooks into
`.codex/hooks.json`. Unsupported hook features such as conditional
groups, async handlers, prompt/http hooks, or unknown fields are
skipped. Referenced hook scripts are copied into `.codex/hooks/`,
preserving any existing target scripts.
- **Commands**: import supported external commands as Codex skills under
`.agents/skills/source-command-*`. Commands that rely on source runtime
expansion such as `$ARGUMENTS`, `$1`, `@file` references, shell
interpolation, or colliding generated names are skipped.
- **Subagents**: import valid subagent Markdown files into
`.codex/agents/*.toml` when they have the minimum Codex agent fields.
Source model names are not migrated, so imported agents keep the user’s
Codex default model; compatible reasoning effort and sandbox mode are
migrated when present.
- **Skills and project guidance**: copy missing skill directories into
`.agents/skills` and migrate `CLAUDE.md` guidance into `AGENTS.md`,
rewriting source-agent terminology to Codex terminology where
appropriate.
- **Detection details**: detected migration items include lightweight
details for UI preview, such as MCP server names, hook event names,
generated command skill names, and subagent names. Import still
recomputes from disk instead of trusting details as the source of truth.
- Adds focused coverage for the new migration behavior and app-server
import flow.
## Verification
- `cargo test -p codex-external-agent-migration`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server external_agent_config`
- `just bazel-lock-check`
## Summary
- Add `disable_tool_suggest` to app and plugin config, schema, and
TypeScript output
- Exclude disabled connectors and plugins from tool suggestion discovery
- Persist "never show again" tool-suggestion choices back into
`config.toml`
- Update config docs and add coverage for connector and plugin
suppression
## Testing
- Added and updated unit tests for config persistence and tool-suggest
filtering
- Not run (not requested)
## Summary
- Removes `SandboxPolicy` from the hooks test suite.
- Submits hook-related turns with explicit `PermissionProfile` values
for disabled, read-only, and workspace-write cases.
- Preserves the managed-network hook test by configuring and submitting
a workspace-write profile with enabled network, allowing the existing
requirements-backed proxy path to remain covered.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Removes `SandboxPolicy` from the RMCP client test suite.
- Adds shared read-only user-turn helpers that submit
`PermissionProfile::read_only()` plus the legacy compatibility
projection required by the current `Op::UserTurn` shape.
- Keeps sandbox metadata assertions intact by deriving the expected
legacy `sandboxPolicy` value from the same read-only profile used for
the turn.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Removes the remaining `SandboxPolicy` usage from the compaction test
suite.
- Adds a small local helper for direct `Op::UserTurn` construction so
these tests send `PermissionProfile::Disabled` plus the legacy
compatibility projection required by the protocol field.
- Keeps the existing danger/full-access behavior while exercising the
canonical permission profile path.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Updates the zsh-fork test helper to configure `PermissionProfile`
directly instead of constructing a legacy `SandboxPolicy`.
- Sends permission-profile-backed turns from the skill approval zsh-fork
tests so the runtime and request path exercise the canonical permissions
model.
- Leaves the broader approvals suite on legacy policies for now, except
for the zsh-fork test that shares this helper.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
This migrates the macOS request-permissions tool tests from legacy
`SandboxPolicy` setup to `PermissionProfile` setup. The tests still
exercise the same workspace-write baseline and request-permission
grants, but the canonical permissions value is now the profile.
## Changes
- Replaces the `workspace_write_excluding_tmp()` helper with a
`PermissionProfile::workspace_write_with()` helper.
- Applies test config through `Permissions::set_permission_profile()`.
- Uses `turn_permission_fields()` for `Op::UserTurn` compatibility
fields.
- Removes the `SandboxPolicy` import from `request_permissions_tool.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This removes the explicit `SandboxPolicy` constructors from
`core/tests/suite/prompt_caching.rs`. The tests still exercise the same
prompt-cache invariants across permission and turn-context changes, but
the permission source is now `PermissionProfile`.
## Changes
- Uses `PermissionProfile::workspace_write_with()` for workspace-write
override scenarios.
- Uses `PermissionProfile::Disabled` for the no-sandbox per-turn
override.
- Projects profiles through `turn_permission_fields()` or
`to_legacy_sandbox_policy()` only to populate compatibility fields on
existing ops.
- Removes the `SandboxPolicy` import from `prompt_caching.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This migrates `core/tests/suite/exec_policy.rs` away from legacy
`SandboxPolicy` turn construction. These tests all use no-sandbox turns
to exercise exec-policy behavior, so `PermissionProfile::Disabled` is
the canonical representation.
## Changes
- Replaces direct `SandboxPolicy::DangerFullAccess` turn fields with
`PermissionProfile::Disabled`.
- Uses `turn_permission_fields()` to populate the compatibility
`sandbox_policy` field required by `Op::UserTurn`.
- Removes the `SandboxPolicy` import from `exec_policy.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This removes another test-only `SandboxPolicy` dependency by configuring
`permissions_messages.rs` with a `PermissionProfile` directly. The test
still verifies the rendered compatibility permissions text, but now
obtains the legacy projection from the loaded `Config` rather than using
`SandboxPolicy` as the source of truth.
## Changes
- Builds the workspace-write test setup with
`PermissionProfile::workspace_write_with()`.
- Applies that profile through `Permissions::set_permission_profile()`.
- Uses `Config::legacy_sandbox_policy()` only for the expected
`PermissionsInstructions` compatibility rendering.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This continues the test-side migration away from `SandboxPolicy` by
removing the remaining legacy policy setup in
`core/tests/suite/tools.rs`. The affected test was already modeling a
profile-backed filesystem policy with a deny-read glob, so configuring
the test through `Permissions::set_permission_profile()` is a better
match for the behavior being exercised.
## Changes
- Drops the `SandboxPolicy` import from `core/tests/suite/tools.rs`.
- Configures the glob deny-read shell test directly with a
`PermissionProfile` instead of creating a legacy read-only policy first.
- Submits the test turn with the session permission profile so the
deny-read glob remains active for the command under test.
## Verification
- `cargo check -p codex-core --tests`
## Why
The core item tests still had a cluster of plan-mode `Op::UserTurn`
literals that used `SandboxPolicy::DangerFullAccess` and omitted
`permission_profile`. These tests are validating emitted item lifecycle
events, so keeping them on the legacy sandbox-only turn shape adds noise
to the broader permissions migration without testing legacy behavior.
## What Changed
- Adds a local `disabled_plan_turn()` helper that preserves the existing
`std::env::current_dir()` turn cwd behavior.
- Uses `turn_permission_fields(PermissionProfile::Disabled, cwd)` to
populate both the compatibility `sandbox_policy` and canonical
`permission_profile` fields.
- Replaces the plan-mode hand-built turns in
`codex-rs/core/tests/suite/items.rs`, removing all `SandboxPolicy`
references from that file and reducing remaining `codex-rs/core/tests`
`SandboxPolicy` files from 16 to 15.
## Verification
- `cargo check -p codex-core --tests`
## Why
This stack is retiring direct `SandboxPolicy` construction from tests so
core coverage exercises the same `PermissionProfile` turn path used by
runtime code. `safety_check_downgrade.rs` still submitted each test turn
as `SandboxPolicy::DangerFullAccess` with no permission profile, even
though the tests are about model verification/reroute behavior rather
than legacy sandbox conversion.
## What Changed
- Adds a local `disabled_text_turn()` helper that derives both the
compatibility `sandbox_policy` and canonical `permission_profile` from
`PermissionProfile::Disabled`.
- Replaces repeated hand-built `Op::UserTurn` literals in
`codex-rs/core/tests/suite/safety_check_downgrade.rs` with that helper.
- Removes all `SandboxPolicy` references from the safety-check suite,
reducing the remaining `codex-rs/core/tests` files that mention
`SandboxPolicy` from 17 to 16.
## Verification
- `cargo check -p codex-core --tests`
## Why
This stack is removing direct `SandboxPolicy` usage from test code so
new tests exercise the same `PermissionProfile` path that runtime code
now treats as canonical. `view_image.rs` still built `Op::UserTurn`
requests with `SandboxPolicy::DangerFullAccess` and no permission
profile, which kept another core test module on the legacy turn shape.
## What Changed
- Adds a small `disabled_user_turn()` helper for the view-image suite
that derives the compatibility `sandbox_policy` and canonical
`permission_profile` from `PermissionProfile::Disabled`.
- Replaces repeated direct `Op::UserTurn` literals in
`codex-rs/core/tests/suite/view_image.rs` with that helper.
- Removes all `SandboxPolicy` references from `view_image.rs`, reducing
the remaining `codex-rs/core/tests` files that mention `SandboxPolicy`
from 18 to 17.
## Verification
- `cargo check -p codex-core --tests`
## Summary
- Migrates `model_switching.rs` and `personality.rs` direct
`Op::UserTurn` construction from legacy `SandboxPolicy` literals to
`PermissionProfile`-backed turn fields.
- Adds small local helpers in each file so tests keep asserting
model/personality behavior without repeating permission plumbing.
- Reduces `rg -l '\bSandboxPolicy\b' codex-rs/core/tests` from 20 files
to 18; `codex-rs/tui` remains at zero `SandboxPolicy` references.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
# Why
`plugin_hook_sources_run_with_plugin_env_and_plugin_source` can still
fail on Windows after the earlier file-based assertion cleanup because
the hook process itself occasionally exceeds the old 5s timeout under CI
load. When that happens, the hook run ends as `Failed` before the test
can inspect its structured output.
The Windows Bazel failure showed the hook run itself failing after
nearly 8 seconds:
```text
---- engine::tests::plugin_hook_sources_run_with_plugin_env_and_plugin_source stdout ----
thread 'engine::tests::plugin_hook_sources_run_with_plugin_env_and_plugin_source' panicked at hooks/src\engine\mod_tests.rs:428:5:
assertion failed: `(left == right)`
Diff < left / right > :
<Failed
>Completed
...
test result: FAILED. 78 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.96s
```
# What
- raise the flaky plugin hook env test timeout from 5s to 10s so it
matches the other executed hook tests in this module
# Validation
- `cargo test -p codex-hooks`
## Summary
- Replace legacy sandbox config setup in delegate and telemetry tests
with direct `PermissionProfile` configuration.
- Move no-sandbox and read-only test turns in `tools.rs`,
`code_mode.rs`, `user_shell_cmd.rs`, and `model_visible_layout.rs` from
legacy `SandboxPolicy` values to `PermissionProfile` helpers, while
leaving the deny-glob read-only compatibility case for a later targeted
cleanup.
- Use `PermissionProfile::read_only()` where tests need managed
read-only behavior and `PermissionProfile::Disabled` where they
intentionally need no sandbox.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 27
files after #20013 to 22 files.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Migrate another batch of direct `Op::UserTurn` test construction from
legacy `SandboxPolicy` values to `PermissionProfile` inputs via
`turn_permission_fields()`.
- Replace a one-off read-only `SandboxPolicy` bridge in the macOS exec
test with `PermissionProfile::read_only()`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 32
files at the start of the cleanup stack to 27 files.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p codex-core`
## Summary
- Add `turn_permission_fields()` so tests that construct `Op::UserTurn`
directly can provide a canonical `PermissionProfile` while still filling
the required legacy `sandbox_policy` compatibility field.
- Migrate direct user-turn construction in core integration tests from
`SandboxPolicy::DangerFullAccess` to `PermissionProfile::Disabled`.
- Continue reducing direct `SandboxPolicy` usage in
`codex-rs/core/tests`, from 41 files after #20010 to 32 files in this
PR.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
## Why
The `codex-issue-digest` skill was producing more detail than the daily
digest needed, and broad all-area digests could miss active issues. In
particular, issue #16088 had substantial recent comments and reactions
but did not appear in the weekly all-areas output because GitHub search
was using default relevance ranking and the collector could exhaust its
candidate cap before later search queries got a fair sample.
That made the digest look quieter than the underlying user activity and
made threshold tuning misleading.
## What changed
- Make the digest summary headline-first and summary-only by default.
- Add an explicit opt-in flow for `## Details`, so the issue table is
shown only when requested or when the prompt asks for details upfront.
- Update the collector to request GitHub issue search results with
`sort=updated` and `order=desc`.
- Apply the search candidate cap per query instead of globally across
all queries.
- Bump the collector script version to `3`.
- Add tests that cover updated sorting and per-query candidate limits.
## Verification
- `pytest
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `ruff check
.codex/skills/codex-issue-digest/scripts/collect_issue_digest.py
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `git diff --check`
- Reran the all-areas weekly collector and confirmed #16088 is now
included with `55` interactions.
## Why
Remote-control app-server enrollments have both an internal server id
and the environment id exposed to remote-control clients. App-server
clients need one current status snapshot that says whether remote
control is usable and which environment id, if any, is exposed.
A temporary websocket disconnect is not itself an identity change.
Account changes, stale enrollment invalidation, successful
re-enrollment, and missing ChatGPT auth are meaningful status changes.
Disabled remote control remains `disabled` regardless of auth or SQLite
state. SQLite startup failure disablement and enrollment persistence
failures are handled in #20068; this PR reports the resulting effective
status to clients.
## What changed
- Adds v2 `remoteControl/status/changed` carrying `state` and
`environmentId`.
- Adds `RemoteControlConnectionState` values: `disabled`, `connecting`,
`connected`, and `errored`.
- Exposes remote-control status updates through `RemoteControlHandle`
using a Tokio watch channel.
- Always sends the current remote-control status snapshot to newly
initialized app-server clients.
- Broadcasts status changes to initialized app-server clients when state
or environment id changes.
- Treats missing ChatGPT auth as an `errored` status while leaving it
retryable because auth can change at runtime.
- Clears `environmentId` when enrollment is cleared for account changes,
auth loss, stale backend invalidation, or disabled remote control.
- Updates app-server protocol schema fixtures, generated TypeScript,
app-server README, remote-control tests, and TUI exhaustive notification
matches.
## Stack
- Builds on #20068.
## Verification
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server transport::remote_control --lib`
- `cargo check -p codex-tui`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
Right now, if Codex winds up in a state with auth but it can't refresh
the token, the user is left with an unhelpful message that says to log
out and log back in again.
Ultimately, we should prevent that from happening but if it does,
returning None will allow the caller to redirect the user back to the
login page
## Summary
- Add `PermissionProfile`-based turn submission helpers to
`core_test_support`, while keeping the legacy `SandboxPolicy` helper for
tests that intentionally exercise legacy fallback behavior.
- Switch the default `TestCodex::submit_turn()` path to send a real
`PermissionProfile` plus the required legacy compatibility projection in
`Op::UserTurn`.
- Migrate straightforward app/search/shell/truncation tests from
`SandboxPolicy::{DangerFullAccess, ReadOnly}` to
`PermissionProfile::{Disabled, read_only}`.
- Add a TUI compatibility projection helper for legacy app-server fields
so non-legacy writable roots are preserved instead of being downgraded
to read-only.
- Fix remote start/resume/fork sandbox-mode projection to classify any
managed profile with writable roots as workspace-write, not only
profiles that can write `cwd`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 47
files to 41 files without changing production behavior.
## Testing
- `cargo check -p codex-core --tests`
- `cargo test -p codex-tui
compatibility_profile_preserves_unbridgeable_write_roots`
- `cargo test -p codex-tui
sandbox_mode_preserves_non_cwd_write_roots_for_remote_sessions`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
## Why
The proxy matches allow and deny rules against normalized host strings.
Scoped IPv6 literals can arrive in equivalent forms, such as
`fd00::1%eth0`, `[fd00::1%eth0]`, or `[fd00::1%25eth0]`. Policy should
canonicalize those spellings without erasing scope granularity: an
unscoped rule like `fd00::1` should still cover scoped requests for that
address, while a scoped rule like `fd00::1%eth0` should remain exact to
that scope.
## What changed
- preserve IPv6 scope IDs during host normalization and canonicalize
`%25scope` to `%scope`
- match policy against the exact normalized host plus the unscoped IP
base for scoped literals
- keep local-address explicit allow checks aligned with the same
scoped/unscoped semantics
- add focused coverage for scoped IPv6 normalization, scoped allow
rules, and scoped deny rules in `network-proxy`
## Security impact
A request cannot bypass a broad deny rule by adding an IPv6 scope
suffix. At the same time, scoped policy remains precise:
`deny=fd00::1%eth0` affects that scoped spelling without collapsing
`fd00::1%eth1` onto the same key, and `allow=fe80::1%eth0` does not
implicitly allow other scopes.
## Verification
- `just fmt`
- `cargo test -p codex-network-proxy`
- `just fix -p codex-network-proxy`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: evawong-oai <evawong@openai.com>
The test was flaky because it was checking the right thing in a
roundabout way.
What it wanted to prove:
- plugin hooks receive the right environment variables.
What it actually did:
1. Run a plugin hook.
2. Have that hook write those env vars into a temporary `env.json` file.
3. After the hook finished, read `env.json` back from disk.
On Windows, that last file was sometimes not there when the test tried
to read it, so the test failed with `read env log: file not found`. The
hook system itself was not what the test failure was directly proving;
the test was failing on the extra filesystem side effect it introduced.
The fix is to stop using a temp file as the proof mechanism. The hook
now prints the env values in its normal structured output, and the test
asserts on the output that the hook engine already captures. So we still
verify the same behavior, but without depending on a separate file being
created and read back correctly on Windows.
This PR adds a new feature to the `/plugins` menu that gives users the
ability to add new plugin marketplaces. It introduces an Add Marketplace
tab to the right of installed marketplaces, a source prompt, loading and
error states, and the app-server request flow needed to perform the
install. After a successful `marketplace/add`, the popup refreshes back
into the newly added marketplace tab so the new plugins are immediately
visible.
- Add an Add Marketplace tab to the `/plugins` menu
- Prompt for marketplace source input from git repo, URL, or local path
- Show loading and error states during `marketplace/add`
- Refresh plugin data after success and switch into the newly added
marketplace tab
- Add tests and snapshot updates
## Why
Plugins can bundle lifecycle hooks, but Codex previously only discovered
hooks from user, project, and managed config layers. This adds the
plugin discovery and runtime plumbing needed for plugin-bundled hooks
while keeping execution behind the `plugin_hooks` feature flag.
## What
- Discovers plugin hook sources from each plugin's default
`hooks/hooks.json`.
- Supports `plugin.json` manifest `hooks` entries as either relative
paths or inline hook objects.
- Plumbs discovered plugin hook sources through plugin loading into the
hook runtime when `plugin_hooks` is enabled.
- Marks plugin-originated hook runs as `HookSource::Plugin`.
- Injects `PLUGIN_ROOT` and `CLAUDE_PLUGIN_ROOT` into plugin hook
command environments.
- Updates generated schemas and hook source metadata for the plugin hook
source.
## Stack
1. This PR - openai/codex#19705
2. openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882
## Reviewer Notes
- Core logic is in `codex-rs/core-plugins/src/loader.rs` and
`codex-rs/hooks/src/engine/discovery.rs`
- Moved existing / adding new tests to
`codex-rs/core-plugins/src/loader_tests.rs` hence the large diff there
- Otherwise mostly plumbing and minor schema updates
### Core Changes
The `codex-rs/core` changes are limited to wiring plugin hook support
into existing core flows:
- `core/src/session/session.rs` conditionally pulls effective plugin
hook sources and plugin hook load warnings from `PluginsManager` when
`plugin_hooks` is enabled, then passes them into `HooksConfig`.
- `core/src/hook_runtime.rs` adds the `plugin` metric tag for
`HookSource::Plugin`.
- `core/config.schema.json` picks up the new `plugin_hooks` feature
flag, and `core/src/plugins/manager_tests.rs` updates fixtures for the
added plugin hook fields.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Rollout traces need an identifier that can be used to correlate a Codex
inference with upstream Responses API, proxy, and engine logs. The
reduced trace model already exposed `upstream_request_id`, but it was
being populated from the Responses API `response.id`. That value is
useful for `previous_response_id` chaining, but it is not the transport
request id that upstream systems key on.
This PR separates those concepts so trace consumers can reliably answer
both questions:
- which Responses API response did this inference produce?
- which upstream request handled it?
## Structure
The change keeps the upstream request id at the same lifecycle level as
the provider stream:
- `codex-api` captures the `x-request-id` HTTP response header when the
SSE stream is created and exposes it on `ResponseStream`. Fixture and
websocket streams set the field to `None` because they do not have that
HTTP response header.
- `codex-core` carries that stream-level id into `InferenceTraceAttempt`
when recording terminal stream outcomes. Completed, failed, cancelled,
dropped-stream, and pre-response error paths all record the id when it
is available.
- `rollout-trace` now records both identifiers in raw terminal inference
events and response payloads: `response_id` for the Responses API
`response.id`, and `upstream_request_id` for `x-request-id`.
- The reducer stores both fields on `InferenceCall`. It also uses
`response_id` for `previous_response_id` conversation linking, which
removes the old accidental dependency on the misnamed
`upstream_request_id` field.
- Terminal inference reduction now consumes the full terminal payload
(`InferenceCompleted`, `InferenceFailed`, or `InferenceCancelled`) in
one place. That keeps status, partial payloads, response ids, and
upstream request ids consistent across success, failure, cancellation,
and late stream-mapper events.
## Why This Shape
`x-request-id` is a property of the HTTP/provider response envelope, not
an SSE event. Capturing it once in `codex-api` and plumbing it through
terminal trace recording avoids trying to infer the value from stream
contents, and it preserves the id even when the stream fails or is
cancelled after only partial output.
Keeping `response_id` separate from `upstream_request_id` also makes the
reduced trace model less surprising: `response_id` remains the
conversation-continuation id, while `upstream_request_id` is the
operational correlation id for upstream debugging.
## Validation
The PR updates trace and reducer coverage for:
- reading `x-request-id` from SSE response headers;
- storing the true upstream request id on completed inference calls;
- preserving upstream request ids for cancelled and late-cancelled
inference streams;
- keeping `previous_response_id` reconstruction tied to `response_id`
rather than transport request ids.
## Why
Remote control depends on the app-server SQLite state DB for persisted
enrollment identity. If the state DB cannot be opened at startup,
continuing with remote control enabled leaves the process in a
misleading state where enrollment identity cannot be read or persisted.
Feature-disabled remote control remains disabled regardless of SQLite
state. This only changes the case where remote control is requested but
the SQLite state DB is unavailable.
## What changed
- Logs SQLite state DB initialization failures instead of dropping the
error silently.
- Treats remote control as effectively disabled when the SQLite state DB
is unavailable.
- Prevents `RemoteControlHandle::set_enabled(true)` from enabling remote
control later in the same process if the state DB was unavailable at
startup.
- Keeps the existing behavior that disabled remote control does not
validate or connect to the remote-control URL.
- Makes persisted enrollment load/update failures propagate as
remote-control errors instead of silently falling back to in-memory
state.
- Makes the direct websocket connection path fail when called without a
SQLite state DB.
- Adds coverage for startup without a state DB, later handle enablement
with no state DB, and direct websocket connection without a state DB.
## Verification
- `cargo test -p codex-app-server transport::remote_control --lib`
- `just fix -p codex-app-server`
## Why
MultiAgentV2 `wait_agent` currently clamps short waits to a fixed 10
second minimum. That default is still useful for preventing tight
polling loops, but it is too rigid for environments that need faster
mailbox wake-up checks or a larger minimum to discourage frequent
polling.
This PR makes the minimum wait timeout configurable from the existing
MultiAgentV2 feature config section, so operators can tune the behavior
without changing the legacy multi-agent tool surface.
## What Changed
- Added `features.multi_agent_v2.min_wait_timeout_ms`.
- Defaulted the new setting to the existing 10 second floor.
- Validated the configured value as `1..=3600000`, matching the existing
one hour maximum wait bound.
- Applied the configured minimum to MultiAgentV2 `wait_agent` runtime
clamping.
- Plumbed the configured minimum into the `wait_agent` tool schema,
including the effective default when the minimum is above the normal 30
second default.
- Regenerated `core/config.schema.json`.
## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib multi_agent_v2`
- `just fix -p codex-core`
## Why
The proxy checks the requested host before opening the upstream
connection, but DNS can resolve an allowed hostname to a loopback,
private, or other non-public address after that first decision. Without
a final check on the actual socket target, a request that looks
acceptable at the hostname layer can still connect to a local service
once resolution completes.
## What changed
- add a shared TCP connector check for direct proxy egress
- use that path for HTTP, `CONNECT`, SOCKS5, and MITM upstream
connections
- keep configured upstream proxy hops on the existing proxy path
- add direct-connector coverage for allowed and rejected local targets
## Security impact
Direct proxy egress now rechecks the resolved socket address before
connecting, closing the gap between hostname policy evaluation and the
final network target.
## Verification
- `cargo test -p codex-network-proxy`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Agent Identity sessions can represent Business and Enterprise ChatGPT
workspaces, but cloud requirements were skipped before fetch. That meant
workspace-managed requirements were not loaded for Agent Identity even
when the JWT carried the same account identity and plan information that
normal ChatGPT token auth exposes.
This PR now sits on top of the Agent Identity stack through
[#19764](https://github.com/openai/codex/pull/19764). Because
[#19763](https://github.com/openai/codex/pull/19763) moved task
registration into Agent Identity auth loading, cloud requirements no
longer needs a separate runtime-initialization step before building the
backend client.
## What changed
- Stop skipping `CodexAuth::AgentIdentity` in the cloud requirements
loader.
- Share the cloud requirements eligibility check between startup load
and background cache refresh.
- Rely on eagerly loaded Agent Identity auth so backend requests can
attach task-scoped `AgentAssertion` headers.
- Decode Agent Identity JWT `plan_type` as the auth-layer plan type,
then convert it through a shared `auth::PlanType` -> `account::PlanType`
mapping.
- Add the missing serde alias for the `education` plan string and add
coverage for raw Agent Identity plan aliases such as `hc` and
`education`.
## Testing
- `cargo test -p codex-agent-identity -p codex-login -p
codex-cloud-requirements -p codex-protocol`
## Why
Initialized app-server RPCs no longer need to bottleneck behind one
request processor path. Running them concurrently improves
responsiveness, but several request families still mutate shared state
or depend on ordered side effects. Those stateful families need an
auditable serialization contract so concurrency does not reorder thread,
config, auth, command, watcher, MCP, or similar state transitions.
This PR keeps that boundary explicit: stateful work is serialized by the
smallest useful key, while intentionally read-only or externally
concurrent work remains unkeyed. In particular, `thread/list` and
`thread/turns/list` explicitly have no serialization because they
primarily read append-only rollout storage and should continue to be
served concurrently.
## What changed
- Adds `ClientRequest::serialization_scope()` in `app-server-protocol`
and requires every client request definition to declare its
serialization behavior.
- Introduces keyed request scopes for thread, thread path, command exec
process, fuzzy search session, fs watch, MCP OAuth, and global state
buckets such as config, account auth, memory, and device keys.
- Routes initialized app-server RPCs through per-key FIFO serialization
while allowing unkeyed initialized requests to run concurrently.
- Cancels in-flight initialized RPC work when the connection disconnects
or the app-server exits so spawned request tasks do not outlive their
session.
- Adds focused coverage for representative keyed and unkeyed
serialization scopes, including explicitly concurrent
`thread/turns/list` behavior.
## Validation
- Added protocol tests for representative keyed serialization scopes and
intentionally unkeyed request families.
- Added app-server request serialization tests covering per-key FIFO
behavior, concurrent unkeyed execution, disconnect shutdown, and config
read-after-write ordering.
- Local focused protocol validation after the latest rebase is currently
blocked by packageproxy failing to resolve locked `rustls-webpki
0.103.13`; CI is expected to provide the full validation signal.
## Why
The log DB writer batches tracing events before inserting them into
SQLite, but `tokio::time::interval` produces an immediate first tick.
That meant the inserter could flush the first accepted log entry before
`batch_size` was reached, making
`configured_batch_size_flushes_without_explicit_flush` timing-sensitive
in CI.
## What Changed
- Consume the interval's startup tick before entering the inserter loop,
so interval flushing starts after the configured delay.
- Remove the test's startup sleep, which was masking the race instead of
proving the batch-size behavior.
## Validation
- `cargo test -p codex-state`
- `cargo test -p codex-state
configured_batch_size_flushes_without_explicit_flush` passed 3
consecutive focused runs
- PR checks passed across `rust-ci`, Bazel, `ci`, `sdk`, `cargo-deny`,
Codespell, blob-size policy, and CLA
## Why
The Linux managed-proxy bridge helpers are long-lived child processes in
the sandbox networking path. Before this change they stayed dumpable and
the network seccomp profile did not block cross-process memory syscalls,
so another same-user process could potentially inspect or modify bridge
memory instead of interacting only through the intended proxy interface.
## What changed
- reuse the shared `codex-process-hardening` helper to mark bridge
helper children non-dumpable before they begin serving
- deny `process_vm_readv` and `process_vm_writev` in the existing
network seccomp filter
## Security impact
Bridge helpers are less exposed to same-user cross-process inspection or
memory writes, which reduces the chance that sandboxed code can
interfere with proxy support processes outside the intended IPC path.
## Verification
- `cargo test -p codex-process-hardening`
- `cargo test -p codex-linux-sandbox`
- attempted `cargo check -p codex-linux-sandbox --target
x86_64-unknown-linux-gnu`; blocked on missing `x86_64-linux-gnu-gcc` on
this macOS host
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Slow Codex turns are easier to debug when token usage is visible in the
trace itself, without joining against separate analytics. This adds
token usage to existing turn-handling spans for regular user turns only.
[Example
turn](https://openai.datadoghq.com/apm/trace/9d353efa2cb5de1f4c5b93dc33c3df04?colorBy=service&graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=3555541504891512675&spanViewType=metadata&traceQuery=)
<img width="1447" height="967" alt="Screenshot 2026-04-24 at 3 03 07 PM"
src="https://github.com/user-attachments/assets/ab7bb187-e7fc-41f0-a366-6c44610b2b2c"
/>
## What Changed
Added response-level token fields on completed handle_responses spans:
gen_ai.usage.input_tokens
gen_ai.usage.cache_read.input_tokens
gen_ai.usage.output_tokens
codex.usage.reasoning_output_tokens
codex.usage.total_tokens
Added aggregate token fields on regular turn spans:
codex.turn.token_usage.*
Added an explicit regular-turn opt-in via
SessionTask::records_turn_token_usage_on_span() so this is not coupled
to span-name strings.
## Testing
- `cargo test -p codex-otel`
- `cargo test -p codex-core
turn_and_completed_response_spans_record_token_usage`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-otel`
- Manual local Electron/app-server smoke test: regular user turn emits
the new span fields
Known status: `cargo test -p codex-core` was attempted and failed in
unrelated existing areas: config approvals, request-permissions,
git-info ordering, and subagent metadata persistence.
## Why
The migration away from `SandboxPolicy` needs new configs to start from
permissions profiles instead of deriving profiles from legacy sandbox
modes. Existing users can have empty `config.toml` files, and we should
not rewrite user-owned config files that may live in shared
repositories.
This PR introduces built-in profile names so an empty config can resolve
to a canonical `PermissionProfile`, while explicit named `[permissions]`
profiles still behave predictably.
## What changed
- Adds built-in `default_permissions` profile names:
- `:read-only` maps to `PermissionProfile::read_only()`.
- `:workspace` maps to the workspace-write profile, including
project-root metadata carveouts.
- `:danger-no-sandbox` maps to `PermissionProfile::Disabled`, preserving
the distinction between no sandbox and a broad managed sandbox.
- Reserves the `:` prefix for built-in profiles so user-defined
`[permissions]` profiles cannot collide with future built-ins.
- Allows `default_permissions` to reference a built-in profile without
requiring a `[permissions]` table.
- Makes an otherwise empty config choose a built-in profile by
trust/platform context: trusted or untrusted project roots use
`:workspace` when the platform supports that sandbox, while roots
without a trust decision use `:read-only`.
- Keeps legacy `sandbox_mode` configs on the legacy path, and still
rejects user-defined `[permissions]` profiles that omit
`default_permissions` so we do not silently guess among custom profiles.
- Preserves compatibility behavior for implicit defaults: bare
`network.enabled = true` allows runtime network without starting the
managed proxy, explicit profile proxy policy still starts the proxy, and
implicit workspace/add-dir roots keep legacy metadata carveouts.
## Verification
- `cargo test -p codex-core builtin --lib`
- `cargo test -p codex-core profile_network_proxy_config`
- `cargo test -p codex-core
implicit_builtin_workspace_profile_preserves_add_dir_metadata_carveouts`
- `cargo test -p codex-core
permissions_profiles_network_enabled_allows_runtime_network_without_proxy`
- `cargo test -p codex-core
permissions_profiles_proxy_policy_starts_managed_network_proxy`
## Documentation
Public Codex config docs should mention these built-in names when the
`[permissions]` config format is ready to document as stable.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19900).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* #20008
* __->__ #19900
## Why
Managed sessions use `NO_PROXY` to keep a small set of destinations on
the direct path by default. The old default also bypassed all IPv4
link-local addresses in `169.254.0.0/16`, which includes metadata
endpoints such as `169.254.169.254`. Because `NO_PROXY` is evaluated by
the client before the request reaches the managed proxy, requests to
that range could skip proxy-side allowlist and local-binding checks
entirely. On hosts where a link-local metadata service is reachable,
that creates a path to sensitive environment metadata or credentials
outside the intended enforcement point.
## What changed
- remove the default IPv4 link-local `169.254.0.0/16` bypass from the
managed proxy environment
- keep the existing loopback and private-network defaults unchanged
- update the regression assertion to lock in the narrower default
## Security impact
Link-local requests now stay on the managed-proxy path by default, so
the proxy can apply configured policy before they reach metadata-style
endpoints or other link-local services.
## Verification
- `cargo test -p codex-network-proxy`
Co-authored-by: Codex <noreply@openai.com>
## Summary
This extends external agent detection/import beyond config artifacts so
Codex can detect recent sessions files from the external agent home and
import them into Codex rollout history.
## What changed
- Added a focused `external_agent_sessions` module for:
- session discovery
- source-record parsing
- rollout construction
- import ledger tracking
- Wired session detection/import into the app-server external agent
config API.
- Added compaction handling so large imported sessions can be resumed
safely before the first follow-up turn.
## Testing
Added coverage for:
- recent-session detection
- custom-title handling
- recency filtering
- dedupe and re-detect-after-source-change behavior
- visible imported turn construction
- backward-compatible import payload deserialization
- end-to-end RPC import flow
- rejection of undetected session paths
- repeat-import behavior
- large-session compaction before first follow-up
Ran:
- `cargo test -p codex-app-server external_agent_config_import_ --test
all`
## Summary
- exit shell mode when `Esc` is pressed while the absorbed `!` is the
only input
- add direct regression coverage plus a composer snapshot for the
restored normal prompt state
## Root cause
Shell mode stores the leading `!` outside the editable textarea. After
typing only `!`, the textarea is empty but the composer is still in bash
mode, so the existing empty-composer `Esc` handling never runs.
## Validation
- `just fmt`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::esc_exits_empty_shell_mode`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::footer_mode_snapshots`
- `cargo insta pending-snapshots`
`cargo test -p codex-tui` still reports unrelated existing `/status`
snapshot drift in this local environment because the rendered
permissions text is `workspace-write with network access` instead of the
older `read-only` fixture text.
Move local resume and fork cwd filtering to `thread/list` instead of
filtering in the TUI. This makes the `/resume` menu feel slightly faster
to load when working in repos with many historical threads, and
centralizes the cwd filtering in app-server.
**Affected:**
- /resume from inside the TUI.
- codex resume with no session ID and without --last
- codex resume --all
- codex fork with no session ID and without --last
- codex fork --all
**Not affected:**
- codex resume <id>
- codex fork <id>
- codex resume --last
- codex fork --last
Steps to test performance improvement in a real Codex environment:
- Launch `codex resume` using compiled binary in a directory that has
seen many threads.
- Launch `codex resume` using release binary in same directory.
- Observe difference in time-to-full-page as threads load.
## Summary
- suggest Plan mode when the current composer draft contains the
standalone word `plan`
- shares the Codex App heuristics for detection
- excludes things line `/plan` and the word plan in shell mode
- reuse the existing `Shift+Tab` mode cycle and add thread-scoped
dismissal with `Esc`
- replace the normal footer hint while the reminder is visible so the
statusline stays anchored
https://github.com/user-attachments/assets/01123ae8-cee6-4e95-b563-44655c071cde
## Why
The desktop app already nudges users toward Plan mode when their draft
clearly signals planning intent. The TUI had the underlying `/plan` and
`Shift+Tab` flows, but no equivalent reminder at the moment the user was
most likely to benefit from them.
## Details
The reminder is shown only when Plan mode is available, the draft
contains standalone `plan`, the user is not already in Plan mode, the
composer is actionable, and the current thread has not dismissed the
reminder. Slash-command and shell-command drafts are excluded.
The first implementation used an extra composer row, but that moved the
statusline whenever the heuristic fired. This version keeps the layout
stable by rendering the reminder in the existing footer row instead.
## Validation
- `INSTA_UPDATE=always cargo test -p codex-tui
chatwidget::tests::plan_mode::plan_mode_nudge -- --nocapture`
- `just fmt`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui`
- `cargo insta pending-snapshots`
- `git diff --check`
## Why
Network access approval prompts were showing the generic retry reason,
which made auto-review focus on the blocked connection instead of the
command that caused it. This makes network approvals easier to assess by
telling the reviewer to evaluate whether the triggering command was
authorised by the user and within policy, and to treat the network call
as acceptable when it is a reasonable consequence of that command.
## What changed
- Split guardian approval request prompt rendering so `NetworkAccess`
has a dedicated branch.
- For network requests, show `Network approval context` and `Network
access JSON` instead of `Retry reason` / `Planned action JSON`.
- Added regression coverage for the network approval prompt wording and
for omitting retry reason in this case.
## Verification
- `cargo test -p codex-core
guardian::tests::build_guardian_prompt_items_explains_network_access_review_scope`
## Why
- Without change: MCP tool call spans include request-side details such
as server, tool, call ID, connector, session, and turn.
- Issue: Some useful telemetry is only known by the MCP server after it
handles the tool call, such as target identity or whether the call
triggered a user-facing flow.
## What Changed
- With change: Codex reads allowlisted telemetry from
`_meta["codex/telemetry"]["span"]` and records it on the
`mcp.tools.call` span.
- Adds span fields for `codex.mcp.target.id` and
`codex.mcp.user_flow.triggered`, with strict type checks and bounded
target ID length.
## Verification
`codex-rs/core/src/mcp_tool_call_tests.rs`
## Why
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with `session_id` and `turn_id`.
- Issue: MCP servers may want the turn start timestamp to measure
internal latency relative to turn start.
## What Changed
- With change: turn metadata now includes `turn_started_at_unix_ms`,
which is propagated to MCP tool calls in
`_meta["x-codex-turn-metadata"]`.
## Verification
- `codex-rs/core/src/mcp_tool_call_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/src/turn_timing_tests.rs`
- `codex-rs/core/tests/responses_headers.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
## Why
Several bug reports describe thread shutdown (including subagent
threads) leaving stdio MCP server processes behind. These reports all
point at the same lifecycle gap: Codex launches stdio MCP servers, but
the session-level shutdown path does not explicitly close MCP clients or
terminate the server process tree.
Fixes#12491Fixes#12976Fixes#18881Fixes#19469
## History
This is best understood as a regression/coverage gap in MCP session
lifecycle management, not as stdio MCP cleanup being absent all along.
#10710 added process-group cleanup for stdio MCP servers, but that
cleanup only runs when the `RmcpClient`/transport is dropped. The older
reports (#12491 and #12976) came after that cleanup existed, which
suggests the remaining problem was that some higher-level shutdown paths
kept the MCP manager alive or replaced it without explicitly draining
clients. The newer reports (#18881 and #19469) exposed the same family
around manager replacement and shutdown.
## What changed
- Added an explicit stdio MCP process handle in `codex-rmcp-client` so
local MCP servers terminate their process group and executor-backed MCP
servers call the executor process terminator.
- Added `RmcpClient::shutdown()` and manager-level MCP shutdown draining
so session shutdown, channel-close fallback, MCP refresh, and connector
probing stop owned MCP clients.
- Added regression coverage that starts a stdio MCP server, begins an
in-flight blocking tool call, shuts down the client, and asserts the
server process exits.
## Verification
- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-mcp`
- `just fix -p codex-rmcp-client`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
- Manual before/after validation with a temporary repro script:
- Pre-fix binary from `HEAD^` (`fed0a8f4fa`): reproduced the leak with
surviving MCP server and child PIDs, `survivors=[77583, 77592]`,
`leaked=true`.
- Post-fix binary from this branch (`67e318148b`): verified both MCP
processes were gone after interrupting `codex exec`, `survivors=[]`,
`leaked=false`.
## Why
Fixes#19814.
The TUI's current `Worked for ...` timing behavior is a leftover from
#9599. At that point, models could emit multiple assistant messages in
one turn for preambles/commentary, but the TUI did not yet have a
reliable signal that an assistant message was the final answer when it
started streaming. To avoid showing an ever-growing elapsed time on each
preamble separator, #9599 made the separator timer incremental by
tracking elapsed time since the previous separator.
That workaround is no longer the right model for the final
completed-turn display. Since then, #16638 added protocol-native turn
timing, including `duration_ms` on turn completion. With that cumulative
duration available at the point where the TUI renders the completed-turn
separator, the UI can show the actual turn duration directly instead of
carrying per-separator timing state.
## What Changed
- Thread `duration_ms` into `ChatWidget::on_task_complete` from both
legacy `TurnCompleteEvent` handling and app-server `TurnCompleted`
notifications.
- Use `duration_ms` for the final `Worked for ...` separator, falling
back to the status indicator timer only when the protocol duration is
unavailable.
- Keep mid-turn separators before later assistant text as plain visual
dividers instead of clocked `Worked for ...` separators.
- Remove the old incremental separator timer state and helper
(`last_separator_elapsed_secs` / `worked_elapsed_from`).
- Add a snapshot regression test for a turn that runs a command and then
completes with a final answer, verifying the final separator uses the
cumulative turn duration.
## Verification
- `cargo test -p codex-tui
final_worked_for_uses_cumulative_turn_duration_snapshot`
- `just fix -p codex-tui`
Manual repro prompt:
```text
Manual timing repro. First send a short preamble/commentary sentence before using tools. Then run exactly this shell command: sleep 75; echo MANUAL_TIMING_DONE. After the command finishes, give a final answer that says "done". Do not skip the preamble.
```
After this change, the mid-turn break before the final answer should be
a plain divider, and the final completed-turn separator should show
`Worked for ...` using the cumulative turn duration.
Before:
<img width="414" height="102" alt="Screenshot 2026-04-27 at 10 09 01 PM"
src="https://github.com/user-attachments/assets/b9e2ce01-2460-40e4-a5c4-c9ba8add2557"
/>
After:
<img width="485" height="149" alt="Screenshot 2026-04-27 at 10 09 07 PM"
src="https://github.com/user-attachments/assets/d24089ae-d4e2-41b6-b966-07c98706ead4"
/>
## Summary
Make FileSystemSandboxPolicy the semantic source of truth for project
root metadata protection. Under writable roots, `.git`, `.codex`, and
`.agents` stay protected unless user policy grants an explicit write
rule for that metadata path.
## Scope
1. Add `protected_metadata_names` to `WritableRoot`.
2. Teach `FileSystemSandboxPolicy::can_write_path_with_cwd` to reject
protected metadata writes under writable roots unless explicitly
allowed.
3. Default workspace write profiles to protect `.git`, `.codex`, and
`.agents`.
4. Add the Linux fallback setup needed before Linux enforcement lands
later in the stack.
## Reviewer Focus
1. The policy decision belongs in FileSystemSandboxPolicy, not shell
command parsing.
2. Legacy SandboxPolicy remains a compatibility projection, not the
source of the new rule.
3. Explicit user write rules can still opt into these metadata paths.
## Stack
1. Policy primitive: this PR
2. macOS Seatbelt adapter: #19847
3. Shell preflight UX: #19848
4. Runtime profile propagation: #19849
5. Linux bubblewrap adapter: #19852
## Validation
1. codex protocol permissions tests
2. formatting for codex protocol and codex linux sandbox
3. diff whitespace check
## Why
The TUI currently handles keyboard shortcuts as hard-coded event matches
spread across app, composer, pager, list, approval, and navigation code.
That makes shortcuts hard to customize, makes displayed hints easy to
drift from actual behavior, and makes future keymap work riskier because
there is no central action inventory.
This PR adds the foundation for configurable, action-based keymaps
without adding the interactive remapping UI yet. Onboarding
intentionally stays on fixed startup shortcuts because users cannot
reasonably configure keymaps before completing onboarding.
This is PR1 in the keymap stack:
- PR1: #18593: configurable keymap foundation
- PR2: #18594: `/keymap` picker and guided remapping UI
- PR3: #18595: Vim composer mode and the remap option
## Design Notes
The new model resolves named actions into concrete runtime bindings once
from config, then passes those bindings to the UI surfaces that handle
input or render shortcut hints.
The main concepts are:
- **Context**: a scope where an action is active, such as `global`,
`chat`, `composer`, `editor`, `pager`, `list`, or `approval`.
- **Action**: a named operation inside a context, such as
`global.open_transcript`, `composer.submit`, or `pager.close`.
- **Binding**: one or more single-key shortcuts assigned to an action,
written as config strings such as `ctrl-t`, `alt-backspace`, or
`page-down`. Multi-step sequences such as `ctrl-x ctrl-s`, `g g`, or
leader-key flows are not part of this PR.
- **Resolution order**: context-specific config wins first, supported
global fallbacks come next, and built-in defaults fill in anything
unset.
- **Explicit unbinding**: an empty array removes an action binding in
that scope and does not fall through to a fallback binding.
- **Conflict validation**: a resolved keymap rejects duplicate active
bindings inside the same scope so one keypress cannot dispatch two
actions.
## What Changed
- Added `TuiKeymap` config support under `[tui.keymap]`, including typed
contexts/actions, key alias normalization, generated schema coverage,
and user-facing config errors.
- Added `RuntimeKeymap` resolution in `codex-rs/tui/src/keymap.rs`,
including fallback precedence, built-in defaults, explicit unbinding,
and per-context conflict validation.
- Rewired existing TUI handlers to consume resolved keymap actions
instead of directly matching hard-coded keys in each component.
- Updated key hint rendering and footer/pager/list surfaces so displayed
shortcuts follow the resolved keymap.
- Kept onboarding shortcuts fixed in
`codex-rs/tui/src/onboarding/keys.rs` instead of exposing them through
`[tui.keymap]`.
## Validation
The branch includes focused coverage for config parsing, key
normalization, runtime fallback resolution, explicit unbinding,
duplicate-key conflict validation, default keymap consistency,
onboarding startup key behavior, and UI hint snapshots affected by
resolved key bindings.
## Why
Codex enables enhanced keyboard reporting while the TUI owns the
terminal. In iTerm2, exiting the TUI with Ctrl+C can intermittently
leave the parent shell receiving raw CSI-u / `modifyOtherKeys` fragments
instead of normal key input.
Final terminal cleanup should put the parent shell back into normal
keyboard reporting even if the terminal misses the usual stack pop.
Fixes#19553.
## What Changed
- Move TUI keyboard enhancement setup and detection into
`tui/src/tui/keyboard_modes.rs`.
- Add an exit-only `restore_after_exit()` path that performs the normal
keyboard enhancement pop plus unconditional keyboard enhancement and
`modifyOtherKeys` resets.
- Keep temporary restore paths, such as external-editor handoff, using
the balanced stack pop behavior.
## Confidence
Medium. This is a speculative fix: I was not able to reproduce the
reported iTerm2 behavior manually, but the symptoms line up with
terminal keyboard reporting state surviving Codex exit. The added reset
sequences are scoped to final TUI shutdown and should be harmless when
the terminal is already clean.
## Why
Memory startup runs in the background after an eligible turn, but it can
consume Codex backend quota at exactly the wrong time: when the user is
already near a rate-limit boundary. This PR adds a guard so the memory
pipeline backs off when the Codex rate-limit snapshot says the remaining
budget is too low.
## What Changed
- Added `memories.min_rate_limit_remaining_percent` with a default of
`25`, clamped to `0..=100`, and regenerated `core/config.schema.json`.
- Added `codex-rs/memories/write/src/guard.rs`, which fetches Codex
backend rate limits before memory startup and skips phase 1 / phase 2
when the Codex limit is reached or either tracked window is above the
configured usage ceiling.
- Keeps startup best-effort: non-Codex auth or rate-limit fetch/client
failures preserve the existing memory startup behavior.
- Records a `codex.memory.startup` counter with
`status=skipped_rate_limit` when startup is skipped.
- Added config parsing/clamping coverage and guard unit tests.
## Verification
- Added `codex-rs/memories/write/src/guard_tests.rs` for threshold,
primary/secondary window, and reached-limit behavior.
- Added config tests for TOML parsing and clamping.
## Summary
AgentIdentity runtime loading currently registers tasks against a single
hardcoded AuthAPI base URL. That works for production, but local and
staging validation may need registration to target a different
authapi-login-provider without baking internal staging service URLs into
the OSS binary.
This PR adds a small config surface for
`agent_identity_authapi_base_url` and threads it through the existing
auth-loading path as a direct argument. Explicit config wins. Without
config, task registration keeps using the production AuthAPI URL,
matching the current default behavior.
## Stack
1. openai/codex#19762 - `refactor: make auth loading async` (merged)
2. openai/codex#19763 - `refactor: load agent identity runtime eagerly`
3. This PR - `fix: configure AgentIdentity AuthAPI base URL`
4. openai/codex#19764 - `feat: verify agent identity JWTs with JWKS`
## Design decisions
- Keep the existing auth-loading shape and pass the new value as an
argument. This avoids another wrapper loader and keeps the call path
readable.
- Add config instead of embedding internal staging URLs. Environments
that need a non-production AuthAPI can configure it explicitly.
- Keep the default AuthAPI registration URL as production.
`chatgpt_base_url` remains separate and is used by the follow-up JWKS
verification PR for fetching public keys from the ChatGPT backend route.
- Resolve the AuthAPI base URL inside AgentIdentity loading, because
task registration is the only consumer of this value.
## Testing
Tests: targeted Rust checks, AgentIdentity auth tests, config schema
regeneration, formatter/fix pass, and whitespace diff check.
## Why
Memory startup was tied to thread lifecycle events such as create, load,
and fork. That can run memory work before a thread receives real user
input, and it makes startup cost scale with thread management instead of
actual turns. Moving the trigger to `thread/sendInput` keeps memory
startup aligned with the first real user turn and lets it use the
current thread config at turn time.
The idea is to prevent ghost cost due to pre-warm triggered by the app
Turn-based startup can also make global phase-2 consolidation easier to
request repeatedly, so this adds a success cooldown and tightens the
default startup scan window.
## What Changed
- Start `codex_memories_write::start_memories_startup_task` after a
non-empty `thread/sendInput` turn is submitted, instead of from thread
create/load/fork paths:
d4a6885b78/codex-rs/app-server/src/codex_message_processor.rs (L6477-L6487)
- Expose `CodexThread::config()` so app-server can pass the live config
into memory startup at turn time.
- Add a six-hour successful-run cooldown for global phase-2
consolidation via `SkippedCooldown`:
d4a6885b78/codex-rs/state/src/runtime/memories.rs (L963-L966)
- Reduce memory startup defaults to at most 2 rollouts over 10 days:
d4a6885b78/codex-rs/config/src/types.rs (L31-L34)
## Verification
Updated the memory runtime coverage around phase-2 reclaim behavior,
including `phase2_global_lock_respects_success_cooldown`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Phase 2 still needs to choose the most relevant stage-1 memory outputs
by usage and recency, but exposing that ranking as the rendered
`raw_memories.md` order creates unnecessary large diff. Usage-count or
timestamp changes can reshuffle otherwise unchanged memories, making the
workspace diff noisy and giving the consolidation prompt a misleading
recency signal from file position.
This fix will reduce token consumption
## What Changed
- Keep the existing top-N Phase 2 selection ranking by `usage_count`,
`last_usage`, `source_updated_at`, and `thread_id`.
- Return the selected rows in stable ascending `thread_id` order before
syncing Phase 2 filesystem inputs.
- Update the memory README, raw memories header, and consolidation
prompt so they describe the stable order and tell the prompt to use
metadata and workspace diffs instead of file order as the recency
signal.
- Adjust the memory runtime tests to use deterministic thread IDs and
assert the stable return order separately from the ranked selection
semantics.
## Test Coverage
- Existing memory runtime tests in
`codex-rs/state/src/runtime/memories.rs` now cover the stable returned
ordering for Phase 2 inputs.
---------
Co-authored-by: Codex <noreply@openai.com>
Keep extracting memories out of core and moving the write trigger in the
app-server
This is temporary and it should move at the client level as a follow-up
This makes core fully independant from `codex-memories-write`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
MultiAgentV2 sessions need startup guidance that matches the role of the
thread that is actually being created. Root agents and subagents have
different responsibilities, and forked subagents can inherit parent
rollout history. If the parent hint is carried into the child context,
the child can see stale or conflicting developer guidance before its own
session-specific context is added.
## What changed
- Added `features.multi_agent_v2.root_agent_usage_hint_text` and
`features.multi_agent_v2.subagent_usage_hint_text` config fields,
including schema/config parsing support.
- Injected the matching root or subagent hint into the initial context
as its own developer message when `multi_agent_v2` is enabled.
- Filtered configured MultiAgentV2 usage-hint developer messages out of
forked parent history so a child thread receives fresh guidance for its
own session source/config.
- Added targeted coverage for config parsing, initial-context rendering,
feature-config deserialization, and forked-history filtering.
## Context examples
With this config:
```toml
[features.multi_agent_v2]
enabled = true
root_agent_usage_hint_text = "Root guidance."
subagent_usage_hint_text = "Subagent guidance."
```
A root thread initial context renders the root hint as a standalone
developer message:
```text
[developer]
<existing developer context, when present>
[developer]
Root guidance.
```
A subagent thread initial context renders the subagent hint instead:
```text
[developer]
<existing developer context, when present>
[developer]
Subagent guidance.
```
When a subagent forks parent history, any parent developer message whose
text exactly matches the configured MultiAgentV2 root or subagent hint
is omitted from the forked history before the child receives its fresh
subagent hint.
## Why
Addresses #9274
Running `codex update` currently starts an interactive Codex session
with `update` as the prompt. That is a rough edge for users who expect a
direct self-update command after seeing the existing update notice, and
it forces them to copy the suggested package-manager command manually.
## What changed
- Added a top-level `codex update` subcommand.
- Reused the existing install-channel detection and update command
runner that the TUI already uses for update prompts.
- Exposed the update-action lookup from `codex-tui` so the CLI can
invoke the same behavior.
- Added CLI coverage to ensure `codex update` is parsed as a subcommand
instead of becoming an interactive prompt.
## Verification
- `cargo test -p codex-cli`
- `cargo test -p codex-tui update_action::tests`
## Why
`PermissionProfile` is now the canonical internal permissions
representation, but the app-server wire shape is still intentionally
unstable while the migration continues. Stable app-server clients should
not see or generate code for these fields until the wire format settles.
## What changed
- Marks every app-server v2 field that sends `PermissionProfile` as
experimental, including `command/exec`, `thread/start`, `thread/resume`,
`thread/fork`, and `turn/start` request/response payloads.
- Enables per-field experimental inspection for `command/exec`, so
`permissionProfile` is gated without making the entire method
experimental.
- Fixes the generated TypeScript schema filter to be comment-aware. The
previous scanner treated apostrophes inside doc comments as string
delimiters, so some experimental fields leaked into stable TypeScript
even though stable JSON was filtered correctly.
## Verification
- `cargo test -p codex-app-server-protocol`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19899).
* #19900
* __->__ #19899
## Why
After thread sessions have a required `PermissionProfile`, the TUI no
longer needs to cache a separate legacy `SandboxPolicy` in
`ThreadSessionState`. Keeping the legacy field would reintroduce two
permission authorities in the session cache and make later
replay/switching logic easier to get wrong.
This PR keeps legacy app-server compatibility at the ingestion boundary:
old `sandbox` response values are still accepted, but they are
immediately converted to a cwd-anchored profile.
## What Changed
- Removes `ThreadSessionState.sandbox_policy`.
- Updates active-session permission syncing to write only the current
`PermissionProfile`.
- Updates thread-read/replay/test fixtures to use profiles as the cached
session permission source.
- Leaves legacy `sandbox` fields in app-server request/response protocol
paths unchanged; those are compatibility boundaries and are converted
before entering cached TUI state.
## Verification
- `cargo test -p codex-tui thread_session_state::tests --lib`
- `cargo test -p codex-tui
inactive_thread_started_notification_initializes_replay_session --lib`
- `cargo test -p codex-tui thread_events --lib`
- `just fix -p codex-tui`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19776).
* #19900
* #19899
* __->__ #19776
## Why
Remote TUI resume uses the app-server websocket client. That client
inherited tungstenite's default `16 MiB` frame limit, so a large saved
session could make `thread/resume` return a single JSON-RPC response
frame that the client rejected before the TUI could deserialize or
render it.
Fixes#19837
## What Changed
- Configure the remote app-server websocket client with a bounded `128
MiB` max frame/message size.
- Preserve the concrete remote worker exit reason when completing
pending requests after a transport/read failure instead of replacing it
with a generic channel-closed error.
- Add a regression test that sends a single `>16 MiB` JSON-RPC response
frame and verifies the typed request succeeds.
Note: This isn't a perfect fix. It really just moves the limit to a much
larger value. I looked at a bunch of other potential fixes (both
server-side and client-side), and they all involved significant
complexity, had backward-compatibility impact, or impacted performance
of common use cases. This simple fix should address the vast majority of
remote use cases.
## Verification
I reproed the problem locally using a long rollout. Verified that fix
addresses connection drop.
## Why
`ThreadConfigSnapshot` is used by app-server and thread metadata code as
a stable view of active runtime settings. Keeping both `sandbox_policy`
and `permission_profile` in the snapshot duplicates permission state and
makes it possible for the legacy projection to drift from the canonical
profile.
The legacy `sandbox` value is still needed at app-server compatibility
boundaries, so this PR derives it on demand from the snapshot profile
and cwd instead of storing it.
## What Changed
- Removes `ThreadConfigSnapshot.sandbox_policy`.
- Adds `ThreadConfigSnapshot::sandbox_policy()` as a compatibility
projection from `permission_profile` plus `cwd`.
- Updates app-server response/metadata code and tests to call the
projection only where legacy fields still exist.
- Keeps snapshot construction profile-only so split filesystem rules,
disabled enforcement, and external enforcement remain represented by the
canonical profile.
## Verification
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
dispatch_reclaims_stale_global_lock_and_starts_consolidation --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19775).
* #19900
* #19899
* #19776
* __->__ #19775
## Why
`SessionConfiguredEvent` is the internal event that tells clients what
permissions are active for a session. Emitting both `sandbox_policy` and
`permission_profile` leaves two possible authorities and forces every
consumer to decide which one to honor. At this point in the migration,
the profile is expressive enough to represent managed, disabled, and
external sandbox enforcement, so the internal event can be profile-only.
The wire compatibility concern is older serialized events or rollout
data that only contain `sandbox_policy`; those still need to
deserialize.
## What Changed
- Removes `sandbox_policy` from `SessionConfiguredEvent` and makes
`permission_profile` required.
- Adds custom deserialization so old payloads with only `sandbox_policy`
are upgraded to a cwd-anchored `PermissionProfile`.
- Updates core event emission and TUI session handling to sync
permissions from the profile directly.
- Updates app-server response construction to derive the legacy
`sandbox` response field from the active thread snapshot instead of from
`SessionConfiguredEvent`.
- Updates yolo-mode display logic to treat both
`PermissionProfile::Disabled` and managed unrestricted filesystem plus
enabled network as full-access, while still preserving the distinction
between no sandbox and external sandboxing.
## Verification
- `cargo test -p codex-protocol session_configured_event --lib`
- `cargo test -p codex-protocol serialize_event --lib`
- `cargo test -p codex-exec session_configured --lib`
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox
--lib`
- `cargo test -p codex-tui session_configured --lib`
- `cargo test -p codex-tui
yolo_mode_includes_managed_full_access_profiles --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19774).
* #19900
* #19899
* #19776
* #19775
* __->__ #19774
## Why
Fixes#19475.
`codex exec` can finish successfully and then emit an `ERROR` on stderr:
```text
failed to record rollout items: thread <id> not found
```
That happens because shutdown closes the live thread writer before
emitting `ShutdownComplete`. The terminal event was still using the
normal `send_event_raw` path, so it tried to append rollout items
through a recorder that had already been removed. The answer is correct,
but wrappers that treat stderr as failure can retry completed exec runs.
This looks like a likely recent regression from
[#18882](https://github.com/openai/codex/pull/18882), which routed live
thread writes through `ThreadStore` and added the shutdown-time live
writer close. I have not bisected this, so the PR treats #18882 as the
likely source based on the affected shutdown code path rather than a
proven first-bad commit.
## What Changed
`ShutdownComplete` now bypasses rollout persistence after thread
shutdown and is delivered directly to clients. The shutdown path still
records the protocol event in the rollout trace before delivery,
preserving trace visibility without attempting a post-shutdown
thread-store append.
The change also adds a regression test with the in-memory thread store
to assert that shutdown creates and shuts down the live thread without
appending another item after shutdown.
Addresses #19856
## Summary
- Clarifies that external code contributions are invitation only.
- Points contributors to `docs/contributing.md` for the full policy
instead of using the previous warning phrasing.
## Summary
Adds the standard Codex `User-Agent` to shared default headers so the
responses-api WS handshake carries the same client OS and version
context as HTTP requests.
## Testing
- `cargo test -p codex-core
build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
- `cargo test -p codex-core --test all responses_websocket`
## Summary
AgentIdentity auth previously registered the process task lazily behind
a `OnceCell`. That meant the auth object could be constructed before its
runtime task binding was known.
This PR makes AgentIdentity auth load the runtime task at auth load time
and stores the resulting process task id directly on the auth object.
The model-provider call path can then read a concrete task id instead of
handling a missing lazy value.
## Stack
1. [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762) (merged)
2. **This PR:** [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [fix: configure AgentIdentity AuthAPI base
URL](https://github.com/openai/codex/pull/19904)
4. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)
## Important call sites
| Area | Change |
| --- | --- |
| `AgentIdentityAuth::load` | Registers the process task during auth
loading and stores `process_task_id`. |
| `CodexAuth::from_agent_identity_jwt` | Awaits AgentIdentity auth
loading. |
| model-provider auth | Reads a concrete `process_task_id` instead of an
optional lazy value. |
| AgentIdentity auth tests | Mock task registration now covers eager
runtime allocation. |
## Design decisions
AgentIdentity auth now treats task registration as part of constructing
a usable auth object. That matches how callers use the value: once auth
is present, the model-provider path expects the task-scoped assertion
data to be ready.
## Testing
Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
- Marks `/title` and `/statusline` as available during active tasks.
- Extends the existing slash-command availability test coverage to
include these commands alongside `/goal`.
## Why
`ThreadSessionState` is the TUI's cached view of an app-server session.
To make `PermissionProfile` the canonical runtime permissions model,
cached thread sessions need to always have a profile instead of treating
the profile as an optional supplement to a legacy `sandbox` response
field.
The main compatibility concern is older app-server v2 lifecycle
responses that only include `sandbox` and omit `permissionProfile`:
- `thread/start` -> `ThreadStartResponse.sandbox`
- `thread/resume` -> `ThreadResumeResponse.sandbox`
- `thread/fork` -> `ThreadForkResponse.sandbox`
Those responses must still hydrate correctly when the TUI is pointed at
an older app-server. This PR converts the legacy `sandbox` value into a
`PermissionProfile` immediately at response ingestion time, using the
response `cwd`, so cached sessions do not carry an optional profile that
can later reinterpret cwd-bound grants against a different thread cwd.
This fallback is intentionally boundary compatibility. The follow-up PRs
in this stack continue the cleanup by making `SessionConfiguredEvent`
profile-only, deriving sandbox projections from snapshots only when an
API still needs them, and then removing `sandbox_policy` from
`ThreadSessionState`.
## What Changed
- Makes `ThreadSessionState.permission_profile` required.
- Converts legacy app-server response `sandbox` values into a
`PermissionProfile` at ingestion time using the response cwd.
- Ensures `thread/read` hydration does not reuse a primary session
profile that may be anchored to a different cwd; it uses the active
widget permission settings for the read thread fallback instead of
reusing cached primary-session permissions.
- Keeps the app-server request path unchanged: embedded sessions send
profiles, while remote sessions continue using legacy sandbox overrides
for compatibility.
## Verification
- `cargo test -p codex-tui thread_read --lib`
- `cargo test -p codex-tui
permission_settings_sync_preserves_active_profile_only_rules --lib`
- `cargo test -p codex-tui
resume_response_restores_turns_from_thread_items --lib`
- `cargo test -p codex-tui thread_session_state::tests --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19773).
* #19900
* #19899
* #19776
* #19775
* #19774
* __->__ #19773
## Summary
- Remove `ghost_snapshot` / `GhostCommit` from the Responses API surface
and generated SDK/schema artifacts.
- Keep legacy config loading compatible, but make undo a no-op that
reports the feature is unavailable.
- Clean up core history, compaction, telemetry, rollout, and tests to
stop carrying ghost snapshot items.
## Testing
- Unit tests passed for `codex-protocol`, `codex-core` targeted undo and
compaction flows, `codex-rollout`, and `codex-app-server-protocol`.
- Regenerated config and app-server schemas plus Python SDK artifacts
and verified they match the checked-in outputs.
## Why
Recent `main` CI had repeated flakes in the plugin fixture tests:
- `codex-core::all
suite::plugins::explicit_plugin_mentions_inject_plugin_guidance` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
- `codex-core::all suite::plugins::plugin_mcp_tools_are_listed` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
The failures were in the same plugin/MCP fixture family: assertions
expected sample plugin guidance or tool inventory, but the test could
observe the session before the sample MCP server had finished startup.
## Root Cause
`explicit_plugin_mentions_inject_plugin_guidance` submitted the user
turn immediately after constructing the session. MCP startup is
asynchronous, so on a slower or busier CI runner the prompt could be
built before the sample plugin MCP server had reported its tools. That
made the test depend on scheduler timing rather than the fixture being
ready.
`plugin_mcp_tools_are_listed` already needed the same readiness
condition, but its wait logic was local to that test.
## What Changed
- Added a shared `wait_for_sample_mcp_ready` helper for the plugin
fixture tests.
- Wait for `McpStartupComplete` before submitting the explicit plugin
mention turn.
- Reuse the same readiness helper in the MCP tool-listing test.
## Why This Should Be Reliable
The tests now wait for the explicit readiness signal from the sample MCP
server before asserting guidance or tools derived from that server. This
removes the startup race while still exercising the real fixture path,
so the assertions should only run after the plugin inventory is
deterministic.
## Verification
- `cargo test -p codex-core --test all plugins::`
- GitHub CI for this PR is passing.
## Summary
- Extracted the shared filesystem types and `ExecutorFileSystem` trait
into a new `codex-file-system` crate
- Switched `codex-config` and `codex-git-utils` to depend on that crate
instead of `codex-exec-server`
- Kept `codex-exec-server` re-exporting the same API for existing
callers
## Testing
- Ran `cargo test -p codex-file-system`
- Ran `cargo test -p codex-git-utils`
- Ran `cargo test -p codex-config`
- Ran `cargo test -p codex-exec-server`
- Ran `just fix -p codex-file-system`, `just fix -p codex-git-utils`,
`just fix -p codex-config`, `just fix -p codex-exec-server`
- Ran `just fmt`
- Updated and verified the Bazel module lockfile
## Summary
Disallow fileParams metadata for custom MCPs
Restricts Codex openai/fileParams handling to the first-party codex_apps
MCP server. Custom MCP servers may still advertise the metadata, but
Codex now ignores it for upload rewriting, preventing non-Apps tools
from receiving signed OpenAI file refs for local paths. Added a
regression test for the allowed and denied cases.
## Why
This continues the permissions migration by making legacy config default
resolution produce the canonical `PermissionProfile` first. The legacy
`SandboxPolicy` projection should stay available at compatibility
boundaries, but config loading should not create a legacy policy just to
immediately convert it back into a profile.
Specifically, when `default_permissions` is not specified in
`config.toml`, instead of creating a `SandboxPolicy` in
`codex-rs/core/src/config/mod.rs` and then trying to derive a
`PermissionProfile` from it, we use `derive_permission_profile()` to
create a more faithful `PermissionProfile` using the values of
`ConfigToml` directly.
This also keeps the existing behavior of `sandbox_workspace_write` and
extra writable roots after #19841 replaced `:cwd` with `:project_roots`.
Legacy workspace-write defaults are represented as symbolic
`:project_roots` write access plus symbolic project-root metadata
carveouts. Extra absolute writable roots are still added directly and
continue to get concrete metadata protections for paths that exist under
those roots.
The platform sandboxes differ when a symbolic project-root subpath does
not exist yet.
* **Seatbelt** can encode literal/subpath exclusions directly, so macOS
emits project-root metadata subpath policies even if `.git`, `.agents`,
or `.codex` do not exist.
* **bwrap** has to materialize bind-mount targets. Binding `/dev/null`
to a missing `.git` can create a host-visible placeholder that changes
Git repo discovery. Binding missing `.agents` would not affect Git
discovery, but it would still create a host-visible project metadata
placeholder from an automatic compatibility carveout. Linux therefore
skips only missing automatic `.git` and `.agents` read-only metadata
masks; missing `.codex` remains protected so first-time project config
creation goes through the protected-path approval flow. User-authored
`read` and `none` subpath rules keep normal bwrap behavior, and `none`
can still mask the first missing component to prevent creation under
writable roots.
## What Changed
- Adds profile-native helpers for legacy workspace-write semantics,
including `PermissionProfile::workspace_write_with()`,
`FileSystemSandboxPolicy::workspace_write()`, and
`FileSystemSandboxPolicy::with_additional_legacy_workspace_writable_roots()`.
- Makes `FileSystemSandboxPolicy::workspace_write()` the single legacy
workspace-write constructor so both `from_legacy_sandbox_policy()` and
`From<&SandboxPolicy>` include the project-root metadata carveouts.
- Removes the no-carveout `legacy_workspace_write_base_policy()` path
and the `prune_read_entries_under_writable_roots()` cleanup that was
only needed by that split construction.
- Adds `ConfigToml::derive_permission_profile()` for legacy sandbox-mode
fallback resolution; named `default_permissions` profiles continue
through the permissions profile pipeline instead of being reconstructed
from `sandbox_mode`.
- Updates `Config::load()` to start from the derived profile, validate
that it still has a legacy compatibility projection, and apply
additional writable roots directly to managed workspace-write filesystem
policies.
- Updates Linux bwrap argument construction so missing automatic
`.git`/`.agents` symbolic project-root read-only carveouts are skipped
before emitting bind args; missing `.codex`, user-authored `read`/`none`
subpath rules, and existing missing writable-root behavior are
preserved.
- Adds coverage that legacy workspace-write config produces symbolic
project-root metadata carveouts, extra legacy workspace writable roots
still protect existing metadata paths such as `.git`, and bwrap skips
missing `.git`/`.agents` project-root carveouts while preserving missing
`.codex` and user-authored missing subpath rules.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19772).
* #19776
* #19775
* #19774
* #19773
* __->__ #19772
## Why
The remaining review, interrupt, fuzzy search, feedback, and git-diff
handlers still had local send-error branches that obscured otherwise
simple request handling. This final slice flattens those handlers
without changing the public protocol behavior.
## What Changed
- Streamlined review start, turn interrupt, fuzzy search session,
feedback upload, and git diff handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Converted validation and upload failures into returned JSON-RPC errors
where that avoids nested `send_error`/`return` blocks.
- Left unrelated sandbox setup and notification code untouched.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::review --
--test-threads=1`
## Summary
- Add the `enable_mcp_apps` feature flag to the `codex-features`
registry
- Keep it under development and disabled by default
## Testing
- Unit tests for `codex-features` passed
- Formatting passed
Implements #18162
This updates the TUI terminal title to show an explicit action-required
state when Codex is blocked on user approval or input. The terminal
title now uses the activity title item to cover both active work and
blocked-on-user states, while still accepting the legacy spinner config
value.
Changes
- Rename the terminal title item from `spinner` to `activity` while
preserving legacy config compatibility
- Show `[ ! ] Action Required `while approval or input overlays are
active, with a blinking `[ . ]` alternate state
- Suppress the normal working spinner while Codex is blocked on user
action
- Add targeted coverage for action-required title behavior and legacy
title-item parsing
Testing
- Trigger an approval or input modal and confirm the tab title
alternates between `[ ! ] Action Required` and `[ . ] Action Required`
- Disable the activity title item and confirm the action-required title
does not appear
- Resolve the prompt and confirm the title returns to the normal
spinning/idel state
https://github.com/user-attachments/assets/e9ecc530-a6be-4fd7-b9a6-d550a790eb2c
## Why
Turn and realtime handlers had nested validation and send-error branches
that made the request path longer than the behavior warranted. This
slice keeps the same request semantics while letting the handlers return
errors from the failing step.
## What Changed
- Streamlined turn start, injected item, and turn steer request handling
in `codex-rs/app-server/src/codex_message_processor.rs`.
- Applied the same result-returning shape to realtime session response
handlers.
- Preserved existing request validation and thread-manager interactions.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::turn_start --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::turn_steer --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::thread_inject_items --
--test-threads=1`
## Why
Thread resume and fork had some of the deepest error-handling
indentation in this area because helpers emitted request errors
directly. Returning those failures gives the handlers a single request
boundary while preserving the async pending-resume behavior.
## What Changed
- Converted thread resume helpers in
`codex-rs/app-server/src/codex_message_processor.rs` to return `Result`
values for validation and view loading failures.
- Applied the same pattern to thread fork request handling.
- Simplified pending resume error construction by using the shared
JSON-RPC error helpers.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::thread_resume --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::thread_fork --
--test-threads=1`
Records cancelled inference streams when Codex stops consuming a
provider response before `response.completed`, preserving complete
output items observed before cancellation.
Also closes still-running inference calls when the owning turn ends, so
reduced rollout traces do not leave stale `Running` inference nodes.
Covered by focused reducer coverage and a core stream-drop test for
partial output preservation.
## Why
The thread read/list handlers mostly assemble views, but their error
handling was interleaved with response emission. Returning view-building
errors from the helper path keeps those handlers focused on data
assembly.
## What Changed
- Added a small mapper for `ThreadReadViewError` to JSON-RPC errors in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Streamlined thread list, loaded-thread, read, turn-list, and summary
handlers to produce result values for the request boundary.
- Kept the existing invalid-request vs internal-error distinctions for
missing or unreadable thread data.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all conversation_summary --
--test-threads=1`
**note**: a large chunk of this diff comes from regenerating Python
types after app-server schema changes on `main`.
This is PR 3 of 3 for the Python SDK PyPI publishing split. PR #18862
refreshed the generated SDK surface, and PR #18865 made the runtime
package publishable as `openai-codex-cli-bin`; this final PR makes the
SDK package publishable as `openai-codex-app-server-sdk` and pins both
packages to the same Codex runtime version.
The key idea is that the published SDK version is the Codex runtime
version. That one version now drives the SDK package version, the exact
runtime dependency, the client version reported by the SDK, and the
bootstrap runtime pin. This keeps release-time versioning in one lane
instead of scattering checked-in literals through the package.
## What changed
- Rename the SDK distribution from `codex-app-server-sdk` to
`openai-codex-app-server-sdk` for conflict-free PyPI publishing.
- Use `stage-sdk --codex-version ...` with one Codex version for both
the SDK package version and exact `openai-codex-cli-bin` dependency.
- Preserve hidden legacy `--runtime-version` / `--sdk-version` args only
to reject mismatched versions during staging.
- Map PEP 440 package versions back to Codex release tags for runtime
setup downloads, e.g. `0.116.0a1` -> `rust-v0.116.0-alpha.1`.
- Derive `codex_app_server.__version__`, the default
`AppServerConfig.client_version`, and
`_runtime_setup.pinned_runtime_version()` from the SDK package/project
version instead of hardcoding duplicate version strings.
- Carry the current generated SDK refresh from `main` so
`generate-types` stays clean after recent app-server schema changes.
- Update `sdk/python/uv.lock` for the renamed editable package.
## Validation
- `uv run --extra dev pytest` in `sdk/python` -> 59 passed, 37 skipped.
- Targeted `uv run ruff check` for the touched SDK files.
- `git diff --check`.
- Staged runtime with `--codex-version rust-v0.116.0-alpha.1
--platform-tag macosx_11_0_arm64`.
- Staged SDK with `--codex-version rust-v0.116.0-alpha.1`.
- Built runtime wheel, SDK wheel, and SDK sdist.
- `twine check /tmp/codex-python-pr3-build/dist/*` -> passed.
- Clean venv smoke installed `openai-codex-app-server-sdk==0.116.0a1`
from local dist and pulled `openai-codex-cli-bin==0.116.0a1`.
- Smoke imports passed for `Codex` and `bundled_codex_path()`.
## Summary
- shard `//codex-rs/exec:exec-all-test` into 8 Bazel shards
- keep the existing `no-sandbox` test tag unchanged
## Why
The Windows Bazel lane has been timing out this aggregated integration
test target at the default 300s test timeout. The target runs the
combined `codex-rs/exec/tests/all.rs` integration binary; sharding lets
Bazel split the Rust test cases across parallel test actions instead of
running the whole integration suite as one long action.
## Validation
Not run locally, per the Codex repo workflow for development-phase
changes.
Co-authored-by: Codex <noreply@openai.com>
## Why
Thread mutation handlers had many short error branches whose only job
was to emit a JSON-RPC error and stop. This slice keeps those errors
visible, but lets each handler build a result and return early from
validation helpers instead of nesting the main path.
## What Changed
- Streamlined thread archive/unarchive, rename, memory, metadata,
rollback, compact, background terminal, shell, and guardian handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Reused shared JSON-RPC error constructors in
`codex-rs/app-server/src/bespoke_event_handling.rs` for rollback-related
request failures.
- Preserved direct `send_error` calls where they remain the simplest
boundary for pending async event responses.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::thread_rollback --
--test-threads=1`
### Summary
- `thread/list` filtered filesystem results already overlay state DB
metadata, but the existing merge only filled missing git fields.
- Prefer non-null SQLite git metadata over stale non-null rollout values
so persisted branch/SHA/origin updates are reflected in filtered thread
lists.
- Update the focused merge test to cover stale filesystem git metadata
being replaced by state-backed values.
### Testing
now getting expected icons
<img width="426" height="913" alt="Screenshot 2026-04-27 at 1 45 45 PM"
src="https://github.com/user-attachments/assets/027fb7e7-f54d-4353-8423-cb76f3c8f5ac"
/>
## Why
The thread start handler mixed request validation, thread construction,
dynamic-tool validation, and JSON-RPC error emission in one nested flow.
Returning request errors from the helper path makes the successful setup
path easier to follow.
## What Changed
- Reworked `thread/start` handling in
`codex-rs/app-server/src/codex_message_processor.rs` so helper methods
return `Result` and the handler emits one result.
- Moved dynamic-tool validation failures into returned JSON-RPC errors
instead of local `send_error` branches.
- Preserved the existing thread creation and task-spawning behavior.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::dynamic_tools --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::turn_start --
--test-threads=1`
## Why
The experimental `PermissionProfile` API had both `:cwd` and
`:project_roots` special filesystem paths, which made the permission
root ambiguous. This PR removes the unstable `current_working_directory`
special path before the permissions API is stabilized, so callers use
`:project_roots` for symbolic project-root access.
## What changed
- Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol
and app-server protocol models, plus regenerated app-server
JSON/TypeScript schemas.
- Replaces internal `:cwd` permission entries with `:project_roots`
entries.
- Keeps the existing cwd-update behavior for legacy-shaped
workspace-write profiles, while removing the deleted
`CurrentWorkingDirectory` case from that compatibility path.
- Keeps `PermissionProfile::workspace_write()` as the reusable symbolic
workspace-write helper, with docs noting that `:project_roots` entries
resolve at enforcement time.
- Updates app-server docs/examples and approval UI labeling to stop
advertising `:cwd` as a permission token.
## Compatibility
Persisted rollout items may contain the old
`{"kind":"current_working_directory"}` tag from earlier experimental
`permissionProfile` snapshots. This PR keeps that tag as a
deserialize-only alias for `ProjectRoots { subpath: None }`, while
continuing to serialize only the new `project_roots` tag.
## Follow-up
This PR intentionally does not introduce an explicit project-root set on
`SessionConfiguration` or runtime sandbox resolution. Today, the
resolver still uses the active cwd as the single implicit project root.
A follow-up should model project roots separately from tool cwd so
`:project_roots` entries can resolve against the configured project
roots, and resolve to no entries when there are no project roots.
## Verification
- `cargo test -p codex-protocol permissions:: --lib`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-sandboxing -p codex-exec-server --lib`
- `cargo test -p codex-core session_configuration_apply_ --lib`
- `cargo test -p codex-app-server
command_exec_permission_profile_project_roots_use_command_cwd --test
all`
- `cargo test -p codex-tui
thread_read_session_state_does_not_reuse_primary_permission_profile
--lib`
- `cargo test -p codex-tui
preset_matching_accepts_workspace_write_with_extra_roots --lib`
- `cargo test -p codex-config --lib`
## Why
Fixes#19702.
The TUI markdown renderer could visually attach the next list marker to
a fenced code block inside the previous list item, even when the source
markdown included a blank line before the next item. That made
block-heavy loose lists harder to read, while the desired behavior is
still to keep simple lists compact.
## What changed
- Track whether the current rendered list item contains a code block.
- Preserve one blank separator before the following list marker only
when the previous item contained a code block.
- Add regression coverage for both paths: code-block list items keep the
separator, and simple loose list items stay compact.
## Verification
- `cargo test -p codex-tui markdown_render`
I also manually verified that the bug exists before and is fixed after.
## Before
<img width="437" height="240" alt="Screenshot 2026-04-26 at 1 19 01 PM"
src="https://github.com/user-attachments/assets/3bc9d64d-2dba-40d9-9d6b-a1d0b3c0f728"
/>
## After
<img width="410" height="269" alt="Screenshot 2026-04-26 at 1 18 54 PM"
src="https://github.com/user-attachments/assets/19c15bee-da32-455e-a7cb-e05eb85f4ea0"
/>
## Why
Fixes#7744. Approval modals can currently appear while the user is
typing ahead in the TUI composer, which lets plain letters like `y` or
`a` get consumed as approval shortcuts instead of staying in the draft
input.
## What changed
- Track recent composer typing activity in `bottom_pane/mod.rs`.
- Delay new approval overlays for 1 second while the composer is active,
keeping delayed requests queued until the user is idle.
- Preserve the existing active-overlay behavior so approvals that arrive
while an approval modal is already open are still queued into that
overlay.
- Prune delayed approvals when app-server resolution says the request
has already been handled.
## Verification
Added unit coverage for immediate approvals, delayed approvals, idle
deadline reset, typed shortcut letters staying in the composer, shortcut
handling after the delay, and resolved delayed-request pruning.
Focused `codex-tui` test groups pass locally. The full `cargo test -p
codex-tui` run currently aborts in
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`;
that same test also fails when run alone with the same stack overflow.
Manual reviewer check:
1. Start the TUI from the repo root:
```bash
RUST_LOG=trace just codex \
-c log_dir=<temp-log-dir> \
--ask-for-approval untrusted \
--sandbox workspace-write
```
2. Submit this prompt:
```text
create a file text.txt on my desktop
```
3. While the agent is preparing the approval request, immediately type
text such as `ya this should stay in the composer`.
4. Confirm the typed-ahead `y`/`a` remains in the composer instead of
approving the request.
5. Stop typing for about 1 second; the approval modal should then
appear.
6. Once the modal is visible, press `y` and confirm the approval
shortcut works normally.
## Why
`codex resume` regressed after
[#18502](https://github.com/openai/codex/pull/18502) changed the default
`thread/list` scan-and-repair path for metadata-filtered listings. The
TUI resume picker uses `thread/list` with source/provider/cwd filters
and `useStateDbOnly: false`, which is the intended
correctness-preserving mode: it should still consult the filesystem so
healthy, missing, or stale SQLite state can be repaired.
The regression was that #18502 made that filtered, filesystem-backed
path call `reconcile_rollout` for every filesystem hit, and then call it
again for each SQLite hit. When `reconcile_rollout` does not already
have extracted rollout items, it falls back to loading the full JSONL
rollout. That changed the resume picker’s first page from a cheap
rollout-head scan plus SQLite read-repair into full-file reads for large
sessions, so a few long threads could dominate TUI startup/resume
latency.
This change addresses the regression by keeping `useStateDbOnly: false`
on the correctness-preserving path while avoiding unnecessary full JSONL
reads for rows the filesystem scan has already validated.
Source/provider/cwd filters can be decided from rollout-head metadata,
so non-search resume listings only need the lightweight read-repair path
for filesystem hits. Full reconciliation is still used for DB-only
filtered rows because those can be stale false positives, and for search
listings because search can depend on title metadata that may require
scanning the full rollout.
This fixes#19483.
## What changed
- For non-search filtered listings, repair filesystem hits with the
lightweight `read_repair_rollout_path` path instead of full
`reconcile_rollout`.
- Track thread IDs proven by the filesystem scan and only fully
reconcile SQLite-filtered hits that the filesystem scan did not return,
preserving stale-DB false-positive cleanup without full-reading every
healthy rollout.
- Leave search listings on full reconciliation, since search depends on
full title metadata rather than only source/provider/cwd metadata from
the rollout head.
## Verification
- `cargo test -p codex-rollout list_threads`
- `cargo test -p codex-app-server thread_list`
Clamp original-detail image patch estimates to the current 10k patch
budget so large images cannot inflate local context accounting without
bound. Add regression coverage for an over-budget image.
Fixesopenai/codex#19806.
fixes#19486
### Problem
Right now dynamic deferred tools are filtered at normal-turn prompt
building time, rather than upstream while building the `ToolRouter`
itself. This causes issues because dynamic deferred tools are then
wrongly included in the router's `model_visible_specs`, which is what
the compaction request-building flow relies on.
### Fix
Move the dynamic deferred tool filtering to `ToolRouter` creation time
to solve this problem for every request that relies on `ToolRouter` for
`model_visible_specs`, which solves the issue generically.
### Tests
Added unit + integration tests to ensure dynamic deferred tools are
omitted from `model_visible_specs` and compaction request respectively.
Tested against live `/compact` endpoint; raw deferred dynamic tools
without `tool_search` returned `400` (current bug), while the filtered
payload (this fix) returns `200`.
## Why
Account login/logout and command exec handlers were doing local error
sends in the middle of each handler. That made these request flows
branch heavily even though most of the logic is validate, perform the
operation, and return the response.
## What Changed
- Converted ChatGPT/API-key login, login cancel, logout, rate-limit, and
add-credit handlers in
`codex-rs/app-server/src/codex_message_processor.rs` to compute `Result`
values and send them once at the request boundary.
- Applied the same shape to command exec start/write/resize/terminate
handlers.
- Kept side-effect notifications in the same places after successful
request handling.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::account --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::command_exec --
--test-threads=1`
## Why
All Bazel CI jobs are currently blocked in the `setup-bazelisk` step
while trying to download Bazelisk.
[`bazelbuild/setup-bazelisk`](https://github.com/bazelbuild/setup-bazelisk)
is archived, and its README now recommends migrating to
[`bazel-contrib/setup-bazel`](https://github.com/bazel-contrib/setup-bazel),
so leaving our workflows on the archived action leaves CI exposed to
exactly this sort of outage.
Because `v8-canary` now consumes the shared local `setup-bazel-ci`
action, that workflow also needs to trigger when the action changes.
Without that follow-up, Bazel bootstrap regressions specific to the V8
canary path could be skipped by the workflow path filters.
## What Changed
- Switched `.github/actions/setup-bazel-ci/action.yml` from
`bazelbuild/setup-bazelisk` to `bazel-contrib/setup-bazel`, pinned to
`0.19.0`.
- Left `bazelisk-version` unset so GitHub-hosted runners can use their
preinstalled Bazelisk instead of downloading `1.x` at job start.
- Updated `.github/workflows/rusty-v8-release.yml` and
`.github/workflows/v8-canary.yml` to use the shared `setup-bazel-ci`
action instead of referencing `setup-bazelisk` directly.
- Added `.github/actions/setup-bazel-ci/**` to the `pull_request` and
`push` path filters in `.github/workflows/v8-canary.yml` so changes to
the shared Bazel setup action still run the canary workflow.
- Kept the existing repository-cache and Windows-specific Bazel setup
logic intact.
This keeps Bazel version selection anchored by `.bazelversion` while
removing the failing dependency on the archived setup action.
## Verification
- Searched `.github/` to confirm there are no remaining `setup-bazelisk`
references.
- Parsed the updated workflow and action YAML locally with Ruby's
`YAML.load_file`.
## Why
The `build-test` workflow stages a representative `codex` npm tarball by
asking `scripts/stage_npm_packages.py` to look up a past `rust-release`
run for a hardcoded release version. That started failing in CI because
the representative version in `.github/workflows/ci.yml` was stale:
- the workflow was still using `0.115.0`
- `stage_npm_packages.py` resolves native artifacts by looking for a
`rust-release` run on the `rust-v<version>` branch
- that lookup no longer found a matching run for `rust-v0.115.0`, so the
smoke test failed before it could stage the package
This PR makes that smoke test depend on a known-good recent release run
instead of an older branch lookup that is no longer reliable.
## What Changed
- Updated the representative release version in
`.github/workflows/ci.yml` from `0.115.0` to `0.125.0`.
- Added an explicit `WORKFLOW_URL` pointing at a recent successful
`rust-release` run:
`https://github.com/openai/codex/actions/runs/24901475298`.
- Passed that URL to `scripts/stage_npm_packages.py` via
`--workflow-url` so the job can reuse the expected native artifacts
directly instead of relying on `gh run list --branch rust-v<version>` to
discover them.
That keeps the npm staging smoke test representative while making it
less sensitive to older release branch history disappearing from the
GitHub Actions lookup path.
## Verification
- Inspected the failing CI log from `build-test` and confirmed the
failure came from `scripts/stage_npm_packages.py` being unable to
resolve `rust-v0.115.0`.
- Confirmed that
`https://github.com/openai/codex/actions/runs/24901475298` is a
successful `rust-release` run for `rust-v0.125.0`.
## Summary
Auth loading used to expose synchronous construction helpers in several
places even though some auth sources now need async work. This PR makes
the auth-loading surface async and updates the callers to await it.
This is intentionally only plumbing. It does not change how
AgentIdentity tokens are decoded, how task runtime ids are allocated, or
how JWT signatures are verified.
## Stack
1. **This PR:** [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762)
2. [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)
## Important call sites
| Area | Change |
| --- | --- |
| `codex-login` auth loading | `CodexAuth` and `AuthManager`
construction paths now await auth loading. |
| app-server startup | Auth manager construction is awaited during
initialization. |
| CLI/TUI/exec/MCP/chatgpt callers | Existing auth-loading calls now
await the same behavior. |
| cloud requirements storage loader | The loader becomes async so it can
share the same auth construction path. |
| auth tests | Tests that load auth now run in async contexts. |
## Testing
Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
## Why
The plugin, app, and skills handlers had a lot of repeated
`send_error`/`return` branches that made the success path hard to scan.
This slice keeps behavior the same while moving fallible steps into
local response-producing helpers, so the request boundary can send one
result.
## What Changed
- Converted plugin list/install/uninstall handlers in
`codex-rs/app-server/src/codex_message_processor/plugins.rs` to return
`Result<*Response, JSONRPCErrorError>` from helper methods and call
`send_result` once.
- Added local error-mapping helpers for plugin install/uninstall and
marketplace failures.
- Applied the same mechanical shape to app list, skills list/config, and
marketplace add/remove/upgrade handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::app_list --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::plugin_ --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::skills_list --
--test-threads=1`
## Why
Fixes#19632.
When a delegated agent requests approval for an in-progress file change,
the parent TUI handles that request from an inactive thread. The app
server already sent the `FileChange` item with the proposed diff, but
the inactive-thread approval path was not recovering and rendering it
the same way as the active-thread path.
The result was an inconsistent approval prompt: main-thread edits show a
normal patch preview history item before the approval modal, while
delegated edits did not show that preview in the transcript flow.
## What Changed
- Recover buffered or historical `FileChange` item changes when building
inactive-thread file-change approval requests.
- Reuse the app-server file-change conversion helper for both live
transcript rendering and inactive-thread approvals.
- Render recovered delegated patches as a normal patch preview history
cell before the approval modal.
- Keep apply-patch approval modals focused on the decision prompt and
optional metadata; they do not render a synthetic command line or embed
the diff body.
## Manual Repro And Verification
I manually reproduced the issue using a file under `~/Desktop` so the
write would require approval.
Before the fix:
1. Ask the main thread: `Use apply_patch, not shell redirection or
Python, to create ~/Desktop/bug1.txt with three short lines.`
2. Observe the expected TUI shape: the transcript shows a normal patch
preview such as `• Added ~/Desktop/bug1.txt (+N -0)` above the approval
modal, and the modal contains only the approval prompt/options without a
synthetic command line.
3. Ask for the delegated path: `Spawn a worker. Have it use apply_patch,
not shell redirection or Python, to create ~/Desktop/bug1.txt with four
short lines.`
4. Observe the delegated approval is inconsistent: the parent view does
not render the proposed patch as the normal transcript preview before
the modal, so the diff context is missing from the stream or appears
inside the modal instead of in the history flow.
After the fix:
1. Repeat the delegated worker prompt with `apply_patch`.
2. Confirm the parent view renders the same normal patch preview history
cell (`• Added ~/Desktop/bug1.txt (+N -0)` plus the diff) immediately
before the approval modal.
3. Confirm the approval modal remains focused on the decision prompt.
For delegated approvals it may show the worker thread label, but it
should not show a `$ apply_patch` command line or embed the diff body in
the modal.
## Why
`!` shell commands are currently surfaced as "Bash mode", which is
misleading for users running shells such as PowerShell or zsh. Those
commands also bypass the persistent prompt history path, so they cannot
be recalled after starting a new session.
Fixes#19613.
## What changed
- Rename the TUI footer label and related test wording from "Bash mode"
to "Shell mode".
- Persist accepted `!` shell commands to prompt history with the leading
`!`, so recall restores the composer into shell mode across sessions.
- Add coverage for immediate and queued shell-command submissions
emitting the prompt-history update.
## Verification
- `cargo test -p codex-tui bang_shell`
- `cargo test -p codex-tui shell_command_uses_shell_accent_style`
- `cargo test -p codex-tui footer_mode_snapshots`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`
Manually verified fix after confirming presence of bug prior to fix.
## Why
Fixes#19508.
In a fresh TUI session, pressing `Esc` twice entered the rewind
transcript overlay even though there was no user message to rewind to.
That produced an empty header-only transcript view and exposed a rewind
flow that could not select a valid target.
## What changed
The backtrack flow now checks whether a user-message rewind target
exists before opening the transcript preview. If no target exists, Codex
stays in the main TUI and shows `No previous message to edit.` instead
of opening an empty overlay.
The same guard applies when starting rewind preview from the transcript
overlay, and the first `Esc` no longer advertises the “edit previous
message” hint when there is no previous message available.
Snapshot coverage was added for the unavailable rewind info message,
along with a small target-detection test.
## Why
Phase 2 can now claim the global consolidation lock on startup even when
the git-backed memory workspace is already clean. The clean-workspace
path still finalized through the normal Phase 2 success path, which
clears and re-marks `selected_for_phase2` rows. That made no-op startups
perform avoidable writes to `stage1_outputs`, creating unnecessary DB
I/O and contention when no memory files changed.
## What Changed
- Added a preserving-selection Phase 2 finalizer in `codex-state` that
only marks the global job row as succeeded.
- Kept the existing `mark_global_phase2_job_succeeded` behavior for real
consolidation runs, where the selected Phase 2 snapshot must be
rewritten.
- Switched the `succeeded_no_workspace_changes` branch in
`core/src/memories/phase2.rs` to use the preserving-selection finalizer.
- Added a regression test that installs a SQLite trigger on
`stage1_outputs` and verifies the clean finalizer performs zero updates
there.
## Testing
- `cargo test -p codex-state`
- `cargo test -p codex-core memories::tests::phase2`
## Why
The Phase 2 memories job row is only the global lock for the git-backed
memory workspace. Manual memory edits do not enqueue new Stage 1 work,
so a Phase 2 row with `retry_remaining = 0` could be skipped before the
worker ever claimed the lock and generated `phase2_workspace_diff.md`.
That left workspace-only changes unconsolidated after repeated failures,
even when retry backoff had elapsed and the filesystem had real diffable
work.
## What Changed
- Allow `try_claim_global_phase2_job` to claim the Phase 2 lock after
the retry budget is exhausted, while still respecting active `retry_at`
backoff and fresh running leases.
- Treat `SkippedRetryUnavailable` for Phase 2 as backoff-only, and
update the outcome docs to match.
- Clamp Phase 2 retry bookkeeping at zero when failed attempts are
recorded.
## Verification
- Added
`phase2_global_lock_can_be_claimed_after_retry_budget_is_exhausted` to
cover the exhausted-budget lock claim path.
- Ran `cargo test -p codex-state`.
## Why
This PR make the `morpheus` agent (memory phase 2) use a git diff to
start it's consolidation. The workflow is the following:
1. The agent acquire a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select max N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff in `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a new `.git` and make one single commit on it. We do this
because we don't want to preserve history through `.git` and this is
cheap anyway
9. We release the lock
On top of this, we keep the retry policies etc etc
The goals of this new workflow are:
* Better support of any memory extensions such as `chronicle`
* Allow the user to manually edit memories and this will be considered
by the phase 2 agent
As a follow-up we will need to add support for user's edition while
`morpheus` is running
## What Changed
- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.
## Verification
- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`features.multi_agent_v2.max_concurrent_threads_per_session` is meant to
be the MultiAgentV2-specific session thread cap: it counts the root
thread and all open subagent threads. The previous implementation kept
this surface tied to `agents.max_threads`, which made it a global
subagent-only cap and allowed the legacy setting to coexist with
MultiAgentV2.
## What Changed
- Added `max_concurrent_threads_per_session` to
`[features.multi_agent_v2]` with default `4`.
- Removed the `[agents] max_concurrent_threads_per_session` alias to
`agents.max_threads`.
- When MultiAgentV2 is enabled, reject `agents.max_threads` and derive
the existing internal subagent slot limit as
`max_concurrent_threads_per_session - 1`.
- Regenerated `core/config.schema.json` and added coverage for the new
config semantics.
## Result
```
➜ codex git:(jif/clean-multi-agent-v2-config) codex -c features.multi_agent_v2.enabled=true -c features.multi_agent_v2.max_concurrent_threads_per_session=3
╭────────────────────────────────────────────────────╮
│ >_ OpenAI Codex (v0.0.0) │
│ │
│ model: gpt-5.5 xhigh fast /model to change │
│ directory: ~/code/codex │
╰────────────────────────────────────────────────────╯
Tip: Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager.
› Can you try to spawn 4 agents
• I’ll try to start four lightweight agents at once and report exactly what the runtime accepts.
• Spawned Russell [no-apps] (gpt-5.5 xhigh)
└ Spawn probe 1: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Spawned Descartes [no-apps] (gpt-5.5 xhigh)
└ Spawn probe 2: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Agent spawn failed
└ Spawn probe 3: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Agent spawn failed
└ Spawn probe 4: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• The runtime accepted the first two and rejected the next two with agent thread limit reached. I’m checking whether the two accepted probes have returned cleanly, then I’ll close them if needed.
```
---------
Co-authored-by: Codex <noreply@openai.com>
Problem: Maintainers need a shared way to run Codex GitHub issue digests
without copying large prompts or relying on manual GitHub page
summaries.
Solution: Add a reusable codex-issue-digest skill with a deterministic
GitHub collector, owner/all-label windows, reaction-aware activity
metrics, scaled attention markers, and focused tests.
## Why
After config and requirements store canonical profiles, exec requests
should not cache a derived `SandboxPolicy`. The cached legacy value can
drift from the richer profile state, and most execution paths already
have the filesystem and network runtime policies they need.
## What Changed
- Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest`
and `codex_core::sandboxing::ExecRequest`.
- Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper
for the Windows and legacy call sites that still need a `SandboxPolicy`
projection.
- Updates Windows filesystem override setup and unified exec policy
serialization to derive that compatibility policy at the boundary.
- Updates Unix escalation reruns and direct shell requests to
reconstruct exec requests from `PermissionProfile` plus runtime
filesystem/network policy, without carrying a cached legacy policy.
- Adjusts sandboxing manager tests to assert the effective profile
rather than the removed legacy field.
## Verification
- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-sandboxing manager`
- `cargo test -p codex-core
exec_server_params_use_env_policy_overlay_contract`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-core exec::tests`
- `cargo test -p codex-core sandboxing::tests`
## Why
Auto-review can deny an action that the user later decides they want to
retry. Today there is no TUI surface for selecting a recent denial and
sending explicit approval context back into the session, so users have
to restate intent manually and the retry can be reviewed without the
original denied action context.
This adds a narrow TUI-driven path for approving a recent denied action
while still keeping the retry inside the normal auto-review flow.
## What Changed
- Added `/auto-review-denials` to open a picker of recent denied
auto-review actions.
- Added a small in-memory TUI store for the 10 most recent denied
auto-review events.
- Selecting a denial sends the structured denied event back through the
existing core/app-server op path.
- Core now injects a developer message containing the approved action
JSON rather than the full assessment event.
- Auto-review transcript collection now preserves this specific approval
developer message so follow-up review sessions can see the user approval
context.
- Added TUI snapshot/unit coverage for the picker and approval dispatch
path.
- Added core coverage for retaining the approval developer message in
the auto-review transcript.
## Verification
- `cargo test -p codex-core
collect_guardian_transcript_entries_keeps_manual_approval_developer_message`
- `cargo test -p codex-tui auto_review_denials`
- `cargo test -p codex-tui
approving_recent_denial_emits_structured_core_op_once`
## Notes
This intentionally keeps retries going through auto-review. The approval
signal is context for the exact previously denied action, not a blanket
bypass for similar future actions.
## Why
The remaining migration work still needs `SandboxPolicy` at a few
compatibility boundaries, but those projections should come from one
canonical path. Keeping ad hoc legacy projections scattered through
app-server, CLI, and config code makes it easy for behavior to drift as
`PermissionProfile` gains fidelity that the legacy enum cannot
represent.
## What Changed
- Adds `Permissions::legacy_sandbox_policy(cwd)` and
`Config::legacy_sandbox_policy()` as the compatibility projection from
the canonical `PermissionProfile`.
- Adds `Permissions::can_set_legacy_sandbox_policy()` so legacy inputs
are checked after they are converted into profile semantics.
- Updates app-server command handling, Windows sandbox setup, session
configuration, and sandbox summaries to use the centralized projection
helper.
- Leaves `SandboxPolicy` in place only for boundary inputs/outputs that
still speak the legacy abstraction.
## Verification
- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-tui
permissions_selection_history_snapshot_full_access_to_default --
--nocapture`
- `cargo test -p codex-tui
permissions_selection_sends_approvals_reviewer_in_override_turn_context
-- --nocapture`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_history_snapshot_full_access_to_default
--test_output=errors`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_sends_approvals_reviewer_in_override_turn_context
--test_output=errors`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19734).
* #19737
* #19736
* #19735
* __->__ #19734
# Why
Requirements support host-specific
`remote_sandbox_config.hostname_patterns`, but config loading previously
resolved and passed the system hostname through every config-loading
path even when no requirements layer used `remote_sandbox_config`. On
machines where hostname lookup is slow, startup and app-server config
reads paid for a feature that was not active.
We only need the hostname when a requirements layer actually declares
`remote_sandbox_config`, so this moves hostname resolution to the single
requirements merge point and keeps all other config callers unaware of
hostname matching.
# What
- Removed the eager `host_name` plumbing from
`load_config_layers_state`, `load_requirements_toml`, `ConfigBuilder`,
app-server `ConfigManager`, network proxy loading, and related call
sites.
- Resolve the hostname inside
`merge_requirements_with_remote_sandbox_config` only when the incoming
requirements contain `remote_sandbox_config`.
## Why
Several execution paths still converted profile-backed permissions into
`SandboxPolicy` and then rebuilt runtime permissions from that legacy
shape. Those round trips are unnecessary after the preceding PRs and can
lose split filesystem semantics. Core approval and escalation should
carry the resolved profile directly.
## What Changed
- Removes `sandbox_policy` from `ResolvedPermissionProfile`; the
resolved permission object now carries the canonical `PermissionProfile`
directly.
- Updates exec-policy fallback, shell/unified-exec interception,
escalation reruns, and related tests to pass profiles instead of legacy
policies.
- Removes legacy additional-permission merge helpers that built an
effective `SandboxPolicy` before rebuilding runtime permissions.
- Keeps legacy projections only at compatibility boundaries that still
require `SandboxPolicy`, not in core permission computation.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394).
* #19737
* #19736
* #19735
* #19734
* #19395
* __->__ #19394
## Summary
Increase `core-all-test`'s Bazel shard count from `8` to `16`.
## Why
[#19609](https://github.com/openai/codex/pull/19609) restored
`bazel.yml` to a 30-minute timeout and increased `app-server-all-test`'s
shard count because the bigger timeout risk was not just a cold Windows
build. The more common problem was a long `rust_test()` shard failing
and getting retried multiple times.
Recent `main` runs show that `//codex-rs/core:core-all-test` still has
the same shape of problem on Windows:
- [Run
24943931330](https://github.com/openai/codex/actions/runs/24943931330)
reported `//codex-rs/core:core-all-test` as flaky after first-attempt
failures in shard `5/8` and shard `8/8`.
- Those retries were driven by
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
and
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`.
- The failed shard attempts in that run took `272.61s` and `259.27s`
before retrying, which is exactly the sort of wall-clock cost that burns
through the 30-minute budget.
- [Run
24966332583](https://github.com/openai/codex/actions/runs/24966332583)
also retried `//codex-rs/tui:tui-unit-tests` after
`app::tests::update_memory_settings_updates_current_thread_memory_mode`
failed once on Windows.
- [Run
24965527138](https://github.com/openai/codex/actions/runs/24965527138)
and its linked [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/ac1a8265-06fa-4da5-9552-4715b7965bce)
show the other half of the problem: when Windows cache reuse is weak,
the `bazel test //...` step can already consume `24m11s` on its own,
leaving very little headroom for flaky retries.
Increasing `core-all-test` to `16` shards does not fix the flaky tests,
but it does reduce the wall-clock cost when a single shard has to be
retried. That matches the mitigation we already applied to
`app-server-all-test` in `#19609`.
## What Changed
- Update `codex-rs/core/BUILD.bazel` so `core-all-test` uses `16` shards
instead of `8`.
- Leave `core-unit-tests` unchanged.
## Follow-up Work
This change is meant to buy back CI headroom while we fix the flaky
tests themselves in subsequent commits. The recent Windows retries that
look worth addressing directly include:
-
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
-
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`
-
`app::tests::update_memory_settings_updates_current_thread_memory_mode`
## Verification
- Compared `core-all-test`'s current sharding against the
`app-server-all-test` precedent in
[#19609](https://github.com/openai/codex/pull/19609).
- Inspected recent `main` Bazel workflow logs and the linked BuildBuddy
invocation to confirm that Windows retries on long shards are still
consuming a meaningful fraction of the 30-minute timeout budget.
- Did not run local tests for this change because it only adjusts Bazel
sharding metadata.
## Why
Runtime decisions should not infer permissions from the lossy legacy
sandbox projection once `PermissionProfile` is available. In particular,
`Disabled` and `External` need to remain distinct, and managed profiles
with split filesystem or deny-read rules should not be collapsed before
approval, network, safety, or analytics code makes decisions.
## What Changed
- Changes managed network proxy setup and network approval logic to use
`PermissionProfile` when deciding whether a managed sandbox is active.
- Migrates patch safety, Guardian/user-shell approval paths, Landlock
helper setup, analytics sandbox classification, and selected
turn/session code to profile-backed permissions.
- Validates command-level profile overrides against the constrained
`PermissionProfile` rather than a strict `SandboxPolicy` round trip.
- Preserves configured deny-read restrictions when command profiles are
narrowed.
- Adds coverage for profile-backed trust, network proxy/approval
behavior, patch safety, analytics classification, and command-profile
narrowing.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
* #19395
* #19394
* __->__ #19393
## Why
Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.
To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.
This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.
## What Changed
- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI
## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`
## Notes
- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running/timeouting test noise outside the loader migration itself.
## Why
App-server request handling had a lot of repeated JSON-RPC error
construction and one-off `send_error`/`return` branches. This made small
handlers noisy and pushed error response details into leaf code that
otherwise only needed to validate input or call the underlying API.
## What Changed
- Added shared JSON-RPC error constructors in
`codex-rs/app-server/src/error_code.rs`.
- Lifted straightforward request result emission into
`codex-rs/app-server/src/message_processor.rs` so response/error
dispatch happens at the request boundary.
- Reused the result helpers across command exec, config, filesystem,
device-key, external-agent config, fs-watch, and outgoing-message paths.
- Removed leaf wrapper handlers where the method body was only
forwarding to a response helper.
- Returned request validation errors upward in the simple cases instead
of sending an error locally and immediately returning.
## Verification
- `cargo test -p codex-app-server --lib command_exec::tests`
- `cargo test -p codex-app-server --lib outgoing_message::tests`
- `cargo test -p codex-app-server --lib in_process::tests`
- `cargo test -p codex-app-server --test all v2::fs`
- `cargo test -p codex-app-server --test all v2::config_rpc`
- `cargo test -p codex-app-server --test all v2::external_agent_config`
- `cargo test -p codex-app-server --test all v2::initialize`
- `just fix -p codex-app-server`
- `git diff --check`
Note: full `cargo test -p codex-app-server` was attempted and stopped in
`message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
with a stack overflow after unrelated tests had already passed.
## Why
After #19391, `PermissionProfile` and the split filesystem/network
policies could still be stored in parallel. That creates drift risk: a
profile can preserve deny globs, external enforcement, or split
filesystem entries while a cached projection silently loses those
details. This PR makes the profile the runtime source and derives
compatibility views from it.
## What Changed
- Removes stored filesystem/network sandbox projections from
`Permissions` and `SessionConfiguration`; their accessors now derive
from the canonical `PermissionProfile`.
- Derives legacy `SandboxPolicy` snapshots from profiles only where an
older API still needs that field.
- Updates MCP connection and elicitation state to track
`PermissionProfile` instead of `SandboxPolicy` for auto-approval
decisions.
- Adds semantic filesystem-policy comparison so cwd changes can preserve
richer profiles while still recognizing equivalent legacy projections
independent of entry ordering.
- Updates config/session tests to assert profile-derived projections
instead of parallel stored fields.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392).
* #19395
* #19394
* #19393
* __->__ #19392
## Why
This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.
`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.
The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.
## What Changed
- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
## Summary
This PR lets programmatic AgentIdentity users provide one token through
either stdin login or environment auth.
`codex login --with-agent-identity` reads an Agent Identity JWT from
stdin, validates that it has the required claims, and stores that token
as the `agent_identity` value in `auth.json`. The file format is
token-only; the decoded account and key fields are runtime state, not
hand-authored auth.json fields.
The Agent Identity JWT claim shape and decoder live in
`codex-agent-identity`; `codex-login` only owns env/storage precedence
and conversion into `CodexAuth::AgentIdentity`.
When env auth is enabled, `CODEX_AGENT_IDENTITY` can provide the same
JWT without writing auth state to disk. `CODEX_API_KEY` still wins if
both env vars are set.
Reference old stack: https://github.com/openai/codex/pull/17387/changes
Reference JWT/env stack: https://github.com/openai/codex/pull/18176
## Stack
1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. This PR: accept AgentIdentity JWTs through login/env
## Testing
Tests: targeted login and Agent Identity crate tests, CLI checks, scoped
formatter/linter cleanup, and CI.
---------
Co-authored-by: Shijie Rao <shijie.rao@openai.com>
## Why
Windows Bazel runs in the permissions stack exposed that app-server
integration tests were launching normal plugin startup warmups in every
subprocess. Those warmups can call
`https://chatgpt.com/backend-api/plugins/featured` when a test is not
specifically exercising plugin startup, which adds slow background work,
noisy stderr, and dependence on external network state. The relevant
startup/featured-plugin behavior was introduced across #15042 and
#15264.
A few app-server tests also had long optional waits or unbounded cleanup
paths, making failures expensive to diagnose and contributing to slow
Windows shards. One external-agent config test from #18246 used a
GitHub-style marketplace source, which was enough to exercise the
pending remote-import path but also meant the background completion task
could attempt a real clone.
## What Changed
- Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
plumbing and a hidden debug-only
`--disable-plugin-startup-tasks-for-tests` app-server flag, so
integration tests can suppress startup plugin warmups without adding a
production env-var gate.
- Has the app-server test harness pass that hidden flag by default,
while opting plugin-startup coverage back in for tests that
intentionally exercise startup sync and featured-plugin warmup behavior.
- Lowers normal app-server subprocess logging from `info`/`debug` to
`warn` to avoid multi-megabyte stderr output in Bazel logs.
- Prevents the external-agent config test from attempting a real
marketplace clone by using an invalid non-local source while still
exercising the pending-import completion path.
- Bounds optional filesystem/realtime waits and fake WebSocket
test-server shutdown so failures produce targeted timeouts instead of
hanging a shard.
- Fixes the Unix script-resolution test in `rmcp-client` to exercise
PATH resolution directly and include the actual spawn error in failures.
## Verification
- `cargo check -p codex-app-server`
- `cargo clippy -p codex-app-server --tests -- -D warnings`
- `cargo test -p codex-rmcp-client
program_resolver::tests::test_unix_executes_script_without_extension`
- `cargo test -p codex-app-server --test all
external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
-- --nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
--nocapture`
- Windows Local Bazel passed with this test-hardening bundle before it
was extracted from #19606.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19683
This removes the hidden `codex responses` CLI subcommand after
confirming no downstream callers rely on it, deleting the raw Responses
passthrough implementation, unregistering the subcommand, and dropping
the now-unused CLI dependencies on `codex-api` and
`codex-model-provider`.
Some providers of Responses API forward a model-defined `end_turn`
boolean indicating explicitly the model's indication of whether it would
like to end the turn or to be inferenced again. In this PR, we update
the sampling loop to use this field correctly if it's set. If the field
is not set by the provider, we fall back to the existing sampling logic.
Fixes multiple scrollback and terminal resize issues: #5538, #5576,
#8352, #12223, #16165, and #15380.
## Why
Codex writes finalized transcript output into terminal scrollback after
wrapping it for the current viewport width. A later terminal resize
could leave that scrollback shaped for the old width, so wider windows
kept narrow output and narrower windows could show stale wrapping
artifacts until enough new output replaced the visible area.
This is also the foundation PR for responsive markdown tables. Table
rendering needs finalized transcript content to be width-sensitive after
insertion, not only while content is first streaming. Markdown table
rendering itself stays in #18576.
## Stack
- PR1: resize backlog reflow and interrupt cleanup
- #18576: markdown table support
## What Changed
- Rebuild source-backed transcript history when the terminal width
changes. `terminal_resize_reflow` is introduced through the experimental
feature system, but is enabled by default for this rollout so we can
validate behavior across real terminals.
- Preserve assistant and plan stream source so finalized streaming
output can participate in resize reflow after consolidation.
- Debounce resize work, but force a final source-backed reflow when a
resize happened during active or unconsolidated streaming output.
- Clear stale pending history lines on resize so old-width wrapped
output is not emitted just before rebuilt scrollback.
- Bound replay work with `[tui.terminal_resize_reflow].max_rows`:
omitted uses terminal-specific defaults, `0` keeps all rendered rows,
and a positive value sets an explicit cap. The cap applies both while
initially replaying a resumed transcript into scrollback and when
rebuilding scrollback after terminal resize.
- Consolidate interrupted assistant streams before cleanup, then clear
pending stream output and active-tail state consistently.
- Move resize reflow and thread event buffering helpers out of `app.rs`
into dedicated TUI modules.
- Add focused coverage for resize reflow, feature-gated behavior,
streaming source preservation, interrupted output cleanup,
unicode-neutral text, terminal-specific row caps, and composer/layout
stability.
## Runtime Bounds
Resize reflow keeps only the most recent rendered rows when a row cap is
active. The default is `auto`, which maps to the detected terminal's
default scrollback size where Codex can identify it: VS Code `1000`,
Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`.
Terminals without a dedicated mapping use the conservative fallback of
`1000` rows. Users can override this with `[tui.terminal_resize_reflow]
max_rows = N`, or set `max_rows = 0` to disable row limiting.
## Validation
- `just fmt`
- `git diff --check`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui
transcript_reflow`
- `just fix -p codex-tui`
- PR CI in progress on the squashed branch
## Why
For npm/Bun-managed installs, the update prompt was treating the latest
GitHub release as ready to install. During the `0.124.0` release, GitHub
and npm visibility were not atomic: the root npm wrapper could become
visible before the npm registry marked that version as the package
`latest`. That left a window where users could be prompted to upgrade
before npm was ready for the release.
## What changed
- Keep GitHub Releases as the candidate latest-version source for
npm/Bun installs, but only write the existing `version.json` cache after
npm registry metadata proves that same root version is ready.
- Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by
checking `dist-tags.latest` and root package `dist` metadata for the
GitHub candidate version.
- Move version parsing helpers into
`codex-rs/tui/src/update_versions.rs` so that logic can be tested
without compiling the release-only `updates.rs` module under tests.
- Update `.github/workflows/rust-release.yml` so the six known platform
tarballs publish before the root `@openai/codex` wrapper. Other npm
tarballs publish before the root wrapper, and the SDK publishes after
the root package it depends on.
I think raising it to 45 minutes in
https://github.com/openai/codex/pull/19578 was a mistake for the reasons
explained in the comments in the code. Instead, we attempt to defend
against timeouts by increasing the number of shards in
`app-server-all-test` so that a "true failure" that gets run 3x should
not take as much wall clock time.
## Why
Windows can represent the same canonical local path with either a normal
drive path or a verbatim device path prefix. The failure pattern that
motivated this PR was an assertion diff like `C:\...` versus
`\\?\C:\...`: different spellings, same file.
That became visible while validating the permissions stack above this
PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
normalizes supported Windows device prefixes, while several existing
tests still built expected values directly with
`std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
raw `PathBuf`. On Windows, that can make tests fail because the two
sides choose different textual forms for an otherwise equivalent
canonical path.
This PR is intentionally split out as the bottom PR below #19606. The
runtime permissions migration should not carry unrelated Windows test
stabilization, and reviewers should be able to verify this as a
test-only change before looking at the larger permissions changes.
## Failure Modes Covered
- `conversation_summary` expected rollout paths were built from raw
canonicalized `PathBuf`s, while app-server responses could carry
`AbsolutePathBuf`-normalized paths.
- `thread_resume` compared returned thread paths directly to previously
stored or fixture paths, so a verbatim-prefix spelling could fail an
otherwise correct resume.
- `marketplace_add` compared plugin install roots through `as_path()`
against raw canonicalized paths, reproducing the same `C:\...` versus
`\\?\C:\...` mismatch in both app-server and core-plugin coverage.
## What Changed
- In `app-server/tests/suite/conversation_summary.rs`, normalize both
expected rollout paths and received `ConversationSummary.path` values
through `AbsolutePathBuf` before comparing the full summary object.
- In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
of thread path comparisons before asserting equality. This keeps the
tests focused on whether resume returned the same existing path, not
whether Windows used the same string spelling.
- In `app-server/tests/suite/v2/marketplace_add.rs` and
`core-plugins/src/marketplace_add.rs`, compare install roots as
`AbsolutePathBuf` values instead of comparing an absolute-path wrapper
to a raw canonicalized `PathBuf`.
## Behavior
This PR does not change production app-server or marketplace behavior.
It only changes tests to assert semantic path identity across Windows
path spelling variants. It also leaves API response values untouched;
the normalization happens inside assertions only.
## Verification
Targeted local checks run while extracting this fix:
- `cargo test -p codex-app-server
get_conversation_summary_by_thread_id_reads_rollout`
- `cargo test -p codex-app-server
get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
- `cargo test -p codex-app-server
thread_resume_prefers_path_over_thread_id`
Windows-specific confidence comes from the Bazel Windows CI job for this
PR, since the failure is platform-specific.
## Docs
No docs update is needed because this is test-only infrastructure
stabilization.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19604
## Why
`sandbox_permissions = "require_escalated"` is treated as an explicit
request to approve the command and run it outside the
filesystem/platform sandbox. Before this change, shell and unified exec
still registered managed network approval context and could inject
Codex-managed proxy state into the child process, which meant an
approved escalated command could still hit a second network approval
path.
This PR makes that escalation boundary consistent: once a command is
explicitly approved to run outside the sandbox, Codex does not also
route that process through the managed network proxy.
## Security impact
Command/filesystem sandbox approval now implies network approval for
that command. If an untrusted command or script is allowed to run with
`require_escalated`, its network calls are unsandboxed: Codex-managed
network allowlists and denylists are not respected for that process, so
the command can exfiltrate any data it can read.
## What changed
- Skip managed network approval specs for
`SandboxPermissions::RequireEscalated`.
- Pass `network: None` into shell, zsh-fork shell, and unified exec
sandbox preparation for explicitly escalated requests.
- Strip Codex-managed proxy environment variables when
`CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env
when the Codex marker is absent.
- Add regression coverage for the prepared exec request so the old
behavior cannot silently reappear.
## Verification
- `cargo test -p codex-core explicit_escalation`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
## Why
Fixes#19499.
The slash-command popup recalculated the command-name column from only
the rows visible in the current viewport. That made the description
column shift horizontally while scrolling through `/` commands whenever
longer command names entered or left the visible window.
## What Changed
`codex-rs/tui/src/bottom_pane/command_popup.rs` now uses the shared
selection-popup `AutoAllRows` column-width mode for both height
measurement and rendering. This keeps the command description column
based on the full filtered slash-command list instead of the current
viewport.
## Verification
- `cargo test -p codex-tui bottom_pane::command_popup`
Follow-up to #19266.
## Why
`thread_start_with_non_local_thread_store_does_not_create_local_persistence`
is meant to catch accidental local thread persistence when a non-local
thread store is configured. The Windows flake reported in [this
BuildBuddy
invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d)
showed that the assertion was tripping on an unexpected top-level `.tmp`
entry:
```diff
{
+ ".tmp",
"config.toml",
"installation_id",
"memories",
"skills",
}
```
That `.tmp` does not appear to come from `tempfile::TempDir`; it comes
from unrelated plugin startup work that can legitimately materialize
`codex_home/.tmp`, including the startup remote plugin sync marker in
[`core/src/plugins/startup_sync.rs`](bce74c70ce/codex-rs/core/src/plugins/startup_sync.rs (L13-L15))
and the curated plugin snapshot under
[`.tmp/plugins`](bce74c70ce/codex-rs/core-plugins/src/startup_sync.rs (L25-L26)).
That makes the regression race unrelated background startup tasks
instead of validating the thread-store invariant it was added to cover.
Rather than weakening the assertion to allow arbitrary `.tmp` entries,
this change isolates the test from plugin warmups so it can stay strict
about unexpected local thread persistence artifacts.
## What changed
- disable plugins in the generated config used by
`app-server/tests/suite/v2/remote_thread_store.rs`
- keep the existing `codex_home` assertions unchanged so the test still
fails if local session or sqlite persistence is introduced
## Verification
- `cargo test -p codex-app-server
suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence
-- --exact`
Fixes#15219.
## Why
`thread/resume` should continue a persisted thread with the same model
provider that created the thread. The app server already restores the
persisted model and reasoning effort before resuming, but it was leaving
`model_provider` unset. If a user created a thread with one provider and
later switched their active profile to another provider, resumed
encrypted history could be sent to the wrong endpoint and fail with
`invalid_encrypted_content`.
The thread metadata already records the original provider, so resume
should apply it when the caller has not explicitly requested a different
model/provider/reasoning configuration.
## What changed
This updates `merge_persisted_resume_metadata` in
`app-server/src/codex_message_processor.rs` to copy
`ThreadMetadata::model_provider` into `ConfigOverrides::model_provider`
alongside the persisted model.
The existing resume metadata tests now also assert that:
- the persisted provider is restored for normal resume
- explicit model, provider, or reasoning-effort overrides still prevent
persisted resume metadata from being applied
- a thread with no persisted model or reasoning effort still resumes
with its persisted provider
## Verification
- `cargo test -p codex-app-server` passed the app-server unit tests,
including the updated resume metadata coverage. The broader integration
portion of that command failed in an unrelated environment-sensitive
skills-budget warning assertion, where this run saw 8 omitted skills
instead of the expected 7.
- `just fix -p codex-app-server` completed successfully.
Unfortunately, if most of the build graph is invalidated such that there
are few cache hits, the Windows Bazel build for all the tests often
takes more than `30` minutes, so this PR increases the timeout to `45`
minutes until we set up distributed builds.
## Why
The visibility cleanup in the base PR reduced what `codex-mcp` exposes,
but several files still made reviewers read private support machinery
before the public or crate-facing entry points. This ordering pass makes
each file easier to scan: exported API first, crate-visible MCP
internals next, then private helpers in breadth-first order from the
higher-level MCP flows to leaf utilities.
## What Changed
- Reordered `codex-mcp` exports so the runtime, configuration, snapshot,
auth, and helper surfaces are grouped by visibility and reader
importance.
- Moved public and crate-visible MCP items ahead of private helpers in
the auth, MCP planning/snapshot, connection manager, and tool-name
modules.
- Kept the change mechanical, with no behavior changes intended.
## Verification
- `cargo check -p codex-mcp`
## Why
`codex-mcp` currently exposes more API than the rest of the workspace
uses. Some of that surface is simply visibility that can be tightened,
and some of it is public helper code that remains compiler-valid because
it is exported even though no workspace caller uses it.
That distinction matters: Rust does not warn on exported API just
because the current workspace does not call it. This PR intentionally
treats those exported-but-workspace-unreferenced paths as stale
`codex-mcp` surface. The main example is MCP skill dependency
collection, where the active implementation now lives in
`codex-rs/core/src/mcp_skill_dependencies.rs`; keeping the older
`codex-mcp` copy makes it unclear which implementation owns skill MCP
installation.
## What Changed
- Pruned unused `codex-mcp` re-exports from `codex-mcp/src/lib.rs`.
- Removed non-runtime helper methods from `McpConnectionManager` so it
stays focused on live MCP clients.
- Made `ToolPluginProvenance` lookup methods crate-private.
- Removed workspace-unreferenced snapshot wrapper APIs and
qualified-tool grouping helpers.
- Deleted the duplicate `codex-mcp` skill dependency module and tests
now that skill MCP dependency handling is owned by `core`.
## Verification
- `cargo check -p codex-mcp`
## Summary
- Mark `unavailable_dummy_tools` as a stable feature and enable it by
default
- Update the feature registry test to match the new default state
## Testing
- `just fmt`
- `cargo test -p codex-features`
## Why
Issue #19418 points out a small grammar issue in `codex-rs/README.md`
under "Code Organization." The current sentence says "we hope this to
be," which reads awkwardly.
Fixes#19418.
## What changed
Updated the `core/` crate description so the sentence reads "we hope
this becomes a library crate."
## Verification
Documentation-only change. Reviewed the Markdown diff.
## Why
Recent `main` CI repeatedly timed out in:
- `codex-core::all suite::approvals::approval_matrix_covers_all_modes`
It failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
[24905823212](https://github.com/openai/codex/actions/runs/24905823212),
[24903439629](https://github.com/openai/codex/actions/runs/24903439629),
[24903336028](https://github.com/openai/codex/actions/runs/24903336028),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
The failure pattern was a 60s Linux remote timeout. Logs showed many
approval scenarios completing before the single matrix test timed out.
## Root Cause
`approval_matrix_covers_all_modes` packed every approval/sandbox/tool
scenario into one test case. That made the test vulnerable to normal CI
variance: one slow scenario or a slow process startup could push the
whole monolithic case past the 60s per-test timeout. It also hid which
part of the matrix was slow because the runner only reported the one
large matrix test.
## What Changed
- Keep the shared `scenarios()` table as the single source of approval
matrix coverage.
- Use one `#[test_case]` per `ScenarioGroup` to generate five async
Tokio tests: danger/full-access, read-only, workspace-write,
apply-patch, and unified-exec.
- Keep the group runner small and add per-scenario error context so a
failure still reports the specific scenario name.
## Why This Should Be Reliable
Each scenario group now has its own test harness timeout instead of
sharing one timeout window with the full matrix. That removes the long
sequential loop from a single test while keeping the implementation
compact and easy to scan.
The tests still run through the same scenario definitions and runner, so
this preserves coverage. `test-case` already composes with
`#[tokio::test]` in this crate and is already available for test code.
## Verification
- `cargo test -p codex-core --test all approval_matrix_ -- --list`
- `cargo test -p codex-core --test all approval_matrix_`
Adds the TUI user experience for goals on top of the core runtime from
PR 4.
## Why
Users need a direct TUI control surface for long-running goals. The UI
should make the current goal visible, support common goal actions
without waiting for a model turn, and avoid confusing end-of-turn
notifications while an active goal is immediately continuing.
## What changed
- Added `/goal` summary rendering for the current goal, including
active, paused, budget-limited, and complete states.
- Added `/goal <objective>` creation/replacement through the app-server
goal API rather than a model prompt.
- Added `/goal clear`, `/goal pause`, and `/goal unpause` command
variants.
- Added a confirmation menu when the user enters a new goal while
another goal already exists.
- Updated `/goal` help and summary tip text so it reflects the supported
command variants without advertising slash-command token budgets.
- Added footer/statusline goal indicators, including elapsed time and
token budget display when a budget exists from API/tool-created goals.
- Consumes goal updated/cleared notifications so the TUI stays in sync
with external app-server changes.
- Suppresses end-of-turn desktop notifications only when a goal is still
active and follow-up work is expected.
- Preserves slash-command history behavior and avoids leaking queued
`/goal` state into unrelated submissions.
## Verification
- Added TUI unit and snapshot coverage for goal command availability,
summary rendering, control commands, replacement menu behavior,
status/footer display, notification handling, and command history.
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.
## Why
A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.
## What changed
- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.
## Verification
- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
Adds the model-facing goal tools on top of the app-server API from PR 2.
## Why
Once goals are persisted and exposed to clients, the model needs a
small, constrained tool surface for goal workflows. The tool contract
should let the model inspect goals, create them only when explicitly
requested, and mark them complete without giving it broad control over
user/runtime-owned state.
## What changed
- Added `get_goal`, `create_goal`, and `update_goal` tool specs behind
the `goals` feature flag.
- Added core goal tool handlers that validate objectives and token
budgets before mutating persisted state.
- Constrained `create_goal` to create only when no goal exists, with
optional `token_budget` only when a budget is explicitly provided.
- Tightened the `create_goal` instructions so the model does not infer
goals from ordinary task requests.
- Constrained `update_goal` to expose only goal completion; pause,
resume, clear, and budget-limited transitions remain user- or
runtime-controlled.
- Registered the goal tools in the tool registry and kept them out of
review contexts where they should not appear.
## Verification
- Added tool-registry coverage for feature gating and tool availability.
- Added core session tests for create/get/update behavior, duplicate
goal rejection, budget validation, and completion-only updates.
Adds the app-server v2 goal API on top of the persisted goal state from
PR 1.
## Why
Clients need a stable app-server surface for reading and controlling
materialized thread goals before the model tools and TUI can use them.
Goal changes also need to be observable by app-server clients, including
clients that resume an existing thread.
## What changed
- Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
RPCs for materialized threads.
- Added `thread/goal/updated` and `thread/goal/cleared` notifications so
clients can keep local goal state in sync.
- Added resume/snapshot wiring so reconnecting clients see the current
goal state for a thread.
- Added app-server handlers that reconcile persisted rollout state
before direct goal mutations.
- Updated the app-server README plus generated JSON and TypeScript
schema fixtures for the new API surface.
## Verification
- Added app-server v2 coverage for goal get/set/clear behavior,
notification emission, resume snapshots, and non-local thread-store
interactions.
Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.
## Why
Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.
## What changed
- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.
## Verification
- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.
## Summary
Fix a Bazel-only path resolution bug in
`codex_utils_cargo_bin::cargo_bin`.
Under Bazel runfiles, `rlocation` can return a relative `bazel-out/...`
path even though `cargo_bin()` documents that it returns an absolute
path. That can break callers that store the returned binary path and
later spawn it after changing cwd, because the relative path is resolved
from the wrong directory.
This patch absolutizes the runfiles-resolved path before returning it.
## Summary
- update Codex issue automation to pin `openai/codex-action` to
`5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02`, the commit for `v1.7`
- keep the release intent visible with `# v1.7` comments beside the hash
pins
## Test plan
- `git diff --check`
- `yq e '.' .github/workflows/issue-labeler.yml`
- `yq e '.' .github/workflows/issue-deduplicator.yml`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`:
`FullAccess` meant the historical read-only/workspace-write modes could
read the full filesystem, while `Restricted` tried to carry partial
readable roots. The partial-read model now belongs in
`FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on
`SandboxPolicy` makes every legacy projection reintroduce lossy
read-root bookkeeping and creates unnecessary noise in the rest of the
permissions migration.
This PR makes the legacy policy model narrower and explicit:
`SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent
the old full-read sandbox modes only. Split readable roots, deny-read
globs, and platform-default/minimal read behavior stay in the runtime
permissions model.
## What changed
- Removes `ReadOnlyAccess` from
`codex_protocol::protocol::SandboxPolicy`, including the generated
`access` and `readOnlyAccess` API fields.
- Updates legacy policy/profile conversions so restricted filesystem
reads are represented only by `FileSystemSandboxPolicy` /
`PermissionProfile` entries.
- Keeps app-server v2 compatible with legacy `fullAccess` read-access
payloads by accepting and ignoring that no-op shape, while rejecting
legacy `restricted` read-access payloads instead of silently widening
them to full-read legacy policies.
- Carries Windows sandbox platform-default read behavior with an
explicit override flag instead of depending on
`ReadOnlyAccess::Restricted`.
- Refreshes generated app-server schema/types and updates tests/docs for
the simplified legacy policy shape.
## Verification
- `cargo check -p codex-app-server-protocol --tests`
- `cargo check -p codex-windows-sandbox --tests`
- `cargo test -p codex-app-server-protocol sandbox_policy_`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19449
## Why
When using the Amazon Bedrock provider with `openai.gpt-5.4-cmb`, the
model picker allowed `xhigh` because the CMB catalog entry was derived
from the bundled `gpt-5.4` reasoning metadata. Bedrock rejects that
effort level, causing the request to fail before the turn can run:
```text
{"error":{"code":"validation_error","message":"Failed to deserialize the JSON body into the target type: Invalid 'reasoning': Invalid 'effort': unknown variant `xhigh`, expected one of `high`, `low`, `medium`, `minimal` at line 1 column 77239","param":null,"type":"invalid_request_error"}}
```
## What Changed
- Replace the runtime lookup of bundled `gpt-5.4` metadata for
`openai.gpt-5.4-cmb` with an explicit Bedrock CMB `ModelInfo` entry.
- Advertise only the Bedrock-supported CMB reasoning levels: `minimal`,
`low`, `medium`, and `high`.
- Keep the existing GPT OSS Bedrock model metadata and reasoning levels
unchanged.
- Add catalog coverage for the hardcoded CMB metadata and
Bedrock-compatible reasoning level list.
## Why
This prepares feedback log capture for a future remote app-server hook
sink without changing the current local SQLite upload path. The
important boundary is now intentionally small: a log sink is a tracing
`Layer` that can also flush entries it has accepted.
That keeps the existing SQLite implementation simple while giving the
upcoming gRPC sink a place to fit beside it. SQLite and gRPC have
different worker/write semantics, so this PR avoids introducing a shared
buffered-sink abstraction and instead lets each `LogWriter` own the
buffering mechanics it needs.
## What Changed
- Added `LogSinkQueueConfig` with the existing local defaults: queue
capacity `512`, batch size `128`, and flush interval `2s`.
- Added `LogDbLayer::start_with_config(...)` while preserving
`LogDbLayer::start(...)` and `log_db::start(...)` defaults.
- Introduced the `LogWriter` trait as the minimal shared interface:
`tracing_subscriber::Layer` plus `flush()`.
- Made `LogDbLayer` implement `LogWriter`.
- Kept tracing event formatting inside `LogDbLayer`; it still creates
one `LogEntry` per tracing event before queueing it for SQLite.
- Kept normal event capture best-effort and non-blocking via bounded
`try_send`.
## Behavior Notes
This does not change the SQLite schema, retention behavior,
`/feedback/upload`, or Sentry upload behavior. Normal log events still
drop when the queue is full; explicit `flush()` still waits for queue
capacity and receiver processing before returning.
## Verification
- `cargo test -p codex-state log_db`
- `cargo test -p codex-state`
- `just fix -p codex-state`
The added tests cover configured batch-size flushing, configured
interval flushing, queue-full drops, and the flush barrier semantics.
## Summary
- include the outer tool `call_id` in Codex Apps MCP request metadata
under `_meta._codex_apps.call_id`
- preserve existing Codex Apps metadata like `resource_uri` and
`contains_mcp_source`
- add request metadata coverage for both the existing-metadata and
no-existing-metadata cases
## Why
The paired backend change in
[openai/openai#850796](https://github.com/openai/openai/pull/850796)
updates MCP compliance logging to prefer `_meta._codex_apps.call_id`
instead of the JSON-RPC request id. This client change sends that outer
tool call id so the backend can record the model/tool call identifier
when it is available.
This is wire-compatible with older backends because `_meta._codex_apps`
is already reserved backend-only metadata. Backends that do not read
`call_id` will ignore the extra field.
## Testing
- `cargo test -p codex-core request_meta`
- `just fmt`
- `just fix -p codex-core`
- Add an integration test that guarantees nothing gets written to codex
home dir or sqlite when running a rollout with a non-local ThreadStore
- Add an in-memory "spy" ThreadStore for tests like this
Note I could not find a good way to also ensure there were no filesystem
_reads_ that didn't go through threadstore. I explored a more elaborate
sandboxed-subprocess approach but it isn't platform portable and felt
like it wasn't (yet) worth it.
## Summary
- Mirrors the OpenAI Docs skill cleanup in the bundled Codex skill copy
- Clarifies reasoning-effort recommendation wording
- Replaces internal snake_case prompt block names with natural-language
guidance aligned to the prompting guide
## Test plan
- `git diff --check`
- Verified the old snake_case prompt block names no longer appear in the
bundled upgrade guide
## Why
The VS Code extension and desktop app do not need the full TUI binary,
and `codex-app-server` is materially smaller than standalone `codex`. We
still want to publish it as an official release artifact, but building
it by tacking another `--bin` onto the existing release `cargo build`
invocations would lengthen those jobs.
This change keeps `codex-app-server` on its own release bundle so it can
build in parallel with the existing `codex` and helper bundles.
## What changed
- Made `.github/workflows/rust-release.yml` bundle-aware so each macOS
and Linux MUSL target now builds either the existing `primary` bundle
(`codex` and `codex-responses-api-proxy`) or a standalone `app-server`
bundle (`codex-app-server`).
- Preserved the historical artifact names for the primary macOS/Linux
bundles so `scripts/stage_npm_packages.py` and
`codex-cli/scripts/install_native_deps.py` continue to find release
assets under the paths they already expect, while giving the new
app-server artifacts distinct names.
- Added a matching `app-server` bundle to
`.github/workflows/rust-release-windows.yml`, and updated the final
Windows packaging job to download, sign, stage, and archive
`codex-app-server.exe` alongside the existing release binaries.
- Generalized the shared signing actions in
`.github/actions/linux-code-sign/action.yml`,
`.github/actions/macos-code-sign/action.yml`, and
`.github/actions/windows-code-sign/action.yml` so each workflow row
declares its binaries once and reuses that list for build, signing, and
staging.
- Added `codex-app-server` to `.github/dotslash-config.json` so releases
also publish a generated DotSlash manifest for the standalone app-server
binary.
- Kept the macOS DMG focused on the existing `primary` bundle;
`codex-app-server` ships as the regular standalone archives and DotSlash
manifest.
## Verification
- Parsed the modified workflow and action YAML files locally with
`python3` + `yaml.safe_load(...)`.
- Parsed `.github/dotslash-config.json` locally with `python3` +
`json.loads(...)`.
- Reviewed the resulting release matrices, artifact names, and packaging
paths to confirm that `codex-app-server` is built separately on macOS,
Linux MUSL, and Windows, while the existing npm staging and Windows
`codex` zip bundling contracts remain intact.
## Summary
- Mirrors openai/skills#374 in the Codex bundled OpenAI Docs skill
- Adds `gpt-image-2` as the best image generation/edit model
- Updates `gpt-image-1.5` to less expensive image generation/edit
quality
## Test plan
- `git diff --check`
## Why
We already prefer shipping the MUSL Linux builds, and the in-repo
release consumers resolve Linux release assets through the MUSL targets.
Keeping the GNU release jobs around adds release time and extra assets
without serving the paths we actually publish and consume.
This is also easier to reason about as a standalone change: future work
can point back to this PR as the intentional decision to stop publishing
`x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu` release
artifacts.
## What changed
- Removed the `x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu`
entries from the `build` matrix in `.github/workflows/rust-release.yml`.
- Added a short comment in that matrix documenting that Linux release
artifacts intentionally ship MUSL-linked binaries.
## Verification
- Reviewed `.github/workflows/rust-release.yml` to confirm that the
release workflow now only builds Linux release artifacts for
`x86_64-unknown-linux-musl` and `aarch64-unknown-linux-musl`.
- Route cold thread/resume and thread/fork source loading through
ThreadStore reads instead of direct rollout path operations
- Keep lookups that explicitly specify a rollout-path using the local
thread store methods but return an invalid-request error for remote
ThreadStore configurations
- Add some additional unit tests for code path coverage
## Why
The profile conversion path still required a `cwd` even when it was only
translating a legacy `SandboxPolicy` into a `PermissionProfile`. That
made profile producers invent an ambient `cwd`, which is exactly the
anchoring we are trying to remove from permission-profile data. A legacy
workspace-write policy can be represented symbolically instead: `:cwd =
write` plus read-only `:project_roots` metadata subpaths.
This PR creates that cwd-free base so the rest of the stack can stop
threading cwd through profile construction. Callers that actually need a
concrete runtime filesystem policy for a specific cwd still have an
explicitly named cwd-bound conversion.
## What Changed
- `PermissionProfile::from_legacy_sandbox_policy` now takes only
`&SandboxPolicy`.
- `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the
symbolic, cwd-free projection for profiles.
- The old concrete projection is retained as
`FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for
runtime/boundary code that must materialize legacy cwd behavior.
- Workspace-write profiles preserve `CurrentWorkingDirectory` and
`ProjectRoots` special entries instead of materializing cwd into
absolute paths.
## Verification
- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-exec -p
codex-exec-server -p codex-tui -p codex-sandboxing -p
codex-linux-sandbox -p codex-analytics --tests`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p
codex-sandboxing -p codex-linux-sandbox -p codex-analytics`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19414
Selection menus in the TUI currently let disabled rows interfere with
numbering and default focus. This makes mixed menus harder to read and
can land selection on rows that are not actionable. This change updates
the shared selection-menu behavior in list_selection_view so disabled
rows are not selected when these views open, and prevents them from
being numbered like selectable rows.
- Disabled rows no longer receive numeric labels
- Digit shortcuts map to enabled rows only
- Default selection moves to the first enabled row in mixed menus
- Updated affected snapshot
- Added snapshot coverage for a plugin detail error popup
- Added a focused unit test for shared selection-view behavior
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Switch Unix socket app-server connections to perform the standard
WebSocket HTTP Upgrade handshake
- Update the Unix socket test to exercise a real upgrade over the Unix
stream
- Refresh the app-server README to describe the new Unix socket behavior
## Testing
- `cargo test -p codex-app-server transport::unix_socket_tests`
- `just fmt`
- `git diff --check`
## Why
`thread/fork` responses intentionally include copied history so the
caller can render the fork immediately, but `thread/started` is a
lifecycle notification. The v2 `Thread` contract says notifications
should return `turns: []`, and the fork path was reusing the response
thread directly, causing copied turns to be emitted through
`thread/started` as well.
## What Changed
- Route app-server `thread/started` notification construction through a
helper that clears `thread.turns` before sending.
- Keep `thread/fork` responses unchanged so callers still receive copied
history.
- Add persistent and ephemeral fork coverage that asserts
`thread/started` emits an empty `turns` array while the response retains
fork history.
## Testing
- `just fmt`
- `cargo test -p codex-app-server`
## Why
`openai.gpt-5.4-cmb` is served through the Amazon Bedrock provider,
whose request validator currently accepts `function` and `mcp` tool
specs but rejects Responses `custom` tools. The CMB catalog entry reuses
the bundled `gpt-5.4` metadata, which marks `apply_patch_tool_type` as
`freeform`. That causes Codex to include an `apply_patch` tool with
`type: "custom"`, so even heavily disabled sessions can fail before the
model runs with:
```text
Invalid tools: unknown variant `custom`, expected `function` or `mcp`
```
This is provider-specific: the model should still expose `apply_patch`,
but for Bedrock it needs to use the JSON/function tool shape instead of
the freeform/custom shape.
## What Changed
- Override the `openai.gpt-5.4-cmb` static catalog entry to set
`apply_patch_tool_type` to `function` after inheriting the rest of the
`gpt-5.4` model metadata.
- Update the catalog test expectation so the CMB entry continues to
track `gpt-5.4` metadata except for this Bedrock-specific tool shape
override.
## Verification
- `cargo test -p codex-model-provider`
- `just fix -p codex-model-provider`
## Summary
This PR hardens package-manager usage across the repo to reduce
dependency supply-chain risk. It also removes the stale `codex-cli`
Docker path, which was already broken on `main`, instead of keeping a
bitrotted container workflow alive.
## What changed
- Updated pnpm package manager pins and workspace install settings.
- Removed stale `codex-cli` Docker assets instead of trying to keep a
broken local container path alive.
- Added uv settings and lockfiles for the Python SDK packages.
- Updated Python SDK setup docs to use `uv sync`.
## Why
This is primarily a security hardening change. It reduces
package-install and supply-chain risk by ensuring dependency installs go
through pinned package managers, committed lockfiles, release-age
settings, and reviewed build-script controls.
For `codex-cli`, the right follow-up was to remove the local Docker path
rather than keep patching it:
- `codex-cli/Dockerfile` installed `codex.tgz` with `npm install -g`,
which bypassed the repo lockfile and age-gated pnpm settings.
- The local `codex-cli/scripts/build_container.sh` helper was already
broken on `main`: it called `pnpm run build`, but
`codex-cli/package.json` does not define a `build` script.
- The container path itself had bitrotted enough that keeping it would
require extra packaging-specific behavior that was not otherwise needed
by the repo.
## Gaps addressed
- Global npm installs bypassed the repo lockfile in Docker and CLI
reinstall paths, including `codex-cli/Dockerfile` and
`codex-cli/bin/codex.js`.
- CI and Docker pnpm installs used `--frozen-lockfile`, but the repo was
missing stricter pnpm workspace settings for dependency build scripts.
- Python SDK projects had `pyproject.toml` metadata but no committed
`uv.lock` coverage or uv age/index settings in `sdk/python` and
`sdk/python-runtime`.
- The secure devcontainer install path used npm/global install behavior
without a local locked package-manager boundary.
- The local `codex-cli` Docker helper was already broken on `main`, so
this PR removes that stale Docker path instead of preserving a broken
surface.
- pnpm was already pinned, but not to the current repo-wide pnpm version
target.
## Verification
- `pnpm install --frozen-lockfile`
- `.devcontainer/codex-install`: `pnpm install --prod --frozen-lockfile`
- `.devcontainer/codex-install`: `./node_modules/.bin/codex --version`
- `sdk/python`: `uv lock --check`, `uv sync --locked --all-extras
--dry-run`, `uv build`
- `sdk/python-runtime`: `uv lock --check`, `uv sync --locked --dry-run`,
`uv build --wheel`
- `pnpm -r --filter ./sdk/typescript run build`
- `pnpm -r --filter ./sdk/typescript run lint`
- `pnpm -r --filter ./sdk/typescript run test`
- `node --check codex-cli/bin/codex.js`
- `docker build -f .devcontainer/Dockerfile.secure -t codex-secure-test
.`
- `cargo build -p codex-cli`
- repo-wide package-manager audit
## Summary
Updates the bundled OpenAI Docs system skill for GPT-5.5.
## Changes
- Updates the bundled latest-model fallback
- Replaces bundled upgrade guidance with GPT-5.5 migration guidance
- Replaces bundled prompting guidance with GPT-5.5 prompting guidance
## Test plan
- Ran `node scripts/resolve-latest-model-info.js`
- Verified bundled files match the OpenAI Docs skill fallback content
## Why
The elevated Windows command runner currently trusts the first process
that connects to its parent-created named pipes. Tightening the pipe ACL
already narrows who can reach that boundary, but verifying the connected
client PID gives the parent one more fail-closed check: it only accepts
the exact runner process it just spawned.
## What changed
- validate `GetNamedPipeClientProcessId` after `ConnectNamedPipe` and
reject clients whose PID does not match the spawned runner
- also did some code de-duplication to route the one-shot elevated
capture flow in `windows-sandbox-rs/src/elevated_impl.rs` through
`spawn_runner_transport()` so both elevated codepaths use the same pipe
bootstrap and PID validation
Using the transport unification here also reduces duplication in the
elevated Windows IPC bootstrap, so future hardening to the runner
handshake only needs to land in one place.
## Validation
- `cargo test -p codex-windows-sandbox`
- manual testing: one-shot elevated path via `target/debug/codex.exe
exec` running a randomized shell command and confirming captured output
- manual testing: elevated session path via `target/debug/codex.exe -c
'windows.sandbox="elevated"' sandbox windows -- python -u -c ...` with
stdin/stdout round-trips (`READY`, then `GOT:...` for two input lines)
---------
Co-authored-by: viyatb-oai <viyatb@openai.com>
Fix a bug where the `turn/interrupt` RPC hangs when interrupting a turn
that has already completed.
Before this change, `turn/interrupt` requests were queued in app-server
and only answered when a later TurnAborted event arrived. If the target
turn was already complete, core treated Op::Interrupt as a no-op, so no
abort event was emitted and the RPC could hang indefinitely.
This change fixes that in two places:
* Reject turn/interrupt immediately with `INVALID_REQUEST` when the
requested turn is no longer the active turn.
* Resolve any already-accepted pending interrupt requests when the turn
reaches TurnComplete, covering the case where a turn finishes naturally
after the interrupt request is accepted but before it aborts.
I tested this by adding a failing test in
707487c063. You may view the results here:
https://github.com/openai/codex/actions/runs/24585182419/
<img width="1512" height="310" alt="CleanShot 2026-04-17 at 16 33 30@2x"
src="https://github.com/user-attachments/assets/f4a88228-b2a4-41f4-9aaa-ec82814096af"
/>
## Why
Agent interruptions currently always persist a model-visible
interrupted-turn marker before emitting `TurnAborted`. That marker is
useful by default because it gives the next model turn context about a
deliberately interrupted task, but some deployments need to suppress
that history injection entirely while still keeping the client-visible
interruption event.
## What changed
- Add `[agents] interrupt_message = false` to disable the model-visible
interrupted-turn marker.
- Resolve the setting into `Config::agent_interrupt_message_enabled`,
defaulting to `true` so existing behavior is unchanged.
- Apply the setting to both live interrupted turns and interrupted fork
snapshots.
- Keep emitting `TurnAborted` even when the history marker is disabled.
- Regenerate `core/config.schema.json` for the new
`agents.interrupt_message` field.
## Testing
- `cargo test -p codex-core load_config_resolves_agent_interrupt_message
-- --nocapture`
- `cargo test -p codex-core
disabled_interrupted_fork_snapshot_appends_only_interrupt_event --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_developer_input_message --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_can_disable_interrupted_marker --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo check -p codex-core`
## Summary
- Thread `agent_max_threads` into `ToolsConfig` and
`SpawnAgentToolOptions`.
- Render the configured `max_concurrent_threads_per_session` value in
the MultiAgentV2 `spawn_agent` description.
- Cover the description text in `codex-tools` unit tests and
`codex-core` tool spec tests.
## Validation
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core spawn_agent_description`
- `git diff --check`
## Notes
- `cargo test -p codex-core` was also attempted, but unrelated
environment-sensitive tests failed with the active local environment.
Examples: approvals reviewer defaults observed `AutoReview` instead of
`User`, request-permissions event tests did not emit events, and
proxy-env tests saw `http://127.0.0.1:50604` from the active proxy
environment.
Co-authored-by: Codex <noreply@openai.com>
## Why
`MultiAgentV2` follow-up messages are delivered to agents as
assistant-authored `InterAgentCommunication` envelopes. When
`followup_task` used `interrupt: true`, the interrupted-turn guidance
was still persisted as a contextual user message, so model-visible
history made a system-generated interruption boundary look
user-authored.
This keeps interruption guidance consistent with the rest of the v2
inter-agent message stream while preserving the legacy marker shape for
non-v2 sessions.
## What changed
- Make `interrupted_turn_history_marker` feature-aware.
- Record the interrupted-turn marker as an assistant `OutputText`
message when `Feature::MultiAgentV2` is enabled.
- Keep the existing user contextual fragment for non-v2 sessions.
- Apply the same feature-aware marker to interrupted fork snapshots.
- Add coverage for the live `followup_task` interrupt path and the
helper-level v2 marker shape.
## Testing
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_assistant_output_message --
--nocapture`
- `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`
Supersedes #18735.
The scheduled rust-release-prepare workflow force-pushed
`bot/update-models-json` back to the generated models.json-only diff,
which dropped the test and snapshot updates needed for CI.
This PR keeps the latest generated `models.json` from #18735 and adds
the corresponding fixture updates:
- preserve model availability NUX in the app-server model cache fixture
- update core/TUI expectations for the new `gpt-5.4` `xhigh` default
reasoning
- refresh affected TUI chatwidget snapshots for the `gpt-5.5`
default/model copy changes
Validation run locally while preparing the fix:
- `just fmt`
- `cargo test -p codex-app-server model_list`
- `cargo test -p codex-core includes_no_effort_in_request`
- `cargo test -p codex-core
includes_default_reasoning_effort_in_request_when_defined_by_model_info`
- `cargo test -p codex-tui --lib chatwidget::tests`
- `cargo insta pending-snapshots`
---------
Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
## Summary
Fixes#19022.
`codex exec --json` currently emits `turn.completed.usage` with input,
cached input, and output token counts, but drops the reasoning-token
split that Codex already receives through thread token usage updates.
Programmatic consumers that rely on the JSON stream, especially
ephemeral runs that do not write rollout files, need this field to
accurately display reasoning-model usage.
This PR adds `reasoning_output_tokens` to the public exec JSON `Usage`
payload and maps it from the existing `ThreadTokenUsageUpdated` total
token usage data.
## Verification
- Added coverage to
`event_processor_with_json_output::token_usage_update_is_emitted_on_turn_completion`
so `turn.completed.usage.reasoning_output_tokens` is asserted.
- Updated SDK expectations for `run()` and `runStreamed()` so TypeScript
consumers see the new usage field.
- Ran `cargo test -p codex-exec`.
- Ran `pnpm --filter ./sdk/typescript run build`.
- Ran `pnpm --filter ./sdk/typescript run lint`.
- Ran `pnpm --filter ./sdk/typescript exec jest --runInBand
--testTimeout=30000`.
## Summary
Fixes#19275.
Codex runtime rejects inline MCP `bearer_token` config entries and asks
users to configure `bearer_token_env_var` instead, but the generated
config schema still advertised `mcp_servers.<name>.bearer_token` as a
supported field. That made editor/schema validation disagree with
runtime validation.
This keeps `bearer_token` in `RawMcpServerConfig` so Codex can continue
producing the targeted runtime error for recent or existing configs, but
skips the field during schemars generation. The checked-in
`core/config.schema.json` fixture now exposes `bearer_token_env_var`
without exposing unsupported inline `bearer_token`.
## Verification
- Added `config_schema_hides_unsupported_inline_mcp_bearer_token` to
assert the generated schema hides `bearer_token` while preserving
`bearer_token_env_var`.
- Ran `cargo test -p codex-config`.
- Ran `cargo test -p codex-core config_schema`.
we were not respecting turn's `truncation_policy` to clamp output tokens
for `unified_exec` and `write_stdin`.
this meant truncation was only being applied by `ContextManager` before
the output was stored in-memory (so it _was_ being truncated from
model-visible context), but the full output was persisted to rollout on
disk.
now we respect that `truncation_policy` and `ContextManager`-level
truncation remains a backup.
### Tests
added tests, tested locally.
## Summary
`codex.emitImage` accepted arbitrary image MIME types for byte payloads
and data URLs. That allowed a value like `image/rgba` to be wrapped as
an `input_image`, even though it is not a supported encoded image
format, so the invalid image could reach the model-input path and
trigger output sanitization.
This results in a panic in debug builds because the output sanitization
is meant as a final safety net, not a primary means of rejecting invalid
image types. I've hit this case multiple times when executing certain
long-running tasks.
This PR rejects unsupported image MIME types before they are emitted
from `js_repl`.
## Changes
- Validate `codex.emitImage({ bytes, mimeType })` in the JS kernel so
only encoded PNG, JPEG, WebP, or GIF payloads are accepted.
- Apply the same MIME allowlist to direct image data URLs, including the
Rust host-side validation path.
- Clarify the JS REPL instructions so agents know byte payloads must
already be encoded as PNG/JPEG/WebP/GIF.
## Why
A rerun of the Windows Bazel clippy job after
[#19161](https://github.com/openai/codex/pull/19161) had exactly the
cache behavior we wanted in BuildBuddy: zero action-cache misses. Even
so, the GitHub job still took a little over five minutes.
The problem was that the job was paying for two separate Bazel startup
paths:
1. a `bazel query` to discover extra lint targets
2. the real `bazel build --config=clippy ...` invocation
On Windows, that query was bypassing the CI Bazel wrapper, so it did not
reuse the same `--output_user_root`, CI config, or remote-cache setup as
the real build. In practice that meant the rerun could still cold-start
a separate Bazel server before the actual clippy build even began.
## What
- add `.github/scripts/run-bazel-query-ci.sh` to run CI-side Bazel
queries with the same startup and cache-related flags as the main Bazel
command
- switch `scripts/list-bazel-clippy-targets.sh` to use that helper for
manual `rust_test` target discovery
- switch `tools/argument-comment-lint/list-bazel-targets.sh` to use the
same helper
- simplify `.github/scripts/run-argument-comment-lint-bazel.sh` so its
Windows-only query path also goes through the shared helper
This keeps the target-discovery queries aligned with the later
build/test invocation instead of treating them as a separate cold Bazel
session.
## Verification
- `bash -n .github/scripts/run-bazel-query-ci.sh`
- `bash -n scripts/list-bazel-clippy-targets.sh`
- `bash -n tools/argument-comment-lint/list-bazel-targets.sh`
- `bash -n .github/scripts/run-argument-comment-lint-bazel.sh`
- mocked a Windows invocation of `run-bazel-query-ci.sh` and verified it
forwards `--output_user_root`, `--config=ci-windows`, the BuildBuddy
auth header, and the repository cache flags
## Docs
No documentation updates are needed.
Fixes#19257.
## Summary
Agent roles declared in config layers can set `config_file` to a
relative path, but deserializing the layer-local `[agents.*]` table
happened without an `AbsolutePathBuf` base path. That caused configs
like `config_file = "agents/my-role.toml"` to fail with `AbsolutePathBuf
deserialized without a base path`.
This updates agent role layer loading to deserialize `[agents.*]` while
the layer config folder is active as the path base, matching the
behavior documented for `AgentRoleToml.config_file`. It also adds
coverage for a user config layer with a relative agent role
`config_file`.
## Why
`PermissionProfile` is becoming the canonical permissions abstraction,
but the old shape only carried optional filesystem and network fields.
It could describe allowed access, but not who is responsible for
enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy
when profiles were exported, cached, or round-tripped through app-server
APIs.
The important model change is that active permissions are now a disjoint
union over the enforcement mode. Conceptually:
```rust
pub enum PermissionProfile {
Managed {
file_system: FileSystemSandboxPolicy,
network: NetworkSandboxPolicy,
},
Disabled,
External {
network: NetworkSandboxPolicy,
},
}
```
This distinction matters because `Disabled` means Codex should apply no
outer sandbox at all, while `External` means filesystem isolation is
owned by an outside caller. Those are not equivalent to a broad managed
sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an
inner sandbox may require the outer Codex layer to use no sandbox rather
than a permissive one.
## How Existing Modeling Maps
Legacy `SandboxPolicy` remains a boundary projection, but it now maps
into the higher-fidelity profile model:
- `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed`
with restricted filesystem entries plus the corresponding network
policy.
- `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving
the “no outer sandbox” intent instead of treating it as a lax managed
sandbox.
- `ExternalSandbox { network_access }` maps to
`PermissionProfile::External { network }`, preserving external
filesystem enforcement while still carrying the active network policy.
- Split runtime policies that legacy `SandboxPolicy` cannot faithfully
express, such as managed unrestricted filesystem plus restricted
network, stay `Managed` instead of being collapsed into
`ExternalSandbox`.
- Per-command/session/turn grants remain partial overlays via
`AdditionalPermissionProfile`; full `PermissionProfile` is reserved for
complete active runtime permissions.
## What Changed
- Change active `PermissionProfile` into a tagged union: `managed`,
`disabled`, and `external`.
- Keep partial permission grants separate with
`AdditionalPermissionProfile` for command/session/turn overlays.
- Represent managed filesystem permissions as either `restricted`
entries or `unrestricted`; `glob_scan_max_depth` is non-zero when
present.
- Preserve old rollout compatibility by accepting the pre-tagged `{
network, file_system }` profile shape during deserialization.
- Preserve fidelity for important edge cases: `DangerFullAccess`
round-trips as `disabled`, `ExternalSandbox` round-trips as `external`,
and managed unrestricted filesystem + restricted network stays managed
instead of being mistaken for external enforcement.
- Preserve configured deny-read entries and bounded glob scan depth when
full profiles are projected back into runtime policies, including
unrestricted replacements that now become `:root = write` plus deny
entries.
- Regenerate the experimental app-server v2 JSON/TypeScript schema and
update the `command/exec` README example for the tagged
`permissionProfile` shape.
## Compatibility
Legacy `SandboxPolicy` remains available at config/API boundaries as the
compatibility projection. Existing rollout lines with the old
`PermissionProfile` shape continue to load. The app-server
`permissionProfile` field is experimental, so its v2 wire shape is
intentionally updated to match the higher-fidelity model.
## Verification
- `just write-app-server-schema`
- `cargo check --tests`
- `cargo test -p codex-protocol permission_profile`
- `cargo test -p codex-protocol
preserving_deny_entries_keeps_unrestricted_policy_enforceable`
- `cargo test -p codex-app-server-protocol
permission_profile_file_system_permissions`
- `cargo test -p codex-app-server-protocol serialize_client_response`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox`
- `just fix`
- `just fix -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
## Summary
- Add a remote plugin install write call that POSTs the selected remote
plugin to the ChatGPT cloud plugin API.
- Align remote install with the latest remote read contract:
`pluginName` carries the backend remote plugin id directly, for example
`plugins~Plugin_linear`, and install no longer synthesizes
`<name>@<marketplace>` ids.
- Validate remote install ids with the same character rules as remote
read, return the same install response shape as local installs, and
include mocked app-server coverage for the write path.
## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all plugin_install`
- `cargo test -p codex-core-plugins`
- `just fix -p codex-app-server`
- `just fix -p codex-core-plugins`
## Why
Device-key providers should only own platform key material. The
account/client binding used to authorize a signing payload is app-server
state, and keeping that state in provider-specific metadata makes the
same check harder to audit and harder to share across platform
implementations.
Persisting the binding in the shared state database gives the device-key
crate a platform-neutral source of truth before it asks a provider to
sign. It also lets app-server move potentially blocking key operations
off the main message processor path, which matters once providers may
wait for OS authentication prompts.
## What changed
- Add a `device_key_bindings` state migration plus `StateRuntime`
helpers keyed by `key_id`.
- Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
- Keep provider calls behind async store methods and run the synchronous
provider work through `spawn_blocking`.
- Wire app-server device-key RPC handling to the SQLite-backed binding
store and spawn response/error delivery tasks for device-key requests.
- Run the turn-start tracing test on the existing larger current-thread
test harness after the larger async surface made the default test stack
too small locally.
## Validation
- `cargo test -p codex-device-key`
- `cargo test -p codex-state device_key`
- `cargo test -p codex-state`
- `cargo test -p codex-app-server device_key`
- `cargo test -p codex-app-server
message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
- `cargo test -p codex-app-server`
- `just fix -p codex-device-key`
- `just fix -p codex-state`
- `just fix -p codex-app-server`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `git diff --check`
## Why
`codex-models-manager` had grown to own provider-specific concerns:
constructing OpenAI-compatible `/models` requests, resolving provider
auth, emitting request telemetry, and deciding how provider catalogs
should be sourced. That made the manager harder to reuse for providers
whose model catalog is not fetched from the OpenAI `/models` endpoint,
such as Amazon Bedrock.
This change moves provider-specific model discovery behind
provider-owned implementations, so the models manager can focus on
refresh policy, cache behavior, picker ordering, and model metadata
merging.
## What Changed
- Introduced a `ModelsManager` trait with separate `OpenAiModelsManager`
and `StaticModelsManager` implementations.
- Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives
outside `codex-models-manager`.
- Moved `/models` request construction, provider auth resolution,
timeout handling, and request telemetry into `codex-model-provider` via
`OpenAiModelsEndpoint`.
- Added provider-owned `models_manager(...)` construction so configured
OpenAI-compatible providers use `OpenAiModelsManager`, while
static/catalog-backed providers can return `StaticModelsManager`.
- Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock
model IDs.
- Updated core/session/thread manager code and tests to depend on
`Arc<dyn ModelsManager>`.
- Moved offline model test helpers into
`codex_models_manager::test_support`.
## Metadata References
The Bedrock catalog metadata is based on the official Amazon Bedrock
OpenAI model documentation:
- [Amazon Bedrock OpenAI
models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html)
lists the Bedrock model IDs, text input/output modalities, and `128,000`
token context window for `gpt-oss-20b` and `gpt-oss-120b`.
- [Amazon Bedrock `gpt-oss-120b` model
card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html)
lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the
`bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities,
and `128K` context window.
- [OpenAI `gpt-oss-120b` model
docs](https://developers.openai.com/api/docs/models/gpt-oss-120b)
document configurable reasoning effort with `low`, `medium`, and `high`,
plus text input/output modality.
The display names, default reasoning effort, and priority ordering are
Codex-local catalog choices.
## Test Plan
- Manually verified app-server model listing with an AWS profile:
```shell
CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \
--codex-bin ./target/debug/codex \
-c 'model_provider="amazon-bedrock"' \
-c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \
-c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \
model-list
```
The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0`
as the default model and `openai.gpt-oss-20b-1:0` as the second listed
model, both text-only and supporting low/medium/high reasoning effort.
## Summary
Adds the remaining session and multi-agent edge wiring needed to
reconstruct rollout relationships across spawned agents, resumed
sessions, and parent/child message delivery.
## Stack
This is PR 4/5 in the rollout trace stack.
- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command
## Review Notes
This is the stack layer that makes traces useful for multi-threaded
agent workflows. The main invariant is that reconstructed relationships
should come from durable rollout data rather than transient in-memory
manager state wherever possible.
The PR is intentionally small relative to the preceding layers: it uses
the recorder and reducer contracts already established by the stack and
only adds the session/agent relationship events needed by later debug
reduction.
## Summary
Adds the debug CLI entry point for reducing recorded rollout traces.
This gives developers a direct way to inspect whether the emitted trace
stream reduces into the expected conversation/runtime model.
## Stack
This is PR 5/5 in the rollout trace stack.
- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command
## Review Notes
This PR is intentionally last: it depends on the trace crate, core
recorder, runtime/tool events, and session/agent edge data all existing.
The command should remain a debug/developer tool and avoid adding new
runtime behavior.
The useful review question is whether the CLI exposes the reducer in the
smallest practical way for local inspection without turning the debug
command into a supported user-facing workflow.
## Why
AWS/Bedrock mode currently reports `account: null` with
`requiresOpenaiAuth: false` from `account/read`. That suppresses the
OpenAI-auth requirement, but it does not let app clients distinguish AWS
auth from any other non-OpenAI custom provider. For the prototype AWS
provider UX, clients need a simple provider-derived signal so they can
suppress ChatGPT/API-key login and token-refresh paths without
hardcoding Bedrock checks.
## What changed
- Adds an `aws` variant to the v2 `Account` protocol union.
- Adds `ProviderAccountKind` to `codex-model-provider` so the runtime
provider owns the app-visible account classification.
- Makes Amazon Bedrock return `ProviderAccountKind::Aws` from the
model-provider layer.
- Updates app-server `account/read` to map `ProviderAccountKind` to the
existing `GetAccountResponse` wire shape.
- Preserves the existing `account: null, requiresOpenaiAuth: false`
behavior for other non-OpenAI providers.
- Regenerates the app-server protocol schema fixtures.
- Adds coverage for provider account classification and for the Amazon
Bedrock `account/read` response.
## Testing
- `cargo test -p codex-model-provider`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server get_account_with_aws_provider`
## Notes
I attempted `just bazel-lock-update` and `just bazel-lock-check`, but
both are blocked in my local environment because `bazel` is not
installed.
Fixes#18203.
## Why
Remote TUI clients connected through `codex app-server --listen
ws://...` can receive short bursts of outbound turn and tool-output
notifications. The WebSocket transport previously used the shared
128-message channel capacity for its outbound writer queue, so a healthy
client that briefly lagged during normal output streaming could fill the
queue and be disconnected immediately.
This is a smaller mitigation than #18265: instead of adding a new
overflow/backpressure pipeline, keep the existing non-blocking router
behavior and give WebSocket clients enough bounded headroom for
realistic bursts.
## What Changed
- Added a WebSocket-only outbound writer capacity of `64 * 1024`
messages.
- Used that larger capacity only for the WebSocket data writer queue in
`codex-rs/app-server/src/transport/websocket.rs`.
- Left the shared `CHANNEL_CAPACITY` and the existing disconnect-on-full
behavior unchanged for internal/control channels and genuinely stuck
clients.
## Verification
- `cargo test -p codex-app-server
transport::tests::broadcast_does_not_block_on_slow_connection`
- Manually retried the #18203 repro prompt against the remote TUI and
confirmed it stayed connected.
## Why
The BuildBuddy runs for PR #19086 and the later `main` build had the
same source tree, but their Windows Bazel action and test cache keys did
not line up. Comparing the downloaded execution logs showed the full
GitHub-hosted Windows runner `PATH` had changed from
`apache-maven-3.9.14` to `apache-maven-3.9.15`.
This repo is not using Maven; the Maven entry was just ambient
hosted-runner state. The problem was that Windows Bazel CI was still
forwarding the whole runner `PATH` into Bazel via `--action_env=PATH`,
`--host_action_env=PATH`, and `--test_env=PATH`, which made otherwise
reusable cache entries sensitive to unrelated runner image churn.
After discussion with the Bazel and BuildBuddy folks, the better shape
for this change was to stop asking Bazel to inherit the ambient Windows
`PATH` and instead compute one explicit cache-stable `PATH` in the
Windows setup action that already prepares the CI toolchain environment.
## What
- remove Windows `PATH` passthrough from `.bazelrc`
- export `CODEX_BAZEL_WINDOWS_PATH` from
`.github/actions/setup-bazel-ci/action.yml`
- move the PATH derivation logic into
`.github/scripts/compute-bazel-windows-path.ps1` so the allow-list is
easier to review and document
- keep only the Windows tool locations these Bazel jobs actually need:
MSVC and SDK paths, Git, PowerShell, Node, DotSlash, and the standard
Windows system directories
- update `.github/scripts/run-bazel-ci.sh` to require that explicit
value and forward it to Bazel action, host action, and test environments
- log the derived `CODEX_BAZEL_WINDOWS_PATH` in the setup step to
simplify cache-key debugging
## Verification
- `bash -n .github/scripts/run-bazel-ci.sh`
- `ruby -e 'require "yaml"; YAML.load_file(ARGV[0])'
.github/actions/setup-bazel-ci/action.yml`
- PowerShell parse check for
`.github/scripts/compute-bazel-windows-path.ps1`
- simulated a representative Windows `PATH` in PowerShell; the
allow-list retained MSVC, Git, PowerShell, Node, Windows, and DotSlash
entries while dropping Maven
We used to attempt a read-ACL on the same dir as `codex.exe` to grant
the sandbox user the ability to invoke `codex-command-runner.exe`. That
worked for the CLI case but it always fails for the installed desktop
app.
We have another solution already in place that copies
`codex-command-runner.exe` to `CODEX_HOME/.sandbox-bin` so we don't even
need this anymore. It causes a scary looking error in the logs that is a
non-issue and is therefore confusing
Sometimes codex runs `Start-Process` to start up a service or something
similar, which launches a user-visible powershell window that probably
doesn't get cleaned up. This instruction change encourages it to do so
using a hidden window.
This was reported in
https://openai.slack.com/archives/C09K6H5DZC4/p1776741272870519
One caveat is that this change won't do anything to cleanup these
processes, but it will stop them from polluting the user's visible
workspace
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Several approval-focused tests were unintentionally sensitive to
host-level rule files. On machines with broader allowed command
prefixes, commonly allowed commands such as `/bin/date` could bypass the
approval path these tests were meant to exercise, making the fixtures
depend on the developer or CI host configuration.
## What changed
- Pins the approval matrix fixture to the explicit user reviewer so it
does not inherit a host reviewer.
- Changes OTel approval fixtures to request `/usr/bin/touch
codex-otel-approval-test`, avoiding a command that may be pre-approved
by local rules.
- Clears the config layer stack for the permissions-message assertion
that needs to compare only the permissions text under test.
## Verification
- `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test
all approval_matrix_covers_all_modes -- --nocapture`
- `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test
all permissions_messages -- --nocapture`
## Summary
The plugin MCP tool-listing test could hide MCP startup failures by
polling `ListMcpTools` until its own 30s deadline. If the plugin MCP
server startup had already failed or timed out, the session-owned MCP
manager would keep returning an empty tool list, so CI only reported
`discovered tools: []` instead of the startup state that mattered.
This makes the test synchronize on `McpStartupComplete` for the sample
plugin MCP server before asserting listed tools, and gives the
Bazel-launched test server a larger startup window.
## Notes
Confidence is about 80%. The source path strongly supports the RCA: a
failed MCP startup is represented as an empty tool list through
`ListMcpTools`, so the old polling contract could not distinguish "not
ready yet" from "startup already failed." I could not retrieve the CI
execution-log artifact to confirm the exact hidden startup error, but
the observed Ubuntu Bazel failure matches this path: repeated
`ListMcpTools` responses with no tools until the test-local timeout
fired.
I think this is the right solution because it keeps plugin behavior
unchanged and fixes only the test contract. Future startup failures
should now report the `McpStartupComplete` failure/cancellation instead
of timing out on an empty tool snapshot.
This test was introduced in https://github.com/openai/codex/pull/12864.
Addresses #11267
## Summary
`/review` can be interrupted while it is still spawning the review
sub-agent. That spawn path lives in `codex-core` and did not observe the
task cancellation token until after `Codex::spawn` returned, so an
interrupted review could keep building a child session and leave the TUI
in a wedged state.
The TUI exit path also waited indefinitely for app-server
`thread/unsubscribe`, which made Ctrl+C look broken if the app-server
was already stuck. This makes interactive delegate startup
cancellation-aware and bounds the TUI shutdown-first unsubscribe wait
with a short UI escape-hatch timeout.
## Testing
I reproed the hang using the steps in the bug report. Confirmed hang no
longer exists after fix.
## Summary
The Windows Bazel job has been failing in
`chatwidget::tests::permissions::approvals_popup_navigation_skips_disabled`
because the test assumed a fixed approvals popup row order and shortcut
for the disabled permissions option. The approvals popup can include
platform-specific rows, so those assumptions made the test brittle.
This updates the test to derive the disabled row shortcut from the
rendered popup and assert navigation continues to skip disabled rows
before checking that disabled numeric shortcuts do not close or accept
the popup.
we copy `codex-command-runner.exe` into `CODEX_HOME/.sandbox-bin/` so
that it can be executed by the sandbox user.
We also detect if that version is stale and copy a new one in if so.
This can fail when you are running multiple versions of the app - the
file in `.sandbox-bin` can look stale because it comes from another app
build.
This change allows us to have multiple versions in there for different
CLI versions, and it fallsback to a `size+mtime` hash in the filename
for dev builds that don't report a real CLI version.
## Why
A Mac Bazel run hit a flake in
`server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes`
where the read path observed process exit but lost the expected buffered
stdout (`first\nsecond\n`). See the [GitHub Actions
job](https://github.com/openai/codex/actions/runs/24758468552/job/72436716505)
and [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/37475a12-4ef2-45fb-ab8a-e49a2aba1d59).
The underlying race is that process exit is not the same thing as
stdout/stderr closure. If a child or grandchild inherits the pipe write
end, or a process duplicates it with `dup2`, the watched process can
exit while the stream is still open and more output can still arrive.
The exec-server was starting exited-process retention cleanup from the
exit event, so the process entry could be removed before the output
streams had actually closed.
While stress-testing the exec-server unit suite,
`server::handler::tests::long_poll_read_fails_after_session_resume`
exposed a separate test race: it started a short-lived command that
could exit and wake the pending long-poll read before the session-resume
assertion observed the resumed-session error. That test is intended to
cover resume eviction, not process-exit delivery, so this change keeps
the process alive and quiet while the second connection resumes the
session.
## What changed
- Keep exec-server process entries retained until stdout/stderr streams
close, then start the post-exit retention timer from the closed event.
- Wake long-poll readers when the closed event is emitted.
- Add focused `local_process` unit coverage that proves late output is
still retained after the short test retention interval has elapsed, and
that closed process entries are eventually evicted.
- Add a local and remote regression test where a parent exits while a
child keeps inherited stdout open. The child waits on an explicit
release file, so the test deterministically observes exit first,
releases the child, then requires a nonzero-wait read from the exit
sequence to receive the late output.
- In `codex-rs/exec-server/src/server/handler/tests.rs`, make
`long_poll_read_fails_after_session_resume` run a long-lived silent
command instead of a short command that prints and exits. This isolates
the test to session-resume behavior and prevents a normal process exit
from satisfying the pending long-poll read first.
## Testing
- `cargo test -p codex-exec-server
exec_process_retains_output_after_exit_until_streams_close`
- `cargo test -p codex-exec-server local_process::tests`
- `cargo test -p codex-exec-server`
- `just fix -p codex-exec-server`
- `bazel test //codex-rs/exec-server:exec-server-unit-tests
//codex-rs/exec-server:exec-server-exec_process-test
//codex-rs/exec-server:exec-server-file_system-test
//codex-rs/exec-server:exec-server-http_client-test
//codex-rs/exec-server:exec-server-initialize-test
//codex-rs/exec-server:exec-server-process-test
//codex-rs/exec-server:exec-server-websocket-test`
- `bazel test --runs_per_test=25
//codex-rs/exec-server:exec-server-unit-tests`
## Documentation
No docs update needed; this is an internal exec-server correctness fix.
## Why
Shell escalation still has adapter code that expects a legacy sandbox
policy, but command approvals should carry the resolved
`PermissionProfile` so callers can reason about the granted permissions
canonically.
## What changed
This introduces profile-shaped resolved escalation permissions while
retaining the derived legacy sandbox policy for the Unix escalation
adapter. It updates approval types, the escalation server protocol, and
tests that inspect escalated command permissions.
## Verification
- `cargo test -p codex-core --test all handle_container_exec_ --
--nocapture`
- `cargo test -p codex-core --test all handle_sandbox_ -- --nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18287).
* #18288
* __->__ #18287
## Summary
Extends rollout tracing across tool dispatch and code-mode runtime
boundaries. This records canonical tool-call lifecycle events and links
code-mode execution/wait operations back to the model-visible calls that
caused them.
## Stack
This is PR 3/5 in the rollout trace stack.
- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command
## Review Notes
This PR is about attribution. Reviewers should focus on whether direct
tool calls, code-mode-originated tool calls, waits, outputs, and
cancellation boundaries are recorded with enough source information for
deterministic reduction without coupling the reducer to live runtime
internals.
The stack remains valid after this layer: tool and code-mode traces
reduce through the existing crate model, while the broader session and
multi-agent relationships are added in the next PR.
## Why
MCP tool calls can receive a serialized `SandboxState` when a server
declares the sandbox-state capability. That state is one of the places
MCP runtimes learn what permissions Codex is operating under. As the
permissions migration makes `PermissionProfile` the canonical
representation, MCP consumers should be able to read that profile
directly instead of reconstructing permissions from the legacy
`SandboxPolicy`.
## What changed
- Adds optional `permissionProfile` to `codex_mcp::SandboxState`, while
keeping `sandboxPolicy` for existing MCP consumers.
- Populates `permissionProfile` from the current `TurnContext` when
serializing sandbox state for MCP tool calls.
## Verification
- Current GitHub Actions for this PR are passing.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18286).
* #18288
* #18287
* __->__ #18286
## Why
Per-turn permission overrides should use the same canonical profile
abstraction as session configuration. That lets TUI submissions preserve
exact configured permissions without round-tripping through legacy
sandbox fields.
## What changed
This adds `permission_profile` to user-turn operations, threads it
through TUI/app-server submission paths, fills the new field in existing
test fixtures, and adds coverage that composer submission includes the
configured profile.
## Verification
- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285).
* #18288
* #18287
* #18286
* __->__ #18285
## Why
App-server needs a way to fetch thread-scoped config from the remote
thread config service when the user config opts into that behavior. This
mirrors the existing experimental remote thread store endpoint while
keeping local/noop behavior as the default.
Startup paths also need to avoid silently dropping the remote config
endpoint after the first config load. The stdio app-server path
discovers the endpoint from the initial config and installs the real
thread config loader for later config builds, while in-process clients
used by TUI/exec now select the same remote loader directly from their
provided config.
## What changed
- Added `experimental_thread_config_endpoint` to `ConfigToml`, `Config`,
and `core/config.schema.json`.
- Added config parsing coverage for the new setting.
- Updated app-server startup to select `RemoteThreadConfigLoader` from
the initially loaded config, falling back to `NoopThreadConfigLoader`
when unset.
- Let `ConfigManager` replace its thread config loader after startup
discovery so later config loads use the selected loader.
- Updated in-process app-server client startup to pass
`RemoteThreadConfigLoader` when its config has
`experimental_thread_config_endpoint` set.
## Verification
- Added `experimental_thread_config_endpoint_loads_from_config_toml`.
- Added
`runtime_start_args_use_remote_thread_config_loader_when_configured`.
- Ran `cargo check -p codex-app-server --lib`.
- Ran `cargo test -p codex-app-server-client`.
## Why
PR #18797 currently surfaces fallback rationale text that names Guardian
directly.
## What changed
- Updated the bare allow and bare deny fallback rationales in
`codex-rs/core/src/guardian/prompt.rs` from Guardian to Auto-review.
- Updated the existing bare allow parser test and added explicit bare
deny parser coverage.
## Verification
- `cargo test -p codex-core parse_guardian_assessment_treats_bare`
## Summary
- add macOS application and team identifiers to the release signing
entitlements
- add a Codex keychain access group for release-signed macOS binaries
- keep the existing JIT entitlement unchanged
## Why
Codex release binaries are signed with the OpenAI Developer ID team, but
the current entitlements plist only grants JIT. macOS Keychain and
Secure Enclave operations that create persistent keys can require the
process to carry an application identifier and keychain access group.
Adding these entitlements gives release-signed binaries a stable
Keychain namespace for Codex-owned device keys.
## Validation
- `plutil -lint
.github/actions/macos-code-sign/codex.entitlements.plist`
## Summary
- add unix:// app-server transport backed by the shared codex-uds crate
- reuse the websocket connection loop for axum and tungstenite-backed
streams
- add codex app-server proxy to bridge stdio clients to the control
socket
- tolerate Windows UDS backends that report a missing rendezvous path as
connection refused before binding
## Tests
- cargo test -p codex-app-server
control_socket_acceptor_forwards_websocket_text_messages_and_pings
- cargo test -p codex-app-server
- just fmt
- just fix -p codex-app-server
- git -c core.fsmonitor=false diff --check
## Why
Fixes#18475. A `-c` override such as `projects.<cwd>.trust_level =
"untrusted"` is meant to be a runtime config override, but app-server
thread startup treated any non-trusted project as eligible for automatic
trust persistence when a permissive sandbox/cwd was requested. That
meant an explicit `untrusted` session override could still cause
`config.toml` to be updated with `trusted`.
## What changed
The app-server auto-trust path now runs only when the active project
trust level is unknown. Explicit `trusted` and explicit `untrusted`
values are both respected, regardless of whether they came from
persisted config or session flags.
A focused `thread/start` test now covers the explicit `untrusted` case
with a permissive sandbox request.
## Verification
- `cargo test -p codex-app-server`
- `just fix -p codex-app-server`
Begin migrating the thread write codepaths to ThreadStore.
This starts using ThreadStore inside of core session code, not only in
the app server code.
Rework the interfaces around thread recording/persistence. We're left
with the following:
* `ThreadManager`: owns the process-level registry of loaded threads and
handles cross-thread orchestration: start, resume, fork, lookup, remove,
and route ops to running CodexThreads.
* `CodexThread`: represents one loaded/running thread from the outside.
It is the handle app-server and callers use to submit ops, inspect
session metadata, and shut the thread down.
* `LiveThread`: session-owned persistence lifecycle handle for one
active thread. Core session code uses it to append rollout items,
materialize lazy persistence, flush, shutdown, discard init-failed
writers, and load that thread’s persisted history.
* `ThreadStore`: storage backend abstraction. It answers “how are
threads persisted, read, listed, updated, archived?” Local and remote
implementations live behind this trait.
* `LocalThreadStore`: local ThreadStore implementation. It owns the
file/sqlite-specific details and keeps RolloutRecorder as a local
implementation detail.
This is a few too many Thread abstractions for my liking, but they do
all represent different concepts / needs / layers.
Migration note: in places where the core code explicitly requires a
path, rather than a thread ID, throw an error if we're running with a
remote store.
Cover the new local live-writer lifecycle with focused tests and
preserve app-server thread-start behavior, including ephemeral pathless
sessions.
For callers who expect to be paginating the results for the UI, they can
now call thread/resume or thread/fork with excludeturns:true so it will
not fetch any pages of turns, and instead only set up the subscription.
That call can be immediately followed by pagination requests to
thread/turns/list to fetch pages of turns according to the UI's current
interactions.
## Why
Thread-scoped config needs a stable boundary between the app/session
owner and the config stack. Instead of having call sites manually copy
thread config fields into individual overrides, this adds the proto and
Rust plumbing needed for a `ThreadConfigLoader` implementation to return
typed sources that can be translated into ordinary config layer entries.
Keeping the remote payload typed also makes precedence easier to reason
about: session-owned thread config maps back to the existing session
config source, while user-owned thread config is represented separately
without introducing a new config-layer source until it has TOML-backed
fields.
## What changed
- Added the `codex.thread_config.v1` protobuf service and generated Rust
module for loading thread config sources.
- Added `RemoteThreadConfigLoader`, which calls the gRPC service, parses
`SessionThreadConfig` / `UserThreadConfig`, and validates provider
fields such as `wire_api`, auth timeout, and absolute auth cwd.
- Added proto generation tooling under
`config/scripts/generate-proto.sh` and
`config/examples/generate-proto.rs`.
- Added `ThreadConfigLoader::load_config_layers`, plus static/no-op
loader helpers, so tests and callers can use the same typed loader
interface while config-layer translation stays centralized.
## Verification
- `cargo test -p codex-config thread_config`
## Why
MultiAgentV2 children should not receive an extra model-visible
developer fragment just because they were spawned. The parent/configured
developer instructions should carry through normally, but the dedicated
`<spawned_agent_context>` block is no longer desired.
## What changed
- Removed the `SpawnAgentInstructions` context fragment and its
`<spawned_agent_context>` wrapper.
- Stopped appending spawned-agent instructions in
`codex-rs/core/src/tools/handlers/multi_agents_v2/spawn.rs`.
- Updated subagent notification coverage to assert inherited parent
developer instructions without expecting the spawned-agent wrapper.
## Verification
- `cargo test -p codex-core --test all
spawned_multi_agent_v2_child_inherits_parent_developer_context --
--nocapture`
- `cargo test -p codex-core --test all
skills_toggle_skips_instructions_for_parent_and_spawned_child --
--nocapture`
- `cargo test -p codex-core --test all subagent_notifications --
--nocapture`
## Summary
- Updates generated CLI help for plugin marketplace commands to show the
full `codex plugin marketplace ...` namespace.
- Adds a regression test covering the marketplace command and its `add`,
`upgrade`, and `remove` help pages.
## Root Cause
The marketplace parser already lived under `codex plugin marketplace`,
but Clap generated usage text from the child parser's standalone command
name. That made help output show stale `codex marketplace ...`
instructions even though the top-level `codex marketplace` command no
longer parses.
## Validation
- `just fmt`
- `cargo test -p codex-cli`
- `./target/debug/codex plugin marketplace --help`
## Why
Once `SessionConfigured` carries the active `PermissionProfile`, the TUI
must treat that as authoritative session state. Otherwise the widget can
keep stale local permission details after a session is configured or
resumed.
The TUI also keeps a local `Config` copy used for later operations, so
session-sourced profiles and subsequent local sandbox changes need to
keep the derived split runtime permissions in sync. Because this PR may
land before the follow-up user-turn profile plumbing, embedded
app-server turns also need a standalone path for carrying local runtime
sandbox overrides.
## What changed
- Sync the chat widget runtime filesystem/network permissions from
`SessionConfigured.permission_profile`, with the legacy `sandbox_policy`
as the fallback.
- Recompute split runtime permissions whenever the TUI applies or
carries forward a local sandbox-policy override.
- Mark feature-driven Auto-review sandbox changes as runtime sandbox
overrides so the standalone embedded turn-start profile path is used
even without the follow-up user-turn profile PR.
- Send a turn-start `permissionProfile` for embedded,
non-ExternalSandbox turns when the TUI has a runtime sandbox override;
remote and ExternalSandbox turns keep using the legacy sandbox field.
- Extend coverage for profile sync, local sandbox changes,
ExternalSandbox fallback, feature-driven sandbox overrides, and
turn-start permission override selection.
## Verification
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_selects_auto_review`
- `cargo test -p codex-tui
turn_start_permission_overrides_send_profiles_only_for_embedded_runtime_overrides`
- `cargo test -p codex-tui permission_settings_sync`
- `cargo test -p codex-tui
session_configured_external_sandbox_keeps_external_runtime_policy`
- `cargo test -p codex-tui
session_configured_syncs_widget_config_permissions_and_cwd`
- `just fix -p codex-tui`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18284).
* #18288
* #18287
* #18286
* #18285
* __->__ #18284
## Why
Windows CI can flake in
`server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes`
after a process has exited but before both output streams have closed.
`exec/read` returned immediately whenever `exited` was true, so callers
that had already observed the exit event could spin instead of
long-polling for the later `closed` state.
## What Changed
- Keep returning immediately when a terminal exit event is newly
observable.
- Allow later reads, after the caller has advanced past that event, to
wait for `closed` or new output until `wait_ms` expires.
## Verification
- CI pending.
## Why
`multi_agent_v2` uses the v2 agent lifecycle, so accepting the legacy
`agents.max_threads` limit alongside it creates conflicting
configuration semantics. Config load should fail early with a clear
error instead of allowing both knobs to be set.
## What Changed
- During config load, detect when the effective `multi_agent_v2` feature
is enabled and `agents.max_threads` is explicitly set.
- Return an `InvalidInput` error: `agents.max_threads cannot be set when
multi_agent_v2 is enabled`.
## Verification
- `cargo test -p codex-core multi_agent_v2_rejects_agents_max_threads`
passed locally with a temporary focused test for this behavior.
- `cargo test -p codex-core` was also run; the new focused path passed,
but the crate suite has unrelated pre-existing failures in managed
config/proxy/request-permissions tests.
## Why
This keeps the partial Guardian subagent -> Auto-review rename
forward-compatible across mixed Codex installations. Newer binaries need
to understand the new `auto_review` spelling, but they cannot write it
to shared `~/.codex/config.toml` yet because older CLI/app-server
bundles only know `user` and `guardian_subagent` and can fail during
config load before recovering.
The Python SDK had the opposite compatibility gap: app-server responses
can contain `approvalsReviewer: "auto_review"`, but the checked-in
generated SDK enum did not accept that value.
## What Changed
- Keep `ApprovalsReviewer::AutoReview` readable from both
`guardian_subagent` and `auto_review`, while serializing it as
`guardian_subagent` in both protocol crates.
- Update TUI Auto-review persistence tests so enabling Auto-review
writes `approvals_reviewer = "guardian_subagent"` while UI copy still
says Auto-review.
- Map managed/cloud `feature_requirements.auto_review` to the existing
`Feature::GuardianApproval` gate without adding a broad local
`[features].auto_review` key or changing config writes.
- Add `auto_review` to the Python SDK `ApprovalsReviewer` enum and cover
`ThreadResumeResponse` validation.
## Testing
- `cargo test -p codex-protocol approvals_reviewer`
- `cargo test -p codex-app-server-protocol approvals_reviewer`
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_selects_auto_review`
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_in_profile_sets_profile_auto_review_policy`
- `cargo test -p codex-core
feature_requirements_auto_review_disables_guardian_approval`
- `pytest
sdk/python/tests/test_client_rpc_methods.py::test_thread_resume_response_accepts_auto_review_reviewer`
- `git diff --check`
## Summary
Lifecycle hooks currently treat `PreToolUse`, `PostToolUse`, and
`PermissionRequest` as Bash-only flows
- hook schema constrains `tool_name` to `Bash`
- hook input assumes a command-shaped `tool_input`
- core hook dispatch path passes only shell command strings
That means hooks cannot target MCP tools even though MCP tool names are
model-visible and stable
This change generalizes those hook paths so they can match and receive
payloads for MCP tools while preserving the existing Bash behavior.
## Reviewer Notes
I think these are the key files
- `codex-rs/core/src/tools/handlers/mcp.rs`
- `codex-rs/core/src/mcp_tool_call.rs`
Otherwise the changes across apply_patch, shell, and unified_exec are
mainly to rewire everything to be `tool_input` based instead of just
`command` so that it'll make sense for MCP tools.
## Changes
- Allow `PreToolUse`, `PostToolUse`, and `PermissionRequest` hook inputs
to carry arbitrary `tool_name` and `tool_input` values instead of
hard-coding `Bash` and command-only payloads.
- Add MCP hook payload support through `McpHandler`, using the
model-visible tool name from `ToolInvocation` and the raw MCP arguments
as `tool_input`.
- Include MCP tool responses in `PostToolUse` by serializing
`McpToolOutput` into the hook response payload.
- Run `PermissionRequest` hooks for MCP approval requests after
remembered approval checks and before falling back to user-facing MCP
elicitation.
- Preserve exact matching for literal hook matchers like `Bash` and
`mcp__memory__create_entities`, while keeping regex matcher support for
patterns like `mcp__memory__.*` and `mcp__.*__write.*`.
---------
Co-authored-by: Andrei Eternal <eternal@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Why
`item/permissions/requestApproval` sends a requested permission profile
to app-server clients. The core profile already stores filesystem
permissions as `entries`, but the v2 compatibility conversion used the
legacy `read`/`write` projection whenever possible and left `entries`
unset.
That made the request ambiguous for clients that consume the canonical
v2 shape: `permissions.fileSystem.entries` was missing even though
filesystem access was being requested. A client that rendered or echoed
grants from `entries` could treat the request as having no filesystem
permission entries, then return an empty or incomplete grant. The
app-server intersects responses with the original request, so omitted
filesystem permissions are denied.
## What Changed
- Populate `AdditionalFileSystemPermissions.entries` when converting
legacy read/write roots for request permission payloads, while
preserving `read` and `write` for compatibility.
- Mark `read` and `write` as transitional schema fields in the generated
app-server schema.
- Add regression coverage for the v2 conversion, the app-server
`item/permissions/requestApproval` round trip, and TUI app-server
approval conversion expectations.
- Refresh generated JSON and TypeScript schema fixtures.
## Verification
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server request_permissions_round_trip`
- `cargo test -p codex-tui
converts_request_permissions_into_granted_permissions`
- `cargo test -p codex-tui
resolves_permissions_and_user_input_through_app_server_request_id`
## Why
When the TUI upgrade flow moves a user to a newer model, the accepted
migration should also persist the target model's default reasoning
effort. That keeps the upgraded model and reasoning setting aligned
instead of carrying forward a stale previously saved effort from the old
model.
## What changed
- The accepted model migration path now updates in-memory config, TUI
state, and persisted model selection with the target preset's
`default_reasoning_effort`.
- The upgrade destructuring keeps `reasoning_effort_mapping` explicitly
unused because mappings are no longer consulted on accepted migrations.
- Added a catalog test that starts with a pre-existing saved reasoning
effort and verifies the accepted upgrade overwrites it with the target
model default and emits the expected persistence events.
- Rebasing onto current `main` also updates a TUI thread-session test
helper for the latest `permission_profile` field and
`ApprovalsReviewer::AutoReview` rename so CI compiles on the new base.
## Verification
- `cargo test -p codex-tui model_catalog`
- `cargo test -p codex-tui
permission_settings_sync_updates_active_snapshot_without_rewriting_side_thread`
## Why
The current cloud-requirements failures say `workspace-managed config`,
which is ambiguous and can read like it refers to local managed config
such as `managed_config.toml`.
This code path only applies to cloud requirements, so the user-facing
message should name that source directly.
## What changed
- Updated the load failure in
[`codex-rs/cloud-requirements/src/lib.rs`](46e704d1f9/codex-rs/cloud-requirements/src/lib.rs)
to say `failed to load cloud requirements (workspace-managed policies)`.
- Updated the parse failure in the same file to use the same `cloud
requirements (workspace-managed policies)` terminology.
- Kept `workspace-managed` hyphenated because it is used as a compound
modifier.
- Updated the matching assertion in
[`codex-rs/app-server/src/codex_message_processor.rs`](46e704d1f9/codex-rs/app-server/src/codex_message_processor.rs).
- Reused `CLOUD_REQUIREMENTS_LOAD_FAILED_MESSAGE` in the
`codex-cloud-requirements` test where the test is asserting that
crate-local contract directly.
## Testing
`cargo test -p codex-cloud-requirements`
Addresses #18854
## Why
The `/permissions` selector updates the active TUI session state, but
the cached session snapshot used when replaying a thread could still
contain the old approval or sandbox settings. After opening and leaving
`/side`, the main thread replay could restore those stale settings into
the `ChatWidget`, so the UI and the next submitted turn could fall back
to the old permission mode.
## What
- Sync the active thread's cached `ThreadSessionState` whenever approval
policy, sandbox policy, or approval reviewer changes.
## Verification
Confirmed bug prior to fix and correct behavior after fix.
# Why
Hooks are ready to graduate to GA in the next release!
# What
- Moves `Feature::CodexHooks` into the stable feature group.
- Marks the `codex_hooks` feature spec as `Stage::Stable` and
default-enabled.
## Why
`command/exec` is another app-server entry point that can run under
caller-provided permissions. It needs to accept `PermissionProfile`
directly so command execution is not left behind on `SandboxPolicy`
while thread APIs move forward.
Command-level profiles also need to preserve the semantics clients
expect from profile-relative paths. `:cwd` and cwd-relative deny globs
should be anchored to the resolved command cwd for a command-specific
profile, while configured deny-read restrictions such as `**/*.env =
none` still need to be enforced because they can come from config or
requirements rather than the command override itself.
## What Changed
This adds `permissionProfile` to `CommandExecParams`, rejects requests
that combine it with `sandboxPolicy`, and converts accepted profiles
into the runtime filesystem/network permissions used for command
execution.
When a command supplies a profile, the app-server resolves that profile
against the command cwd instead of the thread/server cwd. It also
preserves configured deny-read entries and `globScanMaxDepth` on the
effective filesystem policy so one-off command overrides cannot drop
those read protections. The PR also updates app-server docs/schema
fixtures and adds command-exec coverage for accepted, rejected,
cwd-scoped, and deny-read-preserving profile paths.
## Verification
- `cargo test -p codex-app-server
command_exec_permission_profile_cwd_uses_command_cwd`
- `cargo test -p codex-app-server
command_profile_preserves_configured_deny_read_restrictions`
- `cargo test -p codex-app-server
command_exec_accepts_permission_profile`
- `cargo test -p codex-app-server
command_exec_rejects_sandbox_policy_with_permission_profile`
- `just fix -p codex-app-server`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18283).
* #18288
* #18287
* #18286
* #18285
* #18284
* __->__ #18283
## Why
Enterprise and business-like ChatGPT plans should get Codex's Fast
service tier by default when the user or caller has not made an explicit
service-tier choice. At the same time, callers need a durable way to
choose standard routing without adding a new persisted `standard`
service tier value. This keeps existing config compatibility while
letting core own the managed default policy.
## What changed
- Resolve the effective service tier in core at session creation:
explicit `fast` or `flex` wins, explicit null/clear or
`[notice].fast_default_opt_out = true` resolves to standard routing, and
otherwise eligible ChatGPT plans resolve to Fast when FastMode is
enabled.
- Add `[notice].fast_default_opt_out` as the persisted opt-out marker
for managed Fast defaults.
- Treat app-server/TUI `service_tier: null` as an explicit
standard/clear choice by preserving that intent through config loading.
- Update TUI rendering to use core's effective service tier for startup
and status surfaces while still keeping `config.service_tier` as the
explicit configured choice.
- Update `/fast off` to clear `service_tier`, persist the opt-out
marker, and send explicit standard for subsequent turns.
## Verification
- Added unit coverage for config override/notice handling, service-tier
resolution, runtime null clearing, and `/fast off` turn propagation.
- `cargo build -p codex-cli`
Full test suite was not run locally per author request.
## Why
Clients that observe `SessionConfigured` need the same canonical
permission view that app-server thread responses provide. Reporting the
profile in protocol events lets clients keep their local state
synchronized without reinterpreting legacy sandbox fields.
## What changed
This adds `permission_profile` to `SessionConfigured` and propagates it
through core, exec JSON output, MCP server messages, and TUI
history/widget handling.
## Verification
- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18282).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* __->__ #18282
## Summary
Support the existing hooks schema in inline TOML so hooks can be
configured from both `config.toml` and enterprise-managed
`requirements.toml` without requiring a separate `hooks.json` payload.
This gives enterprise admins a way to ship managed hook policy through
the existing requirements channel while still leaving script delivery to
MDM or other device-management tooling, and it keeps `hooks.json`
working unchanged for existing users.
This also lays the groundwork for follow-on managed filtering work such
as #15937, while continuing to respect project trust gating from #14718.
It does **not** implement `allow_managed_hooks_only` itself.
NOTE: yes, it's a bit unfortunate that the toml isn't formatted as
closely as normal to our default styling. This is because we're trying
to stay compatible with the spec for plugins/hooks that we'll need to
support & the main usecase here is embedding into requirements.toml
## What changed
- moved the shared hook serde model out of `codex-rs/hooks` into
`codex-rs/config` so the same schema can power `hooks.json`, inline
`config.toml` hooks, and managed `requirements.toml` hooks
- added `hooks` support to both `ConfigToml` and
`ConfigRequirementsToml`, including requirements-side `managed_dir` /
`windows_managed_dir`
- treated requirements-managed hooks as one constrained value via
`Constrained`, so managed hook policy is merged atomically and cannot
drift across requirement sources
- updated hook discovery to load requirements-managed hooks first, then
per-layer `hooks.json`, then per-layer inline TOML hooks, with a warning
when a single layer defines both representations
- threaded managed hook metadata through discovered handlers and exposed
requirements hooks in app-server responses, generated schemas, and
`/debug-config`
- added hook/config coverage in `codex-rs/config`, `codex-rs/hooks`,
`codex-rs/core/src/config_loader/tests.rs`, and
`codex-rs/core/tests/suite/hooks.rs`
## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server config_api`
## Documentation
Companion updates are needed in the developers website repo for:
- the hooks guide
- the config reference, sample, basic, and advanced pages
- the enterprise managed configuration guide
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
## Why
This regressed in #19063, which made `GuardianApproval` stable and
enabled by default. That adds an enabled `Auto-review` row to the
permissions popup, but `approvals_popup_navigation_skips_disabled` still
assumed the disabled `Full Access` row lived behind a hard-coded numeric
shortcut, so the test started selecting a different row and closing the
popup instead of verifying disabled-row behavior.
## What
- disable `GuardianApproval` in
`approvals_popup_navigation_skips_disabled` so the popup layout matches
the scenario the test is exercising
- choose the hidden numeric shortcut for the disabled `Full Access` row
by platform (`2` on non-Windows, `3` on Windows where `Read Only` is
shown) before asserting that selecting the disabled row leaves the popup
open
## Testing
- `cargo test -p codex-tui --lib
chatwidget::tests::permissions::approvals_popup_navigation_skips_disabled
-- --exact --nocapture`
- `cargo test -p codex-tui --lib chatwidget::tests::permissions --
--nocapture`
- `cargo test -p codex-tui`
## Summary
Set `RUST_MIN_STACK=8388608` for Rust test entry points so
libtest-spawned test threads get an 8 MiB stack.
The Windows BuildBuddy failure on #18893 showed
`//codex-rs/tui:tui-unit-tests` exiting with a stack overflow in a
`#[tokio::test]` even though later test binaries in the shard printed
successful summaries. Default `#[tokio::test]` uses a current-thread
Tokio runtime, which means the async test body is driven on libtest's
std-spawned test thread. Increasing the test thread stack addresses that
failure mode directly.
To date, we have been fixing these stack-pressure problems with
localized future-size reductions, such as #13429, and by adding
`Box::pin()` in specific async wrapper chains. This gives us a baseline
test-runner stack size instead of continuing to patch individual tests
only after CI finds another large async future.
## What changed
- Added `common --test_env=RUST_MIN_STACK=8388608` in `.bazelrc` so
Bazel test actions receive the env var through Bazel's cache-keyed test
environment path.
- Set the same `RUST_MIN_STACK` value for Cargo/nextest CI entry points
and `just test`.
- Annotated the existing Windows Bazel linker stack reserve as 8 MiB so
it stays aligned with the libtest thread stack size.
## Testing
- `just --list`
- parsed `.github/workflows/rust-ci.yml` and
`.github/workflows/rust-ci-full.yml` with Ruby's YAML loader
- compared `bazel aquery` `TestRunner` action keys before/after explicit
`--test_env=RUST_MIN_STACK=...` and after moving the Bazel env to
`.bazelrc`
- `bazel test //codex-rs/tui:tui-unit-tests --test_output=errors`
- failed locally on the existing sandbox-specific status snapshot
permission mismatch, but loaded the Starlark changes and ran the TUI
test shards
## Summary
Allow the user to approve a request_permissions_tool request with the
condition that all commands in the rest of the turn are reviewed by
guardian, regardless of sandbox status.
## Testing
- [x] Added unit tests
- [x] Ran locally
## Why
While debugging the Windows stack overflows we saw in
[#13429](https://github.com/openai/codex/pull/13429) and then again in
[#18893](https://github.com/openai/codex/pull/18893), I hit another
overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
That test drives the legacy multi-agent spawn / close / resume path. The
behavior was fine, but several thin async wrappers were still inlining
much larger `AgentControl` futures into their callers, which was enough
to overflow the default Windows stack.
## What
- Box the thin `AgentControl` wrappers around `spawn_agent_internal`,
`resume_single_agent_from_rollout`, and `shutdown_agent_tree`.
- Box the corresponding legacy `multi_agents` handler calls in `spawn`,
`resume_agent`, and `close_agent`.
- Keep behavior unchanged while reducing future size on this call path
so the Windows test no longer overflows its stack.
## Testing
- `cargo test -p codex-core --lib
tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed
-- --exact --nocapture`
- `cargo test -p codex-core` (this still hit unrelated local
integration-test failures because `codex.exe` / `test_stdio_server.exe`
were not present in this shell; the relevant unit tests passed)
### Why
The RMCP layer needs a Streamable HTTP client that can talk either
directly over `reqwest` or through the executor HTTP runner without
duplicating MCP session logic higher in the stack. This PR adds that
client-side transport boundary so remote Streamable HTTP MCP can reuse
the same RMCP flow as the local path.
### What
- Add a shared `rmcp-client/src/streamable_http/` module with:
- `transport_client.rs` for the local-or-remote transport enum
- `local_client.rs` for the direct `reqwest` implementation
- `remote_client.rs` for the executor-backed implementation
- `common.rs` for the small shared Streamable HTTP helpers
- Teach `RmcpClient` to build Streamable HTTP transports in either local
or remote mode while keeping the existing OAuth ownership in RMCP.
- Translate remote POST, GET, and DELETE session operations into
executor `http/request` calls.
- Preserve RMCP session expiry handling and reconnect behavior for the
remote transport.
- Add remote transport coverage in
`rmcp-client/tests/streamable_http_remote.rs` and keep the shared test
support in `rmcp-client/tests/streamable_http_test_support.rs`.
### Verification
- `cargo check -p codex-rmcp-client`
- online CI
### Stack
1. #18581 protocol
2. #18582 runner
3. #18583 RMCP client
4. #18584 manager wiring and local/remote coverage
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`approvals_reviewer` now uses `auto_review` as the canonical config/API
value after #18504, but the Rust enum variant and nearby helper/test
names still used `GuardianSubagent` / guardian approval wording. That
made follow-up code and reviews confusing even though the external value
had already moved to Auto-review.
## What changed
- Renamed `ApprovalsReviewer::GuardianSubagent` to
`ApprovalsReviewer::AutoReview`.
- Updated protocol, app-server, config, core, TUI, exec, and analytics
test callsites.
- Renamed nearby helper/test names from guardian approval wording to
Auto-review wording where they refer to the approvals reviewer mode.
- Preserved wire compatibility:
- `auto_review` remains the canonical serialized value.
- `guardian_subagent` remains accepted as a legacy alias.
This intentionally does not rename the `[features].guardian_approval`
key, `Feature::GuardianApproval`, `core/src/guardian`, analytics event
names, or app-server Guardian review event types.
## Verification
- `cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-config approvals_reviewer`
- `cargo test -p codex-tui update_feature_flags`
- `cargo test -p codex-core permissions_instructions`
- `cargo test -p codex-tui permissions_selection`
Fixes#16246.
## Why
`exec_command` already emits `PreToolUse`, but long-running unified exec
commands that finish on a later `write_stdin` poll could miss the
matching `PostToolUse`. That left the Bash hook lifecycle inconsistent,
broke expectations around `tool_use_id` and `tool_input.command`, and
meant `PostToolUse` block/replacement feedback could fail to replace the
final session output before it reached model context.
This keeps the fix scoped to the `exec_command` / `write_stdin`
lifecycle. Broader non-Bash hook expansion is still out of scope here
and remains tracked separately in #16732.
## What changed
- Compute and store `PostToolUsePayload` while handlers still have
access to their concrete output type, and carry `tool_use_id` through
that payload.
- Preserve the original hook-facing `exec_command` string through
unified exec state (`ExecCommandRequest`, `ProcessEntry`,
`PreparedProcessHandles`, and `ExecCommandToolOutput`) via
`hook_command`, and remove the now-unused `session_command` output
metadata.
- Emit exactly one Bash `PostToolUse` for long-running `exec_command`
sessions when a later `write_stdin` poll observes final completion,
using the original `exec_command` call id and hook-facing command.
- Keep one-shot `exec_command` behavior aligned with the same payload
construction, including interactive completions that return a final
result directly.
- Apply `PostToolUse` block/replacement feedback before the final
`write_stdin` completion output is sent back to the model.
- Keep `write_stdin` itself out of `PreToolUse` matching so it continues
to act as transport/polling for the original Bash tool call.
- Restore plain matcher behavior for tool-name matchers such as `Bash`
and `Edit|Write`, while still treating patterns with regex characters
(for example `mcp__.*`) as regexes.
- Add unit coverage for unified exec payload construction and parallel
session separation, plus a core integration regression that verifies a
blocked `PostToolUse` replaces the final `write_stdin` output in model
context.
## Testing
- `cargo test -p codex-hooks`
- `cargo test -p codex-core post_tool_use_payload`
- `cargo test -p codex-core
post_tool_use_blocks_when_exec_session_completes_via_write_stdin`
## Why
Resume and reconstruction need to preserve the permissions that were
active for each user turn. If rollouts only keep legacy sandbox fields,
replay cannot faithfully represent profile-shaped overrides introduced
earlier in the stack.
## What changed
This records `permission_profile` on user-turn rollout events,
reconstructs it through history/state extraction, and updates rollout
reconstruction and related fixtures to keep the field explicit.
## Verification
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
- `cargo test -p codex-core --test all request_permissions --
--nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18281).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* __->__ #18281
## Why
After app-server can accept `PermissionProfile`, first-party clients
should stop preferring legacy sandbox fields when canonical permission
information is available. This keeps the migration moving without
removing legacy compatibility yet.
The client side still has mixed surfaces during the stack: embedded
thread start/resume/fork and exec initial turns can derive a profile
directly from local config, while TUI remote sessions and some
turn-start paths only have a legacy/server-context-safe sandbox
projection. Those paths keep sending legacy sandbox fields rather than
synthesizing or sending lossy/local-only profiles.
## What changed
- Sends `permissionProfile` from exec and embedded TUI thread
start/resume/fork requests when config has a representable profile.
- Keeps legacy sandbox fallback for external sandbox policies, TUI
remote thread lifecycle requests, and TUI turn-start requests that do
not yet carry the active profile.
- Sends the actual config-derived `permissionProfile` for exec initial
turns instead of rebuilding one from the legacy sandbox projection.
- Stores response `permissionProfile` as optional in TUI session state
so external sandbox responses and compatibility payloads preserve
`null`.
- Updates tests for request construction and response mapping.
## Verification
- `cargo check --tests -p codex-tui -p codex-exec`
- `cargo test -p codex-tui app_server_session -- --nocapture`
- `cargo test -p codex-exec thread_start_params -- --nocapture`
- `cargo test -p codex-tui
app_server_session::tests::thread_lifecycle_params -- --nocapture`
- `just fix -p codex-tui -p codex-exec`
- `just fix -p codex-tui`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18280).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* __->__ #18280
## Why
This is a cleanup PR for the `PermissionProfile` migration stack. #19016
fixed remote exec-server sandbox contexts so Docker-backed filesystem
requests use a request/container `cwd` instead of leaking the local test
runner `cwd`. That exposed the broader API problem:
`FileSystemSandboxContext::new(SandboxPolicy)` could still reconstruct
filesystem permissions by reading the exec-server process cwd with
`AbsolutePathBuf::current_dir()`.
That made `cwd`-dependent legacy entries, such as `:cwd`,
`:project_roots`, and relative deny globs, depend on ambient process
state instead of the request sandbox `cwd`. As later PRs make
`PermissionProfile` the primary permissions abstraction, sandbox
contexts should be explicit about whether they carry a request `cwd` or
are profile-only. Removing the implicit constructor prevents new call
sites from accidentally rebuilding permissions against the wrong `cwd`.
## What changed
- Removed `FileSystemSandboxContext::new(SandboxPolicy)`.
- Kept production callers on explicit constructors:
`from_legacy_sandbox_policy(..., cwd)`, `from_permission_profile(...)`,
and `from_permission_profile_with_cwd(...)`.
- Updated exec-server test helpers to construct `PermissionProfile`
values directly instead of routing through legacy `SandboxPolicy`
projections.
- Updated the environment regression test to use an explicit restricted
profile with no synthetic `cwd`.
## Verification
- `cargo test -p codex-exec-server`
- `just fix -p codex-exec-server`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19046).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* __->__ #19046
### Why
Auto-review is the user-facing name for the approvals reviewer, but the
config/API value still exposed the old `guardian_subagent` name. That
made new configs and generated schemas point users at Guardian
terminology even though the intended product surface is Auto-review.
This PR updates the external `approvals_reviewer` value while preserving
compatibility for existing configs and clients.
### What changed
- Makes `auto_review` the canonical serialized value for
`approvals_reviewer`.
- Keeps `guardian_subagent` accepted as a legacy alias.
- Keeps `user` accepted and serialized as `user`.
- Updates generated config and app-server schemas so
`approvals_reviewer` includes:
- `user`
- `auto_review`
- `guardian_subagent`
- Updates app-server README docs for the reviewer value.
- Updates analytics and config requirements tests for the canonical
auto_review value.
### Compatibility
Existing configs and API payloads using:
```toml
approvals_reviewer = "guardian_subagent"
```
continue to load and map to the Auto-review reviewer behavior.
New serialization emits:
```toml
approvals_reviewer = "auto_review"
```
This PR intentionally does not rename the [features].guardian_approval
key or broad internal Guardian symbols. Those are split out for a
follow-up PR to keep this migration small and avoid touching large
TUI/internal surfaces.
**Verification**
cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent
cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent
## Summary
Sync the bundled `openai-docs` system skill with the already-merged
`openai/skills` update from https://github.com/openai/skills/pull/360.
Codex bundles system skills from `codex-rs/skills/src/assets/samples`,
so this PR copies the same GPT-5.4 OpenAI Docs skill update into the
Codex app/CLI bundle path.
## Changes
- Add the latest-model resolver script to the bundled `openai-docs`
skill.
- Route model upgrade and prompt-upgrade requests through remote
latest-model metadata when current guidance is needed.
- Rename bundled fallback references to `upgrade-guide.md` and
`prompting-guide.md`.
- Keep the bundled fallback guidance GPT-5.4-only.
## Validation
- Verified this bundled skill is byte-for-byte identical to
`openai/skills@origin/main` `skills/.system/openai-docs`.
- Ran the resolver locally and confirmed it returns `gpt-5.4` /
`gpt-5p4`.
## Summary
- register `in_app_browser` and `browser_use` as stable feature keys
- allow requirements/MDM feature requirements to pin those desktop
browser controls
- add coverage for browser requirements being accepted by config loading
## Testing
- `cargo fmt --all` (`just fmt` unavailable locally; rustfmt warned
about nightly-only `imports_granularity` config)
- `cargo test -p codex-features`
- `cargo test -p codex-core browser_feature_requirements_are_valid`
- Tested manually by setting in `requirements.toml` and seeing after app
restart state to reflect the setting was correct (at the time hiding the
`Browser Use` setting when the enterprise setting was set to false
## Summary
- Factor the state DB `ThreadMetadata` to rollout `ThreadItem` mapping
into a shared helper used by both DB pages and filesystem overlays
- Generalize filtered filesystem list overlays to fill missing thread
list metadata from the state-derived `ThreadItem`, while preserving
filesystem `path` and `thread_id`
- Add coverage for the merge behavior so existing filesystem values are
not overwritten and future `ThreadItem` fields require an explicit
decision
## Testing
- `just fmt` from `codex-rs`
- `git diff --check -- codex-rs/rollout/src/recorder.rs
codex-rs/rollout/src/recorder_tests.rs`
- Attempted `cargo test -p codex-rollout thread_item_metadata` from
`codex-rs`; blocked in dependency fetch/setup after updating crates.io
and git submodules `https://github.com/livekit/protocol` and
`https://chromium.googlesource.com/libyuv/libyuv`, so the focused tests
did not run
## Why
The post-merge `rust-ci-full` run for #18999 still failed the Ubuntu
remote `suite::remote_env` sandboxed filesystem tests. That run checked
out merge commit `ddde50c611e4800cb805f243ed3c50bbafe7d011`, so the arg0
guard lifetime fix was present.
The Docker-backed failure had two remaining pieces:
- The sandboxed filesystem helper needs to execute Codex through the
`codex-linux-sandbox` arg0 alias path. The helper sandbox was only
granting read access to the real Codex executable parent, so the alias
parent also has to be visible inside the helper sandbox.
- The remote-env tests were building sandbox contexts with
`FileSystemSandboxContext::new()`, which captures the local test runner
cwd. In the Docker remote exec-server, that host checkout path does not
exist, so spawning the filesystem helper failed with `No such file or
directory` before the helper could process the request.
## What Changed
- Track all helper runtime read roots instead of a single root.
- Add both the real Codex executable parent and the
`codex-linux-sandbox` alias parent to sandbox readable roots.
- Avoid sending an unused local cwd in remote filesystem sandbox
contexts when the permission profile has no cwd-dependent entries.
- Build the Docker remote-env test sandbox contexts with a cwd path that
exists inside the container.
- Add unit coverage for the alias-parent root and remote sandbox cwd
handling.
## Verification
- `cargo test -p codex-exec-server`
- `cargo test -p codex-core
remote_test_env_sandboxed_read_allows_readable_root`
- `just fix -p codex-exec-server`
- `just fix -p codex-core`
###### Why/Context/Summary
Repro: start a session outside Full Access, switch permissions to Full
Access, then submit a new turn that triggers MCP/CUA permission
handling.
The turn used the live Full Access `SessionConfiguration`, but the MCP
coordinator was still synced from the stale `original_config_do_not_use`
/ per-turn config copy. That left the coordinator with an old sandbox
policy, so empty MCP permission elicitations could be denied instead of
auto-accepted.
Fix: update/rebuild the MCP connection manager from the live
turn/session approval and sandbox policy fields.
###### Test plan
```sh
just fmt
cargo test -p codex-core --lib
cargo test -p codex-core --lib mcp_tool_call::tests
```
## Summary
Give guardian network-access reviews the command context that triggered
a managed-network approval. The prompt JSON now includes the originating
tool call id, tool name, command argv, cwd, sandbox permissions,
additional permissions, justification, and tty state when a single
active tool call can be attributed.
The implementation keeps the trigger shape canonical by serializing
`GuardianNetworkAccessTrigger` directly and lets each runtime build that
trigger from its `ToolCtx`. Non-guardian approval prompts avoid cloning
the full trigger payload.
## UX changes
Guardian network-access reviews now include a `trigger` object that
explains what command caused the network approval. Instead of seeing
only the requested host, the guardian reviewer can also see the
originating tool call, argv, working directory, sandbox mode,
justification, and tty state.
Example payload the guardian reviewer can see:
```json
{
"tool": "network_access",
"target": "https://api.github.com:443",
"host": "api.github.com",
"protocol": "https",
"port": 443,
"trigger": {
"callId": "call_abc123",
"toolName": "shell",
"command": ["gh", "api", "/repos/openai/codex/pulls/18197"],
"cwd": "/workspace/codex",
"sandboxPermissions": "require_escalated",
"justification": "Fetch PR metadata from GitHub.",
"tty": false
}
}
```
The network review itself remains scoped to the network decision:
`target_item_id` stays `null`. `trigger.callId` is attribution context
only, so clients can still distinguish network reviews from
item-targeted command reviews.
## Verification
- Added coverage for serializing network trigger context in guardian
approval JSON.
- Added regression coverage that network guardian reviews do not reuse
`trigger.callId` as `target_item_id`.
---------
Co-authored-by: Codex <noreply@openai.com>
### Why
Remote streamable HTTP MCP needs the executor to perform ordinary HTTP
requests on the executor side. This keeps network placement aligned with
`experimental_environment = "remote"` without adding MCP-specific
executor APIs.
### What
- Add an executor-side `http/request` runner backed by `reqwest`.
- Validate request method and URL scheme, preserving the transport
boundary at plain HTTP.
- Return buffered responses for ordinary calls and emit ordered
`http/request/bodyDelta` notifications for streaming responses.
- Register the request handler in the exec-server router.
- Document the runner entrypoint, conversion helpers, body-stream
bridge, notification sender, timeout behavior, and new integration-test
helpers.
- Add exec-server integration tests with the existing websocket harness
and a local TCP HTTP peer for buffered and streamed responses, with
comments spelling out what each test proves and its
setup/exercise/assert phases.
### Stack
1. #18581 protocol
2. #18582 runner
3. #18583 RMCP client
4. #18584 manager wiring and local/remote coverage
### Verification
- `just fmt`
- `cargo check -p codex-exec-server -p codex-rmcp-client --tests`
- `cargo check -p codex-core --test all` compile-only
- `git diff --check`
- Online full CI is running from the `full-ci` branch, including the
remote Rust test job.
Co-authored-by: Codex <noreply@openai.com>
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`PermissionProfile` is becoming the canonical permissions shape shared
by core and app-server. After app-server responses expose the active
profile, clients need to be able to send that same shape back when
starting, resuming, forking, or overriding a turn instead of translating
through the legacy `sandbox`/`sandboxPolicy` shorthands.
This still needs to preserve the existing requirements/platform
enforcement model. A profile-shaped request can be downgraded or
rejected by constraints, but the server should keep the user's
elevated-access intent for project trust decisions. Turn-level profile
overrides also need to retain existing read protections, including
deny-read entries and bounded glob-scan metadata, so a permission
override cannot accidentally drop configured protections such as
`**/*.env = deny`.
## What changed
- Adds optional `permissionProfile` request fields to `thread/start`,
`thread/resume`, `thread/fork`, and `turn/start`.
- Rejects ambiguous requests that specify both `permissionProfile` and
the legacy `sandbox`/`sandboxPolicy` fields, including running-thread
resume requests.
- Converts profile-shaped overrides into core runtime filesystem/network
permissions while continuing to derive the constrained legacy sandbox
projection used by existing execution paths.
- Preserves project-trust intent for profile overrides that are
equivalent to workspace-write or full-access sandbox requests.
- Preserves existing deny-read entries and `globScanMaxDepth` when
applying turn-level `permissionProfile` overrides.
- Updates app-server docs plus generated JSON/TypeScript schema fixtures
and regression coverage.
## Verification
- `cargo test -p codex-app-server-protocol schema_fixtures`
- `cargo test -p codex-core
session_configuration_apply_permission_profile_preserves_existing_deny_read_entries`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18279).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* __->__ #18279
## Summary
Short circuit the convo if auto-review hits too many denials
## Testing
- [x] Added unit tests
---------
Co-authored-by: Codex <noreply@openai.com>
Preserve skill name/path entries whenever possible and trim descriptions
first, using round-robin character allocation so short descriptions do
not waste budget.
## Why
The Ubuntu GNU remote Cargo run has been regularly failing sandboxed
`suite::remote_env` filesystem tests with `No such file or directory`,
while the same cases pass under Bazel. The Cargo remote-env setup starts
`target/debug/codex exec-server` inside Docker via
`scripts/test-remote-env.sh`. That CLI builds `codex-linux-sandbox` and
other arg0 helper aliases in a temporary directory, then passes those
alias paths into the exec-server runtime.
`arg0_dispatch_or_else` constructed `Arg0DispatchPaths` from that
temporary alias guard, but then awaited the async CLI entry point
without otherwise keeping the guard live. That allowed the guard to be
dropped while the exec-server was still running, removing the helper
alias directory. Later sandboxed filesystem calls tried to spawn the
now-deleted `codex-linux-sandbox` path and surfaced as `ENOENT`.
The relevant distinction I found is that `core/tests/common` stores the
result of `arg0_dispatch()` in a process-lifetime
`OnceLock<Option<Arg0PathEntryGuard>>` for test binaries. The Cargo
remote-env setup exercises a real `codex exec-server` process instead,
so it depends on the normal CLI lifetime behavior fixed here.
## What Changed
- Keep the arg0 tempdir guard alive until `main_fn(paths).await`
completes.
- Keep the helper on the real `arg0_dispatch()` shape, where alias setup
can fail and return `None` in production.
- Add a regression test that uses an explicit guard, yields once, and
verifies the generated helper alias path still exists while the async
entry point is running.
## Verification
- `cargo test -p codex-arg0`
- `just argument-comment-lint -p codex-arg0`
- `just fix -p codex-arg0`
## Summary
This adds the structural plumbing needed for an app-server client to
approve a previously denied Guardian review and carry that approval
context into the next model turn.
This PR does not add the actual `/auto-review-denials` tool
## What Changed
- Added app-server v2 RPC `thread/approveGuardianDeniedAction`.
- Added generated JSON schema and TypeScript fixtures for
`ThreadApproveGuardianDeniedAction*`.
- Added core `Op::ApproveGuardianDeniedAction`.
- Added a core handler that validates the event is a denied Guardian
assessment and injects a developer message containing the stored denial
event JSON.
- Queues the approval context for the next turn if there is no active
turn yet.
- Added the TUI app-server bridge so `Op::ApproveGuardianDeniedAction {
event }` is routed to the app-server request.
## What This Does Not Do
- Does not add `/auto-review-denials`.
- Does not add chat widget recent-denial state.
- Does not add popup/list UI.
- Does not add a product-facing denial lookup/store.
- Does not change where Guardian denials are originally emitted or
persisted.
## Verification
- `cargo test -p codex-tui thread_approve_guardian_denied_action`
## Summary
Wires rollout trace recording into `codex-core` session and turn
execution. This records the core model request/response, compaction, and
session lifecycle boundaries needed for replay without yet tracing every
nested runtime/tool boundary.
## Stack
This is PR 2/5 in the rollout trace stack.
- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command
## Review Notes
This layer is the first live integration point. The important review
question is whether trace recording is isolated from normal session
behavior: trace failures should not become user-visible execution
failures, and recording should preserve the existing turn/session
lifecycle semantics.
The PR depends on the reducer/data model from the first stack entry and
only introduces the core recorder surface that later PRs use for richer
runtime and relationship events.
Addresses #18860
Problem: Remote app-server clients could stop draining websocket events
when their bounded local event channel filled, leaving clients stuck on
stale in-progress turns after a disconnect.
Solution: Use an unbounded local event channel for the remote client so
the websocket reader can keep forwarding disconnect and progress events
instead of blocking or dropping them.
Why this is reasonable: This does not make the remote websocket itself
unbounded. The changed queue lives inside the remote client, between the
task that reads the remote websocket and the API consumer in the same
client process. Once an event has been received from the remote server,
preserving it is preferable to blocking websocket reads or dropping
disconnect/lifecycle events; network-level backpressure still happens at
the websocket boundary if the remote side outpaces the client.
This is PR 2 of the Python SDK PyPI publishing split. [PR
1](https://github.com/openai/codex/pull/18862) refreshed the generated
SDK bindings; this PR makes the runtime package itself publishable, and
PR 3 will wire the SDK package/version pinning to this runtime package.
## Summary
- Rename the runtime distribution to `openai-codex-cli-bin` while
keeping the import package as `codex_cli_bin`.
- Make the runtime package wheel-only and build `py3-none-<platform>`
wheels instead of interpreter-specific wheels.
- Add `stage-runtime --codex-version` and `--platform-tag` so release
staging can produce the platform wheel matrix from Codex release tags.
- Add focused artifact workflow tests for version normalization,
platform tag injection, and runtime wheel metadata.
## Why Rename
There is already an unofficial PyPI package,
[`codex-bin`](https://pypi.org/project/codex-bin/), distributing OpenAI
Codex binaries. Publishing the official SDK runtime dependency as
`openai-codex-cli-bin` makes the ownership clear, avoids confusing the
SDK-pinned runtime wheel with that unowned wrapper, and keeps the import
package unchanged as `codex_cli_bin`.
## Tests
- `uv run --extra dev pytest
tests/test_artifact_workflow_and_binaries.py` -> 21 passed
- `uv run --extra dev python scripts/update_sdk_artifacts.py
stage-runtime /tmp/codex-python-pr2-rebased/runtime-stage
/tmp/codex-python-pr2-rebased/codex --codex-version
rust-v0.116.0-alpha.1 --platform-tag macosx_11_0_arm64`
- `uv run --with build --extra dev python -m build --wheel
/tmp/codex-python-pr2-rebased/runtime-stage`
- `uv run --with twine --extra dev twine check
/tmp/codex-python-pr2-rebased/runtime-stage/dist/openai_codex_cli_bin-0.116.0a1-py3-none-macosx_11_0_arm64.whl`
## Note
- Full `uv run --extra dev pytest` currently fails because regenerating
from schemas already on `main` adds new DeviceKey Python types. I left
that generated catch-up out of this runtime-only PR.
## Summary
This updates the embedded `imagegen` system skill in `codex-rs/skills`
with the ImageGen 2 skill changes from `openai/skills-internal#87`.
The bundled skill now keeps normal image generation/editing on the
built-in `image_gen` path, updates the CLI fallback defaults to
`gpt-image-2`, and routes explicit transparent-output requests through
`gpt-image-1.5` with clear guidance that `gpt-image-2` does not support
transparent backgrounds.
## Details
- Update `SKILL.md` routing guidance for built-in vs CLI fallback
behavior.
- Update CLI/API references for `gpt-image-2` size constraints, quality
options, near-4K sizes, and unsupported options.
- Update `scripts/image_gen.py` defaults and validation:
- default model `gpt-image-2`
- default size `auto`
- default quality `medium`
- reject transparent backgrounds on `gpt-image-2`
- reject `input_fidelity` on `gpt-image-2`
- validate flexible `gpt-image-2` sizes and suggest `3824x2160` /
`2160x3824` for near-4K requests
- Update prompt/reference docs with the new model and routing guidance.
## Validation
- `cargo test -p codex-skills`
- `git diff --check`
- Manual CLI dry-runs for:
- default `gpt-image-2` payload
- `3824x2160` near-4K size acceptance
- `3840x2160` rejection with near-4K guidance
- transparent background rejection on `gpt-image-2`
- transparent background acceptance on `gpt-image-1.5`
- `input_fidelity` rejection on `gpt-image-2`
Bazel target check was not run locally because `bazel` is not installed
in this environment.
## Why
`wait_agent` can be called while mailbox mail is already pending. The
previous implementation subscribed for future mailbox sequence changes
and then waited for the next notification. If the mail was queued before
that wait started, no new notification arrived, so the tool could sit
until `timeout_ms` even though mail was ready to deliver.
## What Changed
- Added `Session::has_pending_mailbox_items()` for checking pending
mailbox mail through the session API.
- Updated `multi_agents_v2::wait` to return immediately when pending
mailbox mail already exists before sleeping on a new mailbox sequence
update.
- Reworked the regression coverage in `multi_agents_tests.rs` so already
queued mailbox mail must wake `wait_agent` promptly.
Relevant code:
- [`wait_agent` pending-mail
check](aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_v2/wait.rs (L55-L60))
-
[`Session::has_pending_mailbox_items`](aa8ca06e83/codex-rs/core/src/session/mod.rs (L2979-L2981))
-
[`multi_agent_v2_wait_agent_returns_for_already_queued_mail`](aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_tests.rs (L2854))
## Verification
- `cargo test -p codex-core
multi_agent_v2_wait_agent_returns_for_already_queued_mail`
## Summary
- Teach app-server `thread/list` to accept either a single `cwd` or an
array of cwd filters, returning threads whose recorded session cwd
matches any requested path
- Add `useStateDbOnly` as an explicit opt-in fast path for callers that
want to answer `thread/list` from SQLite without scanning JSONL rollout
files
- Preserve backwards compatibility: by default, `thread/list` still
scans JSONL rollouts and repairs SQLite state
- Wire the new cwd array and SQLite-only options through app-server,
local/remote thread-store, rollout listing, generated TypeScript/schema
fixtures, proto output, and docs
## Test Plan
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-rollout`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server thread_list`
- `just fmt`
- `just fix -p codex-app-server-protocol -p codex-rollout -p
codex-thread-store -p codex-app-server`
- `cargo build -p codex-cli --bin codex`
## Why
Guardian analytics includes time-to-first-token, but the Guardian
reviewer runs as a normal Codex session and `TurnCompleteEvent` did not
expose TTFT. The timing needs to flow through the standard
turn-completion protocol so Guardian review analytics can consume the
same value as the rest of the session machinery.
## What changed
Adds optional `time_to_first_token_ms` to `TurnCompleteEvent` and
populates it from `TurnTiming`. The value is carried through app-server
thread history, rollout reconstruction, TUI/app-server adapters, and
Guardian review session handling.
Guardian review analytics now captures TTFT from the reviewer
turn-complete event when available. Existing tests and fixtures are
updated to set the new optional field to `None` where TTFT is not
relevant.
## Verification
- `cargo clippy -p codex-tui --tests -- -D warnings`
- `cargo clippy -p codex-core --lib --tests -- -D warnings`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17696).
* __->__ #17696
* #17695
* #17693
* #18278
* #18953
## Why
The Guardian review event needs to report whether the action shown to
Guardian was truncated. That field should come from the same truncation
path used to build the Guardian prompt, rather than being inferred after
the fact.
## What changed
Plumbs truncation metadata through Guardian action formatting, prompt
construction, review session execution, and analytics emission.
`guardian_truncate_text` now reports both the rendered text and whether
it inserted the truncation marker, and `reviewed_action_truncated` is
set from that prompt-building result.
This keeps the analytics field aligned with the model-visible reviewed
action while preserving the existing Guardian prompt behavior.
## Verification
- Guardian truncation tests cover both truncated and non-truncated
action payloads.
- Guardian review tests assert the review session metadata and
truncation field are propagated.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17695).
* #17696
* __->__ #17695
* #17693
* #18278
* #18953
## Why
Guardian approvals now run as review sessions, but Codex analytics did
not have a terminal event for those reviews. That made it hard to
measure approval outcomes, failure modes, Guardian session reuse, model
metadata, token usage, and timing separately from the parent turn.
## What changed
Adds `codex_guardian_review` analytics emission for Guardian approval
reviews. The event is emitted from the Guardian review path with review
identity, target item id, approval request source, a PII-minimized
reviewed-action shape, terminal decision/status, failure reason,
Guardian assessment fields, Guardian session metadata, token usage, and
timing metadata.
The reviewed-action payload intentionally omits high-risk fields such as
shell commands, working directories, argv, file paths, network
targets/hosts, rationale, retry reason, and permission justifications.
It also classifies prompt-build failures separately from Guardian
session/runtime failures so fail-closed cases are distinguishable in
analytics.
## Verification
- Guardian review analytics tests cover terminal success,
timeout/cancel/fail-closed paths, session metadata, and token usage
plumbing.
- `cargo clippy -p codex-core --lib --tests -- -D warnings`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17693).
* #17696
* #17695
* __->__ #17693
## Why
The `PermissionProfile` migration needs app-server clients to see the
same constrained permission model that core is using at runtime. Before
this PR, thread lifecycle responses only exposed the legacy
`SandboxPolicy` shape, so clients still had to infer active permissions
from sandbox fields. That makes downstream resume, fork, and override
flows harder to make `PermissionProfile`-first.
External sandbox policies are intentionally excluded from this canonical
view. External enforcement cannot be round-tripped as a
`PermissionProfile`, and exposing a lossy root-write profile would let
clients accidentally change sandbox semantics if they echo the profile
back later.
## What changed
- Adds the app-server v2 `PermissionProfile` wire shape, including
filesystem permissions and glob scan depth metadata.
- Adds `PermissionProfileNetworkPermissions` so the profile response
does not expose active network state through the older
additional-permissions naming.
- Returns `permissionProfile` from thread start, resume, and fork
responses when the active sandbox can be represented as a
`PermissionProfile`.
- Keeps legacy `sandbox` in those responses for compatibility and
documents `permissionProfile` as canonical when present.
- Makes lifecycle `permissionProfile` nullable and returns `null` for
`ExternalSandbox` to avoid exposing a lossy profile.
- Regenerates the app-server JSON schema and TypeScript fixtures.
## Verification
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server
thread_response_permission_profile_omits_external_sandbox --
--nocapture`
- `cargo check --tests -p codex-analytics -p codex-exec -p codex-tui`
- `just fix -p codex-app-server-protocol -p codex-app-server -p
codex-analytics -p codex-exec -p codex-tui`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18278).
* #18279
* __->__ #18278
`codex sandbox windows` previously did a one-shot spawn for all
commands.
This change uses the `unified_exec` session to spawn long-lived
processes instead, and implements a simple bridge to forward stdin to
the spawned session and stdout/stderr from the spawned session back to
the caller.
It also fixes a bug with the new shared spawn context code where the
"no-network env" was being applied to both elevated and unelevated
sandbox spawns. It should only be applied for the unelevated sandbox
because the elevated one uses firewall rules instead of an env-based
network suppression strategy.
## Summary
This PR adds `CodexAuth::AgentIdentity` as an explicit auth mode.
An AgentIdentity auth record is a standalone `auth.json` mode. When
`AuthManager::auth().await` loads that mode, it registers one
process-scoped task and stores it in runtime-only state on the auth
value. Header creation stays synchronous after that because the task is
initialized before callers receive the auth object.
This PR also removes the old feature flag path. AgentIdentity is
selected by explicit auth mode, not by a hidden flag or lazy mutation of
ChatGPT auth records.
Reference old stack: https://github.com/openai/codex/pull/17387/changes
## Design Decisions
- AgentIdentity is a real auth enum variant because it can be the only
credential in `auth.json`.
- The process task is ephemeral runtime state. It is not serialized and
is not stored in rollout/session data.
- Account/user metadata needed by existing Codex backend checks lives on
the AgentIdentity record for now.
- `is_chatgpt_auth()` remains token-specific.
- `uses_codex_backend()` is the broader predicate for ChatGPT-token auth
and AgentIdentity auth.
## Stack
1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. This PR: explicit AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
and load `CODEX_AGENT_IDENTITY`
## Testing
Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
## Why
`Permissions` should not store a separate `PermissionProfile` that can
drift from the constrained `SandboxPolicy` and network settings. The
active profile needs to be derived from the same constrained values that
already honor `requirements.toml`.
## What changed
This adds derivation of the active `PermissionProfile` from the
constrained runtime permission settings and exposes that derived value
through config snapshots and thread state. The app-server can then
report the active profile without introducing a second source of truth.
## Verification
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
- `cargo test -p codex-core --test all request_permissions --
--nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18277).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* __->__ #18277
## Summary
The Bedrock Mantle SigV4 auth provider currently looks like it can
lazily load `AwsAuthContext`, but the provider is only constructed after
`resolve_auth_method` has already loaded that context. Because
`with_context` always pre-populates the `OnceCell`, the
`get_or_try_init` fallback is unused in normal operation and makes the
provider lifecycle harder to reason about.
This change removes that dead lazy-loading path and makes the actual
behavior explicit:
- `BedrockAuthMethod::AwsSdkAuth` carries only the resolved
`AwsAuthContext`.
- `BedrockMantleSigV4AuthProvider` stores the resolved context directly.
- request signing uses the stored context without going through
`OnceCell`.
The existing eager AWS auth resolution behavior is unchanged; this is a
simplification of the provider state, not a behavior change.
## Testing
- `cargo shear`
- `cargo test -p codex-model-provider`
- `just bazel-lock-check`
## Summary
- Keep the guardian policy installed as guardian base instructions.
- Clear inherited parent `developer_instructions` for guardian review
sessions.
- Update guardian config tests to assert developer instructions are
cleared and policy text is sourced from base instructions.
## Why
Guardian review sessions are intended to run under an isolated guardian
policy. Because the guardian config is cloned from the parent config,
inherited custom or managed developer instructions could otherwise
remain active and conflict with guardian review behavior.
## Validation
- `just fmt`
- `cargo test -p codex-core guardian_review_session_config`
Co-authored-by: Codex <noreply@openai.com>
## Why
A [Windows Cargo
build](https://github.com/openai/codex/actions/runs/24754807756/job/72425641062)
on `main` timed out in several unrelated-looking suites at the same
time:
- `codex-app-server` account tests failed before account logic, while
`mcp.initialize()` was waiting for the first JSON-RPC response.
- `codex-core` `apply_patch_cli` tests timed out while running full
Codex/apply_patch turns.
- `codex-windows-sandbox` legacy session tests timed out while creating
restricted-token child processes and private desktops.
The app-server log reached the test harness write path in
[`McpProcess::initialize_with_params`](731b54d08f/codex-rs/app-server/tests/common/mcp_process.rs (L244-L263)),
but never printed the matching stdout read from
[`read_jsonrpc_message`](731b54d08f/codex-rs/app-server/tests/common/mcp_process.rs (L1123-L1128)).
The server initialize handler is a small bookkeeping/response path
([`message_processor.rs`](731b54d08f/codex-rs/app-server/src/message_processor.rs (L601-L728))),
so the failure looks like Windows runner process/pipe scheduling
starvation rather than account-specific behavior.
## What Changed
This updates `.config/nextest.toml` to serialize two process-heavy sets:
- `codex-core` tests matching `package(codex-core) & kind(test) &
test(apply_patch_cli)`
- `codex-windows-sandbox` tests matching `package(codex-windows-sandbox)
& test(legacy_)`
`codex-app-server` integration tests were already serialized inside
their own package; this change reduces overlap with the other suites
that were saturating the runner at the same time.
## Verification
- `cargo nextest list --filterset "package(codex-core) & kind(test) &
test(apply_patch_cli)"`
- `cargo nextest list --filterset "package(codex-windows-sandbox) &
test(legacy_)"`
The Windows sandbox filter naturally lists no tests on macOS, but it
validates the nextest filter/config syntax locally.
## Why
The fast `rust-ci` workflow decides whether to run the cross-platform
`argument-comment-lint` job based on changed paths. PRs that touch
Rust-adjacent Bazel wrapper files, such as `defs.bzl` or
`workspace_root_test_launcher.*.tpl`, can change how Rust tests and lint
targets behave without changing any `.rs` files.
When that detector returned false, GitHub skipped the matrix job before
expanding it. That produced a single skipped check named `Argument
comment lint - ${{ matrix.name }}` instead of the Linux, macOS, and
Windows check names that branch protection expects, leaving the PR
unable to go green when those matrix checks are required.
## What Changed
- Treat root Bazel wrapper files as `argument-comment-lint` relevant
changes.
- Keep the `argument_comment_lint_prebuilt` matrix job materialized for
every PR so the per-platform check names always exist.
- Add a single gate step that decides whether the real lint work should
run.
- Move the checkout-adjacent Bazel setup and OS-specific lint commands
into `.github/actions/run-argument-comment-lint/action.yml` so the
workflow does not repeat the same path-detection condition on each step.
## Verification
- Parsed `.github/workflows/rust-ci.yml` and
`.github/actions/run-argument-comment-lint/action.yml` with Python YAML
loading.
- Simulated the workflow path-matching shell conditions for the root
Bazel wrapper files and confirmed they set `argument_comment_lint=true`.
## Why
The exec-server still needs platform sandbox inputs, but the migration
should preserve the `PermissionProfile` that produced them. Keeping only
the derived legacy sandbox map would keep `SandboxPolicy` as the
effective abstraction and would make full-disk vs. restricted profiles
harder to preserve as the permissions stack starts round-tripping
profiles.
`PermissionProfile` entries can also be cwd-sensitive (`:cwd`,
`:project_roots`, relative globs), so the exec-server must carry the
request sandbox cwd instead of resolving those entries against the
long-lived exec-server process cwd.
## What changed
`FileSystemSandboxContext` now carries `permissions: PermissionProfile`
plus an optional `cwd`:
- removed `sandboxPolicy`, `sandboxPolicyCwd`,
`fileSystemSandboxPolicy`, and `additionalPermissions`
- added `permissions` and `cwd`
- kept the platform knobs `windowsSandboxLevel`,
`windowsSandboxPrivateDesktop`, and `useLegacyLandlock`
Core turn and apply-patch paths populate the context from the active
runtime permissions and request cwd. Exec-server derives platform
`SandboxPolicy`/`FileSystemSandboxPolicy` at the filesystem boundary,
adds helper runtime reads there, and rejects cwd-dependent profiles that
arrive without a cwd.
The legacy `FileSystemSandboxContext::new(SandboxPolicy)` constructor
now preserves the old workspace-write conversion semantics for
compatibility tests/callers.
## Verification
- `cargo test -p codex-exec-server`
- `cargo test -p codex-exec-server sandbox_cwd -- --nocapture`
- `cargo test -p codex-exec-server
sandbox_context_new_preserves_legacy_workspace_write_read_only_subpaths
-- --nocapture`
- `cargo test -p codex-core --lib
file_system_sandbox_context_uses_active_attempt -- --nocapture`
## Why
A Mac Bazel CI run saw `remote_notifications_arrive_over_websocket` fail
during shutdown with `remote app-server shutdown channel is closed`
(https://app.buildbuddy.io/invocation/9dac05d6-ae20-40f9-b627-fca6e91cf127).
The remote websocket worker can legitimately finish while `shutdown()`
is waiting for the shutdown acknowledgement: after the test server sends
a notification and exits, the worker may deliver the required disconnect
event, observe that the caller has dropped the event receiver, and exit
before it sends the shutdown one-shot.
That state is already terminal cleanup, not a failed shutdown, so
callers should not see a `BrokenPipe` from the acknowledgement channel.
## What Changed
- Treat a closed remote shutdown acknowledgement as an already-exited
worker while still propagating websocket close errors when the worker
returns them.
- Added a deterministic regression test for the interleaving where the
shutdown command is received and the worker exits before replying.
## Verification
- `cargo test -p codex-app-server-client`
- New test:
`remote::tests::shutdown_tolerates_worker_exit_after_command_is_queued`
Add a temporary internal remote_plugin feature flag that merges remote
marketplaces into plugin/list and routes plugin/read through the remote
APIs when needed, while keeping pure local marketplaces working as
before.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The `codex-tui` Cargo test suite was catching stale snapshot
expectations, but the matching Bazel unit-test target was still green.
The TUI unit target is wrapped by `workspace_root_test` so tests run
from the repository root and Insta can resolve Cargo-like snapshot
paths. After native Bazel sharding was enabled for that wrapped target,
rules_rust also inserted its own sharding wrapper around the Rust test
binary.
Those two wrappers did not compose: rules_rust's sharding wrapper
expects to run from its own runfiles cwd, while `workspace_root_test`
deliberately changes cwd to the repo root before invoking the test. In
that configuration, the inner wrapper could fail to enumerate the Rust
tests and exit successfully with empty shards, so snapshot regressions
were not being exercised by Bazel.
## What Changed
- Stop enabling rules_rust's inner `experimental_enable_sharding` for
unit-test binaries created by `codex_rust_crate`.
- Keep the configured `shard_count` on the outer `workspace_root_test`
target.
- Add libtest sharding directly to `workspace_root_test_launcher.sh.tpl`
and `workspace_root_test_launcher.bat.tpl` after the launcher has
resolved the actual test binary and established the intended
repository-root cwd.
- Partition tests by a stable FNV-1a hash of each libtest test name,
matching the stable-shard behavior we wanted without depending on the
inner rules_rust wrapper.
- Preserve ad-hoc local test filters by running the resolved test binary
directly when explicit test args are supplied.
- On Windows, run selected libtest names from the shard list in bounded
PowerShell batches instead of concatenating every selected test into one
`cmd.exe` command line.
This PR is stacked on top of #18912, which contains only the snapshot
expectation updates exposed once the Bazel target actually runs the TUI
unit tests. It is also the reason #18916 becomes visible: once this
wrapper fix makes Bazel execute the affected `codex-core` test, that
test needs its own executable-path setup fixed.
## Verification
- `cargo test -p codex-tui`
- `bazel test //codex-rs/tui:tui-unit-tests --test_output=errors`
- `bazel test //codex-rs/tui:all --test_output=errors`
- `bash -n workspace_root_test_launcher.sh.tpl`
- Exercised the Windows PowerShell batching fragment locally with a fake
test binary and shard-list file.
## Summary
Add first-class Amazon Bedrock Mantle provider support so Codex can keep
using its existing Responses API transport with OpenAI-compatible
AWS-hosted endpoints such as AOA/Mantle.
This is needed for the AWS launch path, where provider traffic should
authenticate with AWS credentials instead of OpenAI bearer credentials.
Requests are authenticated immediately before transport send, so SigV4
signs the final method, URL, headers, and body bytes that `reqwest` will
send.
## What Changed
- Added a new `codex-aws-auth` crate for loading AWS SDK config,
resolving credentials, and signing finalized HTTP requests with AWS
SigV4.
- Added a built-in `amazon-bedrock` provider that targets Bedrock Mantle
Responses endpoints, defaults to `us-east-1`, supports region/profile
overrides, disables WebSockets, and does not require OpenAI auth.
- Added Amazon Bedrock auth resolution in `codex-model-provider`: prefer
`AWS_BEARER_TOKEN_BEDROCK` when set, otherwise use AWS SDK credentials
and SigV4 signing.
- Added `AuthProvider::apply_auth` and `Request::prepare_body_for_send`
so request-signing providers can sign the exact outbound request after
JSON serialization/compression.
- Determine the region by taking the `aws.region` config first (required
for bearer token codepath), and fallback to SDK default region.
## Testing
Amazon Bedrock Mantle Responses paths:
- Built the local Codex binary with `cargo build`.
- Verified the custom proxy-backed `aws` provider using `env_key =
"AWS_BEARER_TOKEN_BEDROCK"` streamed raw `responses` output with
`response.output_text.delta`, `response.completed`, and `mantle-env-ok`.
- Verified a full `codex exec --profile aws` turn returned
`mantle-env-ok`.
- Confirmed the custom provider used the bearer env var, not AWS profile
auth: bogus `AWS_PROFILE` still passed, empty env var failed locally,
and malformed env var reached Mantle and failed with `401
invalid_api_key`.
- Verified built-in `amazon-bedrock` with `AWS_BEARER_TOKEN_BEDROCK` set
passed despite bogus AWS profiles, returning `amazon-bedrock-env-ok`.
- Verified built-in `amazon-bedrock` SDK/SigV4 auth passed with
`AWS_BEARER_TOKEN_BEDROCK` unset and temporary AWS session env
credentials, returning `amazon-bedrock-sdk-env-ok`.
## Why
`build_prompt_input` now initializes `ExecServerRuntimePaths`, which
requires a configured Codex executable path. The previous inline unit
test in `core/src/prompt_debug.rs` built a bare `test_config()` and then
failed before it could assert anything useful:
```text
Codex executable path is not configured
```
This coverage is also integration-shaped: it drives the public
`build_prompt_input` entry point through config, thread, and session
setup rather than testing a small internal helper in isolation.
Bazel CI did not catch this earlier because the affected test was behind
the same wrapped Rust unit-test path fixed by #18913. Before that
launcher/sharding fix, the outer `workspace_root_test` changed the
working directory for Insta compatibility while the inner `rules_rust`
sharding wrapper still expected its runfiles working directory. In
practice, Bazel could report success without executing the Rust test
cases in that shard. Once #18913 makes the wrapper run the Rust test
binary directly and shard with libtest arguments, this stale unit test
actually runs and exposes the missing `codex_self_exe` setup.
## What Changed
- Moved `build_prompt_input_includes_context_and_user_message` out of
`core/src/prompt_debug.rs`.
- Added `core/tests/suite/prompt_debug_tests.rs` and registered it from
`core/tests/suite/mod.rs`.
- Builds the test config with `ConfigBuilder` and provides
`codex_self_exe` using the current test executable, matching the
runtime-path invariant required by prompt debug setup.
- Preserves the existing assertions that the generated prompt input
includes both the debug user message and project-specific user
instructions.
## Verification
- `cargo test -p codex-core --test all
prompt_debug_tests::build_prompt_input_includes_context_and_user_message`
- `bazel test //codex-rs/core:core-all-test
--test_arg=prompt_debug_tests::build_prompt_input_includes_context_and_user_message
--test_output=errors`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18916).
* #18913
* __->__ #18916
Fixes https://github.com/openai/codex/issues/16732.
## Why
`apply_patch` is Codex's primary file edit path, but it was not emitting
`PreToolUse` or `PostToolUse` hook events. That meant hook-based policy,
auditing, and write coordination could observe shell commands while
missing the actual file mutation performed by `apply_patch`.
The issue also exposed that the hook runtime serialized command hook
payloads with `tool_name: "Bash"` unconditionally. Even if `apply_patch`
supplied hook payloads, hooks would either fail to match it directly or
receive misleading stdin that identified the edit as a Bash tool call.
## What Changed
- Added `PreToolUse` and `PostToolUse` payload support to
`ApplyPatchHandler`.
- Exposed the raw patch body as `tool_input.command` for both
JSON/function and freeform `apply_patch` calls.
- Taught tool hook payloads to carry a handler-supplied hook-facing
`tool_name`.
- Preserved existing shell compatibility by continuing to emit `Bash`
for shell-like tools.
- Serialized the selected hook `tool_name` into hook stdin instead of
hardcoding `Bash`.
- Relaxed the generated hook command input schema so `tool_name` can
represent tools other than `Bash`.
## Verification
Added focused handler coverage for:
- JSON/function `apply_patch` calls producing a `PreToolUse` payload.
- Freeform `apply_patch` calls producing a `PreToolUse` payload.
- Successful `apply_patch` output producing a `PostToolUse` payload.
- Shell and `exec_command` handlers continuing to expose `Bash`.
Added end-to-end hook coverage for:
- A `PreToolUse` hook matching `^apply_patch$` blocking the patch before
the target file is created.
- A `PostToolUse` hook matching `^apply_patch$` receiving the patch
input and tool response, then adding context to the follow-up model
request.
- Non-participating tools such as the plan tool continuing not to emit
`PreToolUse`/`PostToolUse` hook events.
Also validated manually with a live `codex exec` smoke test using an
isolated temp workspace and temp `CODEX_HOME`. The smoke test confirmed
that a real `apply_patch` edit emits `PreToolUse`/`PostToolUse` with
`tool_name: "apply_patch"`, a shell command still emits `tool_name:
"Bash"`, and a denying `PreToolUse` hook prevents the blocked patch file
from being created.
## Summary
- add experimental turn/start.environments params for per-turn
environment id + cwd selections
- pass selections through core protocol ops and resolve them with
EnvironmentManager before TurnContext creation
- treat omitted selections as default behavior, empty selections as no
environment, and non-empty selections as first environment/cwd as the
turn primary
## Testing
- ran `just fmt`
- ran `just write-app-server-schema`
- not run: unit tests for this stacked PR
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
#18275 anchors session-scoped `:cwd` and `:project_roots` grants to the
request cwd before recording them for reuse. Relative deny glob entries
need the same treatment. Without anchoring, a stored session permission
can keep a pattern such as `**/*.env` relative, then reinterpret that
deny against a later turn cwd. That makes the persisted profile depend
on the cwd at reuse time instead of the cwd that was reviewed and
approved.
## What changed
`intersect_permission_profiles` now materializes retained
`FileSystemPath::GlobPattern` entries against the request cwd, matching
the existing materialization for cwd-sensitive special paths.
Materialized accepted grants are now deduplicated before deny retention
runs. This keeps the sticky-grant preapproval shape stable when a
repeated request is merged with the stored grant and both `:cwd = write`
and the materialized absolute cwd write are present.
The preapproval check compares against the same materialized form, so a
later request for the same cwd-relative deny glob still matches the
stored anchored grant instead of re-prompting or rejecting.
Tests cover both the storage path and the preapproval path: a
session-scoped `:cwd = write` grant with `**/*.env = none` is stored
with both the cwd write and deny glob anchored to the original request
cwd, cannot be reused from a later cwd, and remains preapproved when
re-requested from the original cwd after merging with the stored grant.
## Verification
- `cargo test -p codex-sandboxing policy_transforms`
- `cargo test -p codex-core --lib
relative_deny_glob_grants_remain_preapproved_after_materialization`
- `cargo clippy -p codex-sandboxing --tests -- -D
clippy::redundant_clone`
- `cargo clippy -p codex-core --lib -- -D clippy::redundant_clone`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18867).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* #18277
* #18276
* __->__ #18867
## Summary
- tighten the external migration prompt snapshot around stable synthetic
fixture text
- add focused display_description tests for relative path rewriting and
plugin summaries
- split the path-format assertions into smaller, easier-to-read unit
tests
## Why
The previous prompt snapshot was coupled to path text that came from
detected migration items, which made it noisier and more brittle than
necessary. This change keeps the snapshot focused on stable UI structure
and moves dynamic path formatting checks into targeted unit tests.
## Validation
- cargo test -p codex-tui external_agent_config_migration::tests::
- cargo test -p codex-tui
external_agent_config_migration::tests::display_description_
- just fmt
## Notes
Per the repo instructions, I did not rerun tests after the final `just
fmt` pass.
This change aligns the `/statusline` and `/title` UIs around the same
normalized item model so both surfaces use consistent ids, labels, and
preview semantics. It keeps the shared preview work from #18435 ,
tightens the remaining mismatches by standardizing item naming, expands
title/status item coverage where appropriate, and makes `/title` preview
use the same title-specific formatting path as the real rendered
terminal title.
- Normalizes persisted item ids and keeps legacy aliases for
compatibility
- Aligns `status-line` and `terminal-title` items with the shared
preview model
- Routes `terminal-title` preview through title-specific formatting and
truncation
- Updates the affected status/title setup snapshots
Added to `/statusline`:
- status
- task-progress
Normalized in `/statusline`:
- model-name -> model
- project-root -> project-name
Added to `/title`:
- current-dir
- context-remaining
- context-used
- five-hour-limit
- weekly-limit
- codex-version
- used-tokens
- total-input-tokens
- total-output-tokens
- session-id
- fast-mode
- model-with-reasoning
Normalized in `/title`:
- project -> project-name
- thread -> thread-title
- model-name -> model
## Summary
Allow guardian to skip other fields and output only
`{"outcome":"allow"}` when the command is low risk.
This change lets guardian reviews use a non-strict text format while
keeping the JSON schema itself as plain user-visible schema data, so
transport strictness is carried out-of-band instead of through a schema
marker key.
## What changed
- Add an explicit `output_schema_strict` flag to model prompts and pass
it into `codex-api` text formatting.
- Set guardian reviewer prompts to non-strict schema validation while
preserving strict-by-default behavior for normal callers.
- Update the guardian output contract so definitely-low-risk decisions
may return only `{"outcome":"allow"}`.
- Treat bare allow responses as low-risk approvals in the guardian
parser.
- Add tests and snapshots covering the non-strict guardian request and
optional guardian output fields.
## Verification
- `cargo test -p codex-core guardian::tests::guardian`
- `cargo test -p codex-core guardian::tests::`
- `cargo test -p codex-core client_common::tests::`
- `cargo test -p codex-protocol
user_input_serialization_includes_final_output_json_schema`
- `cargo test -p codex-api`
- `git diff --check`
Note: `cargo test -p codex-core` was also attempted, but this desktop
environment injects ambient config/proxy state that causes unrelated
config/session tests expecting pristine defaults to fail.
---------
Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
Adds the standalone `codex-rollout-trace` crate, which defines the raw
trace event format, replay/reduction model, writer, and reducer logic
for reconstructing model-visible conversation/runtime state from
recorded rollout data.
The crate-level design is documented in
[`codex-rs/rollout-trace/README.md`](https://github.com/openai/codex/blob/codex/rollout-trace-crate/codex-rs/rollout-trace/README.md).
## Stack
This is PR 1/5 in the rollout trace stack.
- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command
## Review Notes
This PR intentionally does not wire tracing into live Codex execution.
It establishes the data model and reducer contract first, with
crate-local tests covering conversation reconstruction, compaction
boundaries, tool/session edges, and code-cell lifecycle reduction. Later
PRs emit into this model.
The README is the best entry point for reviewing the intended trace
format and reduction semantics before diving into the reducer modules.
## Summary
- Adds a process-local, in-memory cookie store for ChatGPT HTTP clients.
- Limits cookie storage and replay to a shared ChatGPT host allowlist.
- Wires the shared store into the default Codex reqwest client and
backend client.
- Shares the ChatGPT host allowlist with remote-control URL validation
to avoid drift.
- Enables reqwest cookie support and updates lockfiles.
## Summary
This PR fully reverts the previously merged Agent Identity runtime
integration from the old stack:
https://github.com/openai/codex/pull/17387/changes
It removes the Codex-side task lifecycle wiring, rollout/session
persistence, feature flag plumbing, lazy `auth.json` mutation,
background task auth paths, and request callsite changes introduced by
that stack.
This leaves the repo in a clean pre-AgentIdentity integration state so
the follow-up PRs can reintroduce the pieces in smaller reviewable
layers.
## Stack
1. This PR: full revert
2. https://github.com/openai/codex/pull/18871: move Agent Identity
business logic into a crate
3. https://github.com/openai/codex/pull/18785: add explicit
AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate auth callsites
through AuthProvider
## Testing
Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
## Why
The device-key protocol needs an app-server implementation that keeps
local key operations behind the same request-processing boundary as
other v2 APIs.
app-server owns request dispatch, transport policy, documentation, and
JSON-RPC error shaping. `codex-device-key` owns key binding, validation,
platform provider selection, and signing mechanics. Keeping the adapter
thin makes the boundary easier to review and avoids moving local
key-management details into thread orchestration code.
## What changed
- Added `DeviceKeyApi` as the app-server adapter around
`DeviceKeyStore`.
- Converted protocol protection policies, payload variants, algorithms,
and protection classes to and from the device-key crate types.
- Encoded SPKI public keys and DER signatures as base64 protocol fields.
- Routed `device/key/create`, `device/key/public`, and `device/key/sign`
through `MessageProcessor`.
- Rejected remote transports before provider access while allowing local
`stdio` and in-process callers to reach the device-key API.
- Added stdio, in-process, and websocket tests for device-key validation
and transport policy.
- Documented the device-key methods in the app-server v2 method list.
## Test coverage
- `device_key_create_rejects_empty_account_user_id`
- `in_process_allows_device_key_requests_to_reach_device_key_api`
- `device_key_methods_are_rejected_over_websocket`
## Stack
This is PR 3 of 4 in the device-key app-server stack. It is stacked on
#18429.
## Validation
- `cargo test -p codex-app-server device_key`
- `just fix -p codex-app-server`
## Summary
Adds main-chat shortcuts for changing reasoning effort one step at a
time:
- `Alt+,` lowers reasoning (has the `<` arrow on the key)
- `Alt+.` raises reasoning (similarly, has the `>` arrow)
The shortcut updates the active session only. It does not persist the
selected reasoning level as the default for future sessions. In Plan
mode, it applies temporarily to Plan mode without opening the
global-vs-Plan scope prompt.
## Details
The shortcut uses the active model preset to decide which reasoning
levels are valid. If the current session has no explicit reasoning
effort, it starts from the model default. Each keypress moves to the
next supported level in the requested direction.
The shortcut only runs from the main chat surface. If a popup or modal
is open, input remains owned by that UI.
In Plan mode, the shortcut updates the in-memory Plan reasoning override
directly. The model/reasoning picker still keeps the existing scope
prompt for explicit picker changes.
## Notes
Ctrl-plus and Ctrl-minus were considered, but terminals do not deliver
those combinations consistently, so this PR uses Alt shortcuts instead.
If the current effort is unsupported by the selected model, the shortcut
skips to the nearest supported level in the requested direction. If
there is no valid step, it shows the existing boundary message.
## Tests
- `cargo test -p codex-tui reasoning_shortcuts`
- `cargo test -p codex-tui reasoning_effort`
- `cargo test -p codex-tui reasoning_shortcut`
- `cargo test -p codex-tui footer_snapshots`
- `cargo test -p codex-tui`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui -- --tests`
---------
Co-authored-by: Eric Traut <etraut@openai.com>
## Why
PR #18431 exposed a Bazel clippy failure in the app-server unit-test
target across Linux, macOS, and Windows. The failing lint was
`clippy::await_holding_invalid_type`: two tracing tests serialized
access to global tracing state by holding a `tokio::sync::MutexGuard`
across awaited test work.
That serialization is still needed because the tests share
process-global tracing setup and exporter state, but it should not
require holding an async mutex guard through the whole test body.
## What changed
- Replaced the bespoke async `tracing_test_guard` helper with
`serial_test` on the two tracing tests that need global tracing
serialization.
- Removed the `#[expect(clippy::await_holding_invalid_type)]`
annotations and the lock guard callsites that Bazel clippy rejected.
## Validation
- `cargo test -p codex-app-server jsonrpc_span`
- `just fix -p codex-app-server`
- `git diff --check`
I also attempted the exact failing Bazel clippy target locally with
BuildBuddy disabled: `bazel --noexperimental_remote_repo_contents_cache
build --config=clippy --bes_backend= --remote_cache=
--experimental_remote_downloader= --
//codex-rs/app-server:app-server-unit-tests-bin`. That run did not reach
clippy because Bazel timed out downloading `libcap-2.27.tar.gz` from
`kernel.org`.
## Why
Device-key storage and signing are local security-sensitive operations
with platform-specific behavior. Keeping the core API in
`codex-device-key` keeps app-server focused on routing and business
logic instead of owning key-management details.
The crate keeps the signing surface intentionally narrow: callers can
create a bound key, fetch its public key, or sign one of the structured
payloads accepted by the crate. It does not expose a generic
arbitrary-byte signing API.
Key IDs cross into platform-specific labels, tags, and metadata paths,
so externally supplied IDs are constrained to the same auditable
namespace created by the crate: `dk_` followed by unpadded base64url for
32 bytes. Remote-control target paths are also tied to each signed
payload shape so connection proofs cannot be reused for enrollment
endpoints, or vice versa.
## What changed
- Added the `codex-device-key` workspace crate.
- Added account/client-bound key creation with stable `dk_` key IDs.
- Added strict `key_id` validation before public-key lookup or signing
reaches a provider.
- Added public-key lookup and structured signing APIs.
- Split remote-control client endpoint allowlists by connection vs
enrollment payload shape.
- Added validation for key bindings, accepted payload fields, token
expiration, and payload/key binding mismatches.
- Added flow-oriented docs on the validation helpers that gate provider
signing.
- Added protection policy and protection-class types without wiring a
platform provider yet.
- Added an unsupported default provider so platforms without an
implementation fail explicitly instead of silently falling back to
software-backed keys.
- Updated Cargo and Bazel lock metadata for the new crate and
non-platform-specific dependencies.
## Stack
This is stacked on #18428.
## Validation
- `cargo test -p codex-device-key`
- Added unit coverage for strict `key_id` validation before provider
use.
- Added unit coverage that rejects remote-control paths from the wrong
signed payload shape.
- `just bazel-lock-update`
- `just bazel-lock-check`
## Summary
This is the runtime/foundation half of the Windows sandbox unified-exec
work.
- add Windows sandbox `unified_exec` session support in
`windows-sandbox-rs` for both:
- the legacy restricted-token backend
- the elevated runner backend
- extend the PTY/process runtime so driver-backed sessions can support:
- stdin streaming
- stdout/stderr separation
- exit propagation
- PTY resize hooks
- add Windows sandbox runtime coverage in `codex-windows-sandbox` /
`codex-utils-pty`
This PR does **not** enable Windows sandbox `UnifiedExec` for product
callers yet because hooking this up to app-server comes in the next PR.
Windows sandbox advertising is intentionally kept aligned with `main`,
so sandboxed Windows callers still fall back to `ShellCommand`.
This PR isolates the runtime/session layer so it can be reviewed
independently from product-surface enablement.
---------
Co-authored-by: jif-oai <jif@openai.com>
Co-authored-by: Codex <noreply@openai.com>
This is the first step in splitting the Python SDK PyPI publish work
into reviewable layers: land the generated SDK refresh by itself before
changing packaging mechanics. The next PRs will make the runtime wheel
publishable, then wire the SDK package/version pinning to that runtime.
## Summary
- Refresh generated Python app-server v2 models and notification
registry from the current schema.
- Update the public API signature expectations for the newly generated
kwargs.
## Stack
- PR 1 of 3 for the Python SDK PyPI publishing split.
- Follow-up PRs will handle runtime wheel publishing mechanics, then
SDK/package version pinning.
## Tests
- `uv run --extra dev pytest` in `sdk/python` -> 51 passed, 37 skipped.
## Why
Permission approval responses must not be able to grant more access than
the tool requested. Moving this flow to `PermissionProfile` means the
comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and
cwd-relative special paths such as `:cwd` and `:project_roots` must stay
anchored to the turn that produced the request.
## What changed
This implements semantic `PermissionProfile` intersection in
`codex-sandboxing` for file-system and network permissions. The
intersection accepts narrower path grants, rejects broader grants,
preserves deny-read carve-outs and glob scan depth, and materializes
cwd-dependent special-path grants to absolute paths before they can be
recorded for reuse.
The request-permissions response paths now use that intersection
consistently. App-server captures the request turn cwd before waiting
for the client response, includes that cwd in the v2 approval params,
and core stores the requested profile plus cwd for direct TUI/client
responses and Guardian decisions before recording turn- or
session-scoped grants. The TUI app-server bridge now preserves the
app-server request cwd when converting permission approval params into
core events.
## Verification
- `cargo test -p codex-sandboxing intersect_permission_profiles --
--nocapture`
- `cargo test -p codex-app-server request_permissions_response --
--nocapture`
- `cargo test -p codex-core
request_permissions_response_materializes_session_cwd_grants_before_recording
-- --nocapture`
- `cargo check -p codex-tui --tests`
- `cargo check --tests`
- `cargo test -p codex-tui
app_server_request_permissions_preserves_file_system_permissions`
## Summary
The TUI app refactor in #18753 moved the old `app.rs` tests into a
single `app/tests.rs` file. That kept the split mechanically simple, but
it left several focused unit tests far from the modules they exercise.
This PR is a follow-up that moves tests next to the code they cover.
It also adds `tui/src/app/test_support.rs` for shared fixture
construction.
This is just a mechanical refactoring (no functional changes) and does
not affect any production code.
## Why
`debug_clear_memories_resets_state_and_removes_memory_dir` can be flaky
because the test drops its `sqlx::SqlitePool` immediately before
invoking `codex debug clear-memories`. Dropping the pool does not wait
for all SQLite connections to close, so the CLI can race with still-open
test connections.
## What changed
- Await `pool.close()` before spawning `codex debug clear-memories`.
- Close the reopened verification pool before the temp `CODEX_HOME` is
torn down.
## Verification
- `cargo test -p codex-cli --test debug_clear_memories
debug_clear_memories_resets_state_and_removes_memory_dir`
Fixes#17954.
## Why
When a manual shell command like `!sleep 10` is running, submitting
plain text such as `hi` currently sends that text as a steer for the
active shell turn. User shell turns are not steerable like model turns,
so the TUI can remain stuck in `Working` after the shell command
finishes.
## What Changed
- Detect when the only active work is one or more
`ExecCommandSource::UserShell` commands.
- Queue plain submitted input in that state so it drains after the shell
command and shell turn complete.
- Preserve `!cmd` submissions during running work so explicit shell
commands keep their existing behavior.
- Add regression coverage for the `!sleep 10` plus `hi` flow in
`chatwidget::tests::exec_flow::user_message_during_user_shell_command_is_queued_not_steered`.
## Verification
- Manually confirmed hang before the fix and no hang after the fix
## Summary
- wrap OSC 9 notifications in tmux's DCS passthrough so terminal
notifications make it through tmux
- use codex-terminal-detection for OSC 9 auto-selection so tmux sessions
inherit the underlying client terminal support
- add focused notification backend tests for plain OSC 9 and
tmux-wrapped output
## Stack
- base PR: #18479
- review order: #18479, then this PR
## Why
Tmux does not forward OSC 9 notifications directly; the sequence has to
be wrapped in tmux's DCS passthrough envelope. Codex also had local
notification heuristics that could miss supported terminals when running
under tmux, even though codex-terminal-detection already knows how to
attribute tmux sessions to the client terminal.
## Validation
- `just fmt`
- `cargo test -p codex-tui` *(currently blocked by an unrelated existing
compile error in `app-server/src/message_processor.rs:754` referencing
`connection_id` out of scope; not caused by this change)*
Co-authored-by: Codex <noreply@openai.com>
## Summary
- attach the authoritative Codex thread id to MCP tool request
`_meta.threadId` for model-initiated tool calls
- attach the same thread id for manual `mcpServer/tool/call` requests
before invoking the MCP server
- cover both metadata helper behavior and the manual app-server MCP path
in tests
needed because the Rust app-server is the last place that still has
authoritative knowledge of “this model-generated MCP tool call belongs
to conversation/thread X” before the request leaves Codex and reaches
Hoopa. It adds threadId to MCP request metadata in the model-generated
tool-call path, using sess.conversation_id, and also does the same for
the manual mcpServer/tool/call path.
## Test plan
- `cargo test -p codex-core
mcp_tool_call_thread_id_meta_is_added_to_request_meta --lib`
- `cargo test -p codex-app-server
mcp_server_tool_call_returns_tool_result`
Paired Hoopa consumer PR: https://github.com/openai/openai/pull/833263
## Why
Clients need a stable app-server protocol surface for enrolling a local
device key, retrieving its public key, and producing a device-bound
proof.
The protocol reports `protectionClass` explicitly so clients can
distinguish hardware-backed keys from an explicitly allowed OS-protected
fallback. Signing uses a tagged `DeviceKeySignPayload` enum rather than
arbitrary bytes so each signed statement is auditable at the API
boundary.
## What changed
- Added v2 JSON-RPC methods for `device/key/create`,
`device/key/public`, and `device/key/sign`.
- Added request/response types for device-key metadata, SPKI public
keys, protection classes, and ECDSA signatures.
- Added `DeviceKeyProtectionPolicy` with hardware-only default behavior
and an explicit `allow_os_protected_nonextractable` option.
- Added the initial `remoteControlClientConnection` signing payload
variant.
- Regenerated JSON Schema and TypeScript fixtures for app-server
clients.
## Stack
This is PR 1 of 4 in the device-key app-server stack.
## Validation
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
The `test-log` crate is only used by `codex-core` tests, so it does not
need
to be part of the normal `codex-core` dependency graph. Keeping
`test-log` in
`dev-dependencies` removes it from normal `codex-core` builds and keeps
the
production dependency set a little smaller.
Verification:
- `cargo tree -p codex-core --edges normal --invert test-log`
- `cargo check -p codex-core --lib`
- `cargo test -p codex-core --lib`
This add with 2 entry point:
* `reset_git_repository` that takes a directory and set it as a new git
root
* `diff_since_latest_init` this returns the diff for a given directory
since the last `reset_git_repository`
## What changed
This PR makes the default Cargo dev profile use line-tables-only debug
info:
```toml
[profile.dev]
debug = 1
```
That keeps useful backtraces while avoiding the cost of full variable
debug
info in normal local dev builds.
This also makes the Bazel CI setting explicit with `-Cdebuginfo=0` for
target
and exec-configuration Rust actions. Bazel/rules_rust does not read
Cargo
profiles for this setting, and the current fastbuild action already
emitted
`--codegen=debuginfo=0`; the Bazel part of this PR makes that choice
direct in
our build configuration.
## Why
The slow codex-core rebuilds are dominated by debug-info codegen, not
parsing
or type checking. On a warm-dependency package rebuild, the baseline
codex-core compile was about 39.5s wall / 38.9s rustc total, with
codegen_crate around 14.0s and LLVM_passes around 13.4s. Setting
codex-core
to line-tables-only debug info brought that to about 27.2s wall / 26.7s
rustc
total, with codegen_crate around 3.1s and LLVM_passes around 2.8s.
`debug = 0` was only about another 0.7s faster than `debug = 1` in the
codex-core measurement, so `debug = 1` is the better default dev
tradeoff: it
captures nearly all of the compile-time win while preserving basic
debuggability.
I also sampled other first-party crates instead of keeping a
codex-core-only
package override. codex-app-server showed the same pattern: rustc total
dropped from 15.85s to 10.48s, while codegen_crate plus LLVM_passes
dropped
from about 13.47s to 3.23s. codex-app-server-protocol had a smaller but
still
real improvement, 16.05s to 14.58s total, and smaller crates showed
modest
wins. That points to a workspace dev-profile policy rather than a
hand-maintained list of large crates.
## Relationship to #18612
[#18612](https://github.com/openai/codex/pull/18612) added the
`dev-small`
profile. That remains useful when someone wants a working local build
quickly
and is willing to opt in with `cargo build --profile dev-small`.
This PR is deliberately less aggressive: it changes the common default
dev
profile while preserving line tables/backtraces. `dev-small` remains the
explicit "build quickly, no debuggability concern" path.
## Other investigation
I looked for another structural win comparable to
[#16631](https://github.com/openai/codex/pull/16631) and
[#16630](https://github.com/openai/codex/pull/16630), but did not find
one.
The attempted TOML monomorphization changes were noisy or worse in
measurement, and the async task changes reduced some instantiations but
only
translated to roughly a one-second improvement while being much more
disruptive. The debug-info setting was the one repeatable, material win
that
survived measurement.
## Verification
- `just bazel-lock-update`
- `just bazel-lock-check`
- `cargo check -p codex-core --lib`
- `cargo test -p codex-core --lib`
- Bazel `aquery --config=ci-linux` confirmed `--codegen=debuginfo=0` and
`-Cdebuginfo=0` for `//codex-rs/core:core`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18844).
* #18846
* __->__ #18844
## Summary
- Move external agent config migration logic and tests from `codex-core`
into `app-server/src/config`.
- Keep the migration service crate-private to app-server and update the
API adapter imports.
- Remove stale core re-exports and expose only the needed marketplace
source helper.
## Testing
- `cargo test -p codex-app-server config::external_agent_config`
- `just fmt`
- `just fix -p codex-app-server`
- `just fix -p codex-core`
- `git diff --check`
Fixes https://github.com/openai/codex/issues/13638
## Why
VS Code's integrated terminal can run a Linux shell through WSL without
exposing `TERM_PROGRAM` to the Linux process, and with crossterm
keyboard enhancement flags enabled that environment can turn dead-key
composition into malformed key events instead of composed Unicode input.
Codex already handles composed Unicode correctly, so the fix is to avoid
enabling the terminal mode that breaks this path for the affected
terminal combination.
## What Changed
- Automatically skip crossterm keyboard enhancement flags when Codex
detects WSL plus VS Code, including a Windows-side `TERM_PROGRAM` probe
through WSL interop.
- Add `CODEX_TUI_DISABLE_KEYBOARD_ENHANCEMENT` so users can
force-disable or force-enable the keyboard enhancement policy for
diagnosis.
## Verification
- Added unit coverage for env parsing, VS Code detection, and the WSL/VS
Code auto-disable policy.
- `cargo check -p codex-tui` passed.
- `./tools/argument-comment-lint/run.py -p codex-tui -- --tests` passed.
- `cargo test -p codex-tui` was attempted locally, but the checkout
failed during linking before tests executed because V8 symbols from
`codex-code-mode` were unresolved for `arm64`.
## What
- Explicitly show our "bash mode" by changing the color and adding a
callout similar to how we do for `Plan mode (shift + tab to cycle)`
- Also replace our `›` composer prefix with a bang `!`

## Why
- It was unclear that we had a Bash mode
- This feels more responsive
- It looks cool!
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
This updates the code review orchestrator skill wording so the
instruction explicitly requires returning every issue from every
subagent.
## Impact
The change is limited to `.codex/skills/code-review/SKILL.md` and
clarifies review aggregation behavior for future Codex-driven reviews.
## Validation
No tests were run because this is a markdown-only skill wording change.
Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.
This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.
It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.
Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`
---------
Co-authored-by: Sayan Sisodiya <sayan@openai.com>
Follow-up to https://github.com/openai/codex/pull/18178, where we said
the await-holding clippy rule would be enabled separately.
Enable `await_holding_lock` and `await_holding_invalid_type` after the
preceding commits fixed or explicitly documented the current offenders.
## Why
This PR prepares the stack to enable Clippy await-holding lints that
were left disabled in #18178. The mechanical lock-scope cleanup is
handled separately; this PR is the documentation/configuration layer for
the remaining await-across-guard sites.
Without explicit annotations, reviewers and future maintainers cannot
tell whether an await-holding warning is a real concurrency smell or an
intentional serialization boundary.
## What changed
- Configures `clippy.toml` so `await_holding_invalid_type` also covers
`tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`.
- Adds targeted `#[expect(clippy::await_holding_invalid_type, reason =
...)]` annotations for intentional async guard lifetimes.
- Documents the main categories of intentional cases: active-turn state
transitions that must remain atomic, session-owned MCP manager accesses,
remote-control websocket serialization, JS REPL kernel/process
serialization, OAuth persistence, external bearer token refresh
serialization, and tests that intentionally serialize shared global or
session-owned state.
- For external bearer token refresh, documents the existing
serialization boundary: holding `cached_token` across the provider
command prevents concurrent cache misses from starting duplicate refresh
commands, and the current behavior is small enough that an explicit
expectation is easier to maintain than adding another synchronization
primitive.
## Verification
- `cargo clippy -p codex-login --all-targets`
- `cargo clippy -p codex-connectors --all-targets`
- `cargo clippy -p codex-core --all-targets`
- The follow-up PR #18698 enables `await_holding_invalid_type` and
`await_holding_lock` as workspace `deny` lints, so any undocumented
remaining offender will fail Clippy.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423).
* #18698
* __->__ #18423
## Why
Customers need finer-grained control over allowed sandbox modes based on
the host Codex is running on. For example, they may want stricter
sandbox limits on devboxes while keeping a different default elsewhere.
Our current cloud requirements can target user/account groups, but they
cannot vary sandbox requirements by host. That makes remote development
environments awkward because the same top-level `allowed_sandbox_modes`
has to apply everywhere.
## What
Adds a new `remote_sandbox_config` section to `requirements.toml`:
```toml
allowed_sandbox_modes = ["read-only"]
[[remote_sandbox_config]]
hostname_patterns = ["*.org"]
allowed_sandbox_modes = ["read-only", "workspace-write"]
[[remote_sandbox_config]]
hostname_patterns = ["*.sh", "runner-*.ci"]
allowed_sandbox_modes = ["read-only", "danger-full-access"]
```
During requirements resolution, Codex resolves the local host name once,
preferring the machine FQDN when available and falling back to the
cleaned kernel hostname. This host classification is best effort rather
than authenticated device proof.
Each requirements source applies its first matching
`remote_sandbox_config` entry before it is merged with other sources.
The shared merge helper keeps that `apply_remote_sandbox_config` step
paired with requirements merging so new requirements sources do not have
to remember the extra call.
That preserves source precedence: a lower-precedence requirements file
with a matching `remote_sandbox_config` cannot override a
higher-precedence source that already set `allowed_sandbox_modes`.
This also wires the hostname-aware resolution through app-server,
CLI/TUI config loading, config API reads, and config layer metadata so
they all evaluate remote sandbox requirements consistently.
## Verification
- `cargo test -p codex-config remote_sandbox_config`
- `cargo test -p codex-config host_name`
- `cargo test -p codex-core
load_config_layers_applies_matching_remote_sandbox_config`
- `cargo test -p codex-core
system_remote_sandbox_config_keeps_cloud_sandbox_modes`
- `cargo test -p codex-config`
- `cargo test -p codex-core` unit tests passed; `tests/all.rs`
integration matrix was intentionally stopped after the relevant focused
tests passed
- `just fix -p codex-config`
- `just fix -p codex-core`
- `cargo check -p codex-app-server`
## Summary
When auto-review is enabled, it should handle request_permissions tool.
We'll need to clean up the UX but I'm planning to do that in a separate
pass
## Testing
- [x] Ran locally
<img width="893" height="396" alt="Screenshot 2026-04-17 at 1 16 13 PM"
src="https://github.com/user-attachments/assets/4c045c5f-1138-4c6c-ac6e-2cb6be4514d8"
/>
---------
Co-authored-by: Codex <noreply@openai.com>
This updates TUI skill mentions to show a fallback label when a skill
does not define a display name, so unnamed skills remain understandable
in the picker without changing behavior for skills that already have
one.
<img width="1028" height="198" alt="Screenshot 2026-04-20 at 6 25 15 PM"
src="https://github.com/user-attachments/assets/84077b85-99d0-4db9-b533-37e1887b4506"
/>
## Summary
Making thread id optional so that we can better cache resources for MCPs
for connectors since their resource templates is universal and not
particular to projects.
- Make `mcpServer/resource/read` accept an optional `threadId`
- Read resources from the current MCP config when no thread is supplied
- Keep the existing thread-scoped path when `threadId` is present
- Update the generated schemas, README, and integration coverage
## Testing
- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-mcp`
- `cargo test -p codex-app-server --test all mcp_resource`
- `just fix -p codex-mcp`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
## Why
#18274 made `PermissionProfile` the canonical file-system permissions
shape, but the round-trip from `FileSystemSandboxPolicy` to
`PermissionProfile` still dropped one piece of policy metadata:
`glob_scan_max_depth`.
That field is security-relevant for deny-read globs such as `**/*.env`.
On Linux, bubblewrap sandbox construction uses it to bound unreadable
glob expansion. If a profile copied from active runtime permissions
loses this value and is submitted back as an override, the resulting
`FileSystemSandboxPolicy` can behave differently even though the visible
permission entries look equivalent.
## What changed
- Add `glob_scan_max_depth` to protocol `FileSystemPermissions` and
preserve it when converting to/from `FileSystemSandboxPolicy`.
- Keep legacy `read`/`write` JSON for simple path-only permissions, but
force canonical JSON when glob scan depth is present so the metadata is
not silently dropped.
- Carry `globScanMaxDepth` through app-server
`AdditionalFileSystemPermissions`, generated JSON/TypeScript schemas,
and app-server/TUI conversion call sites.
- Preserve the metadata through sandboxing permission normalization,
merging, and intersection.
- Carry the merged scan depth into the effective
`FileSystemSandboxPolicy` used for command execution, so bounded
deny-read globs reach Linux bubblewrap materialization.
## Verification
- `cargo test -p codex-sandboxing glob_scan -- --nocapture`
- `cargo test -p codex-sandboxing policy_transforms -- --nocapture`
- `just fix -p codex-sandboxing`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18713).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* #18277
* #18276
* #18275
* __->__ #18713
## Why
This is part of the follow-up work from #18178 to make Codex ready for
Clippy's
[`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
/
[`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
lints.
This bottom PR keeps the scope intentionally small:
`NetworkProxyState::record_blocked()` only needs the state write lock
while it mutates the blocked-request ring buffer and counters. The debug
log payload and `BlockedRequestObserver` callback can be produced after
that lock is released.
## What changed
- Copies the blocked-request snapshot values needed for logging while
updating the state.
- Releases the `RwLockWriteGuard` before logging or notifying the
observer.
## Verification
- `cargo test -p codex-network-proxy`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18418).
* #18698
* #18423
* __->__ #18418
### Why
Remote streamable HTTP MCP needs a transport-shaped executor primitive
before the MCP client can move network I/O to the executor. This layer
keeps the executor unaware of MCP and gives later PRs an ordered
streaming surface for response bodies.
### What
- Add typed `http/request` and `http/request/bodyDelta` protocol
payloads.
- Add executor client helpers for buffered and streamed HTTP responses.
- Route body-delta notifications to request-scoped streams with sequence
validation and cleanup when a stream finishes or is dropped.
- Document the new protocol constants, transport structs, public client
methods, body-stream lifecycle, and request-scoped routing helpers.
- Add in-memory JSON-RPC client coverage for streamed HTTP response-body
notifications, with comments spelling out what the test proves and each
setup/exercise/assert phase.
### Stack
1. #18581 protocol
2. #18582 runner
3. #18583 RMCP client
4. #18584 manager wiring and local/remote coverage
### Verification
- `just fmt`
- `cargo check -p codex-exec-server -p codex-rmcp-client --tests`
- `cargo check -p codex-core --test all` compile-only
- `git diff --check`
- Online full CI is running from the `full-ci` branch, including the
remote Rust test job.
Co-authored-by: Codex <noreply@openai.com>
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Codex needs a first-class `amazon-bedrock` model provider so users can
select Bedrock without copying a full provider definition into
`config.toml`. The provider has Codex-owned defaults for the pieces that
should stay consistent across users: the display `name`, Bedrock
`base_url`, and `wire_api`.
At the same time, users still need a way to choose the AWS credential
profile used by their local environment. This change makes
`amazon-bedrock` a partially modifiable built-in provider: code owns the
provider identity and endpoint defaults, while user config can set
`model_providers.amazon-bedrock.aws.profile`.
For example:
```toml
model_provider = "amazon-bedrock"
[model_providers.amazon-bedrock.aws]
profile = "codex-bedrock"
```
## What Changed
- Added `amazon-bedrock` to the built-in model provider map with:
- `name = "Amazon Bedrock"`
- `base_url = "https://bedrock-mantle.us-east-1.api.aws/v1"`
- `wire_api = "responses"`
- Added AWS provider auth config with a profile-only shape:
`model_providers.<id>.aws.profile`.
- Kept AWS auth config restricted to `amazon-bedrock`; custom providers
that set `aws` are rejected.
- Allowed `model_providers.amazon-bedrock` through reserved-provider
validation so it can act as a partial override.
- During config loading, only `aws.profile` is copied from the
user-provided `amazon-bedrock` entry onto the built-in provider. Other
Bedrock provider fields remain hard-coded by the built-in definition.
- Updated the generated config schema for the new provider AWS profile
config.
This PR makes the `/statusline` and `/title` setup UIs share one
preview-value source instead of each surface using its own examples.
Both pickers now render consistent live values when available, and
stable placeholders when they are not. It also resolves live preview
values at the shared preview-item layer, so `/title` preview can use
real runtime values for title-specific cases like status text, task
progress, and project-name fallback behavior.
- Adds a shared preview data model for status surfaces
- Maps status-line items and terminal-title items onto that shared
preview list
- Feeds both setup views from the same chatwidget-derived preview data,
with terminal-title-specific formatting applied before `/title` preview
renders
- Keeps project-root preview aligned with status-line behavior while
project in /title keeps its title fallback/truncation behavior
- Adds snapshot coverage for live-only, hardcoded-only, and mixed cases
Test Steps
- Open Codex TUI and launch `/statusline`.
- Toggle and reorder items, then verify the preview uses current session
values when possible, and placeholder values for missing values (ex: no
thread ID).
- Open `/title` and verify it shows the same normalized values,
including live status/task-progress values when available.
## Summary
- Track how many realtime transcript entries have already been attached
to a background-agent handoff.
- Attach only entries added since the previous handoff as
`<transcript_delta>` instead of resending the accumulated transcript
snapshot.
- Update the realtime integration test so the second delegation carries
only the second transcript delta.
## Validation
- `just fmt`
- `cargo test -p codex-api`
- `cargo test -p codex-core
inbound_handoff_request_sends_transcript_delta_after_each_handoff`
- `cargo build -p codex-cli -p codex-app-server`
## Manual testing
Built local debug binaries at:
- `codex-rs/target/debug/codex`
- `codex-rs/target/debug/codex-app-server`
Addresses #18505
## Summary
When Codex is launched from a subdirectory of a Git repository, the
onboarding trust prompt says it is trusting the current directory even
though the persisted trust target is the repository root. That can make
the scope of the trust decision unclear.
This updates the TUI trust prompt to show a yellow note only when the
current directory differs from the resolved trust target, explaining
that trust applies to the repository root and displaying that root. It
also removes the stale onboarding TODO now that the warning is
implemented.
## Summary
This fixes a stale-environment path in shell snapshot restoration. A
sandboxed command can source a shell snapshot that was captured while an
older proxy process was running. If that proxy has died and come back on
a different port, the snapshot can otherwise put old proxy values back
into the command environment, which is how tools like `pip` end up
talking to a dead proxy.
The wrapper now captures the live process environment before sourcing
the snapshot and then restores or clears every proxy env var from the
proxy crate's canonical list. That makes proxy state after shell
snapshot restoration match the current command environment, rather than
whatever proxy values happened to be present in the snapshot. On macOS,
the Codex-generated `GIT_SSH_COMMAND` is refreshed when the SOCKS
listener changes, while custom SSH wrappers are still left alone.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Reject new exec-server client operations once the transport has
disconnected.
- Convert pending RPC calls into closed errors instead of synthetic
server errors.
- Cover pending read and later write behavior after remote executor
disconnect.
## Verification
- `just fmt`
- `cargo check -p codex-exec-server`
## Stack
```text
@ #18027 [6/6] Fail exec client operations after disconnect
│
o #18212 [5/6] Wire executor-backed MCP stdio
│
o #18087 [4/6] Abstract MCP stdio server launching
│
o #18020 [3/6] Add pushed exec process events
│
o #18086 [2/6] Support piped stdin in exec process API
│
o #18085 [1/6] Add MCP server environment config
│
o main
```
---------
Co-authored-by: Codex <noreply@openai.com>
Addresses #18113
Problem: Shared flags provided before the exec subcommand were parsed by
the root CLI but not inherited by the exec CLI, so exec sessions could
run with stale or default sandbox and model configuration.
Solution: Move shared TUI and exec flags into a common option block and
merge root selections into exec before dispatch, while preserving exec's
global subcommand flag behavior.
## Why
The TUI app module had grown past the 512K source-file cap enforced by
CI/CD. This keeps the app entry point below that limit while preserving
the existing runtime behavior and test surface.
## What changed
- Kept the top-level `App` state and run-loop wiring in
`tui/src/app.rs`.
- Split app responsibilities into focused private submodules under
`tui/src/app/`, covering event dispatch, thread routing, session
lifecycle, config persistence, background requests, startup prompts,
input, history UI, platform actions, and thread event buffering.
- Moved the existing app-level tests into `tui/src/app/tests.rs` and
reused the existing snapshot location rather than adding new tests or
snapshots.
- Added module header comments for `app.rs` and the new submodules.
## Follow-up
A future cleanup can move narrow unit tests from `tui/src/app/tests.rs`
into the specific app submodules they exercise. This PR keeps the
existing app-level tests together so the refactor stays focused on the
source-file split.
## Verification
- `cargo test -p codex-tui --lib
app::tests::agent_picker_item_name_snapshot`
- `cargo test -p codex-tui --lib app::tests::clear_ui`
- `cargo test -p codex-tui --lib
app::tests::ctrl_l_clear_ui_after_long_transcript_reuses_clear_header_snapshot`
- `just fix -p codex-tui`
Full `cargo test -p codex-tui` still fails on model-catalog drift
unrelated to this refactor, including stale
`gpt-5.3-codex`/`gpt-5.1-codex` snapshot and migration expectations now
resolving to `gpt-5.4`.
## Why
Cloud-hosted sessions need a way for the service that starts or manages
a thread to provide session-owned config without treating all config as
if it came from the same user/project/workspace TOML stack.
The important boundary is ownership: some values should be controlled by
the session/orchestrator, some by the authenticated user, and later some
may come from the executor. The earlier broad config-store shape made
that boundary too fuzzy and overlapped heavily with the existing
filesystem-backed config loader. This PR starts with the smaller piece
we need now: a typed session config loader that can feed the existing
config layer stack while preserving the normal precedence and merge
behavior.
## What Changed
- Added `ThreadConfigLoader` and related typed payloads in
`codex-config`.
- `SessionThreadConfig` currently supports `model_provider`,
`model_providers`, and feature flags.
- `UserThreadConfig` is present as an ownership boundary, but does not
yet add TOML-backed fields.
- `NoopThreadConfigLoader` preserves existing behavior when no external
loader is configured.
- `StaticThreadConfigLoader` supports tests and simple callers.
- Taught thread config sources to produce ordinary `ConfigLayerEntry`
values so the existing `ConfigLayerStack` remains the place where
precedence and merging happen.
- Wired the loader through `ConfigBuilder`, the config loader, and
app-server startup paths so app-server can provide session-owned config
before deriving a thread config.
- Added coverage for:
- translating typed thread config into config layers,
- inserting thread config layers into the stack at the right precedence,
- applying session-provided model provider and feature settings when
app-server derives config from thread params.
## Follow-Ups
This intentionally stops short of adding the remote/service transport.
The next pieces are expected to be:
1. Define the proto/API shape for this interface.
2. Add a client implementation that can source session config from the
service side.
## Verification
- Added unit coverage in `codex-config` for the loader and layer
conversion.
- Added `codex-core` config loader coverage for thread config layer
precedence.
- Added app-server coverage that verifies session thread config wins
over request-provided config for model provider and feature settings.
## Summary
- add a codex-uds crate with async UnixListener and UnixStream wrappers
- expose helpers for private socket directory setup and stale socket
path checks
- migrate codex-stdio-to-uds onto codex-uds and Tokio-based stdio/socket
relaying
- update the CLI stdio-to-uds command path for the async runner
## Tests
- cargo test -p codex-uds -p codex-stdio-to-uds
- cargo test -p codex-cli
- just fmt
- just fix -p codex-uds
- just fix -p codex-stdio-to-uds
- just fix -p codex-cli
- just bazel-lock-check
- git diff --check
## Summary
Adds a second realtime v2 function tool, `remain_silent`, so the
realtime model has an explicit non-speaking action when the
collaboration mode or latest context says it should not answer aloud.
This is stacked on #18597.
## Design
- Advertise `remain_silent` alongside `background_agent` in realtime v2
conversational sessions.
- Parse `remain_silent` function calls into a typed
`RealtimeEvent::NoopRequested` event.
- Have core answer that function call with an empty
`function_call_output` and deliberately avoid `response.create`, so no
follow-up realtime response is requested.
- Keep the event hidden from app-server/TUI surfaces; it is operational
plumbing, not user-visible conversation content.
Migrate the conversation summary App Server methods to ThreadStore
Because this app server api allows explicitly fetching the thread by
rollout path, intercept that case in the app server code and (a) route
directly to underlying local thread store methods if we're using a local
thread store, or (b) throw an unsupported error if we're using a remote
thread store. This keeps the thread store API clean and all filesystem
operations inside of the local thread store, which pushing the
"fundamental incompatibility" check as early as possible.
## Problem
The TUI resolved fork parent titles from local CODEX_HOME metadata,
which could show missing or stale titles when app-server metadata is
authoritative.
This is a lingering bug left over from the migration of the TUI to the
app-server interface. I found it when I asked Codex to review all places
where the TUI code was still directly accessing the local CODEX_HOME.
## Solution
Route fork parent title metadata through the app-server session state
and render only that supplied title, with focused snapshot coverage for
stale local metadata.
## Testing
I manually tested by renaming a thread then forking it and confirming
that the "forked from" message indicated the parent thread's name.
This updates the spawn-agent tool contract so subagents are presented as
inheriting the parent model by default. The visible model list is now
framed as optional overrides, the model parameter tells callers to leave
it unset and the delegation guidance no longer nudges models toward
picking a smaller/mini override.
Fixes reports that 5.4 would occasionally pick 5.2 or lower as
sub-agents.
## Why
Fixes#18718.
After rewinding a thread, `/copy` could still copy the latest assistant
response from before the rewind. The transcript cells were rolled back,
but the copy source was a single `last_agent_markdown` cache that was
not synchronized with backtracking, so the visible conversation and
copied content could diverge.
## What changed
`ChatWidget` now keeps a bounded copy history for the most recent 32
assistant responses, keyed by the visible user-turn count. When local
rollback trims transcript cells, the copy cache is trimmed to the same
surviving user-turn count so `/copy` uses the latest visible assistant
response.
If the user rewinds past the retained copy window, `/copy` now reports:
```text
Cannot copy that response after rewinding. Only the most recent 32 responses are available to /copy.
```
The change also adds coverage for copying the latest surviving response
after rollback and for the over-limit rewind message.
## Verification
- Manually resumed a synthetic 35-turn session, rewound within the
retained window, and verified `/copy` copied the surviving response.
- Manually rewound past the retained window and verified `/copy` showed
the 32-response limit message.
- `cargo test -p codex-tui slash_copy`
- `just fix -p codex-tui`
- `cargo insta pending-snapshots`
Note: `cargo test -p codex-tui` currently fails on unrelated model
catalog and snapshot drift around the default model changing to
`gpt-5.4`; the focused `/copy` tests pass after fixing the new test
setup.
Fixes stale test fixtures left after the active bundled model catalog
updates in #18586 and #18388. Those changes made `gpt-5.4` the current
default and removed several older hardcoded slugs, which left Windows
Bazel shards failing TUI and config tests.
What changed:
- Refresh TUI model migration, availability NUX, plan-mode, status, and
snapshot fixtures to use active bundled model slugs.
- Update the config edit test expectation for the TOML-quoted
`"gpt-5.2"` migration key.
- Move the model catalog tests into
`codex-rs/tui/src/app/tests/model_catalog.rs` so touching them does not
trip the blob-size policy for `app.rs`.
Verification:
- CI Bazel/lint checks are expected to cover the affected test shards.
## Why
`skills/list` refreshes are best-effort metadata updates. If one fails
during startup or thread switching, the TUI should keep running and show
enough detail to diagnose the app-server failure instead of leaving the
user with only a log entry.
This addresses the recoverability and observability issue reported in
#16914.
## What Changed
- Preserve the full startup `skills/list` error chain before sending it
back through the app event queue.
- Surface failed skills refreshes as recoverable TUI error messages
while still logging the warning.
This is related to the recent bug fix from [PR
#18370](https://github.com/openai/codex/pull/18370).
## Summary
This PR aims to improve integration between the realtime model and the
codex agent by sharing more context with each other. In particular, we
now share full realtime conversation transcript deltas in addition to
the delegation message.
realtime_conversation.rs now turns a handoff into:
```
<realtime_delegation>
<input>...</input>
<transcript_delta>...</transcript_delta>
</realtime_delegation>
```
## Implementation notes
The transcript is accumulated in the realtime websocket layer as parsed
realtime events arrive. When a background-agent handoff is requested,
the current transcript snapshot is copied onto the handoff event and
then serialized by `realtime_conversation.rs` into the hidden realtime
delegation envelope that Codex receives as user-turn context.
For Realtime V2, the session now explicitly enables input audio
transcription, and the parser handles the relevant input/output
transcript completion events so the snapshot includes both user speech
and realtime model responses. The delegation `<input>` remains the
actual handoff request, while `<transcript_delta>` carries the
surrounding conversation history for context.
Reviewers should note that the transcript payload is intended for Codex
context sharing, not UI rendering. The realtime delegation envelope
should stay hidden from the user-facing transcript surface, while still
being included in the background-agent turn so Codex can answer with the
same conversational context the realtime model had.
## Why
Guardian review analytics needs a Rust event shape that matches the
backend schema while avoiding unnecessary PII exposure from reviewed
tool calls. This PR narrows the analytics payload to the fields we
intend to emit and keeps shared Guardian assessment enums in protocol
instead of duplicating equivalent analytics-only enums.
## What changed
- Uses protocol Guardian enums directly for `risk_level`,
`user_authorization`, `outcome`, and command source values.
- Removes high-risk reviewed-action fields from the analytics payload,
including raw commands, display strings, working directories, file
paths, network targets/hosts, justification text, retry reason, and
rationale text.
- Makes `target_item_id` and `tool_call_count` nullable so the Codex
event can represent cases where the app-server protocol or producer does
not have those values.
- Keeps lower-risk structured reviewed-action metadata such as sandbox
permissions, permission profile, `tty`, `execve` source/program, network
protocol/port, and MCP connector/tool labels.
- Adds an analytics reducer/client test covering `codex_guardian_review`
serialization with an optional `target_item_id` and absent removed
fields.
## Verification
- `cargo test -p codex-analytics
guardian_review_event_ingests_custom_fact_with_optional_target_item`
- `cargo fmt --check`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17692).
* #17696
* #17695
* #17693
* __->__ #17692
## Summary
- Pin vulnerable npm dependencies through the existing root
`resolutions` mechanism so the lockfile moves only to patched versions.
- Refresh `pnpm-lock.yaml` for `@modelcontextprotocol/sdk`,
`handlebars`, `path-to-regexp`, `picomatch`, `minimatch`, `flatted`,
`rollup`, and `glob`.
- Bump `quinn-proto` from `0.11.13` to `0.11.14` and refresh
`MODULE.bazel.lock`.
## Testing
- `corepack pnpm --store-dir .pnpm-store install --frozen-lockfile
--ignore-scripts`
- `corepack pnpm audit --audit-level high` (passes; remaining advisories
are low/moderate)
- `corepack pnpm -r --filter ./sdk/typescript run build`
- `corepack pnpm exec eslint 'src/**/*.ts' 'tests/**/*.ts'`
- `cargo check --locked`
- `cargo build -p codex-cli`
- `bazel --output_user_root=/tmp/bazel-codex-dependabot
--ignore_all_rc_files mod deps --lockfile_mode=error`
- `just fmt`
Note: `corepack pnpm -r --filter ./sdk/typescript run test` was also
attempted after building `codex`; it is blocked on this workstation by
host-managed Codex MDM/auth state (`approval_policy` restrictions and
ChatGPT/API-key mismatch), not by this dependency change.
## Summary
- Add the missing `background_task_id: None` field to the
`AgentIdentityAuthRecord` fixture introduced in `auth_tests.rs`.
## Why
- Current `main` fails Bazel/rust-ci compile paths after the
background-task auth field landed and a later auth test fixture
constructed `AgentIdentityAuthRecord` without that new field.
- I intentionally removed the earlier broader CI-stability edits from
this PR. The code-mode timeout, external-agent migration snapshot, and
MCP resource timeout failures appear to be general/flaky or unrelated to
the agent identity merge stack rather than cleanly caused by it.
## Validation
- `cargo test -p codex-login
dummy_chatgpt_auth_does_not_create_cwd_auth_json_when_identity_is_set --
--nocapture`
- `just fmt`
## Problem
The TUI still imported path utilities and config-loader symbols through
app-server-client's legacy_core facade even though those APIs already
exist in utility/config crates. This is part of our ongoing effort to
whittle away at these old dependencies.
## Solution
Rewire imports to avoid the TUI directly importing from the core crate
and instead import from common lower-level crates. This PR doesn't
include any functional changes; it's just a simple rewiring.
Wires patch_updated events through app_server. These events are parsed
and streamed while apply_patch is being written by the model. Also adds 500ms of buffering to the patch_updated events in the diff_consumer.
The eventual goal is to use this to display better progress indicators in
the codex app.
- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.
This is the second cleanup in the await-holding lint stack. The
higher-level goal, following https://github.com/openai/codex/pull/18178
and https://github.com/openai/codex/pull/18398, is to enable Clippy
coverage for guards held across `.await` points without carrying broad
suppressions.
The stack is working toward enabling Clippy's
[`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
lint and the configurable
[`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
lint for Tokio guard types.
Several existing fields used `tokio::sync::Mutex<()>` only as
one-at-a-time async gates. Those guards intentionally lived across
`.await` while an operation was serialized. A mutex over `()` suggests
protected data and trips the await-holding lint shape; a single-permit
`tokio::sync::Semaphore` expresses the intended serialization directly.
## What changed
- Replace `Mutex<()>` serialization gates with `Semaphore::new(1)` for
agent identity ensure, exec policy updates, guardian review session
reuse, plugin remote sync, managed network proxy refresh, auth token
refresh, and RMCP session recovery.
- Update call sites from `lock().await` / `try_lock()` to
`acquire().await` / `try_acquire()`.
- Map closed-semaphore errors into the existing local error types, even
though these semaphores are owned for the lifetime of their managers.
- Update session test builders for the new
`managed_network_proxy_refresh_lock` type.
## Verification
- The split stack was verified at the final lint-enabling head with
`just clippy`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18403).
* #18698
* #18423
* #18418
* __->__ #18403
## Why
`PermissionProfile` needs stable, canonical file-system semantics before
it can become the primary runtime permissions abstraction. Without a
canonical form, callers have to keep re-deriving legacy sandbox maps and
profile comparisons remain lossy or order-dependent.
## What changed
This adds canonicalization helpers for `FileSystemPermissions` and
`PermissionProfile`, expands special paths into explicit sandbox
entries, and updates permission request/conversion paths to consume
those canonical entries. It also tightens the legacy bridge so root-wide
write profiles with narrower carveouts are not silently projected as
full-disk legacy access.
## Verification
- `cargo test -p codex-protocol
root_write_with_read_only_child_is_not_full_disk_write -- --nocapture`
- `cargo test -p codex-sandboxing permission -- --nocapture`
- `cargo test -p codex-tui permissions -- --nocapture`
- Migrates unloaded `thread/name/set` and `thread/memoryModeSet`
app-server writes behind the generic
`ThreadStore::update_thread_metadata` API rather than adding one-off
store methods for setting thread name or memory mode.
- Implements the local ThreadStore metadata patch path for thread name
and memory mode, including rollout append, legacy name index updates,
SessionMeta validation/update, SQLite reconciliation, and re-reading the
stored thread.
- Adds focused local thread-store unit coverage plus app-server
integration coverage for the migrated unloaded write paths.
## Summary
Side conversations can hide important state changes from the parent
conversation while the user is focused on the side thread. In
particular, the parent may finish, fail, need user input, or require an
approval while the side conversation remains visible. Users need a
lightweight signal for those states, but parent approval overlays should
not interrupt the side conversation itself.
This change adds parent-conversation status to the side conversation
context label and defers parent interactive overlays while side mode is
active. When the user exits side mode, pending parent approvals and
input requests are restored in the main thread. The pending approval
footer avoids duplicating the same parent approval status, and replayed
notice cells are filtered when restoring a pending interactive request
so tips or warnings do not crowd out the approval prompt.
The change is contained to the TUI side-conversation and thread replay
paths.
Example 1: Approval pending
<img width="752" height="35" alt="Screenshot 2026-04-19 at 12 56 07 PM"
src="https://github.com/user-attachments/assets/1cc0f1a3-9cab-4d60-aed2-96523ccafc20"
/>
Example 2: Turn complete
<img width="754" height="35" alt="Screenshot 2026-04-19 at 12 56 27 PM"
src="https://github.com/user-attachments/assets/653521a5-e298-4366-ae1c-72b56eb88eeb"
/>
## Problem
The TUI resume/fork picker was backfilling thread names from local
rollout indexes. This was left over from before the TUI was moved to the
app server. It should be using app-server APIs because the TUI might be
connected to a remote connection.
This bug wasn't (yet) reported by a user. I found it by asking Codex to
review places in the TUI code where it was still directly accessing the
CODEX_HOME directory rather than going through app-server APIs.
## Solution
The resume picker and session lookups should use app-server thread APIs
only. Remove legacy rollout name/list backfills, and avoid local name
reads in fork history.
## Testing
I manually tested `codex resume` and `codex resume --all` to look for
functional or performance regressions in the resume picker.
Fixes#18539.
## Summary
The recent `/mcp` performance work kept the default command fast by
avoiding resource and resource-template inventory probes, but it also
removed useful diagnostics for users trying to confirm MCP server state.
This keeps bare `/mcp` on the fast tools/auth path and adds `/mcp
verbose` for the slower diagnostic view. Verbose mode requests full MCP
server status from the app-server and restores status, resources, and
resource templates in the TUI output.
## Testing
In addition to running automation, I manually tested the feature to
confirm that it works.
## Why
Fresh app-server thread startup can create a shell snapshot through a
temp file and then promote it to the final snapshot path. The previous
implementation briefly wrapped the temp path in `ShellSnapshot`, so
after a successful rename its `Drop` attempted to delete the old temp
path and could log a false `ENOENT` warning.
Fixes#17549.
## What changed
- Validate the temp snapshot path directly before promotion.
- Rename the temp path directly to the final snapshot path.
- Keep explicit cleanup of the temp path on validation or finalization
failures.
## Summary
Introduces a single background/control-plane agent task for ChatGPT
backend requests that do not have a thread-scoped task, with
`AuthManager` owning the default ChatGPT backend authorization decision.
Callers now ask `AuthManager` for the default ChatGPT backend
authorization header. `AuthManager` decides whether that is bearer or
background AgentAssertion based on config/internal state, while
low-level bootstrap paths can explicitly request bearer-only auth.
This PR is stacked on PR4 and focuses on the shared background task auth
plumbing plus the first tranche of backend/control-plane consumers. The
remaining callsite wiring is split into PR4.2 to keep review size down.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
when enabled
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: https://github.com/openai/codex/pull/17980 - use task-scoped
`AgentAssertion` for downstream calls
- PR4.1: this PR - introduce AuthManager-owned background/control-plane
`AgentAssertion` auth
- PR4.2: https://github.com/openai/codex/pull/18260 - use background
task auth for additional backend/control-plane calls
## What Changed
- add background task registration and assertion minting inside
`codex-login`
- persist `agent_identity.background_task_id` separately from
per-session task state
- make `BackgroundAgentTaskManager` private to `codex-login`; call sites
do not instantiate or pass it around
- teach `AuthManager` the ChatGPT backend base URL and feature-derived
background auth mode from resolved config
- expose bearer-only helpers for bootstrap/registration/refresh-style
paths that must not use AgentAssertion
- wire `AuthManager` default ChatGPT authorization through app listing,
connector directory listing, remote plugins, MCP status/listing,
analytics, and core-skills remote calls
- preserve bearer fallback when the feature is disabled, the backend
host is unsupported, or background task registration is not available
## Validation
- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `cargo test -p codex-login agent_identity`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `just fix -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
Due to the app-server rebase of the TUI, the review prompt was leaked
into the transcript on the TUI
This is not a security issue but it was bad UX. This PR fixes this
The initial goal of this PR was to stabilise the test
`fs_watch_allows_missing_file_targets`. After further investigation, it
turns out that this test was always failing and the unstability was
coming from a race between timeouts mostly
The goal of the test was to test what happens if a notifier gets
subscribed while a file does not exist yet. But actually the main code
was broken and in case of a file not existing yet, the notifier used to
never notify anything (even if the file ended up being created)
This PR fixes the main code (and the test). For this, we basically watch
the sup-directory when a file does not exist and refresh on it when the
files gets created
## Why
This addresses the review comment from #17751 about `marketplace/remove`
app-server test portability:
https://github.com/openai/codex/pull/17751#discussion_r3104378613
The API returns the removed installed root using the app-server's
effective `CODEX_HOME`. On macOS, temporary directory paths can appear
as either `/var/...` or `/private/var/...`, so comparing one raw path
against another can fail even when `marketplace/remove` behaves
correctly.
## What changed
- Removed the direct whole-response equality assertion for the installed
root path.
- Asserted the stable response field, `marketplace_name`, directly.
- Compared the expected and returned installed-root paths after
canonicalizing their existing parent directories, which avoids requiring
the removed leaf directory to still exist.
## Verification
- `cargo test -p codex-app-server
marketplace_remove_deletes_config_and_installed_root`
- `cargo test -p codex-app-server marketplace_remove`
Make the morpheus agent (which is the phase 2 memories agent) follow the
agent-v2 path system by naming it `/morpheus`. To maintain the path
primitive this means moving it to a dedicated `AgentControl`
Co-authored-by: Codex <noreply@openai.com>
## Summary
Add a new app-server `marketplace/remove` RPC on top of the shared
marketplace-remove implementation.
This change:
- adds `MarketplaceRemoveParams` / `MarketplaceRemoveResponse` to the
app-server protocol
- wires the new request through `codex_message_processor`
- reuses the shared core marketplace-remove flow from the stacked
refactor PR
- updates generated schema files and adds focused app-server coverage
## Validation
- `just write-app-server-schema`
- `just fmt`
- heavy compile/test coverage deferred to GitHub CI per request
## Summary
This is the AgentAssertion downstream slice for feature-gated agent
identity support, replacing the oversized AgentAssertion slice from PR
#17807.
It isolates task-scoped downstream AgentAssertion wiring on top of the
merged PR3.1 work without re-carrying the earlier agent registration,
task registration, or task-state history.
This PR includes the task-scoped bug-fix call sites from the review:
generic file upload auth, MCP OpenAI file upload auth, and ARC monitor
auth. Broader user/control-plane calls move to PR4.1 and PR4.2.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
when enabled
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: this PR - use task-scoped `AgentAssertion` downstream when
enabled
- PR4.1: https://github.com/openai/codex/pull/18094 - introduce
AuthManager-owned background/control-plane `AgentAssertion` auth
- PR4.2: https://github.com/openai/codex/pull/18260 - use background
task auth for additional backend/control-plane calls
## What Changed
- add AgentAssertion envelope generation in `codex-core`
- route downstream HTTP and websocket auth through AgentAssertion when
an agent task is present
- extend the model-provider auth provider so non-bearer authorization
schemes can be passed through cleanly
- make generic file uploads attach the full authorization header value
- make MCP OpenAI file uploads use the cached thread agent task
assertion when present
- make ARC monitor calls use the cached thread agent task assertion when
present
## Why
The original PR had drifted ancestry and showed a much larger diff than
the semantic change actually required. Restacking it onto PR3.1 keeps
the reviewable surface down to the downstream assertion slice.
## Validation
- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `cargo test -p codex-login agent_identity`
- `just fix -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
## Summary
Third PR in the split from #17956. Stacked on #18220.
- shows workspace-owner/member-specific rate-limit messages behind
`workspace_owner_usage_nudge`
- prompts workspace members to notify the owner or request a usage-limit
increase
- sends the confirmed nudge through the app-server API and renders
completion feedback
- adds focused TUI snapshot coverage for prompts and completion states
- feature gate
## Validation
- `cargo test -p codex-backend-client`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server rate_limits`
- `cargo test -p codex-tui workspace_`
- `cargo test -p codex-tui status_`
- `just fmt`
- `just fix -p codex-backend-client`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
## Summary
The TUI still imported several symbols through the transitional
app-server-client `legacy_core` facade even though those symbols are
already owned by smaller crates. This PR narrows that facade by rewiring
those imports directly to their owner crates.
## Changes
No functional changes, just import rewiring. This is part of our ongoing
effort to whittle away at the `legacy_core` namespace, which represents
all of the remaining symbols that the TUI imports from the core.
## Summary
Fixes#18313.
Recent TUI resume breadcrumbs could print a thread title instead of the
stable thread UUID. For sessions whose title was auto-derived from the
first prompt, that made the suggested codex resume command look like it
should resume a long prompt rather than the session ID.
This updates the TUI and CLI post-exit resume hints, plus the in-session
summary shown when switching/forking threads, to always use the stable
thread ID for these recovery breadcrumbs. Explicit name-based resume
support remains available elsewhere.
## Summary
Remove the skills message from the guardian dev message
## Test Plan
- [x] Ran locally
- [x] Added unit test
---------
Co-authored-by: Codex <noreply@openai.com>
Fast mode TUI copy currently names a specific plan-usage multiplier in
two lightweight promo/help surfaces. This swaps that exact multiplier
language for the broader increased plan usage wording we use elsewhere.
There are no behavior changes here; the slash command and startup tip
still point users at the same Fast mode flow.
## Summary
- persist registered agent tasks in the session state update stream so
the thread can reuse them
- prewarm task registration once identity registration succeeds, while
keeping startup failures best-effort
- isolate the session-side task lifecycle into a dedicated module so
AgentIdentityManager and RegisteredAgentTask do not leak across as many
core layers
## Testing
- cargo test -p codex-core startup_agent_task_prewarm
- cargo test -p codex-core
cached_agent_task_for_current_identity_clears_stale_task
- cargo test -p codex-core record_initial_history_
## Stack
1. Base PR: #18443 stops granting ACLs on `USERPROFILE`.
2. This PR: filters additional SSH-owned profile roots discovered from
SSH config.
## Bug
The base PR removes the broadest bad grant: `USERPROFILE` itself.
That still leaves one important case. A user profile child can be
SSH-owned even when its name is not one of our fixed exclusions.
For example:
```sshconfig
Host devbox
IdentityFile ~/.keys/devbox
CertificateFile ~/.certs/devbox-cert.pub
UserKnownHostsFile ~/.known_hosts_custom
Include ~/.ssh/conf.d/*.conf
```
After profile expansion, the sandbox might see these as normal profile
children:
```text
C:\Users\me\.keys
C:\Users\me\.certs
C:\Users\me\.known_hosts_custom
C:\Users\me\.ssh
```
Those paths have another owner: OpenSSH and the tools that manage SSH
identity and host-key state. Codex should not add sandbox ACLs to them.
OpenSSH describes this dependency tree in
[`ssh_config(5)`](https://man.openbsd.org/ssh_config.5), and the client
parser follows the same shape in `readconf.c`:
- `Include` recursively reads more config files and expands globs
- `IdentityFile` and `CertificateFile` name authentication files
- `UserKnownHostsFile`, `GlobalKnownHostsFile`, and `RevokedHostKeys`
name host-key files
- `ControlPath` and `IdentityAgent` can name profile-owned sockets or
control files
- these path directives can use forms such as `~`, `%d`, and `${HOME}`
## Change
This PR adds a small SSH config dependency scanner.
It starts at:
```text
~/.ssh/config
```
Then it returns concrete paths named by `Include` and by path-valued SSH
config directives:
```text
IdentityFile
CertificateFile
UserKnownHostsFile
GlobalKnownHostsFile
RevokedHostKeys
ControlPath
IdentityAgent
```
For example:
```sshconfig
IdentityFile ~/.keys/devbox
CertificateFile ~/.certs/devbox-cert.pub
Include ~/.ssh/conf.d/*.conf
```
returns paths like:
```text
C:\Users\me\.keys\devbox
C:\Users\me\.certs\devbox-cert.pub
C:\Users\me\.ssh\conf.d\devbox.conf
```
The setup code then maps those paths back to their top-level
`USERPROFILE` child and filters matching sandbox roots out of both the
writable and readable root lists.
## Why this shape
The parser reports what SSH config references. The sandbox setup code
decides which `USERPROFILE` roots are unsafe to grant.
That keeps the policy simple:
1. expand broad profile grants
2. remove the profile root
3. remove fixed sensitive profile folders
4. remove profile folders referenced by SSH config dependencies
If a path has two possible owners, the sandbox steps back. SSH keeps
control of SSH config, keys, certificates, known-hosts files, sockets,
and included config files.
## Tests
- `cargo test -p codex-windows-sandbox --lib`
- `just bazel-lock-check`
- `just fix -p codex-windows-sandbox`
- `git diff --check`
## Stack
1. This PR: expand and filter `USERPROFILE` roots.
2. Follow-up: #18493 filters SSH config dependency roots on top of this
base.
## Bug
On Windows, Codex can grant the sandbox ACL access to the whole user
profile directory.
That means the sandbox ACL can be applied under paths like:
```text
C:\Users\me\.ssh
C:\Users\me\.tsh
```
This breaks SSH. Windows OpenSSH checks permissions on SSH config and
key material. If Codex adds a sandbox group ACL to those files, OpenSSH
can reject the config or keys.
The bad interaction is:
1. Codex asks the Windows sandbox to grant access to `USERPROFILE`.
2. The sandbox applies ACLs under that root.
3. SSH-owned files get an extra ACL entry.
4. OpenSSH rejects those files because their permissions are no longer
strict enough.
## Why this happens more now
Codex now has more flows that naturally start in the user profile:
- a new chat can start in the user directory
- a project can be rooted in the user directory
- a user can start the Codex CLI from the user directory
Those are valid user actions. The bug is that `USERPROFILE` is too broad
a sandbox root.
## Change
This PR keeps the useful behavior of starting from the user profile
without granting the profile root itself.
The new flow is:
1. collect the normal read and write roots
2. if a root is exactly `USERPROFILE`, replace it with the direct
children of `USERPROFILE`
3. remove `USERPROFILE` itself from the final root list
4. apply the existing user-profile read exclusions to both read and
write roots
5. add `.tsh` and `.brev` to that exclusion list
So this input:
```text
C:\Users\me
```
becomes roots like:
```text
C:\Users\me\Desktop
C:\Users\me\Documents
C:\Users\me\Downloads
```
and does not include:
```text
C:\Users\me
C:\Users\me\.ssh
C:\Users\me\.tsh
C:\Users\me\.brev
```
If `USERPROFILE` cannot be listed, expansion falls back to the profile
root and the later filter removes it. That keeps the failure mode closed
for this bug.
## Why this shape
The sandbox still gets access to ordinary profile folders when the user
starts from home.
The sandbox no longer grants access to the profile root itself.
All filtering happens after expansion, for both read and write roots.
That gives us one simple rule: expand broad profile grants first, then
remove roots the sandbox must not own.
## Tests
- `just fmt`
- `cargo test -p codex-windows-sandbox`
- `just fix -p codex-windows-sandbox`
- `git diff --check`
## Summary
Fixes#18554.
The `/experimental` menu can submit the full experimental feature state
even when the user presses Enter without toggling anything. Previously,
Codex showed `Memories will be enabled in the next session.` whenever
the submitted updates included `Feature::MemoryTool = true`, so sessions
where Memories were already enabled could show a redundant warning on a
no-op save.
This change records whether `Feature::MemoryTool` was enabled before
applying feature updates and only emits the next-session notice when
Memories actually transitions from disabled to enabled.
The TUI supports long-running turns and agent threads, but quick side
questions have required interrupting the main flow or manually
forking/navigating threads. This PR adds a guarded `/side` flow so users
can ask brief side-conversation questions in an ephemeral fork while
keeping the primary thread focused. This also helps address the feature
request in #18125.
The implementation creates one side conversation at a time, lets `/side`
open either an empty side thread or immediately submit `/side
<question>`, and returns to the parent with Esc or Ctrl+C. Side
conversations get hidden developer guardrails that treat inherited
history as reference-only and steer the model away from workspace
mutations unless explicitly requested in the side conversation.
The TUI hides most slash commands while side mode is active, leaving
only `/copy`, `/diff`, `/mention`, and `/status` available there.
## Why
Users have asked to queue follow-up slash commands while a task is
running, including in #14081, #14588, #14286, and #13779. The previous
TUI behavior validated slash commands immediately, so commands that are
only meaningful once the current turn is idle could not be queued
consistently.
The queue should preserve what the user typed and defer command parsing
until the item is actually dispatched. This also gives `/fast`, `/review
...`, `/rename ...`, `/model`, `/permissions`, and similar slash
workflows the same FIFO behavior as plain queued prompts.
## What Changed
- Added a queued-input action enum so queued items can be dispatched as
plain prompts, slash commands, or user shell commands.
- Changed `Tab` queueing to accept slash-led prompts without validating
them up front, then parse and dispatch them when dequeued.
- Added `!` shell-command queueing for `Tab` while a task is running,
while preserving existing `Enter` behavior for immediate shell
execution.
- Moved queued slash dispatch through shared slash-command parsing so
inline commands, unavailable commands, unknown commands, and local
config commands report at dequeue time.
- Continued queue draining after local-only actions and after slash menu
cancellation or selection when no task is running.
- Preserved slash-popup completion behavior so `/mo<Tab>` completes to
`/model ` instead of queueing the prefix.
- Updated pending-input preview snapshots to show queued follow-up
inputs.
## Verification
I did a bunch of manual validation (and found and fixed a few bugs along
the way).
## Summary
`codex app` should be a platform-aware entry point for opening Codex
Desktop or helping users install it. Before this change, the command
only existed on macOS and its default installer URL always pointed at
the Apple Silicon DMG, which sent Intel Mac users to the wrong build.
This updates the macOS path to choose the Apple Silicon or Intel DMG
based on the detected processor, while keeping `--download-url` as an
advanced override. It also enables `codex app` on Windows, where the CLI
opens an installed Codex Desktop app when available and otherwise opens
the Windows installer URL.
---------
Co-authored-by: Felipe Coury <felipe.coury@openai.com>
# Summary
When a user finishes planning, the TUI asks whether to implement in the
current conversation or start fresh with the approved plan. The
clear-context choice is easier to evaluate when the prompt shows how
much context has already been used, because the user can see when
carrying the full prior conversation is likely to be less useful than
preserving only the plan.
<img width="1612" height="1312" alt="image"
src="https://github.com/user-attachments/assets/694bcf87-8be5-4e88-a412-e562af62d5f7"
/>
This PR adds that context signal directly to the clear-context option
while keeping the copy compact enough for the Plan-mode selection popup.
# What Changed
- Compute an optional context-usage label when opening the plan
implementation prompt.
- Show the label only on `Yes, clear context and implement`, where it
informs the cleanup decision.
- Prefer a percentage-used label when context-window information is
available, with a compact token-used fallback when only token totals are
known.
- Preserve the original option description when usage is unknown or
effectively zero.
- Add rustdoc comments around the prompt-copy boundary so future changes
keep the context label formatting and selection rendering
responsibilities clear.
# Testing
- `cargo test -p codex-tui plan_implementation`
# Notes
The footer continues to show context remaining as ambient status. The
implementation prompt intentionally shows context used because the user
is choosing whether to clean up the current thread before
implementation.
## Summary
- Add the executor-backed RMCP stdio transport.
- Wire MCP stdio placement through the executor environment config.
- Cover local and executor-backed stdio paths with the existing MCP test
helpers.
## Stack
```text
o #18027 [6/6] Fail exec client operations after disconnect
│
@ #18212 [5/6] Wire executor-backed MCP stdio
│
o #18087 [4/6] Abstract MCP stdio server launching
│
o #18020 [3/6] Add pushed exec process events
│
o #18086 [2/6] Support piped stdin in exec process API
│
o #18085 [1/6] Add MCP server environment config
│
o main
```
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Fixes#16637. (I hit this bug after 11h of work on a long-running task.)
Plugin cache initialization could panic when an already-absolute cache
path was normalized through `AbsolutePathBuf::from_absolute_path`,
because that path still consulted `current_dir()`.
This changes absolute-path normalization so already-absolute paths do
not depend on cwd, and makes plugin cache root construction available as
a fallible path through `PluginStore::try_new()`. Plugin cache subpaths
now use `AbsolutePathBuf::join()` instead of re-absolutizing derived
absolute paths.
## Summary
- Reverts PR #17749 so queued inter-agent mail can again preempt after
reasoning/commentary output item boundaries.
- Applies the revert to the current `codex/turn.rs` module layout and
restores the prior pending-input test expectations/snapshots.
## Testing
- `just fmt`
- `cargo test -p codex-core --test all pending_input`
- `cargo test -p codex-core` failed in unrelated
`tools::js_repl::tests::js_repl_imported_local_files_can_access_repl_globals`:
dotslash download hit `mktemp: mkdtemp failed ... Operation not
permitted` in the sandbox temp dir.
Co-authored-by: Codex <noreply@openai.com>
Adds max_context_window to model metadata and routes core context-window
reads through resolved model info. Config model_context_window overrides
are clamped to max_context_window when present; without an override, the
model context_window is used.
## Summary
Move the marketplace remove implementation into shared core logic so
both the CLI command and follow-up app-server RPC can reuse the same
behavior.
This change:
- adds a shared `codex_core::plugins::remove_marketplace(...)` flow
- moves validation, config removal, and installed-root deletion out of
the CLI
- keeps the CLI as a thin wrapper over the shared implementation
- adds focused core coverage for the shared remove path
## Validation
- `just fmt`
- focused local coverage for the shared remove path
- heavier follow-up validation deferred to stacked PR CI
## Summary
- Populate `PluginDetail.description` in core for uninstalled cross-repo
plugins when detailed fields are unavailable until install.
- Include the source Git URL plus optional path/ref/sha details in that
fallback description.
- Keep `details_unavailable_reason` as the structured signal while
app-server forwards the description normally.
- Add plugin-read coverage proving the response does not clone the
remote source just to show the message.
## Why
Uninstalled cross-repo plugins intentionally return sparse detail data
so listing/reading does not clone the plugin source. Without a
description, Desktop and TUI detail pages look like an ordinary empty
plugin. This gives users a concrete explanation and source pointer while
keeping the existing structured reason available for callers.
## Validation
- `just fmt`
- `cargo test -p codex-core
read_plugin_for_config_uninstalled_git_source_requires_install_without_cloning`
- `cargo test -p codex-app-server plugin_read --test all`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
Note: `cargo test -p codex-app-server` was also attempted before the
latest refactor and failed broadly in unrelated v2
thread/realtime/review/skills suites; the new plugin-read test passed in
that run as well.
Cap the model-visible skills section to a small share of the context
window, with a fallback character budget, and keep only as many implicit
skills as fit within that budget.
Emit a non-fatal warning when enabled skills are omitted, and add a new
app-server warning notification
Record thread-start skill metrics for total enabled skills, kept skills,
and whether truncation happened
---------
Co-authored-by: Matthew Zeng <mzeng@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
- trust-gate project `.codex` layers consistently, including repos that
have `.codex/hooks.json` or `.codex/execpolicy/*.rules` but no
`.codex/config.toml`
- keep disabled project layers in the config stack so nested trusted
project layers still resolve correctly, while preventing hooks and exec
policies from loading until the project is trusted
- update app-server/TUI onboarding copy to make the trust boundary
explicit and add regressions for loader, hooks, exec-policy, and
onboarding coverage
## Security
Before this change, an untrusted repo could auto-load project hooks or
exec policies from `.codex/` as long as `config.toml` was absent. This
makes trust the single gate for project-local config, hooks, and exec
policies.
## Stack
- Parent of #15936
## Test
- cargo test -p codex-core without_config_toml
---------
Co-authored-by: Codex <noreply@openai.com>
This PR adds inline enable/disable controls to the new /plugins browse
menu. Installed plugins can now be toggled directly from the list with
keyboard interaction, and the associated config-write plumbing is
included so the UI and persisted plugin state stay in sync. This also
includes the queued-write handling needed to avoid stale toggle
completions overwriting newer intent.
- Add toggleable plugin rows for installed plugins in /plugins
- Support Space to enable or disable without leaving the list
- Persist plugin enablement through the existing app/config write path
- Preserve the current selection while the list refreshes after a toggle
- Add tests and snapshot updates for toggling behavior
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Update the plugin API for the new remote plugin model.
The mental model is no longer “keep local plugin state in sync with
remote.” Instead, local and remote plugins are becoming separate
sources. Remote catalog entries can be shown directly from the remote
API before installation; after installation they are still downloaded
into the local cache for execution, but remote installed state will come
from the API and be held in memory rather than being read from config.
• ## API changes
- Remove `forceRemoteSync` from `plugin/list`, `plugin/install`, and
`plugin/uninstall`.
- Remove `remoteSyncError` from `plugin/list`.
- Add remote-capable metadata to `plugin/list` / `plugin/read`:
- nullable `marketplaces[].path`
- `source: { type: "remote", downloadUrl }`
- URL asset fields alongside local path fields:
`composerIconUrl`, `logoUrl`, `screenshotUrls`
- Make `plugin/read` and `plugin/install` source-compatible:
- `marketplacePath?: AbsolutePathBuf | null`
- `remoteMarketplaceName?: string | null`
- exactly one source is required at runtime
## Why
The large Rust test suites are slow and include some of our flakiest
tests, so we want to run them with Bazel native sharding while keeping
shard membership stable between runs.
This is the simpler follow-up to the explicit-label experiment in
#17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which
includes the stable test-name hashing support from
hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel
macros into that support.
Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's
per-shard test action caching. Using stable name hashing avoids
reshuffling every test when one test is added or removed.
## What Changed
`codex_rust_crate` now accepts `test_shard_counts` and applies the right
Bazel/rules_rust attributes to generated unit and integration test
rules. Matched tests are also marked `flaky = True`, giving them Bazel's
default three attempts.
This PR shards these labels 8 ways:
```text
//codex-rs/core:core-all-test
//codex-rs/core:core-unit-tests
//codex-rs/app-server:app-server-all-test
//codex-rs/app-server:app-server-unit-tests
//codex-rs/tui:tui-unit-tests
```
## Verification
`bazel query --output=build` over the selected public labels and their
inner unit-test binaries confirmed the expected `shard_count = 8`,
`flaky = True`, and `experimental_enable_sharding = True` attributes.
Also verified that we see the shards as expected in BuildBuddy so they
can be analyzed independently.
Co-authored-by: Codex <noreply@openai.com>
## Why
This branch brings the Bazel module pins for `rules_rs` and `llvm` up to
the latest BCR releases and aligns the root direct dependencies with the
versions the module graph already resolves to.
That gives us a few concrete wins:
- picks up newer upstream fixes in the `rules_rs` / `rules_rust` stack,
including work around repo-rule nondeterminism and default Cargo binary
target generation
- picks up test sharding support from the newer `rules_rust` stack
([hermeticbuild/rules_rust#13](https://github.com/hermeticbuild/rules_rust/pull/13))
- picks up newer built-in knowledge for common system crates like
`gio-sys`, `glib-sys`, `gobject-sys`, `libgit2-sys`, and `libssh2-sys`,
which gives us a future path to reduce custom build-script handling
- reduces local patch maintenance by dropping fixes that are now
upstream and rebasing the remaining Windows patch stack onto a newer
upstream base
- removes the direct-dependency warnings from `bazel-lock-check` by
making the root pins match the resolved graph
## What Changed
- bump `rules_rs` from `0.0.43` to `0.0.58`
- bump `llvm` from `0.6.8` to `0.7.1`
- bump `bazel_skylib` from `1.8.2` to `1.9.0` so the root direct dep
matches the resolved graph
- regenerate `MODULE.bazel.lock` for the updated module graph
- refresh the remaining Windows-specific patch stack against the newer
upstream sources:
- `patches/rules_rs_windows_gnullvm_exec.patch`
- `patches/rules_rs_windows_exec_linker.patch`
- `patches/rules_rust_windows_exec_std.patch`
- `patches/rules_rust_windows_msvc_direct_link_args.patch`
- remove patches that are no longer needed because the underlying fixes
are upstream now:
- `patches/rules_rs_delete_git_worktree_pointer.patch`
- `patches/rules_rust_repository_set_exec_constraints.patch`
## Validation
- `just bazel-lock-update`
- `just bazel-lock-check`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- pass split filesystem sandbox policy/cwd through apply_patch contexts,
while omitting legacy-equivalent policies to keep payloads small
- keep the fs helper compatible with legacy Landlock by avoiding helper
read-root permission expansion in that mode and disabling helper network
access
## Root Cause
`d626dc38950fb40a1a5ad0a8ffab2485e3348c53` routed exec-server filesystem
operations through a sandboxed helper. That path forwarded legacy
Landlock into a helper policy shape that could require direct
split-policy enforcement. Sandboxed `apply_patch` hit that edge through
the filesystem abstraction.
The same 0.121 edit-regression path is consistent with #18354: normal
writes route through the `apply_patch` filesystem helper, fail under
sandbox, and then surface the generic retry-without-sandbox prompt.
Fixes#18069Fixes#18354
## Validation
- `cd codex-rs && just fmt`
- earlier branch validation before merging current `origin/main` and
dropping the now-separate PATH fix:
- `cd codex-rs && cargo test -p codex-exec-server`
- `cd codex-rs && cargo test -p codex-core file_system_sandbox_context`
- `cd codex-rs && just fix -p codex-exec-server`
- `cd codex-rs && just fix -p codex-core`
- `git diff --check`
- `cd codex-rs && cargo clean`
---------
Co-authored-by: Codex <noreply@openai.com>
This is the first mechanical cleanup in a stack whose higher-level goal
is to enable Clippy coverage for async guards held across `.await`
points.
The follow-up commits enable Clippy's
[`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
lint and the configurable
[`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
lint for Tokio guard types. This PR handles the cases where the
underlying issue is not protected shared mutable state, but a
`tokio::sync::mpsc::UnboundedReceiver` wrapped in `Arc<Mutex<_>>` so
cloned owners can call `recv().await`.
Using a mutex for that shape forces the receiver lock guard to live
across `.await`. Switching these paths to `async-channel` gives us
cloneable `Receiver`s, so each owner can hold a receiver handle directly
and await messages without an async mutex guard.
## What changed
- In `codex-rs/code-mode`, replace the turn-message
`mpsc::UnboundedSender`/`UnboundedReceiver` plus `Arc<Mutex<Receiver>>`
with `async_channel::Sender`/`Receiver`.
- In `codex-rs/codex-api`, replace the realtime websocket event receiver
with an `async_channel::Receiver`, allowing `RealtimeWebsocketEvents`
clones to receive without locking.
- Add `async-channel` as a dependency for `codex-code-mode` and
`codex-api`, and update `Cargo.lock`.
## Verification
- The split stack was verified at the final lint-enabling head with
`just clippy`.
## Summary
- add first-class marketplace support for git-backed plugin sources
- keep the newer marketplace parsing behavior from `main`, including
alternate manifest locations and string local sources
- materialize remote plugin sources during install, detail reads, and
non-curated cache refresh
- expose git plugin source metadata through the app-server protocol
## Details
This teaches the marketplace parser to accept all of the following:
- local string sources such as `"source": "./plugins/foo"`
- local object sources such as
`{"source":"local","path":"./plugins/foo"}`
- remote repo-root sources such as
`{"source":"url","url":"https://github.com/org/repo.git"}`
- remote subdir sources such as
`{"source":"git-subdir","url":"owner/repo","path":"plugins/foo","ref":"main","sha":"..."}`
It also preserves the newer tolerant behavior from `main`: invalid or
unsupported plugin entries are skipped instead of breaking the whole
marketplace.
## Validation
- `cargo test -p codex-core plugins::marketplace::tests`
- `just fix -p codex-core`
- `just fmt`
## Notes
- A full `cargo test -p codex-core` run still hit unrelated existing
failures in agent and multi-agent tests during this session; the
marketplace-focused suite passed after the rebase resolution.
Follow-up to https://github.com/openai/codex/pull/18178, where we called
out enabling the await-holding lint as a follow-up.
The long-term goal is to enable Clippy coverage for async guards held
across awaits. This PR is intentionally only the first, low-risk cleanup
pass: it narrows obvious lock guard lifetimes and leaves
`codex-rs/Cargo.toml` unchanged so the lint is not enabled until the
remaining cases are fixed or explicitly justified. It intentionally
leaves the active-turn/turn-state locking pattern alone because those
checks and mutations need to stay atomic.
## Common fixes used here
These are the main patterns reviewers should expect in this PR, and they
are also the patterns to reach for when fixing future `await_holding_*`
findings:
- **Scope the guard to the synchronous work.** If the code only needs
data from a locked value, move the lock into a small block, clone or
compute the needed values, and do the later `.await` after the block.
- **Use direct one-line mutations when there is no later await.** Cases
like `map.lock().await.remove(&id)` are acceptable when the guard is
only needed for that single mutation and the statement ends before any
async work.
- **Drain or clone work out of the lock before notifying or awaiting.**
For example, the JS REPL drains pending exec senders into a local vector
and the websocket writer clones buffered envelopes before it serializes
or sends them.
- **Use a `Semaphore` only when serialization is intentional across
async work.** The test serialization guards intentionally span awaited
setup or execution, so using a semaphore communicates "one at a time"
without holding a mutex guard.
- **Remove the mutex when there is only one owner.** The PTY stdin
writer task owns `stdin` directly; the old `Arc<Mutex<_>>` did not
protect shared access because nothing else had access to the writer.
- **Do not split locks that protect an atomic invariant.** This PR
deliberately leaves active-turn/turn-state paths alone because those
checks and mutations need to stay atomic. Those cases should be fixed
separately with a design change or documented with `#[expect]`.
## What changed
- Narrow scoped async mutex guards in app-server, JS REPL, network
approval, remote-control websocket, and the RMCP test server.
- Replace test-only async mutex serialization guards with semaphores
where the guard intentionally lives across async work.
- Let the PTY pipe writer task own stdin directly instead of wrapping it
in an async mutex.
## Verification
- `just fix -p codex-core -p codex-app-server -p codex-rmcp-client -p
codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness`
- `just clippy -p codex-core`
- `cargo test -p codex-core -p codex-app-server -p codex-rmcp-client -p
codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness` was
run; the app-server suite passed, and `codex-core` failed in the local
sandbox on six otel approval tests plus
`suite::user_shell_cmd::user_shell_command_does_not_set_network_sandbox_env_var`,
which appear to depend on local command approval/default rules and
`CODEX_SANDBOX_NETWORK_DISABLED=1` in this environment.
## Summary
- preserve a small fs-helper runtime env allowlist (`PATH`, temp vars)
instead of launching the sandboxed helper with an empty env
- add unit coverage for the allowlist and transformed sandbox request
env
- add a Linux smoke test that starts the test exec-server with a fake
`bwrap` on `PATH`, runs a sandboxed fs write through the remote fs
helper path, and asserts that bwrap path was exercised
## Validation
- `cd /tmp/codex-worktrees/fs-helper-env-defaults/codex-rs && export
PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH
&& bazel test --bes_backend= --bes_results_url=
//codex-rs/exec-server:exec-server-file_system-test
--test_filter=sandboxed_file_system_helper_finds_bwrap_on_preserved_path`
- `cd /tmp/codex-worktrees/fs-helper-env-defaults/codex-rs && export
PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH
&& bazel test --bes_backend= --bes_results_url=
//codex-rs/exec-server:exec-server-unit-tests
--test_filter="helper_env|sandbox_exec_request_carries_helper_env"`
- earlier on this branch before the smoke-test harness adjustment: `cd
/tmp/codex-worktrees/fs-helper-env-defaults/codex-rs && export
PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH
&& bazel test --bes_backend= --bes_results_url=
//codex-rs/exec-server:all`
Co-authored-by: Codex <noreply@openai.com>
## Summary
First PR in the split from #17956.
- adds the core/app-server `RateLimitReachedType` shape
- maps backend `rate_limit_reached_type` into Codex rate-limit snapshots
- carries the field through app-server notifications/responses and
generated schemas
- updates existing constructors/tests for the new optional field
## Validation
- `cargo test -p codex-backend-client`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server rate_limits`
- `cargo test -p codex-tui workspace_`
- `cargo test -p codex-tui status_`
- `just fmt`
- `just fix -p codex-backend-client`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
This PR moves `/plugins` onto the shared tabbed selection-list
infrastructure and introduces the new v2 menu. The menu now groups
plugins into All Plugins, Installed, OpenAI Curated, and per-marketplace
tabs.
- Rebuild /plugins on top of the shared tabbed selection list
- Add All Plugins, Installed, OpenAI Curated, and per-marketplace tabs
- Preserve active tab and selected-row behavior across popup refreshes
- Add duplicate marketplace tab-label disambiguation
- Update browse-mode popup tests and snapshots
Co-authored-by: Codex <noreply@openai.com>
# Summary
This removes startup `skills/list` from the critical path to first
input. In release measurements, median startup-to-input time improved
from `307.5 ms` to `191.0 ms` across 30 measured runs with 5 warmups.
# Background
Startup currently waits for a forced `skills/list` app-server request
before scheduling the first usable TUI frame. That makes skill metadata
freshness part of the process-launch-to-input path, even though the
prompt can safely accept normal input before skill metadata has finished
loading.
I measured startup from process launch until the TUI reports that the
user can type. The measurement harness watched the startup measurement
record, killed Codex after a successful sample, and enforced a timeout
so repeated runs would not leave TUI processes behind. The debug runs
had enough outliers that I used median as the primary signal and ran a
baseline self-compare to understand the noise floor.
# Why skills/list
The `skills/list` cut was the best practical optimization because it
improved startup without changing the important readiness contract: when
the prompt is shown, it is still backed by an active session. Only
enrichment data arrives later.
| Candidate | Result | Decision |
| --- | --- | --- |
| Defer startup `skills/list` | Debug median improved from `524.0 ms` to
`348.0 ms`; release median improved from `307.5 ms` to `191.0 ms`. |
Keep |
| Defer fresh `thread/start` | Debug median improved from `494.0 ms` to
`256.0 ms`, but the prompt could appear before an active thread was
attached. | Reject as too risky for this PR |
| Avoid forced skills config reload | Debug median moved from `509.0 ms`
to `512.0 ms`. | Reject as neutral |
| Skip fresh history metadata | Debug median moved from `496.5 ms` to
`531.5 ms`. | Reject as regression/noise |
| Defer app-server startup | Not implemented because it would only
permit a loading frame unless the TUI gained a deliberate pre-server
state. | Out of scope |
# Implementation
`App::refresh_startup_skills` now clones the app-server request handle,
spawns a background task, and issues the same forced `skills/list`
request after the first frame is scheduled. When the request completes,
the task sends `AppEvent::SkillsListLoaded` back through the normal app
event queue.
The existing skills response handling still converts the app-server
response, updates the chat widget, and emits invalid `SKILL.md`
warnings. Explicit user-initiated skills refreshes still use the
existing synchronous app command path, so callers that intentionally
requested fresh skill state do not race ahead of their own refresh.
# Tradeoffs
The main tradeoff is a narrow theoretical race at startup: skill mention
completion depends on a background `skills/list` response, so it could
briefly show stale or empty metadata if opened before that response
arrives. In manual testing, pressing `$` as soon as possible after
launch still showed populated skill metadata, so this risk appears
minimal in normal use. Plain input remains available immediately, and
the UI updates through the existing skills response path once the
refresh completes.
This PR does not change how skills are discovered, cached,
force-reloaded, displayed, enabled, or warned about. It only changes
when the startup refresh is allowed to complete relative to the first
usable TUI frame.
# Verification
- `cargo test -p codex-tui`
## Summary
- Move local MCP stdio process startup behind a launcher trait.
- Preserve existing local stdio behavior while making transport creation
explicit.
## Stack
```text
o #18027 [6/6] Fail exec client operations after disconnect
│
o #18212 [5/6] Wire executor-backed MCP stdio
│
@ #18087 [4/6] Abstract MCP stdio server launching
│
o #18020 [3/6] Add pushed exec process events
│
o #18086 [2/6] Support piped stdin in exec process API
│
o #18085 [1/6] Add MCP server environment config
│
o main
```
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
PR Babysitter can reply directly to GitHub code review comments when
feedback is non-actionable, already addressed, or not valid. Those
replies should be visibly attributed so reviewers do not mistake an
automated Codex response for a message from the human operator.
This updates the skill instructions to require GitHub code review
replies from the babysitter to start with `[codex]`.
## Changes
- Adds the `[codex]` prefix requirement to the core PR Babysitter
workflow.
- Repeats the requirement in the review comment handling guidance where
agents decide whether to reply to a review thread.
## Summary
- Add a pushed `ExecProcessEvent` stream alongside retained
`process/read` output.
- Publish local and remote output, exit, close, and failure events.
- Cover the event stream with shared local/remote exec process tests.
## Testing
- `cargo check -p codex-exec-server`
- `cargo check -p codex-rmcp-client`
- Not run: `cargo test` per repo instruction; CI will cover.
## Stack
```text
o #18027 [6/6] Fail exec client operations after disconnect
│
o #18212 [5/6] Wire executor-backed MCP stdio
│
o #18087 [4/6] Abstract MCP stdio server launching
│
@ #18020 [3/6] Add pushed exec process events
│
o #18086 [2/6] Support piped stdin in exec process API
│
o #18085 [1/6] Add MCP server environment config
│
o main
```
---------
Co-authored-by: Codex <noreply@openai.com>
To improve performance of UI loads from the app, add two main
improvements:
1. The `thread/list` api now gets a `sortDirection` request field and a
`backwardsCursor` to the response, which lets you paginate forwards and
backwards from a window. This lets you fetch the first few items to
display immediately while you paginate to fill in history, then can
paginate "backwards" on future loads to catch up with any changes since
the last UI load without a full reload of the entire data set.
2. Added a new `thread/turns/list` api which also has sortDirection and
backwardsCursor for the same behavior as `thread/list`, allowing you the
same small-fetch for immediate display followed by background fill-in
and resync catchup.
## Why
The Bazel workflow has multiple jobs that run concurrently for the same
target triple. In particular, the Windows `test`, `clippy`, and
`verify-release-build` jobs could all miss and then attempt to save the
same Bazel repository cache key:
```text
bazel-cache-${target}-${lockhash}
```
Because `actions/cache` entries are immutable, only one job can reserve
that key. The others can report failures such as:
```text
Failed to save: Unable to reserve cache with key bazel-cache-x86_64-pc-windows-gnullvm-..., another job may be creating this cache.
```
Adding only the workflow name would not separate these jobs because they
all run inside the same `Bazel` workflow. The key needs a job-level
namespace as well.
## What Changed
- Added a required `cache-scope` input to
`.github/actions/prepare-bazel-ci/action.yml`.
- Moved Bazel repository cache key construction into the shared action
and exposed the computed key as `repository-cache-key`.
- Exposed the exact restore result as `repository-cache-hit` so save
steps can skip exact cache hits.
- Updated `.github/workflows/bazel.yml` to pass `cache-scope: bazel-${{
github.job }}` for the `test`, `clippy`, and `verify-release-build`
jobs.
- The scoped restore key is now the only fallback. This avoids carrying
a temporary restore path for the old unscoped cache namespace.
## Verification
- Parsed `.github/actions/prepare-bazel-ci/action.yml` and
`.github/workflows/bazel.yml` with Ruby's YAML parser.
- `actionlint` is not installed in this workspace, so I could not run a
GitHub Actions semantic lint locally.
## Why
Unused imports in `core/tests/suite/unified_exec.rs` in the Windows
build were not caught by Bazel CI on
https://github.com/openai/codex/pull/18096. I spot-checked
https://github.com/openai/codex/actions/workflows/rust-ci-full.yml?query=branch%3Amain
and noticed that builds were consistently red. This revealed that our
Cargo builds _were_ properly catching these issues, identifying a
Windows-specific coverage hole in the Bazel clippy job.
The Windows Bazel clippy job uses `--skip_incompatible_explicit_targets`
so it can lint a broad target set without failing immediately on targets
that are genuinely incompatible with Windows. However, with the default
Windows host platform, `rust_test` targets such as
`//codex-rs/core:core-all-test` could be skipped before the clippy
aspect reached their integration-test modules. As a result, the imports
in `core/tests/suite/unified_exec.rs` were not being linted by the
Windows Bazel clippy job at all.
The clippy diagnostic that Windows Bazel should have surfaced was:
```text
error: unused import: `codex_config::Constrained`
--> core\tests\suite\unified_exec.rs:8:5
|
8 | use codex_config::Constrained;
| ^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D unused-imports` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unused_imports)]`
error: unused import: `codex_protocol::permissions::FileSystemAccessMode`
--> core\tests\suite\unified_exec.rs:11:5
|
11 | use codex_protocol::permissions::FileSystemAccessMode;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: unused import: `codex_protocol::permissions::FileSystemPath`
--> core\tests\suite\unified_exec.rs:12:5
|
12 | use codex_protocol::permissions::FileSystemPath;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: unused import: `codex_protocol::permissions::FileSystemSandboxEntry`
--> core\tests\suite\unified_exec.rs:13:5
|
13 | use codex_protocol::permissions::FileSystemSandboxEntry;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: unused import: `codex_protocol::permissions::FileSystemSandboxPolicy`
--> core\tests\suite\unified_exec.rs:14:5
|
14 | use codex_protocol::permissions::FileSystemSandboxPolicy;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
## What changed
- Run the Windows Bazel clippy job with the MSVC host platform via
`--windows-msvc-host-platform`, matching the Windows Bazel test job.
This keeps `--skip_incompatible_explicit_targets` while ensuring Windows
`rust_test` targets such as `//codex-rs/core:core-all-test` are still
linted.
- Remove the unused imports from `core/tests/suite/unified_exec.rs`.
- Add `--print-failed-action-summary` to
`.github/scripts/run-bazel-ci.sh` so Bazel action failures can be
summarized after the build exits.
## Failure reporting
Once the coverage issue was fixed, an intentionally reintroduced unused
import made the Windows Bazel clippy job fail as expected. That exposed
a separate usability problem: because the job keeps `--keep_going`, the
top-level Bazel output could still end with:
```text
ERROR: Build did NOT complete successfully
FAILED:
```
without the underlying rustc/clippy diagnostic being visible in the
obvious part of the GitHub Actions log.
To keep `--keep_going` while making failures actionable, the wrapper now
scans the captured Bazel console output for failed actions and prints
the matching rustc/clippy diagnostic block. When a diagnostic block is
found, it is emitted both as a GitHub `::error` annotation and as plain
expanded log output, rather than being hidden in a collapsed group.
## Verification
To validate the CI path, I intentionally introduced an unused import in
`core/tests/suite/unified_exec.rs`. The Windows Bazel clippy job failed
as expected, confirming that the integration-test module is now covered
by Bazel clippy. The same failure also verified that the wrapper
surfaces the matching clippy diagnostics directly in the Actions output.
## Summary
- Normalize deferred MCP and dynamic tools into `ToolSearchEntry` values
before constructing `ToolSearchHandler`.
- Move the tool-search entry adapter out of `tools/handlers` and into
`tools/tool_search_entry.rs` so the handlers directory stays focused on
handlers.
- Keep `ToolSearchHandler` operating over one generic entry list for
BM25 search, namespace grouping, and per-bucket default limits.
## Why
Follow-up cleanup for #17849. The dynamic tool-search support made the
handler juggle source-specific MCP and dynamic tool lists, index
arithmetic, output conversion, and namespace emission. This keeps source
adaptation outside the handler so the search loop itself is smaller and
source-agnostic.
## Validation
- `just fmt`
- `cargo test -p codex-core tools::handlers::tool_search::tests`
- `git diff --check`
- `cargo test -p codex-core` currently fails in unrelated
`plugins::manager::tests::list_marketplaces_ignores_installed_roots_missing_from_config`;
rerunning that single test fails the same way at
`core/src/plugins/manager_tests.rs:1692`.
---------
Co-authored-by: pash <pash@openai.com>
Summary
- replace the thread/read persisted-load helper with
ThreadStore::read_thread
- move SQLite/rollout summary, name, fork metadata, and history loading
for persisted reads into LocalThreadStore
- leave getConversationSummary unchanged for a later PR
Context
- Replaces closed stacked PR #18232 after PR #18231 merged and its base
branch was deleted.
## TL;DR
- Adds a second Plan Mode handoff: implement the approved plan after
clearing context.
- Keeps the existing same-thread `Yes, implement this plan` action
unchanged.
- Reuses the `/clear` thread-start path and submits the approved plan as
the fresh thread's first prompt.
- Covers the new popup option, event plumbing, initial-message behavior,
and disabled states in TUI tests.
## Problem
Plan Mode already asks whether to implement an approved plan, but the
only affirmative path continues in the same thread. That is useful when
the planning conversation itself is still valuable, but it does not
support the workflow where exploratory planning context is discarded and
implementation starts from the final approved plan as the only
model-visible handoff.
<img width="1253" height="869" alt="image"
src="https://github.com/user-attachments/assets/90023d75-c330-4919-bed8-518671c3474b"
/>
## Mental model
There are now two implementation choices after a proposed plan. The
existing choice, `Yes, implement this plan`, is unchanged: it switches
to Default mode and submits `Implement the plan.` in the current thread.
The new choice, `Yes, clear context and implement`, treats the proposed
plan as a handoff artifact. It clears the UI/session context through the
same thread-start source used by `/clear`, then submits an initial
prompt containing the approved plan after the fresh thread is
configured.
The important distinction is that the new path is not compaction. The
model receives a deliberate implementation prompt built from the
approved plan markdown, not a summary of the previous planning
transcript. Both implementation choices require the Default
collaboration preset to be available, so the popup does not offer a
coding handoff when the fresh thread would fall back to another mode.
## Non-goals
This change does not alter `/clear`, `/compact`, or the existing
same-context Plan Mode implementation option. It does not add protocol
surface area or app-server schema changes. It also does not carry the
previous transcript path or a generated planning summary into the new
model context.
## Tradeoffs
The fresh-context option relies on the approved plan being sufficiently
complete. That matches the Plan Mode contract, but it means vague plans
will produce weaker implementation starts than a compacted transcript
would. The upside is that rejected ideas, exploratory dead ends, and
planning corrections do not leak into the implementation turn.
The current implementation stores the latest proposed plan in
`ChatWidget` rather than deriving it from history cells at selection
time. This keeps the popup action simple and deterministic, but it makes
the cache lifecycle important: it must be reset when a new task starts
so an old plan cannot be submitted later.
## Architecture
The TUI stores the most recent completed proposed-plan markdown when a
plan item completes. The Plan Mode approval popup uses that cache to
enable the fresh-context option and to build a first-turn prompt that
instructs the model to implement the approved plan in a fresh context.
Selecting the new option emits a TUI-internal
`ClearUiAndSubmitUserMessage` event. `App` handles that event by reusing
the existing clear flow: clear terminal state, reset app UI state, start
a new app-server thread with `ThreadStartSource::Clear`, and attach a
replacement `ChatWidget` with an initial user message. The existing
initial-message suppression in `enqueue_primary_thread_session` ensures
the prompt is submitted only after the new session is configured and any
startup replay is rendered.
## Observability
The previous thread remains resumable through the existing clear-session
summary hint. There is no new telemetry or protocol event for this path,
so debugging should start at the TUI event boundary: confirm the popup
emitted `ClearUiAndSubmitUserMessage`, confirm the app-server thread
start used `ThreadStartSource::Clear`, then confirm the fresh widget
submitted the initial user message after `SessionConfigured`.
## Tests
The Plan Mode popup snapshots cover the new option and preserve the
original option as the first/default action. Unit coverage verifies the
original same-context option still emits `SubmitUserMessageWithMode`,
the new option emits `ClearUiAndSubmitUserMessage` with the approved
plan embedded verbatim, and the clear-context option is disabled when
Default mode is unavailable or no approved plan exists. The broader
`codex-tui` test package passes with the updated fresh-thread
initial-message plumbing.
Rename `no_memories_if_mcp_or_web_search` →
`disable_on_external_context` with backward compatibility
While doing so, we add a key alias system on our layer merging system.
What we try to avoid is a case where a company managed config use an old
name while the user has a new name in it's local config (which would
make the deserialization fail)
## Why
`origin/main` picked up two changes that crossed in flight:
- #18209 refactored config loading to read through `ExecutorFileSystem`,
changing `load_requirements_toml` to take a filesystem handle and an
`AbsolutePathBuf`.
- #17740 added managed `deny_read` requirements tests that still called
`load_requirements_toml` with the previous two-argument signature.
Once both landed, `just clippy` failed because the new tests no longer
matched the current helper API.
## What
- Updates the two managed `deny_read` requirements tests to convert the
fixture path to `AbsolutePathBuf` before loading.
- Passes `LOCAL_FS.as_ref()` into `load_requirements_toml` so these
tests follow the filesystem abstraction introduced by #18209.
## Verification
- `just clippy`
- `cargo test -p codex-core load_requirements_toml_resolves_deny_read`
- `cargo test -p codex-core --test all
unified_exec_enforces_glob_deny_read_policy`
## Summary
- adds managed requirements support for deny-read filesystem entries
- constrains config layers so managed deny-read requirements cannot be
widened by user-controlled config
- surfaces managed deny-read requirements through debug/config plumbing
This PR lets managed requirements inject deny-read filesystem
constraints into the effective filesystem sandbox policy.
User-controlled config can still choose the surrounding permission
profile, but it cannot remove or weaken the managed deny-read entries.
## Managed deny-read shape
A managed requirements file can declare exact paths and glob patterns
under `[permissions.filesystem]`:
```toml
# /etc/codex/requirements.toml
[permissions.filesystem]
deny_read = [
"/Users/alice/.gitconfig",
"/Users/alice/.ssh",
"./managed-private/**/*.env",
]
```
Those entries are compiled into the effective filesystem policy as
`access = none` rules, equivalent in shape to filesystem permission
entries like:
```toml
[permissions.workspace.filesystem]
"/Users/alice/.gitconfig" = "none"
"/Users/alice/.ssh" = "none"
"/absolute/path/to/managed-private/**/*.env" = "none"
```
The important difference is that the managed entries come from
requirements, so lower-precedence user config cannot remove them or make
those paths readable again.
Relative managed `deny_read` entries are resolved relative to the
directory containing the managed requirements file. Glob entries keep
their glob suffix after the non-glob prefix is normalized.
## Runtime behavior
- Managed `deny_read` entries are appended to the effective
`FileSystemSandboxPolicy` after the selected permission profile is
resolved.
- Exact paths become `FileSystemPath::Path { access: None }`; glob
patterns become `FileSystemPath::GlobPattern { access: None }`.
- When managed deny-read entries are present, `sandbox_mode` is
constrained to `read-only` or `workspace-write`; `danger-full-access`
and `external-sandbox` cannot silently bypass the managed read-deny
policy.
- On Windows, the managed deny-read policy is enforced for direct file
tools, but shell subprocess reads are not sandboxed yet, so startup
emits a warning for that platform.
- `/debug-config` shows the effective managed requirement as
`permissions.filesystem.deny_read` with its source.
## Stack
1. #15979 - glob deny-read policy/config/direct-tool support
2. #18096 - macOS and Linux sandbox enforcement
3. This PR - managed deny-read requirements
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Fixes#18160.
iTerm2 can append the current foreground process to tab titles, and
Codex's terminal-title updates were causing that decoration to appear as
`(codex")` with a stray trailing quote. Codex was writing OSC 0 title
sequences terminated with ST (`ESC \`). Some terminal title integrations
appear to accept that title update but still expose the ST terminator in
their own process/title decoration.
## Changes
- Update `codex-rs/tui/src/terminal_title.rs` to terminate OSC 0 title
updates with BEL instead of ST.
- Update the focused terminal-title encoding test to assert the
BEL-terminated sequence.
## Compatibility
This should be low risk: the title payload and update timing are
unchanged, and BEL is the form already emitted by
`crossterm::terminal::SetTitle` in the crossterm version used by this
repository. BEL is also the widely supported xterm-family title
terminator used by common terminals and multiplexers. The main
theoretical risk would be a very old or unusual terminal that accepted
only ST and not BEL for OSC title termination, but that is unlikely
compared with the observed iTerm2 issue.
## Verification
- `cargo test -p codex-tui terminal_title`
- `cargo test -p codex-tui`
Fixes#18179.
## Why
The fullscreen `/resume` picker accepted Up/Down navigation but ignored
Ctrl+P/Ctrl+N, which made it inconsistent with other TUI selection flows
such as `ListSelectionView`-backed pickers and composer navigation.
## What Changed
Updated `codex-rs/tui/src/resume_picker.rs` so the resume picker treats
Ctrl+P/Ctrl+N as aliases for Up/Down, including the raw `^P`/`^N`
control-character events some terminals emit without a CONTROL modifier.
## Why
We need `PermissionRequest` hook support!
Also addresses:
- https://github.com/openai/codex/issues/16301
- run a script on Hook to do things like play a sound to draw attention
but actually no-op so user can still approve
- can omit the `decision` object from output or just have the script
exit 0 and print nothing
- https://github.com/openai/codex/issues/15311
- let the script approve/deny on its own
- external UI what will run on Hook and relay decision back to codex
## Reviewer Note
There's a lot of plumbing for the new hook, key files to review are:
- New hook added in `codex-rs/hooks/src/events/permission_request.rs`
- Wiring for network approvals
`codex-rs/core/src/tools/network_approval.rs`
- Wiring for tool orchestrator `codex-rs/core/src/tools/orchestrator.rs`
- Wiring for execve
`codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`
## What
- Wires shell, unified exec, and network approval prompts into the
`PermissionRequest` hook flow.
- Lets hooks allow or deny approval prompts; quiet or invalid hooks fall
back to the normal approval path.
- Uses `tool_input.description` for user-facing context when it helps:
- shell / `exec_command`: the request justification, when present
- network approvals: `network-access <domain>`
- Uses `tool_name: Bash` for shell, unified exec, and network approval
permission-request hooks.
- For network approvals, passes the originating command in
`tool_input.command` when there is a single owning call; otherwise falls
back to the synthetic `network-access ...` command.
<details>
<summary>Example `PermissionRequest` hook input for a shell
approval</summary>
```json
{
"session_id": "<session-id>",
"turn_id": "<turn-id>",
"transcript_path": "/path/to/transcript.jsonl",
"cwd": "/path/to/cwd",
"hook_event_name": "PermissionRequest",
"model": "gpt-5",
"permission_mode": "default",
"tool_name": "Bash",
"tool_input": {
"command": "rm -f /tmp/example"
}
}
```
</details>
<details>
<summary>Example `PermissionRequest` hook input for an escalated
`exec_command` request</summary>
```json
{
"session_id": "<session-id>",
"turn_id": "<turn-id>",
"transcript_path": "/path/to/transcript.jsonl",
"cwd": "/path/to/cwd",
"hook_event_name": "PermissionRequest",
"model": "gpt-5",
"permission_mode": "default",
"tool_name": "Bash",
"tool_input": {
"command": "cp /tmp/source.json /Users/alice/export/source.json",
"description": "Need to copy a generated file outside the workspace"
}
}
```
</details>
<details>
<summary>Example `PermissionRequest` hook input for a network
approval</summary>
```json
{
"session_id": "<session-id>",
"turn_id": "<turn-id>",
"transcript_path": "/path/to/transcript.jsonl",
"cwd": "/path/to/cwd",
"hook_event_name": "PermissionRequest",
"model": "gpt-5",
"permission_mode": "default",
"tool_name": "Bash",
"tool_input": {
"command": "curl http://codex-network-test.invalid",
"description": "network-access http://codex-network-test.invalid"
}
}
```
</details>
## Follow-ups
- Implement the `PermissionRequest` semantics for `updatedInput`,
`updatedPermissions`, `interrupt`, and suggestions /
`permission_suggestions`
- Add `PermissionRequest` support for the `request_permissions` tool
path
---------
Co-authored-by: Codex <noreply@openai.com>
add new `tool_search_always_defer_mcp_tools` feature flag that always
defers all mcp tools rather than deferring once > 100 deferrable tools.
add new tests, also move `mcp_exposure` tests into dedicated file rather
than polluting `codex_tests`.
… import
## Why
`externalAgentConfig/import` used to spawn plugin imports in the
background and return immediately. That meant local marketplace imports
could still be in flight when the caller refreshed plugin state, so
newly imported plugins would not show up right away.
This change makes local marketplace imports complete before the RPC
returns, while keeping remote marketplace imports asynchronous so we do
not block on remote fetches.
## What changed
- split plugin migration details into local and remote marketplace
imports based on the external config source
- import local marketplaces synchronously during
`externalAgentConfig/import`
- return pending remote plugin imports to the app-server so it can
finish them in the background
- clear the plugin and skills caches before responding to plugin
imports, and again after background remote imports complete, so the next
`plugin/list` reloads fresh state
- keep marketplace source parsing encapsulated behind
`is_local_marketplace_source(...)` instead of re-exporting the internal
enum
- add core and app-server coverage for the synchronous local import path
and the pending remote import path
## Verification
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core` (currently fails an existing unrelated
test:
`config_loader::tests::cli_override_can_update_project_local_mcp_server_when_project_is_trusted`)
- `cargo test` (currently fails existing `codex-app-server` integration
tests in MCP/skills/thread-start areas, plus the unrelated `codex-core`
failure above)
## Summary
This fixes a Windows-only failure in the exec policy multi-segment shell
test. The test was meant to verify that a compound shell command only
bypasses sandboxing when every parsed segment has an explicit exec
policy allow rule.
On Windows, the read-only sandbox setup is intentionally treated as
lacking sandbox protection, so the old fixture could take the approval
path before reaching the intended bypass assertion. The test now uses
the workspace-write sandbox policy, keeping the focus on the per-segment
bypass rule while preserving the expected bypass_sandbox false result
when only cat is explicitly allowed.
## Summary
This changes Codex logout so managed ChatGPT auth is revoked against
AuthAPI before local auth state is removed. CLI logout, TUI `/logout`,
and the app-server account logout path now use the token-revoking logout
flow instead of only deleting `auth.json` / credential store state.
## Root Cause
Logout previously cleared only local auth storage. That removed Codex's
local credentials but did not ask the backend to invalidate the
refresh/access token state associated with a managed ChatGPT login.
## Behavior
For managed ChatGPT auth, logout sends the stored refresh token to
`https://auth.openai.com/oauth/revoke` with `token_type_hint:
refresh_token` and the Codex OAuth client id, then deletes all local
auth stores after revocation succeeds. If only an access token is
available, it falls back to revoking that access token. API key auth and
externally supplied `chatgptAuthTokens` are still only cleared locally
because Codex does not own a refresh token for those modes.
Revocation failures are fail-closed: if Codex cannot load stored auth or
the backend revoke call fails, logout returns an error and leaves local
auth in place so the user can retry instead of silently clearing local
state while backend tokens remain valid.
## Validation
ran local version of `codex-cli` with staging overrides/harness for auth
ran `codex login` then `codex logout`:
saw auth.json clear and backend revocation endpoints were called
```
POST /oauth/revoke
status: 200
revoking access token
should clear auth session
clearing auth session due to token revocation
successfully revoked session and access token
CANONICAL-API-LINE Response: status='200' method='POST' path='/oauth/revoke
```
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- refactor thread/read into explicit persisted-load, live-load, and
merge steps
- preserve existing SQLite/filesystem/live-thread behavior exactly
- keep ThreadStore migration out of this PR so the next PR is easier to
review
Validation
- this one's a pure reorganization that relies on existing test coverage
## Summary
Move the Computer Use tool suggestion into core Codex plugin discovery.
Also search `openai-bundled` when listing suggested plugins, with test
coverage for overlap between baked-in suggestions and
`tool_suggest.discoverables`.
## Test plan
Tested locally:
- `cargo test -p codex-core list_tool_suggest_discoverable_plugins`
Load plugin manifests through a shared discoverable-path helper so
manifest reads, installs, and skill names all see the same alternate
manifest location.
## Summary
- Add `codex-model-provider` as the runtime home for model-provider
behavior that does not belong in `codex-core`, `codex-login`, or
`codex-api`.
- The new crate wraps configured `ModelProviderInfo` in a
`ModelProvider` trait object that can resolve the API provider config,
provider-scoped auth manager, and request auth provider for each call.
- This centralizes provider auth behavior in one place today, and gives
us an extension point for future provider-specific auth, model listing,
request setup, and related runtime behavior.
## Tests
Ran tests manually to make sure that provider auth under different
configs still work as expected.
---------
Co-authored-by: pakrym-oai <pakrym@openai.com>
Adds new events for streaming apply_patch changes from responses api.
This is to enable clients to show progress during file writes.
Caveat: This does not work with apply_patch in function call mode, since
that required adding streaming json parsing.
## Summary
- mark `features.use_legacy_landlock` as a deprecated feature flag
- emit a startup deprecation notice when the flag is configured
- add feature- and core-level regression coverage for the notice
<img width="1288" height="93" alt="Screenshot 2026-04-15 at 11 14 00 PM"
src="https://github.com/user-attachments/assets/fffc628b-614c-4521-9374-64e50a269252"
/>
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- adds macOS Seatbelt deny rules for unreadable glob patterns
- expands unreadable glob matches on Linux and masks them in bwrap,
including canonical symlink targets
- keeps Linux glob expansion robust when `rg` is unavailable in minimal
or Bazel test environments
- adds sandbox integration coverage that runs `shell` and `exec_command`
with a `**/*.env = none` policy and verifies the secret contents do not
reach the model
## Linux glob expansion
```text
Prefer: rg --files --hidden --no-ignore --glob <pattern> -- <search-root>
Fallback: internal globset walker when rg is not installed
Failure: any other rg failure aborts sandbox construction
```
```
[permissions.workspace.filesystem]
glob_scan_max_depth = 2
[permissions.workspace.filesystem.":project_roots"]
"**/*.env" = "none"
```
This keeps the common path fast without making sandbox construction
depend on an ambient `rg` binary. If `rg` is present but fails for
another reason, the sandbox setup fails closed instead of silently
omitting deny-read masks.
## Platform support
- macOS: subprocess sandbox enforcement is handled by Seatbelt regex
deny rules
- Linux: subprocess sandbox enforcement is handled by expanding existing
glob matches and masking them in bwrap
- Windows: policy/config/direct-tool glob support is already on `main`
from #15979; Windows subprocess sandbox paths continue to fail closed
when unreadable split filesystem carveouts require runtime enforcement,
rather than silently running unsandboxed
## Stack
1. #15979 - merged: cross-platform glob deny-read
policy/config/direct-tool support for macOS, Linux, and Windows
2. This PR - macOS/Linux subprocess sandbox enforcement plus Windows
fail-closed clarification
3. #17740 - managed deny-read requirements
## Verification
- Added integration coverage for `shell` and `exec_command` glob
deny-read enforcement
- `cargo check -p codex-sandboxing -p codex-linux-sandbox --tests`
- `cargo check -p codex-core --test all`
- `cargo clippy -p codex-linux-sandbox -p codex-sandboxing --tests`
- `just bazel-lock-check`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Switch the unknown-thread MCP resource read test from the stdio
subprocess to the in-process app-server path.
- Keep the assertion focused on the returned error message while
avoiding child-process teardown timing issues in nextest.
## Testing
- Not run (not requested)
## Summary
This is the minimal client-side follow-up for the Codex Auto Review
model slug rollout. It updates the guardian reviewer preferred model
from `gpt-5.4` to `codex-auto-review`, so the client can rely on the
backend catalog + Statsig mapping instead of hardcoding the GPT-5.4
slug.
Context:
https://openai.slack.com/archives/C0AF9328RL0/p1775777479388369?thread_ts=1775773094.071629&cid=C0AF9328RL0
## Testing
- `cargo fmt --package codex-core --check`
- `cargo test -p codex-core guardian::`
- `bazel test --experimental_remote_downloader= --test_output=errors
//codex-rs/core:core-unit-tests --test_arg=guardian`
This PR adds shared bottom-pane selection-list for future `/plugins`
menu work and wires the existing `/plugins` menu into the new
list-rendering path without changing it to tabs yet. The main
user-visible effect is that the current plugin list now renders as a
denser single-line list with shared name-column sizing, while the tabbed
selection support remains available for follow-up PRs but is currently
unused in production menus.
- Add generic tabbed selection-list support to the bottom pane,
including per-tab headers/items and tab-aware list state
- Add single-line row rendering with ellipsis truncation for dense list
UIs
- Add shared name-column width support so descriptions align
consistently across rows
- Wire the current /plugins menu to the new single-line and shared
column-width behavior only
- Keep tabbed menu adoption deferred; no existing menu is switched to
tabs in this PR
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- set the realtime v2 server VAD silence delay to 500ms
- update the default realtime 1.5 backend prompt to the v4 text
- keep the session payload and prompt rendering tests aligned with those
changes
## Why
- the VAD change gives the voice path a longer pause before ending the
user's turn
- the prompt change makes the default bundled realtime prompt match the
current v4 content
## Validation
- `cargo +1.93.0 test -p codex-core realtime_prompt --manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-api
realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item
--manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-app-server --test all
'suite::v2::realtime_conversation::realtime_webrtc_start_emits_sdp_notification'
--manifest-path /tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml
-- --exact`
# Why
We already emit analytics for completed hook runs, but we don't have
matching OTEL metrics to track hook volume and latency.
# What
- add `codex.hooks.run` and `codex.hooks.run.duration_ms`
- tag both metrics with `hook_name`, `source`, and `status`
- emit the metrics from the completed hook path
Verified locally against a dummy OTLP collector
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Stack PR3 for feature-gated agent identity support.
This PR adds per-thread agent task registration behind
`features.use_agent_identity`. Tasks are minted on the first real user
turn and cached in thread runtime state for later turns.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - this PR, original
task registration slice
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: https://github.com/openai/codex/pull/17980 - use `AgentAssertion`
downstream when enabled
## Validation
Covered as part of the local stack validation pass:
- `just fmt`
- `cargo test -p codex-core --lib agent_identity`
- `cargo test -p codex-core --lib agent_assertion`
- `cargo test -p codex-core --lib websocket_agent_task`
- `cargo test -p codex-api api_bridge`
- `cargo build -p codex-cli --bin codex`
## Notes
The full local app-server E2E path is still being debugged after PR
creation. The current branch stack is directionally ready for review
while that follow-up continues.
## Summary
- cap the Windows Bazel test lane at `--jobs=8` to reduce local runner
pressure
- keep Linux and macOS Bazel test concurrency unchanged
- make failed-test log tailing resolve `bazel-testlogs` with the same CI
config and Windows host-platform context as the failed invocation
- prefer Bazel-reported `test.log` paths and normalize Windows path
separators before tailing
## Context
The Windows Bazel workflow currently uses `ci-windows`, which does not
inherit the remote executor config. This means the lane runs the `//...`
test suite locally and otherwise falls back to the repo-wide `common
--jobs=30`. The new Windows-only override is intended to reduce local
executor pressure without changing coverage.
## Validation
Not run locally; this is a CI workflow change and the draft PR is
intended to exercise the GitHub Actions lane directly.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- remove the final whole-blob truncation pass from realtime
startup-context assembly
- enforce fixed per-section budgets, including each section heading
- keep the existing per-section caps and raise the overall realtime
startup-context budget to `5300`, matching the sum of those section
budgets
- add focused tests for the new wrapping and section-budget behavior
## Why
The previous flow truncated each section and then middle-truncated the
final combined startup-context blob again. Small input changes could
shift that combined cut point, which made retained context unstable and
caused nondeterministic tests.
## Impact
Startup context now preserves section boundaries and ordering
deterministically. Each section is still budgeted independently, but the
final assembled blob is no longer truncated again as a single opaque
string. To match that design, the overall startup-context token budget
is updated to the sum of the existing section budgets rather than
lowering the section caps.
## Validation
- `cargo +1.93.0 test -p codex-core realtime_context`
- `cargo +1.93.0 test -p codex-core --test all
suite::realtime_conversation::conversation_start_injects_startup_context_from_thread_history
-- --exact`
- `cargo +1.93.0 test -p codex-core --test all
suite::realtime_conversation::conversation_startup_context_current_thread_selects_many_turns_by_budget
-- --exact`
- `cargo +1.93.0 test -p codex-core --test all
suite::realtime_conversation::conversation_startup_context_falls_back_to_workspace_map
-- --exact`
- `cargo +1.93.0 test -p codex-core --test all
suite::realtime_conversation::conversation_startup_context_is_truncated_and_sent_once_per_start
-- --exact`
## Problem
When a user resumed or forked a session, the TUI could render the
restored thread history immediately, but it did not receive token usage
until a later model turn emitted a fresh usage event. That left the
context/status UI blank or stale during the exact window where the user
expects resumed state to look complete. Core already reconstructed token
usage from the rollout; the missing behavior was app-server lifecycle
replay to the client that just attached.
## Mental model
Token usage has two representations. The rollout is the durable source
of historical `TokenCount` events, and the core session cache is the
in-memory snapshot reconstructed from that rollout on resume or fork.
App-server v2 clients do not read core state directly; they learn about
usage through `thread/tokenUsage/updated`. The fix keeps those roles
separate: core exposes the restored `TokenUsageInfo`, and app-server
sends one targeted notification after a successful `thread/resume` or
`thread/fork` response when that restored snapshot exists.
This notification is not a new model event. It is a replay of
already-persisted state for the client that just attached. That
distinction matters because using the normal core event path here would
risk duplicating `TokenCount` entries in the rollout and making future
resumes count historical usage twice.
## Non-goals
This change does not add a new protocol method or payload shape. It
reuses the existing v2 `thread/tokenUsage/updated` notification and the
TUI’s existing handler for that notification.
This change does not alter how token usage is computed, accumulated,
compacted, or written during turns. It only exposes the token usage that
resume and fork reconstruction already restored.
This change does not broadcast historical usage replay to every
subscribed client. The replay is intentionally scoped to the connection
that requested resume or fork so already-attached clients are not
surprised by an old usage update while they may be rendering live
activity.
## Tradeoffs
Sending the usage notification after the JSON-RPC response preserves a
clear lifecycle order: the client first receives the thread object, then
receives restored usage for that thread. The tradeoff is that usage is
still a notification rather than part of the `thread/resume` or
`thread/fork` response. That keeps the protocol shape stable and avoids
duplicating usage fields across response types, but clients must
continue listening for notifications after receiving the response.
The helper selects the latest non-in-progress turn id for the replayed
usage notification. This is conservative because restored usage belongs
to completed persisted accounting, not to newly attached in-flight work.
The fallback to the last turn preserves a stable wire payload for
unusual histories, but histories with no meaningful completed turn still
have a weak attribution story.
## Architecture
Core already seeds `Session` token state from the last persisted rollout
`TokenCount` during `InitialHistory::Resumed` and
`InitialHistory::Forked`. The new core accessor exposes the complete
`TokenUsageInfo` through `CodexThread` without giving app-server direct
session mutation authority.
App-server calls that accessor from three lifecycle paths: cold
`thread/resume`, running-thread resume/rejoin, and `thread/fork`. In
each path, the server sends the normal response first, then calls a
shared helper that converts core usage into
`ThreadTokenUsageUpdatedNotification` and sends it only to the
requesting connection.
The tests build fake rollouts with a user turn plus a persisted token
usage event. They then exercise `thread/resume` and `thread/fork`
without starting another model turn, proving that restored usage arrives
before any next-turn token event could be produced.
## Observability
The primary debug path is the app-server JSON-RPC stream. After
`thread/resume` or `thread/fork`, a client should see the response
followed by `thread/tokenUsage/updated` when the source rollout includes
token usage. If the notification is absent, check whether the rollout
contains an `event_msg` payload of type `token_count`, whether core
reconstruction seeded `Session::token_usage_info`, and whether the
connection stayed attached long enough to receive the targeted
notification.
The notification is sent through the existing
`OutgoingMessageSender::send_server_notification_to_connections` path,
so existing app-server tracing around server notifications still
applies. Because this is a replay, not a model turn event, debugging
should start at the resume/fork handlers rather than the turn event
translation in `bespoke_event_handling`.
## Tests
The focused regression coverage is `cargo test -p codex-app-server
emits_restored_token_usage`, which covers both resume and fork. The core
reconstruction guard is `cargo test -p codex-core
record_initial_history_seeds_token_info_from_rollout`.
Formatting and lint/fix passes were run with `just fmt`, `just fix -p
codex-core`, and `just fix -p codex-app-server`. Full crate test runs
surfaced pre-existing unrelated failures in command execution and plugin
marketplace tests; the new token usage tests passed in focused runs and
within the app-server suite before the unrelated command execution
failure.
I believe this use of `expect()` was introduced in
https://github.com/openai/codex/pull/17826, but was not flagged by CI.
Though I did see it in the diagnostics panel in VS Code, so it's worth
cleaning up.
I guess our current CI does include `examples/` when running Clippy?
# Why
Add product analytics for hook handler executions so we can understand
which hooks are running, where they came from, and whether they
completed, failed, stopped, or blocked work.
# What
- add the new `codex_hook_run` analytics event and payload plumbing in
`codex-rs/analytics`
- emit hook-run analytics from the shared hook completion path in
`codex-rs/core`
- classify hook source from the loaded hook path as `system`, `user`,
`project`, or `unknown`
```
{
"event_type": "codex_hook_run",
"event_params": {
"thread_id": "string",
"turn_id": "string",
"model_slug": "string",
"hook_name": "string, // any HookEventName
"hook_source": "system | user | project | unknown",
"status": "completed | failed | stopped | blocked"
}
}
```
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- replace the unsubscribe-during-turn test's sleep/polling flow with a
gated streaming SSE response
- add request-count notification support to the streaming SSE test
server so the test can wait for the in-flight Responses request
deterministically
## Scope
- codex-rs/app-server/tests/suite/v2/thread_unsubscribe.rs
- codex-rs/core/tests/common/streaming_sse.rs
## Validation
- Not run locally; this is a narrow extraction from the prior CI-green
branch.
---------
Co-authored-by: Codex <noreply@openai.com>
This was flagged by the Codex Security tool: the `state` lock was held
longer than necessary, which included being held across an `async` call,
increasing the potential for deadlock.
While this was flagged by the Codex Security tool, I will look into
enabling
https://rust-lang.github.io/rust-clippy/stable/index.html#await_holding_lock
in a follow-up PR (though unfortunately, that Clippy rule claims it
reports false positives when `drop()` is used to drop a guard instead of
using the end of block scope to drop). Though I can't seem to find a
Clippy rule that checks for opportunities to drop a guard as soon as it
is no longer referenced, in general.
## Summary
- Add best-effort auto-upgrade for user-configured Git marketplaces
recorded in `config.toml`.
- Track the last activated Git revision with `last_revision` so
unchanged marketplace sources skip clone work.
- Trigger the upgrade from plugin startup and `plugin/list`, while
preserving existing fail-open plugin behavior with warning logs rather
than new user-visible errors.
## Details
- Remote configured marketplaces use `git ls-remote` to compare the
source/ref against the recorded revision.
- Upgrades clone into a staging directory, validate that
`.agents/plugins/marketplace.json` exists and that the manifest name
matches the configured marketplace key, then atomically activate the new
root.
- Local `.agents/plugins/marketplace.json` marketplaces remain live
filesystem state and are not auto-pulled.
- Existing non-curated plugin cache refresh is kicked after successful
marketplace root upgrades.
## Validation
- `just write-config-schema`
- `cargo test -p codex-core marketplace_upgrade`
- `cargo check -p codex-cli -p codex-app-server`
- `just fix -p codex-core`
Did not run the complete `cargo test` suite because the repo
instructions require asking before a full core workspace run.
Addresses #17951
Problem: The TUI treated skills/list failures as fatal during refresh,
so proxy/firewall responses that break plugin discovery could crash the
session.
Solution: Route startup and refresh skills/list responses through shared
graceful handling that logs a warning and keeps the TUI running.
## Summary
- Add an explicit stdin mode to process/start.
- Keep normal non-interactive exec stdin closed while allowing
pipe-backed processes.
## Stack
```text
o #18027 [8/8] Fail exec client operations after disconnect
│
o #18025 [7/8] Cover MCP stdio tests with executor placement
│
o #18089 [6/8] Wire remote MCP stdio through executor
│
o #18088 [5/8] Add executor process transport for MCP stdio
│
o #18087 [4/8] Abstract MCP stdio server launching
│
o #18020 [3/8] Add pushed exec process events
│
@ #18086 [2/8] Support piped stdin in exec process API
│
o #18085 [1/8] Add MCP server environment config
│
o main
```
Co-authored-by: Codex <noreply@openai.com>
- Add a "remote" thread store implementation
- Implement the remote thread store as a thin wrapper that makes grpc
calls to a configurable service endpoint
- Implement only the thread/list method to start
- Encode the grpc method/param shape as protobufs in the remote
implementation
A wart: the proto generation script is an "example" binary target. This
is an example target only because Cargo lets examples use
dev-dependencies, which keeps tonic-prost-build out of the normal
codex-thread-store dependency surface. A regular bin would either need
to add proto generation deps as normal runtime deps, or use a
feature-gated optional dep, which this repo’s manifest checks explicitly
reject.
## Summary
This makes `DangerFullAccess` / yolo tool execution fully opt out of
managed-network enforcement.
Previously, yolo turns could have `turn.network` stripped while tool
orchestration still derived `enforce_managed_network=true` from
`requirements.toml.network`. That created an inconsistent state where
the turn had no managed proxy attached, but tool execution still behaved
like managed networking was active.
This updates the tool orchestration and JS REPL paths to treat managed
networking as active only when the current turn actually has
`turn.network`.
## Behavior
- Yolo / `DangerFullAccess`: no managed proxy, no managed-network
enforcement.
- Guardian / workspace-write with managed proxy: managed-network
enforcement still applies.
- Avoids the half-state where yolo has no proxy but still gets
managed-network sandbox behavior.
## Tests
- `just fmt`
- `cargo test -p codex-core
danger_full_access_tool_attempts_do_not_enforce_managed_network --
--nocapture`
- `cargo test -p codex-core danger_full_access -- --nocapture`
- `just fix -p codex-core`
Co-authored-by: jgershen-oai <jgershen@openai.com>
## Summary
- Promote `image_generation` from under-development to stable
- Enable image generation by default in the feature registry
- Update feature coverage for the new launch-state expectation
- Add the missing image-generation auth fixture field in a tool registry
test
## Testing
- `just fmt`
- `cargo test -p codex-features`
- `cargo test -p codex-tools` currently fails:
`test_full_toolset_specs_for_gpt5_codex_unified_exec_web_search` needs
its expected default tool list updated for `image_generation`
Addresses #18011
Problem: #16987 allowed zero-token TUI exits to print resume hints,
which exposed precomputed thread ids before their rollout files were
persisted; #17222 made the same invalid hint visible when switching
sessions via `/resume`.
Solution: Only include resume commands for TUI sessions backed by a
materialized non-empty rollout, and cover both missing-rollout and
persisted-rollout summary behavior.
Testing: Manually verified by pressing Ctrl+D before the first prompt
and confirming that no "to continue this session" message was generated.
Addresses #12178
Problem: The TUI /rename prompt opened blank even when the current
thread already had a custom name, making small edits awkward.
Solution: Let custom prompts receive initial text and prefill /rename
with the existing thread name while preserving the empty prompt for
unnamed threads.
Testing: Manually verified that the feature works by using `/rename`
with unnamed and already-named threads.
## Summary
- hide deferred MCP/app nested tool descriptions from the `exec` prompt
in code mode
- add short guidance that omitted nested tools are still available
through `ALL_TOOLS`
- cover the code_mode_only path with an integration test that discovers
and calls a deferred app tool
## Motivation
`code_mode_only` exposes only top-level `exec`/`wait`, but the `exec`
description could still include a large nested-tool reference. This
keeps deferred nested tools callable while avoiding that prompt bloat.
## Tests
- `just fmt`
- `just fix -p codex-code-mode`
- `just fix -p codex-tools`
- `cargo test -p codex-code-mode
exec_description_mentions_deferred_nested_tools_when_available`
- `cargo test -p codex-tools
create_code_mode_tool_matches_expected_spec`
- `cargo test -p codex-core
code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`
Addresses #18045
Problem: `/statusline` exposed both `context-remaining` and
`context-remaining-percent` after conflicting PRs attempted to address
the same context-status issue, including #17637, allowing duplicate
footer segments.
Solution: Remove the duplicate `context-remaining-percent` status-line
item and update status-line tests and snapshots to use only canonical
`context-remaining`.
## Summary
- Add an MCP server environment setting with local as the default.
- Thread the default through config serialization, schema generation,
and existing config fixtures.
## Stack
```text
o #18027 [8/8] Fail exec client operations after disconnect
│
o #18025 [7/8] Cover MCP stdio tests with executor placement
│
o #18089 [6/8] Wire remote MCP stdio through executor
│
o #18088 [5/8] Add executor process transport for MCP stdio
│
o #18087 [4/8] Abstract MCP stdio server launching
│
o #18020 [3/8] Add pushed exec process events
│
o #18086 [2/8] Support piped stdin in exec process API
│
@ #18085 [1/8] Add MCP server environment config
│
o main
```
Co-authored-by: Codex <noreply@openai.com>
Fix app-server startup when `remote_control = true` is enabled without
ChatGPT auth.
Remote control now starts in a degraded/retrying state instead of
failing app-server initialization, so Desktop is not stranded before the
initial initialize handshake.
Get rid of an `expect()` that caused a `panic` in the TUI
<img width="1320" height="415" alt="Screenshot 2026-04-16 at 15 30 20"
src="https://github.com/user-attachments/assets/588aaf6f-b009-4b58-8daf-56c3a9d6fe3b"
/>
Basically in `from_absolute_path` there is a `absolutize::absolutize`
that calls a `current_dir()` . But the dir in which Codex was running
got re-generated (because of Codex I guess but I can't exactly see the
source). So `current_dir()` returns an `ENOENT` and 💥
Make sure Bazel logs shows every errors so that we can debug flakes +
fix a small flake on Windows by updating the sleep command to a
`Start-Sleep` instead of a PowerShell nested command (otherwise we had
double nesting which is absurdely slow)
Stabilizes the Responses API proxy header test by splitting the coverage
at the right boundary:
- Core integration test now verifies parent/subagent identity headers
directly from captured `/responses` requests.
- Proxy dump unit test now verifies those identity headers are preserved
in dumped request JSON.
- Removes the flaky real proxy process + temp-file dump polling path
from the core test.
## Summary
- parse chatgpt_account_is_fedramp from signed ChatGPT auth metadata
- add _account_is_fedramp=true to ChatGPT backend-api requests only for
FedRAMP ChatGPT-auth accounts
Addresses https://github.com/openai/codex/issues/17143
Problem: TUI interrupts without an active turn stopped cancelling slow
MCP startup after routing through the app-server APIs.
Solution: Route no-active-turn interrupts through app-server as startup
cancels, acknowledge them immediately, and emit cancelled MCP startup
updates.
Testing: I manually confirmed that MCP cancellation didn't work prior to
this PR and works after the fix was in place.
Split plugin loading, marketplace, and related infrastructure out of
core into codex-core-plugins, while keeping the core-facing
configuration and orchestration flow in codex-core.
---------
Co-authored-by: Codex <noreply@openai.com>
- [x] Add resource uri meta to tool call item so that the app-server
client can start prefetching resources immediately without loading mcp
server status.
## Summary
- Promote `Feature::ToolSearch` to `Stable` and enable it in the default
feature set
- Update feature tests and tool registry coverage to match the new
default
- Adjust the search-tool integration test to assert the default-on path
and explicit disable fallback
## Testing
- `just fmt`
- `cargo test -p codex-features`
- `cargo test -p codex-core --test all search_tool`
- `cargo test -p codex-tools`
- When launching the TUI client, if YOLO mode is enabled, display this
in the header.
- Eligibility is determined by `approval_policy = "never"` and
`sandbox_mode = "danger-full-access"`
<img width="886" height="230" alt="image"
src="https://github.com/user-attachments/assets/d7064778-e32c-4123-8e44-ca0c9016ab09"
/>
## Motivation
Codex needs a repeatable workflow for updating PR metadata after a pull
request already exists. This is more specific than generic GitHub
handling: the assistant needs to preserve author-provided body content,
explain why the PR exists before listing implementation details, and
describe only the net change under review, including when Sapling stacks
put a PR on top of another PR instead of `main`.
## Changes
- Adds `.codex/skills/codex-pr-body/SKILL.md`.
- Documents how to infer the target PR from the current branch or
commit, including Sapling-specific PR metadata and `sl sl` output.
- Defines the expected PR body update behavior: inspect the existing
body, preserve key content such as images, avoid local absolute paths,
use Markdown formatting, include relevant issue/PR references, and call
out developer docs follow-up only when applicable.
- Captures stacked-PR handling so generated PR text describes the change
between the PR's base and head, rather than unrelated ancestor changes.
## Verification
Not run; this is a Codex skill documentation addition.
## Summary
- Make command/exec output-delta tests accumulate streamed chunks
instead of assuming complete logical output in a single notification.
- Collect stdout and stderr independently so stream interleaving does
not fail the pipe streaming test.
## Why
The command/exec protocol exposes output as deltas, so tests should not
rely on chunk boundaries being stable. A line like `out-start\n` may
arrive split across multiple notifications, and stdout/stderr
notifications may interleave.
## Validation
- `just fmt`
- `git diff --check`
- `cargo test -p codex-app-server suite::v2::command_exec`
**Summary**
- prevent managed requirements.toml network settings from leaking into
DangerFullAccess / yolo turns by gating managed proxy attachment on
sandbox mode
- keep guardian/sandboxed modes on the managed proxy path, while making
true yolo bypass the proxy entirely, including /shell full-access
commands
## Summary
- wrap realtime startup context in
`<startup_context>...</startup_context>` tags
- prefix V2 mirrored user text and relayed backend text with `[USER]` /
`[BACKEND]`
- remove the V2 progress suffix and replace the final V2 handoff output
with a short completion acknowledgement while preserving the existing V1
wrapper
## Testing
- cargo test -p codex-api
realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item
-- --exact
- cargo test -p codex-app-server webrtc_v2_background_agent_
- cargo test -p codex-app-server webrtc_v2_text_input_is_
- cargo test -p codex-core conversation_user_text_turn_is_
## Summary
This PR significantly improves the standalone installer experience.
The main changes are:
1. We now install the codex binary and other dependencies in a
subdirectory under CODEX_HOME.
(`CODEX_HOME/packages/standalone/releases/...`)
2. We replace the `codex.js` launcher that npm/bun rely on with logic in
the Rust binary that automatically resolves its dependencies (like
ripgrep)
## Motivation
A few design constraints pushed this work.
1. Currently, the entrypoint to codex is through `codex.js`, which
forces a node dependency to kick off our rust app. We want to move away
from this so that the entrypoint to codex does not rely on node or
external package managers.
2. Right now, the native script adds codex and its dependencies directly
to user PATH. Given that codex is likely to add more binary dependencies
than ripgrep, we want a solution which does not add arbitrary binaries
to user PATH -- the only one we want to add is the `codex` command
itself.
3. We want upgrades to be atomic. We do not want scenarios where
interrupting an upgrade command can move codex into undefined state (for
example, having a new codex binary but an old ripgrep binary). This was
~possible with the old script.
4. Currently, the Rust binary uses heuristics to determine which
installer created it. These heuristics are flaky and are tied to the
`codex.js` launcher. We need a more stable/deterministic way to
determine how the binary was installed for standalone.
5. We do not want conflicting codex installations on PATH. For example,
the user installing via npm, then installing via brew, then installing
via standalone would make it unclear which version of codex is being
launched and make it tough for us to determine the right upgrade
command.
## Design
### Standalone package layout
Standalone installs now live under `CODEX_HOME/packages/standalone`:
```text
$CODEX_HOME/
packages/
standalone/
current -> releases/0.111.0-x86_64-unknown-linux-musl
releases/
0.111.0-x86_64-unknown-linux-musl/
codex
codex-resources/
rg
```
where `standalone/current` is a symlink to a release directory.
On Windows, the release directory has the same shape, with `.exe` names
and Windows helpers in `codex-resources`:
```text
%CODEX_HOME%\
packages\
standalone\
current -> releases\0.111.0-x86_64-pc-windows-msvc
releases\
0.111.0-x86_64-pc-windows-msvc\
codex.exe
codex-resources\
rg.exe
codex-command-runner.exe
codex-windows-sandbox-setup.exe
```
This gives us:
- atomic upgrades because we can fully stage a release before switching
`standalone/current`
- a stable way for the binary to recognize a standalone install from its
canonical `current_exe()` path under CODEX_HOME
- a clean place for binary dependencies like `rg`, Windows sandbox
helpers, and, in the future, our custom `zsh` etc
### Command location
On Unix, we add a symlink at `~/.local/bin/codex` which points directly
to the `$CODEX_HOME/packages/standalone/current/codex` binary. This
becomes the main entrypoint for the CLI.
On Windows, we store the link at
`%LOCALAPPDATA%\Programs\OpenAI\Codex\bin`.
### PATH persistence
This is a tricky part of the PR, as there's no ~super reliable way to
ensure that we end up on PATH without significant tradeoffs.
Most Unix variants will have `~/.local/bin` on PATH already, which means
we *should* be fine simply registering the command there in most cases.
However, there are cases where this is not the case. In these cases, we
directly edit the profile depending on the shell we're in.
- macOS zsh: `~/.zprofile`
- macOS bash: `~/.bash_profile`
- Linux zsh: `~/.zshrc`
- Linux bash: `~/.bashrc`
- fallback: `~/.profile`
On Windows, we update the User `Path` environment variable directly and
we don't need to worry about shell profiles.
### Standalone runtime detection
This PR adds a new shared crate, `codex-install-context`, which computes
install ownership once per process and caches it in a `OnceLock`.
That context includes:
- install manager (`Standalone`, `Npm`, `Bun`, `Brew`, `Other`)
- the managed standalone release directory, when applicable
- the managed standalone `codex-resources` directory, when present
- the resolved `rg_command`
The standalone path is detected by canonicalizing `current_exe()`,
canonicalizing CODEX_HOME via `find_codex_home()`, and checking whether
the binary is running from under
`$CODEX_HOME/packages/standalone/releases`.
We intentionally do not use a release metadata file. The binary path is
the source of truth.
### Dependency resolution
For standalone installs, `grep_files` now resolves bundled `rg` from
`codex-resources` next to the Codex binary.
For npm/bun/brew/other installs, `grep_files` falls back to resolving
`rg` from PATH.
For Windows standalone installs, Windows sandbox helpers are still found
as direct siblings when present. If they are not direct siblings, the
lookup also checks the sibling `codex-resources` directory.
### TUI update path
The TUI now has `UpdateAction::StandaloneUnix` and
`UpdateAction::StandaloneWindows`, which rerun the standalone install
commands.
Unix update command:
```sh
sh -c "curl -fsSL https://chatgpt.com/codex/install.sh | sh"
```
Windows update command:
```powershell
powershell -c "irm https://chatgpt.com/codex/install.ps1|iex"
```
The Windows updater runs PowerShell directly. We do this because `cmd
/C` would parse the `|iex` as a cmd pipeline instead of passing it to
PowerShell.
## Additional installer behavior
- standalone installs now warn about conflicting npm/bun/brew-managed
`codex` installs and offer to uninstall them
- same-version reruns do not redownload the release if it is already
staged locally
## Testing
Installer smoke tests run:
- macOS: fresh install into isolated `HOME` and `CODEX_HOME` with
`scripts/install/install.sh --release latest`
- macOS: reran the installer against the same isolated install to verify
the same-version/update path and PATH block idempotence
- macOS: verified the installed `codex --version` and bundled
`codex-resources/rg --version`
- Windows: parsed `scripts/install/install.ps1` with PowerShell via
`[scriptblock]::Create(...)`
- Windows: verified the standalone update action builds a direct
PowerShell command and does not route the `irm ...|iex` command through
`cmd /C`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- honor `_meta["codex/imageDetail"] == "original"` on MCP image content
and map it to `detail: "original"` where supported
- strip that detail back out when the active model does not support
original-detail image inputs
- update code-mode `image(...)` to accept individual MCP image blocks
- teach `js_repl` / `codex.emitImage(...)` to preserve the same hint
from raw MCP image outputs
- document the new `_meta` contract and add generic RMCP-backed coverage
across protocol, core, code-mode, and js_repl paths
## Summary
1. Revert https://github.com/openai/codex/pull/17848 so the Bazel and
`BUILD` file changes leave `main`.
2. Prepare for a narrower follow up that restores only `SECURITY.md`.
## Validation
1. Reviewed the revert diff against `main`.
2. Ran a clean diff check before push.
- Discover marketplace manifests from different supported layout paths
instead of only .agents/plugins/marketplace.json.
- Accept local plugin sources written either as { source: "local", path:
... } or as a direct string path.
- Skip unsupported or invalid plugin source entries without failing the
entire marketplace, and keep valid local plugins loadable.
Dismiss stale TUI app-server approvals after remote resolution
When an approval, user-input prompt, or elicitation request is resolved
by another client, the TUI now dismisses the matching local UI instead
of leaving stale prompts behind and emitting a misleading local
cancellation.
This change teaches pending app-server request tracking to map
`serverRequest/resolved` notifications back to the concrete request type
and stable request key, then propagates that resolved request into TUI
prompt state. Approval, request-user-input, and MCP elicitation overlays
now drop the resolved current or queued request quietly, advance to the
next queued request when present, and avoid emitting abort/cancel events
for stale UI.
The latest update also retires matching prompts while they are still
deferred behind active streaming and suppresses buffered active-thread
requests whose app-server request id has already been resolved before
drain. `ChatWidget` removes a resolved request from both the deferred
interrupt queue and the materialized bottom-pane stack, while
active-thread request handling verifies the app-server request is still
pending before showing a prompt. Lifecycle events such as exec begin/end
remain queued so approved work can still render normally.
Tests cover resolved-request mapping, overlay dismissal behavior,
deferred prompt pruning for same-turn user input, exec approval IDs,
lifecycle-event retention, and the buffered active-thread ordering
regression.
Validation:
- `just fmt`
- `git diff --check`
- `cargo test -p codex-tui
resolved_buffered_approval_does_not_become_actionable_after_drain`
- `cargo test -p codex-tui
enqueue_primary_thread_session_replays_buffered_approval_after_attach`
- `cargo test -p codex-tui chatwidget::interrupts`
- `just fix -p codex-tui`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Re-enable remote variants for the exec-server filesystem
sandbox/symlink tests that were made local-only in PR #17671.
- Restore `use_remote` parameterization for the readable-root,
normalized symlink escape, symlink removal, and symlink
copy-preservation cases.
- Preserve `mode={use_remote}` context on key async filesystem failures
so CI failures point at the local or remote lane.
## Validation
- `cd codex-rs && just fmt`
- Not run: `bazel test
//codex-rs/exec-server:exec-server-file_system-test` per local Codex
development guidance to avoid test runs unless explicitly requested.
Co-authored-by: Codex <noreply@openai.com>
# Summary
- implement local ThreadStore archive/unarchive operations
- implement local ThreadStore read_thread operation
- break up the various ThreadStore local method implementations into
separate files
- migrate app-server archive/unarchive and core archive fixture to use
ThreadStore (but not all read operations yet!)
- use the ThreadStore's read operation as a proxy check for thread
persistence/existence in the app server code
- move all other filesystem operations related to archive (path
validation etc) into the local thread store.
# Tests
- add dedicated local store archive/unarchive tests
## Summary
1. Add a Security Boundaries section to `SECURITY.md`.
2. Point readers to the Codex Agent approvals and security documentation
for sandboxing, approvals, and network controls.
## Validation
1. Reviewed the `SECURITY.md` diff in a clean worktree.
2. No tests run. Docs only change.
Azure Responses providers were still falling back to local compaction
because the compaction gate only checked
`ModelProviderInfo::is_openai()`.
Move the capability check onto `ModelProviderInfo` with
`supports_remote_compaction()`, backed by the existing Azure Responses
endpoint detection used in `codex-api`, and have `core::compact`
delegate to that helper.
Add regression coverage for:
- OpenAI providers using remote compaction
- Azure providers using remote compaction
- non-OpenAI/non-Azure providers staying on the local path
resolves#17773
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
## Why
#17763 moved sandbox-state delivery for MCP tool calls to request
`_meta` via the `codex/sandbox-state-meta` experimental capability.
Keeping the older `codex/sandbox-state` capability meant Codex still
maintained a second transport that pushed updates with the custom
`codex/sandbox-state/update` request at server startup and when the
session sandbox policy changed.
That duplicate MCP path is redundant with the per-tool-call metadata
path and makes the sandbox-state contract larger than needed. The
existing managed network proxy refresh on sandbox-policy changes is
still needed, so this keeps that behavior separate from the removed MCP
notification.
## What Changed
- Removed the exported `MCP_SANDBOX_STATE_CAPABILITY` and
`MCP_SANDBOX_STATE_METHOD` constants.
- Removed detection of `codex/sandbox-state` during MCP initialization
and stopped sending `codex/sandbox-state/update` at server startup.
- Removed the `McpConnectionManager::notify_sandbox_state_change`
plumbing while preserving the managed network proxy refresh when a user
turn changes sandbox policy.
- Slimmed `McpConnectionManager::new` so startup paths pass only the
initial `SandboxPolicy` needed for MCP elicitation state.
- Kept `codex/sandbox-state-meta` support intact; servers that opt in
still receive the current `SandboxState` on tool-call request `_meta`
([remaining call
path](ff2d3c1e72/codex-rs/core/src/mcp_tool_call.rs (L487-L526))).
- Added regression coverage for refreshing the live managed network
proxy on a per-turn sandbox-policy change.
## Verification
- `cargo test -p codex-core
new_turn_refreshes_managed_network_proxy_for_sandbox_change`
- `cargo test -p codex-mcp`
## Summary
- Track outbound remote-control sequence IDs independently for each
client stream.
- Retain unacked outbound messages per stream using FIFO buffers.
- Require stream-scoped acks and update tests for contiguous per-stream
sequencing.
## Why
The remote-control peer uses outbound sequence gaps to detect lost
messages and re-initialize. A single global outbound sequence counter
can create apparent gaps on an individual stream when another stream
receives an interleaved message.
## Validation
- `just fmt`
- `cargo test -p codex-app-server remote_control`
- `just fix -p codex-app-server`
- `git diff --check`
## Summary
- Move auth header construction into the
`AuthProvider::add_auth_headers` contract.
- Inline `CoreAuthProvider` header mutation in its provider impl and
remove the shared header-map helper.
- Update HTTP, websocket, file upload, sideband websocket, and test auth
callsites to use the provider method.
- Add direct coverage for `CoreAuthProvider` auth header mutation.
## Testing
- `just fmt`
- `cargo test -p codex-api`
- `cargo test -p codex-core
client::tests::auth_request_telemetry_context_tracks_attached_auth_and_retry_phase`
- `cargo test -p codex-core` failed on unrelated/reproducible
`tools::handlers::multi_agents::tests::multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message`
---------
Co-authored-by: Celia Chen <celia@openai.com>
## Summary
- Keep the existing local-build test announcement as the first
announcement entry
- Add the CLI update reminder for versions below `0.120.0`
- Remove expired onboarding and gpt-5.3-codex announcement entries
<img width="1576" height="276" alt="Screenshot 2026-04-15 at 1 32 53 PM"
src="https://github.com/user-attachments/assets/10b55d0b-09cd-4de0-ab51-4293d811b80c"
/>
Builds on top of #17659
Move the filesystem + sqlite thread listing-related operations inside of
a local ThreadStore implementation and call ThreadStore from the places
that used to perform these filesystem/sqlite operations.
This is the first of a series of PRs that will implement the rest of the
local ThreadStore.
Testing:
- added unit tests for the thread store implementation
- adjusted some unit tests in the realtime + personality packages whose
callsites changed. Specifically I'm trying to hide ThreadMetadata inside
of the local implementation and make ThreadMetadata a sqlite
implementation detail concern rather than a public interface, preferring
the more generate StoredThread interface instead
- added a corner case test for the personality migration package that
wasn't covered by the existing test suite
- adjust the behavior of searched thread listing to run the existing
local rollout repair/backfill pass _before_ querying SQLite results, so
callers using ThreadStore::list_threads do not miss matches after a
partial metadata warm-up
## Summary
- Ensure direct namespaced MCP tool groups are emitted with a non-empty
namespace description even when namespace metadata is missing or blank.
- Add regression coverage for missing MCP namespace descriptions.
## Cause
Latest `main` can serialize a direct namespaced MCP tool group with an
empty top-level `description`. The namespace description path used
`unwrap_or_default()` when `tool_namespaces` did not include metadata
for that namespace, so the outbound Responses API payload could contain
a tool like `{"type":"namespace","description":""}`. The Responses API
rejects that because namespace tool descriptions must be a non-empty
string.
## Fix
- Add a fallback namespace description: `Tools in the <namespace>
namespace.`
- Preserve provided namespace descriptions after trimming, but treat
blank descriptions as missing.
### Issue I am seeing
This is what I am seeing on the local build.
<img width="1593" height="488" alt="Screenshot 2026-04-15 at 10 55 55
AM"
src="https://github.com/user-attachments/assets/bab668ba-bf17-4c71-be4e-b102202fce57"
/>
---------
Co-authored-by: Sayan Sisodiya <sayan@openai.com>
## Why
While reviewing https://github.com/openai/codex/pull/17958, the helper
name `is_azure_responses_wire_base_url` looked misleading because the
helper returns true for either the `azure` provider name or an Azure
Responses `base_url`. The new name makes both inputs part of the
contract.
## What
- Rename `is_azure_responses_wire_base_url` to
`is_azure_responses_provider`.
- Move the `openai.azure.` marker into
`matches_azure_responses_base_url` so all base URL marker matching is
centralized.
- Keep `Provider::is_azure_responses_endpoint()` behavior unchanged.
## Verification
- Compared the parent and current implementations.
`name.eq_ignore_ascii_case("azure")` still returns true before
consulting `base_url`, `None` still returns false, base URLs are still
lowercased before marker matching, and the same Azure marker set is
checked.
- Ran `cargo test -p codex-api`.
## Summary
- Skip directory entries whose metadata lookup fails during
`fs/readDirectory`
- Add an exec-server regression test covering a broken symlink beside
valid entries
## Testing
- `just fmt`
- `cargo test -p codex-exec-server` (started, but dependency/network
updates stalled before completion in this environment)
## Summary
Stack PR 2 of 4 for feature-gated agent identity support.
This PR adds agent identity registration behind
`features.use_agent_identity`. It keeps the app-server protocol
unchanged and starts registration after ChatGPT auth exists rather than
requiring a client restart.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - this PR
- PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
when enabled
- PR4: https://github.com/openai/codex/pull/17388 - use `AgentAssertion`
downstream when enabled
## Validation
Covered as part of the local stack validation pass:
- `just fmt`
- `cargo test -p codex-core --lib agent_identity`
- `cargo test -p codex-core --lib agent_assertion`
- `cargo test -p codex-core --lib websocket_agent_task`
- `cargo test -p codex-api api_bridge`
- `cargo build -p codex-cli --bin codex`
## Notes
The full local app-server E2E path is still being debugged after PR
creation. The current branch stack is directionally ready for review
while that follow-up continues.
## Summary
- Remove the exec-server-side manual filesystem request path preflight
before invoking the sandbox helper.
- Keep sandbox helper policy construction and platform sandbox
enforcement as the access boundary.
- Add a portable local+remote regression for writing through an
explicitly configured alias root.
- Remove the metadata symlink-escape assertion that depended on the
deleted manual preflight; no replacement metadata-specific access probe
is added.
## Tests
- `cargo test -p codex-exec-server --lib`
- `cargo test -p codex-exec-server --test file_system`
- `git diff --check`
stacked on #17402.
MCP tools returned by `tool_search` (deferred tools) get registered in
our `ToolRegistry` with a different format than directly available
tools. this leads to two different ways of accessing MCP tools from our
tool catalog, only one of which works for each. fix this by registering
all MCP tools with the namespace format, since this info is already
available.
also, direct MCP tools are registered to responsesapi without a
namespace, while deferred MCP tools have a namespace. this means we can
receive MCP `FunctionCall`s in both formats from namespaces. fix this by
always registering MCP tools with namespace, regardless of deferral
status.
make code mode track `ToolName` provenance of tools so it can map the
literal JS function name string to the correct `ToolName` for
invocation, rather than supporting both in core.
this lets us unify to a single canonical `ToolName` representation for
each MCP tool and force everywhere to use that one, without supporting
fallbacks.
## Summary
- Fix marketplace-add local path detection on Windows by using
`Path::is_absolute()`.
- Make marketplace-add local-source tests parse/write TOML through the
same helpers instead of raw string matching.
- Update `rand` 0.9.x to 0.9.3 and document the remaining audited `rand`
0.8.5 advisory exception.
- Refresh `MODULE.bazel.lock` after the Cargo.lock update.
## Why
Latest `main` had two independent CI blockers: marketplace-add tests
were not portable to Windows path/TOML escaping, and cargo-deny still
reported `RUSTSEC-2026-0097` after the recent rustls-webpki fix.
## Validation
- `cargo test -p codex-core marketplace_add -- --nocapture`
- `cargo deny --all-features check`
- `just bazel-lock-check`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`
## Changes
Allows sandboxes to restrict overall network access while granting
access to specific unix sockets on mac.
## Details
- `codex sandbox macos`: adds a repeatable `--allow-unix-socket` option.
- `codex-sandboxing`: threads explicit Unix socket roots into the macOS
Seatbelt profile generation.
- Preserves restricted network behavior when only Unix socket IPC is
requested, and preserves full network behavior when full network is
already enabled.
## Verification
- `cargo test -p codex-cli -p codex-sandboxing`
- `cargo build -p codex-cli --bin codex`
- verified that `codex sandbox macos --allow-unix-socket /tmp/test.sock
-- test-client` grants access as expected
## Changes
Allows MCPs to opt in to receiving sandbox config info through `_meta`
on model-initiated tool calls. This lets MCPs adhere to the thread's
sandbox if they choose to.
## Details
- Adds the `codex/sandbox-state-meta` experimental MCP capability.
- Tracks whether each MCP server advertises that capability.
- When a server opts in, `codex-core` injects the current `SandboxState`
into model-initiated MCP tool-call request `_meta`.
## Verification
- added an integration test for the capability
`exec()` had a number of arguments that were unused, making the function
signature misleading. This PR aims to clean things up to clarify the
role of this function and to clarify which fields of `ExecParams` are
unused and why.
## Why
`spawn_command_under_seatbelt()` in `codex-rs/core/src/seatbelt.rs` had
fallen out of production use and was only referenced by test-only
wrappers. That left us with sandbox tests that could stay green even if
the actual seatbelt exec path regressed, because production shell
execution now flows through `SandboxManager::transform()` and
`ExecRequest::from_sandbox_exec_request()` instead of that helper.
Removing the dead helper also exposed one downstream `codex-exec`
integration test that still imported it, which broke `just clippy`.
## What Changed
- Removed `codex-rs/core/src/seatbelt.rs` and stopped exporting
`codex_core::seatbelt`.
- Removed the redundant `codex-rs/core/tests/suite/seatbelt.rs` coverage
that only exercised the dead helper.
- Kept the `openpty` regression check, but moved it into
`codex-rs/core/tests/suite/exec.rs` so it now runs through
`process_exec_tool_call()`.
- Fixed the seatbelt denial test in `codex-rs/core/tests/suite/exec.rs`
to use `/usr/bin/touch`, so it actually exercises the sandbox instead of
a nonexistent path.
- Updated `codex-rs/exec/tests/suite/sandbox.rs` on macOS to build the
sandboxed command through `build_exec_request()` and spawn the
transformed command, instead of importing the removed helper.
- Left the lower-level seatbelt policy coverage in
`codex-rs/sandboxing/src/seatbelt_tests.rs`, where the policy generator
is still covered directly.
## Verification
- `cargo test -p codex-core suite::exec::`
- `cargo test -p codex-exec`
- `cargo clippy -p codex-exec --tests -- -D warnings`
## Summary
- reuse a shared remote exec-server for remote-aware codex-core
integration tests within a test binary process
- keep per-test remote cwd creation and cleanup so tests retain
workspace isolation
- leave codex_self_exe, codex_linux_sandbox_exe, cwd_path(), and
workspace_path() behavior unchanged
## Validation
- rustfmt codex-rs/core/tests/common/test_codex.rs
- git diff --check
- CI is running on the updated branch
Fix clippy warnings in external agent config migration
```
error: this expression creates a reference which is immediately dereferenced by the compiler
--> core/src/external_agent_config.rs:188:55
|
188 | let migrated = build_config_from_external(&settings)?;
| ^^^^^^^^^ help: change this to: `settings`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.93.0/index.html#needless_borrow
= note: requested on the command line with `-D clippy::needless-borrow`
error: useless conversion to the same type: `codex_utils_absolute_path::AbsolutePathBuf`
--> core/src/external_agent_config.rs:355:27
|
355 | match AbsolutePathBuf::try_from(
| ___________________________^
356 | | add_marketplace_outcome
357 | | .installed_root
358 | | .join(INSTALLED_MARKETPLACE_MANIFEST_RELATIVE_PATH),
359 | | ) {
| |_____________________^
|
= help: consider removing `AbsolutePathBuf::try_from()`
= help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.93.0/index.html#useless_conversion
= note: `-D clippy::useless-conversion` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::useless_conversion)]`
error: aborting due to 2 previous errors
```
## Summary
- wrap routed delegation text in a small XML envelope before submitting
it as a user turn
- escape XML text content so the envelope stays well formed
- update focused coverage for the wrapper and the affected routed-turn
expectations
## What
Disable `Feature::CodexHooks` when building guardian review session
config
## Why
Guardian review sessions were respecting the Stop hook and could ingest
synthetic `<hook_prompt>` user turns Guardian should ignore hooks, while
the main session and regular subagents continue to respect them
In other words Guardian was getting ralph-looped
Co-authored-by: Codex <noreply@openai.com>
### **Issue**
guardian_parallel_reviews_fork_from_last_committed_trunk_history was
failing on Windows/Bazel with a stack overflow:
`thread
'guardian::tests::guardian_parallel_reviews_fork_from_last_committed_trunk_history'
has overflowed its stack`
- This problem was a stack-headroom problem
### **Solution**
Reduced stack pressure in the guardian async path by boxing thin wrapper
futures, and run the affected test on a dedicated 2 MiB thread stack.
Concretely:
- added Box::pin(...) around thin async wrapper hops in the guardian
review/delegate path
- changed
guardian_parallel_reviews_fork_from_last_committed_trunk_history to run
inside an explicitly sized thread stack so it has enough headroom in
low-stack environments
## Summary
- Port marketplace source support into the shared core marketplace-add
flow
- Support local marketplace directory sources
- Support direct `marketplace.json` URL sources
- Persist the new source types in config/schema and cover them in CLI
and app-server tests
## Validation
- `cargo test -p codex-core marketplace_add`
- `cargo test -p codex-cli marketplace_add`
- `cargo test -p codex-app-server marketplace_add`
- `just write-config-schema`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-cli`
## Context
Current `main` moved marketplace-add behavior into shared core code and
still assumed only git-backed sources. This change keeps that structure
but restores support for local directories and direct manifest URLs in
the shared path.
## Why
`main` recently needed
[#17691](https://github.com/openai/codex/pull/17691) because code behind
`cfg(not(debug_assertions))` was not being compiled by the Bazel PR
workflow. Our existing CI only built the fast/debug configuration, so
PRs could stay green while release-only Rust code still failed to
compile. This PR adds a release-style compile check that is cheap enough
to run on every PR.
## What Changed
- Added a `verify-release-build` job to `.github/workflows/bazel.yml`.
- Represented each supported OS once in that job's matrix: x64 Linux,
arm64 macOS, and x64 Windows.
- Kept the build close to fastbuild cost by using
`--compilation_mode=fastbuild` while forcing Rust to compile with
`-Cdebug-assertions=no`, which makes `cfg(not(debug_assertions))` true
without also turning on release optimizations or debug-info generation.
- Added comments in `.github/workflows/bazel.yml` and
`scripts/list-bazel-release-targets.sh` to make the job's intent and
target scope explicit.
- Restored the Bazel repository cache save behavior to run after every
non-cancelled job, matching
[#16926](https://github.com/openai/codex/pull/16926), and removed the
now-unused `repository-cache-hit` output from `prepare-bazel-ci`.
- Reused the shared `prepare-bazel-ci` action from the parent PR so the
new job does not duplicate Bazel setup boilerplate.
## Verification
- Used `bazel aquery` on `//codex-rs/tui:codex-tui` to confirm the Rust
compile still uses `opt-level=0` and `debuginfo=0` while passing
`-Cdebug-assertions=no`.
- Parsed `.github/workflows/bazel.yml` as YAML locally.
- Ran `bash -n scripts/list-bazel-release-targets.sh`.
## Summary
- Allows selected MCP results to return a larger default result set.
- Keeps the existing default cap for other MCP results.
- Applies the cap consistently when higher explicit limits are
requested.
## Testing
- `cargo test -p codex-core tool_search`
- Ran a local CLI smoke test with two stdio MCP servers exposing 100
tools each; the selected-server query returned 20 tools and the
regular-server query returned 8.
- Add trace-only wire logging for realtime websocket request/event text
payloads and the WebRTC call SDP request.
- Gate raw realtime logs behind
`RUST_LOG=codex_api::realtime_websocket::wire=trace` so normal logs stay
quiet.
---------
Co-authored-by: Codex <noreply@openai.com>
Introduce a ThreadStore interface for mediating access to the filesystem
(rollout jsonl files + sqlite db) based thread storage.
In later PRs we'll move the existing fs code behind a "local"
implementation of this ThreadStore interface.
This PR should be a no-op behaviorally, it only introduces the
interface.
Problem: PR #17372 moved initialized request handling into
`dispatch_initialized_client_request`, leaving analytics code that uses
`connection_id` without a local binding and breaking `codex-app-server`
builds.
Solution: Restore the `connection_id` binding from
`connection_request_id` before initialized request validation and
analytics tracking.
## Summary
Fix the TUI `$` skill popup so personal skills appear reliably when
Codex is connected to a remote app-server.
## What changed
- load skills on TUI startup with an explicit forced refresh
- refresh skills using the actual current cwd instead of an empty `cwds`
list
- resync an already-open `$` popup when skill mentions are updated
- add a regression test for refreshing an open mention popup
## Root cause
The TUI was sometimes sending `list_skills` with `cwds: []` after
`SessionConfigured`.
For the launchd app-server flow, the server resolved that empty cwd list
to its own process cwd, which was `/`. The response therefore came back
tagged with `cwd: "/"`, and the TUI later filtered skills by exact cwd
match against the actual project cwd such as `/Users/starr/code/dream`.
That dropped all personal skills from the mention list, so `$` only
showed plugins/apps.
## Verification
Built successfully with remote cache disabled:
```bash
cd /Users/starr/code/codex-worktrees/starr-skill-popup-20260413130509
bazel --output_base=/tmp/codex-bazel-verify-starr-skill-popup build //codex-rs/cli:codex --noremote_accept_cached --noremote_upload_local_results --disk_cache=
```
Also verified interactively in a PTY against the live app-server at
`ws://127.0.0.1:4511`:
- launched the built TUI
- typed `$`
- confirmed personal skills appeared in the popup, including entries
such as `Applied Devbox`, `CI Debug`, `Channel Summarization`, `Codex PR
Review`, and `Daily Digest`
## Files changed
- `codex-rs/tui/src/app.rs`
- `codex-rs/tui/src/chatwidget.rs`
- `codex-rs/tui/src/bottom_pane/chat_composer.rs`
Co-authored-by: Codex <noreply@openai.com>
## Summary
- route apply_patch runtime execution through the selected Environment
filesystem instead of the local self-exec path
- keep the standalone apply_patch command surface intact while restoring
its launcher/test/docs contract
- add focused apply_patch filesystem sandbox regression coverage
## Validation
- remote devbox Bazel run in progress
- passed: //codex-rs/apply-patch:apply-patch-unit-tests
--test_filter=test_read_file_utf8_with_context_reports_invalid_utf8
- in progress / follow-up: focused core and exec Bazel test slices on
dev
## Follow-up under review
- remote pre-verification and approval/retry behavior still need
explicit scrutiny for delete/update flows
- runtime sandbox-denial classification may need a tighter assertion
path than rendered stderr matching
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
This stack adds a new Bazel CI lane that verifies Rust code behind
`cfg(not(debug_assertions))`, but adding that job directly to
`.github/workflows/bazel.yml` would duplicate the same setup in multiple
places. Extracting the shared setup first keeps the follow-up change
easier to review and reduces the chance that future Bazel workflow edits
drift apart.
## What Changed
- Added `.github/actions/prepare-bazel-ci/action.yml` as a composite
action for the Bazel job bootstrap shared by multiple workflow jobs.
- Moved the existing Bazel setup, repository-cache restore, and
execution-log setup behind that action.
- Updated the `test` and `clippy` jobs in `.github/workflows/bazel.yml`
to call `prepare-bazel-ci`.
- Exposed `repository-cache-hit` and `repository-cache-path` outputs so
callers can keep the existing cache-save behavior without duplicating
the restore step.
## Verification
- Parsed `.github/workflows/bazel.yml` as YAML locally after rebasing
the stack.
- CI will exercise the refactored jobs end to end.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17704).
* #17705
* __->__ #17704
## Summary
- Refactors `MessageProcessor` and per-connection session state so
initialized service RPC handling can be moved into spawned tasks in a
follow-up PR.
- Shares the processor and initialized session data with
`Arc`/`OnceLock` instead of mutable borrowed connection state.
- Keeps initialized request handling synchronous in this PR; it does
**not** call `tokio::spawn` for service RPCs yet.
## Testing
- `just fmt`
- `cargo test -p codex-app-server` *(fails on existing hardening gaps
covered by #17375, #17376, and #17377; the pipelined config regression
passed before the unrelated failures)*
- `just fix -p codex-app-server`
In the app-server debug client, allow redirecting output to a file in
addition to just stdout. Shell redirecting works OK but is a bit weird
with the interactive mode of the debug client since a bunch of newlines
get dumped into the shell. With async messages from MCPs starting it's
also tricky to actually type in a prompt.
To allow the ability to have guaranteed-unique cursors, we make two
important updates:
* Add new updated_at_ms and created_at_ms columns that are in
millisecond precision
* Guarantee uniqueness -- if multiple items are inserted at the same
millisecond, bump the new one by one millisecond until it becomes unique
This lets us use single-number cursors for forwards and backwards paging
through resultsets and guarantee that the cursor is a fixed point to do
(timestamp > cursor) and get new items only.
This updated implementation is backwards-compatible since multiple
appservers can be running and won't handle the previous method well.
## Summary
Adds `thread_source` field to the existing Codex turn metadata sent to
Responses API
- Sends `thread_source: "user"` for user-initiated sessions: CLI, VS
Code, and Exec
- Sends `thread_source: "subagent"` for subagent sessions
- Omits `thread_source` for MCP, custom, and unknown session sources
- Uses the existing turn metadata transport:
- HTTP requests send through the `x-codex-turn-metadata` header
- WebSocket `response.create` requests send through
`client_metadata["x-codex-turn-metadata"]`
## Testing
- `cargo test -p codex-protocol
session_source_thread_source_name_classifies_user_and_subagent_sources`
- `cargo test -p codex-core turn_metadata_state`
- `cargo test -p codex-core --test responses_headers
responses_stream_includes_turn_metadata_header_for_git_workspace_e2e --
--nocapture`
## Summary
This PR removes `image_detail_original` as a runtime experiment and
makes original image detail available whenever the selected model
supports it.
Concretely, this change:
- drops the `image_detail_original` feature flag from the feature
registry and generated config schema
- makes tool-emitted image detail depend only on
`ModelInfo.supports_image_detail_original`
- updates `view_image` and `code_mode`/`js_repl` image emission to use
that capability check directly
- removes now-redundant experiment-specific tests and instruction
coverage
- keeps backward compatibility for existing configs by silently ignoring
a stale `features.image_detail_original` entry
The net effect is that `detail: "original"` is always available on
supported models, without requiring an experiment toggle.
- Add outputModality to thread/realtime/start and wire text/audio output
selection through app-server, core, API, and TUI.\n- Rename the realtime
transcript delta notification and add a separate transcript done
notification that forwards final text from item done without correlating
it with deltas.
This changes multi-agent v2 mailbox handling so incoming inter-agent
messages no longer preempt an in-flight sampling stream at reasoning or
commentary output-item boundaries.
## Summary
Move `codex marketplace add` onto a shared core implementation so the
CLI and app-server path can use one source of truth.
This change:
- adds shared marketplace-add orchestration in `codex-core`
- switches the CLI command to call that shared implementation
- removes duplicated CLI-only marketplace add helpers
- preserves focused parser and add-path coverage while moving the shared
behavior into core tests
## Why
The new `marketplace/add` RPC should reuse the same underlying
marketplace-add flow as the CLI. This refactor lands that consolidation
first so the follow-up app-server PR can be mostly protocol and handler
wiring.
## Validation
- `cargo test -p codex-core marketplace_add`
- `cargo test -p codex-cli marketplace_cmd`
- `just fix -p codex-core`
- `just fix -p codex-cli`
- `just fmt`
## Summary
- Pin Rust git patch dependencies to immutable revisions and make
cargo-deny reject unknown git and registry sources unless explicitly
allowlisted.
- Add checked-in SHA-256 coverage for the current rusty_v8 release
assets, wire those hashes into Bazel, and verify CI override downloads
before use.
- Add rusty_v8 MODULE.bazel update/check tooling plus a Bazel CI guard
so future V8 bumps cannot drift from the checked-in checksum manifest.
- Pin release/lint cargo installs and all external GitHub Actions refs
to immutable inputs.
## Future V8 bump flow
Run these after updating the resolved `v8` crate version and checksum
manifest:
```bash
python3 .github/scripts/rusty_v8_bazel.py update-module-bazel
python3 .github/scripts/rusty_v8_bazel.py check-module-bazel
```
The update command rewrites the matching `rusty_v8_<crate_version>`
`http_file` SHA-256 values in `MODULE.bazel` from
`third_party/v8/rusty_v8_<crate_version>.sha256`. The check command is
also wired into Bazel CI to block drift.
## Notes
- This intentionally excludes RustSec dependency upgrades and
bubblewrap-related changes per request.
- The branch was rebased onto the latest origin/main before opening the
PR.
## Validation
- cargo fetch --locked
- cargo deny check advisories
- cargo deny check
- cargo deny check sources
- python3 .github/scripts/rusty_v8_bazel.py check-module-bazel
- python3 .github/scripts/rusty_v8_bazel.py update-module-bazel
- python3 -m unittest discover -s .github/scripts -p
'test_rusty_v8_bazel.py'
- python3 -m py_compile .github/scripts/rusty_v8_bazel.py
.github/scripts/rusty_v8_module_bazel.py
.github/scripts/test_rusty_v8_bazel.py
- repo-wide GitHub Actions `uses:` audit: all external action refs are
pinned to 40-character SHAs
- yq eval on touched workflows and local actions
- git diff --check
- just bazel-lock-check
## Hash verification
- Confirmed `MODULE.bazel` hashes match
`third_party/v8/rusty_v8_146_4_0.sha256`.
- Confirmed GitHub release asset digests for denoland/rusty_v8
`v146.4.0` and openai/codex `rusty-v8-v146.4.0` match the checked-in
hashes.
- Streamed and SHA-256 hashed all 10 `MODULE.bazel` rusty_v8 asset URLs
locally; every downloaded byte stream matched both `MODULE.bazel` and
the checked-in manifest.
## Pin verification
- Confirmed signing-action pins match the peeled commits for their tag
comments: `sigstore/cosign-installer@v3.7.0`, `azure/login@v2`, and
`azure/trusted-signing-action@v0`.
- Pinned the remaining tag-based action refs in Bazel CI/setup:
`actions/setup-node@v6`, `facebook/install-dotslash@v2`,
`bazelbuild/setup-bazelisk@v3`, and `actions/cache/restore@v5`.
- Normalized all `bazelbuild/setup-bazelisk@v3` refs to the peeled
commit behind the annotated tag.
- Audited Cargo git dependencies: every manifest git dependency uses
`rev` only, every `Cargo.lock` git source has `?rev=<sha>#<same-sha>`,
and `cargo deny check sources` passes with `required-git-spec = "rev"`.
- Shallow-fetched each distinct git dependency repo at its pinned SHA
and verified Git reports each object as a commit.
This PR teaches the TUI to render guardian review timeouts as explicit
terminal history entries instead of dropping them from the live
timeline.
It adds timeout-specific history cells for command, patch, MCP tool, and
network approval reviews.
It also adds snapshot tests covering both the direct guardian event path
and the app-server notification path.
## Summary\n- add an exec-server package-local test helper binary that
can run exec-server and fs-helper flows\n- route exec-server filesystem
tests through that helper instead of cross-crate codex helper
binaries\n- stop relying on Bazel-only extra binary wiring for these
tests\n\n## Testing\n- not run (per repo guidance for codex changes)
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Add `turn/inject_items` app-server v2 request support for appending
raw Responses API items to a loaded thread history without starting a
turn.
- Generate JSON schema and TypeScript protocol artifacts for the new
params and empty response.
- Document the new endpoint and include a request/response example.
- Preserve compatibility with the typo alias `turn/injet_items` while
returning the canonical method name.
## Testing
- Not run (not requested)
## Why
For more advanced MCP usage, we want the model to be able to emit
parallel MCP tool calls and have Codex execute eligible ones
concurrently, instead of forcing all MCP calls through the serial block.
The main design choice was where to thread the config. I made this
server-level because parallel safety depends on the MCP server
implementation. Codex reads the flag from `mcp_servers`, threads the
opted-in server names into `ToolRouter`, and checks the parsed
`ToolPayload::Mcp { server, .. }` at execution time. That avoids relying
on model-visible tool names, which can be incomplete in
deferred/search-tool paths or ambiguous for similarly named
servers/tools.
## What was added
Added `supports_parallel_tool_calls` for MCP servers.
Before:
```toml
[mcp_servers.docs]
command = "docs-server"
```
After:
```toml
[mcp_servers.docs]
command = "docs-server"
supports_parallel_tool_calls = true
```
MCP calls remain serial by default. Only tools from opted-in servers are
eligible to run in parallel. Docs also now warn to enable this only when
the server’s tools are safe to run concurrently, especially around
shared state or read/write races.
## Testing
Tested with a local stdio MCP server exposing real delay tools. The
model/Responses side was mocked only to deterministically emit two MCP
calls in the same turn.
Each test called `query_with_delay` and `query_with_delay_2` with `{
"seconds": 25 }`.
| Build/config | Observed | Wall time |
| --- | --- | --- |
| main with flag enabled | serial | `58.79s` |
| PR with flag enabled | parallel | `31.73s` |
| PR without flag | serial | `56.70s` |
PR with flag enabled showed both tools start before either completed;
main and PR-without-flag completed the first delay before starting the
second.
Also added an integration test.
Additional checks:
- `cargo test -p codex-tools` passed
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server` passed
- `git diff --check` passed
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
Cap mirrored user text sent to realtime with the existing 300-token turn
budget while preserving the full model turn.
Adds integration coverage for capped realtime mirror payloads.
---------
Co-authored-by: Codex <noreply@openai.com>
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
### Motivation
- Switch the default model used for memory Phase 2 (consolidation) to
the newer `gpt-5.4` model.
### Description
- Change the Phase 2 model constant from `"gpt-5.3-codex"` to
`"gpt-5.4"` in `codex-rs/core/src/memories/mod.rs`.
### Testing
- Ran `just fmt`, which completed successfully.
- Attempted `cargo test -p codex-core`, but the build failed in this
environment because the `codex-linux-sandbox` crate requires the system
`libcap` pkg-config entry and the required system packages could not be
installed, so the test run was blocked.
------
[Codex
Task](https://chatgpt.com/codex/cloud/tasks/task_i_69d977693b48832a967e78d73c66dc8e)
The recent release broke, codex suggested this as the fix
Source failure:
https://github.com/openai/codex/actions/runs/24362949066/job/71147202092
Probably from
ac82443d07
For why it got in:
```
The relevant setup:
.github/workflows/rust-ci.yml (line 1) runs on PRs, but for codex-rs it only does:
cargo fmt --check
cargo shear
argument-comment lint via Bazel
no cargo check, no cargo clippy over the workspace, no cargo test over codex-tui
.github/workflows/rust-ci-full.yml (line 1) runs on pushes to main and branches matching **full-ci**. That one does compile TUI because:
codex-rs/Cargo.toml includes "tui" as a workspace member
lint_build runs cargo clippy --target ... --tests --profile ...
the matrix includes both dev and release profiles
tests runs cargo nextest run ..., but only dev-profile tests
Release CI also compiles it indirectly. .github/workflows/rust-release.yml (line 235) builds --bin codex, and cli/Cargo.toml (line 46) depends on codex-tui.
```
Codex tested locally with `cargo check -p codex-tui --release` and was
able to repro, and verified that this fixed it
Currently app-server may unload actively running threads once the last
connection disconnects, which is not expected.
Instead track when was the last active turn & when there were any
subscribers the last time, also add 30 minute idleness/no subscribers
timer to reduce the churn.
- stop `list_tool_suggest_discoverable_plugins()` from reloading the
curated marketplace for each discoverable plugin
- reuse a direct plugin-detail loader against the already-resolved
marketplace entry
The trigger was to stop those logs spamming:
```
d=019d81cf-6f69-7230-98aa-74294ff2dc5a}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id="019d86c8-0a8e-7013-b442-109aabbf75c9" codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id=019d81cf-6f69-7230-98aa-74294ff2dc5a turn.id=019d86c8-0a8e-7013-b442-109aabbf75c9 model=gpt-5.4}: ignoring interface.defaultPrompt: prompt must be at most 128 characters path=/Users/jif/.codex/.tmp/plugins/plugins/life-science-research/.codex-plugin/plugin.json
2026-04-13T12:27:30.402Z WARN [019d81cf-6f69-7230-98aa-74294ff2dc5a] codex_core::plugins::manifest - session_loop{thread_id=019d81cf-6f69-7230-98aa-74294ff2dc5a}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id="019d86c8-0a8e-7013-b442-109aabbf75c9" codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id=019d81cf-6f69-7230-98aa-74294ff2dc5a turn.id=019d86c8-0a8e-7013-b442-109aabbf75c9 model=gpt-5.4}: ignoring interface.defaultPrompt: prompt must be at most 128 characters path=/Users/jif/.codex/.tmp/plugins/plugins/build-ios-apps/.codex-plugin/plugin.json
2026-04-13T12:27:30.402Z WARN [019d81cf-6f69-7230-98aa-74294ff2dc5a] codex_core::plugins::manifest - session_loop{thread_id=019d81cf-6f69-7230-98aa-74294ff2dc5a}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id="019d86c8-0a8e-7013-b442-109aabbf75c9" codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id=019d81cf-6f69-7230-98aa-74294ff2dc5a turn.id=019d86c8-0a8e-7013-b442-109aabbf75c9 model=gpt-5.4}: ignoring interface.defaultPrompt: prompt must be at most 128 characters path=/Users/jif/.codex/.tmp/plugins/plugins/life-science-research/.codex-plugin/plugin.json
2026-04-13T12:27:30.405Z WARN [019d81cf-6f69-7230-98aa-74294ff2dc5a] codex_core::plugins::manifest - session_loop{thread_id=019d81cf-6f69-7230-98aa-74294ff2dc5a}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id="019d86c8-0a8e-7013-b442-109aabbf75c9" codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id=019d81cf-6f69-7230-98aa-74294ff2dc5a turn.id=019d86c8-0a8e-7013-b442-109aabbf75c9 model=gpt-5.4}: ignoring interface.defaultPrompt: prompt must be at most 128 characters path=/Users/jif/.codex/.tmp/plugins/plugins/build-ios-apps/.codex-plugin/plugin.json
2026-04-13T12:27:30.406Z WARN [019d81cf-6f69-7230-98aa-74294ff2dc5a] codex_core::plugins::manifest - session_loop{thread_id=019d81cf-6f69-7230-98aa-74294ff2dc5a}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id="019d86c8-0a8e-7013-b442-109aabbf75c9" codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id=019d81cf-6f69-7230-98aa-74294ff2dc5a turn.id=019d86c8-0a8e-7013-b442-109aabbf75c9 model=gpt-5.4}: ignoring interface.defaultPrompt: prompt must be at most 128 characters path=/Users/jif/.codex/.tmp/plugins/plugins/life-science-research/.codex-plugin/plugin.json
2026-04-13T12:27:30.408Z WARN [019d81cf-6f69-7230-98aa-74294ff2dc5a] codex_core::plugins::manifest - session_loop{thread_id=019d81cf-6f69-7230-98aa-74294ff2dc5a}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id="019d86c8-0a8e-7013-b442-109aabbf75c9" codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id=019d81cf-6f69-7230-98aa-74294ff2dc5a turn.id=019d86c8-0a8e-7013-b442-109aabbf75c9 model=gpt-5.4}: ignoring interface.defaultPrompt: prompt must be at most 128 characters path=/Users/jif/.codex/.tmp/plugins/plugins/build-ios-apps/.codex-plugin/plugin.json
```
## Summary
This updates the Windows elevated sandbox setup/refresh path to include
the legacy `compute_allow_paths(...).deny` protected children in the
same deny-write payload pipe added for split filesystem carveouts.
Concretely, elevated setup and elevated refresh now both build
deny-write payload paths from:
- explicit split-policy deny-write paths, preserving missing paths so
setup can materialize them before applying ACLs
- legacy `compute_allow_paths(...).deny`, which includes existing
`.git`, `.codex`, and `.agents` children under writable roots
This lets the elevated backend protect `.git` consistently with the
unelevated/restricted-token path, and removes the old janky hard-coded
`.codex` / `.agents` elevated setup helpers in favor of the shared
payload path.
## Root Cause
The landed split-carveout PR threaded a `deny_write_paths` pipe through
elevated setup/refresh, but the legacy workspace-write deny set from
`compute_allow_paths(...).deny` was not included in that payload. As a
result, elevated workspace-write did not apply the intended deny-write
ACLs for existing protected children like `<cwd>/.git`.
## Notes
The legacy protected children still only enter the deny set if they
already exist, because `compute_allow_paths` filters `.git`, `.codex`,
and `.agents` with `exists()`. Missing explicit split-policy deny paths
are preserved separately because setup intentionally materializes those
before applying ACLs.
## Validation
- `cargo fmt --check -p codex-windows-sandbox`
- `cargo test -p codex-windows-sandbox`
- `cargo build -p codex-cli -p codex-windows-sandbox --bins`
- Elevated `codex exec` smoke with `windows.sandbox='elevated'`: fresh
git repo, attempted append to `.git/config`, observed `Access is
denied`, marker not written, Deny ACE present on `.git`
- Unelevated `codex exec` smoke with `windows.sandbox='unelevated'`:
fresh git repo, attempted append to `.git/config`, observed `Access is
denied`, marker not written, Deny ACE present on `.git`
Addresses #17593
Problem: A regression introduced in
https://github.com/openai/codex/pull/16492 made thread/start fail when
Codex could not persist trusted project state, which crashes startup for
users with read-only config.toml.
Solution: Treat trusted project persistence as best effort and keep the
current thread's config trusted in memory when writing config.toml
fails.
Problem: PR #17601 updated context-compaction replay to call a new
ChatWidget handler, but the handler was never implemented, breaking
codex-tui compilation on main.
Solution: Render context-compaction replay through the existing
info-message path, preserving the intended `Context compacted` UI marker
without adding a one-off handler.
2026-04-13 09:20:10 -07:00
1898 changed files with 261505 additions and 84031 deletions
5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
7. If a review item is actionable and correct, patch code locally, commit, push, and then mark the associated review thread/comment as resolved once the fix is on GitHub.
8. If a review item from another author is non-actionable, already addressed, or not valid, post one reply on the comment/thread explaining that decision (for example answering the question or explaining why no change is needed). If the watcher later surfaces your own reply, treat that self-authored item as already handled and do not reply again.
8. If a review item from another author is non-actionable, already addressed, or not valid, post one reply on the comment/thread explaining that decision (for example answering the question or explaining why no change is needed). Prefix the GitHub reply body with `[codex]` so it is clear the response is automated. If the watcher later surfaces your own reply, treat that self-authored item as already handled and do not reply again.
9. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
10. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
11. On every loop, look for newly surfaced review feedback before acting on CI failures or mergeability state, then verify mergeability / merge-conflict status (for example via `gh pr view`) alongside CI.
@@ -99,7 +99,7 @@ When you agree with a comment and it is actionable:
5. Resume watching on the new SHA immediately (do not stop after reporting the push).
6. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
If you disagree or the comment is non-actionable/already addressed, reply once directly on the GitHub comment/thread so the reviewer gets an explicit answer, then continue the watcher loop. If the watcher later surfaces your own reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
If you disagree or the comment is non-actionable/already addressed, reply once directly on the GitHub comment/thread so the reviewer gets an explicit answer, then continue the watcher loop. Prefix any GitHub reply to a code review comment/thread with `[codex]` so it is clear the response is automated and not from the human user. If the watcher later surfaces your own reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
For agent changes prefer integration tests over unit tests. Integration tests are under `core/suite` and use `test_codex` to set up a test instance of codex.
Features that change the agent logic MUST add an integration test:
- Provide a list of major logic changes and user-facing behaviors that need to be tested.
If unit tests are needed, put them in a dedicated test file (*_tests.rs).
Avoid test-only functions in the main implementation.
Check whether there are existing helpers to make tests more streamlined and readable.
description: Run a final code review on a pull request
---
Use subagents to review code using all code-review-* skills in this repository other than this orchestrator. One subagent per skill. Pass full skill path to subagents. Use xhigh reasoning.
You must return every single issue from every subagent. You can return an unlimited number of findings.
Use raw Markdown to report findings.
Number findings for ease of reference.
Each finding must include a specific file path and line number.
If the GitHub user running the review is the owner of the pull request add a `code-reviewed` label.
Do not leave GitHub comments unless explicitly asked.
description: Run a GitHub issue digest for openai/codex by feature-area labels, all areas, and configurable time windows. Use when asked to summarize recent Codex bug reports or enhancement requests, especially for owner-specific labels such as tui, exec, app, or similar areas.
---
# Codex Issue Digest
## Objective
Produce a headline-first, insight-oriented digest of `openai/codex` issues for the requested feature-area labels over the previous 24 hours by default. Honor a different duration when the user asks for one, for example "past week" or "48 hours". Default to a summary-only response; include details only when requested.
Include only issues that currently have `bug` or `enhancement` plus at least one requested owner label. If the user asks for all areas or all labels, collect `bug`/`enhancement` issues across all labels.
## Inputs
- Feature-area labels, for example `tui exec`
-`all areas` / `all labels` to scan all current feature labels
Use `--window "past week"` or `--window-hours 168` when the user asks for a non-default duration. Use `--all-labels` when the user says all areas or all labels.
2. Use the JSON as the source of truth. It includes new issues, new issue comments, new reactions/upvotes, current labels, current reaction counts, model-ready `summary_inputs`, and detailed `digest_rows`.
3. Choose the output mode from the user's request:
- Default mode: start the report with `## Summary` and do not emit `## Details`.
- Details-upfront mode: if the user asks for details, a table, a full digest, "include details", or similar, start with `## Summary`, then include `## Details`.
- Follow-up details mode: if the user asks for more detail after a summary-only digest, produce `## Details` from the existing collector JSON when it is still available; otherwise rerun the collector.
4. In `## Summary`, write a headline-first executive summary:
- The first nonblank line under `## Summary` must be a single-line headline or judgment, not a bullet. It should be useful even if the reader stops there.
- On quiet days, prefer exactly: `No major issues reported by users.` Use this when there are no elevated rows, no newly repeated theme, and nothing that needs owner action.
- When users are surfacing notable issues, make the headline name the count or theme, for example `Two issues are being surfaced by users:`.
- Immediately under an active headline, list only the issues or themes driving attention, ordered by importance. Start each line with the row's `attention_marker` when present, then a concise owner-readable description and inline issue refs.
- Treat `🔥🔥` as headline-worthy and `🔥` as elevated. Do not add fire emoji yourself; only copy the row's `attention_marker`.
- Keep any extra summary detail after the headline to 1-3 terse lines, only when it adds a decision-relevant caveat, repeated theme, or owner action.
- Do not include routine counts, broad stats, or low-signal table summaries in `## Summary` unless they change the headline. Put metadata and optional counts in `## Details` or the footer.
- In default mode, end the report with a concise prompt such as `Want details? I can expand this into the issue table.` Keep this separate from the summary headline so the headline stays clean.
- Cluster and name themes yourself from `summary_inputs`; the collector intentionally does not hard-code issue categories.
- Use a cluster only when the issues genuinely share the same product problem. If several issues merely share a broad platform or label, describe them individually.
- Do not omit a repeated theme just because its individual issues fall below the details table cutoff. Several similar reports should be called out as a repeated customer concern.
- For single-issue rows, summarize the concern directly instead of calling it a cluster.
- Use inline numbered issue links from each relevant row's `ref_markdown`.
- Example quiet summary:
```markdown
## Summary
No major issues reported by users.
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
- Example active summary:
```markdown
## Summary
Two issues are being surfaced by users:
🔥🔥 Terminal launch hangs on startup [1](https://github.com/openai/codex/issues/123)
🔥 Resume switches model providers unexpectedly [2](https://github.com/openai/codex/issues/456)
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
5. In `## Details`, when details are requested, include a compact table only when useful:
- Prefer rows from `digest_rows`; include a `Refs` column using each row's `ref_markdown`.
- Keep the table short; omit low-signal rows when the summary already covers them.
- Use compact columns such as marker, area, type, description, interactions, and refs.
- The `Description` cell should be a short owner-readable phrase. Use row `description`, title, body excerpts, and recent comments, but do not mechanically copy the raw GitHub issue title when it contains incidental details.
- A clear quiet/no-concern sentence when there is no meaningful signal.
6. Use the JSON `attention_marker` exactly. It is empty for normal rows, `🔥` for elevated rows, and `🔥🔥` for very high-attention rows. The actual cutoffs are in `attention_thresholds`.
7. Use inline numbered references where a row or bullet points to issues, for example `Compaction bugs [1](https://github.com/openai/codex/issues/123), [2](https://github.com/openai/codex/issues/456)`. Do not add a separate footnotes section.
8. Label `interactions` as `Interactions`; it counts posts/comments/reactions during the requested window, not unique people.
9. Mention the collector `script_version`, repo checkout `git_head`, and time window in one compact source line. In default mode, put this before the details prompt so the final line still asks whether the user wants details. In details-upfront mode, it can be the footer.
## Reaction Handling
The collector uses GitHub reactions endpoints, which include `created_at`, to count reactions created during the digest window for hydrated issues. It reports both in-window reaction counts and current reaction totals. Treat current reaction totals as standing engagement, and treat `new_reactions` / `new_upvotes` as windowed activity.
By default, the collector fetches issue comments with `since=<window start>` and caps the number of comment pages per issue. This keeps very long historical threads from dominating a digest run and focuses the report on recent posts. Use `--fetch-all-comments` only when exhaustive comment history is more important than runtime.
GitHub issue search is still seeded by issue `updated_at`, so a purely reaction-only issue may be missed if reactions do not bump `updated_at`. Covering every reaction-only case would require either a persisted snapshot store or a broader scan of labeled issues.
## Attention Markers
The collector scales attention markers by the requested time window. The baseline is 5 human user interactions for `🔥` and 10 for `🔥🔥` over 24 hours; longer or shorter windows scale those cutoffs linearly and round up. For example, a one-week report uses 35 and 70 interactions. Human user interactions are human-authored new issue posts, human-authored new comments, and human reactions created during the window, including upvotes. Bot posts and bot reactions are excluded. In prose, explain this as high user interaction rather than naming the emoji.
## Freshness
The automation should run from a repo checkout that contains this skill. For shared daily use, prefer one of these patterns:
- Run the automation in a checkout that is refreshed before the automation starts, for example with `git pull --ff-only`.
- If the automation cannot safely mutate the checkout, have it report the current `git_head` from the collector output so readers know which skill/script version produced the digest.
## Sample Owner Prompt
```text
Use $codex-issue-digest to run the Codex issue digest for labels tui and exec over the previous 24 hours.
```
```text
Use $codex-issue-digest to run the Codex issue digest for all areas over the past week.
"Issues are selected when they currently have bug or enhancement plus at least one requested owner label and were updated during the window.",
"By default, issue comments are fetched with since=window_start and a max page cap to avoid long historical threads; use --fetch-all-comments when exhaustive comment history is needed.",
"New issue comments are filtered by comment creation time within the window from the fetched comment set.",
"Reaction events are counted by GitHub reaction created_at timestamps for hydrated issues and fetched comments.",
"Current reaction totals are standing engagement signals; new_reactions and new_upvotes are windowed activity.",
"The collector does not assign semantic clusters; use summary_inputs as model-ready evidence for report-time clustering.",
"Pure reaction-only issues may be missed if GitHub issue search does not surface them via updated_at.",
"Issues updated during the window without a new issue body or new comment are retained because label/status edits can still be useful owner signals.",
description: Update the title and body of one or more pull requests.
---
## Determining the PR(s)
When this skill is invoked, the PR(s) to update may be specified explicitly, but in the common case, the PR(s) to update will be inferred from the branch / commit that the user is currently working on. For ordinary Git usage (i.e., not Sapling as discussed below), you may have to use a combination of `git branch` and `gh pr view <branch> --repo openai/codex --json number --jq '.number'` to determine the PR associated with the current branch / commit.
## PR Body Contents
When invoked, use `gh` to edit the pull request body and title to reflect the contents of the specified PR. Make sure to check the existing pull request body to see if there is key information that should be preserved. For example, NEVER remove an image in the existing pull request body, as the author may have no way to recover it if you remove it.
It is critically important to explain _why_ the change is being made. If the current conversation in which this skill is invoked has discussed the motivation, be sure to capture this in the pull request body.
The body should also explain _what_ changed, but this should appear after the _why_.
Limit discussion to the _net change_ of the commit. It is generally frowned upon to discuss changes that were attempted but later undone in the course of the development of the pull request. When rewriting the pull request body, you may need to eliminate details such as these when they are no longer appropriate / of interest to future readers.
Avoid references to absolute paths on my local disk. When talking about a path that is within the repository, simply use the repo-relative path.
It is generally helpful to discuss how the change was verified. That said, it is unnecessary to mention things that CI checks automatically, e.g., do not include "ran `just fmt`" as part of the test plan. Though identifying the new tests that were purposely introduced to verify the new behavior introduced by the pull request is often appropriate.
Make use of Markdown to format the pull request professionally. Ensure "code things" appear in single backticks when referenced inline. Fenced code blocks are useful when referencing code or showing a shell transcript. Also, make use of GitHub permalinks when citing existing pieces of code that are relevant to the change.
Make sure to reference any relevant pull requests or issues, though there should be no need to reference the pull request in its own PR body.
If there is documentation that should be updated on https://developers.openai.com/codex as a result of this change, please note that in a separate section near the end of the pull request. Omit this section if there is no documentation that needs to be updated.
## Working with Stacks
Sometimes a pull request is composed of a stack of commits that build on one another. In these cases, the PR body should reflect the _net_ change introduced by the stack as a whole, rather than the individual commits that make up the stack.
Similarly, sometimes a user may be using a tool like Sapling to leverage _stacked pull requests_, in which case the `base` of the PR may be the a branch that is the `head` of another PR in the stack rather than `main`. In this case, be sure to discuss only the net change between the `base` and `head` of the PR that is being opened against that stacked base, rather than the changes relative to `main`.
## Sapling
If `.git/sl/store` is present, then this Git repository is governed by Sapling SCM (https://sapling-scm.com).
In Sapling, run the following to see if there is a GitHub pull request associated with the current revision:
Alternatively, you can run `sl sl` to see the current development branch and whether there is a GitHub pull request associated with the current commit. For example, if the output were:
```
@ cb032b31cf 72 minutes ago mbolin #11412
╭─╯ tui: show non-file layer content in /debug-config
│
o fdd0cd1de9 Today at 20:09 origin/main
│
~
```
-`@` indicates the current commit is `cb032b31cf`
- it is a development branch containing a single commit branched off of `origin/main`
- it is associated with GitHub pull request #11412
@@ -19,6 +19,12 @@ In the codex-rs folder where the rust code lives:
- You can run `just argument-comment-lint` to run the lint check locally. This is powered by Bazel, so running it the first time can be slow if Bazel is not warmed up, though incremental invocations should take <15s. Most of the time, it is best to update the PR and let CI take responsibility for checking this (or run it asynchronously in the background after submitting the PR). Note CI checks all three platforms, which the local run does not.
- When possible, make `match` statements exhaustive and avoid wildcard arms.
- Newly added traits should include doc comments that explain their role and how implementations are expected to use them.
- Discourage both `#[async_trait]` and `#[allow(async_fn_in_trait)]` in Rust traits.
- Prefer native RPITIT trait methods with explicit `Send` bounds on the returned future, as in `3c7f013f9735` / `#16630`.
- Implementations may still use `async fn foo(&self, ...) -> T` when they satisfy that contract.
- Do not use `#[allow(async_fn_in_trait)]` as a shortcut around spelling the future contract explicitly.
- When writing tests, prefer comparing the equality of entire objects over fields one by one.
- When making a change that adds or changes an API, ensure that the documentation in the `docs/` folder is up to date if applicable.
- Prefer private modules and explicitly exported public crate API.
@@ -44,6 +50,8 @@ In the codex-rs folder where the rust code lives:
`codex-rs/tui/src/bottom_pane/mod.rs`, and similarly central orchestration modules.
- When extracting code from a large module, move the related tests and module/type docs toward
the new implementation so the invariants stay close to the code that owns them.
- Avoid adding new standalone methods to `codex-rs/tui/src/chatwidget.rs` unless the change is
trivial; prefer new modules/files and keep `chatwidget.rs` focused on orchestration.
- When running Rust commands (e.g. `just fix` or `cargo test`) be patient with the command and never try to kill them using the PID. Rust lock can make the execution slow, this is expected.
Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:
@@ -11,3 +11,7 @@ Our security program is managed through Bugcrowd, and we ask that any validated
## Vulnerability Disclosure Program
Our Vulnerability Program Guidelines are defined on our [Bugcrowd program page](https://bugcrowd.com/engagements/openai).
## How to operate CODEX safely
For details on Codex security boundaries, including sandboxing, approvals, and network controls, see [Agent approvals & security](https://developers.openai.com/codex/agent-approvals-security).
content="Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager."
# Matches 0.x.y versions from 0.0.y through 0.119.y; excludes 0.120.0 and newer.
To try a writable legacy sandbox mode with these commands, pass an explicit config override such
as `-c 'sandbox_mode="workspace-write"'`.
### Selecting a sandbox policy via `--sandbox`
The Rust CLI exposes a dedicated `--sandbox` (`-s`) flag that lets you pick the sandbox policy **without** having to reach for the generic `-c/--config` option:
@@ -94,7 +97,7 @@ In `workspace-write`, Codex also includes `~/.codex/memories` in its writable ro
This folder is the root of a Cargo workspace. It contains quite a bit of experimental code, but here are the key crates:
- [`core/`](./core) contains the business logic for Codex. Ultimately, we hope this to be a library crate that is generally useful for building other Rust/native applications that use Codex.
- [`core/`](./core) contains the business logic for Codex. Ultimately, we hope this becomes a library crate that is generally useful for building other Rust/native applications that use Codex.
- [`exec/`](./exec) "headless" CLI for use in automation.
- [`tui/`](./tui) CLI that launches a fullscreen TUI built with [Ratatui](https://ratatui.rs/).
- [`cli/`](./cli) CLI multitool that provides the aforementioned CLIs via subcommands.
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.