## Why
The migration away from `SandboxPolicy` needs new configs to start from
permissions profiles instead of deriving profiles from legacy sandbox
modes. Existing users can have empty `config.toml` files, and we should
not rewrite user-owned config files that may live in shared
repositories.
This PR introduces built-in profile names so an empty config can resolve
to a canonical `PermissionProfile`, while explicit named `[permissions]`
profiles still behave predictably.
## What changed
- Adds built-in `default_permissions` profile names:
- `:read-only` maps to `PermissionProfile::read_only()`.
- `:workspace` maps to the workspace-write profile, including
project-root metadata carveouts.
- `:danger-no-sandbox` maps to `PermissionProfile::Disabled`, preserving
the distinction between no sandbox and a broad managed sandbox.
- Reserves the `:` prefix for built-in profiles so user-defined
`[permissions]` profiles cannot collide with future built-ins.
- Allows `default_permissions` to reference a built-in profile without
requiring a `[permissions]` table.
- Makes an otherwise empty config choose a built-in profile by
trust/platform context: trusted or untrusted project roots use
`:workspace` when the platform supports that sandbox, while roots
without a trust decision use `:read-only`.
- Keeps legacy `sandbox_mode` configs on the legacy path, and still
rejects user-defined `[permissions]` profiles that omit
`default_permissions` so we do not silently guess among custom profiles.
- Preserves compatibility behavior for implicit defaults: bare
`network.enabled = true` allows runtime network without starting the
managed proxy, explicit profile proxy policy still starts the proxy, and
implicit workspace/add-dir roots keep legacy metadata carveouts.
## Verification
- `cargo test -p codex-core builtin --lib`
- `cargo test -p codex-core profile_network_proxy_config`
- `cargo test -p codex-core
implicit_builtin_workspace_profile_preserves_add_dir_metadata_carveouts`
- `cargo test -p codex-core
permissions_profiles_network_enabled_allows_runtime_network_without_proxy`
- `cargo test -p codex-core
permissions_profiles_proxy_policy_starts_managed_network_proxy`
## Documentation
Public Codex config docs should mention these built-in names when the
`[permissions]` config format is ready to document as stable.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19900).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* #20008
* __->__ #19900
## Why
Managed sessions use `NO_PROXY` to keep a small set of destinations on
the direct path by default. The old default also bypassed all IPv4
link-local addresses in `169.254.0.0/16`, which includes metadata
endpoints such as `169.254.169.254`. Because `NO_PROXY` is evaluated by
the client before the request reaches the managed proxy, requests to
that range could skip proxy-side allowlist and local-binding checks
entirely. On hosts where a link-local metadata service is reachable,
that creates a path to sensitive environment metadata or credentials
outside the intended enforcement point.
## What changed
- remove the default IPv4 link-local `169.254.0.0/16` bypass from the
managed proxy environment
- keep the existing loopback and private-network defaults unchanged
- update the regression assertion to lock in the narrower default
## Security impact
Link-local requests now stay on the managed-proxy path by default, so
the proxy can apply configured policy before they reach metadata-style
endpoints or other link-local services.
## Verification
- `cargo test -p codex-network-proxy`
Co-authored-by: Codex <noreply@openai.com>
## Summary
This extends external agent detection/import beyond config artifacts so
Codex can detect recent sessions files from the external agent home and
import them into Codex rollout history.
## What changed
- Added a focused `external_agent_sessions` module for:
- session discovery
- source-record parsing
- rollout construction
- import ledger tracking
- Wired session detection/import into the app-server external agent
config API.
- Added compaction handling so large imported sessions can be resumed
safely before the first follow-up turn.
## Testing
Added coverage for:
- recent-session detection
- custom-title handling
- recency filtering
- dedupe and re-detect-after-source-change behavior
- visible imported turn construction
- backward-compatible import payload deserialization
- end-to-end RPC import flow
- rejection of undetected session paths
- repeat-import behavior
- large-session compaction before first follow-up
Ran:
- `cargo test -p codex-app-server external_agent_config_import_ --test
all`
## Summary
- exit shell mode when `Esc` is pressed while the absorbed `!` is the
only input
- add direct regression coverage plus a composer snapshot for the
restored normal prompt state
## Root cause
Shell mode stores the leading `!` outside the editable textarea. After
typing only `!`, the textarea is empty but the composer is still in bash
mode, so the existing empty-composer `Esc` handling never runs.
## Validation
- `just fmt`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::esc_exits_empty_shell_mode`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::footer_mode_snapshots`
- `cargo insta pending-snapshots`
`cargo test -p codex-tui` still reports unrelated existing `/status`
snapshot drift in this local environment because the rendered
permissions text is `workspace-write with network access` instead of the
older `read-only` fixture text.
Move local resume and fork cwd filtering to `thread/list` instead of
filtering in the TUI. This makes the `/resume` menu feel slightly faster
to load when working in repos with many historical threads, and
centralizes the cwd filtering in app-server.
**Affected:**
- /resume from inside the TUI.
- codex resume with no session ID and without --last
- codex resume --all
- codex fork with no session ID and without --last
- codex fork --all
**Not affected:**
- codex resume <id>
- codex fork <id>
- codex resume --last
- codex fork --last
Steps to test performance improvement in a real Codex environment:
- Launch `codex resume` using compiled binary in a directory that has
seen many threads.
- Launch `codex resume` using release binary in same directory.
- Observe difference in time-to-full-page as threads load.
## Summary
- suggest Plan mode when the current composer draft contains the
standalone word `plan`
- shares the Codex App heuristics for detection
- excludes things line `/plan` and the word plan in shell mode
- reuse the existing `Shift+Tab` mode cycle and add thread-scoped
dismissal with `Esc`
- replace the normal footer hint while the reminder is visible so the
statusline stays anchored
https://github.com/user-attachments/assets/01123ae8-cee6-4e95-b563-44655c071cde
## Why
The desktop app already nudges users toward Plan mode when their draft
clearly signals planning intent. The TUI had the underlying `/plan` and
`Shift+Tab` flows, but no equivalent reminder at the moment the user was
most likely to benefit from them.
## Details
The reminder is shown only when Plan mode is available, the draft
contains standalone `plan`, the user is not already in Plan mode, the
composer is actionable, and the current thread has not dismissed the
reminder. Slash-command and shell-command drafts are excluded.
The first implementation used an extra composer row, but that moved the
statusline whenever the heuristic fired. This version keeps the layout
stable by rendering the reminder in the existing footer row instead.
## Validation
- `INSTA_UPDATE=always cargo test -p codex-tui
chatwidget::tests::plan_mode::plan_mode_nudge -- --nocapture`
- `just fmt`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui`
- `cargo insta pending-snapshots`
- `git diff --check`
## Why
Network access approval prompts were showing the generic retry reason,
which made auto-review focus on the blocked connection instead of the
command that caused it. This makes network approvals easier to assess by
telling the reviewer to evaluate whether the triggering command was
authorised by the user and within policy, and to treat the network call
as acceptable when it is a reasonable consequence of that command.
## What changed
- Split guardian approval request prompt rendering so `NetworkAccess`
has a dedicated branch.
- For network requests, show `Network approval context` and `Network
access JSON` instead of `Retry reason` / `Planned action JSON`.
- Added regression coverage for the network approval prompt wording and
for omitting retry reason in this case.
## Verification
- `cargo test -p codex-core
guardian::tests::build_guardian_prompt_items_explains_network_access_review_scope`
## Why
- Without change: MCP tool call spans include request-side details such
as server, tool, call ID, connector, session, and turn.
- Issue: Some useful telemetry is only known by the MCP server after it
handles the tool call, such as target identity or whether the call
triggered a user-facing flow.
## What Changed
- With change: Codex reads allowlisted telemetry from
`_meta["codex/telemetry"]["span"]` and records it on the
`mcp.tools.call` span.
- Adds span fields for `codex.mcp.target.id` and
`codex.mcp.user_flow.triggered`, with strict type checks and bounded
target ID length.
## Verification
`codex-rs/core/src/mcp_tool_call_tests.rs`
## Why
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with `session_id` and `turn_id`.
- Issue: MCP servers may want the turn start timestamp to measure
internal latency relative to turn start.
## What Changed
- With change: turn metadata now includes `turn_started_at_unix_ms`,
which is propagated to MCP tool calls in
`_meta["x-codex-turn-metadata"]`.
## Verification
- `codex-rs/core/src/mcp_tool_call_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/src/turn_timing_tests.rs`
- `codex-rs/core/tests/responses_headers.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
## Why
Several bug reports describe thread shutdown (including subagent
threads) leaving stdio MCP server processes behind. These reports all
point at the same lifecycle gap: Codex launches stdio MCP servers, but
the session-level shutdown path does not explicitly close MCP clients or
terminate the server process tree.
Fixes#12491Fixes#12976Fixes#18881Fixes#19469
## History
This is best understood as a regression/coverage gap in MCP session
lifecycle management, not as stdio MCP cleanup being absent all along.
#10710 added process-group cleanup for stdio MCP servers, but that
cleanup only runs when the `RmcpClient`/transport is dropped. The older
reports (#12491 and #12976) came after that cleanup existed, which
suggests the remaining problem was that some higher-level shutdown paths
kept the MCP manager alive or replaced it without explicitly draining
clients. The newer reports (#18881 and #19469) exposed the same family
around manager replacement and shutdown.
## What changed
- Added an explicit stdio MCP process handle in `codex-rmcp-client` so
local MCP servers terminate their process group and executor-backed MCP
servers call the executor process terminator.
- Added `RmcpClient::shutdown()` and manager-level MCP shutdown draining
so session shutdown, channel-close fallback, MCP refresh, and connector
probing stop owned MCP clients.
- Added regression coverage that starts a stdio MCP server, begins an
in-flight blocking tool call, shuts down the client, and asserts the
server process exits.
## Verification
- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-mcp`
- `just fix -p codex-rmcp-client`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
- Manual before/after validation with a temporary repro script:
- Pre-fix binary from `HEAD^` (`fed0a8f4fa`): reproduced the leak with
surviving MCP server and child PIDs, `survivors=[77583, 77592]`,
`leaked=true`.
- Post-fix binary from this branch (`67e318148b`): verified both MCP
processes were gone after interrupting `codex exec`, `survivors=[]`,
`leaked=false`.
## Why
Fixes#19814.
The TUI's current `Worked for ...` timing behavior is a leftover from
#9599. At that point, models could emit multiple assistant messages in
one turn for preambles/commentary, but the TUI did not yet have a
reliable signal that an assistant message was the final answer when it
started streaming. To avoid showing an ever-growing elapsed time on each
preamble separator, #9599 made the separator timer incremental by
tracking elapsed time since the previous separator.
That workaround is no longer the right model for the final
completed-turn display. Since then, #16638 added protocol-native turn
timing, including `duration_ms` on turn completion. With that cumulative
duration available at the point where the TUI renders the completed-turn
separator, the UI can show the actual turn duration directly instead of
carrying per-separator timing state.
## What Changed
- Thread `duration_ms` into `ChatWidget::on_task_complete` from both
legacy `TurnCompleteEvent` handling and app-server `TurnCompleted`
notifications.
- Use `duration_ms` for the final `Worked for ...` separator, falling
back to the status indicator timer only when the protocol duration is
unavailable.
- Keep mid-turn separators before later assistant text as plain visual
dividers instead of clocked `Worked for ...` separators.
- Remove the old incremental separator timer state and helper
(`last_separator_elapsed_secs` / `worked_elapsed_from`).
- Add a snapshot regression test for a turn that runs a command and then
completes with a final answer, verifying the final separator uses the
cumulative turn duration.
## Verification
- `cargo test -p codex-tui
final_worked_for_uses_cumulative_turn_duration_snapshot`
- `just fix -p codex-tui`
Manual repro prompt:
```text
Manual timing repro. First send a short preamble/commentary sentence before using tools. Then run exactly this shell command: sleep 75; echo MANUAL_TIMING_DONE. After the command finishes, give a final answer that says "done". Do not skip the preamble.
```
After this change, the mid-turn break before the final answer should be
a plain divider, and the final completed-turn separator should show
`Worked for ...` using the cumulative turn duration.
Before:
<img width="414" height="102" alt="Screenshot 2026-04-27 at 10 09 01 PM"
src="https://github.com/user-attachments/assets/b9e2ce01-2460-40e4-a5c4-c9ba8add2557"
/>
After:
<img width="485" height="149" alt="Screenshot 2026-04-27 at 10 09 07 PM"
src="https://github.com/user-attachments/assets/d24089ae-d4e2-41b6-b966-07c98706ead4"
/>
## Summary
Make FileSystemSandboxPolicy the semantic source of truth for project
root metadata protection. Under writable roots, `.git`, `.codex`, and
`.agents` stay protected unless user policy grants an explicit write
rule for that metadata path.
## Scope
1. Add `protected_metadata_names` to `WritableRoot`.
2. Teach `FileSystemSandboxPolicy::can_write_path_with_cwd` to reject
protected metadata writes under writable roots unless explicitly
allowed.
3. Default workspace write profiles to protect `.git`, `.codex`, and
`.agents`.
4. Add the Linux fallback setup needed before Linux enforcement lands
later in the stack.
## Reviewer Focus
1. The policy decision belongs in FileSystemSandboxPolicy, not shell
command parsing.
2. Legacy SandboxPolicy remains a compatibility projection, not the
source of the new rule.
3. Explicit user write rules can still opt into these metadata paths.
## Stack
1. Policy primitive: this PR
2. macOS Seatbelt adapter: #19847
3. Shell preflight UX: #19848
4. Runtime profile propagation: #19849
5. Linux bubblewrap adapter: #19852
## Validation
1. codex protocol permissions tests
2. formatting for codex protocol and codex linux sandbox
3. diff whitespace check
## Why
The TUI currently handles keyboard shortcuts as hard-coded event matches
spread across app, composer, pager, list, approval, and navigation code.
That makes shortcuts hard to customize, makes displayed hints easy to
drift from actual behavior, and makes future keymap work riskier because
there is no central action inventory.
This PR adds the foundation for configurable, action-based keymaps
without adding the interactive remapping UI yet. Onboarding
intentionally stays on fixed startup shortcuts because users cannot
reasonably configure keymaps before completing onboarding.
This is PR1 in the keymap stack:
- PR1: #18593: configurable keymap foundation
- PR2: #18594: `/keymap` picker and guided remapping UI
- PR3: #18595: Vim composer mode and the remap option
## Design Notes
The new model resolves named actions into concrete runtime bindings once
from config, then passes those bindings to the UI surfaces that handle
input or render shortcut hints.
The main concepts are:
- **Context**: a scope where an action is active, such as `global`,
`chat`, `composer`, `editor`, `pager`, `list`, or `approval`.
- **Action**: a named operation inside a context, such as
`global.open_transcript`, `composer.submit`, or `pager.close`.
- **Binding**: one or more single-key shortcuts assigned to an action,
written as config strings such as `ctrl-t`, `alt-backspace`, or
`page-down`. Multi-step sequences such as `ctrl-x ctrl-s`, `g g`, or
leader-key flows are not part of this PR.
- **Resolution order**: context-specific config wins first, supported
global fallbacks come next, and built-in defaults fill in anything
unset.
- **Explicit unbinding**: an empty array removes an action binding in
that scope and does not fall through to a fallback binding.
- **Conflict validation**: a resolved keymap rejects duplicate active
bindings inside the same scope so one keypress cannot dispatch two
actions.
## What Changed
- Added `TuiKeymap` config support under `[tui.keymap]`, including typed
contexts/actions, key alias normalization, generated schema coverage,
and user-facing config errors.
- Added `RuntimeKeymap` resolution in `codex-rs/tui/src/keymap.rs`,
including fallback precedence, built-in defaults, explicit unbinding,
and per-context conflict validation.
- Rewired existing TUI handlers to consume resolved keymap actions
instead of directly matching hard-coded keys in each component.
- Updated key hint rendering and footer/pager/list surfaces so displayed
shortcuts follow the resolved keymap.
- Kept onboarding shortcuts fixed in
`codex-rs/tui/src/onboarding/keys.rs` instead of exposing them through
`[tui.keymap]`.
## Validation
The branch includes focused coverage for config parsing, key
normalization, runtime fallback resolution, explicit unbinding,
duplicate-key conflict validation, default keymap consistency,
onboarding startup key behavior, and UI hint snapshots affected by
resolved key bindings.
## Why
Codex enables enhanced keyboard reporting while the TUI owns the
terminal. In iTerm2, exiting the TUI with Ctrl+C can intermittently
leave the parent shell receiving raw CSI-u / `modifyOtherKeys` fragments
instead of normal key input.
Final terminal cleanup should put the parent shell back into normal
keyboard reporting even if the terminal misses the usual stack pop.
Fixes#19553.
## What Changed
- Move TUI keyboard enhancement setup and detection into
`tui/src/tui/keyboard_modes.rs`.
- Add an exit-only `restore_after_exit()` path that performs the normal
keyboard enhancement pop plus unconditional keyboard enhancement and
`modifyOtherKeys` resets.
- Keep temporary restore paths, such as external-editor handoff, using
the balanced stack pop behavior.
## Confidence
Medium. This is a speculative fix: I was not able to reproduce the
reported iTerm2 behavior manually, but the symptoms line up with
terminal keyboard reporting state surviving Codex exit. The added reset
sequences are scoped to final TUI shutdown and should be harmless when
the terminal is already clean.
## Why
Memory startup runs in the background after an eligible turn, but it can
consume Codex backend quota at exactly the wrong time: when the user is
already near a rate-limit boundary. This PR adds a guard so the memory
pipeline backs off when the Codex rate-limit snapshot says the remaining
budget is too low.
## What Changed
- Added `memories.min_rate_limit_remaining_percent` with a default of
`25`, clamped to `0..=100`, and regenerated `core/config.schema.json`.
- Added `codex-rs/memories/write/src/guard.rs`, which fetches Codex
backend rate limits before memory startup and skips phase 1 / phase 2
when the Codex limit is reached or either tracked window is above the
configured usage ceiling.
- Keeps startup best-effort: non-Codex auth or rate-limit fetch/client
failures preserve the existing memory startup behavior.
- Records a `codex.memory.startup` counter with
`status=skipped_rate_limit` when startup is skipped.
- Added config parsing/clamping coverage and guard unit tests.
## Verification
- Added `codex-rs/memories/write/src/guard_tests.rs` for threshold,
primary/secondary window, and reached-limit behavior.
- Added config tests for TOML parsing and clamping.
## Summary
AgentIdentity runtime loading currently registers tasks against a single
hardcoded AuthAPI base URL. That works for production, but local and
staging validation may need registration to target a different
authapi-login-provider without baking internal staging service URLs into
the OSS binary.
This PR adds a small config surface for
`agent_identity_authapi_base_url` and threads it through the existing
auth-loading path as a direct argument. Explicit config wins. Without
config, task registration keeps using the production AuthAPI URL,
matching the current default behavior.
## Stack
1. openai/codex#19762 - `refactor: make auth loading async` (merged)
2. openai/codex#19763 - `refactor: load agent identity runtime eagerly`
3. This PR - `fix: configure AgentIdentity AuthAPI base URL`
4. openai/codex#19764 - `feat: verify agent identity JWTs with JWKS`
## Design decisions
- Keep the existing auth-loading shape and pass the new value as an
argument. This avoids another wrapper loader and keeps the call path
readable.
- Add config instead of embedding internal staging URLs. Environments
that need a non-production AuthAPI can configure it explicitly.
- Keep the default AuthAPI registration URL as production.
`chatgpt_base_url` remains separate and is used by the follow-up JWKS
verification PR for fetching public keys from the ChatGPT backend route.
- Resolve the AuthAPI base URL inside AgentIdentity loading, because
task registration is the only consumer of this value.
## Testing
Tests: targeted Rust checks, AgentIdentity auth tests, config schema
regeneration, formatter/fix pass, and whitespace diff check.
## Why
Memory startup was tied to thread lifecycle events such as create, load,
and fork. That can run memory work before a thread receives real user
input, and it makes startup cost scale with thread management instead of
actual turns. Moving the trigger to `thread/sendInput` keeps memory
startup aligned with the first real user turn and lets it use the
current thread config at turn time.
The idea is to prevent ghost cost due to pre-warm triggered by the app
Turn-based startup can also make global phase-2 consolidation easier to
request repeatedly, so this adds a success cooldown and tightens the
default startup scan window.
## What Changed
- Start `codex_memories_write::start_memories_startup_task` after a
non-empty `thread/sendInput` turn is submitted, instead of from thread
create/load/fork paths:
d4a6885b78/codex-rs/app-server/src/codex_message_processor.rs (L6477-L6487)
- Expose `CodexThread::config()` so app-server can pass the live config
into memory startup at turn time.
- Add a six-hour successful-run cooldown for global phase-2
consolidation via `SkippedCooldown`:
d4a6885b78/codex-rs/state/src/runtime/memories.rs (L963-L966)
- Reduce memory startup defaults to at most 2 rollouts over 10 days:
d4a6885b78/codex-rs/config/src/types.rs (L31-L34)
## Verification
Updated the memory runtime coverage around phase-2 reclaim behavior,
including `phase2_global_lock_respects_success_cooldown`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Phase 2 still needs to choose the most relevant stage-1 memory outputs
by usage and recency, but exposing that ranking as the rendered
`raw_memories.md` order creates unnecessary large diff. Usage-count or
timestamp changes can reshuffle otherwise unchanged memories, making the
workspace diff noisy and giving the consolidation prompt a misleading
recency signal from file position.
This fix will reduce token consumption
## What Changed
- Keep the existing top-N Phase 2 selection ranking by `usage_count`,
`last_usage`, `source_updated_at`, and `thread_id`.
- Return the selected rows in stable ascending `thread_id` order before
syncing Phase 2 filesystem inputs.
- Update the memory README, raw memories header, and consolidation
prompt so they describe the stable order and tell the prompt to use
metadata and workspace diffs instead of file order as the recency
signal.
- Adjust the memory runtime tests to use deterministic thread IDs and
assert the stable return order separately from the ranked selection
semantics.
## Test Coverage
- Existing memory runtime tests in
`codex-rs/state/src/runtime/memories.rs` now cover the stable returned
ordering for Phase 2 inputs.
---------
Co-authored-by: Codex <noreply@openai.com>
Keep extracting memories out of core and moving the write trigger in the
app-server
This is temporary and it should move at the client level as a follow-up
This makes core fully independant from `codex-memories-write`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
MultiAgentV2 sessions need startup guidance that matches the role of the
thread that is actually being created. Root agents and subagents have
different responsibilities, and forked subagents can inherit parent
rollout history. If the parent hint is carried into the child context,
the child can see stale or conflicting developer guidance before its own
session-specific context is added.
## What changed
- Added `features.multi_agent_v2.root_agent_usage_hint_text` and
`features.multi_agent_v2.subagent_usage_hint_text` config fields,
including schema/config parsing support.
- Injected the matching root or subagent hint into the initial context
as its own developer message when `multi_agent_v2` is enabled.
- Filtered configured MultiAgentV2 usage-hint developer messages out of
forked parent history so a child thread receives fresh guidance for its
own session source/config.
- Added targeted coverage for config parsing, initial-context rendering,
feature-config deserialization, and forked-history filtering.
## Context examples
With this config:
```toml
[features.multi_agent_v2]
enabled = true
root_agent_usage_hint_text = "Root guidance."
subagent_usage_hint_text = "Subagent guidance."
```
A root thread initial context renders the root hint as a standalone
developer message:
```text
[developer]
<existing developer context, when present>
[developer]
Root guidance.
```
A subagent thread initial context renders the subagent hint instead:
```text
[developer]
<existing developer context, when present>
[developer]
Subagent guidance.
```
When a subagent forks parent history, any parent developer message whose
text exactly matches the configured MultiAgentV2 root or subagent hint
is omitted from the forked history before the child receives its fresh
subagent hint.
## Why
Addresses #9274
Running `codex update` currently starts an interactive Codex session
with `update` as the prompt. That is a rough edge for users who expect a
direct self-update command after seeing the existing update notice, and
it forces them to copy the suggested package-manager command manually.
## What changed
- Added a top-level `codex update` subcommand.
- Reused the existing install-channel detection and update command
runner that the TUI already uses for update prompts.
- Exposed the update-action lookup from `codex-tui` so the CLI can
invoke the same behavior.
- Added CLI coverage to ensure `codex update` is parsed as a subcommand
instead of becoming an interactive prompt.
## Verification
- `cargo test -p codex-cli`
- `cargo test -p codex-tui update_action::tests`
## Why
`PermissionProfile` is now the canonical internal permissions
representation, but the app-server wire shape is still intentionally
unstable while the migration continues. Stable app-server clients should
not see or generate code for these fields until the wire format settles.
## What changed
- Marks every app-server v2 field that sends `PermissionProfile` as
experimental, including `command/exec`, `thread/start`, `thread/resume`,
`thread/fork`, and `turn/start` request/response payloads.
- Enables per-field experimental inspection for `command/exec`, so
`permissionProfile` is gated without making the entire method
experimental.
- Fixes the generated TypeScript schema filter to be comment-aware. The
previous scanner treated apostrophes inside doc comments as string
delimiters, so some experimental fields leaked into stable TypeScript
even though stable JSON was filtered correctly.
## Verification
- `cargo test -p codex-app-server-protocol`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19899).
* #19900
* __->__ #19899
## Why
After thread sessions have a required `PermissionProfile`, the TUI no
longer needs to cache a separate legacy `SandboxPolicy` in
`ThreadSessionState`. Keeping the legacy field would reintroduce two
permission authorities in the session cache and make later
replay/switching logic easier to get wrong.
This PR keeps legacy app-server compatibility at the ingestion boundary:
old `sandbox` response values are still accepted, but they are
immediately converted to a cwd-anchored profile.
## What Changed
- Removes `ThreadSessionState.sandbox_policy`.
- Updates active-session permission syncing to write only the current
`PermissionProfile`.
- Updates thread-read/replay/test fixtures to use profiles as the cached
session permission source.
- Leaves legacy `sandbox` fields in app-server request/response protocol
paths unchanged; those are compatibility boundaries and are converted
before entering cached TUI state.
## Verification
- `cargo test -p codex-tui thread_session_state::tests --lib`
- `cargo test -p codex-tui
inactive_thread_started_notification_initializes_replay_session --lib`
- `cargo test -p codex-tui thread_events --lib`
- `just fix -p codex-tui`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19776).
* #19900
* #19899
* __->__ #19776
## Why
Remote TUI resume uses the app-server websocket client. That client
inherited tungstenite's default `16 MiB` frame limit, so a large saved
session could make `thread/resume` return a single JSON-RPC response
frame that the client rejected before the TUI could deserialize or
render it.
Fixes#19837
## What Changed
- Configure the remote app-server websocket client with a bounded `128
MiB` max frame/message size.
- Preserve the concrete remote worker exit reason when completing
pending requests after a transport/read failure instead of replacing it
with a generic channel-closed error.
- Add a regression test that sends a single `>16 MiB` JSON-RPC response
frame and verifies the typed request succeeds.
Note: This isn't a perfect fix. It really just moves the limit to a much
larger value. I looked at a bunch of other potential fixes (both
server-side and client-side), and they all involved significant
complexity, had backward-compatibility impact, or impacted performance
of common use cases. This simple fix should address the vast majority of
remote use cases.
## Verification
I reproed the problem locally using a long rollout. Verified that fix
addresses connection drop.
## Why
`ThreadConfigSnapshot` is used by app-server and thread metadata code as
a stable view of active runtime settings. Keeping both `sandbox_policy`
and `permission_profile` in the snapshot duplicates permission state and
makes it possible for the legacy projection to drift from the canonical
profile.
The legacy `sandbox` value is still needed at app-server compatibility
boundaries, so this PR derives it on demand from the snapshot profile
and cwd instead of storing it.
## What Changed
- Removes `ThreadConfigSnapshot.sandbox_policy`.
- Adds `ThreadConfigSnapshot::sandbox_policy()` as a compatibility
projection from `permission_profile` plus `cwd`.
- Updates app-server response/metadata code and tests to call the
projection only where legacy fields still exist.
- Keeps snapshot construction profile-only so split filesystem rules,
disabled enforcement, and external enforcement remain represented by the
canonical profile.
## Verification
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
dispatch_reclaims_stale_global_lock_and_starts_consolidation --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19775).
* #19900
* #19899
* #19776
* __->__ #19775
## Why
`SessionConfiguredEvent` is the internal event that tells clients what
permissions are active for a session. Emitting both `sandbox_policy` and
`permission_profile` leaves two possible authorities and forces every
consumer to decide which one to honor. At this point in the migration,
the profile is expressive enough to represent managed, disabled, and
external sandbox enforcement, so the internal event can be profile-only.
The wire compatibility concern is older serialized events or rollout
data that only contain `sandbox_policy`; those still need to
deserialize.
## What Changed
- Removes `sandbox_policy` from `SessionConfiguredEvent` and makes
`permission_profile` required.
- Adds custom deserialization so old payloads with only `sandbox_policy`
are upgraded to a cwd-anchored `PermissionProfile`.
- Updates core event emission and TUI session handling to sync
permissions from the profile directly.
- Updates app-server response construction to derive the legacy
`sandbox` response field from the active thread snapshot instead of from
`SessionConfiguredEvent`.
- Updates yolo-mode display logic to treat both
`PermissionProfile::Disabled` and managed unrestricted filesystem plus
enabled network as full-access, while still preserving the distinction
between no sandbox and external sandboxing.
## Verification
- `cargo test -p codex-protocol session_configured_event --lib`
- `cargo test -p codex-protocol serialize_event --lib`
- `cargo test -p codex-exec session_configured --lib`
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox
--lib`
- `cargo test -p codex-tui session_configured --lib`
- `cargo test -p codex-tui
yolo_mode_includes_managed_full_access_profiles --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19774).
* #19900
* #19899
* #19776
* #19775
* __->__ #19774
## Why
Fixes#19475.
`codex exec` can finish successfully and then emit an `ERROR` on stderr:
```text
failed to record rollout items: thread <id> not found
```
That happens because shutdown closes the live thread writer before
emitting `ShutdownComplete`. The terminal event was still using the
normal `send_event_raw` path, so it tried to append rollout items
through a recorder that had already been removed. The answer is correct,
but wrappers that treat stderr as failure can retry completed exec runs.
This looks like a likely recent regression from
[#18882](https://github.com/openai/codex/pull/18882), which routed live
thread writes through `ThreadStore` and added the shutdown-time live
writer close. I have not bisected this, so the PR treats #18882 as the
likely source based on the affected shutdown code path rather than a
proven first-bad commit.
## What Changed
`ShutdownComplete` now bypasses rollout persistence after thread
shutdown and is delivered directly to clients. The shutdown path still
records the protocol event in the rollout trace before delivery,
preserving trace visibility without attempting a post-shutdown
thread-store append.
The change also adds a regression test with the in-memory thread store
to assert that shutdown creates and shuts down the live thread without
appending another item after shutdown.
## Summary
Adds the standard Codex `User-Agent` to shared default headers so the
responses-api WS handshake carries the same client OS and version
context as HTTP requests.
## Testing
- `cargo test -p codex-core
build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
- `cargo test -p codex-core --test all responses_websocket`
## Summary
AgentIdentity auth previously registered the process task lazily behind
a `OnceCell`. That meant the auth object could be constructed before its
runtime task binding was known.
This PR makes AgentIdentity auth load the runtime task at auth load time
and stores the resulting process task id directly on the auth object.
The model-provider call path can then read a concrete task id instead of
handling a missing lazy value.
## Stack
1. [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762) (merged)
2. **This PR:** [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [fix: configure AgentIdentity AuthAPI base
URL](https://github.com/openai/codex/pull/19904)
4. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)
## Important call sites
| Area | Change |
| --- | --- |
| `AgentIdentityAuth::load` | Registers the process task during auth
loading and stores `process_task_id`. |
| `CodexAuth::from_agent_identity_jwt` | Awaits AgentIdentity auth
loading. |
| model-provider auth | Reads a concrete `process_task_id` instead of an
optional lazy value. |
| AgentIdentity auth tests | Mock task registration now covers eager
runtime allocation. |
## Design decisions
AgentIdentity auth now treats task registration as part of constructing
a usable auth object. That matches how callers use the value: once auth
is present, the model-provider path expects the task-scoped assertion
data to be ready.
## Testing
Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
- Marks `/title` and `/statusline` as available during active tasks.
- Extends the existing slash-command availability test coverage to
include these commands alongside `/goal`.
## Why
`ThreadSessionState` is the TUI's cached view of an app-server session.
To make `PermissionProfile` the canonical runtime permissions model,
cached thread sessions need to always have a profile instead of treating
the profile as an optional supplement to a legacy `sandbox` response
field.
The main compatibility concern is older app-server v2 lifecycle
responses that only include `sandbox` and omit `permissionProfile`:
- `thread/start` -> `ThreadStartResponse.sandbox`
- `thread/resume` -> `ThreadResumeResponse.sandbox`
- `thread/fork` -> `ThreadForkResponse.sandbox`
Those responses must still hydrate correctly when the TUI is pointed at
an older app-server. This PR converts the legacy `sandbox` value into a
`PermissionProfile` immediately at response ingestion time, using the
response `cwd`, so cached sessions do not carry an optional profile that
can later reinterpret cwd-bound grants against a different thread cwd.
This fallback is intentionally boundary compatibility. The follow-up PRs
in this stack continue the cleanup by making `SessionConfiguredEvent`
profile-only, deriving sandbox projections from snapshots only when an
API still needs them, and then removing `sandbox_policy` from
`ThreadSessionState`.
## What Changed
- Makes `ThreadSessionState.permission_profile` required.
- Converts legacy app-server response `sandbox` values into a
`PermissionProfile` at ingestion time using the response cwd.
- Ensures `thread/read` hydration does not reuse a primary session
profile that may be anchored to a different cwd; it uses the active
widget permission settings for the read thread fallback instead of
reusing cached primary-session permissions.
- Keeps the app-server request path unchanged: embedded sessions send
profiles, while remote sessions continue using legacy sandbox overrides
for compatibility.
## Verification
- `cargo test -p codex-tui thread_read --lib`
- `cargo test -p codex-tui
permission_settings_sync_preserves_active_profile_only_rules --lib`
- `cargo test -p codex-tui
resume_response_restores_turns_from_thread_items --lib`
- `cargo test -p codex-tui thread_session_state::tests --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19773).
* #19900
* #19899
* #19776
* #19775
* #19774
* __->__ #19773
## Summary
- Remove `ghost_snapshot` / `GhostCommit` from the Responses API surface
and generated SDK/schema artifacts.
- Keep legacy config loading compatible, but make undo a no-op that
reports the feature is unavailable.
- Clean up core history, compaction, telemetry, rollout, and tests to
stop carrying ghost snapshot items.
## Testing
- Unit tests passed for `codex-protocol`, `codex-core` targeted undo and
compaction flows, `codex-rollout`, and `codex-app-server-protocol`.
- Regenerated config and app-server schemas plus Python SDK artifacts
and verified they match the checked-in outputs.
## Why
Recent `main` CI had repeated flakes in the plugin fixture tests:
- `codex-core::all
suite::plugins::explicit_plugin_mentions_inject_plugin_guidance` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
- `codex-core::all suite::plugins::plugin_mcp_tools_are_listed` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
The failures were in the same plugin/MCP fixture family: assertions
expected sample plugin guidance or tool inventory, but the test could
observe the session before the sample MCP server had finished startup.
## Root Cause
`explicit_plugin_mentions_inject_plugin_guidance` submitted the user
turn immediately after constructing the session. MCP startup is
asynchronous, so on a slower or busier CI runner the prompt could be
built before the sample plugin MCP server had reported its tools. That
made the test depend on scheduler timing rather than the fixture being
ready.
`plugin_mcp_tools_are_listed` already needed the same readiness
condition, but its wait logic was local to that test.
## What Changed
- Added a shared `wait_for_sample_mcp_ready` helper for the plugin
fixture tests.
- Wait for `McpStartupComplete` before submitting the explicit plugin
mention turn.
- Reuse the same readiness helper in the MCP tool-listing test.
## Why This Should Be Reliable
The tests now wait for the explicit readiness signal from the sample MCP
server before asserting guidance or tools derived from that server. This
removes the startup race while still exercising the real fixture path,
so the assertions should only run after the plugin inventory is
deterministic.
## Verification
- `cargo test -p codex-core --test all plugins::`
- GitHub CI for this PR is passing.
## Summary
- Extracted the shared filesystem types and `ExecutorFileSystem` trait
into a new `codex-file-system` crate
- Switched `codex-config` and `codex-git-utils` to depend on that crate
instead of `codex-exec-server`
- Kept `codex-exec-server` re-exporting the same API for existing
callers
## Testing
- Ran `cargo test -p codex-file-system`
- Ran `cargo test -p codex-git-utils`
- Ran `cargo test -p codex-config`
- Ran `cargo test -p codex-exec-server`
- Ran `just fix -p codex-file-system`, `just fix -p codex-git-utils`,
`just fix -p codex-config`, `just fix -p codex-exec-server`
- Ran `just fmt`
- Updated and verified the Bazel module lockfile
## Summary
Disallow fileParams metadata for custom MCPs
Restricts Codex openai/fileParams handling to the first-party codex_apps
MCP server. Custom MCP servers may still advertise the metadata, but
Codex now ignores it for upload rewriting, preventing non-Apps tools
from receiving signed OpenAI file refs for local paths. Added a
regression test for the allowed and denied cases.
## Why
This continues the permissions migration by making legacy config default
resolution produce the canonical `PermissionProfile` first. The legacy
`SandboxPolicy` projection should stay available at compatibility
boundaries, but config loading should not create a legacy policy just to
immediately convert it back into a profile.
Specifically, when `default_permissions` is not specified in
`config.toml`, instead of creating a `SandboxPolicy` in
`codex-rs/core/src/config/mod.rs` and then trying to derive a
`PermissionProfile` from it, we use `derive_permission_profile()` to
create a more faithful `PermissionProfile` using the values of
`ConfigToml` directly.
This also keeps the existing behavior of `sandbox_workspace_write` and
extra writable roots after #19841 replaced `:cwd` with `:project_roots`.
Legacy workspace-write defaults are represented as symbolic
`:project_roots` write access plus symbolic project-root metadata
carveouts. Extra absolute writable roots are still added directly and
continue to get concrete metadata protections for paths that exist under
those roots.
The platform sandboxes differ when a symbolic project-root subpath does
not exist yet.
* **Seatbelt** can encode literal/subpath exclusions directly, so macOS
emits project-root metadata subpath policies even if `.git`, `.agents`,
or `.codex` do not exist.
* **bwrap** has to materialize bind-mount targets. Binding `/dev/null`
to a missing `.git` can create a host-visible placeholder that changes
Git repo discovery. Binding missing `.agents` would not affect Git
discovery, but it would still create a host-visible project metadata
placeholder from an automatic compatibility carveout. Linux therefore
skips only missing automatic `.git` and `.agents` read-only metadata
masks; missing `.codex` remains protected so first-time project config
creation goes through the protected-path approval flow. User-authored
`read` and `none` subpath rules keep normal bwrap behavior, and `none`
can still mask the first missing component to prevent creation under
writable roots.
## What Changed
- Adds profile-native helpers for legacy workspace-write semantics,
including `PermissionProfile::workspace_write_with()`,
`FileSystemSandboxPolicy::workspace_write()`, and
`FileSystemSandboxPolicy::with_additional_legacy_workspace_writable_roots()`.
- Makes `FileSystemSandboxPolicy::workspace_write()` the single legacy
workspace-write constructor so both `from_legacy_sandbox_policy()` and
`From<&SandboxPolicy>` include the project-root metadata carveouts.
- Removes the no-carveout `legacy_workspace_write_base_policy()` path
and the `prune_read_entries_under_writable_roots()` cleanup that was
only needed by that split construction.
- Adds `ConfigToml::derive_permission_profile()` for legacy sandbox-mode
fallback resolution; named `default_permissions` profiles continue
through the permissions profile pipeline instead of being reconstructed
from `sandbox_mode`.
- Updates `Config::load()` to start from the derived profile, validate
that it still has a legacy compatibility projection, and apply
additional writable roots directly to managed workspace-write filesystem
policies.
- Updates Linux bwrap argument construction so missing automatic
`.git`/`.agents` symbolic project-root read-only carveouts are skipped
before emitting bind args; missing `.codex`, user-authored `read`/`none`
subpath rules, and existing missing writable-root behavior are
preserved.
- Adds coverage that legacy workspace-write config produces symbolic
project-root metadata carveouts, extra legacy workspace writable roots
still protect existing metadata paths such as `.git`, and bwrap skips
missing `.git`/`.agents` project-root carveouts while preserving missing
`.codex` and user-authored missing subpath rules.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19772).
* #19776
* #19775
* #19774
* #19773
* __->__ #19772
## Why
The remaining review, interrupt, fuzzy search, feedback, and git-diff
handlers still had local send-error branches that obscured otherwise
simple request handling. This final slice flattens those handlers
without changing the public protocol behavior.
## What Changed
- Streamlined review start, turn interrupt, fuzzy search session,
feedback upload, and git diff handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Converted validation and upload failures into returned JSON-RPC errors
where that avoids nested `send_error`/`return` blocks.
- Left unrelated sandbox setup and notification code untouched.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::review --
--test-threads=1`
## Summary
- Add the `enable_mcp_apps` feature flag to the `codex-features`
registry
- Keep it under development and disabled by default
## Testing
- Unit tests for `codex-features` passed
- Formatting passed