## Why
Extensions can observe thread and turn lifecycle events today, but there
was no single host-owned hook for changes to the effective thread
configuration. That makes features that need to react to model,
permission, or tool-suggest updates either depend on individual mutation
paths or risk going stale after runtime config refreshes.
This adds a typed config-change contributor so extension-owned state can
stay synchronized with the effective thread config while the host
remains responsible for deciding when config changed.
## What Changed
- Added `ConfigContributor<C>` to `codex_extension_api`, with
before/after immutable snapshots of the effective config plus
session/thread extension stores.
- Added registry builder/accessor support through `config_contributor`
and `config_contributors`.
- Emits config-change callbacks after committed updates from session
settings, per-turn setting updates, and `refresh_runtime_config`.
- Builds effective config snapshots only when config contributors are
registered, and suppresses no-op callbacks when the before/after
snapshots are equal.
- Added a core session regression test that verifies contributors
observe both model changes and user-layer runtime config changes,
including access to session and thread extension stores.
## Validation
Added `config_change_contributor_observes_effective_config_changes` in
`codex-rs/core/src/session/tests.rs` to cover the new contributor path.
## Why
Spawned agents can already override `model` and `reasoning_effort`, but
they have no equivalent way to opt into a model-supported service tier.
That makes it impossible to preserve or intentionally select tiered
execution behavior when delegating work to a sub-agent, even though the
model catalog already advertises supported `service_tiers`.
## What changed
- Add optional `service_tier` to both legacy and `MultiAgentV2`
`spawn_agent` tool inputs.
- Show each picker-visible model's supported service tier ids and
descriptions in the `spawn_agent` tool guidance.
- Resolve service tier selection after the child agent's effective model
is known.
- Inherit the parent tier when omitted and still supported by the final
child model; otherwise clear it.
- Reject explicit unsupported tier requests with a model-facing error.
- Keep explicit `service_tier` usable on full-history forks, while still
honoring the existing model/reasoning fork restrictions.
- Hide `service_tier` alongside other spawn metadata when
`hide_spawn_agent_metadata` is enabled.
## Verification
Added focused coverage for:
- v1/v2 `spawn_agent` schema exposure for `service_tier`
- tier descriptions in spawn guidance
- hidden-metadata suppression
- explicit supported tier selection
- explicit unknown and unsupported tier rejection
- inherited tier preservation or clearing based on child-model support
- full-history fork acceptance for explicit service tiers in both v1 and
v2
Local Rust tests were not run in this workspace per repo guidance; the
new coverage is included for CI.
## Why
We added Zellij-specific TUI workarounds because older Zellij behavior
did not work with Codex's normal terminal model:
- #8555 made `tui.alternate_screen = "auto"` disable alternate screen in
Zellij so transcript history stayed available.
- #16578 avoided scroll-region operations in Zellij by emitting raw
newlines and using a separate composer styling path.
This PR removes both workarounds because the latest Zellij release
tested locally (`zellij 0.44.1`) works correctly with Codex's standard
TUI behavior: normal alternate-screen handling, redraw, and history
insertion.
## What Changed
- Removed the `InsertHistoryMode::Zellij` path and the Zellij-only
newline scrollback insertion behavior.
- Removed cached `is_zellij` state from the TUI and composer.
- Removed Zellij-specific composer styling, the helper snapshot, and the
`TerminalInfo::is_zellij()` convenience method that only served this
workaround.
- Changed `tui.alternate_screen = "auto"` to use alternate screen for
Zellij too; `--no-alt-screen` and `tui.alternate_screen = "never"` still
preserve the inline mode escape hatch.
- Updated the generated config schema description for
`tui.alternate_screen`.
## How to Test
Manual smoke path used with `zellij 0.44.1`:
1. Build and run this branch inside a Zellij `0.44.1` session with
default config.
2. Start Codex normally and produce enough assistant/tool output to
create scrollback.
3. Confirm the transcript remains readable, the composer renders
normally, and scrolling through terminal history works.
4. Resize the Zellij pane while output exists and confirm the TUI
redraws without duplicated, missing, or stale rows.
5. Compare with `--no-alt-screen` or `-c tui.alternate_screen=never` if
you want to verify the inline fallback still works.
Targeted tests:
- `just write-config-schema`
- `just fmt`
- `just fix -p codex-tui`
- `cargo test -p codex-terminal-detection`
- `cargo test -p codex-tui alternate_screen_auto_uses_alt_screen`
Attempted but did not complete locally:
- `cargo test -p codex-tui` built and ran the new test successfully,
then failed later on unrelated local failures in
`status_permissions_full_disk_managed_*` and a stack overflow in
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all`.
## Documentation
No developers.openai.com Codex documentation update is needed for this
revert.
## Summary
- add a scoped level_id to ExtensionData and expose it through
level_id()
- remove thread_id/turn_id parameters from extension contributor inputs
where the scoped ExtensionData already carries that identity
- move turn-scoped extension data onto TurnContext so token usage and
lifecycle contributors can share the same turn store
## Testing
- cargo check -p codex-extension-api -p codex-core --tests
- cargo test -p codex-extension-api
- cargo test -p codex-guardian
- cargo test -p codex-core --lib
record_token_usage_info_notifies_extension_contributors
- cargo test -p codex-core --lib
submission_loop_channel_close_emits_thread_stop_lifecycle
- cargo test -p codex-core --lib
submission_loop_channel_close_aborts_active_turn_before_thread_stop_lifecycle
- just fix -p codex-extension-api
- just fix -p codex-guardian
- just fix -p codex-core
- just fmt
## Note
- Attempted cargo test -p codex-core; it aborted in
agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns
with the existing stack overflow before the full suite completed.
## Why
Extensions need a stable place to observe token accounting after Codex
folds model-provider usage into the session's cached `TokenUsageInfo`.
Without a contributor hook, extension-owned features that need last-turn
or cumulative token usage have to duplicate session plumbing or infer
state from client-facing `TokenCount` notifications.
## What changed
- Added `TokenUsageContributor` to `codex-extension-api`, passing
session/thread `ExtensionData`, `ThreadId`, turn id, and the current
`TokenUsageInfo`.
- Added registry builder/storage support for token-usage contributors.
- Invoked registered contributors from
`Session::record_token_usage_info` after the session token cache is
updated and before the client `TokenCount` notification is emitted.
## Testing
- Added `record_token_usage_info_notifies_extension_contributors`,
covering cumulative token usage updates and access to both extension
stores.
## Why
The thread lifecycle contributor hooks from #22476 should observe every
session teardown. The explicit `Op::Shutdown` path already emitted
`on_thread_stop`, but when `submission_loop` exited because its
submission channel closed, it only tore down runtime services. That
meant extensions could miss the thread-stop lifecycle signal on implicit
runtime shutdown.
## What Changed
- Split shared runtime teardown into `shutdown_runtime_services(...)`.
- Split thread-stop lifecycle emission into
`emit_thread_stop_lifecycle(...)`.
- Reused those helpers from both explicit shutdown and the channel-close
shutdown path.
- Tracked whether `Op::Shutdown` was received so the explicit path does
not double-emit lifecycle events after it exits the loop.
- Added a regression test that closes the submission channel and asserts
`ThreadLifecycleContributor::on_thread_stop` runs once with the expected
thread/session stores.
## Testing
- `cargo test -p codex-core
submission_loop_channel_close_emits_thread_stop_lifecycle`
## Why
Extensions can already contribute prompt, tool, turn-item, and
thread-lifecycle behavior, but there was no explicit host-owned hook for
per-turn setup and cleanup. That makes extension-private turn state
awkward: an extension either has to stash it outside the turn lifecycle
or depend on core runtime objects.
This adds a small turn lifecycle boundary. Extensions receive stable
identifiers plus the existing session, thread, and turn `ExtensionData`
stores, while core keeps owning task scheduling, cancellation, and turn
teardown.
## What Changed
- Added `TurnLifecycleContributor` with `on_turn_start`, `on_turn_stop`,
and `on_turn_abort` callbacks in `codex-rs/ext/extension-api`.
- Added typed `TurnStartInput`, `TurnStopInput`, and `TurnAbortInput`
payloads that expose `thread_id`, `turn_id`, `session_store`,
`thread_store`, and `turn_store`.
- Registered and re-exported turn lifecycle contributors through
`ExtensionRegistry` and `ExtensionRegistryBuilder`.
- Wired `Session` to emit turn start, stop, and abort callbacks from the
existing turn/task lifecycle paths.
- Carried the turn-scoped `ExtensionData` through `RunningTask` and
`RemovedTask` so stop/abort callbacks receive the same turn store
created at turn start.
## Verification
- Not run locally.
## Why
Extensions that need thread-scoped state currently only get a start-time
callback. That is enough for seeding stores, but it leaves the host
without a shared extension seam for later thread rehydrate and flush
work as thread ownership evolves. This PR turns that start-only seam
into a host-owned thread lifecycle contributor contract so
extension-private state can stay behind the extension API instead of
leaking extra orchestration through core.
## What changed
- Replaced `ThreadStartContributor` with `ThreadLifecycleContributor`
and added typed lifecycle inputs for thread start, resume, and stop. The
contract lives in
[`contributors/thread_lifecycle.rs`](d0e9211f70/codex-rs/ext/extension-api/src/contributors/thread_lifecycle.rs (L1-L64)).
- Kept the existing start-time behavior intact by routing session
construction through `on_thread_start`.
- Invoked `on_thread_stop` during session shutdown before thread-scoped
extension state is dropped, while isolating contributor failures behind
warning logs.
- Migrated `git-attribution` and `guardian` onto the lifecycle
registration path.
- Renamed the extension registry plumbing from start-specific
contributors to lifecycle-specific contributors.
## Notes
`on_thread_resume` is introduced at the API boundary here so extensions
can target the final lifecycle shape; host resume dispatch can be wired
where that runtime path is finalized.
## Why
Extension tools were split across two public runtime contracts:
`codex-tool-api` exposed `ToolBundle` plus its own call/spec/error
types, while core native tools used `codex_tools::ToolExecutor`. That
made contributed tool specs and execution behavior easy to drift apart
and added another crate boundary for what should be one executable-tool
seam.
This PR makes `ToolExecutor` the single runtime contract and keeps
extension-specific pinning in `codex-extension-api`.
## Remaining todo
https://github.com/openai/codex/pull/22369/changes#diff-b935ea8245c3ce568a30cff660175fa6390b66b872ae409e1e2e965738250741R5
Either generic `Invocation` or sub-extract the `ToolCall` and clean
`ToolInvocation`
## What changed
- Removed the `codex-tool-api` workspace crate and its dependencies from
core and `codex-extension-api`.
- Made `codex_tools::ToolExecutor` object-safe with `async_trait` so
extension contributors can return a dyn executor.
- Added the extension-facing aliases under
`ext/extension-api/src/contributors/tools.rs`, including
`ExtensionToolExecutor = dyn ToolExecutor<ToolCall, Output =
ExtensionToolOutput>`.
- Changed `ToolContributor::tools` to return extension executors
directly instead of `ToolBundle`s.
- Updated core’s extension tool handler/registry/router path to adapt
those extension executors into the existing native `ToolInvocation`
runtime path.
- Added focused coverage for extension tools being registered,
model-visible, dispatchable, and not replacing built-in tools.
## Verification
- `cargo test -p codex-tools`
- `cargo test -p codex-extension-api`
## Why
Codex still models model-visible tools and executable behavior largely
inside `codex-core`, which makes it harder to evolve the tool system
toward a single reusable abstraction for built-ins, MCP-backed tools,
dynamic tools, and later tools injected from outside core.
This PR takes the next incremental step in that direction by moving the
common execution-facing pieces out of core and separating them from
core-only orchestration. The intent is to let shared tool abstractions
improve in one place, while `codex-core` keeps the parts that are still
inherently host-specific today, such as `ToolInvocation`, dispatch
wiring, and hook integration.
This PR is mostly moving things around. The only interesting piece is
this abstraction:
https://github.com/openai/codex/pull/22359/changes#diff-81af519002548ba51ed102bdaaf77e081d40a1e73a6e5f9b104bbbc96a6f1b3dR13
## What changed
- Added `codex_tools::ToolExecutor<Invocation>` as the shared execution
trait for model-visible tools.
- Moved the reusable execution support types from `codex-core` into
`codex-tools`:
- `FunctionCallError`
- `ToolPayload`
- `ToolOutput`
- Refactored core tool implementations so that execution behavior lives
on `ToolExecutor<ToolInvocation>`, while `ToolHandler` remains the
core-local extension point for hook payloads, telemetry tags, diff
consumers, and other orchestration concerns.
- Kept the registry and dispatch flow behaviorally unchanged while
making the shared/extracted boundary explicit across built-in, MCP,
dynamic, extension-backed, shell, and multi-agent tool handlers.
## Verification
- `cargo test -p codex-tools`
- `just fix -p codex-tools`
- `just fix -p codex-core`
- `cargo test -p codex-core` progressed through the updated tool
surfaces and then hit the existing unrelated multi-agent stack overflow
in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
## Summary
- move the `view_image` sandbox filesystem-read unit test onto a
temporary cwd
- keep the turn cwd and selected turn environment cwd aligned inside the
test
- avoid leaving `core/image.png` behind in the repo checkout after the
test runs
## Root cause
The test wrote `image.png` beneath `turn.cwd`, and the shared session
test helper defaults that cwd to the current repo directory when no
override is provided.
## Validation
- `just fmt`
- `cargo test -p codex-core
tools::handlers::view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads`
# Why
Hook trust happens through the TUI in `/hooks` so it can block
non-interactive use cases. This flag will allow users that are using
codex headlessly to bypass hooks when they want to.
# What
This adds one invocation-scoped escape hatch.
- the CLI flag sets a runtime-only `bypass_hook_trust` override; there
is no durable `config.toml` setting
- hook discovery still respects normal enablement, so explicitly
disabled hooks remain disabled
- we show a `--dangerously-bypass-hook-trust is enabled. Enabled hooks
may run without review for this invocation.` message on startup so
accidental use is visible in both interactive and exec flows
This keeps “enabled” and “trusted” as separate concepts in the normal
path, while giving CI/E2E callers a stable way to opt into the
exceptional path when they already control the hook set.
# Why
Linked worktrees currently load their own project hook declarations, so
the same repo can present different hook definitions depending on which
checkout is active. https://github.com/openai/codex/pull/21762 tried to
share trust by giving matching worktree hooks a shared synthetic key,
but review pointed out that divergent worktree hook definitions would
then fight over one `trusted_hash`.
Instead of introducing a second trust model, this makes linked worktrees
use the root checkout as the single source of truth for project hook
declarations. Worktree-local project config can still diverge for
unrelated settings, but project hooks now keep one real source path and
one trust state per repo.
# What
- Teach project config loading to remember the matching root-checkout
`.codex/` folder for actual linked-worktree project layers.
- Keep ordinary project config sourced from the worktree, but replace
project hook declarations with the root checkout's matching layer before
hook discovery runs, including linked-worktree layers with `.codex/` but
no local `config.toml`.
- Make hook discovery use that authoritative hook folder for both
`hooks.json` and TOML hook source paths, so linked worktrees produce the
same hook key and trust state as the root checkout.
- Cover the linked-worktree path plus regressions for missing worktree
`config.toml` and nested non-worktree project roots.
## Why
`UnavailableDummyTools` kept synthetic placeholder tools alive for
historical tool calls whose backing MCP tool was no longer available.
That path adds stale model-visible tool specs and special routing at the
point where unavailable MCP calls should use ordinary current-tool
handling. This removes the runtime backfill instead of preserving a
second compatibility lane.
## Is it safe to remove?
The unavailable tools were added in #17853 after a CS issue when a
previously-called MCP tool failed to load and was omitted from the CS
spec. Now that we have tool search, I think this is resolved:
- API merges tools from previous TST output into effective tool set so
theyre always in CS spec
- if an MCP tool surfaced by TST later becomes unavailable, the model
can still call it and it will just return model-visible error
- both TST output and function call output are dropped on compaction so
model will not remember old calls to MCP post compaction
## What changed
- Delete unavailable-tool collection, placeholder handler, router/spec
plumbing, and obsolete placeholder coverage.
- Keep `features.unavailable_dummy_tools` as a removed no-op feature
tombstone so existing configs still parse cleanly.
- Add an integration-style `tool_search` regression test showing that a
deferred MCP tool surfaced through `tool_search` still routes through
MCP and returns a model-visible tool-call error rather than `unsupported
call`.
## Verification
- `cargo test -p codex-core tool_search`
## Why
This builds on the handler-owned spec refactor by moving deferred
tool-search metadata to the same handlers that already own tool specs.
The registry builder no longer needs a separate prebuilt
`tool_search_entries` path; it can collect searchable entries from
deferred handlers directly.
## What changed
- Added `search_info()` to tool handlers and implemented it for MCP and
dynamic handlers.
- Reused handler `spec()` output when constructing tool-search entries,
adapting it into the deferred `LoadableToolSpec` shape expected by
`tool_search`.
- Simplified `build_tool_registry_builder(...)` so `tool_search`
registration is based on deferred handlers with search info.
- Removed the old standalone search-entry builders and now-unused
`codex-tools` discovery helper exports.
## Verification
- `cargo test -p codex-core tools::handlers::tool_search::tests:: --
--nocapture`
- `cargo test -p codex-core tools::spec_plan::tests::search_tool --
--nocapture`
- `cargo test -p codex-core tools::spec::tests:: -- --nocapture`
- `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture`
- `cargo test -p codex-tools`
- `just fix -p codex-core`
- `just fix -p codex-tools`
## Why
Code mode already builds the merged nested `ToolSpec`s that feed the
`exec` prompt. Keeping a separate `tool_namespaces` map in the planning
path duplicated that metadata and left extra wrapper plumbing in
`spec.rs`.
## What changed
- derive code-mode namespace descriptions from the merged
`ToolSpec::Namespace` entries before building the code-mode handlers
- extract `build_code_mode_handlers(...)` so the code-mode-specific
planning stays in one place
- remove `tool_namespaces` from `ToolRegistryBuildParams`
- delete the now-unused `McpToolPlanInputs` wrapper and related test
helper plumbing
## Testing
- `cargo test -p codex-core spec_plan`
## Why
`CODEX_RS_SSE_FIXTURE` let integration-style CLI, exec, and TUI tests
bypass the normal Responses transport by reading SSE from local files.
That kept test-only behavior wired through production client code. The
affected tests can stay hermetic by using the existing
`core_test_support::responses` mock server and passing `openai_base_url`
instead.
## What Changed
- Removed the `CODEX_RS_SSE_FIXTURE` flag,
`codex_api::stream_from_fixture`, the `env-flags` dependency, and the
checked-in SSE fixture files.
- Repointed the affected core, exec, and TUI tests at `MockServer` with
the existing SSE event constructors.
- Removed the Bazel test data plumbing for the deleted fixtures and
refreshed cargo/Bazel lock state.
## Verification
- `cargo build -p codex-cli`
- `cargo test -p codex-api`
- `cargo test -p codex-core --test all responses_api_stream_cli`
- `cargo test -p codex-core --test all
integration_creates_and_checks_session_file`
- `cargo test -p codex-exec --test all ephemeral`
- `cargo test -p codex-exec --test all resume`
- `cargo test -p codex-tui --test all
resume_startup_does_not_consume_model_availability_nux_count`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just fix -p codex-api -p codex-core -p codex-exec -p codex-tui`
- `git diff --check`
## Why
Enterprise-managed hook policy needs a narrow way to require Codex to
ignore user-controlled lifecycle hooks without adopting the broader
trust-precedence model from earlier hook work. This keeps the policy
anchored in `requirements.toml`, so admins can opt into managed hooks
only while normal `config.toml` files cannot enable the restriction
themselves.
## What changed
- Added `allow_managed_hooks_only` to the requirements data flow and
preserved explicit `false` values.
- Also adds it to /debug-config
- Marked MDM, system, and legacy managed config layers as managed for
hook discovery.
- Updated hook discovery so `allow_managed_hooks_only = true`:
- keeps managed requirements hooks and managed config-layer hooks,
- skips user/project/session `hooks.json` and `[hooks]` entries with
concise startup warnings,
- skips current unmanaged plugin hooks,
- ignores any `allow_managed_hooks_only` key placed in ordinary
`config.toml` layers.
## Why
hook semantics treat `session_id` as shared across a root session and
its subagents. Codex hooks were still emitting the current thread ID,
which made spawned agents look like independent sessions and made it
harder for hook integrations to correlate work across a root thread and
its spawned helpers
This change makes hooks use Codex's existing shared session identity so
hook `session_id` matches the root-thread session across spawned
subagents.
## What Changed
- switch hook payloads to use the existing shared session identity from
core instead of the current thread ID
- cover all hook surfaces that expose `session_id`, including
`SessionStart`, tool hooks, compact hooks, prompt-submit hooks, stop
hooks, and legacy after-agent dispatch
## Why
Guardian review selection was hard-coded in `core`, which worked for the
default OpenAI path but did not give provider implementations a way to
choose backend-specific reviewer model IDs. That matters for Amazon
Bedrock: guardian review should run through the Bedrock/Mantle provider
using Bedrock's `openai.gpt-5.4` model ID, instead of accidentally
selecting a reviewer model that implies the OpenAI backend.
## What Changed
- Added provider-owned approval review model selection via
`ModelProvider::approval_review_model_selection`.
- Moved the existing default selection policy into the provider
abstraction: prefer the requested reviewer model when it is available,
otherwise fall back to the active turn model, preferring `Low` reasoning
when supported.
- Added an Amazon Bedrock override that pins guardian review to
`openai.gpt-5.4` with `Low` reasoning.
## Why
`tool_search` still carries the server-specific result-cap path added in
#17684 for `computer-use`: when the model omitted `limit`, a matching
result expanded the search to 20 and then `limit_results_by_bucket`
applied per-bucket caps. That makes default result handling depend on a
one-off server exception instead of the single
`TOOL_SEARCH_DEFAULT_LIMIT` path.
This PR removes that custom branch so omitted `limit` values use the
ordinary global default consistently. The implementation being retired
is the pre-change bucketed search path in
[`tool_search.rs`](5e3ee5eddf/codex-rs/core/src/tools/handlers/tool_search.rs (L121-L190)).
## What changed
- Collapse `ToolSearchHandler::search` back to one BM25 search with the
resolved limit.
- Remove `limit_results_by_bucket`, the `computer-use` constants, and
the omitted-limit plumbing that only existed for the override.
- Drop dead `ToolSearchEntry::limit_bucket` metadata from deferred MCP
and dynamic search entries.
- Remove tests and helpers that only asserted the deleted override
behavior.
- Add direct handler-level unit coverage for omitted/default and
explicit `tool_search` result limits.
## Validation
- `cargo test -p codex-core tool_search`
- The matching unit tests passed, including the new omitted/default and
explicit result-limit coverage.
- The broader `--test all` search-tool fixture phase then failed before
sending mocked response requests in
`tool_search_indexes_only_enabled_non_app_mcp_tools` and
`tool_search_uses_non_app_mcp_server_instructions_as_namespace_description`.
- `cargo test -p codex-core`
- The touched tool-search coverage passed before the run later aborted
in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
with a stack overflow.
- make ThreadStore::update_thread_metadata accept a broad range of
metadata patches
- keep ThreadStore::append_items as raw canonical history append (no
metadata side effects)
- in the local store, write these metadata updates to a combination of
sqlite and rollout jsonl files for backwards-compat. It special cases
which fields need to go into jsonl vs sqlite vs whatever, confining the
awkwardness to just this implementation
- in remote stores we can simply persist the metadata directly to a
database, no special casing required.
- move the "implicit metadata updates triggered by appending rollout
items" from the RolloutRecorder (which is local-threadstore-specific) to
the LiveThread layer above the ThreadStore, inside of a private helper
utility called ThreadMetadataSync. LiveThread calls ThreadStore
append_items and update_metadata separately.
- Add a generic update metadata method to ThreadManager that works on
both live threads and "cold" threads
- Call that ThreadManager method from app server code, so app server
doesn't need to worry about whether the thread is live or not
## Why
This is the base PR in the split stack for the permissions migration. It
isolates stack-safety work that had been mixed into the larger
permissions PR, so reviewers can evaluate the async-future changes
separately from the permissions model changes in #22267.
The main risk this addresses is large or recursive multi-agent futures
overflowing smaller runner stacks. A follow-up review also called out
that `shutdown_live_agent` must remain quiescent: callers should not
remove a live agent from tracking or release its spawn slot until the
worker loop has actually terminated.
## What Changed
- Boxes the large async futures in the multi-agent spawn, resume, and
close tool handlers.
- Boxes the `AgentControl` spawn and recursive close/shutdown paths that
can otherwise build very deep futures.
- Keeps `shutdown_live_agent` waiting for thread termination before
removing/releasing the live agent, preserving the previous shutdown
ordering while still boxing the recursive close path.
## Verification Strategy
The focused local coverage was `cargo test -p codex-core multi_agents`,
which exercises the multi-agent spawn/resume/close handlers, cascade
close/resume behavior, and the shutdown path touched by this PR.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22266).
* #22330
* #22329
* #22328
* #22327
* __->__ #22266
## Summary
This refactor makes tool handlers the owner of the specs they can
publish, so registry construction can register handlers once and
separately publish only the specs that should be model-visible.
The main motivation is deferred tools: MCP and dynamic tools still need
handlers registered up front, but deferred tools should be discoverable
through `tool_search` rather than emitted in the initial tool spec list.
## What changed
- `McpHandler` and `DynamicToolHandler` can return their own `ToolSpec`.
- `build_tool_registry_builder` now collects handlers, registers them
through the no-spec path, and publishes only non-deferred handler specs.
- Deferred MCP and dynamic tool names are combined into one
`all_deferred_tools` set that drives spec filtering, code-mode
deferred-tool signaling, and `tool_search` registration.
- `tool_search` registration now requires both deferred tools and
`namespace_tools`.
- Namespace specs are merged in `spec_plan`, preserving top-level spec
order, sorting tools within each namespace, and backfilling empty
namespace descriptions.
- Hosted web search and image-generation specs are included in the
collected spec vector before namespace merge/publication, and tool-name
tests that should not care about hosted relative order now compare sets.
## Testing
- `cargo test -p codex-core tools::spec::tests:: -- --nocapture`
- `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture`
- `cargo test -p codex-core
tools::router::tests::specs_filter_deferred_dynamic_tools --
--nocapture`
- `cargo test -p codex-core
suite::prompt_caching::prompt_tools_are_consistent_across_requests --
--nocapture`
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-core -- --skip
tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
passed the library suite after skipping the known stack-overflowing unit
test.
Full `cargo test -p codex-core` currently hits a stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`;
the same focused test reproduces on `origin/main`.
## Why
Code mode only used nested spec lookup at execution time to rediscover
whether a nested tool should be invoked as a function tool or a freeform
tool.
That information is already present in the enabled tool metadata that
code mode builds to expose `tools.*` and `ALL_TOOLS`, so re-looking it
up from the router was redundant and kept execution coupled to a
separate spec lookup path.
## What Changed
- thread `CodeModeToolKind` through the code-mode runtime `ToolCall`
event and `CodeModeNestedToolCall`
- emit the nested tool kind directly from the V8 callback using the
already-enabled tool metadata
- build nested tool payloads from the propagated kind instead of calling
`find_spec`
- remove the now-unused `find_spec` plumbing from the router and
parallel runtime helpers
- add unit coverage for function vs freeform payload shaping and update
affected router tests
## Testing
- `cargo test -p codex-code-mode`
- `cargo test -p codex-core code_mode::tests`
- `cargo test -p codex-core
extension_tool_bundles_are_model_visible_and_dispatchable`
- `cargo test -p codex-core
model_visible_specs_filter_deferred_dynamic_tools`
## Summary
Adds include_collaboration_mode_instructions, which is a config
equivalent to include_permissions_instructions for collaboration modes.
Desired for situations where we want to disable this instruction from
entering the context
## Testing
- [x] Added unit test
## Why
Tool dispatch had two serialization mechanisms:
- `supports_parallel_tool_calls` decides whether a tool participates in
the shared parallel-execution lock.
- `is_mutating` separately gated some calls inside dispatch.
That second hook no longer carried its weight. The remaining
parallel-support flag is already the per-tool concurrency policy, so
keeping a second mutating gate made dispatch harder to follow and left
behind extra session plumbing that only existed for that path.
## What changed
- Removed `is_mutating` from tool handlers and deleted the
`tool_call_gate` path that existed only to support it.
- Simplified dispatch and routing to rely on the existing per-tool
`supports_parallel_tool_calls` boolean.
- Dropped the now-unused handler overrides and related session/test
scaffolding.
- Kept the router/parallel tests focused on the surviving per-tool
behavior.
- Removed the unused `codex-utils-readiness` dependency from
`codex-core` as a follow-up fix for `cargo shear`.
## Testing
- `cargo test -p codex-core
parallel_support_does_not_match_namespaced_local_tool_names`
- `cargo test -p codex-core mcp_parallel_support_uses_handler_data`
- `cargo test -p codex-core
tools_without_handlers_do_not_support_parallel`
## Summary
- tighten unified exec sandbox initialization
- preserve the requested process workdir independently from sandbox
setup
- add regression coverage for the updated invariant
## Validation
- Ran `/tmp/cargo-tools/bin/just fmt`.
- Ran the targeted `codex-core` regression test successfully.
- Ran `cargo test -p codex-core`; it did not complete cleanly because
unrelated existing agent/config-loader tests failed and the run later
aborted on a stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
Co-authored-by: Codex <noreply@openai.com>
## Why
The Codex App has animated pets, but the TUI had no equivalent ambient
companion surface. This brings that experience into terminal Codex while
keeping the main chat flow usable: the pet should feel present, but it
cannot cover transcript text, composer input, approvals, or picker
content.
The feature also needs to be terminal-aware. Different terminals support
different image protocols, tmux can interfere with image rendering, and
some users will want pets disabled entirely or anchored differently
depending on their layout.
<table>
<tr><td>
<img width="4110" height="2584" alt="CleanShot 2026-05-05 at 12 41
45@2x"
src="https://github.com/user-attachments/assets/68a1fcbc-2104-48d6-b834-69c6aaa95cdf"
/>
<p align="center">macOS - Ghostty, iTerm2 and WezTerm with Custom
Pet</p>
</td></tr>
<tr><td>
![Uploading CleanShot 2026-05-10 at 20.28.30.png…]()
<p align="center">Windows Terminal</p>
</td></tr>
<tr><td>
<img width="3902" height="2752" alt="CleanShot 2026-05-05 at 12 39
02@2x"
src="https://github.com/user-attachments/assets/300e2931-6b00-467e-91cb-ab8e28470500"
/>
<p align="center">Linux - WezTerm and Ghostty</p>
</td></tr>
</table>
## What Changed
- Add a TUI ambient pet renderer in `codex-rs/tui/src/pets/`.
- Port the app-style pet animation states so the sprite changes with
task status, waiting-for-input states, review/ready states, and
failures.
- Add `/pets` selection UI with a preview pane, loading state, built-in
pet choices, and a first-row `Disable terminal pets` option.
- Download built-in pet spritesheets on demand from the same public CDN
path already used by Android, under
`https://persistent.oaistatic.com/codex/pets/v1/...`, and cache them
locally under `~/.codex/cache/tui-pets/`.
- Keep custom pets local.
- Add config support for pet selection, disabling pets, and choosing
whether the pet follows the composer bottom or anchors to the terminal
bottom.
- Reserve layout space around the pet so transcript wrapping, live
responses, and composer input do not render underneath the sprite.
- Gate image rendering by terminal capability, disable image pets under
tmux, and support both Kitty Graphics and SIXEL terminals.
- Add redraw cleanup for terminal image artifacts, including sixel cell
clearing.
## Current Scope
- This is an initial TUI version of ambient pets, not full App parity.
- It focuses on ambient sprite rendering, `/pets` selection, custom
pets, terminal capability gating, and on-demand CDN-backed built-in
assets.
- The ambient text overlay is currently disabled, so the TUI renders the
pet sprite without extra status text beside it.
## How to Test
1. Start Codex TUI in a terminal with image support.
2. Run `/pets`.
3. Confirm the picker shows built-in pets plus custom pets, and the
first item is `Disable terminal pets`.
4. On a fresh `~/.codex/cache/tui-pets/`, move onto a built-in pet and
confirm the first preview downloads the spritesheet from the shared
Codex pets CDN and renders successfully.
5. Move through the pet list and confirm subsequent built-in previews
use the local cache.
6. Select a pet, then send and receive messages. Confirm transcript and
composer text wrap before the pet instead of rendering underneath the
sprite.
7. Change the pet anchor setting and confirm the pet can either follow
the composer bottom or sit at the terminal bottom.
8. Return to `/pets`, choose `Disable terminal pets`, and confirm the
sprite disappears cleanly.
Targeted tests:
- `cargo test -p codex-tui ambient_pet_`
- `cargo test -p codex-tui
resize_reflow_wraps_transcript_early_when_pet_is_enabled`
- `cargo insta pending-snapshots`
Part 1 of guardian as extension. This bind all the logic to spawn
another agent from an extension and it adds `ThreadId` in the start
thread collaborator
## Why
The split filesystem policy stack already supports exact and glob
`access = none` read restrictions on macOS and Linux. Windows still
needed subprocess handling for those deny-read policies without claiming
enforcement from a backend that cannot provide it.
## Key finding
The unelevated restricted-token backend cannot safely enforce deny-read
overlays. Its `WRITE_RESTRICTED` token model is authoritative for write
checks, not read denials, so this PR intentionally fails that backend
closed when deny-read overrides are present instead of claiming
unsupported enforcement.
## What changed
This PR adds the Windows deny-read enforcement layer and makes the
backend split explicit:
- Resolves Windows deny-read filesystem policy entries into concrete ACL
targets.
- Preserves exact missing paths so they can be materialized and denied
before an enforceable sandboxed process starts.
- Snapshot-expands existing glob matches into ACL targets for Windows
subprocess enforcement.
- Honors `glob_scan_max_depth` when expanding Windows deny-read globs.
- Plans both the configured lexical path and the canonical target for
existing paths so reparse-point aliases are covered.
- Threads deny-read overrides through the elevated/logon-user Windows
sandbox backend and unified exec.
- Applies elevated deny-read ACLs synchronously before command launch
rather than delegating them to the background read-grant helper.
- Reconciles persistent deny-read ACEs per sandbox principal so policy
changes do not leave stale deny-read ACLs behind.
- Fails closed on the unelevated restricted-token backend when deny-read
overrides are present, because its `WRITE_RESTRICTED` token model is not
authoritative for read denials.
## Landed prerequisites
These prerequisite PRs are already on `main`:
1. #15979 `feat(permissions): add glob deny-read policy support`
2. #18096 `feat(sandbox): add glob deny-read platform enforcement`
3. #17740 `feat(config): support managed deny-read requirements`
This PR targets `main` directly and contains only the Windows deny-read
enforcement layer.
## Implementation notes
- Exact deny-read paths remain enforceable on the elevated path even
when they do not exist yet: Windows materializes the missing path before
applying the deny ACE, so the sandboxed command cannot create and read
it during the same run.
- Existing exact deny paths are preserved lexically until the ACL
planner, which then adds the canonical target as a second ACL target
when needed. That keeps both the configured alias and the resolved
object covered.
- Windows ACLs do not consume Codex glob syntax directly, so glob
deny-read entries are expanded to the concrete matches that exist before
process launch.
- Glob traversal deduplicates directory visits within each pattern walk
to avoid cycles, without collapsing distinct lexical roots that happen
to resolve to the same target.
- Persistent deny-read ACL state is keyed by sandbox principal SID, so
cleanup only removes ACEs owned by the same backend principal.
- Deny-read ACEs are fail-closed on the elevated path: setup aborts if
mandatory deny-read ACL application fails.
- Unelevated restricted-token sessions reject deny-read overrides early
instead of running with a silently unenforceable read policy.
## Verification
- `cargo test -p codex-core
windows_restricted_token_rejects_unreadable_split_carveouts`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-windows-sandbox`
- GitHub Actions rerun is in progress on the pushed head.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`ToolRouter::tool_supports_parallel()` was still consulting configured
specs when a handler lookup missed, even though parallel schedulability
is really a property of the executable handler. Keeping that metadata on
`ConfiguredToolSpec` duplicated state between the model-visible spec
layer and the runtime handler layer.
This change makes handlers the sole source of truth for parallel tool
support and removes the extra spec wrapper that only existed to carry
duplicated metadata.
## What changed
- removed `ConfiguredToolSpec` and store plain `ToolSpec` values in the
registry/router builder path
- changed `ToolRouter::tool_supports_parallel()` to consult only the
handler registry and fall back to `false`
- simplified spec collection and test helpers to operate directly on
`ToolSpec`
- updated router/spec tests to cover handler-owned parallel behavior and
the no-handler fallback
## Validation
- `cargo test -p codex-tools`
- `cargo test -p codex-core mcp_parallel_support_uses_handler_data`
- `cargo test -p codex-core
deferred_responses_api_tool_serializes_with_defer_loading`
- `cargo test -p codex-core
tools_without_handlers_do_not_support_parallel`
- `cargo test -p codex-core
request_plugin_install_can_be_registered_without_search_tool`
## Docs
No documentation updates needed.
## Why
Older sessions can contain model-warning records persisted as `user`
messages, including the unified exec process-limit warning, the
`apply_patch`-via-`exec_command` warning, and the model-mismatch
high-risk cyber fallback warning. Those warnings are no longer produced
as conversation history items, but when old sessions compact they should
still be recognized as injected context rather than preserved as real
user turns.
## What changed
- Removed `record_model_warning` and the production paths that emitted
these warning messages into conversation history.
- Added `LegacyUnifiedExecProcessLimitWarning`,
`LegacyApplyPatchExecCommandWarning`, and `LegacyModelMismatchWarning`
contextual fragments that are used only for matching old persisted
messages.
- Registered the legacy fragments with contextual user message detection
so compaction filters them through the existing fragment path.
- Added focused compaction coverage for old warning messages being
dropped during compacted-history processing.
## Testing
- `cargo test -p codex-core warning`
- `just fix -p codex-core`
## Why
`PreToolUse` already exposes `updatedInput` in its hook output schema,
but Codex currently rejects it instead of applying the rewrite. That
leaves hook authors unable to make the documented pre-execution
adjustment to a tool call before it runs.
## What
- Accept `updatedInput` from `PreToolUse` hooks when paired with
`permissionDecision: "allow"`.
- Apply the rewritten input before dispatch so the tool executes the
updated payload, not the original one.
- Preserve the stable hook-facing compatibility shapes that
participating tool handlers expose:
- Bash-like tools (`shell`, `container.exec`, `local_shell`,
`shell_command`, `exec_command`) use `{ "command": ... }`.
- `apply_patch` exposes its patch body through the same command-shaped
hook contract.
- MCP tools expose their JSON argument object directly.
- Keep each participating tool handler responsible for translating
hook-facing `updatedInput` back into its concrete invocation shape.
## Verification
Direct Bash-like rewrite coverage:
- `pre_tool_use_rewrites_shell_before_execution`
- `pre_tool_use_rewrites_container_exec_before_execution`
- `pre_tool_use_rewrites_local_shell_before_execution`
- `pre_tool_use_rewrites_shell_command_before_execution`
- `pre_tool_use_rewrites_exec_command_before_execution`
These cases assert that each supported Bash-like surface runs only the
rewritten command while the hook still observes the original `{
"command": ... }` input.
`pre_tool_use_rewrites_apply_patch_before_execution`
- Model emits one patch.
- Hook swaps in a different patch.
- Asserts only the rewritten file is created, and the hook saw the
original patch.
`pre_tool_use_rewrites_code_mode_nested_exec_command_before_execution`
- Model runs one nested shell command from code mode.
- Hook rewrites it.
- Asserts only the rewritten command runs, and the hook saw the original
nested input.
`pre_tool_use_rewrites_mcp_tool_before_execution`
- Model calls the RMCP echo tool.
- Hook rewrites the MCP arguments.
- Asserts the MCP server receives and returns the rewritten message, not
the original one.
## Summary
- create a selected-cwd filesystem sandbox context for view_image
metadata and file reads in both local and remote environments
- add a local restricted-profile regression test for the previously
unsandboxed read path
## Validation
- just fmt
- bazel test --bes_backend= --bes_results_url= --test_output=errors
--test_filter=view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads
//codex-rs/core:core-unit-tests
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The MCP tool path had accumulated a few core-owned special cases: a
dedicated payload variant, resolver plumbing, a legacy `AfterToolUse`
translation path, and a side channel for parallel-call metadata. That
made `ToolRegistry` and the spec builder know more about MCP than they
needed to.
This change moves MCP-specific execution details back onto `ToolInfo`
and `McpHandler` so `codex-core` can treat MCP calls like normal
function calls while still preserving MCP-specific dispatch and
telemetry behavior where it belongs.
## What changed
- removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and
the remaining registry-side MCP resolver path
- stored MCP routing metadata directly on `McpHandler` and `ToolInfo`,
including `supports_parallel_tool_calls`
- deleted the legacy `AfterToolUse` consumer in `core`, which removes
the need for handler-specific `after_tool_use_payload` implementations
- switched tool-result telemetry to handler-provided tags and kept
MCP-specific dispatch payload construction inside the handler
- simplified tool spec planning/building by passing `ToolInfo` directly
and dropping the direct/deferred MCP wrapper structs and the
parallel-server side table
## Testing
- `cargo check -p codex-core -p codex-mcp -p codex-otel`
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server`
- `cargo test -p codex-core
direct_mcp_tools_register_namespaced_handlers`
- `cargo test -p codex-core
search_tool_description_lists_each_mcp_source_once`
- `cargo test -p codex-mcp
list_all_tools_uses_startup_snapshot_while_client_is_pending`
- `just fix -p codex-core -p codex-mcp -p codex-otel`
## Why
While investigating `codex exec hi` startup latency, the useful
questions were not "is startup slow?" but "which durable bucket is slow
in production?"
The path we observed has a few distinct stages:
1. `thread/start` creates the session
2. startup prewarm builds the turn context, tools, and prompt
3. startup prewarm warms the websocket
4. the first real turn resolves the prewarm
5. the model produces the first token
Before this PR, production telemetry had some of the raw measurements
already:
- aggregate startup-prewarm duration / age-at-first-turn metrics
- TTFT as a metric
- websocket request telemetry
But there was no coherent production event stream for the startup
breakdown itself, and TTFT was metric-only. That made it hard to answer
the same latency questions from OpenTelemetry-backed logs without adding
one-off local instrumentation.
## What changed
Add durable production telemetry on the existing `SessionTelemetry`
path:
- new `codex.startup_phase` OTel log/trace events plus
`codex.startup.phase.duration_ms`
- new `codex.turn_ttft` OTel log/trace events while preserving the
existing TTFT metric
The startup phase event is emitted for the coarse buckets we actually
observed while running `exec hi`:
- `thread_start_create_thread`
- `startup_prewarm_total`
- `startup_prewarm_create_turn_context`
- `startup_prewarm_build_tools`
- `startup_prewarm_build_prompt`
- `startup_prewarm_websocket_warmup`
- `startup_prewarm_resolve`
These phases are intentionally low-cardinality so they remain safe as
production telemetry tags.
## Why this shape
This keeps the instrumentation on the same production path as the rest
of the session telemetry instead of adding a local debug-only trace
mode. It also avoids changing startup behavior:
- prewarm still runs
- no control flow changes
- no extra remote calls
- no user-visible behavior changes
One boundary is intentional: very early process bootstrap that happens
before a session exists is not included here, because this PR uses
session-scoped production telemetry. The expensive buckets we were
trying to understand after `thread/start` are now covered durably.
## Verification
- `cargo test -p codex-otel`
- `cargo test -p codex-core turn_timing`
- `cargo test -p codex-core
regular_turn_emits_turn_started_without_waiting_for_startup_prewarm`
- `cargo test -p codex-core
interrupting_regular_turn_waiting_on_startup_prewarm_emits_turn_aborted`
- `cargo test -p codex-app-server thread_start`
- `just fix -p codex-otel -p codex-core -p codex-app-server`
I also ran `cargo test -p codex-core`; it built successfully and then
hit an existing unrelated stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
## Summary
- add multi-environment apply_patch routing for both freeform and
function-call tool flows
- parse and reconcile the optional environment selector in the main
apply_patch parser, then verify against the selected environment in the
handler
- carry environment_id through runtime and approval surfaces so
remote-targeted patches stay explicit end to end
## Testing
- just fmt
- remote exec-server e2e: `cargo test -p codex-core --test all
apply_patch_multi_environment_uses_remote_executor -- --nocapture` on
dev via `scripts/test-remote-env.sh`
---------
Co-authored-by: Codex <noreply@openai.com>
# Why
Managed hook configs need a shared cross-platform shape without making
the existing `command` field polymorphic. The common case is still one
command string, with Windows needing a different entrypoint only when
the runtime is actually Windows.
Keeping `command` as the portable/default path and adding an optional
Windows override keeps the config easier to read, preserves the existing
scalar shape for non-Windows users, and avoids forcing every caller into
a `{ unix, windows }` object when only one platform needs special
handling.
# What
- Add optional `command_windows` / `commandWindows` alongside the
existing hook `command` field.
- Resolve `command_windows` only on Windows during hook discovery; other
platforms continue to use `command` unchanged.
- Keep trust hashing aligned to the effective command selected for the
current runtime.
# Docs
The Codex hooks/config reference should document `command_windows` as
the Windows-only override for command hooks.
## Why
Review telemetry should describe reviews as first-class events, not only
as counters denormalized onto terminal tool-item events. That lets us
analyze guardian and user reviews consistently across command execution,
file changes, permissions, and network access, while still preserving
the terminal item summaries that existing tool analytics need.
To make those review events accurate, analytics also needs the observed
completion time for each review and enough command metadata to
distinguish `shell` from `unified_exec` reviews.
## What changed
- emit generic `codex_review_event` rows for completed user and guardian
reviews, with review subjects, reviewer, trigger, terminal status,
resolution, and observed duration
- reduce approval request / response / abort facts into review events
for command execution, file change, and permissions flows
- keep denormalized review counts, final approval outcome, and
permission-request flags on terminal tool-item events for
item-associated reviews
- plumb review completion timing so user-review responses and aborts use
app-server-observed completion times, while guardian analytics reuse the
same terminal timestamps emitted on guardian assessment events
- carry command approval `source` through the protocol and app-server
layers so review analytics can distinguish `shell` from `unified_exec`
- add analytics coverage for user-review emission, guardian-review
emission, permission reviews that should not denormalize onto tool
items, item-summary isolation across threads, and the serialized
review-event shape
## Verification
- `cargo test -p codex-analytics`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18748).
* __->__ #18748
* #21434
* #18747
* #17090
* #17089
* #20514
## Why
The permissions migration is making
`permissions.<profile>.network.enabled` the canonical sandbox network
bit, while proxy startup is a separate concern. Enabling network access
should not implicitly start the proxy, and users who are still on legacy
sandbox modes need a separate place to opt into proxy startup and
provide proxy-specific settings.
This follow-up to #19900 gives the network proxy its own feature surface
instead of overloading permission-profile network semantics.
## What changed
- Add an experimental `network_proxy` feature with a configurable
`[features.network_proxy]` table.
- Overlay `features.network_proxy` settings onto the configured proxy
state after permission-profile selection, so the proxy only starts when
the active `NetworkSandboxPolicy` already allows network access.
- Preserve `[experimental_network]` startup behavior independently of
the new feature flag.
## Behavior and examples
There are now three related knobs:
- `permissions.<profile>.network.enabled` controls whether the active
permission profile has network access at all.
- `features.network_proxy` enables proxy restrictions for an
already-network-enabled profile.
- Legacy `sandbox_mode` plus `[sandbox_workspace_write].network_access`
still control whether legacy `workspace-write` has network access at
all.
The rule is:
- network off + proxy flag on -> network stays off, proxy is a no-op
- network on + proxy flag off -> unrestricted direct network
- network on + proxy flag on -> network stays on, with proxy
restrictions applied
For permission profiles, the feature toggle adds proxy restrictions only
when network access is already enabled:
```toml
default_permissions = "workspace"
[permissions.workspace.filesystem]
":minimal" = "read"
[permissions.workspace.network]
enabled = true
[features]
network_proxy = true
```
If `network.enabled = false`, the same feature flag is a no-op: network
remains off and the proxy does not start.
For legacy sandbox config, `network_access` remains the master switch:
```toml
sandbox_mode = "workspace-write"
[sandbox_workspace_write]
network_access = true
[features]
network_proxy = true
```
That keeps legacy `workspace-write` network access on, but routes it
through the proxy policy. If `network_access = false`, the proxy feature
is a no-op and legacy `workspace-write` remains offline.
The same proxy opt-in can be supplied from the CLI:
```bash
codex -c 'features.network_proxy=true'
```
Additional proxy settings can be supplied when a table is needed:
```bash
codex \
-c 'features.network_proxy.enabled=true' \
-c 'features.network_proxy.enable_socks5=false'
```
The intended behavior matrix is:
| Config surface | Network setting | `features.network_proxy` | Direct
sandbox network | Proxy |
| --- | --- | --- | --- | --- |
| Permission profile | `network.enabled = false` | off | restricted |
off |
| Permission profile | `network.enabled = false` | on | restricted | off
|
| Permission profile | `network.enabled = true` | off | enabled | off |
| Permission profile | `network.enabled = true` | on | enabled | on |
| Legacy `workspace-write` | `network_access = false` | off | restricted
| off |
| Legacy `workspace-write` | `network_access = false` | on | restricted
| off |
| Legacy `workspace-write` | `network_access = true` | off | enabled |
off |
| Legacy `workspace-write` | `network_access = true` | on | enabled | on
|
`[experimental_network]` requirements remain separate from the user
feature toggle and still start the proxy on their own.
Relevant code:
-
[`features/src/feature_configs.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/features/src/feature_configs.rs#L58-L117)
defines the feature-specific proxy config.
-
[`core/src/config/mod.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L1959-L1964)
reads the feature table, and [later applies it only when network access
is already
enabled](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L2448-L2458).
## Verification
Added focused coverage for:
- keeping the proxy off when `features.network_proxy` is enabled but
sandbox network access is disabled
- the full permission-profile and legacy `workspace-write` matrix above
- preserving `[experimental_network]` startup without the feature
- reusing profile-supplied proxy settings when the feature is enabled
Ran:
- `cargo test -p codex-features`
- `cargo test -p codex-core network_proxy_feature`
- `cargo test -p codex-core
experimental_network_requirements_enable_proxy_without_feature`
## Why
We've added support for auth elicitation behind the auth_elicitation
flag, but servers need to explicitly check the capability before it
decides to send elicitations in order to be backward compatible. This PR
adds the capability advertising conditioned on the flag.
## What changed
- Build `client_elicitation_capability` from the `AuthElicitation`
feature state.
- Thread that capability through MCP config, session startup, and
`McpConnectionManager` so RMCP initialization advertises the correct
elicitation support.
- Advertise both `form` and `url` elicitation when the feature is
enabled, and preserve the empty default capability when it is disabled.
- Add coverage for the feature-derived config shape and the advertised
initialization payload.
## Testing
- `cargo test -p codex-mcp`
- `cargo test -p codex-core
to_mcp_config_preserves_auth_elicitation_feature_from_config`
- `cargo test -p codex-core` *(currently fails outside this change in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
with a stack overflow after unrelated tests have started running)*
## Why
Managed requirements can already centrally disable apps, but they could
not express the per-tool app approval rules that normal config already
supports. That left admins without a way to enforce connector tool
approvals through `/etc/codex/requirements.toml` or cloud requirements.
## What changed
- Extend app requirements with per-tool `approval_mode` entries.
- Merge managed app tool requirements across managed sources while
preserving higher-precedence exact tool settings.
- Apply managed tool approvals separately from user app config so
managed policy is matched only on raw MCP `tool.name`, while user config
keeps the existing raw-name-then-title convenience fallback.
- Add coverage for local requirements, cloud requirements parsing,
managed-over-user precedence, and a title-collision case that must not
widen managed auto-approval.
## Configuration shape
Local `/etc/codex/requirements.toml` and cloud requirements use the same
TOML shape:
```toml
[apps.connector_123123.tools."calendar/list_events"]
approval_mode = "approve"
```
This is a per-tool approval rule keyed by app ID and raw MCP tool name,
not an app-level boolean such as `apps.connector_123123.approve = true`.
## Why
Managed filesystem `deny_read` requirements are administrator-enforced
restrictions on specific paths. Once those requirements are active,
Codex should not drop them just because an execution path would
otherwise leave the sandbox.
Before this change, an explicit escalation, a prefix-rule allow, a
sandbox-denial retry, or an app-server legacy sandbox override could
rebuild the runtime policy without those managed read-deny entries and
expose a path the administrator had marked unreadable.
This is narrower than general sandbox-mode constraints. If an enterprise
only sets `allowed_sandbox_modes`, a trusted `prefix_rule(..., decision
= "allow")` can still run its matching command unsandboxed; this PR only
preserves managed filesystem `deny_read` restrictions across those
paths.
## What Changed
- Mark filesystem policies built from managed `deny_read` requirements
so callers can tell when those deny entries must survive escalation.
- Preserve managed deny-read entries when runtime permission profiles
are rebuilt through protocol, app-server, or legacy sandbox-policy
compatibility paths.
- Keep managed deny-read attempts inside the selected sandbox on the
first attempt and after sandbox-denial retries.
- Preserve the same behavior in the zsh-fork escalation path, including
prefix-rule-driven escalation.
- Add a regression test showing the opposite case too: without managed
deny-read, a prefix-rule allow still chooses unsandboxed execution.
## Verification
Targeted automated verification:
```shell
cargo test -p codex-core shell_request_escalation_execution_is_explicit -- --nocapture
cargo test -p codex-core prefix_rule_uses_unsandboxed_execution_without_managed_deny_read -- --nocapture
cargo test -p codex-core prefix_rule_preserves_managed_deny_read_escalation -- --nocapture
cargo test -p codex-protocol permission_profile_round_trip_preserves_filesystem_policy_metadata -- --nocapture
cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable -- --nocapture
cargo test -p codex-app-server-protocol permission_profile_file_system_permissions_preserves_policy_metadata -- --nocapture
cargo check -p codex-app-server -p codex-tui
```
Smoke-test invocations:
```shell
# macOS exact deny + allowed control
codex exec --skip-git-repo-check -C "$ROOT" \
-c 'default_permissions="deny_read_smoke"' \
-c 'permissions.deny_read_smoke.filesystem={":minimal"="read",":project_roots"={"."="write","secrets"="none","future-secret"="none","**/*.env"="none"}}' \
'Run shell commands only. Print the contents of allowed.txt. Then test whether reading secrets/exact-secret.txt succeeds without printing that file if it does. End with exactly two lines: allowed=<contents> and exact_secret=<BLOCKED or READABLE>.'
# Linux exact deny + allowed control
codex exec --skip-git-repo-check -C "$ROOT" \
-c 'default_permissions="deny_read_smoke"' \
-c 'permissions.deny_read_smoke.filesystem={":minimal"="read",glob_scan_max_depth=3,":project_roots"={"."="write","secrets"="none","future-secret"="none","**/*.env"="none"}}' \
'Run shell commands only. Print the contents of allowed.txt. Then test whether reading secrets/exact-secret.txt succeeds without printing that file if it does. End with exactly two lines: allowed=<contents> and exact_secret=<BLOCKED or READABLE>.'
```
Observed manual smoke matrix:
| Case | macOS Seatbelt | Linux bubblewrap |
| --- | --- | --- |
| `cat allowed.txt` | Pass | Pass |
| `cat secrets/exact-secret.txt` | Blocked | Blocked |
| `cat envs/root.env` | Blocked | Blocked |
| `cat envs/nested/one.env` | Blocked | Blocked |
| `cat envs/nested/two.env` | Blocked | Blocked |
| `cat alias-to-secrets/exact-secret.txt` | Blocked | Blocked |
| Missing denied path | A file created after sandbox setup remained
unreadable | Creation was blocked by the reserved missing-path
placeholder, and the placeholder was cleaned up after exit |
| Real `codex exec` shell turn | Pass | Pass |
Notes:
- The Linux smoke run used the fallback glob walker because the devbox
did not have `rg` installed.
- The smoke matrix verifies the end-to-end filesystem behavior on macOS
and Linux; the escalation-specific behavior is covered by the focused
tests above.
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charlie Marsh <charliemarsh@openai.com>