## Summary
- migrate exec-server remote registration naming from executor to
environment
- align CLI, public Rust exports, registry error messages, and relay
test fixtures with the environment registry contract
- keep the live registration path and response model consistent with
`/cloud/environment/{environment_id}/register`
## Verification
- `cargo test -p codex-exec-server
remote::tests::register_environment_posts_with_auth_provider_headers
--manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml`
- `cargo test -p codex-exec-server --test relay
multiplexed_remote_environment_routes_independent_virtual_streams
--manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml`
- `cargo check -p codex-cli --manifest-path
/Users/richardlee/code/codex/codex-rs/Cargo.toml` (still running when PR
opened; will update after completion if needed)
## Why
Code mode can use nested unified exec calls as data sources. When those
calls omit `max_output_tokens`, code mode should receive raw command
output so the script can parse or summarize it itself. When code mode
does provide `max_output_tokens`, that explicit nested budget should be
respected, including values above the default unified exec limit, rather
than being capped before code mode sees the result.
## What
- Preserve direct unified exec truncation behavior, while letting
code-mode exec/write_stdin keep `max_output_tokens` as `None` unless
explicitly supplied.
- Make code-mode tool results use raw output when no explicit limit is
present, and use the explicit nested limit directly when one is
specified.
- Refactor unified exec output formatting so `truncated_output` takes
the caller-selected token budget.
- Add e2e integration coverage for explicit nested exec limits, omitted
nested exec limits, outer exec limit propagation, omitted-limit outputs
that exceed both the default and a small truncation policy, explicit
nested limits above those caps, and high explicit limits that still
compact larger command output.
- Reuse the code-mode turn setup helper while directly asserting the
exact exec output item in each test.
## Testing
- `just fmt`
- `git diff --check`
- Not run locally per repo guidance; CI should validate the e2e
integration tests.
## Why
Issue #23214 reports `/ps` showing no background terminals while the
status line still says it is waiting for a background terminal. The race
is in core: `write_stdin` can poll a process that exits before the
response returns. The process manager correctly returns `process_id:
None`, but the handler still emitted a `TerminalInteraction` event using
the requested session id, causing clients to believe a dead process was
still being polled.
Fixes#23214.
## What changed
- Suppress `TerminalInteraction` events for empty `write_stdin` polls
once `response.process_id` is `None`.
- Continue emitting interactions for non-empty stdin, even if that input
causes the process to exit before the response returns.
- Extend the unified exec integration test to assert completed empty
polls do not emit terminal interactions.
## Verification
- `cargo test -p codex-core --test all
unified_exec_emits_one_begin_and_one_end_event`
- `cargo test -p codex-core --test all
unified_exec_emits_terminal_interaction_for_write_stdin`
`cargo test -p codex-core` currently aborts in unrelated
`agent::control::tests::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`
with a reproducible stack overflow.
## Summary
- Add `list_available_plugins_to_install` as the inventory step for
plugin and connector install suggestions.
- Slim `request_plugin_install` so it only handles the actual
elicitation, instead of carrying the full discoverable list in its
prompt.
- Emit send-time telemetry when an install elicitation is dispatched,
including requested tool identity in the event payload.
- Emit install-result telemetry through `SessionTelemetry`, including
tool type, user response action, and completion status.
- Update registration and tests to cover the new two-step flow while
keeping the existing `tool_suggest` feature gate unchanged.
## Testing
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core request_plugin_install`
- `cargo test -p codex-core list_available_plugins_to_install`
- `cargo test -p codex-core
install_suggestion_tools_can_be_registered_without_search_tool`
- `cargo test -p codex-otel
manager_records_plugin_install_suggestion_metric`
- `cargo test -p codex-otel
manager_records_plugin_install_elicitation_sent_metric`
- `just fix -p codex-core`
- `just fix -p codex-tools`
- `just fix -p codex-otel`
- `cargo check -p codex-core`
# What
`SubagentStart` runs once when Codex creates a thread-spawned subagent,
before that child sends its first model request. Thread-spawned
subagents use `SubagentStart` instead of the normal root-agent
`SessionStart` hook.
Configured handlers match on the subagent `agent_type`, using the same
value passed to `spawn_agent`. When no agent type is specified, Codex
uses the default agent type.
Hook input includes the normal session-start fields plus:
- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.
`SubagentStart` may return `hookSpecificOutput.additionalContext`. That
context is added to the child conversation before the first model
request.
# Lifecycle Scope
Only thread-spawned subagents run `SubagentStart`.
Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `SessionStart` hooks and do not run
`SubagentStart`. This avoids exposing synthetic matcher labels for
internal implementation paths.
Also the `SessionStart` hook no longer fires for subagents, this matches
behavior with other coding agents' implementation
# Stack
1. This PR: add `SubagentStart`.
2. #22873: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.
## Why
Filesystem permission profiles used `none` for deny-read entries, which
is less direct than the action the entry actually represents. This
change makes `deny` the canonical filesystem permission spelling while
preserving compatibility for older configs that still send `none`.
## What changed
- rename `FileSystemAccessMode::None` to `Deny`
- serialize and generate schemas with `deny` as the canonical value
- retain `none` only as a legacy input alias for temporary config
compatibility
- update filesystem glob diagnostics and regression coverage to use the
canonical spelling
- refresh config and app-server schema fixtures to match the new wire
shape
## Validation
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core config_toml_deserializes_permission_profiles
--lib`
- `cargo test -p codex-core
read_write_glob_patterns_still_reject_non_subpath_globs --lib`
Earlier in the session, a broad `cargo test -p codex-core` run reached
unrelated pre-existing failures in timing/snapshot/git-info tests under
this environment; the targeted surfaces touched by this PR passed
cleanly.
## Why
The v1 sub-agent tools are a single tool family, but they were exposed
as separate flat function tools. This makes the model-visible surface
less clearly grouped and leaves the legacy names in the same flat
namespace as newer agent tooling.
## What
- Wraps the v1 `spawn_agent`, `send_input`, `resume_agent`,
`wait_agent`, and `close_agent` specs in the `multi_agent_v1` namespace.
- Registers the corresponding handlers with namespaced runtime tool
names.
- Updates tool-planning, deferred tool search, and sub-agent
notification tests to assert the namespace shape and child `spawn_agent`
lookup.
## Verification
- Updated `codex-core` coverage for the v1 multi-agent tool plan,
deferred tool search output, and sub-agent tool descriptions.
Summary: defer v1 multi-agent tools when tool_search and namespace tools
are available; keep concise searchable descriptions and move the v1
usage guidance into developer instructions; add targeted coverage.
Testing: not run per request; ran just fmt.
## Why
`model_auto_compact_token_limit` has only been able to budget the full
active context. That makes it hard to set a small "growth since
compaction" budget for sessions that preserve a large carried window
prefix: the preserved prefix can consume the whole budget and force
immediate repeated compaction.
This PR adds an opt-in `body_after_prefix` scope so callers can apply
`model_auto_compact_token_limit` to sampled output and later growth
after the current carried prefix, while still forcing compaction before
the full model context window is exhausted.
## What changed
- Adds `AutoCompactTokenLimitScope` with the existing `total` behavior
as the default and a new `body_after_prefix` mode:
[`config_types.rs`](973806b1cb/codex-rs/protocol/src/config_types.rs (L24-L37)).
- Threads `model_auto_compact_token_limit_scope` through config loading,
`Config`, `core-api`, and app-server v2 schema/TypeScript generation.
- Records the first observed input-token count for a `body_after_prefix`
compaction window and uses it as the baseline when deciding whether the
scoped auto-compaction budget is exhausted:
[`turn.rs`](973806b1cb/codex-rs/core/src/session/turn.rs (L743-L781)).
- Keeps a hard context-window cap in `body_after_prefix`, so scoped
budgeting cannot let the active context overrun the usable window.
## Verification
Added compact-suite coverage for the two key behaviors:
`body_after_prefix` does not re-compact just because the carried prefix
is larger than the scoped budget, and it still compacts when the total
active context reaches the configured context window:
[`compact.rs`](973806b1cb/codex-rs/core/tests/suite/compact.rs (L3003-L3128)).
**Stack position:** [5 of 7]
## Summary
This PR adds `Op::ThreadSettings`, a queued settings-only update
mechanism for changing stored thread settings without starting a new
turn. It also removes the legacy `Op::OverrideTurnContext` in the same
layer, so reviewers can see the replacement and deletion together.
## Changes
- Add `Op::ThreadSettings` for settings-only queued updates.
- Emit `ThreadSettingsApplied` with the effective thread settings
snapshot after core applies an update.
- Route settings-only updates through the same submission queue as user
input.
- Migrate remaining `OverrideTurnContext` tests and callers to the
queued `Op::ThreadSettings` path.
- Delete `Op::OverrideTurnContext` from the core protocol and submission
loop.
This stack addresses #20656 and #22090.
## Stack
1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)
**Stack position:** [1 of 7]
## Summary
The first three PRs in this stack are a cleanup pass before the actual
thread settings API work.
Today, core has several overlapping "user input" ops: `UserInput`,
`UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how
much next-turn state they carry, which makes the later queued thread
settings update harder to reason about and review.
This PR starts that cleanup by adding the shared
`ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry
it. Existing variants remain in place here, so this layer is mostly a
behavior-preserving API shape change plus mechanical constructor
updates.
## End State After PR3
By the end of PR3, `Op::UserInput` is the only "user input" core op. It
can carry optional thread settings overrides for callers that need to
update stored defaults with a turn, while callers without updates use
empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are
deleted.
## End State After PR5
By the end of PR5, core will have only two ops for this area:
- `Op::UserInput` for user-input-bearing submissions.
- `Op::ThreadSettings` for settings-only updates.
## Stack
1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080) (this PR)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)
## Summary
- mark `ToolSearch` as removed and ignore stale config writes for its
legacy key
- make search tool exposure depend only on model capability, not a
feature toggle
- remove app-server enablement support and prune now-obsolete test
coverage/setup
## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core search_tool_requires_model_capability`
- `cargo test -p codex-app-server experimental_feature_enablement_set_`
## Notes
- This keeps the legacy config key as a no-op for compatibility while
removing the ability to toggle the behavior off cleanly.
- No developer-facing docs update outside the touched app-server README
was needed.
## Why
`TurnContextItem` is the durable baseline used to reconstruct context
diffs across resume/fork. Most of the old persisted-only fields on it
are no longer read, so keeping them in rollout snapshots adds schema
surface and state that can drift without affecting reconstruction.
`summary` is the exception: older Codex versions require it to
deserialize `turn_context` records, so keep writing a default
compatibility value until that schema surface can be removed safely.
## What changed
- Removed the unused persisted fields from `TurnContextItem`: trace ids,
user/developer instructions, output schema, and truncation policy.
- Kept `summary` with a compatibility comment and made
`TurnContext::to_turn_context_item` write `ReasoningSummary::Auto`
instead of live turn state.
- Updated rollout/context reconstruction fixtures for the retained
summary field.
## Verification
- `cargo test -p codex-protocol --lib turn_context_item`
- `cargo test -p codex-rollout
resume_candidate_matches_cwd_reads_latest_turn_context`
- `cargo test -p codex-state turn_context`
- `cargo test -p codex-core --lib
new_default_turn_captures_current_span_trace_id`
- `cargo test -p codex-core --lib
record_initial_history_resumed_turn_context_after_compaction_reestablishes_reference_context_item`
- `cargo test -p codex-core --test all
emits_warning_when_resumed_model_differs`
- `git diff --check`
## Why
The client and tool pipeline still carried compatibility code for legacy
structured shell output. Current shell and apply_patch responses are
already plain text for model consumption, so keeping a
JSON-serialization path plus shell-item rewrite logic makes the request
formatter and tests preserve a format we do not need anymore.
## What Changed
- Removed the client-side shell output rewrite from
`core/src/client_common.rs`.
- Removed the structured exec-output formatter and the shell `freeform`
switch so tool emitters use one model-facing formatter.
- Collapsed apply_patch/shell serialization tests around the remaining
plain-text output expectations and removed duplicate one-variant
parameterized cases.
- Kept the `ApplyPatchModelOutput::ShellCommandViaHeredoc` compatibility
input shape, but no longer treats it as a separate output-format mode.
## Validation
- `cargo test -p codex-core client_common`
- `cargo test -p codex-core shell_serialization`
- `cargo test -p codex-core apply_patch_cli`
- `just fix -p codex-core`
## Documentation
No external Codex documentation update is needed.
## Why
`SandboxPolicy` is a legacy compatibility shape, but several core tests
still used it for ordinary turn setup even when the runtime path now
carries `PermissionProfile`. With the first cleanup PR merged, this
follow-up trims more core test scaffolding so remaining `SandboxPolicy`
matches are easier to classify as production compatibility,
legacy-boundary coverage, or explicit conversion tests.
## What Changed
- Updated apply-patch handler and runtime tests to pass
`PermissionProfile` directly.
- Changed sandboxing test helpers to build permission profiles without
first creating `SandboxPolicy` values.
- Converted request-permissions integration turns to pass
`PermissionProfile` through the test helper, leaving legacy sandbox
projection at the `Op::UserTurn` boundary.
- Converted unified exec integration helpers and direct turn submissions
to use `PermissionProfile` values instead of `SandboxPolicy` setup.
- Removed now-unused `SandboxPolicy` imports from the touched core
tests.
## Test Plan
- `just fmt`
- `cargo test -p codex-core --lib tools::sandboxing::tests`
- `cargo test -p codex-core --lib tools::runtimes::apply_patch::tests`
- `cargo test -p codex-core --lib tools::handlers::apply_patch::tests`
- `cargo test -p codex-core --lib unified_exec::process_manager::tests`
- `cargo test -p codex-core --test all request_permissions::`
- `cargo test -p codex-core --test all unified_exec::`
- `just fix -p codex-core`
## Why
The `spawn_agent` model override guidance is uncapped and bloating
context. We need to trim down each entry and cap total entries.
picked 5 as cap, we can change
## What changed
- Cap the model override summaries shown in `spawn_agent` to the first 5
picker-visible models, preserving the existing priority ordering from
the models manager.
- Condense each rendered entry to the actionable pieces the model needs:
- use the model slug as the label
- render compact reasoning effort lists with the default marked inline
- render only service tier IDs, and omit the clause when no tiers are
available
- Update coverage so the compact formatter shape and the top-5 cap are
exercised, and keep the end-to-end request assertion aligned with real
model metadata.
## Example
Before:
`- gpt-5.4 ('gpt-5.4\'): Strong model for everyday coding. Default
reasoning effort: medium. Supported reasoning efforts: low (Fast
responses with lighter reasoning), medium (Balances speed and reasoning
depth for everyday tasks), high (Greater reasoning depth for complex
problems), xhigh (Extra high reasoning depth for complex problems).
Supported service tiers: priority (Fast: 1.5x speed, increased usage).`
After:
`- 'gpt-5.4': Strong model for everyday coding. Reasoning efforts: low,
medium (default), high, xhigh. Service tiers: priority.`
## Why
`SandboxPolicy` is now a legacy compatibility shape, but several tests
still built a `SandboxPolicy` only to immediately convert it into
`PermissionProfile` for APIs that already accept canonical runtime
permissions. Those detours make it harder to audit where legacy sandbox
policy is still required, because boundary-only usages are mixed
together with ordinary test setup.
## What Changed
- Updated tests in `codex-core`, `codex-exec`, `codex-analytics`, and
`codex-config` to construct `PermissionProfile` values directly when the
code under test takes a permission profile.
- Changed exec-policy, request-permissions, session, and sandbox test
helpers to pass `PermissionProfile` through instead of converting from
`SandboxPolicy` internally.
- Left `SandboxPolicy` in place where tests are explicitly exercising
legacy compatibility or request/response boundaries.
## Test Plan
- `cargo test -p codex-analytics -p codex-config`
- `cargo test -p codex-core --lib safety::tests`
- `cargo test -p codex-core --lib exec_policy::tests::`
- `cargo test -p codex-core --lib exec::tests`
- `cargo test -p codex-core --lib guardian_review_session_config`
- `cargo test -p codex-core --lib tools::network_approval::tests`
- `cargo test -p codex-core --lib
tools::runtimes::shell::unix_escalation::tests`
- `cargo test -p codex-core --lib managed_network`
- `cargo test -p codex-core --test all request_permissions::`
- `cargo test -p codex-exec sandbox`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23030).
* #23036
* __->__ #23030
## Summary
- Add optional image detail to user image inputs across core, app-server
v2, thread history/event mapping, and the generated app-server
schemas/types.
- Preserve requested detail when serializing Responses image inputs:
omitted detail stays on the existing `high` default, while explicit
`original` keeps local images on the original-resolution path.
- Support `high`/`original` consistently for tool image outputs,
including MCP `codex/imageDetail`, code-mode image helpers, and
`view_image`.
## Why
Remote compaction v2 is the `/responses` implementation of
session-history compaction, but it still needs to preserve the
observable contract of the legacy `/responses/compact` path. In
particular, users and integrations that rely on `PreCompact` and
`PostCompact` hooks should not see different behavior when
`remote_compaction_v2` is enabled.
## What Changed
- Runs `PreCompact` before issuing the remote compaction v2 request,
including `Interrupted` analytics when a pre-hook stops execution.
- Runs `PostCompact` after a successful v2 compaction and aborts the
turn if the post-hook stops execution.
- Adds `compact_remote_parity` coverage that compares legacy and v2
compaction across manual transcript shapes, automatic pre-turn
compaction, automatic mid-turn compaction, hook payloads, replacement
history, follow-up request payloads, and API-key `service_tier=fast`
behavior.
- Registers the new parity suite under `core/tests/suite`.
Relevant code:
-
[`compact_remote_v2.rs`](af63745cb5/codex-rs/core/src/compact_remote_v2.rs)
-
[`compact_remote_parity.rs`](af63745cb5/codex-rs/core/tests/suite/compact_remote_parity.rs)
## Verification
- Added `core/tests/suite/compact_remote_parity.rs` to assert parity
between legacy remote compaction and remote compaction v2 for the
affected request, hook, rollout-history, and follow-up paths.
- Existing `compact_remote_v2` unit coverage still exercises v2
replacement-history retention and compaction-output collection.
## Why
Remote compaction v2 was still using `context_compaction` as both the
request trigger and the compacted output shape. The Responses API now
has the landed contract for this flow: Codex sends a dedicated `{
"type": "compaction_trigger" }` input item, and the backend returns the
standard `compaction` output item with encrypted content.
This aligns the v2 path with that wire contract while preserving the
existing local compacted-history post-processing behavior.
## What changed
- Add `ResponseItem::CompactionTrigger` and regenerate the app-server
protocol schema fixtures.
- Send `compaction_trigger` from `remote_compaction_v2` instead of a
payload-less `context_compaction`.
- Collect exactly one backend `compaction` output item, then reuse the
existing compacted-history rebuilding path.
- Treat the trigger item as a transient request marker rather than model
output or persisted rollout/memory content.
## Verification
- `cargo test -p codex-protocol compaction_trigger`
- `cargo test -p codex-core remote_compact_v2`
- `cargo test -p codex-core compact_remote_v2`
- `cargo test -p codex-core
responses_websocket_sends_response_processed_after_remote_compaction_v2`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol schema_fixtures`
## Why
This PR builds on [#22610](https://github.com/openai/codex/pull/22610)
and is the app-server side of the migration from mutable per-turn
`SandboxPolicy` replacement toward selecting immutable permission
profiles by id plus mutable runtime workspace roots.
Once permission profiles can carry their own immutable
`workspace_roots`, app-server no longer needs to mutate the selected
`PermissionProfile` just to represent thread-specific filesystem
context. The mutable part now lives on the thread as explicit
`runtimeWorkspaceRoots`, while `:workspace_roots` remains symbolic until
the sandbox is realized for a turn.
## What Changed
- Replaced the v2 permission-selection wrapper surface with plain
profile ids for `thread/start`, `thread/resume`, `thread/fork`, and
`turn/start`.
- Removed the API surface for profile modifications
(`PermissionProfileSelectionParams`,
`PermissionProfileModificationParams`,
`ActivePermissionProfileModification`).
- Added experimental `runtimeWorkspaceRoots` fields to the thread
lifecycle and turn-start APIs.
- Threaded runtime workspace roots through core session/thread
snapshots, turn overrides, app-server request handling, and command
execution permission resolution.
- Kept session permission state symbolic so later runtime root updates
and cwd-only implicit-root retargeting rebind `:workspace_roots`
correctly.
- Updated the embedded clients just enough to send and restore the new
thread state.
- Refreshed the generated schema/TypeScript artifacts and the app-server
README to match the new contract.
## Verification
Targeted coverage for this layer lives in:
- `codex-rs/app-server-protocol/src/protocol/v2/tests.rs`
- `codex-rs/app-server/tests/suite/v2/thread_start.rs`
- `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
- `codex-rs/app-server/tests/suite/v2/turn_start.rs`
- `codex-rs/core/src/session/tests.rs`
The key regression checks exercise that:
- `runtimeWorkspaceRoots` resolve against the effective cwd on thread
start.
- Profile-declared workspace roots are excluded from the runtime
workspace roots returned by app-server.
- A turn-level runtime workspace-root update persists onto the thread
and is returned by `thread/resume`.
- A named permission profile selected on one turn remains symbolic so a
later runtime-root-only turn update changes the actual sandbox writes.
- A cwd-only turn update retargets the implicit runtime cwd root while
preserving additional runtime roots.
- The protocol fixtures and generated client artifacts stay in sync with
the string-based permission selection contract.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22611).
* #22612
* __->__ #22611
## Summary
- add the missing response.created event to the mocked empty follow-up
response in the compact rollback test
- keep the fix scoped to the flaky mocked stream shape, without
increasing timeouts
## Recent flakes on main
- `snapshot_rollback_followup_turn_trims_context_updates` failed in
`rust-ci-full` on `main` in the Ubuntu remote test job on 2026-05-14:
https://github.com/openai/codex/actions/runs/25891434395/job/76095284830
- The same `compact_resume_fork` suite also failed recently on `main`
with `snapshot_rollback_past_compaction_replays_append_only_history`,
which has the same mocked Responses stream shape sensitivity this PR is
tightening:
https://github.com/openai/codex/actions/runs/25892437363/job/76098329098
## Verification
- env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test
all snapshot_rollback_followup_turn_trims_context_updates -- --nocapture
- repeated the same focused test 3 consecutive times locally
- UV_CACHE_DIR=/private/tmp/uv-cache-codex-fmt just fmt
## Why
- Similar change as https://github.com/openai/codex/pull/21219
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with various key values.
- Issue: MCP servers currently do not know if user input was requested
during the turn (Ex: Model decides to prompt the user for approval
mid-turn before making a possibly risky tool call). MCP servers may want
to know this when tracking latency metrics because these instances are
inflated.
## What Changed
- With change: MCP turn metadata now includes
`user_input_requested_during_turn` when a model-visible
`request_user_input` call happened earlier in the turn, propagated in
`_meta["x-codex-turn-metadata"]`.
- `mark_turn_user_input_requested()` is called when user input is
requested through either MCP elicitation (`mcp.rs`) or the
`request_user_input` tool (`mod.rs`).
- MCP tool call `_meta` is now built immediately before execution
(`mcp_tool_call.rs`) so user input requested earlier in the same turn,
including within the same tool call via elicitation, is reflected in the
metadata.
- Normal `/responses` turn metadata headers are unchanged.
## Verification
- `codex-rs/core/src/session/mcp_tests.rs`
- `codex-rs/core/src/tools/handlers/request_user_input_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
## Why
This is the configuration/model half of the alternative permissions
migration we discussed as a comparison point for
[#22401](https://github.com/openai/codex/pull/22401) and
[#22402](https://github.com/openai/codex/pull/22402).
The old `workspace-write` model mixes three concerns that we want to
keep separate:
- reusable profile rules that should stay immutable once selected
- user/runtime workspace roots from `cwd`, `--add-dir`, and legacy
workspace-write config
- internal Codex writable roots such as memories, which should not be
shown as user workspace roots
This PR gives permission profiles first-class `workspace_roots` so users
can opt multiple repositories into the same `:workspace_roots` rules
without using broad absolute-path write grants. It also starts
separating the raw selected profile from the effective runtime profile
by making `Permissions` expose explicit accessors instead of public
mutable fields.
A representative `config.toml` looks like this:
```toml
default_permissions = "dev"
[permissions.dev.workspace_roots]
"~/code/openai" = true
"~/code/developers-website" = true
[permissions.dev.filesystem.":workspace_roots"]
"." = "write"
".codex" = "read"
".git" = "read"
".vscode" = "read"
```
If Codex starts in `~/code/codex` with that profile selected, the
effective workspace-root set becomes:
- `~/code/codex` from the runtime `cwd`
- `~/code/openai` from the profile
- `~/code/developers-website` from the profile
The `:workspace_roots` rules are materialized across each root, so
`.git`, `.codex`, and `.vscode` stay scoped the same way everywhere.
Runtime additions such as `--add-dir` can still layer on later stack
entries without mutating the selected profile.
## Stack Shape
This PR intentionally stops before the profile-identity cleanup in
[#22683](https://github.com/openai/codex/pull/22683) so the base review
stays focused on config loading, workspace-root materialization, and
compatibility with legacy `workspace-write`.
The representation in this PR is therefore transitional: `Permissions`
carries enough state to distinguish the raw constrained profile from the
effective runtime profile, and there are still call sites that must keep
the active profile identity and constrained profile value in sync. The
follow-up PR replaces that with a single resolved profile state
(`ResolvedPermissionProfile` / `PermissionProfileState`) that keeps the
profile id, immutable `PermissionProfile`, and profile-declared
workspace roots together. That follow-up removes APIs such as
`set_constrained_permission_profile_with_active_profile()` where
separate arguments could drift out of sync.
Downstream PRs then build on this base to switch app-server turn updates
to profile ids plus runtime workspace roots and to finish the
user-visible summary behavior. Reviewers should judge this PR as the
workspace-roots foundation, not as the final in-memory shape of selected
permission profiles.
## Review Guide
Suggested review order:
1. Start with `codex-rs/core/src/config/mod.rs`.
This is the main shape change in the base slice. `Permissions` now
stores a private raw `Constrained<PermissionProfile>` plus runtime
`workspace_roots`. Callers use `permission_profile()` when they need the
raw constrained value and `effective_permission_profile()` when they
need a materialized runtime profile. As noted above,
[#22683](https://github.com/openai/codex/pull/22683) replaces this
transitional shape with a resolved profile state that keeps identity and
profile data together.
2. Review `codex-rs/config/src/permissions_toml.rs` and
`codex-rs/core/src/config/permissions.rs`.
These add `[permissions.<id>.workspace_roots]`, resolve enabled entries
relative to the policy cwd, and keep `:workspace_roots` deny-read glob
patterns symbolic until the actual roots are known.
3. Review `codex-rs/protocol/src/permissions.rs` and
`codex-rs/protocol/src/models.rs`.
These add the policy/profile materialization helpers that expand exact
`:workspace_roots` entries and scoped deny-read globs over every
workspace root. This is also where `ActivePermissionProfileModification`
is removed from the core model.
4. Review the legacy bridge in
`Config::load_from_base_config_with_overrides` and
`Config::set_legacy_sandbox_policy`.
This is where legacy `workspace-write` roots become runtime workspace
roots, while Codex internal writable roots stay internal and do not
appear as user-facing workspace roots.
5. Then skim downstream call sites.
The interesting pattern is raw-vs-effective access: state/proxy/bwrap
paths keep the raw constrained profile, while execution, summaries, and
user-visible status use the effective profile and workspace-root list.
## What Changed
- added `[permissions.<id>.workspace_roots]` to the config model and
schema
- added runtime `workspace_roots` state to `Config`/`Permissions` and
`ConfigOverrides`
- made `Permissions` profile fields private and replaced direct mutation
with accessors/setters
- added `PermissionProfile` and `FileSystemSandboxPolicy` helpers for
materializing `:workspace_roots` exact paths and deny-read globs across
all roots
- moved legacy additional writable roots into runtime workspace-root
state instead of active profile modifications
- removed `ActivePermissionProfileModification` and its app-server
protocol/schema export
- updated sandbox/status summary paths so internal writable roots are
not reported as user workspace roots
## Verification Strategy
The targeted tests cover the behavior at the layers where regressions
are most likely:
- `codex-rs/core/src/config/config_tests.rs` verifies config loading,
legacy workspace-root seeding, effective profile materialization, and
memory-root handling.
- `codex-rs/core/src/config/permissions_tests.rs` verifies profile
`workspace_roots` parsing and `:workspace_roots` scoped/glob
compilation.
- `codex-rs/protocol/src/permissions.rs` unit tests verify exact and
glob materialization over multiple workspace roots.
- `codex-rs/tui/src/status/tests.rs` and
`codex-rs/utils/sandbox-summary/src/sandbox_summary.rs` verify the
user-facing summaries show effective workspace roots and hide internal
writes.
I also ran `cargo check --tests` locally after the latest stack refresh
to catch cross-crate API breakage from the private-field/accessor
changes.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22610).
* #22612
* #22611
* #22683
* __->__ #22610
## Summary
Remove the deprecated `experimental_instructions_file` config setting
from the typed config surface and the remaining deprecation-notice
plumbing. `model_instructions_file` remains the supported setting and
its loading path is unchanged.
The setting was deprecated when it was renamed to
`model_instructions_file` on January 20, 2026 in
https://github.com/openai/codex/pull/9555.
## Changes
- Remove `experimental_instructions_file` from `ConfigToml` and
`ConfigProfile`.
- Delete the custom config-layer scan and session deprecation notice for
the removed setting.
- Stop clearing the removed field from generated session config locks.
- Remove the obsolete deprecation-notice test case while keeping
`model_instructions_file` coverage intact.
## Validation
- `just write-config-schema`
- `just fmt`
- `cargo test -p codex-config`
- `cargo test -p codex-core model_instructions_file`
- `just fix -p codex-core`
- `git diff --check`
Co-authored-by: Codex <noreply@openai.com>
## Why
The Responses API test support already has structured SSE event
builders. Keeping separate JSON fixture loaders made small mock streams
harder to read and left an on-disk fixture for a single event.
## What changed
- Removed `load_sse_fixture` and `load_sse_fixture_with_id_from_str`
from `core_test_support`.
- Deleted the one `tests/fixtures/incomplete_sse.json` Responses API
fixture.
- Replaced the remaining call sites with `responses::sse(...)` and
existing event helpers.
## Validation
- `cargo test -p codex-core --test all
stream_no_completed::retries_on_early_close`
- `cargo test -p codex-core --test all
history_dedupes_streamed_and_final_messages_across_turns`
- `cargo test -p codex-core --test all review::`
## Summary
Removes the feature since this is effectively on by default in all cases
where we should use it, or can be configured via models.json.
## Testing
- [x] unit tests pass
## Why
Some core integration-test paths were creating Codex state under ambient
`~/.codex`. In environments where `HOME=/tmp`, that showed up as
`/tmp/.codex`, which is host-level shared state and makes these tests
environment/order sensitive.
The affected paths were:
- `core/tests/suite/live_cli.rs`: `run_live()` spawned the real CLI with
a temp cwd, but without an isolated home, so the child resolved Codex
home from ambient `HOME`.
- core / exec-server integration test binaries using
`configure_test_binary_dispatch(...)`: their startup ctor installs arg0
helper aliases like `apply_patch` and `codex-linux-sandbox`. Full
`arg0_dispatch()` also installs aliases from ambient Codex-home
resolution, so test-binary startup could create `CODEX_HOME/tmp/arg0`;
with `HOME=/tmp`, that became `/tmp/.codex/tmp/arg0/...`.
## What changed
- `live_cli` now gives the spawned CLI a temp `HOME` and temp
`CODEX_HOME`.
- arg0 alias setup now has an explicit-home form,
`prepend_path_entry_for_codex_aliases_in(...)`, so test helpers can
place alias state under a temp directory without relying on ambient
`CODEX_HOME`.
- helper re-entry behavior is preserved with
`dispatch_arg0_if_needed()`, so aliases like `apply_patch` and
`codex-linux-sandbox` still dispatch correctly before test alias
installation.
- core test support keeps the temp Codex home alive for the lifetime of
the test binary, matching the alias lifetime.
## Verification
Verified on `dev2` with `HOME=/tmp` that the focused core test-binary
startup path no longer recreates `/tmp/.codex`.
Also checked the exact `live_cli` test path under `HOME=/tmp`; on `dev2`
it still hits the existing remote-only `cargo_bin("codex-rs")`
resolution failure before spawning the child, but `/tmp/.codex` remains
absent after the run.
## Why
The Docker remote-env coverage was failing before it reached the
behavior those tests are meant to exercise. The remote-aware test
fixture only registered the remote environment, so tests that
intentionally select both `local` and `remote` could not start a turn.
After that was fixed, two tests exposed stale fixtures: the approval
test was auto-approving under workspace-write, and the remote
`view_image` test was writing invalid PNG bytes.
## What Changed
- Added `EnvironmentManager::create_for_tests_with_local(...)` so tests
can keep the provider default while also selecting `local` explicitly.
- Updated `build_remote_aware()` to use that test-only manager when a
remote exec-server URL is present.
- Changed the remote apply-patch approval helper to use
`SandboxPolicy::new_read_only_policy()` so the test actually exercises
approval caching per environment.
- Replaced the hardcoded remote `view_image` PNG blob with the existing
`png_bytes(...)` helper so the test uses a valid image fixture.
## Validation
Ran these isolated Docker remote-env tests on the devbox with
`$remote-tests` setup:
-
`suite::remote_env::apply_patch_freeform_routes_to_selected_remote_environment`
-
`suite::remote_env::apply_patch_approvals_are_remembered_per_environment`
-
`suite::remote_env::apply_patch_intercepted_exec_command_routes_to_selected_remote_environment`
-
`suite::remote_env::exec_command_routes_to_selected_remote_environment`
- `suite::view_image::view_image_routes_to_selected_remote_environment`
All five pass.
## Why
Some MCP OAuth providers require a pre-registered public client ID and
cannot rely on dynamic client registration. Codex already supports MCP
OAuth, but it had no way to supply that client ID from config into the
PKCE flow.
## What changed
- add `oauth.client_id` under `[mcp_servers.<server>]` config, including
config editing and schema generation
- thread the configured client ID through CLI, app-server, plugin login,
and MCP skill dependency OAuth entrypoints
- configure RMCP authorization with the explicit client when present,
while preserving the existing dynamic-registration path when it is
absent
- add focused coverage for config parsing/serialization and OAuth URL
generation
## Verification
- `cargo test -p codex-config -p codex-rmcp-client -p codex-mcp -p
codex-core-plugins`
- `cargo test -p codex-core blocking_replace_mcp_servers_round_trips
--lib`
- `cargo test -p codex-core
replace_mcp_servers_streamable_http_serializes_oauth_resource --lib`
- `cargo test -p codex-core config_schema_matches_fixture --lib`
## Notes
Broader local package runs still hit unrelated pre-existing stack
overflows in:
- `codex-app-server::in_process_start_clamps_zero_channel_capacity`
-
`codex-core::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`
## Why
Some sandboxed integration tests enabled both ambient temp roots
(`TMPDIR` and literal `/tmp`) even though they were not testing
temp-root behavior. On Linux bwrap, making `/tmp` writable causes
protected metadata mount targets such as `/tmp/.git`, `/tmp/.agents`,
and `/tmp/.codex` to be synthesized. If a run is interrupted, those
top-level markers can be left behind and contaminate later tests.
## What changed
For the incidental integration tests that do not need ambient temp-root
access, set `exclude_tmpdir_env_var` and `exclude_slash_tmp` to `true`.
Dedicated protected-metadata coverage remains in the lower-level sandbox
tests that use isolated temp roots.
## Verification
Focused remote devbox repros passed with a watcher polling `/tmp/.git`,
`/tmp/.agents`, and `/tmp/.codex`; no leaked markers were observed.
## Why
`--profile-v2 <name>` gives launchers and runtime entry points a named
profile config without making each profile duplicate the base user
config. The base `$CODEX_HOME/config.toml` still loads first, then
`$CODEX_HOME/<name>.config.toml` layers above it and becomes the active
writable user config for that session.
That keeps shared defaults, plugin/MCP setup, and managed/user
constraints in one place while letting a named profile override only the
pieces that need to differ.
## What Changed
- Added the shared `--profile-v2 <name>` runtime option with validated
plain names, now represented by `ProfileV2Name`.
- Extended config layer state so the base user config and selected
profile config are both `User` layers; APIs expose the active user layer
and merged effective user config.
- Threaded profile selection through runtime entry points: `codex`,
`codex exec`, `codex review`, `codex resume`, `codex fork`, and `codex
debug prompt-input`.
- Made user-facing config writes go to the selected profile file when
active, including TUI/settings persistence, app-server config writes,
and MCP/app tool approval persistence.
- Made plugin, marketplace, MCP, hooks, and config reload paths read
from the merged user config so base and profile layers both participate.
- Updated app-server config layer schemas to mark profile-backed user
layers.
## Limits
`--profile-v2` is still rejected for config-management subcommands such
as feature, MCP, and marketplace edits. Those paths remain tied to the
base `config.toml` until they have explicit profile-selection semantics.
Some adjacent background writes may still update base or global state
rather than the selected profile:
- marketplace auto-upgrade metadata
- automatic MCP dependency installs from skills
- remote plugin sync or uninstall config edits
- personality migration marker/default writes
## Verification
Added targeted coverage for profile name validation, layer
ordering/merging, selected-profile writes, app-server config writes,
session hot reload, plugin config merging, hooks/config fixture updates,
and MCP/app approval persistence.
---------
Co-authored-by: Codex <noreply@openai.com>
# Why
`PreToolUse.additionalContext` became model-visible after #20692, but
the hook-output spilling path from #21069 never picked up that newer
lane. As a result, oversized `PreToolUse` context could bypass the
truncation/spill treatment that already applies to the other hook
outputs Codex forwards to the model.
# What
- Run `PreToolUseOutcome.additional_contexts` through
`maybe_spill_texts(...)`
- Add an integration test proving a large `PreToolUse.additionalContext`
is replaced with a truncated preview plus spill-file pointer, while the
full text is preserved on disk.
## Why
Recent session history showed no active use of the raw `shell`,
`local_shell`, or `container.exec` execution surfaces. Keeping those
handlers/specs wired into core leaves duplicate shell execution paths
alongside the supported `shell_command` and unified exec tools.
## What changed
- Removed the raw `shell` handler/spec and its `ShellToolCallParams`
protocol helper.
- Removed the legacy `local_shell` and `container.exec` handler/spec
plumbing while preserving persisted-history compatibility for old
response items.
- Normalized model/config `default` and `local` shell selections to
`shell_command`.
- Pruned tests that exercised removed raw-shell/local-shell/apply-patch
variants and kept coverage on `shell_command`, unified exec, and
freeform `apply_patch`.
## Verification
- `git diff --check`
- `cargo test -p codex-protocol`
- `cargo test -p codex-tools`
- `cargo test -p codex-core tools::handlers::shell`
- `cargo test -p codex-core tools::spec`
- `cargo test -p codex-core tools::router`
- `cargo test -p codex-core
active_call_preserves_triggering_command_context`
- `cargo test -p codex-core guardian_tests`
- `cargo test -p codex-core --test all shell_serialization`
- `cargo test -p codex-core --test all apply_patch_cli`
- `cargo test -p codex-core --test all shell_command_`
- `cargo test -p codex-core --test all local_shell`
- `cargo test -p codex-core --test all otel::`
- `cargo test -p codex-core --test all hooks::`
- `just fix -p codex-core`
- `just fix -p codex-tools`
## Why
Stop sending duplicate `session_id`/`thread_id` headers. We only want
the hyphenated forms as `_` is rejected by some proxies
Related discussion here:
https://openai.slack.com/archives/C095U48JNL9/p1778508316923179
## What
- Keep `session-id` and `thread-id`
- Remove the underscore aliases
## Why
Spawned agents can already override `model` and `reasoning_effort`, but
they have no equivalent way to opt into a model-supported service tier.
That makes it impossible to preserve or intentionally select tiered
execution behavior when delegating work to a sub-agent, even though the
model catalog already advertises supported `service_tiers`.
## What changed
- Add optional `service_tier` to both legacy and `MultiAgentV2`
`spawn_agent` tool inputs.
- Show each picker-visible model's supported service tier ids and
descriptions in the `spawn_agent` tool guidance.
- Resolve service tier selection after the child agent's effective model
is known.
- Inherit the parent tier when omitted and still supported by the final
child model; otherwise clear it.
- Reject explicit unsupported tier requests with a model-facing error.
- Keep explicit `service_tier` usable on full-history forks, while still
honoring the existing model/reasoning fork restrictions.
- Hide `service_tier` alongside other spawn metadata when
`hide_spawn_agent_metadata` is enabled.
## Verification
Added focused coverage for:
- v1/v2 `spawn_agent` schema exposure for `service_tier`
- tier descriptions in spawn guidance
- hidden-metadata suppression
- explicit supported tier selection
- explicit unknown and unsupported tier rejection
- inherited tier preservation or clearing based on child-model support
- full-history fork acceptance for explicit service tiers in both v1 and
v2
Local Rust tests were not run in this workspace per repo guidance; the
new coverage is included for CI.
## Why
`UnavailableDummyTools` kept synthetic placeholder tools alive for
historical tool calls whose backing MCP tool was no longer available.
That path adds stale model-visible tool specs and special routing at the
point where unavailable MCP calls should use ordinary current-tool
handling. This removes the runtime backfill instead of preserving a
second compatibility lane.
## Is it safe to remove?
The unavailable tools were added in #17853 after a CS issue when a
previously-called MCP tool failed to load and was omitted from the CS
spec. Now that we have tool search, I think this is resolved:
- API merges tools from previous TST output into effective tool set so
theyre always in CS spec
- if an MCP tool surfaced by TST later becomes unavailable, the model
can still call it and it will just return model-visible error
- both TST output and function call output are dropped on compaction so
model will not remember old calls to MCP post compaction
## What changed
- Delete unavailable-tool collection, placeholder handler, router/spec
plumbing, and obsolete placeholder coverage.
- Keep `features.unavailable_dummy_tools` as a removed no-op feature
tombstone so existing configs still parse cleanly.
- Add an integration-style `tool_search` regression test showing that a
deferred MCP tool surfaced through `tool_search` still routes through
MCP and returns a model-visible tool-call error rather than `unsupported
call`.
## Verification
- `cargo test -p codex-core tool_search`
## Why
`CODEX_RS_SSE_FIXTURE` let integration-style CLI, exec, and TUI tests
bypass the normal Responses transport by reading SSE from local files.
That kept test-only behavior wired through production client code. The
affected tests can stay hermetic by using the existing
`core_test_support::responses` mock server and passing `openai_base_url`
instead.
## What Changed
- Removed the `CODEX_RS_SSE_FIXTURE` flag,
`codex_api::stream_from_fixture`, the `env-flags` dependency, and the
checked-in SSE fixture files.
- Repointed the affected core, exec, and TUI tests at `MockServer` with
the existing SSE event constructors.
- Removed the Bazel test data plumbing for the deleted fixtures and
refreshed cargo/Bazel lock state.
## Verification
- `cargo build -p codex-cli`
- `cargo test -p codex-api`
- `cargo test -p codex-core --test all responses_api_stream_cli`
- `cargo test -p codex-core --test all
integration_creates_and_checks_session_file`
- `cargo test -p codex-exec --test all ephemeral`
- `cargo test -p codex-exec --test all resume`
- `cargo test -p codex-tui --test all
resume_startup_does_not_consume_model_availability_nux_count`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just fix -p codex-api -p codex-core -p codex-exec -p codex-tui`
- `git diff --check`
- make ThreadStore::update_thread_metadata accept a broad range of
metadata patches
- keep ThreadStore::append_items as raw canonical history append (no
metadata side effects)
- in the local store, write these metadata updates to a combination of
sqlite and rollout jsonl files for backwards-compat. It special cases
which fields need to go into jsonl vs sqlite vs whatever, confining the
awkwardness to just this implementation
- in remote stores we can simply persist the metadata directly to a
database, no special casing required.
- move the "implicit metadata updates triggered by appending rollout
items" from the RolloutRecorder (which is local-threadstore-specific) to
the LiveThread layer above the ThreadStore, inside of a private helper
utility called ThreadMetadataSync. LiveThread calls ThreadStore
append_items and update_metadata separately.
- Add a generic update metadata method to ThreadManager that works on
both live threads and "cold" threads
- Call that ThreadManager method from app server code, so app server
doesn't need to worry about whether the thread is live or not
## Why
`tool_search` already had solid end-to-end coverage for discovery and
follow-up execution, but it did not prove that distinct pieces of
indexed search text actually work in integration. In particular, we were
not exercising whether unique tool names, descriptions, namespaces,
underscore-expanded dynamic names, and schema-property terms were
sufficient to surface the expected deferred tools.
This change adds focused integration coverage for those term sources so
regressions in search text construction are caught by a real `TestCodex`
flow instead of only by lower-level unit tests.
## What changed
- added a small helper in `core/tests/suite/search_tool.rs` to assert
that a `tool_search_output` contains an expected namespace child tool
- added an MCP integration test that issues several `tool_search_call`s
and verifies distinct query terms match the expected app tools:
- exact tool name: `calendar_timezone_option_99`
- tool description phrase: `uploaded document`
- top-level schema property: `starts_at`
- added a dynamic-tool integration test that verifies distinct query
terms match the expected deferred dynamic tool:
- exact name: `quasar_ping_beacon`
- underscore-expanded name: `quasar ping beacon`
- description phrase: `saffron metronome`
- namespace: `orbit_ops`
- schema property: `chrono_spec`
## Validation
- `cargo test -p codex-core tool_search_matches_`
## Docs
No documentation update needed.
## Summary
This refactor makes tool handlers the owner of the specs they can
publish, so registry construction can register handlers once and
separately publish only the specs that should be model-visible.
The main motivation is deferred tools: MCP and dynamic tools still need
handlers registered up front, but deferred tools should be discoverable
through `tool_search` rather than emitted in the initial tool spec list.
## What changed
- `McpHandler` and `DynamicToolHandler` can return their own `ToolSpec`.
- `build_tool_registry_builder` now collects handlers, registers them
through the no-spec path, and publishes only non-deferred handler specs.
- Deferred MCP and dynamic tool names are combined into one
`all_deferred_tools` set that drives spec filtering, code-mode
deferred-tool signaling, and `tool_search` registration.
- `tool_search` registration now requires both deferred tools and
`namespace_tools`.
- Namespace specs are merged in `spec_plan`, preserving top-level spec
order, sorting tools within each namespace, and backfilling empty
namespace descriptions.
- Hosted web search and image-generation specs are included in the
collected spec vector before namespace merge/publication, and tool-name
tests that should not care about hosted relative order now compare sets.
## Testing
- `cargo test -p codex-core tools::spec::tests:: -- --nocapture`
- `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture`
- `cargo test -p codex-core
tools::router::tests::specs_filter_deferred_dynamic_tools --
--nocapture`
- `cargo test -p codex-core
suite::prompt_caching::prompt_tools_are_consistent_across_requests --
--nocapture`
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-core -- --skip
tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
passed the library suite after skipping the known stack-overflowing unit
test.
Full `cargo test -p codex-core` currently hits a stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`;
the same focused test reproduces on `origin/main`.
## Summary
Adds include_collaboration_mode_instructions, which is a config
equivalent to include_permissions_instructions for collaboration modes.
Desired for situations where we want to disable this instruction from
entering the context
## Testing
- [x] Added unit test
## Why
The split filesystem policy stack already supports exact and glob
`access = none` read restrictions on macOS and Linux. Windows still
needed subprocess handling for those deny-read policies without claiming
enforcement from a backend that cannot provide it.
## Key finding
The unelevated restricted-token backend cannot safely enforce deny-read
overlays. Its `WRITE_RESTRICTED` token model is authoritative for write
checks, not read denials, so this PR intentionally fails that backend
closed when deny-read overrides are present instead of claiming
unsupported enforcement.
## What changed
This PR adds the Windows deny-read enforcement layer and makes the
backend split explicit:
- Resolves Windows deny-read filesystem policy entries into concrete ACL
targets.
- Preserves exact missing paths so they can be materialized and denied
before an enforceable sandboxed process starts.
- Snapshot-expands existing glob matches into ACL targets for Windows
subprocess enforcement.
- Honors `glob_scan_max_depth` when expanding Windows deny-read globs.
- Plans both the configured lexical path and the canonical target for
existing paths so reparse-point aliases are covered.
- Threads deny-read overrides through the elevated/logon-user Windows
sandbox backend and unified exec.
- Applies elevated deny-read ACLs synchronously before command launch
rather than delegating them to the background read-grant helper.
- Reconciles persistent deny-read ACEs per sandbox principal so policy
changes do not leave stale deny-read ACLs behind.
- Fails closed on the unelevated restricted-token backend when deny-read
overrides are present, because its `WRITE_RESTRICTED` token model is not
authoritative for read denials.
## Landed prerequisites
These prerequisite PRs are already on `main`:
1. #15979 `feat(permissions): add glob deny-read policy support`
2. #18096 `feat(sandbox): add glob deny-read platform enforcement`
3. #17740 `feat(config): support managed deny-read requirements`
This PR targets `main` directly and contains only the Windows deny-read
enforcement layer.
## Implementation notes
- Exact deny-read paths remain enforceable on the elevated path even
when they do not exist yet: Windows materializes the missing path before
applying the deny ACE, so the sandboxed command cannot create and read
it during the same run.
- Existing exact deny paths are preserved lexically until the ACL
planner, which then adds the canonical target as a second ACL target
when needed. That keeps both the configured alias and the resolved
object covered.
- Windows ACLs do not consume Codex glob syntax directly, so glob
deny-read entries are expanded to the concrete matches that exist before
process launch.
- Glob traversal deduplicates directory visits within each pattern walk
to avoid cycles, without collapsing distinct lexical roots that happen
to resolve to the same target.
- Persistent deny-read ACL state is keyed by sandbox principal SID, so
cleanup only removes ACEs owned by the same backend principal.
- Deny-read ACEs are fail-closed on the elevated path: setup aborts if
mandatory deny-read ACL application fails.
- Unelevated restricted-token sessions reject deny-read overrides early
instead of running with a silently unenforceable read policy.
## Verification
- `cargo test -p codex-core
windows_restricted_token_rejects_unreadable_split_carveouts`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-windows-sandbox`
- GitHub Actions rerun is in progress on the pushed head.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Older sessions can contain model-warning records persisted as `user`
messages, including the unified exec process-limit warning, the
`apply_patch`-via-`exec_command` warning, and the model-mismatch
high-risk cyber fallback warning. Those warnings are no longer produced
as conversation history items, but when old sessions compact they should
still be recognized as injected context rather than preserved as real
user turns.
## What changed
- Removed `record_model_warning` and the production paths that emitted
these warning messages into conversation history.
- Added `LegacyUnifiedExecProcessLimitWarning`,
`LegacyApplyPatchExecCommandWarning`, and `LegacyModelMismatchWarning`
contextual fragments that are used only for matching old persisted
messages.
- Registered the legacy fragments with contextual user message detection
so compaction filters them through the existing fragment path.
- Added focused compaction coverage for old warning messages being
dropped during compacted-history processing.
## Testing
- `cargo test -p codex-core warning`
- `just fix -p codex-core`
## Why
`PreToolUse` already exposes `updatedInput` in its hook output schema,
but Codex currently rejects it instead of applying the rewrite. That
leaves hook authors unable to make the documented pre-execution
adjustment to a tool call before it runs.
## What
- Accept `updatedInput` from `PreToolUse` hooks when paired with
`permissionDecision: "allow"`.
- Apply the rewritten input before dispatch so the tool executes the
updated payload, not the original one.
- Preserve the stable hook-facing compatibility shapes that
participating tool handlers expose:
- Bash-like tools (`shell`, `container.exec`, `local_shell`,
`shell_command`, `exec_command`) use `{ "command": ... }`.
- `apply_patch` exposes its patch body through the same command-shaped
hook contract.
- MCP tools expose their JSON argument object directly.
- Keep each participating tool handler responsible for translating
hook-facing `updatedInput` back into its concrete invocation shape.
## Verification
Direct Bash-like rewrite coverage:
- `pre_tool_use_rewrites_shell_before_execution`
- `pre_tool_use_rewrites_container_exec_before_execution`
- `pre_tool_use_rewrites_local_shell_before_execution`
- `pre_tool_use_rewrites_shell_command_before_execution`
- `pre_tool_use_rewrites_exec_command_before_execution`
These cases assert that each supported Bash-like surface runs only the
rewritten command while the hook still observes the original `{
"command": ... }` input.
`pre_tool_use_rewrites_apply_patch_before_execution`
- Model emits one patch.
- Hook swaps in a different patch.
- Asserts only the rewritten file is created, and the hook saw the
original patch.
`pre_tool_use_rewrites_code_mode_nested_exec_command_before_execution`
- Model runs one nested shell command from code mode.
- Hook rewrites it.
- Asserts only the rewritten command runs, and the hook saw the original
nested input.
`pre_tool_use_rewrites_mcp_tool_before_execution`
- Model calls the RMCP echo tool.
- Hook rewrites the MCP arguments.
- Asserts the MCP server receives and returns the rewritten message, not
the original one.
## Summary
- create a selected-cwd filesystem sandbox context for view_image
metadata and file reads in both local and remote environments
- add a local restricted-profile regression test for the previously
unsandboxed read path
## Validation
- just fmt
- bazel test --bes_backend= --bes_results_url= --test_output=errors
--test_filter=view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads
//codex-rs/core:core-unit-tests
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add multi-environment apply_patch routing for both freeform and
function-call tool flows
- parse and reconcile the optional environment selector in the main
apply_patch parser, then verify against the selected environment in the
handler
- carry environment_id through runtime and approval surfaces so
remote-targeted patches stay explicit end to end
## Testing
- just fmt
- remote exec-server e2e: `cargo test -p codex-core --test all
apply_patch_multi_environment_uses_remote_executor -- --nocapture` on
dev via `scripts/test-remote-env.sh`
---------
Co-authored-by: Codex <noreply@openai.com>