## Summary
- Change `EnvironmentProvider` to return concrete `Environment`
instances instead of `EnvironmentConfigurations`.
- Make `DefaultEnvironmentProvider` provide the provider-visible `local`
environment plus optional `remote` environment from
`CODEX_EXEC_SERVER_URL`.
- Keep `EnvironmentManager` as the concrete cache while exposing its own
explicit local environment for `local_environment()` fallback paths.
## Validation
- `just fmt`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`PermissionProfile` is the canonical runtime permission model in the
Rust workspace, but the Linux sandbox helper still accepted a legacy
`SandboxPolicy` plus separate filesystem and network policy flags. That
translation layer made the helper interface harder to reason about and
left `linux-sandbox`-specific callers and tests coupled to the legacy
policy representation.
This change moves the helper onto `PermissionProfile` directly so the
Linux sandbox plumbing matches the rest of the permission stack.
## What changed
- changed `codex-linux-sandbox` to accept `--permission-profile` and
derive the runtime filesystem and network policies internally
- updated the in-process seccomp and legacy Landlock path in
`codex-rs/linux-sandbox` to operate on `PermissionProfile`
- updated Linux sandbox argv construction in `codex-rs/sandboxing`,
`codex-rs/core`, and the CLI debug sandbox path to pass the canonical
profile instead of serializing compatibility policy projections
- simplified the Linux sandbox tests to build the exact permission
profile under test, including the managed-proxy path and
direct-runtime-enforcement carveout coverage
- removed helper-local `SandboxPolicy` usage from `bwrap` tests where
`FileSystemSandboxPolicy` is already the value being exercised
## Testing
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-linux-sandbox` (on this macOS host, the crate
compiled cleanly and its Linux-only tests were cfg-gated)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-cli --no-run`
## Summary
This fixes the CI regression introduced by
[#20040](https://github.com/openai/codex/pull/20040).
That PR migrated several `apply_patch_cli` tests from direct
`codex.submit(Op::UserTurn { ... })` calls to `harness.submit(...)`.
`harness.submit()` waits for `TurnComplete` before returning, which
drains the same event stream that these tests use to assert `TurnDiff`,
`PatchApplyUpdated`, and related live events. The regressed tests then
timed out waiting for events that had already been consumed.
This change restores a no-wait submit path for the event-observing
`apply_patch_cli` tests so they can watch the turn stream directly
again.
## What Changed
- added a local `submit_without_wait(...)` helper in
`codex-rs/core/tests/suite/apply_patch_cli.rs`
- switched the `apply_patch_cli` tests that assert live turn events back
to that helper
- left the profile-backed `harness.submit(...)` migration in place for
tests that only care about final filesystem or tool output state
## Why macOS Looked Green
In the failing run
[25084487331](https://github.com/openai/codex/actions/runs/25084487331),
`//codex-rs/core:core-all-test` was cached on macOS, so the regressed
tests were not rerun there. The Linux GNU, Linux MUSL, and Windows Bazel
jobs reran the target and exposed the failure.
## Verification
- `cargo test -p codex-core apply_patch_ -- --nocapture`
- previously failing local cases now pass again:
- `apply_patch_cli_move_without_content_change_has_no_turn_diff`
- `apply_patch_turn_diff_for_rename_with_content_change`
- `apply_patch_aggregates_diff_across_multiple_tool_calls`
## Why
Unsupported features must fail closed and Codex must not expose
OpenAI-hosted fallback paths when the active provider cannot support
them. In practice, Bedrock should not surface app connectors, MCP
servers, tool search/suggestions, image generation, web search, or JS
REPL until those paths are explicitly supported for that provider.
This PR moves that decision into provider-owned capability metadata
instead of scattering Bedrock-specific checks across callers.
## What changed
- Adds `ProviderCapabilities` to `codex-model-provider`, with default
support for existing providers and a Bedrock override that disables
unsupported launch surfaces.
- Adds `ToolCapabilityBounds` to `codex-tools` so provider capability
limits can clamp otherwise-enabled tool config.
- Applies capability bounds when building session and review-thread tool
config.
- Routes MCP/app connector configuration through
`McpManager::mcp_config`, which filters configured MCP servers and app
connectors based on the active provider.
- Updates app-server MCP list/read paths to use the filtered MCP config.
- Adds coverage for default provider capabilities, Bedrock disabled
capabilities, and optional tool-surface clamping.
## Testing
built locally and verified that bedrock responses api now return without
errors calling unsupported tools.
## Summary
- Add `disable_tool_suggest` to app and plugin config, schema, and
TypeScript output
- Exclude disabled connectors and plugins from tool suggestion discovery
- Persist "never show again" tool-suggestion choices back into
`config.toml`
- Update config docs and add coverage for connector and plugin
suppression
## Testing
- Added and updated unit tests for config persistence and tool-suggest
filtering
- Not run (not requested)
## Summary
- Removes `SandboxPolicy` from the hooks test suite.
- Submits hook-related turns with explicit `PermissionProfile` values
for disabled, read-only, and workspace-write cases.
- Preserves the managed-network hook test by configuring and submitting
a workspace-write profile with enabled network, allowing the existing
requirements-backed proxy path to remain covered.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Removes `SandboxPolicy` from the RMCP client test suite.
- Adds shared read-only user-turn helpers that submit
`PermissionProfile::read_only()` plus the legacy compatibility
projection required by the current `Op::UserTurn` shape.
- Keeps sandbox metadata assertions intact by deriving the expected
legacy `sandboxPolicy` value from the same read-only profile used for
the turn.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Removes the remaining `SandboxPolicy` usage from the compaction test
suite.
- Adds a small local helper for direct `Op::UserTurn` construction so
these tests send `PermissionProfile::Disabled` plus the legacy
compatibility projection required by the protocol field.
- Keeps the existing danger/full-access behavior while exercising the
canonical permission profile path.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Updates the zsh-fork test helper to configure `PermissionProfile`
directly instead of constructing a legacy `SandboxPolicy`.
- Sends permission-profile-backed turns from the skill approval zsh-fork
tests so the runtime and request path exercise the canonical permissions
model.
- Leaves the broader approvals suite on legacy policies for now, except
for the zsh-fork test that shares this helper.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
This migrates the macOS request-permissions tool tests from legacy
`SandboxPolicy` setup to `PermissionProfile` setup. The tests still
exercise the same workspace-write baseline and request-permission
grants, but the canonical permissions value is now the profile.
## Changes
- Replaces the `workspace_write_excluding_tmp()` helper with a
`PermissionProfile::workspace_write_with()` helper.
- Applies test config through `Permissions::set_permission_profile()`.
- Uses `turn_permission_fields()` for `Op::UserTurn` compatibility
fields.
- Removes the `SandboxPolicy` import from `request_permissions_tool.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This removes the explicit `SandboxPolicy` constructors from
`core/tests/suite/prompt_caching.rs`. The tests still exercise the same
prompt-cache invariants across permission and turn-context changes, but
the permission source is now `PermissionProfile`.
## Changes
- Uses `PermissionProfile::workspace_write_with()` for workspace-write
override scenarios.
- Uses `PermissionProfile::Disabled` for the no-sandbox per-turn
override.
- Projects profiles through `turn_permission_fields()` or
`to_legacy_sandbox_policy()` only to populate compatibility fields on
existing ops.
- Removes the `SandboxPolicy` import from `prompt_caching.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This migrates `core/tests/suite/exec_policy.rs` away from legacy
`SandboxPolicy` turn construction. These tests all use no-sandbox turns
to exercise exec-policy behavior, so `PermissionProfile::Disabled` is
the canonical representation.
## Changes
- Replaces direct `SandboxPolicy::DangerFullAccess` turn fields with
`PermissionProfile::Disabled`.
- Uses `turn_permission_fields()` to populate the compatibility
`sandbox_policy` field required by `Op::UserTurn`.
- Removes the `SandboxPolicy` import from `exec_policy.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This removes another test-only `SandboxPolicy` dependency by configuring
`permissions_messages.rs` with a `PermissionProfile` directly. The test
still verifies the rendered compatibility permissions text, but now
obtains the legacy projection from the loaded `Config` rather than using
`SandboxPolicy` as the source of truth.
## Changes
- Builds the workspace-write test setup with
`PermissionProfile::workspace_write_with()`.
- Applies that profile through `Permissions::set_permission_profile()`.
- Uses `Config::legacy_sandbox_policy()` only for the expected
`PermissionsInstructions` compatibility rendering.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This continues the test-side migration away from `SandboxPolicy` by
removing the remaining legacy policy setup in
`core/tests/suite/tools.rs`. The affected test was already modeling a
profile-backed filesystem policy with a deny-read glob, so configuring
the test through `Permissions::set_permission_profile()` is a better
match for the behavior being exercised.
## Changes
- Drops the `SandboxPolicy` import from `core/tests/suite/tools.rs`.
- Configures the glob deny-read shell test directly with a
`PermissionProfile` instead of creating a legacy read-only policy first.
- Submits the test turn with the session permission profile so the
deny-read glob remains active for the command under test.
## Verification
- `cargo check -p codex-core --tests`
## Why
The core item tests still had a cluster of plan-mode `Op::UserTurn`
literals that used `SandboxPolicy::DangerFullAccess` and omitted
`permission_profile`. These tests are validating emitted item lifecycle
events, so keeping them on the legacy sandbox-only turn shape adds noise
to the broader permissions migration without testing legacy behavior.
## What Changed
- Adds a local `disabled_plan_turn()` helper that preserves the existing
`std::env::current_dir()` turn cwd behavior.
- Uses `turn_permission_fields(PermissionProfile::Disabled, cwd)` to
populate both the compatibility `sandbox_policy` and canonical
`permission_profile` fields.
- Replaces the plan-mode hand-built turns in
`codex-rs/core/tests/suite/items.rs`, removing all `SandboxPolicy`
references from that file and reducing remaining `codex-rs/core/tests`
`SandboxPolicy` files from 16 to 15.
## Verification
- `cargo check -p codex-core --tests`
## Why
This stack is retiring direct `SandboxPolicy` construction from tests so
core coverage exercises the same `PermissionProfile` turn path used by
runtime code. `safety_check_downgrade.rs` still submitted each test turn
as `SandboxPolicy::DangerFullAccess` with no permission profile, even
though the tests are about model verification/reroute behavior rather
than legacy sandbox conversion.
## What Changed
- Adds a local `disabled_text_turn()` helper that derives both the
compatibility `sandbox_policy` and canonical `permission_profile` from
`PermissionProfile::Disabled`.
- Replaces repeated hand-built `Op::UserTurn` literals in
`codex-rs/core/tests/suite/safety_check_downgrade.rs` with that helper.
- Removes all `SandboxPolicy` references from the safety-check suite,
reducing the remaining `codex-rs/core/tests` files that mention
`SandboxPolicy` from 17 to 16.
## Verification
- `cargo check -p codex-core --tests`
## Why
This stack is removing direct `SandboxPolicy` usage from test code so
new tests exercise the same `PermissionProfile` path that runtime code
now treats as canonical. `view_image.rs` still built `Op::UserTurn`
requests with `SandboxPolicy::DangerFullAccess` and no permission
profile, which kept another core test module on the legacy turn shape.
## What Changed
- Adds a small `disabled_user_turn()` helper for the view-image suite
that derives the compatibility `sandbox_policy` and canonical
`permission_profile` from `PermissionProfile::Disabled`.
- Replaces repeated direct `Op::UserTurn` literals in
`codex-rs/core/tests/suite/view_image.rs` with that helper.
- Removes all `SandboxPolicy` references from `view_image.rs`, reducing
the remaining `codex-rs/core/tests` files that mention `SandboxPolicy`
from 18 to 17.
## Verification
- `cargo check -p codex-core --tests`
## Summary
- Migrates `model_switching.rs` and `personality.rs` direct
`Op::UserTurn` construction from legacy `SandboxPolicy` literals to
`PermissionProfile`-backed turn fields.
- Adds small local helpers in each file so tests keep asserting
model/personality behavior without repeating permission plumbing.
- Reduces `rg -l '\bSandboxPolicy\b' codex-rs/core/tests` from 20 files
to 18; `codex-rs/tui` remains at zero `SandboxPolicy` references.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Replace legacy sandbox config setup in delegate and telemetry tests
with direct `PermissionProfile` configuration.
- Move no-sandbox and read-only test turns in `tools.rs`,
`code_mode.rs`, `user_shell_cmd.rs`, and `model_visible_layout.rs` from
legacy `SandboxPolicy` values to `PermissionProfile` helpers, while
leaving the deny-glob read-only compatibility case for a later targeted
cleanup.
- Use `PermissionProfile::read_only()` where tests need managed
read-only behavior and `PermissionProfile::Disabled` where they
intentionally need no sandbox.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 27
files after #20013 to 22 files.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Migrate another batch of direct `Op::UserTurn` test construction from
legacy `SandboxPolicy` values to `PermissionProfile` inputs via
`turn_permission_fields()`.
- Replace a one-off read-only `SandboxPolicy` bridge in the macOS exec
test with `PermissionProfile::read_only()`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 32
files at the start of the cleanup stack to 27 files.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p codex-core`
## Summary
- Add `turn_permission_fields()` so tests that construct `Op::UserTurn`
directly can provide a canonical `PermissionProfile` while still filling
the required legacy `sandbox_policy` compatibility field.
- Migrate direct user-turn construction in core integration tests from
`SandboxPolicy::DangerFullAccess` to `PermissionProfile::Disabled`.
- Continue reducing direct `SandboxPolicy` usage in
`codex-rs/core/tests`, from 41 files after #20010 to 32 files in this
PR.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
## Summary
- Add `PermissionProfile`-based turn submission helpers to
`core_test_support`, while keeping the legacy `SandboxPolicy` helper for
tests that intentionally exercise legacy fallback behavior.
- Switch the default `TestCodex::submit_turn()` path to send a real
`PermissionProfile` plus the required legacy compatibility projection in
`Op::UserTurn`.
- Migrate straightforward app/search/shell/truncation tests from
`SandboxPolicy::{DangerFullAccess, ReadOnly}` to
`PermissionProfile::{Disabled, read_only}`.
- Add a TUI compatibility projection helper for legacy app-server fields
so non-legacy writable roots are preserved instead of being downgraded
to read-only.
- Fix remote start/resume/fork sandbox-mode projection to classify any
managed profile with writable roots as workspace-write, not only
profiles that can write `cwd`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 47
files to 41 files without changing production behavior.
## Testing
- `cargo check -p codex-core --tests`
- `cargo test -p codex-tui
compatibility_profile_preserves_unbridgeable_write_roots`
- `cargo test -p codex-tui
sandbox_mode_preserves_non_cwd_write_roots_for_remote_sessions`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
## Why
Plugins can bundle lifecycle hooks, but Codex previously only discovered
hooks from user, project, and managed config layers. This adds the
plugin discovery and runtime plumbing needed for plugin-bundled hooks
while keeping execution behind the `plugin_hooks` feature flag.
## What
- Discovers plugin hook sources from each plugin's default
`hooks/hooks.json`.
- Supports `plugin.json` manifest `hooks` entries as either relative
paths or inline hook objects.
- Plumbs discovered plugin hook sources through plugin loading into the
hook runtime when `plugin_hooks` is enabled.
- Marks plugin-originated hook runs as `HookSource::Plugin`.
- Injects `PLUGIN_ROOT` and `CLAUDE_PLUGIN_ROOT` into plugin hook
command environments.
- Updates generated schemas and hook source metadata for the plugin hook
source.
## Stack
1. This PR - openai/codex#19705
2. openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882
## Reviewer Notes
- Core logic is in `codex-rs/core-plugins/src/loader.rs` and
`codex-rs/hooks/src/engine/discovery.rs`
- Moved existing / adding new tests to
`codex-rs/core-plugins/src/loader_tests.rs` hence the large diff there
- Otherwise mostly plumbing and minor schema updates
### Core Changes
The `codex-rs/core` changes are limited to wiring plugin hook support
into existing core flows:
- `core/src/session/session.rs` conditionally pulls effective plugin
hook sources and plugin hook load warnings from `PluginsManager` when
`plugin_hooks` is enabled, then passes them into `HooksConfig`.
- `core/src/hook_runtime.rs` adds the `plugin` metric tag for
`HookSource::Plugin`.
- `core/config.schema.json` picks up the new `plugin_hooks` feature
flag, and `core/src/plugins/manager_tests.rs` updates fixtures for the
added plugin hook fields.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Rollout traces need an identifier that can be used to correlate a Codex
inference with upstream Responses API, proxy, and engine logs. The
reduced trace model already exposed `upstream_request_id`, but it was
being populated from the Responses API `response.id`. That value is
useful for `previous_response_id` chaining, but it is not the transport
request id that upstream systems key on.
This PR separates those concepts so trace consumers can reliably answer
both questions:
- which Responses API response did this inference produce?
- which upstream request handled it?
## Structure
The change keeps the upstream request id at the same lifecycle level as
the provider stream:
- `codex-api` captures the `x-request-id` HTTP response header when the
SSE stream is created and exposes it on `ResponseStream`. Fixture and
websocket streams set the field to `None` because they do not have that
HTTP response header.
- `codex-core` carries that stream-level id into `InferenceTraceAttempt`
when recording terminal stream outcomes. Completed, failed, cancelled,
dropped-stream, and pre-response error paths all record the id when it
is available.
- `rollout-trace` now records both identifiers in raw terminal inference
events and response payloads: `response_id` for the Responses API
`response.id`, and `upstream_request_id` for `x-request-id`.
- The reducer stores both fields on `InferenceCall`. It also uses
`response_id` for `previous_response_id` conversation linking, which
removes the old accidental dependency on the misnamed
`upstream_request_id` field.
- Terminal inference reduction now consumes the full terminal payload
(`InferenceCompleted`, `InferenceFailed`, or `InferenceCancelled`) in
one place. That keeps status, partial payloads, response ids, and
upstream request ids consistent across success, failure, cancellation,
and late stream-mapper events.
## Why This Shape
`x-request-id` is a property of the HTTP/provider response envelope, not
an SSE event. Capturing it once in `codex-api` and plumbing it through
terminal trace recording avoids trying to infer the value from stream
contents, and it preserves the id even when the stream fails or is
cancelled after only partial output.
Keeping `response_id` separate from `upstream_request_id` also makes the
reduced trace model less surprising: `response_id` remains the
conversation-continuation id, while `upstream_request_id` is the
operational correlation id for upstream debugging.
## Validation
The PR updates trace and reducer coverage for:
- reading `x-request-id` from SSE response headers;
- storing the true upstream request id on completed inference calls;
- preserving upstream request ids for cancelled and late-cancelled
inference streams;
- keeping `previous_response_id` reconstruction tied to `response_id`
rather than transport request ids.
## Why
MultiAgentV2 `wait_agent` currently clamps short waits to a fixed 10
second minimum. That default is still useful for preventing tight
polling loops, but it is too rigid for environments that need faster
mailbox wake-up checks or a larger minimum to discourage frequent
polling.
This PR makes the minimum wait timeout configurable from the existing
MultiAgentV2 feature config section, so operators can tune the behavior
without changing the legacy multi-agent tool surface.
## What Changed
- Added `features.multi_agent_v2.min_wait_timeout_ms`.
- Defaulted the new setting to the existing 10 second floor.
- Validated the configured value as `1..=3600000`, matching the existing
one hour maximum wait bound.
- Applied the configured minimum to MultiAgentV2 `wait_agent` runtime
clamping.
- Plumbed the configured minimum into the `wait_agent` tool schema,
including the effective default when the minimum is above the normal 30
second default.
- Regenerated `core/config.schema.json`.
## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib multi_agent_v2`
- `just fix -p codex-core`
## Why
Slow Codex turns are easier to debug when token usage is visible in the
trace itself, without joining against separate analytics. This adds
token usage to existing turn-handling spans for regular user turns only.
[Example
turn](https://openai.datadoghq.com/apm/trace/9d353efa2cb5de1f4c5b93dc33c3df04?colorBy=service&graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=3555541504891512675&spanViewType=metadata&traceQuery=)
<img width="1447" height="967" alt="Screenshot 2026-04-24 at 3 03 07 PM"
src="https://github.com/user-attachments/assets/ab7bb187-e7fc-41f0-a366-6c44610b2b2c"
/>
## What Changed
Added response-level token fields on completed handle_responses spans:
gen_ai.usage.input_tokens
gen_ai.usage.cache_read.input_tokens
gen_ai.usage.output_tokens
codex.usage.reasoning_output_tokens
codex.usage.total_tokens
Added aggregate token fields on regular turn spans:
codex.turn.token_usage.*
Added an explicit regular-turn opt-in via
SessionTask::records_turn_token_usage_on_span() so this is not coupled
to span-name strings.
## Testing
- `cargo test -p codex-otel`
- `cargo test -p codex-core
turn_and_completed_response_spans_record_token_usage`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-otel`
- Manual local Electron/app-server smoke test: regular user turn emits
the new span fields
Known status: `cargo test -p codex-core` was attempted and failed in
unrelated existing areas: config approvals, request-permissions,
git-info ordering, and subagent metadata persistence.
## Why
The migration away from `SandboxPolicy` needs new configs to start from
permissions profiles instead of deriving profiles from legacy sandbox
modes. Existing users can have empty `config.toml` files, and we should
not rewrite user-owned config files that may live in shared
repositories.
This PR introduces built-in profile names so an empty config can resolve
to a canonical `PermissionProfile`, while explicit named `[permissions]`
profiles still behave predictably.
## What changed
- Adds built-in `default_permissions` profile names:
- `:read-only` maps to `PermissionProfile::read_only()`.
- `:workspace` maps to the workspace-write profile, including
project-root metadata carveouts.
- `:danger-no-sandbox` maps to `PermissionProfile::Disabled`, preserving
the distinction between no sandbox and a broad managed sandbox.
- Reserves the `:` prefix for built-in profiles so user-defined
`[permissions]` profiles cannot collide with future built-ins.
- Allows `default_permissions` to reference a built-in profile without
requiring a `[permissions]` table.
- Makes an otherwise empty config choose a built-in profile by
trust/platform context: trusted or untrusted project roots use
`:workspace` when the platform supports that sandbox, while roots
without a trust decision use `:read-only`.
- Keeps legacy `sandbox_mode` configs on the legacy path, and still
rejects user-defined `[permissions]` profiles that omit
`default_permissions` so we do not silently guess among custom profiles.
- Preserves compatibility behavior for implicit defaults: bare
`network.enabled = true` allows runtime network without starting the
managed proxy, explicit profile proxy policy still starts the proxy, and
implicit workspace/add-dir roots keep legacy metadata carveouts.
## Verification
- `cargo test -p codex-core builtin --lib`
- `cargo test -p codex-core profile_network_proxy_config`
- `cargo test -p codex-core
implicit_builtin_workspace_profile_preserves_add_dir_metadata_carveouts`
- `cargo test -p codex-core
permissions_profiles_network_enabled_allows_runtime_network_without_proxy`
- `cargo test -p codex-core
permissions_profiles_proxy_policy_starts_managed_network_proxy`
## Documentation
Public Codex config docs should mention these built-in names when the
`[permissions]` config format is ready to document as stable.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19900).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* #20008
* __->__ #19900
## Why
Network access approval prompts were showing the generic retry reason,
which made auto-review focus on the blocked connection instead of the
command that caused it. This makes network approvals easier to assess by
telling the reviewer to evaluate whether the triggering command was
authorised by the user and within policy, and to treat the network call
as acceptable when it is a reasonable consequence of that command.
## What changed
- Split guardian approval request prompt rendering so `NetworkAccess`
has a dedicated branch.
- For network requests, show `Network approval context` and `Network
access JSON` instead of `Retry reason` / `Planned action JSON`.
- Added regression coverage for the network approval prompt wording and
for omitting retry reason in this case.
## Verification
- `cargo test -p codex-core
guardian::tests::build_guardian_prompt_items_explains_network_access_review_scope`
## Why
- Without change: MCP tool call spans include request-side details such
as server, tool, call ID, connector, session, and turn.
- Issue: Some useful telemetry is only known by the MCP server after it
handles the tool call, such as target identity or whether the call
triggered a user-facing flow.
## What Changed
- With change: Codex reads allowlisted telemetry from
`_meta["codex/telemetry"]["span"]` and records it on the
`mcp.tools.call` span.
- Adds span fields for `codex.mcp.target.id` and
`codex.mcp.user_flow.triggered`, with strict type checks and bounded
target ID length.
## Verification
`codex-rs/core/src/mcp_tool_call_tests.rs`
## Why
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with `session_id` and `turn_id`.
- Issue: MCP servers may want the turn start timestamp to measure
internal latency relative to turn start.
## What Changed
- With change: turn metadata now includes `turn_started_at_unix_ms`,
which is propagated to MCP tool calls in
`_meta["x-codex-turn-metadata"]`.
## Verification
- `codex-rs/core/src/mcp_tool_call_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/src/turn_timing_tests.rs`
- `codex-rs/core/tests/responses_headers.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
## Why
Several bug reports describe thread shutdown (including subagent
threads) leaving stdio MCP server processes behind. These reports all
point at the same lifecycle gap: Codex launches stdio MCP servers, but
the session-level shutdown path does not explicitly close MCP clients or
terminate the server process tree.
Fixes#12491Fixes#12976Fixes#18881Fixes#19469
## History
This is best understood as a regression/coverage gap in MCP session
lifecycle management, not as stdio MCP cleanup being absent all along.
#10710 added process-group cleanup for stdio MCP servers, but that
cleanup only runs when the `RmcpClient`/transport is dropped. The older
reports (#12491 and #12976) came after that cleanup existed, which
suggests the remaining problem was that some higher-level shutdown paths
kept the MCP manager alive or replaced it without explicitly draining
clients. The newer reports (#18881 and #19469) exposed the same family
around manager replacement and shutdown.
## What changed
- Added an explicit stdio MCP process handle in `codex-rmcp-client` so
local MCP servers terminate their process group and executor-backed MCP
servers call the executor process terminator.
- Added `RmcpClient::shutdown()` and manager-level MCP shutdown draining
so session shutdown, channel-close fallback, MCP refresh, and connector
probing stop owned MCP clients.
- Added regression coverage that starts a stdio MCP server, begins an
in-flight blocking tool call, shuts down the client, and asserts the
server process exits.
## Verification
- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-mcp`
- `just fix -p codex-rmcp-client`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
- Manual before/after validation with a temporary repro script:
- Pre-fix binary from `HEAD^` (`fed0a8f4fa`): reproduced the leak with
surviving MCP server and child PIDs, `survivors=[77583, 77592]`,
`leaked=true`.
- Post-fix binary from this branch (`67e318148b`): verified both MCP
processes were gone after interrupting `codex exec`, `survivors=[]`,
`leaked=false`.
## Why
The TUI currently handles keyboard shortcuts as hard-coded event matches
spread across app, composer, pager, list, approval, and navigation code.
That makes shortcuts hard to customize, makes displayed hints easy to
drift from actual behavior, and makes future keymap work riskier because
there is no central action inventory.
This PR adds the foundation for configurable, action-based keymaps
without adding the interactive remapping UI yet. Onboarding
intentionally stays on fixed startup shortcuts because users cannot
reasonably configure keymaps before completing onboarding.
This is PR1 in the keymap stack:
- PR1: #18593: configurable keymap foundation
- PR2: #18594: `/keymap` picker and guided remapping UI
- PR3: #18595: Vim composer mode and the remap option
## Design Notes
The new model resolves named actions into concrete runtime bindings once
from config, then passes those bindings to the UI surfaces that handle
input or render shortcut hints.
The main concepts are:
- **Context**: a scope where an action is active, such as `global`,
`chat`, `composer`, `editor`, `pager`, `list`, or `approval`.
- **Action**: a named operation inside a context, such as
`global.open_transcript`, `composer.submit`, or `pager.close`.
- **Binding**: one or more single-key shortcuts assigned to an action,
written as config strings such as `ctrl-t`, `alt-backspace`, or
`page-down`. Multi-step sequences such as `ctrl-x ctrl-s`, `g g`, or
leader-key flows are not part of this PR.
- **Resolution order**: context-specific config wins first, supported
global fallbacks come next, and built-in defaults fill in anything
unset.
- **Explicit unbinding**: an empty array removes an action binding in
that scope and does not fall through to a fallback binding.
- **Conflict validation**: a resolved keymap rejects duplicate active
bindings inside the same scope so one keypress cannot dispatch two
actions.
## What Changed
- Added `TuiKeymap` config support under `[tui.keymap]`, including typed
contexts/actions, key alias normalization, generated schema coverage,
and user-facing config errors.
- Added `RuntimeKeymap` resolution in `codex-rs/tui/src/keymap.rs`,
including fallback precedence, built-in defaults, explicit unbinding,
and per-context conflict validation.
- Rewired existing TUI handlers to consume resolved keymap actions
instead of directly matching hard-coded keys in each component.
- Updated key hint rendering and footer/pager/list surfaces so displayed
shortcuts follow the resolved keymap.
- Kept onboarding shortcuts fixed in
`codex-rs/tui/src/onboarding/keys.rs` instead of exposing them through
`[tui.keymap]`.
## Validation
The branch includes focused coverage for config parsing, key
normalization, runtime fallback resolution, explicit unbinding,
duplicate-key conflict validation, default keymap consistency,
onboarding startup key behavior, and UI hint snapshots affected by
resolved key bindings.
## Why
Memory startup runs in the background after an eligible turn, but it can
consume Codex backend quota at exactly the wrong time: when the user is
already near a rate-limit boundary. This PR adds a guard so the memory
pipeline backs off when the Codex rate-limit snapshot says the remaining
budget is too low.
## What Changed
- Added `memories.min_rate_limit_remaining_percent` with a default of
`25`, clamped to `0..=100`, and regenerated `core/config.schema.json`.
- Added `codex-rs/memories/write/src/guard.rs`, which fetches Codex
backend rate limits before memory startup and skips phase 1 / phase 2
when the Codex limit is reached or either tracked window is above the
configured usage ceiling.
- Keeps startup best-effort: non-Codex auth or rate-limit fetch/client
failures preserve the existing memory startup behavior.
- Records a `codex.memory.startup` counter with
`status=skipped_rate_limit` when startup is skipped.
- Added config parsing/clamping coverage and guard unit tests.
## Verification
- Added `codex-rs/memories/write/src/guard_tests.rs` for threshold,
primary/secondary window, and reached-limit behavior.
- Added config tests for TOML parsing and clamping.
## Why
Memory startup was tied to thread lifecycle events such as create, load,
and fork. That can run memory work before a thread receives real user
input, and it makes startup cost scale with thread management instead of
actual turns. Moving the trigger to `thread/sendInput` keeps memory
startup aligned with the first real user turn and lets it use the
current thread config at turn time.
The idea is to prevent ghost cost due to pre-warm triggered by the app
Turn-based startup can also make global phase-2 consolidation easier to
request repeatedly, so this adds a success cooldown and tightens the
default startup scan window.
## What Changed
- Start `codex_memories_write::start_memories_startup_task` after a
non-empty `thread/sendInput` turn is submitted, instead of from thread
create/load/fork paths:
d4a6885b78/codex-rs/app-server/src/codex_message_processor.rs (L6477-L6487)
- Expose `CodexThread::config()` so app-server can pass the live config
into memory startup at turn time.
- Add a six-hour successful-run cooldown for global phase-2
consolidation via `SkippedCooldown`:
d4a6885b78/codex-rs/state/src/runtime/memories.rs (L963-L966)
- Reduce memory startup defaults to at most 2 rollouts over 10 days:
d4a6885b78/codex-rs/config/src/types.rs (L31-L34)
## Verification
Updated the memory runtime coverage around phase-2 reclaim behavior,
including `phase2_global_lock_respects_success_cooldown`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Phase 2 still needs to choose the most relevant stage-1 memory outputs
by usage and recency, but exposing that ranking as the rendered
`raw_memories.md` order creates unnecessary large diff. Usage-count or
timestamp changes can reshuffle otherwise unchanged memories, making the
workspace diff noisy and giving the consolidation prompt a misleading
recency signal from file position.
This fix will reduce token consumption
## What Changed
- Keep the existing top-N Phase 2 selection ranking by `usage_count`,
`last_usage`, `source_updated_at`, and `thread_id`.
- Return the selected rows in stable ascending `thread_id` order before
syncing Phase 2 filesystem inputs.
- Update the memory README, raw memories header, and consolidation
prompt so they describe the stable order and tell the prompt to use
metadata and workspace diffs instead of file order as the recency
signal.
- Adjust the memory runtime tests to use deterministic thread IDs and
assert the stable return order separately from the ranked selection
semantics.
## Test Coverage
- Existing memory runtime tests in
`codex-rs/state/src/runtime/memories.rs` now cover the stable returned
ordering for Phase 2 inputs.
---------
Co-authored-by: Codex <noreply@openai.com>
Keep extracting memories out of core and moving the write trigger in the
app-server
This is temporary and it should move at the client level as a follow-up
This makes core fully independant from `codex-memories-write`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
MultiAgentV2 sessions need startup guidance that matches the role of the
thread that is actually being created. Root agents and subagents have
different responsibilities, and forked subagents can inherit parent
rollout history. If the parent hint is carried into the child context,
the child can see stale or conflicting developer guidance before its own
session-specific context is added.
## What changed
- Added `features.multi_agent_v2.root_agent_usage_hint_text` and
`features.multi_agent_v2.subagent_usage_hint_text` config fields,
including schema/config parsing support.
- Injected the matching root or subagent hint into the initial context
as its own developer message when `multi_agent_v2` is enabled.
- Filtered configured MultiAgentV2 usage-hint developer messages out of
forked parent history so a child thread receives fresh guidance for its
own session source/config.
- Added targeted coverage for config parsing, initial-context rendering,
feature-config deserialization, and forked-history filtering.
## Context examples
With this config:
```toml
[features.multi_agent_v2]
enabled = true
root_agent_usage_hint_text = "Root guidance."
subagent_usage_hint_text = "Subagent guidance."
```
A root thread initial context renders the root hint as a standalone
developer message:
```text
[developer]
<existing developer context, when present>
[developer]
Root guidance.
```
A subagent thread initial context renders the subagent hint instead:
```text
[developer]
<existing developer context, when present>
[developer]
Subagent guidance.
```
When a subagent forks parent history, any parent developer message whose
text exactly matches the configured MultiAgentV2 root or subagent hint
is omitted from the forked history before the child receives its fresh
subagent hint.
## Why
`ThreadConfigSnapshot` is used by app-server and thread metadata code as
a stable view of active runtime settings. Keeping both `sandbox_policy`
and `permission_profile` in the snapshot duplicates permission state and
makes it possible for the legacy projection to drift from the canonical
profile.
The legacy `sandbox` value is still needed at app-server compatibility
boundaries, so this PR derives it on demand from the snapshot profile
and cwd instead of storing it.
## What Changed
- Removes `ThreadConfigSnapshot.sandbox_policy`.
- Adds `ThreadConfigSnapshot::sandbox_policy()` as a compatibility
projection from `permission_profile` plus `cwd`.
- Updates app-server response/metadata code and tests to call the
projection only where legacy fields still exist.
- Keeps snapshot construction profile-only so split filesystem rules,
disabled enforcement, and external enforcement remain represented by the
canonical profile.
## Verification
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
dispatch_reclaims_stale_global_lock_and_starts_consolidation --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19775).
* #19900
* #19899
* #19776
* __->__ #19775
## Why
`SessionConfiguredEvent` is the internal event that tells clients what
permissions are active for a session. Emitting both `sandbox_policy` and
`permission_profile` leaves two possible authorities and forces every
consumer to decide which one to honor. At this point in the migration,
the profile is expressive enough to represent managed, disabled, and
external sandbox enforcement, so the internal event can be profile-only.
The wire compatibility concern is older serialized events or rollout
data that only contain `sandbox_policy`; those still need to
deserialize.
## What Changed
- Removes `sandbox_policy` from `SessionConfiguredEvent` and makes
`permission_profile` required.
- Adds custom deserialization so old payloads with only `sandbox_policy`
are upgraded to a cwd-anchored `PermissionProfile`.
- Updates core event emission and TUI session handling to sync
permissions from the profile directly.
- Updates app-server response construction to derive the legacy
`sandbox` response field from the active thread snapshot instead of from
`SessionConfiguredEvent`.
- Updates yolo-mode display logic to treat both
`PermissionProfile::Disabled` and managed unrestricted filesystem plus
enabled network as full-access, while still preserving the distinction
between no sandbox and external sandboxing.
## Verification
- `cargo test -p codex-protocol session_configured_event --lib`
- `cargo test -p codex-protocol serialize_event --lib`
- `cargo test -p codex-exec session_configured --lib`
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox
--lib`
- `cargo test -p codex-tui session_configured --lib`
- `cargo test -p codex-tui
yolo_mode_includes_managed_full_access_profiles --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19774).
* #19900
* #19899
* #19776
* #19775
* __->__ #19774
## Why
Fixes#19475.
`codex exec` can finish successfully and then emit an `ERROR` on stderr:
```text
failed to record rollout items: thread <id> not found
```
That happens because shutdown closes the live thread writer before
emitting `ShutdownComplete`. The terminal event was still using the
normal `send_event_raw` path, so it tried to append rollout items
through a recorder that had already been removed. The answer is correct,
but wrappers that treat stderr as failure can retry completed exec runs.
This looks like a likely recent regression from
[#18882](https://github.com/openai/codex/pull/18882), which routed live
thread writes through `ThreadStore` and added the shutdown-time live
writer close. I have not bisected this, so the PR treats #18882 as the
likely source based on the affected shutdown code path rather than a
proven first-bad commit.
## What Changed
`ShutdownComplete` now bypasses rollout persistence after thread
shutdown and is delivered directly to clients. The shutdown path still
records the protocol event in the rollout trace before delivery,
preserving trace visibility without attempting a post-shutdown
thread-store append.
The change also adds a regression test with the in-memory thread store
to assert that shutdown creates and shuts down the live thread without
appending another item after shutdown.
## Summary
Adds the standard Codex `User-Agent` to shared default headers so the
responses-api WS handshake carries the same client OS and version
context as HTTP requests.
## Testing
- `cargo test -p codex-core
build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
- `cargo test -p codex-core --test all responses_websocket`
## Summary
AgentIdentity auth previously registered the process task lazily behind
a `OnceCell`. That meant the auth object could be constructed before its
runtime task binding was known.
This PR makes AgentIdentity auth load the runtime task at auth load time
and stores the resulting process task id directly on the auth object.
The model-provider call path can then read a concrete task id instead of
handling a missing lazy value.
## Stack
1. [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762) (merged)
2. **This PR:** [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [fix: configure AgentIdentity AuthAPI base
URL](https://github.com/openai/codex/pull/19904)
4. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)
## Important call sites
| Area | Change |
| --- | --- |
| `AgentIdentityAuth::load` | Registers the process task during auth
loading and stores `process_task_id`. |
| `CodexAuth::from_agent_identity_jwt` | Awaits AgentIdentity auth
loading. |
| model-provider auth | Reads a concrete `process_task_id` instead of an
optional lazy value. |
| AgentIdentity auth tests | Mock task registration now covers eager
runtime allocation. |
## Design decisions
AgentIdentity auth now treats task registration as part of constructing
a usable auth object. That matches how callers use the value: once auth
is present, the model-provider path expects the task-scoped assertion
data to be ready.
## Testing
Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
## Summary
- Remove `ghost_snapshot` / `GhostCommit` from the Responses API surface
and generated SDK/schema artifacts.
- Keep legacy config loading compatible, but make undo a no-op that
reports the feature is unavailable.
- Clean up core history, compaction, telemetry, rollout, and tests to
stop carrying ghost snapshot items.
## Testing
- Unit tests passed for `codex-protocol`, `codex-core` targeted undo and
compaction flows, `codex-rollout`, and `codex-app-server-protocol`.
- Regenerated config and app-server schemas plus Python SDK artifacts
and verified they match the checked-in outputs.