## Why
`/goal` is supposed to keep Codex working until the goal is actually
done. The previous continuation logic had two ways to stop early: the
continuation prompt told the model to wait for new input when it felt
blocked, and the runtime suppressed another continuation turn after a
continuation finished without any tool calls.
That made goals stop short even when the agent could still keep making
progress (I received a few reports of this from users). It also relied
on a brittle heuristic that treated "no registry tool calls" as
equivalent to "should stop."
## What changed
- removed the continuation prompt sentence that told the model to stop
and wait for new input when it could not continue productively
- removed the goal runtime suppression heuristic that stopped
auto-continuation after a no-tool continuation turn
- deleted the continuation-activity bookkeeping and left `tool_calls` as
telemetry only
- added focused regressions for the two intended behaviors: completed
no-tool continuation turns still continue, while `request_user_input`
keeps the existing turn open instead of spawning a new continuation
## Why
Apply-patch file changes are now part of the core turn item stream, so
v2 clients can consume the same first-class item lifecycle path used by
other turn items instead of relying on app-server-specific remapping
from legacy patch events.
## What changed
- Added a core `TurnItem::FileChange` carrying apply-patch changes and
completion metadata.
- Updated the apply-patch tool emitter to send `ItemStarted` /
`ItemCompleted` with the new `FileChange` item while preserving legacy
`PatchApplyBegin` / `PatchApplyEnd` fan-out.
- Updated app-server v2 conversion to render the new core item directly
and stopped `event_mapping` from remapping old patch begin/end events
into item notifications.
- Kept thread history reconstruction based on the existing old
apply-patch events for rollout compatibility.
## Verification
- `cargo test -p codex-protocol -p codex-app-server-protocol`
- `cargo test -p codex-core --test all
apply_patch_tool_executes_and_emits_patch_events`
- `cargo test -p codex-app-server bespoke_event_handling`
## Why
For reproducibility. A hand-written `config.toml` is not enough to
recreate what a Codex session actually ran with because layered config,
CLI overrides, defaults, feature aliases, resolved feature config,
prompt setup, and model-catalog/session values can all affect the final
runtime behavior.
This PR adds an effective config lockfile path: one run can export the
resolved session config, and a later run can replay that lockfile and
fail early if the regenerated effective config drifts.
## What Changed
- Add a dedicated `ConfigLockfileToml` wrapper with top-level lockfile
metadata plus the replayable config:
```toml
version = 1
codex_version = "..."
[config]
# effective ConfigToml fields
```
- Keep lockfile metadata out of regular `ConfigToml`; replay loads
`ConfigLockfileToml` and then uses its nested `config` as the
authoritative config layer.
- Add `debug.config_lockfile.export_dir` to write
`<thread_id>.config.lock.toml` when a root session starts.
- Add `debug.config_lockfile.load_path` to replay a saved lockfile and
validate the regenerated session lockfile against it.
- Add `debug.config_lockfile.allow_codex_version_mismatch` to optionally
tolerate Codex binary version drift while still comparing the rest of
the lockfile.
- Add `debug.config_lockfile.save_fields_resolved_from_model_catalog` so
lock creation can either save model-catalog/session-resolved fields or
intentionally leave those fields dynamic.
- Build lockfiles from the effective config plus resolved runtime values
such as model selection, reasoning settings, prompts, service tier, web
search mode, feature states/config, memories config, skill instructions,
and agent limits.
- Materialize feature aliases and custom feature config into the
lockfile so replay compares canonical resolved behavior instead of
user-authored alias shape.
- Strip profile/debug/file-include/environment-specific inputs from
generated lockfiles so they contain replayable values rather than the
inputs that produced those values.
- Surface JSON-RPC server error code/data in app-server client and TUI
bootstrap errors so config-lock replay failures include the actual TOML
diff.
- Regenerate the config schema for the new debug config keys.
## Review Notes
The main flow is split across these files:
- `config/src/config_toml.rs`: lockfile/debug TOML shapes.
- `core/src/config/mod.rs`: loading `debug.config_lockfile.*`, replaying
a lockfile as a config layer, and preserving the expected lockfile for
validation.
- `core/src/session/config_lock.rs`: exporting the current session
lockfile and materializing resolved session/config values.
- `core/src/config_lock.rs`: lockfile parsing, metadata/version checks,
replay comparison, and diff formatting.
## Usage
Export a lockfile from a normal session:
```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"'
```
Export a lockfile without saving model-catalog/session-resolved fields:
```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"' \
-c 'debug.config_lockfile.save_fields_resolved_from_model_catalog=false'
```
Replay a saved lockfile in a later session:
```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"'
```
If replay resolves to a different effective config, startup fails with a
TOML diff.
To tolerate Codex binary version drift during replay:
```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"' \
-c 'debug.config_lockfile.allow_codex_version_mismatch=true'
```
## Limitations
This does not support custom rules/network policies.
## Verification
- `cargo test -p codex-core config_lock`
- `cargo test -p codex-config`
- `cargo test -p codex-thread-manager-sample`
## Why
Users have shared that the TUI can feel too visually flat because themes
mostly show up in code syntax highlighting. The configurable statusline
is a natural place to make the active theme more visible, while still
letting users keep the existing monotone statusline if they prefer it.
## What Changed
- Added a statusline styling helper that builds the rendered statusline
from `(StatusLineItem, text)` segments, preserving item identity while
keeping the plain text output unchanged.
- Derived foreground accent colors from the active syntax theme by
looking up TextMate scopes through the existing syntax highlighter, with
conservative ANSI fallbacks when a scope does not provide a foreground.
- Tuned theme-derived colors to keep the accents visible without making
the statusline feel overly bright.
- Added `[tui].status_line_use_colors`, defaulting to `true`, plus a
separated `/statusline` toggle so users can enable or disable
theme-derived statusline colors from the setup UI.
- Updated the live statusline and `/statusline` preview to use the same
styled builder, while keeping terminal-title preview text plain.
- Kept statusline separators and active-agent add-ons subdued while
removing blanket dimming from the whole passive statusline.
## Verification
- `cargo test -p codex-tui status_line`
- `cargo test -p codex-tui theme_picker`
- `cargo test -p codex-tui foreground_style_for_scopes`
- `cargo test -p codex-tui`
- `cargo test -p codex-config`
- `cargo test -p codex-core status_line_use_colors`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`
## Visual
<img width="369" height="23" alt="Screenshot 2026-04-30 at 6 16 08 PM"
src="https://github.com/user-attachments/assets/11d03efb-8e4f-4450-8f4d-00a9659ef4cd"
/>
<img width="385" height="23" alt="Screenshot 2026-04-30 at 6 16 02 PM"
src="https://github.com/user-attachments/assets/a3d89f36-bdc1-42e8-8e84-61350e3999e2"
/>
- Build one app-server process ThreadStore from startup config and share
it with ThreadManager and CodexMessageProcessor.
- Remove per-thread/fork store reconstruction so effective thread config
cannot switch the persistence backend.
- Add params to ThreadStore create/resume for specifying thread
metadata, since otherwise the metadata from store creation would be used
(incorrectly).
## Why
Several legacy `EventMsg` variants were still emitted or mapped even
though clients either ignored them or had moved to item/lifecycle
events. `Op::Undo` had also degraded to an unavailable shim, so this
removes that dead task path instead of preserving a command that cannot
do useful work.
`McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are
intentionally kept because useful consumers still depend on them: MCP
startup completion drives readiness behavior, and the begin events let
app-server/core consumers surface in-progress web-search and
image-generation items before the final payload arrives.
## What Changed
- Removed weak legacy event variants and payloads from `codex-protocol`,
including legacy agent deltas, background events, and undo lifecycle
events.
- Kept/restored `EventMsg::McpStartupComplete`,
`EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with
serializer and emission coverage.
- Updated core, rollout, MCP server, app-server thread history,
review/delegate filtering, and tests to rely on the useful replacement
events that remain.
- Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI
slash-command comments.
- Stopped agent job/background progress and compaction retry notices
from emitting `BackgroundEvent` payloads.
## Verification
- `cargo check -p codex-protocol -p codex-app-server-protocol -p
codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-core --test all suite::items`
- `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core
-p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- Earlier coverage on this PR also included `codex-mcp`, `codex-tui`,
core library tests, MCP/plugin/delegate/review/agent job tests, and MCP
startup TUI tests.
## Summary
Fixes a regression introduced in #10941 so that heredocs do not permit
file redirects to be approved by rules, and adds scenario tests to cover
this behavior.
Previously, heredoc command parsing would allow redirects and
environment variables:
```bash
# commands_for_exec_policy() would parse this via parse_shell_lc_single_command_prefix
PATH=/tmp/bad:$PATH cat <<'EOF' > /tmp/bad/hello.txt
hello
EOF
```
This conflicts with the Codex Rules documentation; heredoc parsing logic
should abide by the same strictness of parsing.
## Tests
- [x] Updated unit tests accordingly
- [x] Added scenario tests for these cases
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
On Windows, Codex runs shell commands through a top-level
`powershell.exe -NoProfile -Command ...` wrapper. `execpolicy` was
matching that wrapper instead of the inner command, so prefix rules like
`["git", "push"]` did not fire for PowerShell-wrapped commands even
though the same normalization already happens for `bash -lc` on Unix.
This change makes the Windows shell wrapper transparent to rule matching
while preserving the existing Windows unmatched-command safelist and
dangerous-command heuristics.
## What changed
- add `parse_powershell_command_plain_commands()` in
`shell-command/src/powershell.rs` to unwrap the top-level PowerShell
`-Command` body with `extract_powershell_command()` and parse it with
the existing PowerShell AST parser
- update `core/src/exec_policy.rs` so `commands_for_exec_policy()`
treats top-level PowerShell wrappers like `bash -lc` and evaluates rules
against the parsed inner commands
- carry a small `ExecPolicyCommandOrigin` through unmatched-command
evaluation and expose `is_safe_powershell_words()` /
`is_dangerous_powershell_words()` so Windows safelist and
dangerous-command checks still work after unwrap
- add Windows-focused tests for wrapped PowerShell prompt/allow matches,
wrapper parsing, and unmatched safe/dangerous inner commands, and
re-enable the end-to-end `execpolicy_blocks_shell_invocation` test on
Windows
## Testing
- `cargo test -p codex-shell-command`
## Why
Codex now has configurable TUI keymaps, but the composer still behaves
like a plain text field. Users who prefer modal editing need a way to
keep Vim muscle memory while drafting prompts, and the keymap picker
needs to expose Vim-specific actions if those bindings are configurable
instead of hardcoded.
## What Changed
- Adds composer Vim mode with insert/normal state, common normal-mode
movement and editing commands, `d`/`y` operator-pending flows, and
mode-aware footer and cursor indicators.
- Adds `/vim`, an optional global `toggle_vim_mode` binding, and
`tui.vim_mode_default` so Vim mode can be toggled per session or enabled
as the default composer state.
- Extends runtime and config keymaps with `vim_normal` and
`vim_operator` contexts, exposes those contexts in `/keymap`, refreshes
the config schema, and validates Vim bindings separately.
- Integrates Vim normal mode with existing composer behavior: `/` opens
slash command entry, `!` enters shell mode, `j`/`k` navigate history at
history boundaries, successful submissions reset back to normal mode,
and paste burst handling remains insert-mode only.
- Teaches the TUI render path to apply and restore cursor style so Vim
insert mode can use a bar cursor without leaving the terminal in that
state after exit.
## Validation
- `cargo test -p codex-tui keymap -- --nocapture` on the keymap/Vim
coverage
- `cargo insta pending-snapshots`
## Docs
This introduces user-facing `/vim`, `tui.vim_mode_default`, and Vim
keymap contexts under `tui.keymap`, so the public CLI configuration and
slash-command docs should be updated before the feature ships.
## Why
When an MCP or app tool is configured with approval mode `approve`
(always allow), users expect that decision to be authoritative. In
guardian auto-review mode, ARC could still return `ask-user`, which then
routed the approval question into guardian with the ARC reason as
context. That meant a tool explicitly configured as always allowed still
went through both safety monitors before running.
This change keeps the existing ARC behavior for non-auto-review
sessions, but avoids the ARC-to-guardian sequence when
`approvals_reviewer = auto_review` and the tool approval mode is
`approve`.
## What changed
- Short-circuit MCP tool approval handling when `approval_mode ==
approve` and `approvals_reviewer == auto_review`.
- Updated the MCP approval regression test so the auto-review case
asserts neither ARC nor guardian is called.
- Preserved existing tests that verify ARC can still block always-allow
MCP tools outside guardian auto-review mode.
## Verification
- `cargo test -p codex-core --lib mcp_tool_call`
## Description
Ignore these top-level config keys when loading project-scoped
config.toml files:
```
"openai_base_url",
"chatgpt_base_url",
"model_provider",
"model_providers",
"profile",
"profiles",
"experimental_realtime_ws_base_url",
```
## What changed
- Add a project-local config denylist for credential-routing fields such
as `openai_base_url`, `chatgpt_base_url`, `model_provider`,
`model_providers`, `profile`, `profiles`, and
`experimental_realtime_ws_base_url`.
- Strip those fields from project config layers before they participate
in effective config merging, while leaving safe project-local settings
intact.
- Track ignored project-local keys on config layers and surface a
startup warning telling users to move those settings to user-level
`config.toml` if they intentionally need them.
- Update profile behavior coverage so project-local `profile` /
`profiles` entries are ignored instead of overriding user-level profile
selection.
## Verification
- `cargo test -p codex-config`
- `cargo test -p codex-core
project_layer_ignores_unsupported_config_keys`
- `cargo test -p codex-core project_profiles_are_ignored`
- `cargo test -p codex-core config::config_loader_tests`
- migrate `thread/turns/list` to ThreadStore. Uses ThreadStore for most
data now but merges in the in-memory state from thread manager
- keep v2 `thread/list` pathless-store friendly by converting
`StoredThread` directly to API `Thread`
- add regression coverage for pathless store history/listing
## Why
`item/fileChange/outputDelta` text output was only the tool's summary or
error text and not used by client surfaces.
We keep `item/fileChange/outputDelta` in the app-server protocol as a
deprecated compatibility entry, but the server no longer emits it.
## What changed
- stop the `apply_patch` runtime from emitting `ExecCommandOutputDelta`
events
- simplify `item_event_to_server_notification` so command output deltas
always map to `item/commandExecution/outputDelta`
- remove the app-server bookkeeping that tried to detect whether an
output delta belonged to a file change
- mark `item/fileChange/outputDelta` as a deprecated legacy protocol
entry in the v2 types, schema, and README
- simplify the file-change approval tests so they only wait for
completion instead of expecting output-delta notifications
## Testing
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-thread-manager-sample`
- `cargo test -p codex-app-server-protocol
protocol::event_mapping::tests::exec_command_output_delta_maps_to_command_execution_output_delta
-- --exact`
- `cargo test -p codex-app-server
turn_start_file_change_approval_accept_for_session_persists_v2 --
--exact` *(failed before the test assertions because the wiremock
`/responses` mock received 0 requests in setup)*
## Why
Large MCP tool call outputs can make rollout JSONL files enormous. In
the session that motivated this change, the biggest JSONL records were:
- `event_msg/mcp_tool_call_end`
- `response_item/function_call_output`
both containing the same unbounded MCP payloads - just 3 MCP tool calls
that each were multi-hundred MBs 😱
This PR truncates both of those JSONL records.
## How
#### For `response_item/function_call_output`
Unified exec already bounds tool output before it is injected into
model-facing history, which also keeps the corresponding rollout
`response_item/function_call_output` records small.
MCP should follow the same pattern: truncate the model-facing tool
output at the tool-output boundary, while leaving code-mode/raw hook
consumers alone.
#### For `event_msg/mcp_tool_call_end`
`McpToolCallEnd` also needs its own bounded event copy because it is the
app-server/replay/UI event shape that backs `ThreadItem::McpToolCall`.
Unfortunately this is _not_ downstream of the `ToolOutput` trait.
## Model behavior
Model behavior is actually unchanged as a result of this PR.
Before this PR, MCP output was:
1. Converted to `FunctionCallOutput`.
2. Recorded into in-memory history.
3. Truncated by `ContextManager::record_items()` before later model
turns saw it.
After this branch, MCP output is truncated earlier, in
`McpToolOutput::response_payload()`, using the same helper. Then
`ContextManager::record_items()` sees an already-truncated output and
effectively has little/no additional work to do.
So the model should still see the same kind of truncated function-call
output. The practical difference is where truncation happens: earlier,
before rollout persistence/app-server emission can see the giant
payload.
## Verification
- `cargo test -p codex-core mcp_tool_output`
- `cargo test -p codex-core
mcp_tool_call::tests::truncate_mcp_tool_result_for_event`
- `cargo test -p codex-core
mcp_post_tool_use_payload_uses_model_tool_name_args_and_result`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`
## Summary
Codex is repurposing `session` to mean a thread group, so the realtime
provider session id should no longer use `session_id` / `sessionId` in
Codex-facing protocol payloads. This PR renames that provider-specific
field to `realtime_session_id` / `realtimeSessionId` and intentionally
breaks clients that still send the old field names.
## What Changed
- Renamed realtime provider session fields in `ConversationStartParams`,
`RealtimeConversationStartedEvent`, and `RealtimeEvent::SessionUpdated`.
- Renamed app-server v2 realtime request and notification fields to
`realtimeSessionId`.
- Removed legacy serde aliases for `session_id` / `sessionId`; clients
must send the new names.
- Propagated the rename through core realtime startup, app-server
adapters, codex-api websocket handling, and TUI realtime state.
- Regenerated app-server protocol schema/TypeScript outputs and updated
app-server README examples.
- Kept upstream Realtime API concepts unchanged: provider `session.id`
parsing and `x-session-id` headers still use the upstream wire names.
## Testing
- CI is running on the latest pushed commit.
- Earlier local verification on this PR:
- `cargo test -p codex-protocol`
- `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-core
realtime_conversation`
- `cargo test -p codex-app-server-protocol`
- `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-app-server
realtime_conversation`
- attempted `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-tui` (local
linker bus error while linking the test binary)
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`multi_agents_v2` is meant to be independently gated from the older
`collab` feature. The tool registry still treated the
collaboration-style agent tools as `collab`-only, so enabling
`multi_agents_v2` without `collab` omitted the v2 agent tools. Review
and guardian sub-sessions also need to keep agent spawning disabled even
when the outer session has `multi_agents_v2` enabled.
## What changed
- Include the collab-backed agent tools when either `multi_agents_v2` or
`collab` is enabled.
- Explicitly disable `multi_agents_v2` for review and guardian review
sub-sessions, matching the existing `spawn_csv` and `collab`
restrictions.
- Add a registry test that enables `multi_agents_v2`, disables `collab`,
and verifies the v2 agent tools are present while legacy `send_input`
and `resume_agent` remain hidden.
## Testing
- Added
`test_build_specs_multi_agent_v2_does_not_require_collab_feature`.
## Why
After `hooks/list` exposes the hook inventory, clients need a way to
persist user hook preferences, make those changes effective in
already-open sessions, and distinguish user-controllable hooks from
managed requirements without adding another bespoke app-server write
API.
## What
- Extends `hooks/list` entries with effective `enabled` state.
- Persists user-level hook state under `hooks.state.<hook-id>` so the
model can grow beyond a single boolean over time.
- Uses the existing `config/batchWrite` path for hook state updates
instead of introducing a dedicated hook write RPC.
- Refreshes live session hook engines after config writes so
already-open threads observe updated enablement without a restart.
## Stack
1. openai/codex#19705
2. openai/codex#19778
3. This PR - openai/codex#19840
4. openai/codex#19882
## Reviewer Notes
The generated schema files account for much of the raw diff. The core
behavior is in:
- `hooks/src/config_rules.rs`, which resolves per-hook user state from
the config layer stack.
- `hooks/src/engine/discovery.rs`, which projects effective enablement
into `hooks/list` from source-derived managedness.
- `config/src/hook_config.rs`, which defines the new `hooks.state`
representation.
- `core/src/session/mod.rs`, which rebuilds live hook state after user
config reloads.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- [x] Move the allowlist out of core crate
- [x] Add Teams, SharePoint, Outlook Email, and Outlook Calendar to the
tool_suggest discoverable plugin allowlist
- [x] Add focused coverage for Microsoft curated plugin discovery
## Testing
- just fmt
- cargo test -p codex-core-plugins
- cargo test -p codex-core
list_tool_suggest_discoverable_plugins_returns_
## Why
Reused Guardian review trunks can still have older child-turn events
queued when a later review starts. The review waiter currently accepts
the first terminal event it sees from the shared child session, so a
stale `TurnComplete` can be attributed to the new review. That produces
impossible analytics combinations such as non-null TTFT with sub-10 ms
completion latency and zero token deltas on `trunk_reused` reviews.
## What changed
- Preserve the child turn id returned by the Guardian review
`Op::UserTurn` submission.
- Restrict Guardian review waiting to events correlated with that
submitted child turn.
- Restrict timeout/abort draining to terminal events for the same child
turn.
- Add regression coverage for stale prior-turn completions, stale
prior-turn errors, and interrupt draining in
`codex-rs/core/src/guardian/review_session.rs`.
## Verification
- `cargo test -p codex-core guardian::review_session::tests::`
- `cargo clippy -p codex-core --tests -- -D warnings`
## Why
We need a way to list the available hooks to expose via the TUI and App
so users can view and manage their hooks
## What
- Adds `hooks/list` for one or more `cwd` values that returns discovered
hook metadata
## Stack
1. openai/codex#19705
2. This PR - openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882
## Review Notes
The generated schema files account for most of the raw diff, these files
have the core change:
- `hooks/src/engine/discovery.rs` builds the inventory entries during
hook discovery while leaving runtime handlers focused on execution.
- `app-server/src/codex_message_processor.rs` wires `hooks/list` into
the app-server flow for each requested `cwd`.
- `app-server-protocol/src/protocol/v2.rs` defines the new v2
request/response payloads exposed on the wire.
### Core Changes
`core/src/plugins/manager.rs` adds `plugins_for_layer_stack(...)` so
`skills/list` and `hooks/list`can resolve plugin state for each
requested `cwd`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- remove the Windows-specific unified-exec environment block from tool
selection
- keep `unified_exec` default-off on Windows unless the feature is
explicitly enabled
- normalize model-provided `shell_type = unified_exec` to
`shell_command` when the feature is disabled
- drop obsolete tests tied to the removed environment gate and keep the
feature-flag regression coverage
## Why
Now that the session/long-lived process backend is implemented for the
Windows sandbox, we don't need to hard disable it anymore. We will be
rolling out slowly using a feature gate.
## Impact
This allows manual Windows opt-in in CLI and app-backed flows while
preserving the existing default-off behavior for Windows users.
---------
Co-authored-by: canvrno-oai <kbond@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Summary:
- Add a checked-in codex-core public API listing generated by
cargo-public-api.
- Add scripts/regen-public-api.sh with an embedded crate list,
auto-install for cargo-public-api 0.51.0, pinned nightly, and --check
mode.
- Add Rust CI jobs on the codex Linux x64 runner pool to verify the
listing stays up to date.
Testing:
- bash -n scripts/regen-public-api.sh
- just regen-public-api --check
- yq '.' .github/workflows/rust-ci.yml
.github/workflows/rust-ci-full.yml
- git diff --check
Plugin MCP servers are loaded from plugin manifests rather than
top-level `[mcp_servers]`, so their tool approval preferences need to be
stored and applied through the owning plugin config. Without this,
choosing "Always allow" for a plugin MCP tool could write a preference
that was not reliably used on later tool calls.
## Summary
- Add plugin-scoped MCP policy config under
`plugins.<plugin>.mcp_servers`, including server enablement, tool
allow/deny lists, server defaults, and per-tool approval modes.
- Overlay plugin MCP policy onto manifest-provided server configs when
plugins are loaded.
- Route persistent "Always allow" writes for plugin MCP tools back to
the owning `plugins.<plugin>.mcp_servers.<server>.tools.<tool>` config
entry.
- Reload user config after persisting an approval and make the plugin
load cache config-aware so stale plugin MCP policy is not reused after
`config.toml` changes.
- Regenerate the config schema and add coverage for plugin MCP policy
loading, approval lookup, persistence, and stale-cache prevention.
## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-core --lib plugin_mcp`
## Why
`x-codex-turn-metadata` is sent as an HTTP/WebSocket header, but Codex
was serializing the metadata JSON with raw UTF-8 string contents. When a
workspace path contains non-ASCII characters, common HTTP stacks can
reject or corrupt that header before the request reaches the provider.
Fixes#17468. Also addresses the duplicate WebSocket report in #19581.
## What changed
- Added `codex_utils_string::to_ascii_json_string`, a shared helper that
serializes JSON normally while escaping non-ASCII string content as
`\uXXXX`.
- Switched turn metadata header serialization, including merged
Responses API client metadata, to use the ASCII-safe JSON helper.
- Added coverage for non-ASCII workspace paths and non-ASCII client
metadata while preserving the same parsed JSON values.
## Verification
- `cargo test -p codex-utils-string`
- `cargo test -p codex-core turn_metadata`
- `just bazel-lock-check`
Summary
- Add `[features.apps_mcp_path_override]` config with a `path` field for
overriding only the built-in apps MCP path.
- Keep existing host/base URL derivation unchanged and append the
configured path after that base.
- Regenerate the config schema with the custom feature-config case.
Test Plan
- Not run for latest revision; only `just fmt` and `just
write-config-schema` were run.
- Earlier revision: `cargo test -p codex-features`
- Earlier revision: `cargo test -p codex-mcp`
## Why
This bug is exposed by Guardian/auto-review approvals. With the managed
network proxy enabled, a blocked network request can be reported back
through the network approval service as an approval denial after the
command has already started. Before this change, the shell and unified
exec runtimes registered those network approval calls, but did not have
a way to observe an async proxy denial as a cancellation/failure signal
for the running process.
The result was confusing: Guardian/auto-review could correctly deny
network access, but the command path could keep running or unregister
the approval without surfacing the denial as the command failure.
## What Changed
- `NetworkApprovalService` now attaches a cancellation token to active
and deferred network approvals.
- Proxy-denial outcomes are recorded only for active registrations,
cancel the owning token, and are consumed when the approval is
finalized.
- The shell runtime combines the normal command timeout with the
network-denial cancellation token.
- Unified exec stores the deferred network approval object, terminates
tracked processes when the proxy denial arrives, and returns the denial
as a process failure while polling or completing the process.
- Tool orchestration passes the active network approval cancellation
token into the sandbox attempt and preserves deferred approval errors
instead of silently unregistering them.
- App-server `command/exec` now handles the combined
timeout-or-cancellation expiration variant used by the runtime.
## Verification
- `cargo test -p codex-core network_approval --lib`
- `cargo clippy -p codex-app-server --all-targets -- -D warnings`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
---------
Co-authored-by: Codex <noreply@openai.com>
- Fetches and caches remote /installed plugin state
- Lets skills/list load skills from remote-installed cached plugins
without requiring a local marketplace entry
- Routes plugin list/startup/install/uninstall changes through async
plugin cache invalidation and MCP refresh
## Summary
- include the live auto-review trunk rollout when `/feedback` uploads
logs
- upload that attachment as
`auto-review-rollout-<parent-thread-id>.jsonl` so it is distinguishable
from the parent rollout
- show the same auto-review attachment name in the TUI consent popup
## Scope
- this only covers the live cached auto-review trunk for the current
parent thread
- it does not add durable historical parent->auto-review lookup
- it does not add persisted rollout support for ephemeral parallel
review forks
## UI
<img width="599" height="185" alt="Screenshot 2026-04-28 at 1 17 18 PM"
src="https://github.com/user-attachments/assets/6a0e79c2-5d21-4702-8a89-f765778bc9e9"
/>
## Validation
- `cargo test -p codex-core
cached_guardian_subagent_exposes_its_rollout_path`
- `cargo test -p codex-feedback`
- `cargo test -p codex-app-server`
- `cargo test -p codex-tui feedback_upload_consent_popup_snapshot`
- `cargo test -p codex-tui
feedback_good_result_consent_popup_includes_connectivity_diagnostics_filename`
## Known unrelated local failures
- `cargo test -p codex-core` currently fails in the pre-existing proxy
env snapshot test
`tools::runtimes::tests::maybe_wrap_shell_lc_with_snapshot_keeps_user_proxy_env_when_proxy_inactive`
- `cargo test -p codex-tui` currently hits pre-existing `status::*`
snapshot drift unrelated to this change
## Follow-Up
- persist parallel auto-review fork sessions so /feedback can include
their rollout history too
- attach each persisted fork as its own clearly named file, for example
auto-review-rollout-<parent-thread-id>-fork <n>.jsonl, instead of
merging multiple Guardian sessions into one attachment
- keep the same live-session-only scope initially; durable historical
parent -> auto-review lookup can remain a separate decision if we later
need feedback from resumed sessions
Summary:
- Add codex-thread-manager-sample, a one-shot binary that starts a
ThreadManager thread, submits a prompt, and prints the final assistant
output.
- Pass ThreadStore into ThreadManager::new and expose
thread_store_from_config for existing callsites.
- Build the sample Config directly with only --model and prompt inputs.
Verification:
- just fmt
- cargo check -p codex-thread-manager-sample -p codex-app-server -p
codex-mcp-server
- git diff --check
Tests: Not run per request.
## Why
`agents.max_depth` is a legacy multi-agent v1 guard. Multi-agent v2 uses
task-path routing and its own session/thread limits, so v2 should not
reject nested `spawn_agent` calls just because the thread-spawn depth
has reached the v1 maximum.
Keeping the v1 depth guard active in v2 prevents deeper task trees even
though the v2 path still needs the depth value only for lineage and
task-path metadata.
## What Changed
- Removed the depth-limit rejection from the multi-agent v2
`spawn_agent` handler while still computing child depth for lineage/path
metadata.
- Made the depth-based disabling of legacy `SpawnCsv`/`Collab` tools
apply only when `Feature::MultiAgentV2` is disabled.
- Added `multi_agent_v2_spawn_agent_ignores_configured_max_depth` to
cover a v2 child spawning another agent when `agent_max_depth = 1`,
while the existing v1 depth-limit tests continue to enforce the legacy
behavior.
## Verification
- `cargo test -p codex-core
multi_agent_v2_spawn_agent_ignores_configured_max_depth -- --nocapture`
- `cargo test -p codex-core depth_limit -- --nocapture`
- `cargo test -p codex-core tools::handlers::multi_agents::tests --
--nocapture`
## Why
The explicit profile path from #20117 is meant for standalone testing,
but it still inherited the
shell cwd and all managed requirements implicitly. The pre-existing
launcher path even called out
that it did not support a separate cwd yet in
[`debug_sandbox.rs`](509453f688/codex-rs/cli/src/debug_sandbox.rs (L174-L179)).
For a standalone command, the useful default is to let the caller choose
the project directory being
tested and to avoid administrator-provided constraints unless the caller
explicitly wants to test
those too.
## What changed
- Add explicit-profile-only `-C/--cd DIR`, and use that cwd for both
profile resolution and command
execution.
- Add explicit-profile-only `--include-managed-config`.
- Make explicit profile mode skip managed requirement sources by
default, including cloud
requirements, MDM requirements, `/etc/codex/requirements.toml`, and the
legacy managed-config
requirements projection.
- Preserve all existing invocations outside the explicit-profile path.
## Stack
1. #20117 `sandbox-ui-profile`
2. #20118 `sandbox-ui-config` --> this PR
Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.
## Tests ran
- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_`
- `cargo test -p codex-core
load_config_layers_can_ignore_managed_requirements`
- `cargo test -p codex-core
load_config_layers_includes_cloud_requirements`
- macOS branch-binary smoke on the rebased top of stack: `-C` changed
execution cwd, explicit
profile mode omitted managed proxy env under `env -i`, and
`--include-managed-config` restored it.
- Linux devbox branch-binary smoke on the rebased top of stack: `-C`
changed execution cwd for
built-in and user-defined explicit profiles.
Messages sent with `followup_task` already arrive at their target
recipient promptly (at message boundaries while sampling, or after the
pending tool call completes) -- having `interrupt` is not worth the
added complexity.
## Why
`codex sandbox` is useful for exercising sandbox behavior directly, but
before this stack the CLI
only picked up permission profiles indirectly from the active config.
The existing debug-sandbox path
already compiled `[permissions]` profiles through normal config loading,
as covered by the existing
profile tests in
[`debug_sandbox.rs`](de2ccf9473/codex-rs/cli/src/debug_sandbox.rs (L715-L760)).
This adds the smallest stable entry point first: an explicit profile
selector that reuses the same
config machinery as normal Codex config, so standalone testing becomes
possible without changing
current no-selector behavior.
## What changed
- Add additive `--permissions-profile NAME` support to `codex sandbox
macos|linux|windows`.
- Resolve built-in and user-defined profile names by feeding
`default_permissions` through the
existing config compilation path instead of inventing a sandbox-only
parser.
- Make an explicit selector win over an ambient active profile's legacy
`sandbox_mode`.
- Keep the existing no-selector behavior unchanged.
## Stack
1. #20117 `sandbox-ui-profile` --> this PR
2. #20118 `sandbox-ui-config`
Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.
## Tests ran
- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_parses_permissions_profile`
- `cargo test -p codex-core
cli_override_takes_precedence_over_profile_sandbox_mode`
- macOS branch-binary smoke on the rebased top of stack: built-in
`:workspace` and user-defined
profiles both executed successfully through `--permissions-profile`.
- Linux devbox branch-binary smoke on the rebased top of stack: built-in
`:workspace` and
user-defined profiles both executed successfully through
`--permissions-profile`.
## Summary
- Change `EnvironmentProvider` to return concrete `Environment`
instances instead of `EnvironmentConfigurations`.
- Make `DefaultEnvironmentProvider` provide the provider-visible `local`
environment plus optional `remote` environment from
`CODEX_EXEC_SERVER_URL`.
- Keep `EnvironmentManager` as the concrete cache while exposing its own
explicit local environment for `local_environment()` fallback paths.
## Validation
- `just fmt`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`PermissionProfile` is the canonical runtime permission model in the
Rust workspace, but the Linux sandbox helper still accepted a legacy
`SandboxPolicy` plus separate filesystem and network policy flags. That
translation layer made the helper interface harder to reason about and
left `linux-sandbox`-specific callers and tests coupled to the legacy
policy representation.
This change moves the helper onto `PermissionProfile` directly so the
Linux sandbox plumbing matches the rest of the permission stack.
## What changed
- changed `codex-linux-sandbox` to accept `--permission-profile` and
derive the runtime filesystem and network policies internally
- updated the in-process seccomp and legacy Landlock path in
`codex-rs/linux-sandbox` to operate on `PermissionProfile`
- updated Linux sandbox argv construction in `codex-rs/sandboxing`,
`codex-rs/core`, and the CLI debug sandbox path to pass the canonical
profile instead of serializing compatibility policy projections
- simplified the Linux sandbox tests to build the exact permission
profile under test, including the managed-proxy path and
direct-runtime-enforcement carveout coverage
- removed helper-local `SandboxPolicy` usage from `bwrap` tests where
`FileSystemSandboxPolicy` is already the value being exercised
## Testing
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-linux-sandbox` (on this macOS host, the crate
compiled cleanly and its Linux-only tests were cfg-gated)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-cli --no-run`
## Why
Unsupported features must fail closed and Codex must not expose
OpenAI-hosted fallback paths when the active provider cannot support
them. In practice, Bedrock should not surface app connectors, MCP
servers, tool search/suggestions, image generation, web search, or JS
REPL until those paths are explicitly supported for that provider.
This PR moves that decision into provider-owned capability metadata
instead of scattering Bedrock-specific checks across callers.
## What changed
- Adds `ProviderCapabilities` to `codex-model-provider`, with default
support for existing providers and a Bedrock override that disables
unsupported launch surfaces.
- Adds `ToolCapabilityBounds` to `codex-tools` so provider capability
limits can clamp otherwise-enabled tool config.
- Applies capability bounds when building session and review-thread tool
config.
- Routes MCP/app connector configuration through
`McpManager::mcp_config`, which filters configured MCP servers and app
connectors based on the active provider.
- Updates app-server MCP list/read paths to use the filtered MCP config.
- Adds coverage for default provider capabilities, Bedrock disabled
capabilities, and optional tool-surface clamping.
## Testing
built locally and verified that bedrock responses api now return without
errors calling unsupported tools.
## Summary
- Add `disable_tool_suggest` to app and plugin config, schema, and
TypeScript output
- Exclude disabled connectors and plugins from tool suggestion discovery
- Persist "never show again" tool-suggestion choices back into
`config.toml`
- Update config docs and add coverage for connector and plugin
suppression
## Testing
- Added and updated unit tests for config persistence and tool-suggest
filtering
- Not run (not requested)
## Why
Plugins can bundle lifecycle hooks, but Codex previously only discovered
hooks from user, project, and managed config layers. This adds the
plugin discovery and runtime plumbing needed for plugin-bundled hooks
while keeping execution behind the `plugin_hooks` feature flag.
## What
- Discovers plugin hook sources from each plugin's default
`hooks/hooks.json`.
- Supports `plugin.json` manifest `hooks` entries as either relative
paths or inline hook objects.
- Plumbs discovered plugin hook sources through plugin loading into the
hook runtime when `plugin_hooks` is enabled.
- Marks plugin-originated hook runs as `HookSource::Plugin`.
- Injects `PLUGIN_ROOT` and `CLAUDE_PLUGIN_ROOT` into plugin hook
command environments.
- Updates generated schemas and hook source metadata for the plugin hook
source.
## Stack
1. This PR - openai/codex#19705
2. openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882
## Reviewer Notes
- Core logic is in `codex-rs/core-plugins/src/loader.rs` and
`codex-rs/hooks/src/engine/discovery.rs`
- Moved existing / adding new tests to
`codex-rs/core-plugins/src/loader_tests.rs` hence the large diff there
- Otherwise mostly plumbing and minor schema updates
### Core Changes
The `codex-rs/core` changes are limited to wiring plugin hook support
into existing core flows:
- `core/src/session/session.rs` conditionally pulls effective plugin
hook sources and plugin hook load warnings from `PluginsManager` when
`plugin_hooks` is enabled, then passes them into `HooksConfig`.
- `core/src/hook_runtime.rs` adds the `plugin` metric tag for
`HookSource::Plugin`.
- `core/config.schema.json` picks up the new `plugin_hooks` feature
flag, and `core/src/plugins/manager_tests.rs` updates fixtures for the
added plugin hook fields.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Rollout traces need an identifier that can be used to correlate a Codex
inference with upstream Responses API, proxy, and engine logs. The
reduced trace model already exposed `upstream_request_id`, but it was
being populated from the Responses API `response.id`. That value is
useful for `previous_response_id` chaining, but it is not the transport
request id that upstream systems key on.
This PR separates those concepts so trace consumers can reliably answer
both questions:
- which Responses API response did this inference produce?
- which upstream request handled it?
## Structure
The change keeps the upstream request id at the same lifecycle level as
the provider stream:
- `codex-api` captures the `x-request-id` HTTP response header when the
SSE stream is created and exposes it on `ResponseStream`. Fixture and
websocket streams set the field to `None` because they do not have that
HTTP response header.
- `codex-core` carries that stream-level id into `InferenceTraceAttempt`
when recording terminal stream outcomes. Completed, failed, cancelled,
dropped-stream, and pre-response error paths all record the id when it
is available.
- `rollout-trace` now records both identifiers in raw terminal inference
events and response payloads: `response_id` for the Responses API
`response.id`, and `upstream_request_id` for `x-request-id`.
- The reducer stores both fields on `InferenceCall`. It also uses
`response_id` for `previous_response_id` conversation linking, which
removes the old accidental dependency on the misnamed
`upstream_request_id` field.
- Terminal inference reduction now consumes the full terminal payload
(`InferenceCompleted`, `InferenceFailed`, or `InferenceCancelled`) in
one place. That keeps status, partial payloads, response ids, and
upstream request ids consistent across success, failure, cancellation,
and late stream-mapper events.
## Why This Shape
`x-request-id` is a property of the HTTP/provider response envelope, not
an SSE event. Capturing it once in `codex-api` and plumbing it through
terminal trace recording avoids trying to infer the value from stream
contents, and it preserves the id even when the stream fails or is
cancelled after only partial output.
Keeping `response_id` separate from `upstream_request_id` also makes the
reduced trace model less surprising: `response_id` remains the
conversation-continuation id, while `upstream_request_id` is the
operational correlation id for upstream debugging.
## Validation
The PR updates trace and reducer coverage for:
- reading `x-request-id` from SSE response headers;
- storing the true upstream request id on completed inference calls;
- preserving upstream request ids for cancelled and late-cancelled
inference streams;
- keeping `previous_response_id` reconstruction tied to `response_id`
rather than transport request ids.
## Why
MultiAgentV2 `wait_agent` currently clamps short waits to a fixed 10
second minimum. That default is still useful for preventing tight
polling loops, but it is too rigid for environments that need faster
mailbox wake-up checks or a larger minimum to discourage frequent
polling.
This PR makes the minimum wait timeout configurable from the existing
MultiAgentV2 feature config section, so operators can tune the behavior
without changing the legacy multi-agent tool surface.
## What Changed
- Added `features.multi_agent_v2.min_wait_timeout_ms`.
- Defaulted the new setting to the existing 10 second floor.
- Validated the configured value as `1..=3600000`, matching the existing
one hour maximum wait bound.
- Applied the configured minimum to MultiAgentV2 `wait_agent` runtime
clamping.
- Plumbed the configured minimum into the `wait_agent` tool schema,
including the effective default when the minimum is above the normal 30
second default.
- Regenerated `core/config.schema.json`.
## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib multi_agent_v2`
- `just fix -p codex-core`
## Why
Slow Codex turns are easier to debug when token usage is visible in the
trace itself, without joining against separate analytics. This adds
token usage to existing turn-handling spans for regular user turns only.
[Example
turn](https://openai.datadoghq.com/apm/trace/9d353efa2cb5de1f4c5b93dc33c3df04?colorBy=service&graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=3555541504891512675&spanViewType=metadata&traceQuery=)
<img width="1447" height="967" alt="Screenshot 2026-04-24 at 3 03 07 PM"
src="https://github.com/user-attachments/assets/ab7bb187-e7fc-41f0-a366-6c44610b2b2c"
/>
## What Changed
Added response-level token fields on completed handle_responses spans:
gen_ai.usage.input_tokens
gen_ai.usage.cache_read.input_tokens
gen_ai.usage.output_tokens
codex.usage.reasoning_output_tokens
codex.usage.total_tokens
Added aggregate token fields on regular turn spans:
codex.turn.token_usage.*
Added an explicit regular-turn opt-in via
SessionTask::records_turn_token_usage_on_span() so this is not coupled
to span-name strings.
## Testing
- `cargo test -p codex-otel`
- `cargo test -p codex-core
turn_and_completed_response_spans_record_token_usage`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-otel`
- Manual local Electron/app-server smoke test: regular user turn emits
the new span fields
Known status: `cargo test -p codex-core` was attempted and failed in
unrelated existing areas: config approvals, request-permissions,
git-info ordering, and subagent metadata persistence.
## Why
The migration away from `SandboxPolicy` needs new configs to start from
permissions profiles instead of deriving profiles from legacy sandbox
modes. Existing users can have empty `config.toml` files, and we should
not rewrite user-owned config files that may live in shared
repositories.
This PR introduces built-in profile names so an empty config can resolve
to a canonical `PermissionProfile`, while explicit named `[permissions]`
profiles still behave predictably.
## What changed
- Adds built-in `default_permissions` profile names:
- `:read-only` maps to `PermissionProfile::read_only()`.
- `:workspace` maps to the workspace-write profile, including
project-root metadata carveouts.
- `:danger-no-sandbox` maps to `PermissionProfile::Disabled`, preserving
the distinction between no sandbox and a broad managed sandbox.
- Reserves the `:` prefix for built-in profiles so user-defined
`[permissions]` profiles cannot collide with future built-ins.
- Allows `default_permissions` to reference a built-in profile without
requiring a `[permissions]` table.
- Makes an otherwise empty config choose a built-in profile by
trust/platform context: trusted or untrusted project roots use
`:workspace` when the platform supports that sandbox, while roots
without a trust decision use `:read-only`.
- Keeps legacy `sandbox_mode` configs on the legacy path, and still
rejects user-defined `[permissions]` profiles that omit
`default_permissions` so we do not silently guess among custom profiles.
- Preserves compatibility behavior for implicit defaults: bare
`network.enabled = true` allows runtime network without starting the
managed proxy, explicit profile proxy policy still starts the proxy, and
implicit workspace/add-dir roots keep legacy metadata carveouts.
## Verification
- `cargo test -p codex-core builtin --lib`
- `cargo test -p codex-core profile_network_proxy_config`
- `cargo test -p codex-core
implicit_builtin_workspace_profile_preserves_add_dir_metadata_carveouts`
- `cargo test -p codex-core
permissions_profiles_network_enabled_allows_runtime_network_without_proxy`
- `cargo test -p codex-core
permissions_profiles_proxy_policy_starts_managed_network_proxy`
## Documentation
Public Codex config docs should mention these built-in names when the
`[permissions]` config format is ready to document as stable.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19900).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* #20008
* __->__ #19900
## Why
Network access approval prompts were showing the generic retry reason,
which made auto-review focus on the blocked connection instead of the
command that caused it. This makes network approvals easier to assess by
telling the reviewer to evaluate whether the triggering command was
authorised by the user and within policy, and to treat the network call
as acceptable when it is a reasonable consequence of that command.
## What changed
- Split guardian approval request prompt rendering so `NetworkAccess`
has a dedicated branch.
- For network requests, show `Network approval context` and `Network
access JSON` instead of `Retry reason` / `Planned action JSON`.
- Added regression coverage for the network approval prompt wording and
for omitting retry reason in this case.
## Verification
- `cargo test -p codex-core
guardian::tests::build_guardian_prompt_items_explains_network_access_review_scope`
## Why
- Without change: MCP tool call spans include request-side details such
as server, tool, call ID, connector, session, and turn.
- Issue: Some useful telemetry is only known by the MCP server after it
handles the tool call, such as target identity or whether the call
triggered a user-facing flow.
## What Changed
- With change: Codex reads allowlisted telemetry from
`_meta["codex/telemetry"]["span"]` and records it on the
`mcp.tools.call` span.
- Adds span fields for `codex.mcp.target.id` and
`codex.mcp.user_flow.triggered`, with strict type checks and bounded
target ID length.
## Verification
`codex-rs/core/src/mcp_tool_call_tests.rs`