## Why
This is the first mechanical slice of moving tool spec ownership toward
the handlers. `codex-tools` should keep shared primitives and conversion
helpers, while builtin tool specs and registration planning live in
`codex-core` with the handlers that own those tools.
Keeping this PR to relocation and import updates isolates the copy/move
review from the later logic change that wires specs through registered
handlers.
## What changed
- Moved builtin tool spec constructors from `codex-rs/tools/src` into
`codex-rs/core/src/tools/handlers/*_spec.rs` or nearby core tool
modules.
- Moved the registry planning code into
`codex-rs/core/src/tools/spec_plan.rs` and its associated types/tests
into core.
- Kept shared primitives in `codex-tools`, including `ToolSpec`,
schema/types, discovery/config primitives, dynamic/MCP conversion
helpers, and code-mode collection helpers.
- Updated handlers that referenced moved argument types or tool-name
constants to use the core spec modules.
- Moved spec tests next to the moved spec modules.
## Verification
- `cargo check -p codex-tools`
- `cargo check -p codex-core`
- `cargo test -p codex-tools`
- `cargo test -p codex-core _spec::tests`
- `cargo test -p codex-core tools::spec_plan::tests`
- `just fix -p codex-tools`
- `just fix -p codex-core`
Note: I also tried the broader `cargo test -p codex-core tools::`; it
reached the moved spec-plan/spec tests successfully, then aborted with a
stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`,
which is outside this spec relocation.
## Why
Tool registration used to bind a tool name to a handler externally,
which left ownership split between the registry plan and the handler
implementation. Some built-in handlers also multiplexed multiple in-core
tools by switching on the invoked tool name internally.
This moves the registry identity onto the handler itself and makes
built-in multi-tool areas use separate concrete handlers, so each
registered handler instance owns exactly one tool name and one dispatch
path.
## What Changed
- Added `ToolHandler::tool_name()` and changed
`ToolRegistryBuilder::register_handler` to derive the registry key from
the handler.
- Split built-in multiplexed handlers into concrete per-tool handlers
for unified exec, shell/local shell/container exec, MCP resources, goal
tools, and agent job tools.
- Kept name-carrying handler instances only where the runtime target is
inherently external or dynamic, such as MCP tools, dynamic tools, and
unavailable placeholders.
- Updated `ToolHandlerKind` and registry-plan construction so plan
entries map directly to concrete handler registrations.
## Verification
- `cargo test -p codex-tools tool_registry_plan`
- `cargo test -p codex-core --lib tools::registry_tests`
- `just fix -p codex-tools`
- `just fix -p codex-core`
## Summary
Adds the debug CLI entry point for reducing recorded rollout traces.
This gives developers a direct way to inspect whether the emitted trace
stream reduces into the expected conversation/runtime model.
## Stack
This is PR 5/5 in the rollout trace stack.
- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command
## Review Notes
This PR is intentionally last: it depends on the trace crate, core
recorder, runtime/tool events, and session/agent edge data all existing.
The command should remain a debug/developer tool and avoid adding new
runtime behavior.
The useful review question is whether the CLI exposes the reducer in the
smallest practical way for local inspection without turning the debug
command into a supported user-facing workflow.
## Summary
Extends rollout tracing across tool dispatch and code-mode runtime
boundaries. This records canonical tool-call lifecycle events and links
code-mode execution/wait operations back to the model-visible calls that
caused them.
## Stack
This is PR 3/5 in the rollout trace stack.
- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command
## Review Notes
This PR is about attribution. Reviewers should focus on whether direct
tool calls, code-mode-originated tool calls, waits, outputs, and
cancellation boundaries are recorded with enough source information for
deterministic reduction without coupling the reducer to live runtime
internals.
The stack remains valid after this layer: tool and code-mode traces
reduce through the existing crate model, while the broader session and
multi-agent relationships are added in the next PR.
## Why
This PR prepares the stack to enable Clippy await-holding lints that
were left disabled in #18178. The mechanical lock-scope cleanup is
handled separately; this PR is the documentation/configuration layer for
the remaining await-across-guard sites.
Without explicit annotations, reviewers and future maintainers cannot
tell whether an await-holding warning is a real concurrency smell or an
intentional serialization boundary.
## What changed
- Configures `clippy.toml` so `await_holding_invalid_type` also covers
`tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`.
- Adds targeted `#[expect(clippy::await_holding_invalid_type, reason =
...)]` annotations for intentional async guard lifetimes.
- Documents the main categories of intentional cases: active-turn state
transitions that must remain atomic, session-owned MCP manager accesses,
remote-control websocket serialization, JS REPL kernel/process
serialization, OAuth persistence, external bearer token refresh
serialization, and tests that intentionally serialize shared global or
session-owned state.
- For external bearer token refresh, documents the existing
serialization boundary: holding `cached_token` across the provider
command prevents concurrent cache misses from starting duplicate refresh
commands, and the current behavior is small enough that an explicit
expectation is easier to maintain than adding another synchronization
primitive.
## Verification
- `cargo clippy -p codex-login --all-targets`
- `cargo clippy -p codex-connectors --all-targets`
- `cargo clippy -p codex-core --all-targets`
- The follow-up PR #18698 enables `await_holding_invalid_type` and
`await_holding_lock` as workspace `deny` lints, so any undocumented
remaining offender will fail Clippy.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423).
* #18698
* __->__ #18423
## Summary
- honor `_meta["codex/imageDetail"] == "original"` on MCP image content
and map it to `detail: "original"` where supported
- strip that detail back out when the active model does not support
original-detail image inputs
- update code-mode `image(...)` to accept individual MCP image blocks
- teach `js_repl` / `codex.emitImage(...)` to preserve the same hint
from raw MCP image outputs
- document the new `_meta` contract and add generic RMCP-backed coverage
across protocol, core, code-mode, and js_repl paths
stacked on #17402.
MCP tools returned by `tool_search` (deferred tools) get registered in
our `ToolRegistry` with a different format than directly available
tools. this leads to two different ways of accessing MCP tools from our
tool catalog, only one of which works for each. fix this by registering
all MCP tools with the namespace format, since this info is already
available.
also, direct MCP tools are registered to responsesapi without a
namespace, while deferred MCP tools have a namespace. this means we can
receive MCP `FunctionCall`s in both formats from namespaces. fix this by
always registering MCP tools with namespace, regardless of deferral
status.
make code mode track `ToolName` provenance of tools so it can map the
literal JS function name string to the correct `ToolName` for
invocation, rather than supporting both in core.
this lets us unify to a single canonical `ToolName` representation for
each MCP tool and force everywhere to use that one, without supporting
fallbacks.
## Why
For more advanced MCP usage, we want the model to be able to emit
parallel MCP tool calls and have Codex execute eligible ones
concurrently, instead of forcing all MCP calls through the serial block.
The main design choice was where to thread the config. I made this
server-level because parallel safety depends on the MCP server
implementation. Codex reads the flag from `mcp_servers`, threads the
opted-in server names into `ToolRouter`, and checks the parsed
`ToolPayload::Mcp { server, .. }` at execution time. That avoids relying
on model-visible tool names, which can be incomplete in
deferred/search-tool paths or ambiguous for similarly named
servers/tools.
## What was added
Added `supports_parallel_tool_calls` for MCP servers.
Before:
```toml
[mcp_servers.docs]
command = "docs-server"
```
After:
```toml
[mcp_servers.docs]
command = "docs-server"
supports_parallel_tool_calls = true
```
MCP calls remain serial by default. Only tools from opted-in servers are
eligible to run in parallel. Docs also now warn to enable this only when
the server’s tools are safe to run concurrently, especially around
shared state or read/write races.
## Testing
Tested with a local stdio MCP server exposing real delay tools. The
model/Responses side was mocked only to deterministically emit two MCP
calls in the same turn.
Each test called `query_with_delay` and `query_with_delay_2` with `{
"seconds": 25 }`.
| Build/config | Observed | Wall time |
| --- | --- | --- |
| main with flag enabled | serial | `58.79s` |
| PR with flag enabled | parallel | `31.73s` |
| PR without flag | serial | `56.70s` |
PR with flag enabled showed both tools start before either completed;
main and PR-without-flag completed the first delay before starting the
second.
Also added an integration test.
Additional checks:
- `cargo test -p codex-tools` passed
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server` passed
- `git diff --check` passed
- [x] Expand tool search to custom MCPs.
- [x] Rename several variables/fields to be more generic.
Updated tool & server name lifecycles:
**Raw Identity**
ToolInfo.server_name is raw MCP server name.
ToolInfo.tool.name is raw MCP tool name.
MCP calls route back to raw via parse_tool_name() returning
(tool.server_name, tool.tool.name).
mcpServerStatus/list now groups by raw server and keys tools by
Tool.name: mod.rs:599
App-server just forwards that grouped raw snapshot:
codex_message_processor.rs:5245
**Callable Names**
On list-tools, we create provisional callable_namespace / callable_name:
mcp_connection_manager.rs:1556
For non-app MCP, provisional callable name starts as raw tool name.
For codex-apps, provisional callable name is sanitized and strips
connector name/id prefix; namespace includes connector name.
Then qualify_tools() sanitizes callable namespace + name to ASCII alnum
/ _ only: mcp_tool_names.rs:128
Note: this is stricter than Responses API. Hyphen is currently replaced
with _ for code-mode compatibility.
**Collision Handling**
We do initially collapse example-server and example_server to the same
base.
Then qualify_tools() detects distinct raw namespace identities behind
the same sanitized namespace and appends a hash to the callable
namespace: mcp_tool_names.rs:137
Same idea for tool-name collisions: hash suffix goes on callable tool
name.
Final list_all_tools() map key is callable_namespace + callable_name:
mcp_connection_manager.rs:769
**Direct Model Tools**
Direct MCP tool declarations use the full qualified sanitized key as the
Responses function name.
The raw rmcp Tool is converted but renamed for model exposure.
**Tool Search / Deferred**
Tool search result namespace = final ToolInfo.callable_namespace:
tool_search.rs:85
Tool search result nested name = final ToolInfo.callable_name:
tool_search.rs:86
Deferred tool handler is registered as "{namespace}:{name}":
tool_registry_plan.rs:248
When a function call comes back, core recombines namespace + name, looks
up the full qualified key, and gets the raw server/tool for MCP
execution: codex.rs:4353
**Separate Legacy Snapshot**
collect_mcp_snapshot_from_manager_with_detail() still returns a map
keyed by qualified callable name.
mcpServerStatus/list no longer uses that; it uses
McpServerStatusSnapshot, which is raw-inventory shaped.
## Why
This is another small step in the `codex-core` -> `codex-tools`
migration described in `AGENTS.md`.
`core/src/tools/spec.rs` and `core/src/tools/code_mode/mod.rs` were both
hand-rolling the same pure transformation: convert visible `ToolSpec`s
into code-mode nested tool definitions, then sort and deduplicate by
tool name. That logic does not depend on core runtime state or handlers,
so keeping it in `codex-core` makes `spec.rs` harder to peel out later
than it needs to be.
## What Changed
- Add `collect_code_mode_tool_definitions()` to
`codex-rs/tools/src/code_mode.rs`.
- Reuse that helper from `codex-rs/core/src/tools/spec.rs` when
assembling the `exec` tool description.
- Reuse the same helper from `codex-rs/core/src/tools/code_mode/mod.rs`
when exposing nested tool metadata to the code-mode runtime.
This is intended to be a straight refactor with no behavior change and
no new test surface.
## Verification
- `cargo test -p codex-tools`
- `cargo test -p codex-core tools::spec::tests`
- `cargo test -p codex-core code_mode_only_`
## Why
`codex-rs/core/src/client_common.rs` still had a `tools` re-export
module that forwarded `codex_tools` types back into `codex-core`. After
the earlier extraction work in #16379, #16471, #16477, and #16481, that
extra layer no longer adds value.
Removing it keeps dependencies explicit: the `codex-core` modules that
actually use `ToolSpec` and related types now depend on `codex_tools`
directly instead of reaching through `client_common`.
## What Changed
- removed the `client_common::tools` re-export module from
`core/src/client_common.rs`
- updated the remaining `codex-core` consumers to import `codex_tools`
directly
- adjusted the affected test code to reference
`codex_tools::ResponsesApiTool` directly as well
This is a mechanical cleanup only. It does not change tool behavior or
runtime logic.
## Testing
- `cargo test -p codex-core client_common::tests`
- `cargo test -p codex-core tools::router::tests`
- `cargo test -p codex-core tools::context::tests`
- `cargo test -p codex-core tools::spec::tests`
## Why
The longer-term `codex-tools` migration is to move pure tool-definition
and tool-spec plumbing out of `codex-core` while leaving session- and
runtime-coupled orchestration behind.
The remaining code-mode adapter layer in
`core/src/tools/code_mode_description.rs` was a good next extraction
seam because it only transformed `ToolSpec` values for code mode and
already delegated the low-level description rendering to
`codex-code-mode`.
## What Changed
- added `codex-rs/tools/src/code_mode.rs` with
`augment_tool_spec_for_code_mode()` and
`tool_spec_to_code_mode_tool_definition()`
- added focused unit coverage in `codex-rs/tools/src/code_mode_tests.rs`
- rewired `core/src/tools/spec.rs` and `core/src/tools/code_mode/mod.rs`
to use the extracted adapters from `codex-tools`
- removed the old `core/src/tools/code_mode_description.rs` shim and its
test file from `codex-core`
- added the `codex-code-mode` dependency to `codex-tools`, updated
`Cargo.lock`, and refreshed the `codex-tools` README to reflect the
expanded boundary
## Test Plan
- `cargo test -p codex-tools`
- `CARGO_TARGET_DIR=/tmp/codex-core-code-mode-adapters cargo test -p
codex-core --lib tools::spec::`
- `CARGO_TARGET_DIR=/tmp/codex-core-code-mode-adapters cargo test -p
codex-core --lib tools::code_mode::`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just argument-comment-lint`
## References
- #15923
- #15928
- #15944
- #15953
- #16031
- #16047
- #16129
- move the shared byte-based middle truncation logic from `core` into
`codex-utils-string`
- keep token-specific truncation in `codex-core` so rollout can reuse
the shared helper in the next stacked PR
---------
Co-authored-by: Codex <noreply@openai.com>
Moves Code Mode to a new crate with no dependencies on codex. This
create encodes the code mode semantics that we want for lifetime,
mounting, tool calling.
The model-facing surface is mostly unchanged. `exec` still runs raw
JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
are still available through `tools.*`, and helpers like `text`, `image`,
`store`, `load`, `notify`, `yield_control`, and `exit` still exist.
The major change is underneath that surface:
- Old code mode was an external Node runtime.
- New code mode is an in-process V8 runtime embedded directly in Rust.
- Old code mode managed cells inside a long-lived Node runner process.
- New code mode manages cells in Rust, with one V8 runtime thread per
active `exec`.
- Old code mode used JSON protocol messages over child stdin/stdout plus
Node worker-thread messages.
- New code mode uses Rust channels and direct V8 callbacks/events.
This PR also fixes the two migration regressions that fell out of that
substrate change:
- `wait { terminate: true }` now waits for the V8 runtime to actually
stop before reporting termination.
- synchronous top-level `exit()` now succeeds again instead of surfacing
as a script error.
---
- `core/src/tools/code_mode/*` is now mostly an adapter layer for the
public `exec` / `wait` tools.
- `code-mode/src/service.rs` owns cell sessions and async control flow
in Rust.
- `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
JavaScript execution.
- each `exec` spawns a dedicated runtime thread plus a Rust
session-control task.
- helper globals are installed directly into the V8 context instead of
being injected through a source prelude.
- helper modules like `tools.js` and `@openai/code_mode` are synthesized
through V8 module resolution callbacks in Rust.
---
Also added a benchmark for showing the speed of init and use of a code
mode env:
```
$ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
Finished [`bench` profile [optimized]](https://doc.rust-lang.org/cargo/reference/profiles.html#default-profiles) target(s) in 0.18s
Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
scenario tools samples warmups iters mean/exec p95/exec rssΔ p50 rssΔ max
cold_exec 0 30 0 1 1.13ms 1.20ms 8.05MiB 8.06MiB
warm_exec 0 30 1 25 473.43us 512.49us 912.00KiB 1.33MiB
cold_exec 32 30 0 1 1.03ms 1.15ms 8.08MiB 8.11MiB
warm_exec 32 30 1 25 509.73us 545.76us 960.00KiB 1.30MiB
cold_exec 128 30 0 1 1.14ms 1.19ms 8.30MiB 8.34MiB
warm_exec 128 30 1 25 575.08us 591.03us 736.00KiB 864.00KiB
memory uses a fresh-process max RSS delta for each scenario
```
---------
Co-authored-by: Codex <noreply@openai.com>
- Split the feature system into a new `codex-features` crate.
- Cut `codex-core` and workspace consumers over to the new config and
warning APIs.
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
Cleanup image semantics in code mode.
`view_image` now returns `{image_url:string, details?: string}`
`image()` now allows both string parameter and `{image_url:string,
details?: string}`
Summary
- document that code mode only exposes `exec` and the renamed `wait`
tool
- update code mode tool spec and descriptions to match the new tool name
- rename tests and helper references from `exec_wait` to `wait`
Testing
- Not run (not requested)
## Why
Once the repo-local lint exists, `codex-rs` needs to follow the
checked-in convention and CI needs to keep it from drifting. This commit
applies the fallback `/*param*/` style consistently across existing
positional literal call sites without changing those APIs.
The longer-term preference is still to avoid APIs that require comments
by choosing clearer parameter types and call shapes. This PR is
intentionally the mechanical follow-through for the places where the
existing signatures stay in place.
After rebasing onto newer `main`, the rollout also had to cover newly
introduced `tui_app_server` call sites. That made it clear the first cut
of the CI job was too expensive for the common path: it was spending
almost as much time installing `cargo-dylint` and re-testing the lint
crate as a representative test job spends running product tests. The CI
update keeps the full workspace enforcement but trims that extra
overhead from ordinary `codex-rs` PRs.
## What changed
- keep a dedicated `argument_comment_lint` job in `rust-ci`
- mechanically annotate remaining opaque positional literals across
`codex-rs` with exact `/*param*/` comments, including the rebased
`tui_app_server` call sites that now fall under the lint
- keep the checked-in style aligned with the lint policy by using
`/*param*/` and leaving string and char literals uncommented
- cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
registry/git metadata in the lint job
- split changed-path detection so the lint crate's own `cargo test` step
runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
- continue to run the repo wrapper over the `codex-rs` workspace, so
product-code enforcement is unchanged
Most of the code changes in this commit are intentionally mechanical
comment rewrites or insertions driven by the lint itself.
## Verification
- `./tools/argument-comment-lint/run.sh --workspace`
- `cargo test -p codex-tui-app-server -p codex-tui`
- parsed `.github/workflows/rust-ci.yml` locally with PyYAML
---
* -> #14652
* #14651
- **Summary**
- expose `exit` through the code mode bridge and module so scripts can
stop mid-flight
- surface the helper in the description documentation
- add a regression test ensuring `exit()` terminates execution cleanly
- **Testing**
- Not run (not requested)
Summary
- add the code_mode_only feature flag/config schema and wire its
dependency on code_mode
- update code mode tool descriptions to list nested tools with detailed
headers
- restrict available tools for prompt and exec descriptions when
code_mode_only is enabled and test the behavior
Testing
- Not run (not requested)
This change moves code_mode exec session settings out of the runtime API
and into an optional first-line pragma, so instead of calling runtime
helpers like set_yield_time() or set_max_output_tokens_per_exec_call(),
the model can write // @exec: {"yield_time_ms": ...,
"max_output_tokens": ...} at the top of the freeform exec source. Rust
now parses that pragma before building the source, validates it, and
passes the values directly in the exec start message to the code-mode
broker, which applies them at session start without any worker-runtime
mutation path. The @openai/code_mode module no longer exposes those
setter functions, the docs and grammar were updated to describe the
pragma form, and the existing code_mode tests were converted to use
pragma-based configuration instead.
Summary
- make all code-mode tools accessible as globals so callers only need
`tools.<name>`
- rename text/image helpers and key globals (store, load, ALL_TOOLS,
etc.) to reflect the new shared namespace
- update the JS bridge, runners, descriptions, router, and tests to
follow the new API
Testing
- Not run (not requested)
- Update the code-mode executor, wait handler, and protocol plumbing to
use cell IDs instead of session IDs for node communication
- Switch tool metadata, wait description, and suite tests to refer to
cell IDs so user-visible messages match the new terminology
**Testing**
- Not run (not requested)
## Summary
- create the turn-scoped `ToolCallRuntime` before starting the code mode
worker so the worker reuses the same runtime and router
- thread the shared runtime through the code mode service/worker path
and use it for nested tool calls
- model aborted tool calls as a concrete `ToolOutput` so aborted
responses still produce valid tool output shapes
## Testing
- `just fmt`
- `cargo test -p codex-core` (still running locally)
Summary
- expose the default yield timeout through code mode runtime so the
handler, wait tool, and protocol share the same 10s value that matches
unified exec
- document the timeout change in the tool descriptions and propagate the
value all the way into the runner metadata
- adjust Cargo.lock to keep the dependency tree in sync with the added
code mode tool dependency
Testing
- Not run (not requested)
- **Summary**
- migrate the code mode handler, service, worker, process, runner, and
bridge assets into the `tools/code_mode` module tree
- split Execution, protocol, and handler logic into dedicated files and
relocate the tool definition into `code_mode/spec.rs`
- update core references and tests to stitch the new organization
together
- **Testing**
- Not run (not requested)