Commit Graph

581 Commits

Author SHA1 Message Date
josiah-openai
937dd3812d Add supports_parallel_tool_calls flag to included mcps (#17667)
## Why

For more advanced MCP usage, we want the model to be able to emit
parallel MCP tool calls and have Codex execute eligible ones
concurrently, instead of forcing all MCP calls through the serial block.

The main design choice was where to thread the config. I made this
server-level because parallel safety depends on the MCP server
implementation. Codex reads the flag from `mcp_servers`, threads the
opted-in server names into `ToolRouter`, and checks the parsed
`ToolPayload::Mcp { server, .. }` at execution time. That avoids relying
on model-visible tool names, which can be incomplete in
deferred/search-tool paths or ambiguous for similarly named
servers/tools.

## What was added

Added `supports_parallel_tool_calls` for MCP servers.

Before:

```toml
[mcp_servers.docs]
command = "docs-server"
```

After:

```toml
[mcp_servers.docs]
command = "docs-server"
supports_parallel_tool_calls = true
```

MCP calls remain serial by default. Only tools from opted-in servers are
eligible to run in parallel. Docs also now warn to enable this only when
the server’s tools are safe to run concurrently, especially around
shared state or read/write races.

## Testing

Tested with a local stdio MCP server exposing real delay tools. The
model/Responses side was mocked only to deterministically emit two MCP
calls in the same turn.

Each test called `query_with_delay` and `query_with_delay_2` with `{
"seconds": 25 }`.

| Build/config | Observed | Wall time |
| --- | --- | --- |
| main with flag enabled | serial | `58.79s` |
| PR with flag enabled | parallel | `31.73s` |
| PR without flag | serial | `56.70s` |

PR with flag enabled showed both tools start before either completed;
main and PR-without-flag completed the first delay before starting the
second.

Also added an integration test.

Additional checks:

- `cargo test -p codex-tools` passed
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server` passed
- `git diff --check` passed
2026-04-13 15:16:34 -07:00
pakrym-oai
ac82443d07 Use AbsolutePathBuf in skill loading and codex_home (#17407)
Helps with FS migration later
2026-04-13 10:26:51 -07:00
friel-openai
776246c3f5 Make forked agent spawns keep parent model config (#17247)
## Summary

When a `spawn_agent` call does a full-history fork, keep the parent's
effective agent type and model configuration instead of applying child
role/model overrides.

This is the minimal config-inheritance slice of #16055. Prompt-cache key
inheritance and MCP tool-surface stability are split into follow-up PRs.

## Design

- Reject `agent_type`, `model`, and `reasoning_effort` for v1
`fork_context` spawns.
- Reject `agent_type`, `model`, and `reasoning_effort` for v2
`fork_turns = "all"` spawns.
- Keep v2 partial-history forks (`fork_turns = "N"`) configurable;
requested model/reasoning overrides and role config still apply there.
- Keep non-forked spawn behavior unchanged.

## Tests

- `cargo +1.93.1 test -p codex-core spawn_agent_fork_context --lib`
- `cargo +1.93.1 test -p codex-core multi_agent_v2_spawn_fork_turns
--lib`
- `cargo +1.93.1 test -p codex-core
multi_agent_v2_spawn_partial_fork_turns_allows_agent_type_override
--lib`
2026-04-13 15:28:40 +00:00
jif-oai
bacb92b1d7 Build remote exec env from exec-server policy (#17216)
## Summary
- add an exec-server `envPolicy` field; when present, the server starts
from its own process env and applies the shell environment policy there
- keep `env` as the exact environment for local/embedded starts, but
make it an overlay for remote unified-exec starts
- move the shell-environment-policy builder into `codex-config` so Core
and exec-server share the inherit/filter/set/include behavior
- overlay only runtime/sandbox/network deltas from Core onto the
exec-server-derived env

## Why
Remote unified exec was materializing the shell env inside Core and
forwarding the whole map to exec-server, so remote processes could
inherit the orchestrator machine's `HOME`, `PATH`, etc. This keeps the
base env on the executor while preserving Core-owned runtime additions
like `CODEX_THREAD_ID`, unified-exec defaults, network proxy env, and
sandbox marker env.

## Validation
- `just fmt`
- `git diff --check`
- `cargo test -p codex-exec-server --lib`
- `cargo test -p codex-core --lib unified_exec::process_manager::tests`
- `cargo test -p codex-core --lib exec_env::tests`
- `cargo test -p codex-core --lib exec_env_tests` (compile-only; filter
matched 0 tests)
- `cargo test -p codex-config --lib shell_environment` (compile-only;
filter matched 0 tests)
- `just bazel-lock-update`

## Known local validation issue
- `just bazel-lock-check` is not runnable in this checkout: it invokes
`./scripts/check-module-bazel-lock.sh`, which is missing.

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: pakrym-oai <pakrym@openai.com>
2026-04-13 09:59:08 +01:00
starr-openai
d626dc3895 Run exec-server fs operations through sandbox helper (#17294)
## Summary
- run exec-server filesystem RPCs requiring sandboxing through a
`codex-fs` arg0 helper over stdin/stdout
- keep direct local filesystem execution for `DangerFullAccess` and
external sandbox policies
- remove the standalone exec-server binary path in favor of top-level
arg0 dispatch/runtime paths
- add sandbox escape regression coverage for local and remote filesystem
paths

## Validation
- `just fmt`
- `git diff --check`
- remote devbox: `cd codex-rs && bazel test --bes_backend=
--bes_results_url= //codex-rs/exec-server:all` (6/6 passed)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-12 18:36:03 -07:00
pakrym-oai
7c1e41c8b6 Add MCP tool wall time to model output (#17406)
Include MCP wall time in the output so the model is aware of how long
it's calls are taking.
2026-04-12 18:26:15 -07:00
Francis Chalissery
720932ca3d [codex] Support flattened deferred MCP tool calls (#17556)
## Summary
- register flattened handler aliases for deferred MCP tools
- cover the node_repl-shaped deferred MCP call path in tool registry
tests

## Root Cause
Deferred MCP tools were registered only under their namespaced handler
key, e.g. `mcp__node_repl__:js`. If the model/bridge emitted the
flattened qualified name `mcp__node_repl__js`, core parsed it as an MCP
payload but dispatch looked up the flattened handler key and returned
`unsupported call` before reaching the MCP handler.

## Validation
- `just fmt`
- `cargo test -p codex-tools
search_tool_registers_deferred_mcp_flattened_handlers`
- `cargo test -p codex-core
search_tool_registers_namespaced_mcp_tool_aliases`
- `git diff --check`
2026-04-12 13:19:36 -07:00
Won Park
ba839c23f3 changing decision semantics after guardian timeout (#17486)
**Summary**

This PR treats Guardian timeouts as distinct from explicit denials in
the core approval paths.
Timeouts now return timeout-specific guidance instead of Guardian
policy-rejection messaging.
It updates the command, shell, network, and MCP approval flows and adds
focused test coverage.
2026-04-12 00:00:50 -07:00
sayan-oai
1325bcd3f6 chore: refactor name and namespace to single type (#17402)
avoid passing them both around, unify on a type. this now also keys
`ToolRegistry`.

tests pass
2026-04-11 23:06:22 +00:00
Eric Traut
e9e7ef3d36 Fix thread/list cwd filtering for Windows verbatim paths (#17414)
Addresses #17302

Problem: `thread/list` compared cwd filters with raw path equality, so
`resume --last` could miss Windows sessions when the saved cwd used a
verbatim path form and the current cwd did not.

Solution: Normalize cwd comparisons through the existing path comparison
utilities before falling back to direct equality, and add Windows
regression coverage for verbatim paths. I made this a general utility
function and replaced all of the duplicated instance of it across the
code base.
2026-04-10 23:08:02 -07:00
Won Park
37aac89a6d representing guardian review timeouts in protocol types (#17381)
## Summary

- Add `TimedOut` to Guardian/review carrier types:
  - `ReviewDecision::TimedOut`
  - `GuardianAssessmentStatus::TimedOut`
  - app-server v2 `GuardianApprovalReviewStatus::TimedOut`
- Regenerate app-server JSON/TypeScript schemas for the new wire shape.
- Wire the new status through core/app-server/TUI mappings with
conservative fail-closed handling.
- Keep `TimedOut` non-user-selectable in the approval UI.

**Does not change runtime behavior yet; emitting `TimeOut` and
parent-model timeout messaging will come in followup PRs**
2026-04-10 20:02:33 -07:00
Owen Lin
a3be74143a fix(guardian, app-server): introduce guardian review ids (#17298)
## Description

This PR introduces `review_id` as the stable identifier for guardian
reviews and exposes it in app-server `item/autoApprovalReview/started`
and `item/autoApprovalReview/completed` events.

Internally, guardian rejection state is now keyed by `review_id` instead
of the reviewed tool item ID. `target_item_id` is still included when a
review maps to a concrete thread item, but it is no longer overloaded as
the review lifecycle identifier.

## Motivation

We'd like to give users the ability to preempt a guardian review while
it's running (approve or decline).

However, we can't implement the API that allows the user to override a
running guardian review because we didn't have a unique `review_id` per
guardian review. Using `target_item_id` is not correct since:
- with execve reviews, there can be multiple execve calls (and therefore
guardian reviews) per shell command
- with network policy reviews, there is no target item ID

The PR that actually implements user overrides will use `review_id` as
the stable identifier.
2026-04-10 16:21:02 -07:00
jif-oai
d39a722865 feat: description multi-agent v2 (#17338) 2026-04-10 15:31:32 +01:00
viyatb-oai
b976e701a8 fix: support split carveouts in windows elevated sandbox (#14568)
## Summary
- preserve legacy Windows elevated sandbox behavior for existing
policies
- add elevated-only support for split filesystem policies that can be
represented as readable-root overrides, writable-root overrides, and
extra deny-write carveouts
- resolve those elevated filesystem overrides during sandbox transform
and thread them through setup and policy refresh
- keep failing closed for explicit unreadable (`none`) carveouts and
reopened writable descendants under read-only carveouts
- for explicit read-only-under-writable-root carveouts, materialize
missing carveout directories during elevated setup before applying the
deny-write ACL
- document the elevated vs restricted-token support split in the core
README

## Example
Given a split filesystem policy like:

```toml
":root" = "read"
":cwd" = "write"
"./docs" = "read"
"C:/scratch" = "write"
```

the elevated backend now provisions the readable-root overrides,
writable-root overrides, and extra deny-write carveouts during setup and
refresh instead of collapsing back to the legacy workspace-only shape.

If a read-only carveout under a writable root is missing at setup time,
elevated setup creates that carveout as an empty directory before
applying its deny-write ACE; otherwise the sandboxed command could
create it later and bypass the carveout. This is only for explicit
policy carveouts. Best-effort workspace protections like `.codex/` and
`.agents/` still skip missing directories.

A policy like:

```toml
"/workspace" = "write"
"/workspace/docs" = "read"
"/workspace/docs/tmp" = "write"
```

still fails closed, because the elevated backend does not reopen
writable descendants under read-only carveouts yet.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-09 17:34:52 -07:00
Matthew Zeng
d7f99b0fa6 [mcp] Expand tool search to custom MCPs. (#16944)
- [x] Expand tool search to custom MCPs.
- [x] Rename several variables/fields to be more generic.

Updated tool & server name lifecycles:

**Raw Identity**

ToolInfo.server_name is raw MCP server name.
ToolInfo.tool.name is raw MCP tool name.
MCP calls route back to raw via parse_tool_name() returning
(tool.server_name, tool.tool.name).
mcpServerStatus/list now groups by raw server and keys tools by
Tool.name: mod.rs:599
App-server just forwards that grouped raw snapshot:
codex_message_processor.rs:5245

**Callable Names**

On list-tools, we create provisional callable_namespace / callable_name:
mcp_connection_manager.rs:1556
For non-app MCP, provisional callable name starts as raw tool name.
For codex-apps, provisional callable name is sanitized and strips
connector name/id prefix; namespace includes connector name.
Then qualify_tools() sanitizes callable namespace + name to ASCII alnum
/ _ only: mcp_tool_names.rs:128
Note: this is stricter than Responses API. Hyphen is currently replaced
with _ for code-mode compatibility.

**Collision Handling**

We do initially collapse example-server and example_server to the same
base.
Then qualify_tools() detects distinct raw namespace identities behind
the same sanitized namespace and appends a hash to the callable
namespace: mcp_tool_names.rs:137
Same idea for tool-name collisions: hash suffix goes on callable tool
name.
Final list_all_tools() map key is callable_namespace + callable_name:
mcp_connection_manager.rs:769

**Direct Model Tools**

Direct MCP tool declarations use the full qualified sanitized key as the
Responses function name.
The raw rmcp Tool is converted but renamed for model exposure.

**Tool Search / Deferred**

Tool search result namespace = final ToolInfo.callable_namespace:
tool_search.rs:85
Tool search result nested name = final ToolInfo.callable_name:
tool_search.rs:86
Deferred tool handler is registered as "{namespace}:{name}":
tool_registry_plan.rs:248
When a function call comes back, core recombines namespace + name, looks
up the full qualified key, and gets the raw server/tool for MCP
execution: codex.rs:4353

**Separate Legacy Snapshot**

collect_mcp_snapshot_from_manager_with_detail() still returns a map
keyed by qualified callable name.
mcpServerStatus/list no longer uses that; it uses
McpServerStatusSnapshot, which is raw-inventory shaped.
2026-04-09 13:34:52 -07:00
neil-oai
a92a5085bd Forward app-server turn clientMetadata to Responses (#16009)
## Summary
App-server v2 already receives turn-scoped `clientMetadata`, but the
Rust app-server was dropping it before the outbound Responses request.
This change keeps the fix lightweight by threading that metadata through
the existing turn-metadata path rather than inventing a new transport.

## What we're trying to do and why
We want turn-scoped metadata from the app-server protocol layer,
especially fields like Hermes/GAAS run IDs, to survive all the way to
the actual Responses API request so it is visible in downstream
websocket request logging and analytics.

The specific bug was:
- app-server protocol uses camelCase `clientMetadata`
- Responses transport already has an existing turn metadata carrier:
`x-codex-turn-metadata`
- websocket transport already rewrites that header into
`request.request_body.client_metadata["x-codex-turn-metadata"]`
- but the Rust app-server never parsed or stored `clientMetadata`, so
nothing from the app-server request was making it into that existing
path

This PR fixes that without adding a new header or a second metadata
channel.

## How we did it
### Protocol surface
- Add optional `clientMetadata` to v2 `TurnStartParams` and
`TurnSteerParams`
- Regenerate the JSON schema / TypeScript fixtures
- Update app-server docs to describe the field and its behavior

### Runtime plumbing
- Add a dedicated core op for app-server user input carrying turn-scoped
metadata: `Op::UserInputWithClientMetadata`
- Wire `turn/start` and `turn/steer` through that op / signature path
instead of dropping the metadata at the message-processor boundary
- Store the metadata in `TurnMetadataState`

### Transport behavior
- Reuse the existing serialized `x-codex-turn-metadata` payload
- Merge the new app-server `clientMetadata` into that JSON additively
- Do **not** replace built-in reserved fields already present in the
turn metadata payload
- Keep websocket behavior unchanged at the outer shape level: it still
sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
string now contains the merged fields
- Keep HTTP fallback behavior unchanged except that the existing
`x-codex-turn-metadata` header now includes the merged fields too

### Request shape before / after
Before, a websocket `response.create` looked like:
```json
{
  "type": "response.create",
  "client_metadata": {
    "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
  }
}
```
Even if the app-server caller supplied `clientMetadata`, it was not
represented there.

After, the same request shape is preserved, but the serialized payload
now includes the new turn-scoped fields:
```json
{
  "type": "response.create",
  "client_metadata": {
    "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
  }
}
```

## Validation
### Targeted tests added / updated
- protocol round-trip coverage for `clientMetadata` on `turn/start` and
`turn/steer`
- protocol round-trip coverage for `Op::UserInputWithClientMetadata`
- `TurnMetadataState` merge test proving client metadata is added
without overwriting reserved built-in fields
- websocket request-shape test proving outbound `response.create`
contains merged metadata inside
`client_metadata["x-codex-turn-metadata"]`
- app-server integration tests proving:
- `turn/start` forwards `clientMetadata` into the outbound Responses
request path
  - websocket warmup + real turn request both behave correctly
  - `turn/steer` updates the follow-up request metadata

### Commands run
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
--lib`
- `cargo test -p codex-core --test all
responses_websocket_preserves_custom_turn_metadata_fields`
- `cargo test -p codex-app-server --test all client_metadata`
- `cargo test -p codex-app-server --test all
turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
-- --nocapture`
- `just fmt`
- `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
-p codex-app-server`
- `just fix -p codex-exec -p codex-tui-app-server`
- `just argument-comment-lint`

### Full suite note
`cargo test` in `codex-rs` still fails in:
-
`suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`

I verified that same failure on a clean detached `HEAD` worktree with an
isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.
2026-04-09 11:52:37 -07:00
jif-oai
c0b5d8d24a Skip local shell snapshots for remote unified exec (#17217)
## Summary
- detect remote exec-server sessions in the unified-exec runtime
- bypass the local shell-snapshot bootstrap only for those remote
sessions
- preserve existing local snapshot wrapping, PowerShell UTF-8 prefixing,
sandbox orchestration, and zsh-fork handling

## Why
The shell snapshot file is currently captured and stored next to Core.
If Core wraps a remote command with `. /path/to/local/snapshot`, the
process starts on the executor and tries to source a path from the
orchestrator filesystem. This keeps remote commands from receiving that
known-local path until shell snapshots are captured/restored on the
executor side.

## Validation
- `just fmt`
- `git diff --check`
- `cargo test -p codex-core --lib tools::runtimes::tests`

Co-authored-by: Codex <noreply@openai.com>
2026-04-09 17:30:18 +01:00
maja-openai
dcbc91fd39 Update guardian output schema (#17061)
## Summary
- Update guardian output schema to separate risk, authorization,
outcome, and rationale.
- Feed guardian rationale into rejection messages.
- Split the guardian policy into template and tenant-config sections.

## Validation
- `cargo test -p codex-core mcp_tool_call`
- `env -u CODEX_SANDBOX_NETWORK_DISABLED INSTA_UPDATE=always cargo test
-p codex-core guardian::`

---------

Co-authored-by: Owen Lin <owen@openai.com>
2026-04-08 15:47:29 -07:00
canvrno-oai
58ad79b60e Fix missing fields (#17149)
Fix missing `image_generation_tool_auth_allowed` in two locations.
2026-04-08 13:53:53 -07:00
Won Park
e003f84e1e release ready, enabling only for siwc users (#17046)
**Disabling Image-Gen for Non-SIWC Codex Users**

We are only enabling image-gen feature for SIWC Codex users until there
comes a fix in ResponsesAPI to omit output from responses.completed, to
prevent the following issues:

1. websocket blows up due to heavier load (images) than before (text) 
2. http parser streams through n^2 of n-base64 bytes (sum of base64s of
all images generated in turn) that causes long delays in
turn_completion.
2026-04-08 11:22:39 -07:00
pakrym-oai
35b5720e8d Use AbsolutePathBuf for exec cwd plumbing (#17063)
## Summary
- Carry `AbsolutePathBuf` through tool cwd parsing/resolution instead of
resolving workdirs to raw `PathBuf`s.
- Type exec/sandbox request cwd fields as `AbsolutePathBuf` through
`ExecParams`, `ExecRequest`, `SandboxCommand`, and unified exec runtime
requests.
- Keep `PathBuf` conversions at external/event boundaries and update
existing tests/fixtures for the typed cwd.

## Validation
- `cargo check -p codex-core --tests`
- `cargo check -p codex-sandboxing --tests`
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-core --lib tools::handlers::`
- `just fix -p codex-sandboxing`
- `just fix -p codex-core`
- `just fmt`

Full `codex-core` test suite was not run locally; per repo guidance I
kept local validation targeted.
2026-04-08 10:54:12 -07:00
pakrym-oai
4c07dd4d25 Configure multi_agent_v2 spawn agent hints (#17071)
Allow multi_agent_v2 features to have its own temporary configuration
under `[features.multi_agent_v2]`

```
[features.multi_agent_v2]
enabled = true
usage_hint_enabled = false
usage_hint_text = "Custom delegation guidance."
hide_spawn_agent_metadata = true
```

Absent `usage_hint_text` means use the default hint.

```
[features]
multi_agent_v2 = true
```

still works as the boolean shorthand.
2026-04-08 08:42:18 -07:00
Vivian Fang
d47b755aa2 Render namespace description for tools (#16879) 2026-04-08 02:39:40 -07:00
Vivian Fang
ea516f9a40 Support anyOf and enum in JsonSchema (#16875)
This brings us into better alignment with the JSON schema subset that is
supported in
<https://developers.openai.com/api/docs/guides/structured-outputs#supported-schemas>,
and also allows us to render richer function signatures in code mode
(e.g., anyOf{null, OtherObjectType})
2026-04-08 01:07:55 -07:00
viyatb-oai
3c1adbabcd fix: refresh network proxy settings when sandbox mode changes (#17040)
## Summary

Fix network proxy sessions so changing sandbox mode recomputes the
effective managed network policy and applies it to the already-running
per-session proxy.

## Root Cause

`danger_full_access_denylist_only` injects `"*"` only while building the
proxy spec for Full Access. Sessions built that spec once at startup, so
a later permission switch to Full Access left the live proxy in its
original restricted policy. Switching back needed the same recompute
path to remove the synthetic wildcard again.

## What Changed

- Preserve the original managed network proxy config/requirements so the
effective spec can be recomputed for a new sandbox policy.
- Refresh the current session proxy when sandbox settings change, then
reapply exec-policy network overlays.
- Add an in-place proxy state update path while rejecting
listener/port/SOCKS changes that cannot be hot-reloaded.
- Keep runtime proxy settings cheap to snapshot and update.
- Add regression coverage for workspace-write -> Full Access ->
workspace-write.
2026-04-08 03:07:55 +00:00
pakrym-oai
600c3e49e0 [codex] Apply patches through executor filesystem (#17048)
## Summary
- run apply_patch through the executor filesystem when a remote
environment is present instead of shelling out to the local process
- thread the executor FileSystem into apply_patch interception and keep
existing local behavior for non-remote turns
- make the apply_patch integration harness use the executor filesystem
for setup/assertions
- add remote-aware skips for turn-diff coverage that still reads the
test-runner filesystem

## Why
Remote apply_patch needed to mutate the remote workspace instead of the
local checkout. The tests also needed to seed and assert workspace state
through the same filesystem abstraction so local and remote runs
exercise the same behavior.

## Validation
- `just fmt`
- `git diff --check`
- `cargo check -p core_test_support --tests`
- `cargo test -p codex-core --test all
suite::shell_serialization::apply_patch_custom_tool_call -- --nocapture`
- `cargo test -p codex-core --test all
suite::apply_patch_cli::apply_patch_cli_updates_file_appends_trailing_newline
-- --nocapture`
- remote `cargo test -p codex-core --test all apply_patch_cli --
--nocapture` (229 passed)
2026-04-07 16:35:02 -07:00
Dylan Hurd
6c36e7d688 fix(app-server) revert null instructions changes (#17047) 2026-04-07 15:18:34 -07:00
pakrym-oai
e9702411ab [codex] Migrate apply_patch to executor filesystem (#17027)
- Migrate apply-patch verification and application internals to use the
async `ExecutorFileSystem` abstraction from `exec-server`.
- Convert apply-patch `cwd` handling to `AbsolutePathBuf` through the
verifier/parser/handler boundary.

Doesn't change how the tool itself works.
2026-04-07 21:20:22 +00:00
Dylan Hurd
d45513ce5a fix(core) revert Command line in unified exec output (#17031)
## Summary
https://github.com/openai/codex/pull/13860 changed the serialized output
format of Unified Exec. This PR reverts those changes and some related
test changes

## Testing
- [x] Update tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-07 13:35:40 -07:00
pakrym-oai
f1a2b920f9 [codex] Make AbsolutePathBuf joins infallible (#16981)
Having to check for errors every time join is called is painful and
unnecessary.
2026-04-07 10:52:08 -07:00
Owen Lin
0b9e42f6f7 fix(guardian): don't throw away transcript when over budget (#16956)
## Description

This PR changes guardian transcript compaction so oversized
conversations no longer collapse into a nearly empty placeholder.

Before this change, if the retained user history alone exceeded the
message budget, guardian would replace the entire transcript with
`<transcript omitted to preserve budget for planned action>`!

That meant approvals, especially network approvals, could lose the
recent tool call and tool result that explained what guardian was
actually reviewing. Now we keep a compact but usable transcript instead
of dropping it all.

### Before
```
The following is the Codex agent history whose request action you are assessing...
>>> TRANSCRIPT START
<transcript omitted to preserve budget for planned action>
>>> TRANSCRIPT END

Conversation transcript omitted due to size.

The Codex agent has requested the following action:
>>> APPROVAL REQUEST START
Retry reason:
Sandbox blocked outbound network access.

Assess the exact planned action below. Use read-only tool checks when local state matters.
Planned action JSON:
{
  "tool": "network_access",
  "target": "https://example.com:443",
  "host": "example.com",
  "protocol": "https",
  "port": 443
}
>>> APPROVAL REQUEST END
```

### After
```
The following is the Codex agent history whose request action you are assessing...
>>> TRANSCRIPT START
[1] user: Please investigate why uploads to example.com are failing and retry if needed.
[8] user: If the request looks correct, go ahead and try again with network access.
[9] tool shell call: {"command":["curl","-X","POST","https://example.com/upload"],"cwd":"/repo"}
[10] tool shell result: sandbox blocked outbound network access
>>> TRANSCRIPT END

Some conversation entries were omitted.

The Codex agent has requested the following action:
>>> APPROVAL REQUEST START
Retry reason:
Sandbox blocked outbound network access.

Assess the exact planned action below. Use read-only tool checks when local state matters.
Planned action JSON:
{
  "tool": "network_access",
  "target": "https://example.com:443",
  "host": "example.com",
  "protocol": "https",
  "port": 443
}
>>> APPROVAL REQUEST END
```
2026-04-07 10:19:16 -07:00
Eric Traut
feb4f0051a Fix nested exec thread ID restore (#16882)
Addresses #15527

Problem: Nested `codex exec` commands could source a shell snapshot that
re-exported the parent `CODEX_THREAD_ID`, so commands inside the nested
session were attributed to the wrong thread.

Solution: Reapply the live command env's `CODEX_THREAD_ID` after
sourcing the snapshot.
2026-04-07 09:26:22 -07:00
Eric Traut
3b32de4fab Stabilize flaky multi-agent followup interrupt test (#16739)
Problem: The multi-agent followup interrupt test polled history before
interrupt cleanup and mailbox wakeup were guaranteed to settle, which
made it flaky under CI scheduling variance.

Solution: Wait for the child turn's `TurnAborted(Interrupted)` event
before asserting that the redirected assistant envelope is recorded and
no plain user message is left behind.
2026-04-07 09:24:14 -07:00
jif-oai
4cc6818996 chore: keep request_user_input tool to persist cache on multi-agents (#17009) 2026-04-07 16:53:31 +01:00
pakrym-oai
413c1e1fdf [codex] reduce module visibility (#16978)
## Summary
- reduce public module visibility across Rust crates, preferring private
or crate-private modules with explicit crate-root public exports
- update external call sites and tests to use the intended public crate
APIs instead of reaching through module trees
- add the module visibility guideline to AGENTS.md

## Validation
- `cargo check --workspace --all-targets --message-format=short` passed
before the final fix/format pass
- `just fix` completed successfully
- `just fmt` completed successfully
- `git diff --check` passed
2026-04-07 08:03:35 -07:00
jif-oai
99f167e6bf chore: hide nickname for debug flag (#17007) 2026-04-07 11:31:13 +01:00
jif-oai
68e16baabe chore: send_message and followup_task do not return anything (#17008) 2026-04-07 11:26:36 +01:00
jif-oai
2a8c3a2a52 feat: drop agent ID from v2 (#17005) 2026-04-07 10:56:01 +01:00
Ahmed Ibrahim
24c598e8a9 Honor null thread instructions (#16964)
- Treat explicit null thread instructions as a blank-slate override
while preserving omitted-field fallback behavior.
- Preserve null through rollout resume/fork and keep explicit empty
strings distinct.
- Add app-server v2 start/fork coverage for the tri-state instruction
params.
2026-04-07 04:10:19 +00:00
starr-openai
a504d8f0fa Disable env-bound tools when exec server is none (#16349)
## Summary
- make `CODEX_EXEC_SERVER_URL=none` map to an explicit disabled
environment mode instead of inferring from a missing URL
- expose environment capabilities (`exec_enabled`, `filesystem_enabled`)
so tool building can gate behavior explicitly and future
multi-environment work has a clearer seam
- suppress env-backed tools when the relevant capability is unavailable,
including exec tools, `js_repl`, `apply_patch`, `list_dir`, and
`view_image`
- keep handler/runtime backstops so disabled environments still reject
execution if a tool path somehow bypasses registration

## Testing
- `just fmt`
- `cargo test -p codex-exec-server`
- `cargo test -p codex-tools
disabled_environment_omits_environment_backed_tools`
- `cargo test -p codex-tools
environment_capabilities_gate_exec_and_filesystem_tools_independently`
- remote devbox Bazel build via `codex-applied-devbox`:
`//codex-rs/cli:cli`
2026-04-06 17:22:06 -07:00
rhan-oai
756c45ec61 [codex-analytics] add protocol-native turn timestamps (#16638)
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/16638).
* #16870
* #16706
* #16659
* #16641
* #16640
* __->__ #16638
2026-04-06 16:22:59 -07:00
Ahmed Ibrahim
8a19dbb177 Add spawn context for MultiAgentV2 children (#16746) 2026-04-03 19:56:59 -07:00
Ahmed Ibrahim
af8a9d2d2b remove temporary ownership re-exports (#16626)
Stacked on #16508.

This removes the temporary `codex-core` / `codex-login` re-export shims
from the ownership split and rewrites callsites to import directly from
`codex-model-provider-info`, `codex-models-manager`, `codex-api`,
`codex-protocol`, `codex-feedback`, and `codex-response-debug-context`.

No behavior change intended; this is the mechanical import cleanup layer
split out from the ownership move.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-03 00:33:34 -07:00
Ahmed Ibrahim
6fff9955f1 extract models manager and related ownership from core (#16508)
## Summary
- split `models-manager` out of `core` and add `ModelsManagerConfig`
plus `Config::to_models_manager_config()` so model metadata paths stop
depending on `core::Config`
- move login-owned/auth-owned code out of `core` into `codex-login`,
move model provider config into `codex-model-provider-info`, move API
bridge mapping into `codex-api`, move protocol-owned types/impls into
`codex-protocol`, and move response debug helpers into a dedicated
`response-debug-context` crate
- move feedback tag emission into `codex-feedback`, relocate tests to
the crates that now own the code, and keep broad temporary re-exports so
this PR avoids a giant import-only rewrite

## Major moves and decisions
- created `codex-models-manager` as the owner for model
cache/catalog/config/model info logic, including the new
`ModelsManagerConfig` struct
- created `codex-model-provider-info` as the owner for provider config
parsing/defaults and kept temporary `codex-login`/`codex-core`
re-exports for old import paths
- moved `api_bridge` error mapping + `CoreAuthProvider` into
`codex-api`, while `codex-login::api_bridge` temporarily re-exports
those symbols and keeps the `auth_provider_from_auth` wrapper
- moved `auth_env_telemetry` and `provider_auth` ownership to
`codex-login`
- moved `CodexErr` ownership to `codex-protocol::error`, plus
`StreamOutput`, `bytes_to_string_smart`, and network policy helpers to
protocol-owned modules
- created `codex-response-debug-context` for
`extract_response_debug_context`, `telemetry_transport_error_message`,
and related response-debug plumbing instead of leaving that behavior in
`core`
- moved `FeedbackRequestTags`, `emit_feedback_request_tags`, and
`emit_feedback_request_tags_with_auth_env` to `codex-feedback`
- deferred removal of temporary re-exports and the mechanical import
rewrites to a stacked follow-up PR so this PR stays reviewable

## Test moves
- moved auth refresh coverage from `core/tests/suite/auth_refresh.rs` to
`login/tests/suite/auth_refresh.rs`
- moved text encoding coverage from
`core/tests/suite/text_encoding_fix.rs` to
`protocol/src/exec_output_tests.rs`
- moved model info override coverage from
`core/tests/suite/model_info_overrides.rs` to
`models-manager/src/model_info_overrides_tests.rs`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-02 23:00:02 -07:00
Michael Bolin
7a3eec6fdb core: cut codex-core compile time 48% with native async SessionTask (#16631)
## Why

This continues the compile-time cleanup from #16630. `SessionTask`
implementations are monomorphized, but `Session` stores the task behind
a `dyn` boundary so it can drive and abort heterogenous turn tasks
uniformly. That means we can move the `#[async_trait]` expansion off the
implementation trait, keep a small boxed adapter only at the storage
boundary, and preserve the existing task lifecycle semantics while
reducing the amount of generated async-trait glue in `codex-core`.

One measurement caveat showed up while exploring this: a warm
incremental benchmark based on `touch core/src/tasks/mod.rs && cargo
check -p codex-core --lib` was basically flat, but that was the wrong
benchmark for this change. Using package-clean `codex-core` rebuilds,
like #16630, shows the real win.

Relevant pre-change code:

- [`SessionTask` with
`#[async_trait]`](3c7f013f97/codex-rs/core/src/tasks/mod.rs (L129-L182))
- [`RunningTask` storing `Arc<dyn
SessionTask>`](3c7f013f97/codex-rs/core/src/state/turn.rs (L69-L77))

## What changed

- Switched `SessionTask::{run, abort}` to native RPITIT futures with
explicit `Send` bounds.
- Added a private `AnySessionTask` adapter that boxes those futures only
at the `Arc<dyn ...>` storage boundary.
- Updated `RunningTask` to store `Arc<dyn AnySessionTask>` and removed
`#[async_trait]` from the concrete task impls plus test-only
`SessionTask` impls.

## Timing

Benchmarked package-clean `codex-core` rebuilds with dependencies left
warm:

```shell
cargo check -p codex-core --lib >/dev/null
cargo clean -p codex-core >/dev/null
/usr/bin/time -p cargo +nightly rustc -p codex-core --lib -- \
  -Z time-passes \
  -Z time-passes-format=json >/dev/null
```

| revision | rustc `total` | process `real` | `generate_crate_metadata`
| `MIR_borrow_checking` | `monomorphization_collector_graph_walk` |
| --- | ---: | ---: | ---: | ---: | ---: |
| parent `3c7f013f9735` | 67.21s | 67.71s | 24.61s | 23.43s | 22.43s |
| this PR `2cafd783ac22` | 35.08s | 35.60s | 8.01s | 7.25s | 7.15s |
| delta | -47.8% | -47.4% | -67.5% | -69.1% | -68.1% |

For completeness, the warm touched-file benchmark stayed flat (`1.96s`
parent vs `1.97s` this PR), which is why that benchmark should not be
used to evaluate this refactor.

## Verification

- Ran `cargo test -p codex-core`; this change compiled and task-related
tests passed before hitting the same unrelated 5
`config::tests::*guardian*` failures already present on the parent
stack.
2026-04-02 23:39:56 +00:00
Michael Bolin
3c7f013f97 core: cut codex-core compile time 63% with native async ToolHandler (#16630)
## Why

`ToolHandler` was still paying a large compile-time tax from
`#[async_trait]` on every concrete handler impl, even though the only
object-safe boundary the registry actually stores is the internal
`AnyToolHandler` adapter.

This PR removes that macro-generated async wrapper layer from concrete
`ToolHandler` impls while keeping the existing object-safe shim in
`AnyToolHandler`. In practice, that gets essentially the same
compile-time win as the larger type-erasure refactor in #16627, but with
a much smaller diff and without changing the public shape of
`ToolHandler<Output = T>`.

That tradeoff matters here because this is a broad `codex-core` hotspot
and reviewers should be able to judge the compile-time impact from hard
numbers, not vibes.

## Headline result

On a clean `codex-core` package rebuild (`cargo clean -p codex-core`
before each command), rustc `total` dropped from **187.15s to 68.98s**
versus the shared `0bd31dc382bd` baseline: **-63.1%**.

The biggest hot passes dropped by roughly **71-72%**:

| Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta |
|---|---:|---:|---:|
| `total` | 187.15s | 68.98s | **-63.1%** |
| `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** |
| `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** |
| `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%**
|
| `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** |

Important caveat: `-Z time-passes` timings are nested, so
`generate_crate_metadata` and `monomorphization_collector_graph_walk`
are mostly overlapping, not additive.

## Why this PR over #16627

#16627 already proved that the `ToolHandler` stack was the right
hotspot, but it got there by making `ToolHandler` object-safe and
changing every handler to return `BoxFuture<Result<AnyToolResult, _>>`
directly.

This PR keeps the lower-churn shape:

- `ToolHandler` remains generic over `type Output`.
- Concrete handlers use native RPITIT futures with explicit `Send`
bounds.
- `AnyToolHandler` remains the only object-safe adapter and still does
the boxing at the registry boundary, as before.
- The implementation diff is only **33 files, +28/-77**.

The measurements are at least comparable, and in this run this PR is
slightly faster than #16627 on the pass-level total:

| Metric | #16627 | This PR | Delta |
|---|---:|---:|---:|
| `total` | 79.90s | 68.98s | **-13.7%** |
| `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** |
| `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%**
|
| `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% |

## Profile data

### Crate-level timings

`cargo +nightly build -p codex-core --lib -Z unstable-options
--timings=json` after `cargo clean -p codex-core`.

Baseline data below is reused from the shared parent `0bd31dc382bd`
profile because this PR and #16627 are both one commit on top of that
same parent.

| Crate | Baseline `duration` | This PR `duration` | Delta | Baseline
`rmeta_time` | This PR `rmeta_time` | Delta |
|---|---:|---:|---:|---:|---:|---:|
| `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** |
174.474507208s | 55.873015583s | **-68.0%** |
| `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a |

### Pass-level timings

`cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z
time-passes-format=json` after `cargo clean -p codex-core`.

| Pass | Baseline | This PR | Delta |
|---|---:|---:|---:|
| `total` | 187.150662083s | 68.978770375s | **-63.1%** |
| `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%**
|
| `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** |
| `monomorphization_collector_graph_walk` | 79.737515042s |
22.190207417s | **-72.2%** |
| `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% |
| `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% |
| `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% |
| process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s /
85.90s / 2.92s | n/a |

### Self-profile query summary

`cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z
self-profile-events=default,query-keys,args,llvm,artifact-sizes` after
`cargo clean -p codex-core`, summarized with `measureme summarize -p
0.5`.

| Query / phase | Baseline self time | This PR self time | Delta |
Baseline total time | This PR total time | Baseline item count | This PR
item count | Baseline cache hits | This PR cache hits |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s |
48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 |
| `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184
| n/a | 15,298 |
| `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a |
79,247 |
| `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s
| n/a | 256 | n/a | 0 |
| `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a |
0 |
| `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 |
n/a | 0 |
| `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a |
99,990 | n/a | 0 |
| `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a
| 78,762 | n/a | 235,877 |

| Summary | Baseline | This PR |
|---|---:|---:|
| `evaluate_obligation` % of total CPU | 70.821% | 38.880% |
| self-profile total CPU time | 255.042999997s | 120.661175956s |
| process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s /
103.66s / 3.54s |

### Artifact sizes

From the same `measureme summarize` output:

| Artifact | Baseline | This PR | Delta |
|---|---:|---:|---:|
| `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 |
| `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 |
| `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448
|
| `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 |
| `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 |
| `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 |
| `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes |
-6,073 |
| `work_product_index` | 19,565 bytes | 19,565 bytes | 0 |

### Baseline hotspots before this change

These are the top normalized obligation buckets from the shared baseline
profile:

| Obligation bucket | Samples | Duration |
|---|---:|---:|
| `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s |
| `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 |
5.63s |
| `trait:T as tools::registry::ToolHandler` | 876 | 5.45s |
| `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s |
| `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s |
|
`outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider`
| 637 | 3.73s |
| `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s |
| `outlives:tasks::regular::RegularTask` | 726 | 3.57s |

Top `items_of_instance` entries before this change were mostly concrete
async handler/task impls:

| Instance | Duration |
|---|---:|
| `tasks::regular::{impl#2}::run` | 3.79s |
| `tools::handlers::mcp::{impl#0}::handle` | 3.27s |
| `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action`
| 3.09s |
| `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s |
| `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s |
| `tasks::review::{impl#4}::run` | 2.82s |
| `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s |
| `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` |
2.73s |
| `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s |
| `tasks::compact::{impl#4}::run` | 2.45s |

## What changed

Relevant pre-change registry shape:
[`codex-rs/core/src/tools/registry.rs`](0bd31dc382/codex-rs/core/src/tools/registry.rs (L38-L219))

Current registry shape in this PR:
[`codex-rs/core/src/tools/registry.rs`](41f7ac0ade/codex-rs/core/src/tools/registry.rs (L38-L203))

- `ToolHandler::{is_mutating, handle}` now return native `impl Future +
Send` futures instead of using `#[async_trait]`.
- `AnyToolHandler` remains the object-safe adapter and boxes those
futures at the registry boundary with explicit lifetimes.
- Concrete handlers and the registry test handler drop `#[async_trait]`
but otherwise keep their async method bodies intact.
- Representative examples:
[`codex-rs/core/src/tools/handlers/shell.rs`](41f7ac0ade/codex-rs/core/src/tools/handlers/shell.rs (L223-L379)),
[`codex-rs/core/src/tools/handlers/unified_exec.rs`](41f7ac0ade/codex-rs/core/src/tools/handlers/unified_exec.rs),
[`codex-rs/core/src/tools/registry_tests.rs`](41f7ac0ade/codex-rs/core/src/tools/registry_tests.rs)

## Tradeoff

This is intentionally less invasive than #16627: it does **not** move
result boxing into every concrete handler and does **not** change
`ToolHandler` into an object-safe trait.

Instead, it keeps the existing registry-level type-erasure boundary and
only removes the macro-generated async wrapper layer from concrete
impls. So the runtime boxing story stays basically the same as before,
while the compile-time savings are still large.

## Verification

Existing verification for this branch still applies:

- Ran `cargo test -p codex-core`; this change compiled and the suite
reached the known unrelated `config::tests::*guardian*` failures, with
no local diff under `codex-rs/core/src/config/`.

Profiling commands used for the tables above:

- `cargo clean -p codex-core`
- `cargo +nightly build -p codex-core --lib -Z unstable-options
--timings=json`
- `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z
time-passes-format=json`
- `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z
self-profile-events=default,query-keys,args,llvm,artifact-sizes`
- `measureme summarize -p 0.5`
2026-04-02 16:03:52 -07:00
jif-oai
7fc36249b5 chore: rename assign_task for followup_task (#16571) 2026-04-02 16:51:17 +02:00
Michael Bolin
c1d18ceb6f [codex] Remove codex-core config type shim (#16529)
## Why

This finishes the config-type move out of `codex-core` by removing the
temporary compatibility shim in `codex_core::config::types`. Callers now
depend on `codex-config` directly, which keeps these config model types
owned by the config crate instead of re-expanding `codex-core` as a
transitive API surface.

## What Changed

- Removed the `codex-rs/core/src/config/types.rs` re-export shim and the
`core::config::ApprovalsReviewer` re-export.
- Updated `codex-core`, `codex-cli`, `codex-tui`, `codex-app-server`,
`codex-mcp-server`, and `codex-linux-sandbox` call sites to import
`codex_config::types` directly.
- Added explicit `codex-config` dependencies to downstream crates that
previously relied on the `codex-core` re-export.
- Regenerated `codex-rs/core/config.schema.json` after updating the
config docs path reference.
2026-04-02 01:19:44 -07:00
Michael Bolin
e846fed2b1 fix: move some test utilities out of codex-rs/core/src/tools/spec.rs (#16524)
The `#[cfg(test)]` in `codex-rs/core/src/tools/spec.rs` smelled funny to
me and it turns out these members were straightforward to move.
2026-04-02 00:49:37 -07:00
Michael Bolin
5131e0de45 Move tool registry plan tests into codex-tools (#16521)
## Why

#16513 moved pure tool-registry planning into `codex-tools`, but much of
the corresponding spec/feature-gating coverage still lived in
`codex-core`. That leaves the tests for planner behavior in the crate
that no longer owns that logic and makes the next extraction steps
harder to review.

## What

Move the planner-only `spec_tests.rs` coverage into
`codex-rs/tools/src/tool_registry_plan_tests.rs` and wire it up from
`codex-rs/tools/src/tool_registry_plan.rs` using the crate-local `#[path
= "tool_registry_plan_tests.rs"] mod tests;` pattern.

The `codex-core` test file now keeps the core-side integration checks:
router-visible model tool lists, namespaced handler alias registration,
shell adapter behavior, and MCP schema edge cases that still exercise
the `core` binding layer.

## Verification

- `cargo test -p codex-tools`
- `cargo test -p codex-core tools::spec::tests`
2026-04-02 00:26:51 -07:00