Commit Graph

25 Commits

Author SHA1 Message Date
Matthew Zeng
d7f99b0fa6 [mcp] Expand tool search to custom MCPs. (#16944)
- [x] Expand tool search to custom MCPs.
- [x] Rename several variables/fields to be more generic.

Updated tool & server name lifecycles:

**Raw Identity**

ToolInfo.server_name is raw MCP server name.
ToolInfo.tool.name is raw MCP tool name.
MCP calls route back to raw via parse_tool_name() returning
(tool.server_name, tool.tool.name).
mcpServerStatus/list now groups by raw server and keys tools by
Tool.name: mod.rs:599
App-server just forwards that grouped raw snapshot:
codex_message_processor.rs:5245

**Callable Names**

On list-tools, we create provisional callable_namespace / callable_name:
mcp_connection_manager.rs:1556
For non-app MCP, provisional callable name starts as raw tool name.
For codex-apps, provisional callable name is sanitized and strips
connector name/id prefix; namespace includes connector name.
Then qualify_tools() sanitizes callable namespace + name to ASCII alnum
/ _ only: mcp_tool_names.rs:128
Note: this is stricter than Responses API. Hyphen is currently replaced
with _ for code-mode compatibility.

**Collision Handling**

We do initially collapse example-server and example_server to the same
base.
Then qualify_tools() detects distinct raw namespace identities behind
the same sanitized namespace and appends a hash to the callable
namespace: mcp_tool_names.rs:137
Same idea for tool-name collisions: hash suffix goes on callable tool
name.
Final list_all_tools() map key is callable_namespace + callable_name:
mcp_connection_manager.rs:769

**Direct Model Tools**

Direct MCP tool declarations use the full qualified sanitized key as the
Responses function name.
The raw rmcp Tool is converted but renamed for model exposure.

**Tool Search / Deferred**

Tool search result namespace = final ToolInfo.callable_namespace:
tool_search.rs:85
Tool search result nested name = final ToolInfo.callable_name:
tool_search.rs:86
Deferred tool handler is registered as "{namespace}:{name}":
tool_registry_plan.rs:248
When a function call comes back, core recombines namespace + name, looks
up the full qualified key, and gets the raw server/tool for MCP
execution: codex.rs:4353

**Separate Legacy Snapshot**

collect_mcp_snapshot_from_manager_with_detail() still returns a map
keyed by qualified callable name.
mcpServerStatus/list no longer uses that; it uses
McpServerStatusSnapshot, which is raw-inventory shaped.
2026-04-09 13:34:52 -07:00
Vivian Fang
d47b755aa2 Render namespace description for tools (#16879) 2026-04-08 02:39:40 -07:00
Michael Bolin
3c7f013f97 core: cut codex-core compile time 63% with native async ToolHandler (#16630)
## Why

`ToolHandler` was still paying a large compile-time tax from
`#[async_trait]` on every concrete handler impl, even though the only
object-safe boundary the registry actually stores is the internal
`AnyToolHandler` adapter.

This PR removes that macro-generated async wrapper layer from concrete
`ToolHandler` impls while keeping the existing object-safe shim in
`AnyToolHandler`. In practice, that gets essentially the same
compile-time win as the larger type-erasure refactor in #16627, but with
a much smaller diff and without changing the public shape of
`ToolHandler<Output = T>`.

That tradeoff matters here because this is a broad `codex-core` hotspot
and reviewers should be able to judge the compile-time impact from hard
numbers, not vibes.

## Headline result

On a clean `codex-core` package rebuild (`cargo clean -p codex-core`
before each command), rustc `total` dropped from **187.15s to 68.98s**
versus the shared `0bd31dc382bd` baseline: **-63.1%**.

The biggest hot passes dropped by roughly **71-72%**:

| Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta |
|---|---:|---:|---:|
| `total` | 187.15s | 68.98s | **-63.1%** |
| `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** |
| `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** |
| `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%**
|
| `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** |

Important caveat: `-Z time-passes` timings are nested, so
`generate_crate_metadata` and `monomorphization_collector_graph_walk`
are mostly overlapping, not additive.

## Why this PR over #16627

#16627 already proved that the `ToolHandler` stack was the right
hotspot, but it got there by making `ToolHandler` object-safe and
changing every handler to return `BoxFuture<Result<AnyToolResult, _>>`
directly.

This PR keeps the lower-churn shape:

- `ToolHandler` remains generic over `type Output`.
- Concrete handlers use native RPITIT futures with explicit `Send`
bounds.
- `AnyToolHandler` remains the only object-safe adapter and still does
the boxing at the registry boundary, as before.
- The implementation diff is only **33 files, +28/-77**.

The measurements are at least comparable, and in this run this PR is
slightly faster than #16627 on the pass-level total:

| Metric | #16627 | This PR | Delta |
|---|---:|---:|---:|
| `total` | 79.90s | 68.98s | **-13.7%** |
| `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** |
| `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%**
|
| `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% |

## Profile data

### Crate-level timings

`cargo +nightly build -p codex-core --lib -Z unstable-options
--timings=json` after `cargo clean -p codex-core`.

Baseline data below is reused from the shared parent `0bd31dc382bd`
profile because this PR and #16627 are both one commit on top of that
same parent.

| Crate | Baseline `duration` | This PR `duration` | Delta | Baseline
`rmeta_time` | This PR `rmeta_time` | Delta |
|---|---:|---:|---:|---:|---:|---:|
| `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** |
174.474507208s | 55.873015583s | **-68.0%** |
| `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a |

### Pass-level timings

`cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z
time-passes-format=json` after `cargo clean -p codex-core`.

| Pass | Baseline | This PR | Delta |
|---|---:|---:|---:|
| `total` | 187.150662083s | 68.978770375s | **-63.1%** |
| `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%**
|
| `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** |
| `monomorphization_collector_graph_walk` | 79.737515042s |
22.190207417s | **-72.2%** |
| `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% |
| `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% |
| `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% |
| process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s /
85.90s / 2.92s | n/a |

### Self-profile query summary

`cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z
self-profile-events=default,query-keys,args,llvm,artifact-sizes` after
`cargo clean -p codex-core`, summarized with `measureme summarize -p
0.5`.

| Query / phase | Baseline self time | This PR self time | Delta |
Baseline total time | This PR total time | Baseline item count | This PR
item count | Baseline cache hits | This PR cache hits |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s |
48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 |
| `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184
| n/a | 15,298 |
| `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a |
79,247 |
| `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s
| n/a | 256 | n/a | 0 |
| `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a |
0 |
| `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 |
n/a | 0 |
| `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a |
99,990 | n/a | 0 |
| `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a
| 78,762 | n/a | 235,877 |

| Summary | Baseline | This PR |
|---|---:|---:|
| `evaluate_obligation` % of total CPU | 70.821% | 38.880% |
| self-profile total CPU time | 255.042999997s | 120.661175956s |
| process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s /
103.66s / 3.54s |

### Artifact sizes

From the same `measureme summarize` output:

| Artifact | Baseline | This PR | Delta |
|---|---:|---:|---:|
| `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 |
| `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 |
| `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448
|
| `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 |
| `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 |
| `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 |
| `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes |
-6,073 |
| `work_product_index` | 19,565 bytes | 19,565 bytes | 0 |

### Baseline hotspots before this change

These are the top normalized obligation buckets from the shared baseline
profile:

| Obligation bucket | Samples | Duration |
|---|---:|---:|
| `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s |
| `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 |
5.63s |
| `trait:T as tools::registry::ToolHandler` | 876 | 5.45s |
| `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s |
| `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s |
|
`outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider`
| 637 | 3.73s |
| `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s |
| `outlives:tasks::regular::RegularTask` | 726 | 3.57s |

Top `items_of_instance` entries before this change were mostly concrete
async handler/task impls:

| Instance | Duration |
|---|---:|
| `tasks::regular::{impl#2}::run` | 3.79s |
| `tools::handlers::mcp::{impl#0}::handle` | 3.27s |
| `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action`
| 3.09s |
| `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s |
| `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s |
| `tasks::review::{impl#4}::run` | 2.82s |
| `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s |
| `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` |
2.73s |
| `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s |
| `tasks::compact::{impl#4}::run` | 2.45s |

## What changed

Relevant pre-change registry shape:
[`codex-rs/core/src/tools/registry.rs`](0bd31dc382/codex-rs/core/src/tools/registry.rs (L38-L219))

Current registry shape in this PR:
[`codex-rs/core/src/tools/registry.rs`](41f7ac0ade/codex-rs/core/src/tools/registry.rs (L38-L203))

- `ToolHandler::{is_mutating, handle}` now return native `impl Future +
Send` futures instead of using `#[async_trait]`.
- `AnyToolHandler` remains the object-safe adapter and boxes those
futures at the registry boundary with explicit lifetimes.
- Concrete handlers and the registry test handler drop `#[async_trait]`
but otherwise keep their async method bodies intact.
- Representative examples:
[`codex-rs/core/src/tools/handlers/shell.rs`](41f7ac0ade/codex-rs/core/src/tools/handlers/shell.rs (L223-L379)),
[`codex-rs/core/src/tools/handlers/unified_exec.rs`](41f7ac0ade/codex-rs/core/src/tools/handlers/unified_exec.rs),
[`codex-rs/core/src/tools/registry_tests.rs`](41f7ac0ade/codex-rs/core/src/tools/registry_tests.rs)

## Tradeoff

This is intentionally less invasive than #16627: it does **not** move
result boxing into every concrete handler and does **not** change
`ToolHandler` into an object-safe trait.

Instead, it keeps the existing registry-level type-erasure boundary and
only removes the macro-generated async wrapper layer from concrete
impls. So the runtime boxing story stays basically the same as before,
while the compile-time savings are still large.

## Verification

Existing verification for this branch still applies:

- Ran `cargo test -p codex-core`; this change compiled and the suite
reached the known unrelated `config::tests::*guardian*` failures, with
no local diff under `codex-rs/core/src/config/`.

Profiling commands used for the tables above:

- `cargo clean -p codex-core`
- `cargo +nightly build -p codex-core --lib -Z unstable-options
--timings=json`
- `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z
time-passes-format=json`
- `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z
self-profile-events=default,query-keys,args,llvm,artifact-sizes`
- `measureme summarize -p 0.5`
2026-04-02 16:03:52 -07:00
Michael Bolin
9f71d57a65 Extract code-mode nested tool collection into codex-tools (#16509)
## Why
This is another small step in the `codex-core` -> `codex-tools`
migration described in `AGENTS.md`.

`core/src/tools/spec.rs` and `core/src/tools/code_mode/mod.rs` were both
hand-rolling the same pure transformation: convert visible `ToolSpec`s
into code-mode nested tool definitions, then sort and deduplicate by
tool name. That logic does not depend on core runtime state or handlers,
so keeping it in `codex-core` makes `spec.rs` harder to peel out later
than it needs to be.

## What Changed
- Add `collect_code_mode_tool_definitions()` to
`codex-rs/tools/src/code_mode.rs`.
- Reuse that helper from `codex-rs/core/src/tools/spec.rs` when
assembling the `exec` tool description.
- Reuse the same helper from `codex-rs/core/src/tools/code_mode/mod.rs`
when exposing nested tool metadata to the code-mode runtime.

This is intended to be a straight refactor with no behavior change and
no new test surface.

## Verification
- `cargo test -p codex-tools`
- `cargo test -p codex-core tools::spec::tests`
- `cargo test -p codex-core code_mode_only_`
2026-04-01 22:17:55 -07:00
Michael Bolin
d4464125c5 Remove client_common tool re-exports (#16482)
## Why

`codex-rs/core/src/client_common.rs` still had a `tools` re-export
module that forwarded `codex_tools` types back into `codex-core`. After
the earlier extraction work in #16379, #16471, #16477, and #16481, that
extra layer no longer adds value.

Removing it keeps dependencies explicit: the `codex-core` modules that
actually use `ToolSpec` and related types now depend on `codex_tools`
directly instead of reaching through `client_common`.

## What Changed

- removed the `client_common::tools` re-export module from
`core/src/client_common.rs`
- updated the remaining `codex-core` consumers to import `codex_tools`
directly
- adjusted the affected test code to reference
`codex_tools::ResponsesApiTool` directly as well

This is a mechanical cleanup only. It does not change tool behavior or
runtime logic.

## Testing

- `cargo test -p codex-core client_common::tests`
- `cargo test -p codex-core tools::router::tests`
- `cargo test -p codex-core tools::context::tests`
- `cargo test -p codex-core tools::spec::tests`
2026-04-01 19:15:15 -07:00
Michael Bolin
2238c16a91 codex-tools: extract code mode tool spec adapters (#16132)
## Why

The longer-term `codex-tools` migration is to move pure tool-definition
and tool-spec plumbing out of `codex-core` while leaving session- and
runtime-coupled orchestration behind.

The remaining code-mode adapter layer in
`core/src/tools/code_mode_description.rs` was a good next extraction
seam because it only transformed `ToolSpec` values for code mode and
already delegated the low-level description rendering to
`codex-code-mode`.

## What Changed

- added `codex-rs/tools/src/code_mode.rs` with
`augment_tool_spec_for_code_mode()` and
`tool_spec_to_code_mode_tool_definition()`
- added focused unit coverage in `codex-rs/tools/src/code_mode_tests.rs`
- rewired `core/src/tools/spec.rs` and `core/src/tools/code_mode/mod.rs`
to use the extracted adapters from `codex-tools`
- removed the old `core/src/tools/code_mode_description.rs` shim and its
test file from `codex-core`
- added the `codex-code-mode` dependency to `codex-tools`, updated
`Cargo.lock`, and refreshed the `codex-tools` README to reflect the
expanded boundary

## Test Plan

- `cargo test -p codex-tools`
- `CARGO_TARGET_DIR=/tmp/codex-core-code-mode-adapters cargo test -p
codex-core --lib tools::spec::`
- `CARGO_TARGET_DIR=/tmp/codex-core-code-mode-adapters cargo test -p
codex-core --lib tools::code_mode::`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just argument-comment-lint`

## References

- #15923
- #15928
- #15944
- #15953
- #16031
- #16047
- #16129
2026-03-28 15:32:35 -07:00
Ahmed Ibrahim
062fa7a2bb Move string truncation helpers into codex-utils-string (#15572)
- move the shared byte-based middle truncation logic from `core` into
`codex-utils-string`
- keep token-specific truncation in `codex-core` so rollout can reuse
the shared helper in the next stacked PR

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-24 15:45:40 -07:00
Channing Conger
e4eedd6170 Code mode on v8 (#15276)
Moves Code Mode to a new crate with no dependencies on codex. This
create encodes the code mode semantics that we want for lifetime,
mounting, tool calling.

The model-facing surface is mostly unchanged. `exec` still runs raw
JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
are still available through `tools.*`, and helpers like `text`, `image`,
`store`, `load`, `notify`, `yield_control`, and `exit` still exist.

The major change is underneath that surface:

- Old code mode was an external Node runtime.
- New code mode is an in-process V8 runtime embedded directly in Rust.
- Old code mode managed cells inside a long-lived Node runner process.
- New code mode manages cells in Rust, with one V8 runtime thread per
active `exec`.
- Old code mode used JSON protocol messages over child stdin/stdout plus
Node worker-thread messages.
- New code mode uses Rust channels and direct V8 callbacks/events.

This PR also fixes the two migration regressions that fell out of that
substrate change:

- `wait { terminate: true }` now waits for the V8 runtime to actually
stop before reporting termination.
- synchronous top-level `exit()` now succeeds again instead of surfacing
as a script error.

---

- `core/src/tools/code_mode/*` is now mostly an adapter layer for the
public `exec` / `wait` tools.
- `code-mode/src/service.rs` owns cell sessions and async control flow
in Rust.
- `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
JavaScript execution.
- each `exec` spawns a dedicated runtime thread plus a Rust
session-control task.
- helper globals are installed directly into the V8 context instead of
being injected through a source prelude.
- helper modules like `tools.js` and `@openai/code_mode` are synthesized
through V8 module resolution callbacks in Rust.

---

Also added a benchmark for showing the speed of init and use of a code
mode env:
```
$ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
Finished [`bench` profile [optimized]](https://doc.rust-lang.org/cargo/reference/profiles.html#default-profiles) target(s) in 0.18s
     Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
scenario       tools samples    warmups      iters      mean/exec       p95/exec       rssΔ p50       rssΔ max
cold_exec          0      30          0          1         1.13ms         1.20ms        8.05MiB        8.06MiB
warm_exec          0      30          1         25       473.43us       512.49us      912.00KiB        1.33MiB
cold_exec         32      30          0          1         1.03ms         1.15ms        8.08MiB        8.11MiB
warm_exec         32      30          1         25       509.73us       545.76us      960.00KiB        1.30MiB
cold_exec        128      30          0          1         1.14ms         1.19ms        8.30MiB        8.34MiB
warm_exec        128      30          1         25       575.08us       591.03us      736.00KiB      864.00KiB
memory uses a fresh-process max RSS delta for each scenario
```

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-20 23:36:58 -07:00
Ahmed Ibrahim
2e22885e79 Split features into codex-features crate (#15253)
- Split the feature system into a new `codex-features` crate.
- Cut `codex-core` and workspace consumers over to the new config and
warning APIs.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-19 20:12:07 -07:00
pakrym-oai
5cada46ddf Return image URL from view_image tool (#15072)
Cleanup image semantics in code mode.

`view_image` now returns `{image_url:string, details?: string}` 

`image()` now allows both string parameter and `{image_url:string,
details?: string}`
2026-03-18 13:58:20 -07:00
pakrym-oai
88e5382fc4 Propagate tool errors to code mode (#15075)
Clean up error flow to push the FunctionCallError all the way up to
dispatcher and allow code mode to surface as exception.
2026-03-18 13:57:55 -07:00
pakrym-oai
606d85055f Add notify to code-mode (#14842)
Allows model to send an out-of-band notification.

The notification is injected as another tool call output for the same
call_id.
2026-03-18 09:37:13 -07:00
pakrym-oai
ee756eb80f Rename exec_wait tool to wait (#14983)
Summary
- document that code mode only exposes `exec` and the renamed `wait`
tool
- update code mode tool spec and descriptions to match the new tool name
- rename tests and helper references from `exec_wait` to `wait`

Testing
- Not run (not requested)
2026-03-17 14:22:26 -07:00
Ahmed Ibrahim
0d531c05f2 Fix code mode yield startup race (#14959) 2026-03-17 11:09:12 -07:00
Michael Bolin
b77fe8fefe Apply argument comment lint across codex-rs (#14652)
## Why

Once the repo-local lint exists, `codex-rs` needs to follow the
checked-in convention and CI needs to keep it from drifting. This commit
applies the fallback `/*param*/` style consistently across existing
positional literal call sites without changing those APIs.

The longer-term preference is still to avoid APIs that require comments
by choosing clearer parameter types and call shapes. This PR is
intentionally the mechanical follow-through for the places where the
existing signatures stay in place.

After rebasing onto newer `main`, the rollout also had to cover newly
introduced `tui_app_server` call sites. That made it clear the first cut
of the CI job was too expensive for the common path: it was spending
almost as much time installing `cargo-dylint` and re-testing the lint
crate as a representative test job spends running product tests. The CI
update keeps the full workspace enforcement but trims that extra
overhead from ordinary `codex-rs` PRs.

## What changed

- keep a dedicated `argument_comment_lint` job in `rust-ci`
- mechanically annotate remaining opaque positional literals across
`codex-rs` with exact `/*param*/` comments, including the rebased
`tui_app_server` call sites that now fall under the lint
- keep the checked-in style aligned with the lint policy by using
`/*param*/` and leaving string and char literals uncommented
- cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
registry/git metadata in the lint job
- split changed-path detection so the lint crate's own `cargo test` step
runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
- continue to run the repo wrapper over the `codex-rs` workspace, so
product-code enforcement is unchanged

Most of the code changes in this commit are intentionally mechanical
comment rewrites or insertions driven by the lint itself.

## Verification

- `./tools/argument-comment-lint/run.sh --workspace`
- `cargo test -p codex-tui-app-server -p codex-tui`
- parsed `.github/workflows/rust-ci.yml` locally with PyYAML

---

* -> #14652
* #14651
2026-03-16 16:48:15 -07:00
pakrym-oai
a3ba10b44b Add exit helper to code mode scripts (#14851)
- **Summary**
- expose `exit` through the code mode bridge and module so scripts can
stop mid-flight
  - surface the helper in the description documentation
  - add a regression test ensuring `exit()` terminates execution cleanly
- **Testing**
  - Not run (not requested)
2026-03-16 22:07:58 +00:00
pakrym-oai
477a2dd345 Add code_mode_only feature (#14617)
Summary
- add the code_mode_only feature flag/config schema and wire its
dependency on code_mode
- update code mode tool descriptions to list nested tools with detailed
headers
- restrict available tools for prompt and exec descriptions when
code_mode_only is enabled and test the behavior

Testing
- Not run (not requested)
2026-03-13 13:30:19 -07:00
Channing Conger
0daffe667a code_mode: Move exec params from runtime declarations to @pragma (#14511)
This change moves code_mode exec session settings out of the runtime API
and into an optional first-line pragma, so instead of calling runtime
helpers like set_yield_time() or set_max_output_tokens_per_exec_call(),
the model can write // @exec: {"yield_time_ms": ...,
"max_output_tokens": ...} at the top of the freeform exec source. Rust
now parses that pragma before building the source, validates it, and
passes the values directly in the exec start message to the code-mode
broker, which applies them at session start without any worker-runtime
mutation path. The @openai/code_mode module no longer exposes those
setter functions, the docs and grammar were updated to describe the
pragma form, and the existing code_mode tests were converted to use
pragma-based configuration instead.
2026-03-13 03:27:42 +00:00
pakrym-oai
a2546d5dff Expose code-mode tools through globals (#14517)
Summary
- make all code-mode tools accessible as globals so callers only need
`tools.<name>`
- rename text/image helpers and key globals (store, load, ALL_TOOLS,
etc.) to reflect the new shared namespace
- update the JS bridge, runners, descriptions, router, and tests to
follow the new API

Testing
- Not run (not requested)
2026-03-12 15:43:59 -07:00
pakrym-oai
04e14bdf23 Rename exec session IDs to cell IDs (#14510)
- Update the code-mode executor, wait handler, and protocol plumbing to
use cell IDs instead of session IDs for node communication
- Switch tool metadata, wait description, and suite tests to refer to
cell IDs so user-visible messages match the new terminology

**Testing**
- Not run (not requested)
2026-03-12 14:05:30 -07:00
pakrym-oai
dadffd27d4 Fix MCP tool calling (#14491)
Properly escape mcp tool names and make tools only available via
imports.
2026-03-12 13:38:52 -07:00
pakrym-oai
09ba6b47ae Reuse tool runtime for code mode worker (#14496)
## Summary
- create the turn-scoped `ToolCallRuntime` before starting the code mode
worker so the worker reuses the same runtime and router
- thread the shared runtime through the code mode service/worker path
and use it for nested tool calls
- model aborted tool calls as a concrete `ToolOutput` so aborted
responses still produce valid tool output shapes

## Testing
- `just fmt`
- `cargo test -p codex-core` (still running locally)
2026-03-12 12:48:32 -07:00
pakrym-oai
d1b03f0d7f Add default code-mode yield timeout (#14484)
Summary
- expose the default yield timeout through code mode runtime so the
handler, wait tool, and protocol share the same 10s value that matches
unified exec
- document the timeout change in the tool descriptions and propagate the
value all the way into the runner metadata
- adjust Cargo.lock to keep the dependency tree in sync with the added
code mode tool dependency

Testing
- Not run (not requested)
2026-03-12 12:06:23 -07:00
pakrym-oai
cfe3f6821a Cleanup code_mode tool descriptions (#14480)
Move to separate files and clarify a bit.
2026-03-12 11:13:35 -07:00
pakrym-oai
c0528b9bd9 Move code mode tool files under tools/code_mode and split functionality (#14476)
- **Summary**
- migrate the code mode handler, service, worker, process, runner, and
bridge assets into the `tools/code_mode` module tree
- split Execution, protocol, and handler logic into dedicated files and
relocate the tool definition into `code_mode/spec.rs`
- update core references and tests to stitch the new organization
together
- **Testing**
  - Not run (not requested)
2026-03-12 09:54:11 -07:00