mirror of
https://github.com/openai/codex.git
synced 2026-05-03 02:46:39 +00:00
## Why `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in #16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. ## Headline result On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from **187.15s to 68.98s** versus the shared `0bd31dc382bd` baseline: **-63.1%**. The biggest hot passes dropped by roughly **71-72%**: | Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta | |---|---:|---:|---:| | `total` | 187.15s | 68.98s | **-63.1%** | | `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** | | `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** | | `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** | Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. ## Why this PR over #16627 #16627 already proved that the `ToolHandler` stack was the right hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. This PR keeps the lower-churn shape: - `ToolHandler` remains generic over `type Output`. - Concrete handlers use native RPITIT futures with explicit `Send` bounds. - `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before. - The implementation diff is only **33 files, +28/-77**. The measurements are at least comparable, and in this run this PR is slightly faster than #16627 on the pass-level total: | Metric | #16627 | This PR | Delta | |---|---:|---:|---:| | `total` | 79.90s | 68.98s | **-13.7%** | | `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** | | `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** | | `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% | ## Profile data ### Crate-level timings `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`. Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and #16627 are both one commit on top of that same parent. | Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta | |---|---:|---:|---:|---:|---:|---:| | `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** | | `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a | ### Pass-level timings `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`. | Pass | Baseline | This PR | Delta | |---|---:|---:|---:| | `total` | 187.150662083s | 68.978770375s | **-63.1%** | | `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** | | `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** | | `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** | | `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% | | `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% | | `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% | | process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a | ### Self-profile query summary `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`. | Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits | |---|---:|---:|---:|---:|---:|---:|---:|---:|---:| | `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 | | `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 | | `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 | | `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 | | `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 | | `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 | | `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 | | `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 | | Summary | Baseline | This PR | |---|---:|---:| | `evaluate_obligation` % of total CPU | 70.821% | 38.880% | | self-profile total CPU time | 255.042999997s | 120.661175956s | | process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s | ### Artifact sizes From the same `measureme summarize` output: | Artifact | Baseline | This PR | Delta | |---|---:|---:|---:| | `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 | | `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 | | `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 | | `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 | | `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 | | `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 | | `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 | | `work_product_index` | 19,565 bytes | 19,565 bytes | 0 | ### Baseline hotspots before this change These are the top normalized obligation buckets from the shared baseline profile: | Obligation bucket | Samples | Duration | |---|---:|---:| | `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s | | `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s | | `trait:T as tools::registry::ToolHandler` | 876 | 5.45s | | `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s | | `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s | | `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s | | `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s | | `outlives:tasks::regular::RegularTask` | 726 | 3.57s | Top `items_of_instance` entries before this change were mostly concrete async handler/task impls: | Instance | Duration | |---|---:| | `tasks::regular::{impl#2}::run` | 3.79s | | `tools::handlers::mcp::{impl#0}::handle` | 3.27s | | `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s | | `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s | | `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s | | `tasks::review::{impl#4}::run` | 2.82s | | `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s | | `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s | | `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s | | `tasks::compact::{impl#4}::run` | 2.45s | ## What changed Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](0bd31dc382/codex-rs/core/src/tools/registry.rs (L38-L219)) Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](41f7ac0ade/codex-rs/core/src/tools/registry.rs (L38-L203)) - `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`. - `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes. - Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact. - Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](41f7ac0ade/codex-rs/core/src/tools/handlers/shell.rs (L223-L379)), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](41f7ac0ade/codex-rs/core/src/tools/handlers/unified_exec.rs), [`codex-rs/core/src/tools/registry_tests.rs`](41f7ac0ade/codex-rs/core/src/tools/registry_tests.rs) ## Tradeoff This is intentionally less invasive than #16627: it does **not** move result boxing into every concrete handler and does **not** change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls. So the runtime boxing story stays basically the same as before, while the compile-time savings are still large. ## Verification Existing verification for this branch still applies: - Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::*guardian*` failures, with no local diff under `codex-rs/core/src/config/`. Profiling commands used for the tables above: - `cargo clean -p codex-core` - `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` - `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` - `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` - `measureme summarize -p 0.5`
48 lines
1.6 KiB
Rust
48 lines
1.6 KiB
Rust
use super::*;
|
|
use pretty_assertions::assert_eq;
|
|
|
|
struct TestHandler;
|
|
|
|
impl ToolHandler for TestHandler {
|
|
type Output = crate::tools::context::FunctionToolOutput;
|
|
|
|
fn kind(&self) -> ToolKind {
|
|
ToolKind::Function
|
|
}
|
|
|
|
async fn handle(&self, _invocation: ToolInvocation) -> Result<Self::Output, FunctionCallError> {
|
|
unreachable!("test handler should not be invoked")
|
|
}
|
|
}
|
|
|
|
#[test]
|
|
fn handler_looks_up_namespaced_aliases_explicitly() {
|
|
let plain_handler = Arc::new(TestHandler) as Arc<dyn AnyToolHandler>;
|
|
let namespaced_handler = Arc::new(TestHandler) as Arc<dyn AnyToolHandler>;
|
|
let namespace = "mcp__codex_apps__gmail";
|
|
let tool_name = "gmail_get_recent_emails";
|
|
let namespaced_name = tool_handler_key(tool_name, Some(namespace));
|
|
let registry = ToolRegistry::new(HashMap::from([
|
|
(tool_name.to_string(), Arc::clone(&plain_handler)),
|
|
(namespaced_name, Arc::clone(&namespaced_handler)),
|
|
]));
|
|
|
|
let plain = registry.handler(tool_name, /*namespace*/ None);
|
|
let namespaced = registry.handler(tool_name, Some(namespace));
|
|
let missing_namespaced = registry.handler(tool_name, Some("mcp__codex_apps__calendar"));
|
|
|
|
assert_eq!(plain.is_some(), true);
|
|
assert_eq!(namespaced.is_some(), true);
|
|
assert_eq!(missing_namespaced.is_none(), true);
|
|
assert!(
|
|
plain
|
|
.as_ref()
|
|
.is_some_and(|handler| Arc::ptr_eq(handler, &plain_handler))
|
|
);
|
|
assert!(
|
|
namespaced
|
|
.as_ref()
|
|
.is_some_and(|handler| Arc::ptr_eq(handler, &namespaced_handler))
|
|
);
|
|
}
|