## Why
The `dev/cc/ref-def` branch preserves richer JSON Schema detail for
connector tools, including `$defs` and nested shapes. That improves
fidelity, but it pushes the largest connector schemas well past the
intended tool-schema budget. This PR adds a best-effort compaction pass
for unusually large tool input schemas so the p99 and max tails stay
small while ordinary schemas are left alone.
## What Changed
- Added best-effort large-schema compaction in
`codex-rs/tools/src/json_schema.rs` after schema sanitization and
definition pruning.
- Compaction runs as a waterfall only while the compact JSON budget
proxy is exceeded:
1. Strip schema `description` metadata.
2. Drop root `$defs` / `definitions`.
3. Collapse deep nested complex schema objects to `{}`.
- Kept top-level argument names and immediate schema shape where
possible.
## Corpus Results
Scope: 2,025 schemas under `golden_schemas`, all parsed successfully.
Token count is `o200k_base` over compact JSON from
`parse_tool_input_schema`.
| Percentile | Before `origin/main` `4dbca61e20` | After branch
`dev/cc/ref-def` `f9bf071758` | After this PR |
|---|---:|---:|---:|
| p0 | 9 | 9 | 9 |
| p10 | 59 | 63 | 63 |
| p25 | 81 | 86 | 86 |
| p50 | 114 | 127 | 125 |
| p75 | 174 | 205 | 202 |
| p90 | 295 | 335 | 322 |
| p95 | 391 | 526 | 422 |
| p99 | 794 | 1,303 | 689 |
| max | 2,836 | 3,337 | 887 |
After this PR, `0 / 2,025` schemas are over 1k tokens.
### Compaction Savings
These are cumulative waterfall stages over the same corpus. Later passes
only run for schemas that are still over the compact JSON budget proxy.
| Stage | Total tokens | Step savings | Schemas changed by step |
|---|---:|---:|---:|
| No compaction | 391,862 | - | - |
| Strip schema `description` metadata | 350,961 | 40,901 | 66 |
| Drop root `$defs` / `definitions` | 340,683 | 10,278 | 13 |
| Collapse deep complex schemas to `{}` | 335,875 | 4,808 | 6 |
codex-tools
codex-tools is the shared support crate for building, adapting, and executing
model-visible tools outside codex-core.
Today this crate owns the host-facing tool models and helpers that no longer
need to live in core/src/tools/spec.rs or core/src/client_common.rs:
- aggregate host models such as
ToolSpec,ConfiguredToolSpec,LoadableToolSpec,ResponsesApiNamespace, andResponsesApiNamespaceTool - host discovery models used while assembling tool sets, including discoverable-tool models and request-plugin-install helpers
- host adapters such as schema sanitization, MCP/dynamic conversion, code-mode augmentation, and image-detail normalization
- shared executable-tool contracts such as
ToolExecutor,ToolCall, andToolOutput
That extraction is the first step in a longer migration. The goal is not to
move all of core/src/tools into this crate in one shot. Instead, the plan is
to peel off reusable pieces in reviewable increments while keeping
compatibility-sensitive orchestration in codex-core until the surrounding
boundaries are ready.
Vision
Over time, this crate should hold host-side tool machinery that is shared by multiple consumers, for example:
- host-visible aggregate tool models
- tool-set planning and discovery helpers
- MCP and dynamic-tool adaptation into Responses API shapes
- code-mode compatibility shims that do not depend on
codex-core - other narrowly scoped host utilities that multiple crates need
The corresponding non-goals are just as important:
- do not move
codex-coreorchestration here prematurely - do not pull
Session/TurnContext/ approval flow / runtime execution logic into this crate unless those dependencies have first been split into stable shared interfaces - do not turn this crate into a grab-bag for unrelated helper code
Migration approach
The expected migration shape is:
- Keep extension-owned executable-tool authoring in
codex-extension-api. - Move host-side planning/adaptation helpers here when they no longer need to
stay coupled to
codex-core. - Leave compatibility-sensitive adapters in
codex-corewhile downstream call sites are updated. - Only extract higher-level host infrastructure after the crate boundaries are clear and independently testable.
Crate conventions
This crate should start with stricter structure than core/src/tools so it
stays easy to grow:
src/lib.rsshould remain exports-only.- Business logic should live in named module files such as
foo.rs. - Unit tests for
foo.rsshould live in a siblingfoo_tests.rs. - The implementation file should wire tests with:
#[cfg(test)]
#[path = "foo_tests.rs"]
mod tests;
If this crate starts accumulating code that needs runtime state from
codex-core, that is a sign to revisit the extraction boundary before adding
more here.