- Adds localVersion to plugin summaries and remoteVersion to share
context, including generated API schemas.
- Hydrates local and remote plugin versions from manifests and remote
release metadata.
- Adds default-on plugin_sharing gate for shared-with-me listing and
plugin/share/save, with disabled-path errors
and focused coverage.
## Summary
Plugin Creator now documents the shorter local-plugin handoff URL that
the app can interpret directly.
[#22221](https://github.com/openai/codex/pull/22221) teaches the skill
to end marketplace-backed creation flows with named View and Share
links; this follow-up updates those examples so the skill only emits the
normalized plugin name, the absolute marketplace path, and optional
share mode.
The documented shape is:
```txt
codex://plugins/<normalized-plugin-name>?marketplacePath=<absolute-marketplace-json-path>
codex://plugins/<normalized-plugin-name>?marketplacePath=<absolute-marketplace-json-path>&mode=share
```
The skill text now states exactly where the normalized plugin name
belongs, exactly where the absolute marketplace path belongs, and that
it should not add `pluginName` or `hostId` query parameters.
## Testing
Tests: plugin-creator skill validation.
## Why
`remote_control` can appear in `config.toml`, CLI feature overrides, and
the app-server config APIs. Before this PR, app-server startup treated
`config.features.enabled(Feature::RemoteControl)` as the signal to start
remote control ([base
code](5e3ee5eddf/codex-rs/app-server/src/lib.rs (L678-L680))).
That meant a user with:
```toml
[features]
remote_control = true
```
would accidentally opt every app-server process into remote control.
Remote-control startup should instead be a per-process launch decision
made by CLI flags.
## What Changed
- Marks `Feature::RemoteControl` as `Stage::Removed`, keeping
`remote_control` as a known compatibility key while making it
config-inert.
- Adds a hidden `--remote-control` process flag to `codex app-server`
and standalone `codex-app-server`.
- Plumbs that flag through
`AppServerRuntimeOptions.remote_control_enabled` and makes app-server
startup use only that runtime option to decide whether to start remote
control.
- Removes the app-server config mutation hook that reloaded config and
toggled remote control at runtime.
- Updates managed daemon spawning to use `codex app-server
--remote-control --listen unix://` instead of `--enable remote_control`.
Config APIs can still list, read, write, and set `remote_control`; those
operations just no longer affect remote-control process enrollment.
## Why
`tool_search` still carries the server-specific result-cap path added in
#17684 for `computer-use`: when the model omitted `limit`, a matching
result expanded the search to 20 and then `limit_results_by_bucket`
applied per-bucket caps. That makes default result handling depend on a
one-off server exception instead of the single
`TOOL_SEARCH_DEFAULT_LIMIT` path.
This PR removes that custom branch so omitted `limit` values use the
ordinary global default consistently. The implementation being retired
is the pre-change bucketed search path in
[`tool_search.rs`](5e3ee5eddf/codex-rs/core/src/tools/handlers/tool_search.rs (L121-L190)).
## What changed
- Collapse `ToolSearchHandler::search` back to one BM25 search with the
resolved limit.
- Remove `limit_results_by_bucket`, the `computer-use` constants, and
the omitted-limit plumbing that only existed for the override.
- Drop dead `ToolSearchEntry::limit_bucket` metadata from deferred MCP
and dynamic search entries.
- Remove tests and helpers that only asserted the deleted override
behavior.
- Add direct handler-level unit coverage for omitted/default and
explicit `tool_search` result limits.
## Validation
- `cargo test -p codex-core tool_search`
- The matching unit tests passed, including the new omitted/default and
explicit result-limit coverage.
- The broader `--test all` search-tool fixture phase then failed before
sending mocked response requests in
`tool_search_indexes_only_enabled_non_app_mcp_tools` and
`tool_search_uses_non_app_mcp_server_instructions_as_namespace_description`.
- `cargo test -p codex-core`
- The touched tool-search coverage passed before the run later aborted
in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
with a stack overflow.
## Why
`chatwidget.rs` is still carrying too many unrelated responsibilities in
one file. After #21866 consolidated some of the state it tracks, this
starts the next phase by moving coherent state/helper clusters out of
the main module without changing behavior.
This PR is intentionally mechanical: it only moves existing functions,
structs, and helpers into focused modules so the boundaries are easier
to review before the less mechanical refactors that should follow.
## What Changed
- Moved user-message, composer, queue, pending steer, and merge/remap
helpers into `codex-rs/tui/src/chatwidget/user_messages.rs`.
- Added `codex-rs/tui/src/chatwidget/exec_state.rs` for unified exec
bookkeeping helpers.
- Added `codex-rs/tui/src/chatwidget/rate_limits.rs` for rate-limit
warning, prompt, and error classification state.
- Moved plugin list fetch and install auth-flow state into
`codex-rs/tui/src/chatwidget/plugins.rs`.
- Made a couple of test-only `VecDeque` imports explicit now that those
tests no longer inherit the parent module import.
## Verification
- `cargo test -p codex-tui` was run
## Follow-On Refactor Phases
This PR is phase 1: mechanical helper and state moves. Planned follow-up
PRs:
- Phase 2: extract input and submission flow, including queued user
messages, shell prompt submission, pending steer restoration, and thread
input snapshot/restore behavior.
- Phase 3: extract protocol, replay, streaming, and tool lifecycle
handling, while preserving active-cell grouping, transcript
invalidation, interrupt deferral, and final-message separator behavior.
- Phase 4: extract settings, popups, and status surfaces, including
model/reasoning/collaboration/personality popups, permission prompts,
rate-limit UI, and connectors helpers.
- Phase 5: clean up the remaining constructor and orchestration code
once the larger behavior domains have moved out, leaving `chatwidget.rs`
as the composition layer.
- make ThreadStore::update_thread_metadata accept a broad range of
metadata patches
- keep ThreadStore::append_items as raw canonical history append (no
metadata side effects)
- in the local store, write these metadata updates to a combination of
sqlite and rollout jsonl files for backwards-compat. It special cases
which fields need to go into jsonl vs sqlite vs whatever, confining the
awkwardness to just this implementation
- in remote stores we can simply persist the metadata directly to a
database, no special casing required.
- move the "implicit metadata updates triggered by appending rollout
items" from the RolloutRecorder (which is local-threadstore-specific) to
the LiveThread layer above the ThreadStore, inside of a private helper
utility called ThreadMetadataSync. LiveThread calls ThreadStore
append_items and update_metadata separately.
- Add a generic update metadata method to ThreadManager that works on
both live threads and "cold" threads
- Call that ThreadManager method from app server code, so app server
doesn't need to worry about whether the thread is live or not
## Why
`tool_search` already had solid end-to-end coverage for discovery and
follow-up execution, but it did not prove that distinct pieces of
indexed search text actually work in integration. In particular, we were
not exercising whether unique tool names, descriptions, namespaces,
underscore-expanded dynamic names, and schema-property terms were
sufficient to surface the expected deferred tools.
This change adds focused integration coverage for those term sources so
regressions in search text construction are caught by a real `TestCodex`
flow instead of only by lower-level unit tests.
## What changed
- added a small helper in `core/tests/suite/search_tool.rs` to assert
that a `tool_search_output` contains an expected namespace child tool
- added an MCP integration test that issues several `tool_search_call`s
and verifies distinct query terms match the expected app tools:
- exact tool name: `calendar_timezone_option_99`
- tool description phrase: `uploaded document`
- top-level schema property: `starts_at`
- added a dynamic-tool integration test that verifies distinct query
terms match the expected deferred dynamic tool:
- exact name: `quasar_ping_beacon`
- underscore-expanded name: `quasar ping beacon`
- description phrase: `saffron metronome`
- namespace: `orbit_ops`
- schema property: `chrono_spec`
## Validation
- `cargo test -p codex-core tool_search_matches_`
## Docs
No documentation update needed.
## Why
This is the base PR in the split stack for the permissions migration. It
isolates stack-safety work that had been mixed into the larger
permissions PR, so reviewers can evaluate the async-future changes
separately from the permissions model changes in #22267.
The main risk this addresses is large or recursive multi-agent futures
overflowing smaller runner stacks. A follow-up review also called out
that `shutdown_live_agent` must remain quiescent: callers should not
remove a live agent from tracking or release its spawn slot until the
worker loop has actually terminated.
## What Changed
- Boxes the large async futures in the multi-agent spawn, resume, and
close tool handlers.
- Boxes the `AgentControl` spawn and recursive close/shutdown paths that
can otherwise build very deep futures.
- Keeps `shutdown_live_agent` waiting for thread termination before
removing/releasing the live agent, preserving the previous shutdown
ordering while still boxing the recursive close path.
## Verification Strategy
The focused local coverage was `cargo test -p codex-core multi_agents`,
which exercises the multi-agent spawn/resume/close handlers, cascade
close/resume behavior, and the shutdown path touched by this PR.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22266).
* #22330
* #22329
* #22328
* #22327
* __->__ #22266
Introduce execute_to_pending and wait_to_pending APIs that freeze
pending-mode runtimes until an explicit resume, while preserving the
existing continuously-running execute path. Add runtime and service
coverage for pending, resume, completion, and freeze behavior.
## Summary
This refactor makes tool handlers the owner of the specs they can
publish, so registry construction can register handlers once and
separately publish only the specs that should be model-visible.
The main motivation is deferred tools: MCP and dynamic tools still need
handlers registered up front, but deferred tools should be discoverable
through `tool_search` rather than emitted in the initial tool spec list.
## What changed
- `McpHandler` and `DynamicToolHandler` can return their own `ToolSpec`.
- `build_tool_registry_builder` now collects handlers, registers them
through the no-spec path, and publishes only non-deferred handler specs.
- Deferred MCP and dynamic tool names are combined into one
`all_deferred_tools` set that drives spec filtering, code-mode
deferred-tool signaling, and `tool_search` registration.
- `tool_search` registration now requires both deferred tools and
`namespace_tools`.
- Namespace specs are merged in `spec_plan`, preserving top-level spec
order, sorting tools within each namespace, and backfilling empty
namespace descriptions.
- Hosted web search and image-generation specs are included in the
collected spec vector before namespace merge/publication, and tool-name
tests that should not care about hosted relative order now compare sets.
## Testing
- `cargo test -p codex-core tools::spec::tests:: -- --nocapture`
- `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture`
- `cargo test -p codex-core
tools::router::tests::specs_filter_deferred_dynamic_tools --
--nocapture`
- `cargo test -p codex-core
suite::prompt_caching::prompt_tools_are_consistent_across_requests --
--nocapture`
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-core -- --skip
tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
passed the library suite after skipping the known stack-overflowing unit
test.
Full `cargo test -p codex-core` currently hits a stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`;
the same focused test reproduces on `origin/main`.
## Summary
- make workspace owner nudge handling unconditional in the TUI now that
it is fully rolled out
- keep `workspace_owner_usage_nudge` as a removed no-op compatibility
flag so old configs/app overrides remain accepted during rollout
- remove flag-disabled test setup
## Companion PR
- https://github.com/openai/openai/pull/876351 removes the Codex Apps
Statsig rollout gate override after this change is available to the
app/runtime path
## Validation
- `just write-config-schema`
- `just fmt`
- `cargo test -p codex-features`
- `cargo test -p codex-tui status_and_layout`
## Why
Remote exec-server now needs one executor websocket to serve multiple
harness JSON-RPC sessions. Rendezvous routes by `stream_id`, and the
exec-server side needs to use the same stable relay frame contract
instead of a hand-rolled JSON shape.
The relay protocol also needs to make ownership boundaries clear:
harness and executor endpoints own sequencing, acks, retries, duplicate
suppression, segmentation, and reassembly; rendezvous only routes
frames.
## What Changed
- Add the checked-in `codex.exec_server.relay.v1.RelayMessageFrame`
proto plus generated prost bindings for `codex-exec-server`.
- Encode remote harness/executor relay traffic as binary protobuf
websocket frames while keeping local websocket JSON-RPC unchanged.
- Demux executor-side relay streams into independent
`ConnectionProcessor` sessions keyed by `stream_id`.
- Add a programmatic `RemoteExecutorConfig::with_bearer_token(...)`
constructor for non-CLI callers and integration tests.
- Add an integration test that starts the remote executor against a fake
registry/rendezvous websocket and verifies two virtual streams share one
executor websocket without cross-talk, including per-stream reset
behavior.
- Document the remote relay envelope, sequence ranges, `ack`/`ack_bits`,
and endpoint responsibilities in `exec-server/README.md`.
## Verification
- `cargo test -p codex-exec-server --test relay
multiplexed_remote_executor_routes_independent_virtual_streams --
--exact`
- `cargo test -p codex-exec-server --test relay`
- `cargo test -p codex-exec-server` passed outside the sandbox. The
sandboxed run hit macOS `sandbox-exec: sandbox_apply: Operation not
permitted` in filesystem sandbox tests.
## Why
Windows CI has been timing out in
`configured_pet_load_is_deferred_until_after_construction` while waiting
for the deferred configured-pet load event.
The test still needs to prove construction returns before the pet image
is available, but the background load slices the built-in pet
spritesheet into frame cache files. That work can exceed the old 2
second deadline on slower or more contended CI machines.
## What Changed
- Increased the test wait for `ConfiguredPetLoaded` from 2 seconds to 30
seconds.
- Kept the post-construction assertion intact so the test still verifies
that the pet is not loaded synchronously during `ChatWidget`
construction.
## How to Test
Targeted tests:
- `cargo test -p codex-tui
configured_pet_load_is_deferred_until_after_construction`
- `just argument-comment-lint`
Additional check:
- `cargo test -p codex-tui` was run, but the broader crate suite did not
complete successfully due to unrelated existing failures:
-
`status::tests::status_permissions_full_disk_managed_without_network_is_external_sandbox`
-
`status::tests::status_permissions_full_disk_managed_with_network_is_danger_full_access`
- later abort in
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all` from
stack overflow
## Why
Code mode only used nested spec lookup at execution time to rediscover
whether a nested tool should be invoked as a function tool or a freeform
tool.
That information is already present in the enabled tool metadata that
code mode builds to expose `tools.*` and `ALL_TOOLS`, so re-looking it
up from the router was redundant and kept execution coupled to a
separate spec lookup path.
## What Changed
- thread `CodeModeToolKind` through the code-mode runtime `ToolCall`
event and `CodeModeNestedToolCall`
- emit the nested tool kind directly from the V8 callback using the
already-enabled tool metadata
- build nested tool payloads from the propagated kind instead of calling
`find_spec`
- remove the now-unused `find_spec` plumbing from the router and
parallel runtime helpers
- add unit coverage for function vs freeform payload shaping and update
affected router tests
## Testing
- `cargo test -p codex-code-mode`
- `cargo test -p codex-core code_mode::tests`
- `cargo test -p codex-core
extension_tool_bundles_are_model_visible_and_dispatchable`
- `cargo test -p codex-core
model_visible_specs_filter_deferred_dynamic_tools`
## Summary
Adds include_collaboration_mode_instructions, which is a config
equivalent to include_permissions_instructions for collaboration modes.
Desired for situations where we want to disable this instruction from
entering the context
## Testing
- [x] Added unit test
## Why
Tool dispatch had two serialization mechanisms:
- `supports_parallel_tool_calls` decides whether a tool participates in
the shared parallel-execution lock.
- `is_mutating` separately gated some calls inside dispatch.
That second hook no longer carried its weight. The remaining
parallel-support flag is already the per-tool concurrency policy, so
keeping a second mutating gate made dispatch harder to follow and left
behind extra session plumbing that only existed for that path.
## What changed
- Removed `is_mutating` from tool handlers and deleted the
`tool_call_gate` path that existed only to support it.
- Simplified dispatch and routing to rely on the existing per-tool
`supports_parallel_tool_calls` boolean.
- Dropped the now-unused handler overrides and related session/test
scaffolding.
- Kept the router/parallel tests focused on the surviving per-tool
behavior.
- Removed the unused `codex-utils-readiness` dependency from
`codex-core` as a follow-up fix for `cargo shear`.
## Testing
- `cargo test -p codex-core
parallel_support_does_not_match_namespaced_local_tool_names`
- `cargo test -p codex-core mcp_parallel_support_uses_handler_data`
- `cargo test -p codex-core
tools_without_handlers_do_not_support_parallel`
## Summary
- tighten unified exec sandbox initialization
- preserve the requested process workdir independently from sandbox
setup
- add regression coverage for the updated invariant
## Validation
- Ran `/tmp/cargo-tools/bin/just fmt`.
- Ran the targeted `codex-core` regression test successfully.
- Ran `cargo test -p codex-core`; it did not complete cleanly because
unrelated existing agent/config-loader tests failed and the run later
aborted on a stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
Co-authored-by: Codex <noreply@openai.com>