codex

mirror of https://github.com/openai/codex.git synced 2026-05-24 21:14:51 +00:00

Author	SHA1	Message	Date
anp-oai	c83ba22359	Allow parallel MCP tool calls when annotated readOnly (#23750 ) ## Summary - Treat MCP tools with `readOnlyHint: true` as parallel-safe even when `supports_parallel_tool_calls` is unset or `false`. - Keep server-level `supports_parallel_tool_calls` as an additive override for non-read-only tools. - Add focused unit coverage for the MCP handler eligibility decision. - Update RMCP integration coverage to keep the serial baseline on a mutable tool, verify read-only concurrency without server opt-in, and preserve the server opt-in concurrency path separately. ## Testing - `just fmt` - `cargo test -p codex-core --lib tools::handlers::mcp::tests::` - `cargo test -p codex-core --test all stdio_mcp_read_only_tool_calls_run_concurrently_without_server_opt_in` - `cargo test -p codex-core --test all stdio_mcp_parallel_tool_calls_opt_in_runs_concurrently` - `cargo test -p codex-rmcp-client`	2026-05-21 20:40:34 -07:00
sayan-oai	7e802b22f1	Expose conversation history to extension tools (#23963 ) ## Why Extension tools that need conversation context should be able to read it from the live tool invocation instead of reaching into thread persistence themselves. ## What changed - Add a `ConversationHistory` snapshot to extension `ToolCall`s and populate it from the current raw in-memory response history. - Expose all history items at this boundary so each extension can filter and bound the subset it needs before consuming or forwarding it. - Cover the adapter and registry dispatch paths and update existing extension tests that construct `ToolCall` literals. ## Test plan - `cargo test -p codex-tools` - `cargo test -p codex-extension-api` - `cargo test -p codex-goal-extension` - `cargo test -p codex-memories-extension` - `cargo test -p codex-core passes_turn_fields_to_extension_call` - `cargo test -p codex-core extension_tool_executors_are_model_visible_and_dispatchable`	2026-05-22 01:11:47 +00:00
jif-oai	e6c8371e4e	refactor: centralize tool exposure planning (#23876 ) ## Why Tool exposure is a planning concern, but the deferred MCP path and dispatch-only legacy shell path were carrying those decisions in handler constructors and a shell-only tool-family builder. Keeping those decisions in `spec_plan` makes the core tool plan easier to follow and keeps handlers focused on runtime behavior. ## What changed - add `PlannedTools` helpers for ordinary runtimes, exposure overrides, dispatch-only runtimes, and hosted specs - inline shell tool assembly into `core/src/tools/spec_plan.rs` and remove the shell-only `tool_family` module - remove exposure state and special exposure constructors from `McpHandler` and `ShellCommandHandler` - keep hidden runtime behavior centralized in `ExposureOverride`, including disabling parallel tool calls for hidden handlers ## Testing - Not run (refactor only)	2026-05-21 16:21:23 +02:00
jif-oai	516f134641	Make tool executor specs mandatory (#23870 ) ## Why `ToolExecutor` is the runtime contract that keeps a callable tool and its model-visible spec together. Leaving `spec()` optional lets a registered runtime silently omit that half of the contract, and it also overloads a missing spec as an exposure decision for tools that should stay dispatchable without being shown to the model. ## What - Make `ToolExecutor::spec()` required and update core, extension, and test tool executors to return a concrete `ToolSpec`. - Add `ToolExposure::Hidden` for dispatch-only tools. The legacy `shell_command` runtime in unified-exec sessions now uses that explicit exposure instead of hiding itself by omitting a spec. - Build MCP tool specs when `McpHandler` is constructed so invalid MCP specs are skipped before the handler is registered. - Keep tool planning aligned with the new contract for direct, deferred, hidden, code-mode, dynamic, and namespaced tool paths. ## Testing - Added tool-plan coverage that invalid MCP tool specs are not registered. - Updated shell-family coverage for the hidden legacy `shell_command` runtime and the affected tool executor test fixtures.	2026-05-21 15:25:56 +02:00
Matthew Zeng	0a4179bb19	[codex] Add plugin id to MCP tool call items (#23737 ) Add owning plugin id to MCP tool call items so we can better filter them at plugin level. ## Summary - add optional `plugin_id` to MCP tool-call items and legacy begin/end events - propagate plugin metadata into emitted core items and app-server v2 `ThreadItem::McpToolCall` - preserve plugin ids through app-server replay/redaction paths and regenerate v2 schema fixtures ## Testing - `just write-app-server-schema` - `just fmt` - `just fix -p codex-core` - `cargo test -p codex-protocol -p codex-app-server-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib` - `cargo check -p codex-tui --tests` - `cargo check -p codex-app-server --tests` - `git diff --check` ## Notes - `just fix -p codex-core` completed with two non-fatal `too_many_arguments` warnings on the touched MCP notification helpers. - A broader `cargo test -p codex-core` run passed core unit tests, then hit shell/sandbox/snapshot failures in the integration target. - A broader app-server downstream run hit the existing `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack overflow; `cargo test -p codex-exec` also hit the existing sandbox expectation mismatch in `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.	2026-05-20 17:02:10 -07:00
Shijie Rao	370b13afc9	Honor client-resolved service tier defaults (#23537 ) ## Why Model catalog responses can now advertise a nullable `default_service_tier` for each model. Codex needs to preserve three distinct states all the way from config/app-server inputs to inference: - no explicit service tier, so the client may apply the current model catalog default when FastMode is enabled - explicit `default`, meaning the user intentionally wants standard routing - explicit catalog tier ids such as `priority`, `flex`, or future tiers Keeping those states distinct prevents the UI from showing one tier while core sends another, especially after model switches or app-server `thread/start` / `turn/start` updates. ## What Changed - Plumbed `default_service_tier` through model catalog protocol types, app-server model responses, generated schemas, model cache fixtures, and provider/model-manager conversions. - Added the request-only `default` service tier sentinel and normalized legacy config spelling so `fast` in `config.toml` still materializes as the runtime/request id `priority`. - Moved catalog default resolution to the TUI/client side, including recomputing the effective service tier when model/FastMode-dependent surfaces change. - Updated app-server thread lifecycle config construction so `serviceTier: null` preserves explicit standard-routing intent by mapping to `default` instead of internal `None`. - Kept core responsible for validating explicit tiers against the current model and stripping `default` before `/v1/responses`, without applying catalog defaults itself. ## Validation - `CARGO_INCREMENTAL=0 cargo build -p codex-cli` - `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list` - `cargo test -p codex-tui service_tier` - `cargo test -p codex-protocol service_tier_for_request` - `cargo test -p codex-core get_service_tier` - `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core service_tier`	2026-05-20 15:57:50 -07:00
jif-oai	c5bd131567	feat: add turn_id and truncation_policy to extension tool calls (#23666 ) ## Why Extension-owned tools currently receive a stripped `ToolCall` with only `call_id`, `tool_name`, and `payload`. That makes extension work that needs turn-local execution context awkward, especially web-search extension work that needs the active `truncation_policy` at tool invocation time. Reconstructing that value from config or `ExtensionData` would be indirect and could drift from the actual turn context, so the cleaner fix is to pass the needed turn metadata directly on the extension-facing invocation type. ## What changed - added `turn_id` and `truncation_policy` to `codex_tools::ToolCall` - populated those fields when core adapts `ToolInvocation` into an extension tool call - added a focused adapter test that verifies extension executors receive the forwarded turn metadata - updated the memories extension tests to construct the richer `ToolCall` - added the `codex-utils-output-truncation` dependency to `codex-tools` and refreshed lockfiles ## Testing - `cargo test -p codex-tools` - `cargo test -p codex-memories-extension` - `cargo test -p codex-core passes_turn_fields_to_extension_call` - `just bazel-lock-update` - `just bazel-lock-check`	2026-05-20 20:14:41 +02:00
pakrym-oai	a52c91d8b5	[codex] Hide deferred tools from code mode prompt (#23605 ) ## Why `code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools` was failing because code-mode prompt generation used the same nested tool spec list for both the model-visible `exec` guide and the runtime `ALL_TOOLS` surface. That allowed deferred MCP/app tools, such as `calendar_timezone_option_99`, to leak into the `exec` description even though they should only be discoverable through `ALL_TOOLS` at runtime. ## What changed Split code-mode nested tool planning into two sets in `core/src/tools/spec_plan.rs`: - runtime nested tool specs still include deferred tools, so `tools[...]` and `ALL_TOOLS` can call them - `exec` prompt docs only render non-deferred tools, so deferred app tools stay out of the model-visible guide ## Validation - `cargo test -p codex-core --test all code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools -- --nocapture` - looped the same focused test 5 additional times with `cargo test -q -p codex-core --test all code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`	2026-05-20 08:09:45 -07:00
Ahmed Ibrahim	5a4202ad90	[codex] Preserve raw code-mode exec output by default (#23564 ) ## Why Code mode can use nested unified exec calls as data sources. When those calls omit `max_output_tokens`, code mode should receive raw command output so the script can parse or summarize it itself. When code mode does provide `max_output_tokens`, that explicit nested budget should be respected, including values above the default unified exec limit, rather than being capped before code mode sees the result. ## What - Preserve direct unified exec truncation behavior, while letting code-mode exec/write_stdin keep `max_output_tokens` as `None` unless explicitly supplied. - Make code-mode tool results use raw output when no explicit limit is present, and use the explicit nested limit directly when one is specified. - Refactor unified exec output formatting so `truncated_output` takes the caller-selected token budget. - Add e2e integration coverage for explicit nested exec limits, omitted nested exec limits, outer exec limit propagation, omitted-limit outputs that exceed both the default and a small truncation policy, explicit nested limits above those caps, and high explicit limits that still compact larger command output. - Reuse the code-mode turn setup helper while directly asserting the exact exec output item in each test. ## Testing - `just fmt` - `git diff --check` - Not run locally per repo guidance; CI should validate the e2e integration tests.	2026-05-20 04:02:14 +00:00
Eric Traut	e43a2e297f	Fix stale background terminal poll events (#23231 ) ## Why Issue #23214 reports `/ps` showing no background terminals while the status line still says it is waiting for a background terminal. The race is in core: `write_stdin` can poll a process that exits before the response returns. The process manager correctly returns `process_id: None`, but the handler still emitted a `TerminalInteraction` event using the requested session id, causing clients to believe a dead process was still being polled. Fixes #23214. ## What changed - Suppress `TerminalInteraction` events for empty `write_stdin` polls once `response.process_id` is `None`. - Continue emitting interactions for non-empty stdin, even if that input causes the process to exit before the response returns. - Extend the unified exec integration test to assert completed empty polls do not emit terminal interactions. ## Verification - `cargo test -p codex-core --test all unified_exec_emits_one_begin_and_one_end_event` - `cargo test -p codex-core --test all unified_exec_emits_terminal_interaction_for_write_stdin` `cargo test -p codex-core` currently aborts in unrelated `agent::control::tests::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale` with a reproducible stack overflow.	2026-05-19 20:48:37 -07:00
Michael Bolin	c58c84d6ee	test: fix multi-agent service tier assertion (#23576 ) ## Why `openai/codex#22169` added a regression test that expects an invalid child `service_tier` to be rejected, but the test used `Result::expect_err` on `SpawnAgentHandler::handle`. That requires the `Ok` type to implement `Debug`, and this handler returns `Box<dyn ToolOutput>`, so Bazel failed while compiling `codex-core` tests before it could run them. ## What changed - Capture the handler result and assert on `result.err()` instead of calling `expect_err`. - Keep the same `FunctionCallError::RespondToModel` assertion for the rejected service tier. ## Verification - `cargo test -p codex-core spawn_agent_role_service_tier_does_not_hide_invalid_spawn_request`	2026-05-19 16:47:20 -07:00
Ahmed Ibrahim	c53da029bc	[codex] Honor role-defined spawn service tiers (#22169 ) ## Why Custom agent roles are ordinary config layers, so a role file can already express `service_tier` just like other config values. The spawned-agent tier path needs to preserve that effective role config and follow the same precedence pattern as model/reasoning. ## What changed - Apply an explicit spawn-time `service_tier` onto the child config before role application, so a role config layer can override it just like role-defined model/reasoning settings do. - Validate the final effective child tier after the final child model is known, while still falling back to the parent tier when no child tier survives. - Add focused integration coverage for both v1 and v2 proving role TOML loads a service tier, spawned children keep that role-configured tier, and a role tier wins over a conflicting spawn-time tier. ## Validation - `just fmt` - `git diff --check` - Local Rust tests not run, per repo guidance; CI should exercise the new coverage.	2026-05-19 22:40:41 +00:00
Matthew Zeng	8335b56c33	Split plugin install discovery into list and request tools (#23372 ) ## Summary - Add `list_available_plugins_to_install` as the inventory step for plugin and connector install suggestions. - Slim `request_plugin_install` so it only handles the actual elicitation, instead of carrying the full discoverable list in its prompt. - Emit send-time telemetry when an install elicitation is dispatched, including requested tool identity in the event payload. - Emit install-result telemetry through `SessionTelemetry`, including tool type, user response action, and completion status. - Update registration and tests to cover the new two-step flow while keeping the existing `tool_suggest` feature gate unchanged. ## Testing - `just fmt` - `cargo test -p codex-tools` - `cargo test -p codex-core request_plugin_install` - `cargo test -p codex-core list_available_plugins_to_install` - `cargo test -p codex-core install_suggestion_tools_can_be_registered_without_search_tool` - `cargo test -p codex-otel manager_records_plugin_install_suggestion_metric` - `cargo test -p codex-otel manager_records_plugin_install_elicitation_sent_metric` - `just fix -p codex-core` - `just fix -p codex-tools` - `just fix -p codex-otel` - `cargo check -p codex-core`	2026-05-19 14:45:37 -07:00
viyatb-oai	3c76081876	Make `deny` canonical for filesystem permission entries (#23493 ) ## Why Filesystem permission profiles used `none` for deny-read entries, which is less direct than the action the entry actually represents. This change makes `deny` the canonical filesystem permission spelling while preserving compatibility for older configs that still send `none`. ## What changed - rename `FileSystemAccessMode::None` to `Deny` - serialize and generate schemas with `deny` as the canonical value - retain `none` only as a legacy input alias for temporary config compatibility - update filesystem glob diagnostics and regression coverage to use the canonical spelling - refresh config and app-server schema fixtures to match the new wire shape ## Validation - `cargo test -p codex-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core config_toml_deserializes_permission_profiles --lib` - `cargo test -p codex-core read_write_glob_patterns_still_reject_non_subpath_globs --lib` Earlier in the session, a broad `cargo test -p codex-core` run reached unrelated pre-existing failures in timing/snapshot/git-info tests under this environment; the targeted surfaces touched by this PR passed cleanly.	2026-05-19 11:03:47 -07:00
jif-oai	05b8ce4354	chore: namespace v1 sub-agent tools (#23475 ) ## Why The v1 sub-agent tools are a single tool family, but they were exposed as separate flat function tools. This makes the model-visible surface less clearly grouped and leaves the legacy names in the same flat namespace as newer agent tooling. ## What - Wraps the v1 `spawn_agent`, `send_input`, `resume_agent`, `wait_agent`, and `close_agent` specs in the `multi_agent_v1` namespace. - Registers the corresponding handlers with namespaced runtime tool names. - Updates tool-planning, deferred tool search, and sub-agent notification tests to assert the namespace shape and child `spawn_agent` lookup. ## Verification - Updated `codex-core` coverage for the v1 multi-agent tool plan, deferred tool search output, and sub-agent tool descriptions.	2026-05-19 19:46:17 +02:00
jif-oai	b3ae3de405	Defer v1 multi-agent tools behind tool search (#23144) Summary: defer v1 multi-agent tools when `tool_search` and namespace tools are available; keep concise searchable descriptions and move the v1 usage guidance into developer instructions; add targeted coverage. Testing: not run per request; ran `just fmt`.	2026-05-19 15:04:35 +02:00
jif-oai	05e171094d	Remove ToolsConfig from tool planning (#22835 ) ## Why `codex-tools` is meant to hold reusable tool primitives, but `ToolsConfig` had become a second copy of core runtime decisions instead of a small shared contract. It carried provider capabilities, auth/model gates, permission and environment state, web/search/image feature gates, multi-agent settings, and goal availability from core into `codex-tools` ([definition](`22dd9ad392/codex-rs/tools/src/tool_config.rs (L97)`), [stored on each `TurnContext`](`22dd9ad392/codex-rs/core/src/session/turn_context.rs (L87)`)). Every session/context variant then had to build and mutate that snapshot before assembling tools. This PR removes that master object instead of renaming it. Tool planning now reads the live `TurnContext`, where `codex-core` already owns those decisions, while `codex-tools` keeps only reusable primitives and a generic `ToolSetBuilder`/`ToolSet` accumulator. ## What Changed - Removed `ToolsConfig` / `ToolsConfigParams` from `codex-tools`; the crate keeps the shared helpers that still belong there, including request-user-input mode selection, shell backend/type resolution, `UnifiedExecShellMode`, and `ToolEnvironmentMode`. - Replaced config-snapshot planning with `ToolRouter::from_turn_context` and a `spec_plan` pipeline over `CoreToolPlanContext`, deriving provider capabilities, auth gates, model support, feature gates, environment count, goal support, multi-agent options, web search, and image generation from the authoritative turn state. - Added generic `codex_tools::ToolSetBuilder` / `ToolSet`, plus the small core adapter needed to accumulate `CoreToolRuntime` values and hosted model specs. - Added the `tool_family::shell` registration module and moved shell/unified-exec/memory accounting call sites to read the narrow per-turn fields directly. 
- Narrowed `TurnContext` to the remaining explicit per-turn fields needed by planning: `available_models`, `unified_exec_shell_mode`, and `goal_tools_supported`. - Reworked MCP exposure and tool-search setup so deferred/direct MCP behavior is driven by the current turn rather than a precomputed config snapshot. - Replaced the large expected-spec fixture tests with focused behavior-level coverage for shell tools, environments, goal and agent-job gates, MCP direct/deferred exposure, tool search, request-plugin-install, code mode, multi-agent mode, hosted tools, and extension executor dispatch. ## Verification - `cargo check -p codex-tools` - `cargo check -p codex-core --lib` - `cargo test -p codex-tools` - `cargo test -p codex-core spec_plan --lib` - `cargo test -p codex-core router --lib`	2026-05-19 11:24:09 +02:00
Eric Traut	84d941d07f	[1 of 7] Add thread settings to UserInput (#23080 ) Stack position: [1 of 7] ## Summary The first three PRs in this stack are a cleanup pass before the actual thread settings API work. Today, core has several overlapping "user input" ops: `UserInput`, `UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how much next-turn state they carry, which makes the later queued thread settings update harder to reason about and review. This PR starts that cleanup by adding the shared `ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry it. Existing variants remain in place here, so this layer is mostly a behavior-preserving API shape change plus mechanical constructor updates. ## End State After PR3 By the end of PR3, `Op::UserInput` is the only "user input" core op. It can carry optional thread settings overrides for callers that need to update stored defaults with a turn, while callers without updates use empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are deleted. ## End State After PR5 By the end of PR5, core will have only two ops for this area: - `Op::UserInput` for user-input-bearing submissions. - `Op::ThreadSettings` for settings-only updates. ## Stack 1. [1 of 7] [Add thread settings to UserInput](https://github.com/openai/codex/pull/23080) (this PR) 2. [2 of 7] [Remove UserInputWithTurnContext](https://github.com/openai/codex/pull/23081) 3. [3 of 7] [Remove UserTurn](https://github.com/openai/codex/pull/23075) 4. [4 of 7] [Placeholder for OverrideTurnContext cleanup](https://github.com/openai/codex/pull/23087) 5. [5 of 7] [Replace OverrideTurnContext with ThreadSettings](https://github.com/openai/codex/pull/22508) 6. [6 of 7] [Add app-server thread settings API](https://github.com/openai/codex/pull/22509) 7. [7 of 7] [Sync TUI thread settings](https://github.com/openai/codex/pull/22510)	2026-05-18 18:48:35 -07:00
sayan-oai	daa11820b0	Remove ToolSearch feature toggle (#23389 ) ## Summary - mark `ToolSearch` as removed and ignore stale config writes for its legacy key - make search tool exposure depend only on model capability, not a feature toggle - remove app-server enablement support and prune now-obsolete test coverage/setup ## Verification - `cargo test -p codex-features` - `cargo test -p codex-tools` - `cargo test -p codex-core search_tool_requires_model_capability` - `cargo test -p codex-app-server experimental_feature_enablement_set_` ## Notes - This keeps the legacy config key as a no-op for compatibility while removing the ability to toggle the behavior off cleanly. - No developer-facing docs update outside the touched app-server README was needed.	2026-05-19 01:24:39 +00:00
xl-openai	6b54ced108	cleanup: Remove skill env var dependency prompting (#22721) Deletes the skill env var dependency prompt feature and its runtime path. `env_var` entries in skill dependency metadata are now silently ignored during skill loading.	2026-05-19 01:24:19 +00:00
pakrym-oai	afa0101ae2	[codex] Move pending input into input queue (#22728 ) ## Why Pending model input was split across `Session`, `TurnState`, and the agent mailbox. That made it easy for new paths to manage queued user input or mailbox delivery outside the intended ownership boundary. This PR consolidates the model-facing input lifecycle behind the session input queue so turn-local pending input, next-turn queued items, and mailbox delivery coordination are owned in one place. ## What Changed - Added `session/input_queue.rs` to own pending input queues and mailbox delivery coordination. - Removed the standalone `agent/mailbox.rs` channel wrapper and store mailbox items directly in the input queue. - Moved pending-input mutations off `TurnState`; `TurnState` now exposes the queue-owned storage directly for now. - Routed abort cleanup, mailbox delivery phase changes, next-turn queued items, and active-turn pending input through `InputQueue`. - Boxed stack-heavy agent resume/fork startup futures that the refactor pushed over the default test stack. - Updated session, task, goal, stream-event, and multi-agent call sites and tests to use the new queue ownership. ## Verification - `cargo test -p codex-core --lib agent::control::tests` - `cargo test -p codex-core --lib agent::control::tests::resume_closed_child_reopens_open_descendants -- --exact` - `cargo test -p codex-core --lib agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns -- --exact` - `cargo test -p codex-core --lib agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role -- --exact` - `cargo test -p codex-core` was also run; it completed with 1814 passed, 4 ignored, and one timeout in `agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role`, which passed when rerun in isolation.	2026-05-18 15:43:01 -07:00
jif-oai	c69cde3547	Add tool lifecycle extension contributor (#23309 ) ## Why Extensions that need to track runtime progress currently have no typed host signal for tool execution. The goal extension in particular needs to observe tool attempts without inspecting tool payloads, owning tool implementations, or staying coupled to core-only runtime plumbing. This adds a narrow lifecycle contributor API for host-owned tool execution: extensions can observe when an accepted tool call starts and how it finishes, while policy hooks and tool handlers continue to own payload rewriting, blocking, and execution. Relevant code: - [`ToolLifecycleContributor`](`3ad2850ffc/codex-rs/ext/extension-api/src/contributors.rs (L119)`) defines the extension-facing observer contract. - [`tool_lifecycle.rs`](`3ad2850ffc/codex-rs/ext/extension-api/src/contributors/tool_lifecycle.rs`) defines the typed start/finish inputs, source, and outcome enums. - [`notify_tool_start` / `notify_tool_finish`](`3ad2850ffc/codex-rs/core/src/tools/lifecycle.rs`) bridges core tool dispatch into the extension registry. ## What Changed - Added `ToolLifecycleContributor` to `codex-extension-api`, including: - `ToolStartInput` - `ToolFinishInput` - `ToolCallSource` - `ToolCallOutcome` - Added registration and lookup support on `ExtensionRegistryBuilder` / `ExtensionRegistry`. - Wired core tool dispatch to notify lifecycle contributors for: - accepted tool starts - completed tool calls, including the tool output success marker - pre-tool-use blocks - failures before or after the handler runs - cancellation/abort in the parallel tool path - Registered the goal extension as a lifecycle contributor and added the outcome filter it will use for goal progress accounting. ## Test Coverage - Added `dispatch_notifies_tool_lifecycle_contributors` to cover lifecycle notification ordering and outcomes for successful and handler-failed tool calls.	2026-05-18 21:55:57 +02:00
Celia Chen	4dbca61e20	fix: default unknown tool schemas to empty schemas (#22380 ) ## Why Some tool providers, especially MCP servers and dynamic tool sources, can supply schema nodes that omit `type` and have no recognized JSON Schema shape hints. Previously, `sanitize_json_schema` filled those unknown nodes in as `string`, which made the schema parseable but invented a scalar constraint that the provider did not specify. For description-only fields, that could incorrectly steer tool arguments away from the provider's actual accepted shape. The Responses API accepts permissive empty schemas such as `{}` at nested property positions, so Codex should preserve that permissive meaning instead of coercing unknown schema nodes into a misleading scalar type. ## What Changed - Changed the no-hints fallback in `codex-rs/tools/src/json_schema.rs` to clear unrecognized object schema nodes to `{}`. - Empty schemas now remain `{}` rather than becoming `type: "string"`. - Description-only or otherwise metadata-only nested property schemas now become `{}` while surrounding object/array/string/number inference still applies when recognized hints are present. - Updated `codex-tools` and `codex-core` tests to cover top-level empty schemas, nested empty schemas, metadata-only malformed schemas, dynamic tools, and MCP tool specs. ## Verification - `cargo test -p codex-tools` - `cargo test -p codex-core test_mcp_tool_property_missing_type_defaults_to_empty_schema` - Manually verified the real Responses API behavior for both empty-schema positions: - Top-level function `parameters: {}` is accepted and echoed back as `{"type":"object","properties":{}}`; when forced to call the tool, Responses emitted empty object arguments: `"arguments": "{}"`. - Nested property schema `{}` is accepted and preserved as `{}`; when forced to call a tool with `metadata.extra`, Responses emitted `"arguments": "{\"metadata\":{\"extra\":\"codex schema sanitizer behavior\"}}"`.	2026-05-18 12:41:10 -07:00
Eric Traut	0d344aca9b	goal: pause continuation loops on usage limits and blockers (#23094) Addresses #22833, #22245, #23067 ## Why `/goal` can keep synthesizing turns even when the next turn cannot make meaningful progress. Hard usage exhaustion can replay failing turns, and repeated permission or external-resource blockers can keep burning tokens while waiting for user or system intervention. ## What changed - Add resumable `blocked` and `usageLimited` goal states. As with `paused`, goal continuation stops in these states. - Move to `usageLimited` after usage-limit failures. - Allow the built-in `update_goal` tool to set `blocked` only under explicit repeated-impasse guidance. Updated the goal continuation prompt to specify that the agent should use `blocked` only when it has made at least three attempts to get past an impasse. Most of the files touched by this PR changed because of the small app-server protocol update. ## Validation I manually reproduced a number of situations where an agent can run into a true impasse and verified that it properly enters the `blocked` state. I then resumed and verified that it entered the `blocked` state again several turns later if the impasse still existed. I also manually reproduced the usage-limit condition by creating a simulated Responses API endpoint that returns 429 errors with the appropriate error message. Verified that the goal runtime properly moves the goal into the `usageLimited` state and that the TUI updates appropriately. Verified that `/goal resume` resumes (and immediately goes back into the `usageLimited` state if appropriate). ## Follow-up PRs Small changes will be needed in the GUI clients to properly handle the two new states.	2026-05-18 11:28:53 -07:00
pakrym-oai	82061660ae	[codex] Remove legacy shell output formatting paths (#22706 ) ## Why The client and tool pipeline still carried compatibility code for legacy structured shell output. Current shell and apply_patch responses are already plain text for model consumption, so keeping a JSON-serialization path plus shell-item rewrite logic makes the request formatter and tests preserve a format we do not need anymore. ## What Changed - Removed the client-side shell output rewrite from `core/src/client_common.rs`. - Removed the structured exec-output formatter and the shell `freeform` switch so tool emitters use one model-facing formatter. - Collapsed apply_patch/shell serialization tests around the remaining plain-text output expectations and removed duplicate one-variant parameterized cases. - Kept the `ApplyPatchModelOutput::ShellCommandViaHeredoc` compatibility input shape, but no longer treats it as a separate output-format mode. ## Validation - `cargo test -p codex-core client_common` - `cargo test -p codex-core shell_serialization` - `cargo test -p codex-core apply_patch_cli` - `just fix -p codex-core` ## Documentation No external Codex documentation update is needed.	2026-05-18 09:57:54 -07:00
Michael Bolin	0a83353ca3	test: reduce core sandbox policy test setup (#23036 ) ## Why `SandboxPolicy` is a legacy compatibility shape, but several core tests still used it for ordinary turn setup even when the runtime path now carries `PermissionProfile`. With the first cleanup PR merged, this follow-up trims more core test scaffolding so remaining `SandboxPolicy` matches are easier to classify as production compatibility, legacy-boundary coverage, or explicit conversion tests. ## What Changed - Updated apply-patch handler and runtime tests to pass `PermissionProfile` directly. - Changed sandboxing test helpers to build permission profiles without first creating `SandboxPolicy` values. - Converted request-permissions integration turns to pass `PermissionProfile` through the test helper, leaving legacy sandbox projection at the `Op::UserTurn` boundary. - Converted unified exec integration helpers and direct turn submissions to use `PermissionProfile` values instead of `SandboxPolicy` setup. - Removed now-unused `SandboxPolicy` imports from the touched core tests. ## Test Plan - `just fmt` - `cargo test -p codex-core --lib tools::sandboxing::tests` - `cargo test -p codex-core --lib tools::runtimes::apply_patch::tests` - `cargo test -p codex-core --lib tools::handlers::apply_patch::tests` - `cargo test -p codex-core --lib unified_exec::process_manager::tests` - `cargo test -p codex-core --test all request_permissions::` - `cargo test -p codex-core --test all unified_exec::` - `just fix -p codex-core`	2026-05-17 08:39:41 -07:00
jif-oai	545ede569c	Make multi-agent v2 tool namespace configurable (#23147 ) ## Summary - Add `features.multi_agent_v2.tool_namespace` with config/schema validation for Responses-compatible namespace values. - Thread the resolved namespace into `ToolsConfig` for normal turns and review turns. - Wrap MultiAgentV2 tool specs and registry names in the configured namespace when namespace tools are supported, while falling back to the plain tool names when they are not. ## Validation - `just fmt` - `just write-config-schema` - `cargo test -p codex-features multi_agent_v2_feature_config -- --nocapture` - `cargo test -p codex-core test_build_specs_multi_agent_v2 -- --nocapture` - `cargo test -p codex-core multi_agent_v2_config -- --nocapture` - `cargo test -p codex-core multi_agent_v2_rejects_invalid_tool_namespace -- --nocapture` - `cargo test -p codex-tools` - `git diff --check`	2026-05-17 15:27:43 +02:00
sayan-oai	061a614d85	multiagent: trim model-visible description, cap to 5 models (#23069 ) ## Why The `spawn_agent` model override guidance is uncapped and bloating context. We need to trim down each entry and cap total entries. picked 5 as cap, we can change ## What changed - Cap the model override summaries shown in `spawn_agent` to the first 5 picker-visible models, preserving the existing priority ordering from the models manager. - Condense each rendered entry to the actionable pieces the model needs: - use the model slug as the label - render compact reasoning effort lists with the default marked inline - render only service tier IDs, and omit the clause when no tiers are available - Update coverage so the compact formatter shape and the top-5 cap are exercised, and keep the end-to-end request assertion aligned with real model metadata. ## Example Before: `- gpt-5.4 ('gpt-5.4\'): Strong model for everyday coding. Default reasoning effort: medium. Supported reasoning efforts: low (Fast responses with lighter reasoning), medium (Balances speed and reasoning depth for everyday tasks), high (Greater reasoning depth for complex problems), xhigh (Extra high reasoning depth for complex problems). Supported service tiers: priority (Fast: 1.5x speed, increased usage).` After: `- 'gpt-5.4': Strong model for everyday coding. Reasoning efforts: low, medium (default), high, xhigh. Service tiers: priority.`	2026-05-16 13:43:30 -07:00
Michael Bolin	d91bc15618	test: construct permission profiles directly (#23030 ) ## Why `SandboxPolicy` is now a legacy compatibility shape, but several tests still built a `SandboxPolicy` only to immediately convert it into `PermissionProfile` for APIs that already accept canonical runtime permissions. Those detours make it harder to audit where legacy sandbox policy is still required, because boundary-only usages are mixed together with ordinary test setup. ## What Changed - Updated tests in `codex-core`, `codex-exec`, `codex-analytics`, and `codex-config` to construct `PermissionProfile` values directly when the code under test takes a permission profile. - Changed exec-policy, request-permissions, session, and sandbox test helpers to pass `PermissionProfile` through instead of converting from `SandboxPolicy` internally. - Left `SandboxPolicy` in place where tests are explicitly exercising legacy compatibility or request/response boundaries. ## Test Plan - `cargo test -p codex-analytics -p codex-config` - `cargo test -p codex-core --lib safety::tests` - `cargo test -p codex-core --lib exec_policy::tests::` - `cargo test -p codex-core --lib exec::tests` - `cargo test -p codex-core --lib guardian_review_session_config` - `cargo test -p codex-core --lib tools::network_approval::tests` - `cargo test -p codex-core --lib tools::runtimes::shell::unix_escalation::tests` - `cargo test -p codex-core --lib managed_network` - `cargo test -p codex-core --test all request_permissions::` - `cargo test -p codex-exec sandbox` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23030). * #23036 * __->__ #23030	2026-05-16 12:12:37 -07:00
Eric Traut	941e7f825e	Improve goal completion usage reporting (#22907 ) ## Why Goal completion follow-up turns currently receive a preformatted English usage sentence such as `time used: 2586 seconds`. That nudges the model to echo an awkward raw seconds count in the final reply, even though the tool result already exposes structured usage fields like `goal.timeUsedSeconds`, `goal.tokensUsed`, and `goal.tokenBudget`. ## What changed - Replace the preformatted completion usage sentence with guidance to read the structured goal fields from the tool result. - Preserve token-budget reporting while allowing the model to phrase elapsed time in a concise, human-friendly way that fits the response language. - Update core coverage for both the generated completion guidance and the session flow that forwards it back to the model. ## Verification Previously, it would have output a final message indicating that it "worked for 303 seconds". Now it shows the following: <img width="286" height="35" alt="image" src="https://github.com/user-attachments/assets/d7011880-9449-46a7-856f-4e50ae00eb45" />	2026-05-16 11:49:40 -07:00
Curtis 'Fjord' Hawthorne	8543e39885	Preserve image detail in app-server inputs (#20693 ) ## Summary - Add optional image detail to user image inputs across core, app-server v2, thread history/event mapping, and the generated app-server schemas/types. - Preserve requested detail when serializing Responses image inputs: omitted detail stays on the existing `high` default, while explicit `original` keeps local images on the original-resolution path. - Support `high`/`original` consistently for tool image outputs, including MCP `codex/imageDetail`, code-mode image helpers, and `view_image`.	2026-05-15 15:04:04 -07:00
jif-oai	c03cea4ca2	Remove zombie tools spec module (#22820 ) ## Summary - move tool_user_shell_type out of the old tools::spec module and call it from tools directly - attach the remaining spec planning model tests under spec_plan - delete core/src/tools/spec.rs ## Tests - just fmt - cargo test -p codex-core tools::spec_plan Note: a broader cargo test -p codex-core run on the earlier PR-head worktree still hit the pre-existing stack overflow in agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns.	2026-05-15 13:44:58 +02:00
jif-oai	6f1a01fbdd	Simplify tool executor and registry plumbing (#22636 ) ## Why The tool runtime path still had a typed output associated type on `ToolExecutor`, plus a core-only `RegisteredTool` adapter and extension-only executor aliases. That made every new shared tool runtime carry extra adapter plumbing before it could participate in core dispatch, extension tools, hook payloads, telemetry, and model-visible spec generation. This PR moves output erasure to the shared executor boundary so core and extension tools can use the same execution contract directly. ## What Changed - Changed `codex_tools::ToolExecutor` to return `Box<dyn ToolOutput>` instead of an associated `Output` type. - Removed the extension-specific `ExtensionToolExecutor` / `ExtensionToolOutput` aliases and exposed `ToolExecutor<ToolCall>` plus `ToolOutput` through `codex-extension-api`. - Reworked core tool registration around `CoreToolRuntime` and `ToolRegistry::from_tools`, removing the extra `RegisteredTool` / `ToolRegistryBuilder` layer. - Consolidated model-visible spec planning and registry construction in `core/src/tools/spec_plan.rs`, including deferred tool search and code-mode-only filtering. - Added `ToolOutput` helpers for post-tool-use hook ids and inputs so MCP, unified exec, extension, and other boxed outputs preserve the same hook payload behavior. - Updated core handlers, memories tools, and the related registry/spec/router tests to use the simplified contract. ## Test Coverage - Updated coverage for tool spec planning, registry lookup, deferred tool search registration, extension tool routing, post-tool-use hook payloads, dispatch tracing, guardian output extraction, and memories extension tool execution.	2026-05-15 11:47:54 +02:00
Michael Bolin	3c6d727810	permissions: resolve profile identity with constraints (#22683 ) ## Why This PR is the invariant-cleanup layer that follows the workspace-roots base merged in [#22610](https://github.com/openai/codex/pull/22610). #22610 adds `[permissions.<id>.workspace_roots]` and keeps runtime workspace roots separate from the raw permission profile, but its in-memory representation is intentionally transitional: `Permissions` still carries the selected profile identity next to a constrained `PermissionProfile`. That makes APIs such as `set_constrained_permission_profile_with_active_profile()` fragile because the id and value only mean the right thing when every caller keeps them in sync. This PR introduces a single resolved profile state so profile identity, `extends`, the profile value, and profile-declared workspace roots travel together. The next PR, [#22611](https://github.com/openai/codex/pull/22611), builds on this by changing the app-server turn API to select permission profiles by id plus runtime workspace roots. ## Stack Context - #22610, now merged: adds profile-declared `workspace_roots`, runtime workspace roots, and `:workspace_roots` materialization. - This PR: replaces the parallel active-profile/profile-value fields with `PermissionProfileState`. - #22611: switches app-server turn updates toward profile ids plus runtime workspace roots. - #22612: updates TUI/exec summaries to show the effective workspace roots. Keeping this separate from #22611 is deliberate: reviewers can validate the internal state invariant before reviewing the app-server protocol migration. ## What Changed - Added `ResolvedPermissionProfile::{Legacy, BuiltIn, Named}` and `PermissionProfileState`. - Typed built-in profile ids with `BuiltInPermissionProfileId`. - Moved selected profile identity and profile-declared workspace roots into the resolved state. - Replaced `Permissions` parallel profile fields with one `permission_profile_state`. 
- Removed `set_constrained_permission_profile_with_active_profile()` from session sync paths. - Kept trusted session replay/`SessionConfigured` compatibility through explicit session snapshot helpers. - Updated session configuration, MCP initialization, app-server, exec, TUI, and guardian call sites to consume `&PermissionProfile` directly. ## Review Guide Start with `codex-rs/core/src/config/resolved_permission_profile.rs`; it is the new invariant boundary. Then review `codex-rs/core/src/config/mod.rs` to see how config loading records active profile identity and profile workspace roots. The remaining call-site changes are mostly mechanical fallout from `Permissions::permission_profile()` returning `&PermissionProfile` instead of `&Constrained<PermissionProfile>`. ## Verification The existing config/session coverage now constructs and asserts through `PermissionProfileState`. The workspace-root config test also asserts that profile-declared roots are preserved in the resolved state, which is the behavior #22611 relies on when runtime roots become mutable through the app-server API. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22683). * #22612 * #22611 * __->__ #22683	2026-05-14 18:47:44 -07:00
Michael Bolin	c25d905f61	permissions: support workspace roots in profiles (#22610 ) ## Why This is the configuration/model half of the alternative permissions migration we discussed as a comparison point for [#22401](https://github.com/openai/codex/pull/22401) and [#22402](https://github.com/openai/codex/pull/22402). The old `workspace-write` model mixes three concerns that we want to keep separate: - reusable profile rules that should stay immutable once selected - user/runtime workspace roots from `cwd`, `--add-dir`, and legacy workspace-write config - internal Codex writable roots such as memories, which should not be shown as user workspace roots This PR gives permission profiles first-class `workspace_roots` so users can opt multiple repositories into the same `:workspace_roots` rules without using broad absolute-path write grants. It also starts separating the raw selected profile from the effective runtime profile by making `Permissions` expose explicit accessors instead of public mutable fields. A representative `config.toml` looks like this: ```toml default_permissions = "dev" [permissions.dev.workspace_roots] "~/code/openai" = true "~/code/developers-website" = true [permissions.dev.filesystem.":workspace_roots"] "." = "write" ".codex" = "read" ".git" = "read" ".vscode" = "read" ``` If Codex starts in `~/code/codex` with that profile selected, the effective workspace-root set becomes: - `~/code/codex` from the runtime `cwd` - `~/code/openai` from the profile - `~/code/developers-website` from the profile The `:workspace_roots` rules are materialized across each root, so `.git`, `.codex`, and `.vscode` stay scoped the same way everywhere. Runtime additions such as `--add-dir` can still layer on later stack entries without mutating the selected profile. 
## Stack Shape This PR intentionally stops before the profile-identity cleanup in [#22683](https://github.com/openai/codex/pull/22683) so the base review stays focused on config loading, workspace-root materialization, and compatibility with legacy `workspace-write`. The representation in this PR is therefore transitional: `Permissions` carries enough state to distinguish the raw constrained profile from the effective runtime profile, and there are still call sites that must keep the active profile identity and constrained profile value in sync. The follow-up PR replaces that with a single resolved profile state (`ResolvedPermissionProfile` / `PermissionProfileState`) that keeps the profile id, immutable `PermissionProfile`, and profile-declared workspace roots together. That follow-up removes APIs such as `set_constrained_permission_profile_with_active_profile()` where separate arguments could drift out of sync. Downstream PRs then build on this base to switch app-server turn updates to profile ids plus runtime workspace roots and to finish the user-visible summary behavior. Reviewers should judge this PR as the workspace-roots foundation, not as the final in-memory shape of selected permission profiles. ## Review Guide Suggested review order: 1. Start with `codex-rs/core/src/config/mod.rs`. This is the main shape change in the base slice. `Permissions` now stores a private raw `Constrained<PermissionProfile>` plus runtime `workspace_roots`. Callers use `permission_profile()` when they need the raw constrained value and `effective_permission_profile()` when they need a materialized runtime profile. As noted above, [#22683](https://github.com/openai/codex/pull/22683) replaces this transitional shape with a resolved profile state that keeps identity and profile data together. 2. Review `codex-rs/config/src/permissions_toml.rs` and `codex-rs/core/src/config/permissions.rs`. 
These add `[permissions.<id>.workspace_roots]`, resolve enabled entries relative to the policy cwd, and keep `:workspace_roots` deny-read glob patterns symbolic until the actual roots are known. 3. Review `codex-rs/protocol/src/permissions.rs` and `codex-rs/protocol/src/models.rs`. These add the policy/profile materialization helpers that expand exact `:workspace_roots` entries and scoped deny-read globs over every workspace root. This is also where `ActivePermissionProfileModification` is removed from the core model. 4. Review the legacy bridge in `Config::load_from_base_config_with_overrides` and `Config::set_legacy_sandbox_policy`. This is where legacy `workspace-write` roots become runtime workspace roots, while Codex internal writable roots stay internal and do not appear as user-facing workspace roots. 5. Then skim downstream call sites. The interesting pattern is raw-vs-effective access: state/proxy/bwrap paths keep the raw constrained profile, while execution, summaries, and user-visible status use the effective profile and workspace-root list. 
## What Changed - added `[permissions.<id>.workspace_roots]` to the config model and schema - added runtime `workspace_roots` state to `Config`/`Permissions` and `ConfigOverrides` - made `Permissions` profile fields private and replaced direct mutation with accessors/setters - added `PermissionProfile` and `FileSystemSandboxPolicy` helpers for materializing `:workspace_roots` exact paths and deny-read globs across all roots - moved legacy additional writable roots into runtime workspace-root state instead of active profile modifications - removed `ActivePermissionProfileModification` and its app-server protocol/schema export - updated sandbox/status summary paths so internal writable roots are not reported as user workspace roots ## Verification Strategy The targeted tests cover the behavior at the layers where regressions are most likely: - `codex-rs/core/src/config/config_tests.rs` verifies config loading, legacy workspace-root seeding, effective profile materialization, and memory-root handling. - `codex-rs/core/src/config/permissions_tests.rs` verifies profile `workspace_roots` parsing and `:workspace_roots` scoped/glob compilation. - `codex-rs/protocol/src/permissions.rs` unit tests verify exact and glob materialization over multiple workspace roots. - `codex-rs/tui/src/status/tests.rs` and `codex-rs/utils/sandbox-summary/src/sandbox_summary.rs` verify the user-facing summaries show effective workspace roots and hide internal writes. I also ran `cargo check --tests` locally after the latest stack refresh to catch cross-crate API breakage from the private-field/accessor changes. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22610). * #22612 * #22611 * #22683 * __->__ #22610	2026-05-14 18:25:23 -07:00
Dylan Hurd	51b0e94105	chore(features) rm Feature::ApplyPatchFreeform (#22711 ) ## Summary Removes the feature since this is effectively on by default in all cases where we should use it, or can be configured via models.json. ## Testing - [x] unit tests pass	2026-05-14 16:15:56 -07:00
jif-oai	6d65686313	feat: make ToolExecutor an async trait (#22560 ) ## Why `codex_tools::ToolExecutor` keeps a tool spec attached to its runtime handler, but extension tools still carried a parallel `ExtensionToolFuture` / `ExtensionToolExecutor` shape. That made extension-owned tools look different from host tools even though routing, registration, and execution need the same abstraction. This PR makes the shared executor contract directly async and lets extension tools implement it too, so host tools and extension tools can move through the same registration path. ## What changed - Changed `ToolExecutor::handle` to an `async fn` using `async-trait`, and updated built-in tool handlers to implement the async trait directly. - Replaced the bespoke `ExtensionToolFuture` contract with a marker `ExtensionToolExecutor` over `ToolExecutor<ToolCall, Output = JsonToolOutput>`, re-exporting `ToolExecutor` from `codex-extension-api`. - Updated the memories extension tools to implement the shared executor trait. - Split tool-router construction into collected executors plus hosted model specs, keeping hosted tools like web search and image generation separate from executable handlers. - Updated spec/router tests and extension-tool stubs for the new executor shape. ## Verification - Not run locally.	2026-05-14 11:23:57 +02:00
jif-oai	e6939e3969	feat: namespace in ext (#22556 )	2026-05-14 00:37:48 +02:00
Andrey Mishchenko	7c57a59f51	Make multi_agent_v2 wait_agent timeouts configurable (#22528 ) ## Why `multi_agent_v2` already allowed configuring the minimum `wait_agent` timeout, but the default timeout and upper bound were still hard-coded. That made it hard to tune waits for subagent mailbox activity in sessions that need either faster wakeups or longer waits, and it meant the model-visible `wait_agent` schema could not fully reflect the resolved runtime limits. ## What Changed - Added `features.multi_agent_v2.max_wait_timeout_ms` and `features.multi_agent_v2.default_wait_timeout_ms` alongside the existing `min_wait_timeout_ms` setting. - Validated all three timeouts in config as `0..=3_600_000`, with `min_wait_timeout_ms <= default_wait_timeout_ms <= max_wait_timeout_ms`. - Thread and review session tool config now passes the resolved min/default/max values into the `wait_agent` tool schema. - `wait_agent` now uses the configured default when `timeout_ms` is omitted and rejects explicit values outside the configured min/max range instead of silently clamping them. - Updated the generated config schema and config-lock test coverage for the new fields.	2026-05-13 14:43:06 -07:00
iceweasel-oai	8ae0c837f0	Avoid PowerShell profiles in elevated Windows sandbox (#21400 ) ## Why On Windows, elevated sandboxed commands run under a dedicated sandbox account while `HOME` / `USERPROFILE` can still point at the real user's profile directory. For PowerShell login shells, that combination can make the sandbox account try to load the real user's PowerShell profile script. If the sandbox account's execution policy differs from the real user's policy, startup can emit profile-loading errors before the requested command runs. For this backend, loading the profile is not a faithful user login shell: it is cross-account profile execution. Treating these PowerShell invocations as non-login shells avoids that invalid startup path. ## Why This Happens Late The normal `login` decision is resolved when shell argv is created, but that point is too early to make this Windows sandbox-specific decision. At argv creation time we do not yet know the actual sandbox attempt that will run the command. A turn can include sandboxed and unsandboxed attempts, and a broad turn-level override would also affect Full Access commands where the user's profile should remain available. Instead, this change carries the selected `ShellType` alongside the argv and applies the `-NoProfile` adjustment in the shell runtimes once the `SandboxAttempt` is known. That keeps the override scoped to actual `WindowsRestrictedToken` attempts with `WindowsSandboxLevel::Elevated`. The runtime uses the selected shell metadata rather than re-detecting PowerShell from argv. That avoids brittle parsing and covers PowerShell invocation shapes such as `-EncodedCommand`. ## What Changed - Carry selected shell metadata through `exec_command` / unified exec requests and shell tool requests. - Insert `-NoProfile` for PowerShell commands only when the runtime is about to execute a sandboxed elevated Windows attempt. 
- Add focused unit coverage for elevated Windows PowerShell, `-EncodedCommand`, existing `-NoProfile`, legacy restricted-token attempts, unsandboxed attempts, and non-PowerShell commands. ## Verification - `cargo test -p codex-core disable_powershell_profile_tests` - `cargo test -p codex-core test_get_command` - `cargo clippy --fix --tests --allow-dirty --allow-no-vcs -p codex-core` A full `cargo test -p codex-core` run was also attempted during development, but it still hit an unrelated stack overflow in `agent::control` tests before reaching this area.	2026-05-13 21:37:50 +00:00
sayan-oai	3de4d7f238	clean up instructions (#22543 ) rm behavioral steering in tool docs for code mode.	2026-05-13 14:28:57 -07:00
pakrym-oai	3ac1d15598	Use selected environment cwd for filesystem helpers (#22542 ) ## Why `TurnContext::cwd` is deprecated in favor of resolving paths from the selected turn environment cwd. A few filesystem-oriented paths were still constructing sandbox context from the legacy cwd and then mutating it afterward, or resolving local file paths through the deprecated helper. ## What changed - Make `TurnContext::file_system_sandbox_context` take the trusted cwd explicitly. - Pass the selected turn environment cwd directly from `apply_patch` and `view_image` call sites. - Restrict `spawn_agents_on_csv` to exactly one local environment and resolve input/output CSV paths from that local environment cwd. - Remove a redundant test setup assignment that only synchronized deprecated `TurnContext::cwd` with a replaced config. ## Validation - `cargo test -p codex-core view_image` - `cargo test -p codex-core maybe_persist_mcp_tool_approval_writes_project_config_for_project_server` - `cargo test -p codex-core parse_csv_supports_quotes_and_commas` - `git diff --check`	2026-05-13 13:18:56 -07:00
pakrym-oai	4454e1411b	Deprecate TurnContext cwd and resolve_path (#22519 ) ## Why `TurnContext::cwd` and `TurnContext::resolve_path` are being phased out in favor of using the selected turn environment cwd directly. Deprecating both APIs makes any new direct dependency visible while preserving the existing migration path for current callers. ## What Changed - Marked `TurnContext::cwd` and `TurnContext::resolve_path` as deprecated with guidance to use the selected turn environment cwd instead. - Added exact `#[allow(deprecated)]` suppressions at each existing direct usage site, including tests, rather than adding crate-wide suppression. - Kept the change behavior-preserving: current cwd reads, writes, and path resolution continue to use the same values. ## Verification - `just fmt` - `cargo check -p codex-core` - `cargo check -p codex-core --tests` - `git diff --check`	2026-05-13 11:15:25 -07:00
jif-oai	fc26af377f	feat: expose multi-agent v2 as model-only tools (#22514 ) ## Why `code_mode_only` filters code-mode nested tools out of the top-level tool list. For multi-agent v2, we need a rollout shape where the collaboration tools remain callable as normal model tools without also being embedded into the code-mode `exec` tool declaration. Related to this: https://openai-corpws.slack.com/archives/C0AQLHB4U75/p1778660267922549 ## What Changed - Adds `features.multi_agent_v2.non_code_mode_only`, including config resolution, profile override handling, and generated schema coverage. - Introduces `ToolExposure::DirectModelOnly` so a tool can be included in the initial model-visible list while staying out of the nested code-mode tool surface. - Applies that exposure to the multi-agent v2 tools when the new flag is set: `spawn_agent`, `send_message`, `followup_task`, `wait_agent`, `close_agent`, and `list_agents`. - Updates code-mode-only filtering so direct-model-only tools remain visible while ordinary nested code-mode tools are still hidden. ## Verification - Added config parsing/profile tests for `non_code_mode_only`. - Added tool spec coverage for the code-mode-only multi-agent v2 exposure behavior.	2026-05-13 19:49:47 +02:00
pakrym-oai	83decfa300	[codex] Remove unused legacy shell tools (#22246 ) ## Why Recent session history showed no active use of the raw `shell`, `local_shell`, or `container.exec` execution surfaces. Keeping those handlers/specs wired into core leaves duplicate shell execution paths alongside the supported `shell_command` and unified exec tools. ## What changed - Removed the raw `shell` handler/spec and its `ShellToolCallParams` protocol helper. - Removed the legacy `local_shell` and `container.exec` handler/spec plumbing while preserving persisted-history compatibility for old response items. - Normalized model/config `default` and `local` shell selections to `shell_command`. - Pruned tests that exercised removed raw-shell/local-shell/apply-patch variants and kept coverage on `shell_command`, unified exec, and freeform `apply_patch`. ## Verification - `git diff --check` - `cargo test -p codex-protocol` - `cargo test -p codex-tools` - `cargo test -p codex-core tools::handlers::shell` - `cargo test -p codex-core tools::spec` - `cargo test -p codex-core tools::router` - `cargo test -p codex-core active_call_preserves_triggering_command_context` - `cargo test -p codex-core guardian_tests` - `cargo test -p codex-core --test all shell_serialization` - `cargo test -p codex-core --test all apply_patch_cli` - `cargo test -p codex-core --test all shell_command_` - `cargo test -p codex-core --test all local_shell` - `cargo test -p codex-core --test all otel::` - `cargo test -p codex-core --test all hooks::` - `just fix -p codex-core` - `just fix -p codex-tools`	2026-05-13 16:43:25 +00:00
jif-oai	fdda59c00b	Introduce tool exposure for deferred registration (#22489 ) ## Why Deferred tools were tracked with separate side-channel filtering after tool specs had already been assembled. That made the registry responsible for executing tools while the router/spec planner separately decided whether those same tools should be exposed to the model up front. This PR makes exposure part of the tool handler contract so direct versus deferred availability travels with the executable tool registration. The next step will be to simplify registration. ## What Changed - Adds `ToolExposure` to `codex-tools` and exposes it through `ToolExecutor`, defaulting tools to `Direct`. - Teaches dynamic tools and MCP handlers to mark deferred tools as `Deferred` at construction time. - Renames the registry object-safe wrapper from `AnyToolHandler` to `RegisteredTool` and uses `ToolExposure` when deciding whether to include a handler's spec in the initial model-visible tool list. - Refactors tool spec planning to derive direct specs and deferred search entries from registered handlers, removing the router's special-case deferred dynamic tool filtering. ## Verification - Not run.	2026-05-13 18:16:51 +02:00
Ahmed Ibrahim	87de4e3290	Add service tier overrides to spawned agents (#22139 ) ## Why Spawned agents can already override `model` and `reasoning_effort`, but they have no equivalent way to opt into a model-supported service tier. That makes it impossible to preserve or intentionally select tiered execution behavior when delegating work to a sub-agent, even though the model catalog already advertises supported `service_tiers`. ## What changed - Add optional `service_tier` to both legacy and `MultiAgentV2` `spawn_agent` tool inputs. - Show each picker-visible model's supported service tier ids and descriptions in the `spawn_agent` tool guidance. - Resolve service tier selection after the child agent's effective model is known. - Inherit the parent tier when omitted and still supported by the final child model; otherwise clear it. - Reject explicit unsupported tier requests with a model-facing error. - Keep explicit `service_tier` usable on full-history forks, while still honoring the existing model/reasoning fork restrictions. - Hide `service_tier` alongside other spawn metadata when `hide_spawn_agent_metadata` is enabled. ## Verification Added focused coverage for: - v1/v2 `spawn_agent` schema exposure for `service_tier` - tier descriptions in spawn guidance - hidden-metadata suppression - explicit supported tier selection - explicit unknown and unsupported tier rejection - inherited tier preservation or clearing based on child-model support - full-history fork acceptance for explicit service tiers in both v1 and v2 Local Rust tests were not run in this workspace per repo guidance; the new coverage is included for CI.	2026-05-13 18:11:50 +03:00
jif-oai	9c5dfa7b1a	Refactor extension tools onto shared ToolExecutor (#22369 ) ## Why Extension tools were split across two public runtime contracts: `codex-tool-api` exposed `ToolBundle` plus its own call/spec/error types, while core native tools used `codex_tools::ToolExecutor`. That made contributed tool specs and execution behavior easy to drift apart and added another crate boundary for what should be one executable-tool seam. This PR makes `ToolExecutor` the single runtime contract and keeps extension-specific pinning in `codex-extension-api`. ## Remaining todo https://github.com/openai/codex/pull/22369/changes#diff-b935ea8245c3ce568a30cff660175fa6390b66b872ae409e1e2e965738250741R5 Either generic `Invocation` or sub-extract the `ToolCall` and clean `ToolInvocation` ## What changed - Removed the `codex-tool-api` workspace crate and its dependencies from core and `codex-extension-api`. - Made `codex_tools::ToolExecutor` object-safe with `async_trait` so extension contributors can return a dyn executor. - Added the extension-facing aliases under `ext/extension-api/src/contributors/tools.rs`, including `ExtensionToolExecutor = dyn ToolExecutor<ToolCall, Output = ExtensionToolOutput>`. - Changed `ToolContributor::tools` to return extension executors directly instead of `ToolBundle`s. - Updated core’s extension tool handler/registry/router path to adapt those extension executors into the existing native `ToolInvocation` runtime path. - Added focused coverage for extension tools being registered, model-visible, dispatchable, and not replacing built-in tools. ## Verification - `cargo test -p codex-tools` - `cargo test -p codex-extension-api`	2026-05-13 12:12:06 +02:00
jif-oai	1824685a00	feat: extract shared tool executor interface (#22359 ) ## Why Codex still models model-visible tools and executable behavior largely inside `codex-core`, which makes it harder to evolve the tool system toward a single reusable abstraction for built-ins, MCP-backed tools, dynamic tools, and later tools injected from outside core. This PR takes the next incremental step in that direction by moving the common execution-facing pieces out of core and separating them from core-only orchestration. The intent is to let shared tool abstractions improve in one place, while `codex-core` keeps the parts that are still inherently host-specific today, such as `ToolInvocation`, dispatch wiring, and hook integration. This PR is mostly moving things around. The only interesting piece is this abstraction: https://github.com/openai/codex/pull/22359/changes#diff-81af519002548ba51ed102bdaaf77e081d40a1e73a6e5f9b104bbbc96a6f1b3dR13 ## What changed - Added `codex_tools::ToolExecutor<Invocation>` as the shared execution trait for model-visible tools. - Moved the reusable execution support types from `codex-core` into `codex-tools`: - `FunctionCallError` - `ToolPayload` - `ToolOutput` - Refactored core tool implementations so that execution behavior lives on `ToolExecutor<ToolInvocation>`, while `ToolHandler` remains the core-local extension point for hook payloads, telemetry tags, diff consumers, and other orchestration concerns. - Kept the registry and dispatch flow behaviorally unchanged while making the shared/extracted boundary explicit across built-in, MCP, dynamic, extension-backed, shell, and multi-agent tool handlers. ## Verification - `cargo test -p codex-tools` - `just fix -p codex-tools` - `just fix -p codex-core` - `cargo test -p codex-core` progressed through the updated tool surfaces and then hit the existing unrelated multi-agent stack overflow in `tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.	2026-05-13 11:31:27 +02:00
jif-oai	7e97da7c13	chore: Keep view_image sandbox test in temp dir (#22355 ) ## Summary - move the `view_image` sandbox filesystem-read unit test onto a temporary cwd - keep the turn cwd and selected turn environment cwd aligned inside the test - avoid leaving `core/image.png` behind in the repo checkout after the test runs ## Root cause The test wrote `image.png` beneath `turn.cwd`, and the shared session test helper defaults that cwd to the current repo directory when no override is provided. ## Validation - `just fmt` - `cargo test -p codex-core tools::handlers::view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads`	2026-05-13 10:39:07 +02:00
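
The service-tier rules described in commit 87de4e3290 (#22139) can be sketched roughly as below. This is an illustration only, not the code from that PR: the function `resolve_service_tier`, the `TierError` type, and the tier ids used in the usage note are all hypothetical, and the sketch folds "unknown" and "unsupported" tiers into a single rejection, whereas the commit message mentions coverage for both cases.

```rust
/// Hypothetical error type for this sketch; not taken from the codex source.
#[derive(Debug, PartialEq)]
enum TierError {
    /// The caller explicitly asked for a tier the child model does not support.
    Unsupported { requested: String },
}

/// Resolve the child's service tier once its effective model is known.
///
/// - `explicit`: tier passed to `spawn_agent`, if any.
/// - `parent`: tier the parent agent is currently running with, if any.
/// - `child_supported`: tier ids advertised by the child's final model.
fn resolve_service_tier(
    explicit: Option<&str>,
    parent: Option<&str>,
    child_supported: &[&str],
) -> Result<Option<String>, TierError> {
    match explicit {
        // An explicit, supported tier is used as requested.
        Some(tier) if child_supported.contains(&tier) => Ok(Some(tier.to_string())),
        // An explicit tier the child model cannot serve is rejected.
        Some(tier) => Err(TierError::Unsupported {
            requested: tier.to_string(),
        }),
        // Omitted: inherit the parent tier only if the child model still
        // supports it; otherwise clear it.
        None => Ok(parent
            .filter(|tier| child_supported.contains(tier))
            .map(str::to_string)),
    }
}
```

With made-up tier ids, `resolve_service_tier(None, Some("fast"), &["flex"])` clears the inherited tier and yields `Ok(None)`, while `resolve_service_tier(Some("gold"), Some("fast"), &["fast"])` returns an error instead of silently falling back to the parent's tier.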


776 Commits