codex

mirror of https://github.com/openai/codex.git synced 2026-05-23 20:44:50 +00:00

Author	SHA1	Message	Date
Michael Bolin	ab1d2082e6	permissions: move workspace roots onto thread state	2026-05-12 07:59:00 -07:00
Michael Bolin	429366ef78	core: box multi-agent handler futures	2026-05-12 07:58:59 -07:00
pakrym-oai	c9e46ed639	[codex] Make handlers own parallel tool support (#22254 ) ## Why `ToolRouter::tool_supports_parallel()` was still consulting configured specs when a handler lookup missed, even though parallel schedulability is really a property of the executable handler. Keeping that metadata on `ConfiguredToolSpec` duplicated state between the model-visible spec layer and the runtime handler layer. This change makes handlers the sole source of truth for parallel tool support and removes the extra spec wrapper that only existed to carry duplicated metadata. ## What changed - removed `ConfiguredToolSpec` and store plain `ToolSpec` values in the registry/router builder path - changed `ToolRouter::tool_supports_parallel()` to consult only the handler registry and fall back to `false` - simplified spec collection and test helpers to operate directly on `ToolSpec` - updated router/spec tests to cover handler-owned parallel behavior and the no-handler fallback ## Validation - `cargo test -p codex-tools` - `cargo test -p codex-core mcp_parallel_support_uses_handler_data` - `cargo test -p codex-core deferred_responses_api_tool_serializes_with_defer_loading` - `cargo test -p codex-core tools_without_handlers_do_not_support_parallel` - `cargo test -p codex-core request_plugin_install_can_be_registered_without_search_tool` ## Docs No documentation updates needed.	2026-05-11 22:26:33 -07:00
pakrym-oai	79c65f816c	[codex] Filter legacy warning messages during compaction (#22243 ) ## Why Older sessions can contain model-warning records persisted as `user` messages, including the unified exec process-limit warning, the `apply_patch`-via-`exec_command` warning, and the model-mismatch high-risk cyber fallback warning. Those warnings are no longer produced as conversation history items, but when old sessions compact they should still be recognized as injected context rather than preserved as real user turns. ## What changed - Removed `record_model_warning` and the production paths that emitted these warning messages into conversation history. - Added `LegacyUnifiedExecProcessLimitWarning`, `LegacyApplyPatchExecCommandWarning`, and `LegacyModelMismatchWarning` contextual fragments that are used only for matching old persisted messages. - Registered the legacy fragments with contextual user message detection so compaction filters them through the existing fragment path. - Added focused compaction coverage for old warning messages being dropped during compacted-history processing. ## Testing - `cargo test -p codex-core warning` - `just fix -p codex-core`	2026-05-11 19:51:51 -07:00
Abhinav	d08906a944	Support PreToolUse updatedInput rewrites (#20527 ) ## Why `PreToolUse` already exposes `updatedInput` in its hook output schema, but Codex currently rejects it instead of applying the rewrite. That leaves hook authors unable to make the documented pre-execution adjustment to a tool call before it runs. ## What - Accept `updatedInput` from `PreToolUse` hooks when paired with `permissionDecision: "allow"`. - Apply the rewritten input before dispatch so the tool executes the updated payload, not the original one. - Preserve the stable hook-facing compatibility shapes that participating tool handlers expose: - Bash-like tools (`shell`, `container.exec`, `local_shell`, `shell_command`, `exec_command`) use `{ "command": ... }`. - `apply_patch` exposes its patch body through the same command-shaped hook contract. - MCP tools expose their JSON argument object directly. - Keep each participating tool handler responsible for translating hook-facing `updatedInput` back into its concrete invocation shape. ## Verification Direct Bash-like rewrite coverage: - `pre_tool_use_rewrites_shell_before_execution` - `pre_tool_use_rewrites_container_exec_before_execution` - `pre_tool_use_rewrites_local_shell_before_execution` - `pre_tool_use_rewrites_shell_command_before_execution` - `pre_tool_use_rewrites_exec_command_before_execution` These cases assert that each supported Bash-like surface runs only the rewritten command while the hook still observes the original `{ "command": ... }` input. `pre_tool_use_rewrites_apply_patch_before_execution` - Model emits one patch. - Hook swaps in a different patch. - Asserts only the rewritten file is created, and the hook saw the original patch. `pre_tool_use_rewrites_code_mode_nested_exec_command_before_execution` - Model runs one nested shell command from code mode. - Hook rewrites it. - Asserts only the rewritten command runs, and the hook saw the original nested input. `pre_tool_use_rewrites_mcp_tool_before_execution` - Model calls the RMCP echo tool. - Hook rewrites the MCP arguments. - Asserts the MCP server receives and returns the rewritten message, not the original one.	2026-05-11 22:27:24 -04:00
starr-openai	17ed5ad0b0	Apply sandbox context to local view_image reads (#21861 ) ## Summary - create a selected-cwd filesystem sandbox context for view_image metadata and file reads in both local and remote environments - add a local restricted-profile regression test for the previously unsandboxed read path ## Validation - just fmt - bazel test --bes_backend= --bes_results_url= --test_output=errors --test_filter=view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads //codex-rs/core:core-unit-tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 18:48:43 -07:00
pakrym-oai	ed5944ba1d	Simplify MCP tool handler plumbing (#21595 ) ## Why The MCP tool path had accumulated a few core-owned special cases: a dedicated payload variant, resolver plumbing, a legacy `AfterToolUse` translation path, and a side channel for parallel-call metadata. That made `ToolRegistry` and the spec builder know more about MCP than they needed to. This change moves MCP-specific execution details back onto `ToolInfo` and `McpHandler` so `codex-core` can treat MCP calls like normal function calls while still preserving MCP-specific dispatch and telemetry behavior where it belongs. ## What changed - removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and the remaining registry-side MCP resolver path - stored MCP routing metadata directly on `McpHandler` and `ToolInfo`, including `supports_parallel_tool_calls` - deleted the legacy `AfterToolUse` consumer in `core`, which removes the need for handler-specific `after_tool_use_payload` implementations - switched tool-result telemetry to handler-provided tags and kept MCP-specific dispatch payload construction inside the handler - simplified tool spec planning/building by passing `ToolInfo` directly and dropping the direct/deferred MCP wrapper structs and the parallel-server side table ## Testing - `cargo check -p codex-core -p codex-mcp -p codex-otel` - `cargo test -p codex-core mcp_parallel_support_uses_exact_payload_server` - `cargo test -p codex-core direct_mcp_tools_register_namespaced_handlers` - `cargo test -p codex-core search_tool_description_lists_each_mcp_source_once` - `cargo test -p codex-mcp list_all_tools_uses_startup_snapshot_while_client_is_pending` - `just fix -p codex-core -p codex-mcp -p codex-otel`	2026-05-12 00:11:31 +00:00
starr-openai	22e84c49d0	Support multi-environment apply_patch selection (#21617 ) ## Summary - add multi-environment apply_patch routing for both freeform and function-call tool flows - parse and reconcile the optional environment selector in the main apply_patch parser, then verify against the selected environment in the handler - carry environment_id through runtime and approval surfaces so remote-targeted patches stay explicit end to end ## Testing - just fmt - remote exec-server e2e: `cargo test -p codex-core --test all apply_patch_multi_environment_uses_remote_executor -- --nocapture` on dev via `scripts/test-remote-env.sh` --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 16:33:44 -07:00
viyatb-oai	6506765168	fix(permissions): preserve managed deny-read during escalation (#15977 ) ## Why Managed filesystem `deny_read` requirements are administrator-enforced restrictions on specific paths. Once those requirements are active, Codex should not drop them just because an execution path would otherwise leave the sandbox. Before this change, an explicit escalation, a prefix-rule allow, a sandbox-denial retry, or an app-server legacy sandbox override could rebuild the runtime policy without those managed read-deny entries and expose a path the administrator had marked unreadable. This is narrower than general sandbox-mode constraints. If an enterprise only sets `allowed_sandbox_modes`, a trusted `prefix_rule(..., decision = "allow")` can still run its matching command unsandboxed; this PR only preserves managed filesystem `deny_read` restrictions across those paths. ## What Changed - Mark filesystem policies built from managed `deny_read` requirements so callers can tell when those deny entries must survive escalation. - Preserve managed deny-read entries when runtime permission profiles are rebuilt through protocol, app-server, or legacy sandbox-policy compatibility paths. - Keep managed deny-read attempts inside the selected sandbox on the first attempt and after sandbox-denial retries. - Preserve the same behavior in the zsh-fork escalation path, including prefix-rule-driven escalation. - Add a regression test showing the opposite case too: without managed deny-read, a prefix-rule allow still chooses unsandboxed execution. ## Verification Targeted automated verification: ```shell cargo test -p codex-core shell_request_escalation_execution_is_explicit -- --nocapture cargo test -p codex-core prefix_rule_uses_unsandboxed_execution_without_managed_deny_read -- --nocapture cargo test -p codex-core prefix_rule_preserves_managed_deny_read_escalation -- --nocapture cargo test -p codex-protocol permission_profile_round_trip_preserves_filesystem_policy_metadata -- --nocapture cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable -- --nocapture cargo test -p codex-app-server-protocol permission_profile_file_system_permissions_preserves_policy_metadata -- --nocapture cargo check -p codex-app-server -p codex-tui ``` Smoke-test invocations: ```shell # macOS exact deny + allowed control codex exec --skip-git-repo-check -C "$ROOT" \ -c 'default_permissions="deny_read_smoke"' \ -c 'permissions.deny_read_smoke.filesystem={":minimal"="read",":project_roots"={"."="write","secrets"="none","future-secret"="none","*/.env"="none"}}' \ 'Run shell commands only. Print the contents of allowed.txt. Then test whether reading secrets/exact-secret.txt succeeds without printing that file if it does. End with exactly two lines: allowed=<contents> and exact_secret=<BLOCKED or READABLE>.' # Linux exact deny + allowed control codex exec --skip-git-repo-check -C "$ROOT" \ -c 'default_permissions="deny_read_smoke"' \ -c 'permissions.deny_read_smoke.filesystem={":minimal"="read",glob_scan_max_depth=3,":project_roots"={"."="write","secrets"="none","future-secret"="none","*/.env"="none"}}' \ 'Run shell commands only. Print the contents of allowed.txt. Then test whether reading secrets/exact-secret.txt succeeds without printing that file if it does. End with exactly two lines: allowed=<contents> and exact_secret=<BLOCKED or READABLE>.' ``` Observed manual smoke matrix: \| Case \| macOS Seatbelt \| Linux bubblewrap \| \| --- \| --- \| --- \| \| `cat allowed.txt` \| Pass \| Pass \| \| `cat secrets/exact-secret.txt` \| Blocked \| Blocked \| \| `cat envs/root.env` \| Blocked \| Blocked \| \| `cat envs/nested/one.env` \| Blocked \| Blocked \| \| `cat envs/nested/two.env` \| Blocked \| Blocked \| \| `cat alias-to-secrets/exact-secret.txt` \| Blocked \| Blocked \| \| Missing denied path \| A file created after sandbox setup remained unreadable \| Creation was blocked by the reserved missing-path placeholder, and the placeholder was cleaned up after exit \| \| Real `codex exec` shell turn \| Pass \| Pass \| Notes: - The Linux smoke run used the fallback glob walker because the devbox did not have `rg` installed. - The smoke matrix verifies the end-to-end filesystem behavior on macOS and Linux; the escalation-specific behavior is covered by the focused tests above. --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Charlie Marsh <charliemarsh@openai.com>	2026-05-11 11:49:44 -07:00
jif-oai	8e12c12a07	feat: move extensions tool (#22163 ) This PR is just moving stuff around	2026-05-11 17:14:43 +02:00
jif-oai	672cc1f669	feat: wire extension tool bundles into core (#22147 ) ## Why This is the next narrow step toward moving concrete tool families out of core. After #22138 introduced `codex-tool-api`, we still needed a real end-to-end seam that lets an extension own an executable tool definition once and have core install it without the temporary `extension-api` wrapper or a dependency on `codex-tools`. `codex-tool-api` is the small extension-facing execution contract, while `codex-tools` still has a different job: host-side shared tool metadata and planning logic that is not “run this contributed tool”, like spec shaping, namespaces, discovery, code-mode augmentation, and MCP/dynamic-to-Responses API conversion ## What changed - Moved the shared leaf tool-spec and JSON Schema types into `codex-tool-api`, so the executable contract now lives with [`ToolBundle`](`c538758095/codex-rs/tool-api/src/bundle.rs (L19-L70)`). - Replaced the temporary extension-side tool wrapper with direct `ToolBundle` use in `codex-extension-api`. - Taught core to collect contributed bundles, include them in spec planning, register them through [`ToolRegistryBuilder::register_tool_bundle`](`c538758095/codex-rs/core/src/tools/registry.rs (L653-L667)`), and dispatch them through the existing router/runtime path. - Added focused coverage for contributed tools becoming model-visible and dispatchable, plus spec-planning coverage for contributed function and freeform tools. ## Verification - Added `extension_tool_bundles_are_model_visible_and_dispatchable` in `core/src/tools/router_tests.rs`. - Added spec-plan coverage in `core/src/tools/spec_plan_tests.rs` for contributed extension bundles. ## Related - Follow-up to #22138	2026-05-11 16:42:29 +02:00
jif-oai	436c0df658	extension: wire extension registries into sessions (#21737 ) ## Why [#21736](https://github.com/openai/codex/pull/21736) introduces the typed extension API, but the runtime does not yet carry a registry through thread/session startup or give contributors host-owned stores to read from. This PR wires that host-side path so later feature migrations can move product-specific behavior behind typed contributions without adding another bespoke seam directly to `codex-core`. ## What changed - Thread `ExtensionRegistry<Config>` through `ThreadManager`, `CodexSpawnArgs`, `Session`, and sub-agent spawn paths. - Wire `ThreadStartContributor` and `ContextContributor` - Expose the small supporting surface needed by non-core callers that construct threads directly, including `empty_extension_registry()` through `codex-core-api`. This PR lands the host plumbing only: the app-server registry is still empty, and concrete feature migrations are intended to follow separately.	2026-05-11 11:38:18 +02:00
Charlie Marsh	7c9731c9af	Enable `--deny-warnings` for `cargo shear` (#21616 ) ## Summary In https://github.com/openai/codex/pull/21584, we disabled doctests for crates that lack any doctests. We can enforce that property via `cargo shear --deny-warnings`: crates that lack doctests will be flagged if doctests are enabled, and crates with doctests will be flagged if doctests are disabled. A few additional notes: - By adding `--deny-warnings`, `cargo shear` also flagged a number of modules that were not reachable at all. Some of those have been removed. - This PR removes a usage of `windows_modules!` (since `cargo shear` and `rustfmt` couldn't see through it) in favor of simple `#[cfg(target_os = "windows")]` macros. As a consequence, many of these files exhibit churn in this PR, since they weren't being formatted by `rustfmt` at all on main. - Again, to make the code more analyzable, this PR also removes some usages of `#[path = "cwd_junction.rs"]` in favor of a more standard module structure. The bin sidecar structure is still retained, but, e.g., `windows-sandbox-rs/src/bin/command_runner.rs‎` was moved to `windows-sandbox-rs/src/bin/command_runner/main.rs`, and so on. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-08 20:29:00 +00:00
pakrym-oai	46e2250bcf	[codex] Remove legacy after tool use hooks (#21805 ) ## Why The legacy `AfterToolUse` hook path was still wired through core tool dispatch even though the hooks registry never populated any handlers for it. The supported hook surface is `PostToolUse`, so the old infrastructure was dead code on the hot path. ## What changed - Removed the legacy `AfterToolUse` dispatch from `codex-core` tool execution. - Removed the unused legacy hook payload types and exports from `codex-hooks`. - Simplified legacy notify handling now that `HookEvent` only carries `AfterAgent`. ## Validation - `cargo test -p codex-hooks` - `cargo test -p codex-core registry`	2026-05-08 13:20:05 -07:00
pakrym-oai	e783341b70	[codex] Delete function-style apply_patch (#21651 ) ## Why `apply_patch` is now a freeform/custom tool. Keeping the old JSON/function-style registration and parsing path left another way for models and tests to invoke `apply_patch`, which made the tool surface harder to reason about. ## What changed - Removed the `ApplyPatchToolType::Function` variant, JSON `apply_patch` spec, and handler support for function payloads. - Kept `apply_patch_tool_type = freeform` as the supported model metadata path, including Bedrock catalog metadata. - Migrated `apply_patch` tests and SSE fixtures to custom/freeform tool calls. ## Verification - `cargo test -p codex-tools -p codex-protocol -p codex-model-provider` - `cargo test -p codex-core tools::handlers::apply_patch --lib` - `cargo test -p codex-core --test all apply_patch_tool_executes_and_emits_patch_events` - `cargo test -p codex-core --test all apply_patch_reports_parse_diagnostics` - `cargo test -p codex-exec test_apply_patch_tool` - `just fix -p codex-core` - `just fix -p codex-tools -p codex-protocol -p codex-model-provider -p codex-exec`	2026-05-08 13:00:57 -07:00
Jiaming Zhang	5f4d0ec343	[codex] request desktop attestation from app (#20619 ) ## Summary TL;DR: teaches `codex-rs` / app-server to request a desktop-provided attestation token and attach it as `x-oai-attestation` on the scoped ChatGPT Codex request paths. ![DeviceCheck attestation interface](https://raw.githubusercontent.com/openai/codex/dev/jm/devicecheck-diagram-assets/pr-assets/devicecheck-attestation-interface.png) ## Details This PR teaches the Codex app-server runtime how to request and attach an attestation token. It does not generate DeviceCheck tokens directly; instead, it relies on the connected desktop app to advertise that it can generate attestation and then asks that app for a fresh header value when needed. The flow is: 1. The Codex desktop app connects to app-server. 2. During `initialize`, the app can advertise that it supports `requestAttestation`. 3. Before app-server calls selected ChatGPT Codex endpoints, it sends the internal server request `attestation/generate` to the app. 4. app-server receives a pre-encoded header value back. 5. app-server forwards that value as `x-oai-attestation` on the scoped outbound requests. The code in this repo is mostly protocol and runtime plumbing: it adds the app-server request/response shape, introduces an attestation provider in core, wires that provider into Responses / compaction / realtime setup paths, and covers the intended scoping with tests. The signed macOS DeviceCheck generation remains owned by the desktop app PR. ## Related PR - Codex desktop app implementation: https://github.com/openai/openai/pull/878649 ## Validation <details> <summary>Tests run</summary> ```sh cargo test -p codex-app-server-protocol cargo test -p codex-core attestation --lib cargo test -p codex-app-server --lib attestation ``` Also ran: ```sh just fix -p codex-core just fix -p codex-app-server just fix -p codex-app-server-protocol just fmt just write-app-server-schema ``` </details> <details> <summary>E2E DeviceCheck validation</summary> First validated the signed desktop app boundary directly: launched a packaged signed `Codex.app`, sent `attestation/generate`, decoded the returned `v1.` attestation header, and validated the extracted DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using bundle ID `com.openai.codex`. Apple returned `status_code: 200` and `is_ok: true`. Then ran the fuller app + app-server flow. The packaged `Codex.app` launched a current-branch app-server via `CODEX_CLI_PATH`, and a local MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server requested `attestation/generate` from the real Electron app process, and the intercepted `/backend-api/codex/responses` traffic included `x-oai-attestation` on both routes: ```text GET /backend-api/codex/responses Upgrade: websocket x-oai-attestation: present POST /backend-api/codex/responses Upgrade: none x-oai-attestation: present ``` The captured header decoded to a DeviceCheck token that also validated with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`, team `2DC432GLL2`). </details> --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-08 12:36:02 -07:00
pakrym-oai	61142b6169	Remove ToolName display helper (#21465 ) ## Why `ToolName::display()` made it too easy to flatten tool identity and accidentally compare rendered strings. Tool identity should stay structural until a legacy string boundary actually requires the flattened spelling. ## What - Removes `ToolName::display()` and relies on the existing `Display` impl for messages and errors. - Adds structural ordering for `ToolName` and uses it for sorting/deduping deferred tools. - Carries `ToolName` through tool/sandbox plumbing, flattening only at legacy boundaries such as hook payloads, telemetry tags, and Responses tool names. - Updates MCP normalization tests to assert `ToolName` structure instead of rendered strings. ## Testing - `cargo test -p codex-mcp test_normalize_tools` - `cargo test -p codex-core unavailable_tool` - `just fix -p codex-protocol` - `just fix -p codex-mcp` - `just fix -p codex-core`	2026-05-08 12:17:48 -07:00
starr-openai	1bfc3d9773	Route view_image through selected environments Route view_image through selected environments so image reads use the selected turn environment and cwd, with schema exposure limited to multi-environment toolsets.\n\nCo-authored-by: Codex <noreply@openai.com>	2026-05-08 01:29:03 +00:00
pakrym-oai	566f2cb612	[codex] Move tool specs onto handlers (#21461 ) ## Why This is the next stacked step after deleting the tool-handler kind indirection. Specs should come from the registered handlers themselves so registry construction has a single source of truth for handler behavior and exposed tool definitions. ## What changed - Added `ToolHandler::spec()` plus handler-provided parallel/code-mode metadata, and made `ToolRegistryBuilder::register_handler` automatically collect specs from registered handlers. - Moved builtin tool spec construction into the corresponding handlers and their adjacent `_spec` modules, including shell, unified exec, apply patch, view image, request plugin install, tool search, MCP resource, goals, planning, permissions, agent jobs, and multi-agent tools. - Reworked configurable handlers to receive their tool-building options through constructors, with non-optional handler options where the handler is always spec-backed. Shell fallback handlers keep an explicit no-spec mode because they are also registered as hidden dispatch aliases. - Kept `CodeModeExecuteHandler` on the explicit configured wrapper so the code-mode exec spec can still be built from the nested registry. ## Verification - `cargo check -p codex-core` - `cargo test -p codex-core tools::spec_plan::tests` - `cargo test -p codex-core tools::spec::tests` - `cargo test -p codex-core tools::handlers::multi_agents_spec::tests` - `RUST_MIN_STACK=16777216 cargo test -p codex-core tools::handlers::multi_agents::tests` - `cargo test -p codex-core tools::handlers::apply_patch::tests` - `cargo test -p codex-core tools::handlers::unified_exec::tests` - `just fix -p codex-core` - `git diff --check`	2026-05-07 10:48:36 -07:00
pakrym-oai	857e731478	[codex] Remove string-keyed MCP tool maps (#21454 ) ## Summary This PR removes the synthetic `HashMap<String, ToolInfo>` keys from MCP tool discovery. `McpConnectionManager::list_all_tools()` now returns normalized `Vec<ToolInfo>`, and downstream code derives identity from `ToolInfo::canonical_tool_name()`. The motivation is to keep model-visible tool identity on `ToolName`/`ToolInfo` instead of parallel string map keys, so future namespace changes do not have to preserve otherwise-unused lookup keys. ## Changes - Rename the MCP normalization path from `qualify_tools` to `normalize_tools_for_model` and return tool values directly. - Flow MCP tool lists through connectors, plugin injection, router/spec building, code mode, and tool search as vectors/slices. - Keep direct/deferred subtraction local to `mcp_tool_exposure`, using `ToolName` values. - Update tests to compare `ToolName` instances where MCP identity matters. ## Validation - `cargo test -p codex-mcp test_normalize_tools` - `cargo test -p codex-core mcp_tool_exposure` - `cargo test -p codex-core direct_mcp_tools_register_namespaced_handlers` - `cargo test -p codex-core search_tool_registers_namespaced_mcp_tool_aliases` - `just fix -p codex-mcp` - `just fix -p codex-core`	2026-05-07 10:16:10 -07:00
jif-oai	9b6c6f7a01	fix: preserve exact turn diffs after partial apply_patch failures (#21518 ) ## Why Follow-up to #21180: turn diffs are operation-backed now, but a failed `apply_patch` can still leave exact filesystem mutations behind. For example, a move can write the destination file before failing to remove the source. Treating the whole call as unknowable then drops a change that Codex actually knows happened, so the emitted turn diff can drift from the workspace. ## What changed - [`apply-patch`](`f55724e027/codex-rs/apply-patch/src/lib.rs (L248-L345)`) now returns `ApplyPatchFailure` with the exact committed prefix accumulated before an error. If a write failure may already have mutated the target, the delta is marked inexact instead of being reused blindly. - Move handling now records the destination write before attempting source removal, so a partially failed move can still report the destination file that definitely landed ([code](`f55724e027/codex-rs/apply-patch/src/lib.rs (L463-L521)`)). - [`ApplyPatchRuntime`](`f55724e027/codex-rs/core/src/tools/runtimes/apply_patch.rs (L49-L67)`) now accumulates committed deltas across attempts and forwards them even when the visible tool result is failed or sandbox-denied ([runtime path](`f55724e027/codex-rs/core/src/tools/runtimes/apply_patch.rs (L223-L250)`), [event path](`f55724e027/codex-rs/core/src/tools/events.rs (L215-L225)`)). - `TurnDiffTracker` now consumes committed exact deltas rather than only fully successful patches; exact-empty failures leave the aggregate unchanged, while inexact deltas still invalidate it. ## Verification - Added a regression test covering a failed move that still emits the committed destination diff: [`apply_patch_failed_move_preserves_committed_destination_diff`](`f55724e027/codex-rs/core/tests/suite/apply_patch_cli.rs (L1517-L1586)`). - Kept explicit coverage that an inexact delta clears the aggregate instead of publishing a guessed diff: [`apply_patch_clears_aggregated_diff_after_inexact_delta`](`f55724e027/codex-rs/core/tests/suite/apply_patch_cli.rs (L1589-L1655)`). --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-07 18:05:45 +02:00
jif-oai	f7e8ff8e50	Make turn diff tracking operation backed (#21180 ) ## Summary - replace filesystem-based turn diff tracking with an operation-backed accumulator - preserve enough verified apply_patch state to render move-overwrite cases correctly - keep the turn/diff/updated contract intact while removing remote-only turn-diff test skips This takes the assumption that no 3P services rely on the output format of `apply_patch` ## Why For the CCA file system isolation push --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-07 11:33:47 +02:00
jif-oai	b2268999fe	feat: make built-in MCPs first-class runtime servers (#21356 ) ## DISCLAIMER This is experimental and no production service must rely on this ## Why Built-in MCPs are product-owned runtime capabilities, but they were previously flattened into the same config-backed stdio path as user-configured servers. That made them depend on a hidden `codex builtin-mcp` re-exec path, exposed them through config-oriented CLI flows, and erased distinctions the runtime needs to preserve—most notably whether an MCP call should count as external context for memory-mode pollution. ## What changed - Model product-owned built-ins separately from config-backed MCP servers via `BuiltinMcpServer` and `EffectiveMcpServer`. - Launch built-ins in process through a reusable async transport instead of the hidden `builtin-mcp` stdio subcommand. - Keep config-oriented CLI operations such as `codex mcp list/get/login/logout` scoped to configured servers, while merging built-ins only into the effective runtime server set. - Retain server metadata after launch so parallel-tool support and context classification come from the live server set; built-in `memories` is now classified as local Codex state rather than external context. ## Test plan - `cargo test -p codex-mcp` - `cargo test -p codex-core --test suite builtin_memories_mcp_call_does_not_mark_thread_memory_mode_polluted_when_configured` --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-07 10:36:32 +02:00
pakrym-oai	a8488fec5e	Revert state DB injection and agent graph store (#21481 ) ## Why Reverts #20689 to restore the previous optional state DB plumbing. The conflict resolution keeps the newer installation ID and session/thread identity changes that landed after #20689, while removing the mandatory state DB and agent graph store dependency from ThreadManager construction. ## What changed - Restored `Option<StateDbHandle>` through app-server, MCP server, prompt debug, and test entry points. - Removed the `codex-core` dependency on `codex-agent-graph-store` and reverted descendant lookup back to the existing state DB path when available. - Kept newer `installation_id` forwarding by passing it beside the optional DB handle. - Kept local thread-name updates working when the optional state DB handle is absent. ## Validation - `git diff --check` - `cargo test -p codex-thread-store` - `cargo test -p codex-state -p codex-rollout -p codex-app-server-protocol` - Attempted `env CARGO_INCREMENTAL=0 cargo test -p codex-core -p codex-app-server -p codex-app-server-client -p codex-mcp-server -p codex-thread-manager-sample -p codex-tui`; blocked locally by a rustc ICE while compiling `v8 v146.4.0` with `rustc 1.93.0 (254b59607 2026-01-19)` on `aarch64-apple-darwin`.	2026-05-06 22:48:29 -07:00
pakrym-oai	e394625ea2	[codex] Delete tool handler plan indirection (#21427 ) ## Why The spec split in the parent PR still left an intermediate registry plan that recorded `ToolHandlerKind` values and translated them into concrete handlers later. That kept tool registration dependent on static enum bookkeeping instead of registering handlers from the same code that assembles their specs. ## What Changed - Make `build_tool_registry_builder` register concrete handlers directly while adding specs. - Add small `ToolRegistryBuilder` helpers for spec augmentation and nested code-mode inspection. - Remove `ToolHandlerKind`, `ToolHandlerSpec`, and `ToolRegistryPlan`. - Update spec-plan tests to assert against the built `ToolRegistry` instead of static handler descriptors. ## Validation - `cargo check -p codex-core` - `cargo test -p codex-core tools::spec_plan::tests` - `cargo test -p codex-core tools::spec::tests` - `just fix -p codex-core`	2026-05-06 20:36:24 -07:00
pakrym-oai	9417cf9696	[codex] Move tool specs into core handlers (#21416 ) ## Why This is the first mechanical slice of moving tool spec ownership toward the handlers. `codex-tools` should keep shared primitives and conversion helpers, while builtin tool specs and registration planning live in `codex-core` with the handlers that own those tools. Keeping this PR to relocation and import updates isolates the copy/move review from the later logic change that wires specs through registered handlers. ## What changed - Moved builtin tool spec constructors from `codex-rs/tools/src` into `codex-rs/core/src/tools/handlers/*_spec.rs` or nearby core tool modules. - Moved the registry planning code into `codex-rs/core/src/tools/spec_plan.rs` and its associated types/tests into core. - Kept shared primitives in `codex-tools`, including `ToolSpec`, schema/types, discovery/config primitives, dynamic/MCP conversion helpers, and code-mode collection helpers. - Updated handlers that referenced moved argument types or tool-name constants to use the core spec modules. - Moved spec tests next to the moved spec modules. ## Verification - `cargo check -p codex-tools` - `cargo check -p codex-core` - `cargo test -p codex-tools` - `cargo test -p codex-core _spec::tests` - `cargo test -p codex-core tools::spec_plan::tests` - `just fix -p codex-tools` - `just fix -p codex-core` Note: I also tried the broader `cargo test -p codex-core tools::`; it reached the moved spec-plan/spec tests successfully, then aborted with a stack overflow in `tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`, which is outside this spec relocation.	2026-05-06 15:40:50 -07:00
pakrym-oai	b9c50a53d7	[codex] Split tool handlers into separate files (#21395 ) ## Why Several tool handler modules still bundled multiple `ToolHandler` implementations in one file. That made the handler directory harder to navigate and made otherwise local handler edits land in large shared modules. ## What - Split grouped tool handlers into one handler file each for agent jobs, goals, MCP resources, shell tools, and unified exec. - Kept shared parsing, payload, and runtime helpers in the existing parent modules, with re-exports preserving the existing handler import paths. - Updated the shell handler tests to construct `ShellCommandHandler` through the existing `ShellCommandBackendConfig` conversion now that the backend detail lives with the shell-command handler. ## Validation - `cargo check -p codex-core` - `cargo clippy -p codex-core --lib -- -D warnings` - `git diff --check -- codex-rs/core/src/tools/handlers` Targeted `codex-core` handler tests did not run locally because `core_test_support` currently fails to compile before reaching these tests due to an unresolved `similar` import.	2026-05-06 13:12:24 -07:00
jif-oai	8f3bb355f4	Move installation ID resolution out of core startup (#21182 ) ## Summary - resolve or inject the installation ID before core startup and pass it through `ThreadManager`, `CodexSpawnArgs`, and `Session` as a plain `String` - keep child sessions on the parent installation ID instead of rediscovering it inside core - propagate installation ID startup failures in `mcp-server` instead of panicking ## Why Core was still touching the filesystem on the session startup path to discover `installation_id`. This moves that work to the outer host boundary so core no longer depends on `codex_home` reads during session construction. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-06 10:48:54 +00:00
Rasmus Rygaard	7e310bc7f3	Inject state DB, agent graph store (#20689 ) ## Why We want the agent graph store to be passed down the stack as a real dependency, the same way we already treat the thread store. This will let us inject the agent graph store as a real dependency and support implementations other than the local SQLite-backed one. Right now most code instantiates a state DB and an agent graph store just-in-time. Ideally, we would not depend on the state DB directly but only read through the higher-level interfaces. This change makes the dependency boundaries explicit and moves state DB initialization to process bootstrap instead of hiding it inside local store implementations. ## What changed - `ThreadManager` now requires a `StateDbHandle` and an `AgentGraphStore` at construction time instead of treating them as optional internals. - The local store constructors no longer lazily initialize SQLite. Callers now initialize the state DB once per process and use that shared handle to build: - `LocalThreadStore` - `LocalAgentGraphStore` - App bootstraps (`app-server`, `mcp-server`, `prompt_debug`, and the thread-manager sample) now initialize the state DB up front and inject the resulting handle down the stack. - `app-server` now consistently uses its process-scoped state DB handle instead of reopening SQLite or trying to recover it from loaded threads. - Device-key storage now reuses the shared state DB handle instead of maintaining its own lazy opener. - The thread archive / descendant traversal paths now use the injected `AgentGraphStore` instead of reaching through local thread-store-specific state. ## Verification - `cargo check -p codex-core -p codex-thread-store -p codex-app-server -p codex-mcp-server -p codex-thread-manager-sample --tests` - `cargo test -p codex-thread-store` - `cargo test -p codex-core thread_manager_accepts_separate_agent_graph_store_and_thread_store -- --nocapture` - `cargo test -p codex-app-server thread_archive_archives_spawned_descendants -- --nocapture`	2026-05-05 21:45:29 +00:00
pakrym-oai	f593323ef1	[codex] Split tool handlers by tool name (#20687 ) ## Why Tool registration used to bind a tool name to a handler externally, which left ownership split between the registry plan and the handler implementation. Some built-in handlers also multiplexed multiple in-core tools by switching on the invoked tool name internally. This moves the registry identity onto the handler itself and makes built-in multi-tool areas use separate concrete handlers, so each registered handler instance owns exactly one tool name and one dispatch path. ## What Changed - Added `ToolHandler::tool_name()` and changed `ToolRegistryBuilder::register_handler` to derive the registry key from the handler. - Split built-in multiplexed handlers into concrete per-tool handlers for unified exec, shell/local shell/container exec, MCP resources, goal tools, and agent job tools. - Kept name-carrying handler instances only where the runtime target is inherently external or dynamic, such as MCP tools, dynamic tools, and unavailable placeholders. - Updated `ToolHandlerKind` and registry-plan construction so plan entries map directly to concrete handler registrations. ## Verification - `cargo test -p codex-tools tool_registry_plan` - `cargo test -p codex-core --lib tools::registry_tests` - `just fix -p codex-tools` - `just fix -p codex-core`	2026-05-05 13:46:45 -07:00
Abhinav	0452dca986	hook trust metadata and enforcement (#20321 ) # Why We want shared hook trust that both the app and the TUI can build on, but the metadata is only useful if runtime behavior agrees with it. This PR adds a single backend trust model for hooks so unmanaged hooks cannot run until the current definition has been reviewed, while managed hooks remain runnable and non-configurable. # What - persist `trusted_hash` alongside hook state in `config.toml` - expose `currentHash` and derived `trustStatus` through `hooks/list` - derive trust from normalized hook definitions so equivalent hooks from `config.toml` and `hooks.json` share the same trust identity - gate unmanaged hooks on trust before they enter the runnable handler set # Reviewer Notes - key file to review is `codex-rs/hooks/src/engine/discovery.rs` - the only core change is schema related	2026-05-05 19:13:55 +00:00
starr-openai	78421face0	Route process tools to selected environments (#20647 ) ## Why When a turn exposes multiple selected environments, shell-style tools need a model-facing way to identify the intended target environment and handlers need to resolve that target before parsing cwd-relative permission fields or launching processes. This PR scopes that rollout to process tools. Filesystem-oriented tools such as `apply_patch`, `view_image`, and `list_dir` are intentionally left for follow-up slices. ## What Changed - Adds an `include_environment_id` option to shell-style tool schema builders. - Exposes optional `environment_id` on `shell`, `shell_command`, and `exec_command` only when `ToolEnvironmentMode::Multiple` is active. - Adds a shared handler helper that parses `environment_id` and `workdir` from JSON function-call arguments and returns the selected `Environment` plus effective absolute cwd. - Uses that helper in `shell`, `shell_command`, and `exec_command` handling so process execution uses the selected environment filesystem and cwd. - Changes `ExecCommandRequest` to carry a required resolved `cwd`, removing the process-manager fallback to the primary turn cwd for new exec commands. - Leaves `write_stdin` unchanged because it targets an existing process id, not a new environment. ## Testing - Added unit coverage for process-tool schema exposure, selected environment resolution, primary fallback, no-environment handling, unknown environment ids, and resolving cwd-relative permission paths against the selected environment cwd. - Added a remote-suite e2e coverage case for `exec_command` routing across explicit zero environments, one local environment, and local+remote environments. - Ran `just fmt` and `git diff --check`. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-05 12:12:03 -07:00
Abhinav	af86be529c	Support PreToolUse additionalContext (#20692 ) # Why `PreToolUse` already exposes `hookSpecificOutput.additionalContext` in the generated hook schema, but the runtime still rejected it as unsupported. That leaves `PreToolUse` out of step with the other context-injecting hooks and prevents hook authors from attaching model-visible guidance to a pending tool call before it runs. # What - Parse `PreToolUse.additionalContext` and carry it through the hook event pipeline. - Record `PreToolUse` context at the hook boundary so successful context is preserved for both allowed and blocked calls without widening the tool registry surface. - Preserve existing deny behavior when context is combined with either `permissionDecision: "deny"` or the legacy `decision: "block"` shape.	2026-05-05 10:29:30 -07:00
jif-oai	70807730f5	tools: remove unused experimental `list_dir` tool (#21170 ) ## Why `list_dir` still carries a full spec/handler/test path, but nothing in the current model catalog advertises it via `experimental_supported_tools`. That leaves us maintaining an environment-backed tool surface that is effectively unused. ## What changed - delete the `list_dir` handler and its tests from `codex-core` - remove the `list_dir` spec builder, handler kind, and registry wiring from `codex-tools` - clean up the remaining internal README and registry tests so they no longer mention the removed tool	2026-05-05 13:11:07 +02:00
rhan-oai	aee1fe2659	[codex-analytics] add item lifecycle timing (#20514 ) ## Why Tool families already disagree on what their existing `duration` fields mean, so lifecycle latency should live on the shared item envelope instead of being inferred from per-tool execution fields. Carrying that envelope through app-server notifications gives downstream consumers one reusable timing signal without pretending every tool has the same execution semantics. ## What changed - Adds `started_at_ms` to core `ItemStartedEvent` values and `completed_at_ms` to core `ItemCompletedEvent` values. - Populates those timestamps in the shared session lifecycle emitters, so protocol-native items get timing without each producer tracking its own clock state. - Exposes `startedAtMs` on app-server `item/started` notifications and `completedAtMs` on `item/completed` notifications. - Maps the lifecycle timestamps through the app-server boundary while leaving legacy-converted notifications nullable when no lifecycle timestamp exists. - Regenerates the app-server JSON schema and TypeScript fixtures for the notification-envelope change and updates downstream fixtures that construct those notifications directly. - Extends the existing web-search and image-generation integration flows to assert the new lifecycle timestamps on the native item events. ## Verification - `cargo check -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-tui -p codex-exec -p codex-app-server-client` - `cargo test -p codex-core --test all web_search_item_is_emitted` - `cargo test -p codex-core --test all image_generation_call_event_is_emitted` - `cargo test -p codex-app-server-protocol` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20514). * #18748 * #18747 * #17090 * #17089 * __->__ #20514	2026-05-04 22:33:20 +00:00
sayan-oai	b9e8df47da	Use MCP server instructions in deferred namespace descriptions (#21053 ) ## Why MCP servers can provide `instructions` that explain what their tools are for. Directly exposed MCP namespaces already use those instructions when a connector description is not available, but deferred `tool_search` results did not preserve that fallback. The direct path falls back from connector metadata to server instructions, while the deferred path only carried `connector_description` and otherwise fell back to generic namespace text. That meant a plain MCP server could provide useful model-facing guidance and still appear as `Tools in the X namespace.` whenever it was discovered lazily through `tool_search`. ## What changed - Store one model-facing `namespace_description` on `ToolInfo`, using connector descriptions for connector-backed tools and server instructions for plain MCP servers. - Thread that namespace description through the `tool_search` source list, search indexing, and returned namespace metadata. - Add an end-to-end regression test for deferred non-app MCP search results exposing server instructions as the namespace description. ## Verification - `cargo test -p codex-tools search_tool_description_lists_each_mcp_source_once --lib` - `cargo test -p codex-core --test all tool_search_uses_non_app_mcp_server_instructions_as_namespace_description`	2026-05-04 19:36:07 +00:00
Ruslan Nigmatullin	4d201e340e	state: pass state db handles through consumers (#20561 ) ## Why SQLite state was still being opened from consumer paths, including lazy `OnceCell`-backed thread-store call sites. That let one process construct multiple state DB connections for the same Codex home, which makes SQLite lock contention and `database is locked` failures much easier to hit. State DB lifetime should be chosen by main-like entrypoints and tests, then passed through explicitly. Consumers should use the supplied `Option<StateDbHandle>` or `StateDbHandle` and keep their existing filesystem fallback or error behavior when no handle is available. The startup path also needs to keep the rollout crate in charge of SQLite state initialization. Opening `codex_state::StateRuntime` directly bypasses rollout metadata backfill, so entrypoints should initialize through `codex_rollout::state_db` and receive a handle only after required rollout backfills have completed. ## What Changed - Initialize the state DB in main-like entrypoints for CLI, TUI, app-server, exec, MCP server, and the thread-manager sample. - Pass `Option<StateDbHandle>` through `ThreadManager`, `LocalThreadStore`, app-server processors, TUI app wiring, rollout listing/recording, personality migration, shell snapshot cleanup, session-name lookup, and memory/device-key consumers. - Remove the lazy local state DB wrapper from the thread store so non-test consumers use only the supplied handle or their existing fallback path. - Make `codex_rollout::state_db::init` the local state startup path: it opens/migrates SQLite, runs rollout metadata backfill when needed, waits for concurrent backfill workers up to a bounded timeout, verifies completion, and then returns the initialized handle. - Keep optional/non-owning SQLite helpers, such as remote TUI local reads, as open-only paths that do not run startup backfill. - Switch app-server startup from direct `codex_state::StateRuntime::init` to the rollout state initializer so app-server cannot skip rollout backfill. - Collapse split rollout lookup/list APIs so callers use the normal methods with an optional state handle instead of `_with_state_db` variants. - Restore `getConversationSummary(ThreadId)` to delegate through `ThreadStore::read_thread` instead of a LocalThreadStore-specific rollout path special case. - Keep DB-backed rollout path lookup keyed on the DB row and file existence, without imposing the filesystem filename convention on existing DB rows. - Verify readable DB-backed rollout paths against `session_meta.id` before returning them, so a stale SQLite row that points at another thread's JSONL falls back to filesystem search and read-repairs the DB row. - Keep `debug prompt-input` filesystem-only so a one-off debug command does not initialize or backfill SQLite state just to print prompt input. - Keep goal-session test Codex homes alive only in the goal-specific helper, rather than leaking tempdirs from the shared session test helper. - Update tests and call sites to pass explicit state handles where DB behavior is expected and explicit `None` where filesystem-only behavior is intended. ## Validation - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout -p codex-thread-store -p codex-app-server -p codex-core -p codex-tui -p codex-exec -p codex-cli --tests` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout state_db_` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout find_thread_path` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout find_thread_path -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout try_init_ -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo clippy -p codex-rollout --lib -- -D warnings` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-thread-store read_thread_falls_back_when_sqlite_path_points_to_another_thread -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-thread-store` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core shell_snapshot` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all personality_migration` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find` - `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find::find_prefers_sqlite_path_by_id -- --nocapture` - `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core interrupt_accounts_active_goal_before_pausing` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-app-server get_auth_status -- --test-threads=1` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-app-server --lib` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout -p codex-app-server --tests` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p codex-thread-store -p codex-core -p codex-app-server -p codex-tui -p codex-exec -p codex-cli` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p codex-app-server` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-core` - `just argument-comment-lint -p codex-core` - `just argument-comment-lint -p codex-rollout` Focused coverage added in `codex-rollout`: - `recorder::tests::state_db_init_backfills_before_returning` verifies the rollout metadata row exists before startup init returns. - `state_db::tests::try_init_waits_for_concurrent_startup_backfill` verifies startup waits for another worker to finish backfill instead of disabling the handle for the process. - `state_db::tests::try_init_times_out_waiting_for_stuck_startup_backfill` verifies startup does not hang indefinitely on a stuck backfill lease. - `tests::find_thread_path_accepts_existing_state_db_path_without_canonical_filename` verifies DB-backed lookup accepts valid existing rollout paths even when the filename does not include the thread UUID. - `tests::find_thread_path_falls_back_when_db_path_points_to_another_thread` verifies DB-backed lookup ignores a stale row whose existing path belongs to another thread and read-repairs the row after filesystem fallback. Focused coverage updated in `codex-core`: - `rollout_list_find::find_prefers_sqlite_path_by_id` now uses a DB-preferred rollout file with matching `session_meta.id`, so it still verifies that valid SQLite paths win without depending on stale/empty rollout contents. `cargo test -p codex-app-server thread_list_respects_search_term_filter -- --test-threads=1 --nocapture` was attempted locally but timed out waiting for the app-server test harness `initialize` response before reaching the changed thread-list code path. `bazel test //codex-rs/thread-store:thread-store-unit-tests --test_output=errors` was attempted locally after the thread-store fix, but this container failed before target analysis while fetching `v8+` through BuildBuddy/direct GitHub. The equivalent local crate coverage, including `cargo test -p codex-thread-store`, passes. A plain local `cargo check -p codex-rollout -p codex-app-server --tests` also requires system `libcap.pc` for `codex-linux-sandbox`; the follow-up app-server check above used `CODEX_SKIP_VENDORED_BWRAP=1` in this container.	2026-05-04 11:46:03 -07:00
starr-openai	905987c08f	Prepare selected environment plumbing (#20669 ) ## Why This is a prep PR in the multi-environment process-tool stack. It separates ownership/config cleanup from the behavior change that teaches process tools to route by selected environment, so the follow-up PR can focus on model-facing `environment_id` behavior. ## Stack 1. https://github.com/openai/codex/pull/20646 - `EnvironmentContext` rendering for selected environments 2. https://github.com/openai/codex/pull/20669 - selected-environment ownership and tool config prep (this PR) 3. https://github.com/openai/codex/pull/20647 - process-tool `environment_id` routing ## What Changed - keep the resolved turn environment list wrapped in `ResolvedTurnEnvironments` through `TurnContext` instead of unwrapping it back to a raw `Vec` - add `TurnContext::resolve_path_against` so cwd-relative path resolution has one shared helper - replace the old tool config boolean with `ToolEnvironmentMode::{None, Single, Multiple}` ## Testing - Tests not run locally; this prep refactor is covered by GitHub CI for the stack. Co-authored-by: Codex <noreply@openai.com>	2026-05-04 17:55:49 +00:00
pakrym-oai	c8c30d9d75	[codex] Emit MCP tool calls as turn items (#20677 ) ## Why `McpToolCall` was still an app-server item synthesized from deprecated legacy begin/end events. Recent item migrations moved this ownership into core `TurnItem`s, so MCP tool calls now follow the same canonical lifecycle and leave legacy events as compatibility fanout. Keeping the core item close to the v2 `ThreadItem::McpToolCall` shape also avoids spreading MCP result semantics across app-server conversion code. Core now owns whether a completed call is `completed` or `failed`, and whether the payload is a tool result or an error. ## What changed - Added core `TurnItem::McpToolCall` with flattened `server`, `tool`, `arguments`, `status`, `result`, and `error` fields. - Updated MCP tool call emitters, including MCP resource tools, to emit `ItemStarted`/`ItemCompleted` around directly constructed core MCP items. - Updated app-server v2 conversion to project the core MCP item into `ThreadItem::McpToolCall` without deriving status or splitting `Result` locally. - Ignored live deprecated MCP legacy fanout in app-server v2 to avoid duplicate item notifications, while keeping thread history replay on the legacy event path. ## Verification - `cargo test -p codex-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core --lib mcp_tool_call` - `cargo check -p codex-app-server` - `cargo test -p codex-app-server mcp_tool_call_completion_notification_contains_truncated_large_result`	2026-05-03 22:50:13 -07:00
Matthew Zeng	f88701f5c8	[tool_suggest] More prompt polishes. (#20566 ) Tool suggest still misfires when model needs tool_search, updating the prompts to further disambiguate it: - [x] rename it from `tool_suggest` to `request_plugin_install` - [x] rephrase "suggestion" to "install" in the tool descriptions. - [x] disambiguate "the tool" vs "the plugin/connector". Tested with the Codex App and verified it still works.	2026-05-02 04:22:12 +00:00
pakrym-oai	aed74e5ee4	[codex] Emit image view as core item (#20512 ) ## Why Image-view results should be represented as a core-produced turn item instead of being reconstructed by app-server. At the same time, existing rollout/history paths still understand the legacy `ViewImageToolCall` event, so this keeps that event as compatibility output generated from the new item lifecycle. ## What changed - Added `TurnItem::ImageView` to `codex-protocol`. - Emitted image-view item start/completion directly from the core `view_image` handler. - Kept `ViewImageToolCall` as a legacy event and generate it from completed `TurnItem::ImageView` items. - Kept `thread_history.rs` on the legacy `ViewImageToolCall` replay path, with `ImageView` item lifecycle events ignored there. - Updated app-server protocol conversion, rollout persistence, and affected exhaustive event matches for the new item plus legacy fan-out shape. ## Verification - `cargo test -p codex-protocol -p codex-app-server-protocol -p codex-rollout -p codex-rollout-trace -p codex-mcp-server -p codex-app-server --lib` - `cargo test -p codex-core --test all view_image_tool_attaches_local_image` - `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-rollout -p codex-rollout-trace -p codex-mcp-server` - `git diff --check`	2026-05-01 11:28:30 -07:00
starr-openai	be71b6fcd1	Use selected turn environments for runtime context (#20281 ) ## Summary - make selected turn environments the source of truth for session runtime cwd and MCP runtime environment selection - keep local/no-selection fallback behavior intact - add coverage for duplicate selected environments, cwd resolution, and MCP runtime environment selection ## Validation - git diff --check - rustfmt was run on touched Rust files during the implementation workflow CI should provide the full Bazel/test signal. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-01 11:00:14 -07:00
pakrym-oai	f476338f93	Move apply-patch file changes into turn items (#20540 ) ## Why Apply-patch file changes are now part of the core turn item stream, so v2 clients can consume the same first-class item lifecycle path used by other turn items instead of relying on app-server-specific remapping from legacy patch events. ## What changed - Added a core `TurnItem::FileChange` carrying apply-patch changes and completion metadata. - Updated the apply-patch tool emitter to send `ItemStarted` / `ItemCompleted` with the new `FileChange` item while preserving legacy `PatchApplyBegin` / `PatchApplyEnd` fan-out. - Updated app-server v2 conversion to render the new core item directly and stopped `event_mapping` from remapping old patch begin/end events into item notifications. - Kept thread history reconstruction based on the existing old apply-patch events for rollout compatibility. ## Verification - `cargo test -p codex-protocol -p codex-app-server-protocol` - `cargo test -p codex-core --test all apply_patch_tool_executes_and_emits_patch_events` - `cargo test -p codex-app-server bespoke_event_handling`	2026-05-01 08:47:18 -07:00
Tom	fe05acad23	Make thread store process-scoped (#19474 ) - Build one app-server process ThreadStore from startup config and share it with ThreadManager and CodexMessageProcessor. - Remove per-thread/fork store reconstruction so effective thread config cannot switch the persistence backend. - Add params to ThreadStore create/resume for specifying thread metadata, since otherwise the metadata from store creation would be used (incorrectly).	2026-04-30 21:24:59 -07:00
pakrym-oai	f50c02d7bc	[codex] Remove unused event messages (#20511 ) ## Why Several legacy `EventMsg` variants were still emitted or mapped even though clients either ignored them or had moved to item/lifecycle events. `Op::Undo` had also degraded to an unavailable shim, so this removes that dead task path instead of preserving a command that cannot do useful work. `McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are intentionally kept because useful consumers still depend on them: MCP startup completion drives readiness behavior, and the begin events let app-server/core consumers surface in-progress web-search and image-generation items before the final payload arrives. ## What Changed - Removed weak legacy event variants and payloads from `codex-protocol`, including legacy agent deltas, background events, and undo lifecycle events. - Kept/restored `EventMsg::McpStartupComplete`, `EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with serializer and emission coverage. - Updated core, rollout, MCP server, app-server thread history, review/delegate filtering, and tests to rely on the useful replacement events that remain. - Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI slash-command comments. - Stopped agent job/background progress and compaction retry notices from emitting `BackgroundEvent` payloads. ## Verification - `cargo check -p codex-protocol -p codex-app-server-protocol -p codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server` - `cargo test -p codex-protocol -p codex-app-server-protocol -p codex-rollout -p codex-rollout-trace -p codex-mcp-server` - `cargo test -p codex-core --test all suite::items` - `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server` - Earlier coverage on this PR also included `codex-mcp`, `codex-tui`, core library tests, MCP/plugin/delegate/review/agent job tests, and MCP startup TUI tests.	2026-04-30 20:03:26 -07:00
iceweasel-oai	4f96001fa7	execpolicy: unwrap PowerShell -Command wrappers on Windows (#20336 ) ## Why On Windows, Codex runs shell commands through a top-level `powershell.exe -NoProfile -Command ...` wrapper. `execpolicy` was matching that wrapper instead of the inner command, so prefix rules like `["git", "push"]` did not fire for PowerShell-wrapped commands even though the same normalization already happens for `bash -lc` on Unix. This change makes the Windows shell wrapper transparent to rule matching while preserving the existing Windows unmatched-command safelist and dangerous-command heuristics. ## What changed - add `parse_powershell_command_plain_commands()` in `shell-command/src/powershell.rs` to unwrap the top-level PowerShell `-Command` body with `extract_powershell_command()` and parse it with the existing PowerShell AST parser - update `core/src/exec_policy.rs` so `commands_for_exec_policy()` treats top-level PowerShell wrappers like `bash -lc` and evaluates rules against the parsed inner commands - carry a small `ExecPolicyCommandOrigin` through unmatched-command evaluation and expose `is_safe_powershell_words()` / `is_dangerous_powershell_words()` so Windows safelist and dangerous-command checks still work after unwrap - add Windows-focused tests for wrapped PowerShell prompt/allow matches, wrapper parsing, and unmatched safe/dangerous inner commands, and re-enable the end-to-end `execpolicy_blocks_shell_invocation` test on Windows ## Testing - `cargo test -p codex-shell-command`	2026-05-01 00:56:20 +00:00
Akshay Nathan	8426edf71e	Stateful streaming apply_patch parser	2026-04-30 21:41:15 +00:00
xl-openai	7b3de63041	Move plugin out of core. (#20348 )	2026-04-30 14:26:14 -07:00
pakrym-oai	b52083146c	Stop emitting item/fileChange/outputDelta output delta notifications (#20471 ) ## Why `item/fileChange/outputDelta` text output was only the tool's summary or error text and not used by client surfaces. We keep `item/fileChange/outputDelta` in the app-server protocol as a deprecated compatibility entry, but the server no longer emits it. ## What changed - stop the `apply_patch` runtime from emitting `ExecCommandOutputDelta` events - simplify `item_event_to_server_notification` so command output deltas always map to `item/commandExecution/outputDelta` - remove the app-server bookkeeping that tried to detect whether an output delta belonged to a file change - mark `item/fileChange/outputDelta` as a deprecated legacy protocol entry in the v2 types, schema, and README - simplify the file-change approval tests so they only wait for completion instead of expecting output-delta notifications ## Testing - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-thread-manager-sample` - `cargo test -p codex-app-server-protocol protocol::event_mapping::tests::exec_command_output_delta_maps_to_command_execution_output_delta -- --exact` - `cargo test -p codex-app-server turn_start_file_change_approval_accept_for_session_persists_v2 -- --exact` (failed before the test assertions because the wiremock `/responses` mock received 0 requests in setup)	2026-04-30 11:42:07 -07:00
Owen Lin	3516cb9751	fix(core): truncate large mcp tool outputs in rollouts (#20260 ) ## Why Large MCP tool call outputs can make rollout JSONL files enormous. In the session that motivated this change, the biggest JSONL records were: - `event_msg/mcp_tool_call_end` - `response_item/function_call_output` both containing the same unbounded MCP payloads - just 3 MCP tool calls that each were multi-hundred MBs 😱 This PR truncates both of those JSONL records. ## How #### For `response_item/function_call_output` Unified exec already bounds tool output before it is injected into model-facing history, which also keeps the corresponding rollout `response_item/function_call_output` records small. MCP should follow the same pattern: truncate the model-facing tool output at the tool-output boundary, while leaving code-mode/raw hook consumers alone. #### For `event_msg/mcp_tool_call_end` `McpToolCallEnd` also needs its own bounded event copy because it is the app-server/replay/UI event shape that backs `ThreadItem::McpToolCall`. Unfortunately this is _not_ downstream of the `ToolOutput` trait. ## Model behavior Model behavior is actually unchanged as a result of this PR. Before this PR, MCP output was: 1. Converted to `FunctionCallOutput`. 2. Recorded into in-memory history. 3. Truncated by `ContextManager::record_items()` before later model turns saw it. After this branch, MCP output is truncated earlier, in `McpToolOutput::response_payload()`, using the same helper. Then `ContextManager::record_items()` sees an already-truncated output and effectively has little/no additional work to do. So the model should still see the same kind of truncated function-call output. The practical difference is where truncation happens: earlier, before rollout persistence/app-server emission can see the giant payload. ## Verification - `cargo test -p codex-core mcp_tool_output` - `cargo test -p codex-core mcp_tool_call::tests::truncate_mcp_tool_result_for_event` - `cargo test -p codex-core mcp_post_tool_use_payload_uses_model_tool_name_args_and_result` - `just fmt` - `just fix -p codex-core` - `git diff --check`	2026-04-30 16:30:43 +00:00

1 2 3 4 5 ...

718 Commits