codex

mirror of https://github.com/openai/codex.git synced 2026-05-23 20:44:50 +00:00

Author	SHA1	Message	Date
anp-oai	c83ba22359	Allow parallel MCP tool calls when annotated readOnly (#23750 ) ## Summary - Treat MCP tools with `readOnlyHint: true` as parallel-safe even when `supports_parallel_tool_calls` is unset or `false`. - Keep server-level `supports_parallel_tool_calls` as an additive override for non-read-only tools. - Add focused unit coverage for the MCP handler eligibility decision. - Update RMCP integration coverage to keep the serial baseline on a mutable tool, verify read-only concurrency without server opt-in, and preserve the server opt-in concurrency path separately. ## Testing - `just fmt` - `cargo test -p codex-core --lib tools::handlers::mcp::tests::` - `cargo test -p codex-core --test all stdio_mcp_read_only_tool_calls_run_concurrently_without_server_opt_in` - `cargo test -p codex-core --test all stdio_mcp_parallel_tool_calls_opt_in_runs_concurrently` - `cargo test -p codex-rmcp-client`	2026-05-21 20:40:34 -07:00
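The eligibility decision described in c83ba22359 can be sketched as a small predicate. This is a minimal illustration, not the actual handler code: the `McpToolInfo` struct and `is_parallel_safe` function are hypothetical names, and only the `readOnlyHint` and `supports_parallel_tool_calls` inputs come from the commit message.

```rust
/// Hypothetical, simplified view of the inputs to the eligibility decision.
pub struct McpToolInfo {
    /// Value of the tool's `readOnlyHint` annotation, if the tool declares one.
    pub read_only_hint: Option<bool>,
    /// Server-level `supports_parallel_tool_calls`, if the server sets it.
    pub server_supports_parallel: Option<bool>,
}

/// A tool is treated as parallel-safe when it is annotated read-only,
/// or when its server opted in to parallel calls (the additive override).
pub fn is_parallel_safe(tool: &McpToolInfo) -> bool {
    let read_only = tool.read_only_hint.unwrap_or(false);
    let server_opt_in = tool.server_supports_parallel.unwrap_or(false);
    read_only || server_opt_in
}
```

Under this sketch a read-only tool stays parallel-safe even when the server flag is unset or `false`, while a mutable tool still needs the server opt-in.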
sayan-oai	7e802b22f1	Expose conversation history to extension tools (#23963 ) ## Why Extension tools that need conversation context should be able to read it from the live tool invocation instead of reaching into thread persistence themselves. ## What changed - Add a `ConversationHistory` snapshot to extension `ToolCall`s and populate it from the current raw in-memory response history. - Expose all history items at this boundary so each extension can filter and bound the subset it needs before consuming or forwarding it. - Cover the adapter and registry dispatch paths and update existing extension tests that construct `ToolCall` literals. ## Test plan - `cargo test -p codex-tools` - `cargo test -p codex-extension-api` - `cargo test -p codex-goal-extension` - `cargo test -p codex-memories-extension` - `cargo test -p codex-core passes_turn_fields_to_extension_call` - `cargo test -p codex-core extension_tool_executors_are_model_visible_and_dispatchable`	2026-05-22 01:11:47 +00:00
Abhinav	16d85e2708	Add subagent identity to hook inputs (#22882 ) # What When a normal hook fires inside a thread-spawned subagent, Codex now includes these optional top-level fields in the hook input: - `agent_id`: the child thread id - `agent_type`: the subagent role Root-agent hook inputs omit these fields. `SubagentStart` and `SubagentStop` keep their existing required `agent_id` and `agent_type` fields because those events are inherently subagent-scoped. This does not change matcher behavior. Tool hooks still match on tool name, compact hooks still match on trigger, and `UserPromptSubmit` still ignores matchers. Only `SubagentStart` and `SubagentStop` match on `agent_type`.	2026-05-21 14:54:01 -07:00
Abhinav	24faf49b2a	Remove plugin hooks feature flag (#22552 ) # Why This is a follow-up stacked on top of the `plugin_hooks` default-on change. Once we are comfortable making plugin hooks part of the normal plugin behavior, the separate feature flag stops buying us much and leaves extra branching/cache state behind. # What - remove the `PluginHooks` feature and generated config-schema entries - make plugin hook loading/listing follow plugin enablement directly - drop plugin-manager cache/state that only existed to distinguish hook-flag toggles - remove tests and fixtures that modeled `plugin_hooks = true/false`	2026-05-21 19:15:18 +00:00
starr-openai	298e5cfce1	Route MCP servers through explicit environments (#23583 ) ## Summary - route each configured MCP server through an explicit per-server `environment_id` instead of a manager-wide remote toggle - default omitted `environment_id` to `local`, resolve named ids through `EnvironmentManager`, and fail only the affected MCP server when an explicit id is unknown - keep local stdio on the existing local launcher path for now, while named-environment stdio uses the selected environment backend and requires an absolute `cwd` - allow local HTTP MCP servers to keep using the ambient HTTP client when no local `Environment` is configured; named-environment HTTP MCPs use that environment's HTTP client ## Validation - devbox Bazel build: `bazel build --bes_backend= --bes_results_url= //codex-rs/cli:codex //codex-rs/rmcp-client:test_stdio_server //codex-rs/rmcp-client:test_streamable_http_server` - devbox app-server config matrix with real `config.toml` / `environments.toml` files covering omitted local, explicit local, omitted local under remote default, explicit remote stdio, local HTTP without local env, explicit remote HTTP, local stdio without local env, unknown explicit env, and remote stdio without `cwd`	2026-05-21 17:19:54 +02:00
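The per-server `environment_id` resolution in 298e5cfce1 might look roughly like the following. This is a sketch under stated assumptions: `ResolvedEnv`, `resolve_environment`, and `resolve_all` are hypothetical names, a plain set of known ids stands in for `EnvironmentManager`, and treating an explicit `local` the same as an omitted id is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical result of resolving one server's environment.
#[derive(Debug, PartialEq)]
pub enum ResolvedEnv {
    Local,
    Named(String),
}

/// Omitted ids default to local; named ids must be known, otherwise this server fails.
pub fn resolve_environment(
    environment_id: Option<&str>,
    known: &HashSet<String>,
) -> Result<ResolvedEnv, String> {
    match environment_id {
        // Assumption: an explicit "local" behaves like an omitted id.
        None | Some("local") => Ok(ResolvedEnv::Local),
        Some(id) if known.contains(id) => Ok(ResolvedEnv::Named(id.to_string())),
        Some(id) => Err(format!("unknown environment_id `{id}`")),
    }
}

/// Resolves every server independently so one unknown id does not fail the others.
pub fn resolve_all<'a>(
    servers: &[(&'a str, Option<&'a str>)],
    known: &HashSet<String>,
) -> Vec<(&'a str, Result<ResolvedEnv, String>)> {
    servers
        .iter()
        .map(|(name, id)| (*name, resolve_environment(*id, known)))
        .collect()
}
```

Returning one `Result` per server keeps the failure scoped to the affected MCP server, which matches the behavior the commit describes.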
jif-oai	8a511d5881	cli: rename profile v2 flag to --profile (#23883 ) ## Why Profile v2 is taking over the user-facing profile selection path, so the CLI no longer needs to expose the transitional `--profile-v2` spelling. This switches the public args surface to `--profile` before the remaining legacy profile plumbing is removed separately. ## What - Rebind `--profile` and `-p` to the v2 profile name argument that selects `$CODEX_HOME/<name>.config.toml`. - Stop parsing the legacy shared CLI profile argument while keeping its implementation path in place for follow-up cleanup. - Update CLI validation, profile-name parse errors, and the legacy-profile collision message/tests to refer to `--profile`. ## Testing - `cargo test -p codex-cli -p codex-config -p codex-protocol -p codex-utils-cli`	2026-05-21 16:45:27 +02:00
jif-oai	e6c8371e4e	refactor: centralize tool exposure planning (#23876 ) ## Why Tool exposure is a planning concern, but the deferred MCP path and dispatch-only legacy shell path were carrying those decisions in handler constructors and a shell-only tool-family builder. Keeping those decisions in `spec_plan` makes the core tool plan easier to follow and keeps handlers focused on runtime behavior. ## What changed - add `PlannedTools` helpers for ordinary runtimes, exposure overrides, dispatch-only runtimes, and hosted specs - inline shell tool assembly into `core/src/tools/spec_plan.rs` and remove the shell-only `tool_family` module - remove exposure state and special exposure constructors from `McpHandler` and `ShellCommandHandler` - keep hidden runtime behavior centralized in `ExposureOverride`, including disabling parallel tool calls for hidden handlers ## Testing - Not run (refactor only)	2026-05-21 16:21:23 +02:00
jif-oai	2a25602783	[codex] Stabilize subagent start hook test (#23882 ) ## What Remove the exact captured request-count assertion from the `SubagentStart` hook integration test while still waiting for the child request that matches the injected hook context. ## Why The test owns the start-hook behavior and already verifies that the child request reaches the context matcher plus that the start/session hook logs have the expected invocations. Counting every request captured by the response mock makes the test sensitive to lifecycle timing outside that contract and has been flaky in CI. ## Testing - `cargo test -p codex-core --test all suite::subagent_notifications::subagent_start_replaces_session_start_and_injects_context -- --exact`	2026-05-21 15:54:23 +02:00
jif-oai	516f134641	Make tool executor specs mandatory (#23870 ) ## Why `ToolExecutor` is the runtime contract that keeps a callable tool and its model-visible spec together. Leaving `spec()` optional lets a registered runtime silently omit that half of the contract, and it also overloads a missing spec as an exposure decision for tools that should stay dispatchable without being shown to the model. ## What - Make `ToolExecutor::spec()` required and update core, extension, and test tool executors to return a concrete `ToolSpec`. - Add `ToolExposure::Hidden` for dispatch-only tools. The legacy `shell_command` runtime in unified-exec sessions now uses that explicit exposure instead of hiding itself by omitting a spec. - Build MCP tool specs when `McpHandler` is constructed so invalid MCP specs are skipped before the handler is registered. - Keep tool planning aligned with the new contract for direct, deferred, hidden, code-mode, dynamic, and namespaced tool paths. ## Testing - Added tool-plan coverage that invalid MCP tool specs are not registered. - Updated shell-family coverage for the hidden legacy `shell_command` runtime and the affected tool executor test fixtures.	2026-05-21 15:25:56 +02:00
jif-oai	94442b7f95	feat: retain remote compaction truncation parity in v2 (#23728 ) ## Why Remote compaction now has two implementations: the existing server-rebuilt v1 path and the newer client-rebuilt v2 path behind `remote_compaction_v2`. The v1 path bounds retained user/developer/system history before installing the compaction item, while v2 was previously carrying the full retained history forward. That made the two paths diverge for large pre-compaction transcripts even though they are meant to preserve the same compaction contract. This aligns v2 with the retained-history budget expected from v1 so switching the feature flag does not materially change which pre-compaction messages survive into the rebuilt history. ## What changed - Apply a retained-message character budget while rebuilding v2 compacted history in `core/src/compact_remote_v2.rs`. - Keep newest retained messages first, truncate the boundary message with the shared `truncate_text(...)` helper, and drop older retained messages once the budget is exhausted. - Preserve non-text retained message content such as images while truncating text content. - Use the current `64_000` token retained-message default translated to the existing `4x` character budget. ## Testing - `cargo test -p codex-core compact_remote_v2::tests::` - Added focused coverage for newest-first retention and truncating multipart retained messages without dropping images.	2026-05-21 15:07:03 +02:00
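The retained-message budget from 94442b7f95 can be illustrated with a text-only sketch. The `64_000` token default and the `4x` character factor come from the commit message; everything else is assumed for illustration: `retain_within_budget` is a hypothetical name, truncation here simply keeps a prefix (the real `truncate_text(...)` helper may behave differently), the result is returned in chronological order, and non-text content such as images is not modeled.

```rust
/// Retained-message default from the commit message, in tokens.
pub const RETAINED_TOKENS: usize = 64_000;
/// Character budget factor from the commit message.
pub const CHARS_PER_TOKEN: usize = 4;

/// Keeps the newest messages that fit in `budget_chars`.
/// `messages` is ordered oldest to newest, and so is the returned list.
pub fn retain_within_budget(messages: &[String], budget_chars: usize) -> Vec<String> {
    let mut remaining = budget_chars;
    let mut kept: Vec<String> = Vec::new();
    // Walk from newest to oldest so newer messages win the budget.
    for msg in messages.iter().rev() {
        if remaining == 0 {
            break; // Budget exhausted: all older messages are dropped.
        }
        let len = msg.chars().count();
        if len <= remaining {
            kept.push(msg.clone());
            remaining -= len;
        } else {
            // Boundary message: keep only what still fits (prefix is an assumption).
            kept.push(msg.chars().take(remaining).collect());
            remaining = 0;
        }
    }
    kept.reverse();
    kept
}
```

With the defaults above, the budget would be `RETAINED_TOKENS * CHARS_PER_TOKEN`, i.e. 256,000 characters.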
jif-oai	791b69dd53	[codex] Steer budget-limited goal extension turns (#23718 ) ## What - Add a small extension capability for injecting model-visible response items into the active turn - Have the goal extension inject hidden goal-context steering when tool-finish accounting reaches `BudgetLimited` - Cover the extension backend path with an assertion on the injected steering item ## Why PR #23696 persists and emits the budget-limited goal update from tool-finish accounting, but it leaves the model unaware of that transition. The existing core runtime steers the model to wrap up in this case; the extension path should do the same through an explicit host capability. ## Testing - `just fmt` - `cargo test -p codex-goal-extension` - `cargo test -p codex-extension-api`	2026-05-21 12:54:00 +02:00
jif-oai	20fedafff8	Trace logical websocket request after untraced warmup (#23581 ) ## Why `prewarm_websocket` intentionally stays out of rollout inference tracing, but the next traced websocket request can still reuse the warmup `response_id` and send an empty `input` delta. If tracing records that wire payload verbatim, replay sees an incremental request whose parent was never traced and cannot reconstruct the conversation. This fixes that at the producer boundary instead of relaxing `rollout-trace` replay semantics around unresolved `previous_response_id` values. ## What - track whether the last websocket response came from an untraced warmup and clear that state when the websocket session is reset or reconnected - when a traced websocket request reuses that warmup parent, keep sending the compressed websocket request on the wire but record the logical `ResponsesApiRequest` in the rollout trace - add a regression test that proves replay reconstructs the logical user message even though the websocket follow-up carries `previous_response_id = warm-1` with empty `input` - update `InferenceTraceAttempt::record_started` docs to reflect that callers may record a logical request rather than the exact transport payload ## Testing - `cargo test -p codex-core --test all responses_websocket_request_prewarm_traces_logical_request`	2026-05-21 11:13:23 +02:00
Michael Bolin	63a72e6b78	core: pass permission profiles to Windows runner (#23715 ) ## Why This is the functional handoff PR for the Windows sandbox `PermissionProfile` migration. After #23714, the Windows elevated backend can accept a profile-native request, but core still sent a compatibility `SandboxPolicy` into the elevated command-runner path. That meant profile-only details such as deny globs had to be translated through side channels instead of being preserved in the runner `SpawnRequest`. Passing the real `PermissionProfile` completes the command-runner handoff while leaving the unelevated restricted-token fallback on the legacy policy-string API. ## What - Updates one-shot Windows elevated execution in `core/src/exec.rs` to call `run_windows_sandbox_capture_for_permission_profile_elevated`. - Updates unified exec in `core/src/unified_exec/process_manager.rs` to call `spawn_windows_sandbox_session_elevated_for_permission_profile`. - Passes `request.permission_profile` / `exec_request.permission_profile` and the stored Windows sandbox policy cwd to the elevated backend. - Keeps compatibility `SandboxPolicy` serialization only for the non-elevated restricted-token fallback. ## Verification - `cargo test -p codex-core --test all --no-run`	2026-05-20 17:57:36 -07:00
viyatb-oai	713a5b1b00	feat: support managed permission profiles in requirements.toml (#23433 ) ## Why Cloud-managed `requirements.toml` should be able to define the managed permission profiles a client may select and constrain that selectable set without requiring local user config to recreate the profile catalog. This keeps requirements focused on restrictions. The selected default remains a config or session choice, while requirements contribute the managed profile bodies and `allowed_permissions` allowlist that the config-loading boundary validates before a resolved runtime `PermissionProfile` is installed. ## What changed - Add `requirements.toml` support for a managed permission-profile catalog plus its allowlist: ```toml allowed_permissions = ["review", "build"] [permissions.review] extends = ":read-only" [permissions.build] extends = ":workspace" ``` - Merge requirements-defined profile bodies into the effective permission catalog and reject profile ids that collide with config-defined profiles. - Validate that every `allowed_permissions` entry resolves to a built-in or catalog profile before selection uses it. - Preserve allowed configured named-profile selections. When a configured named profile is disallowed, fall back to the first allowed requirements profile with a startup warning. - Keep built-in selections and the stock trust-based `:read-only` / `:workspace` fallback path intact when no permission profile is explicitly selected. - Centralize the managed catalog and allowlist selection path in `EffectivePermissionSelection` so the requirements boundary is visible in config loading. - Surface `allowedPermissions` through `configRequirements/read`, and update the generated app-server schema fixtures plus the app-server README. 
## Validation - `cargo test -p codex-config` - `cargo test -p codex-core system_requirements_` - `cargo test -p codex-core system_allowed_permissions_` - `cargo test -p codex-app-server-protocol` - `just write-app-server-schema` ## Related work - Uses merged permission-profile inheritance support from #22270 and #23705. - Kept separate from the in-flight permission profile listing API in #23412.	2026-05-20 17:33:01 -07:00
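The named-profile fallback in 713a5b1b00 can be sketched as follows. This is only an illustration: `Selection` and `select_profile` are hypothetical names (the commit centralizes the real logic in `EffectivePermissionSelection`), and treating an empty allowlist as "no restriction" is an assumption.

```rust
/// Hypothetical outcome of applying `allowed_permissions` to a configured profile.
#[derive(Debug, PartialEq)]
pub struct Selection {
    pub profile: String,
    /// Startup warning emitted when the configured profile was replaced.
    pub warning: Option<String>,
}

/// Keeps an allowed configured profile; otherwise falls back to the first allowed one.
pub fn select_profile(configured: &str, allowed: &[&str]) -> Selection {
    match allowed.first() {
        // Assumption: no allowlist means requirements impose no restriction.
        None => Selection { profile: configured.to_string(), warning: None },
        Some(_) if allowed.contains(&configured) => {
            Selection { profile: configured.to_string(), warning: None }
        }
        Some(first) => Selection {
            profile: (*first).to_string(),
            warning: Some(format!(
                "profile `{configured}` is not allowed; falling back to `{first}`"
            )),
        },
    }
}
```

For the allowlist in the commit's example, `select_profile("deploy", &["review", "build"])` would fall back to `review` with a warning, while `select_profile("build", &["review", "build"])` keeps `build`.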
viyatb-oai	a27d3847b5	[codex] Reject read-only fallback with approvals disabled (#23774 ) ## Why If a user configures `approval_policy = "never"` with `sandbox_mode = "danger-full-access"`, managed requirements can reject full access and force the existing permission fallback to read-only. That leaves Codex in a dead-end session: writes are blocked by the sandbox, while approvals are disabled so the session cannot ask to proceed. This PR rejects that constrained configuration during startup instead of letting the TUI enter a read-only session that cannot make progress. The rejection is attached to the requirement-constrained permission path in `Config` (`codex-rs/core/src/config/mod.rs`, L3301-L3318 at `39f0abc0a7`). ## What changed - Reject the `danger-full-access` to read-only managed-requirements fallback when the effective approval policy is `never`. - Explain in the startup config error why the fallback is invalid and how to fix it. - Add a regression test for the managed requirements path.	2026-05-20 17:17:59 -07:00
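The startup check from a27d3847b5 amounts to refusing one specific combination. The sketch below is illustrative only: the enum and function names are hypothetical, `ApprovalPolicy::Other` stands in for any policy other than `never`, and the `full_access_allowed` flag is a simplified stand-in for the managed requirements.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ApprovalPolicy {
    Never,
    /// Stand-in for any approval policy other than `never`.
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SandboxMode {
    ReadOnly,
    DangerFullAccess,
}

/// Applies the managed-requirements fallback, rejecting the dead-end case.
pub fn resolve_sandbox(
    requested: SandboxMode,
    approval: ApprovalPolicy,
    full_access_allowed: bool,
) -> Result<SandboxMode, String> {
    // No fallback is needed unless full access was requested and is disallowed.
    if requested != SandboxMode::DangerFullAccess || full_access_allowed {
        return Ok(requested);
    }
    // Read-only plus `never` could neither write nor ask for approval.
    if approval == ApprovalPolicy::Never {
        return Err(
            "requirements disallow danger-full-access, and the read-only fallback \
             cannot make progress with approval_policy = \"never\""
                .to_string(),
        );
    }
    Ok(SandboxMode::ReadOnly)
}
```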
evawong-oai	3cae84009a	Use named MITM permissions config (#18240 ) ## Stack 1. Parent PR: #18868 adds MITM hook config and model only. 2. Parent PR: #20659 wires hook enforcement into the proxy request path. 3. This PR changes the user facing PermissionProfile TOML shape. ## Why 1. The broader goal is to make MITM clamping usable from the same permission profile that already controls network behavior. 2. This PR is the config UX layer for the stack. It moves MITM policy into `[permissions.<profile>.network.mitm]` instead of exposing the flat runtime shape to users. 3. The named hook and action tables belong here because users need reusable policy blocks that are easy to review, while the proxy runtime only needs a flat hook list. 4. This PR validates action refs during config parsing so mistakes in the user facing policy fail before a proxy session starts. 5. Keeping the lowering here lets the proxy keep its simpler runtime model and lets PermissionProfile remain the single source of network permission policy. ## Summary 1. Keep MITM policy inside `[permissions.<profile>.network.mitm]` so the selected PermissionProfile owns network proxy policy. 2. Use named MITM hooks under `[permissions.<profile>.network.mitm.hooks.<name>]`. 3. Put host, methods, path prefixes, query, headers, body, and action refs on the hook table. 4. Define reusable action blocks under `[permissions.<profile>.network.mitm.actions.<name>]`. 5. Represent action blocks with `NetworkMitmActionToml`, then lower them into the proxy runtime action config. 6. Reject unknown refs, empty refs, and empty action blocks during config parsing. 7. Keep the runtime hook model unchanged by lowering config into the existing proxy hook list. 8. Preserve the #20659 activation fix for nested MITM policy. 
## Example ```toml [permissions.workspace.network.mitm] enabled = true [permissions.workspace.network.mitm.hooks.github_write] host = "api.github.com" methods = ["POST", "PUT"] path_prefixes = ["/repos/openai/"] action = ["strip_auth"] [permissions.workspace.network.mitm.actions.strip_auth] strip_request_headers = ["authorization"] ``` ## Validation 1. Regenerated the config schema. 2. Ran the core MITM config parsing and validation tests. 3. Ran the core PermissionProfile MITM proxy activation tests. 4. Ran the core config schema fixture test. 5. Ran the network proxy MITM policy tests. 6. Ran the scoped Clippy fixer for the network proxy crate. 7. Ran the scoped Clippy fixer for the core crate. --------- Co-authored-by: Winston Howes <winston@openai.com>	2026-05-20 17:10:37 -07:00
Matthew Zeng	0a4179bb19	[codex] Add plugin id to MCP tool call items (#23737 ) Add owning plugin id to MCP tool call items so we can better filter them at plugin level. ## Summary - add optional `plugin_id` to MCP tool-call items and legacy begin/end events - propagate plugin metadata into emitted core items and app-server v2 `ThreadItem::McpToolCall` - preserve plugin ids through app-server replay/redaction paths and regenerate v2 schema fixtures ## Testing - `just write-app-server-schema` - `just fmt` - `just fix -p codex-core` - `cargo test -p codex-protocol -p codex-app-server-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib` - `cargo check -p codex-tui --tests` - `cargo check -p codex-app-server --tests` - `git diff --check` ## Notes - `just fix -p codex-core` completed with two non-fatal `too_many_arguments` warnings on the touched MCP notification helpers. - A broader `cargo test -p codex-core` run passed core unit tests, then hit shell/sandbox/snapshot failures in the integration target. - A broader app-server downstream run hit the existing `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack overflow; `cargo test -p codex-exec` also hit the existing sandbox expectation mismatch in `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.	2026-05-20 17:02:10 -07:00
guinness-oai	d6d03d42ea	[codex] Fix realtime v1 websocket compatibility (#23771 ) ## Why Realtime v1 websocket sessions now expect a slightly different boundary shape for text input, completed input transcripts, and connection headers. Codex was still using the older shape, so some v1 text appends could be rejected before the existing conversation flow could handle them. ## What changed - Send v1 user text items with `input_text` content - Accept v1 turn-marked input transcript events as completed transcripts - Add the v1 alpha header only for v1 realtime sessions - Cover the outbound text shape, transcript parsing, and versioned headers ## Test plan - `cargo test -p codex-api endpoint::realtime_websocket::methods::tests` - `cargo test -p codex-core quicksilver_alpha_header`	2026-05-20 16:03:51 -07:00
Shijie Rao	370b13afc9	Honor client-resolved service tier defaults (#23537 ) ## Why Model catalog responses can now advertise a nullable `default_service_tier` for each model. Codex needs to preserve three distinct states all the way from config/app-server inputs to inference: - no explicit service tier, so the client may apply the current model catalog default when FastMode is enabled - explicit `default`, meaning the user intentionally wants standard routing - explicit catalog tier ids such as `priority`, `flex`, or future tiers Keeping those states distinct prevents the UI from showing one tier while core sends another, especially after model switches or app-server `thread/start` / `turn/start` updates. ## What Changed - Plumbed `default_service_tier` through model catalog protocol types, app-server model responses, generated schemas, model cache fixtures, and provider/model-manager conversions. - Added the request-only `default` service tier sentinel and normalized legacy config spelling so `fast` in `config.toml` still materializes as the runtime/request id `priority`. - Moved catalog default resolution to the TUI/client side, including recomputing the effective service tier when model/FastMode-dependent surfaces change. - Updated app-server thread lifecycle config construction so `serviceTier: null` preserves explicit standard-routing intent by mapping to `default` instead of internal `None`. - Kept core responsible for validating explicit tiers against the current model and stripping `default` before `/v1/responses`, without applying catalog defaults itself. ## Validation - `CARGO_INCREMENTAL=0 cargo build -p codex-cli` - `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list` - `cargo test -p codex-tui service_tier` - `cargo test -p codex-protocol service_tier_for_request` - `cargo test -p codex-core get_service_tier` - `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core service_tier`	2026-05-20 15:57:50 -07:00
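The three service-tier states in 370b13afc9 can be sketched with plain `Option<&str>` values. This is a hedged illustration, not the actual code: the function names are hypothetical, and the split into a client-side step and a request-side step only mirrors the division of responsibility that the commit message describes.

```rust
/// Legacy config spelling `fast` materializes as the request id `priority`.
pub fn normalize_config_tier(raw: &str) -> &str {
    if raw == "fast" { "priority" } else { raw }
}

/// Client-side resolution: an explicit choice always wins; with no explicit
/// choice, the catalog default applies only when FastMode is enabled.
pub fn effective_tier(
    configured: Option<&str>,
    fast_mode: bool,
    catalog_default: Option<&str>,
) -> Option<String> {
    match configured {
        Some(raw) => Some(normalize_config_tier(raw).to_string()),
        None if fast_mode => catalog_default.map(|tier| tier.to_string()),
        None => None,
    }
}

/// Request-side step: the `default` sentinel is stripped before sending.
pub fn tier_for_request(effective: Option<&str>) -> Option<&str> {
    effective.filter(|tier| *tier != "default")
}
```

Keeping `default` distinct from `None` until the last step is what lets an explicit standard-routing choice survive a model switch without picking up a new catalog default.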
Eric Traut	0e9d222178	Make goals feature on by default and no longer experimental (#23732 ) ## Why The `goals` feature is ready to be available without requiring users to opt into experimental features. Keeping it behind the beta flag leaves persisted thread goals and automatic goal continuation disabled by default. This PR also marks the goal-related app server APIs and events as no longer experimental. ## What changed - Mark `goals` as `Stage::Stable`. - Enable `goals` by default in `codex-rs/features/src/lib.rs`.	2026-05-20 15:07:35 -07:00
Abhinav	eee3e60db3	Add SubagentStop hook (#22873 ) # What `SubagentStop` runs when a thread-spawned subagent turn is about to finish. Thread-spawned subagents use `SubagentStop` instead of the normal root-agent `Stop` hook. Configured handlers match on `agent_type`. Hook input includes the normal stop fields plus: - `agent_id`: the child thread id. - `agent_type`: the resolved subagent type. - `agent_transcript_path`: the child subagent transcript path. - `transcript_path`: the parent thread transcript path. - `last_assistant_message`: the final assistant message from the child turn, when available. - `stop_hook_active`: `true` when the child is already continuing because an earlier stop-like hook blocked completion. `SubagentStop` shares the same completion-control semantics as `Stop`, scoped to the child turn: - No decision allows the child turn to finish. - `decision: "block"` with a non-empty `reason` records that reason as hook feedback and continues the child with that prompt. - `continue: false` stops the child turn. If `stopReason` is present, Codex surfaces it as the stop reason. # Lifecycle Scope Only thread-spawned subagents run `SubagentStop`. Internal/system subagents such as Review, Compact, MemoryConsolidation, and Other do not run normal `Stop` hooks and do not run `SubagentStop`. This avoids exposing synthetic matcher labels for internal implementation paths. # Stack 1. #22782: add `SubagentStart`. 2. This PR: add `SubagentStop`. 3. #22882: add subagent identity to normal hook inputs.	2026-05-20 14:59:41 -07:00
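The completion-control semantics listed in eee3e60db3 can be sketched as a small decision function. The names below are hypothetical, and two points are assumptions because the commit message does not state them: `continue: false` is checked before `decision: "block"`, and a `block` decision with an empty or missing `reason` is treated as no decision.

```rust
/// Hypothetical, simplified view of a `SubagentStop` hook's output.
#[derive(Debug, Default)]
pub struct StopHookOutput {
    pub decision: Option<String>,
    pub reason: Option<String>,
    /// Mirrors the `continue` field (renamed because `continue` is a Rust keyword).
    pub continue_turn: Option<bool>,
    pub stop_reason: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum ChildTurnOutcome {
    /// No decision: the child turn is allowed to finish.
    Finish,
    /// `decision: "block"` with a non-empty reason: continue with that prompt.
    ContinueWithFeedback(String),
    /// `continue: false`: stop the child turn, surfacing `stopReason` if present.
    Stop { reason: Option<String> },
}

pub fn evaluate_subagent_stop(out: &StopHookOutput) -> ChildTurnOutcome {
    // Assumption: an explicit stop takes precedence over a block decision.
    if out.continue_turn == Some(false) {
        return ChildTurnOutcome::Stop { reason: out.stop_reason.clone() };
    }
    if out.decision.as_deref() == Some("block") {
        if let Some(reason) = out.reason.as_deref().filter(|r| !r.is_empty()) {
            return ChildTurnOutcome::ContinueWithFeedback(reason.to_string());
        }
    }
    ChildTurnOutcome::Finish
}
```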
viyatb-oai	40ad7be2b5	core: refresh active permission profiles at runtime (#22931 ) ## Why Once a named permission profile is selected, runtime state has to keep that profile identity intact instead of collapsing back to anonymous effective permissions. The session refresh path also needs to rebuild profile-derived network proxy state so active profile switches take effect consistently. ## What changed - Preserve the active permission profile through session updates. - Rebuild profile-derived runtime/network configuration when the active profile changes. - Keep the runtime path aligned with the current session configuration APIs. - Tighten the affected tests, including the Windows delete-pending memory-file case that was intermittently tripping CI. ## Stack 1. This PR: runtime/session/network propagation for active permission profiles. 2. [#23708](https://github.com/openai/codex/pull/23708): TUI selection plumbing and guardrail flow. 3. [#21559](https://github.com/openai/codex/pull/21559): profile-aware `/permissions` menu and custom profile display.	2026-05-20 21:55:21 +00:00
Michael Bolin	896ee672cc	windows-sandbox: feed setup from resolved permissions (#23167 ) ## Why This is the next step in the Windows sandbox migration away from the legacy `SandboxPolicy` abstraction. #22923 moved write-root and token decisions onto `ResolvedWindowsSandboxPermissions`, but setup and identity still accepted `SandboxPolicy` and converted internally. This PR pushes that conversion outward so the setup path consumes the resolved Windows permission view directly. ## What Changed - Changed `SandboxSetupRequest` to carry `ResolvedWindowsSandboxPermissions` instead of `SandboxPolicy` plus policy cwd. - Updated setup refresh/elevation and identity credential preparation to use resolved permissions for read roots, write roots, network identity, and deny-write payload planning. - Removed the production `allow.rs` legacy wrapper; allow-path computation now takes resolved permissions directly. - Added a permissions-based world-writable audit entry point while keeping the existing legacy wrapper for compatibility. - Updated legacy ACL setup and the core Windows setup bridge to construct resolved permissions at the boundary. - Hardened the Windows sandbox integration test helper staging so Bazel retries can reuse an already-staged helper if a prior sandbox helper process still has the executable open. ## Verification - `cargo test -p codex-windows-sandbox` - `cargo test -p codex-core --test all --no-run` - `just fix -p codex-windows-sandbox` - `just fix -p codex-core` - Attempted `cargo check -p codex-windows-sandbox --target x86_64-pc-windows-gnullvm`, but the local machine is missing `x86_64-w64-mingw32-clang`; Windows CI should cover that target. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23167). * #23715 * #23714 * __->__ #23167	2026-05-20 14:52:38 -07:00
Michael Bolin	e1ec0eee5f	windows-sandbox: drive write roots from resolved permissions (#22923 ) ## Why This is the third PR in the Windows sandbox `SandboxPolicy` -> `PermissionProfile` migration stack. #22896 introduced `ResolvedWindowsSandboxPermissions`, and #22918 moved elevated runner IPC to carry `PermissionProfile`. This PR starts moving the remaining setup/spawn helpers away from asking legacy enum questions like “is this `WorkspaceWrite`?” and toward resolved runtime permission questions like “does this profile require write capability roots?” ## What changed - Added resolved-permissions helpers for network identity and write-capability detection. - Moved setup write-root gathering to operate on `ResolvedWindowsSandboxPermissions`, with the legacy `SandboxPolicy` wrapper left in place for existing call sites. - Updated identity setup, elevated capture setup, and world-writable audit denies to use resolved write roots. - Updated spawn preparation to carry resolved permissions in `SpawnContext` and use them for network blocking, setup write roots, elevated capability SID selection, and legacy capability roots. - Removed a now-unused legacy write-root helper. ## Verification - `cargo test -p codex-windows-sandbox` - `just fix -p codex-windows-sandbox` - Existing stack checks are green on #22896 and #22918; CI has started for this PR. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22923). * #23715 * #23714 * #23167 * __->__ #22923	2026-05-20 14:30:42 -07:00
Abhinav	af49d38373	Support compact SessionStart hooks (#21272 ) # Why Compaction replaces the live conversation history, so hooks that use `SessionStart` to re-inject durable model context need a way to run again after that rewrite. Related - #19905 adds dedicated compact lifecycle hooks # What - add `compact` as a supported `SessionStart` source and matcher value - change pending `SessionStart` state from a single slot to a small FIFO queue so `resume` / `startup` / `clear` can be preserved alongside a later `compact` - drain all queued `SessionStart` sources before the next model request, preserving their original order # Testing The new integration coverage verifies both the basic `compact` matcher path and the stacked `resume` -> `compact` case where both hooks contribute `additionalContext` to the next model turn.	2026-05-20 20:46:19 +00:00
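The move from a single pending slot to a FIFO queue in af49d38373 can be illustrated with `VecDeque`. This is a minimal sketch with hypothetical type and method names; only the four source values and the drain-in-original-order behavior come from the commit message.

```rust
use std::collections::VecDeque;

/// `SessionStart` sources named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SessionStartSource {
    Startup,
    Resume,
    Clear,
    Compact,
}

/// Hypothetical pending-state holder: a queue instead of a single slot,
/// so an earlier source is not overwritten by a later `Compact`.
#[derive(Debug, Default)]
pub struct PendingSessionStarts {
    queue: VecDeque<SessionStartSource>,
}

impl PendingSessionStarts {
    /// Records a source that still needs its `SessionStart` hooks to run.
    pub fn push(&mut self, source: SessionStartSource) {
        self.queue.push_back(source);
    }

    /// Drains every queued source in arrival order, leaving the queue empty.
    pub fn drain_all(&mut self) -> Vec<SessionStartSource> {
        self.queue.drain(..).collect()
    }
}
```

In the stacked `resume` -> `compact` case, draining before the next model request yields `Resume` and then `Compact`, so both hooks can contribute their context.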
viyatb-oai	fe7c069fe6	feat(permissions): resolve permission profile inheritance (#22270 ) ## Stack This is the foundation PR for the permission-profile inheritance stack. - This PR adds config-level `extends` resolution and merge semantics. - Follow-up: #23705 applies resolved profiles at runtime and updates the active-profile protocol surfaces. ## Why Permission profiles are starting to carry enough policy that copy-pasting near-identical definitions becomes hard to review and easy to drift. Before the runtime can consume inherited profiles, the config layer needs one explicit resolver that can merge parent chains and reject unsafe or invalid inheritance shapes. ## What changed - Add `extends` to permission-profile TOML and resolve parent chains in inheritance order. - Merge inherited profile TOML with the existing config merge behavior while preserving the permission-specific normalization needed for network domain keys. - Keep parent descriptions out of resolved child profiles and record inherited profile names separately for downstream consumers. - Reject undefined parents, unsupported built-in parents, and inheritance cycles with targeted errors. - Cover resolver behavior with TOML fixture tests and refresh the generated config schema. ## Validation - `cargo test -p codex-config` - `cargo test -p codex-core permissions_profiles_`	2026-05-20 20:12:07 +00:00
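Resolving `extends` chains as described in fe7c069fe6 involves walking parents while rejecting undefined parents and cycles. The sketch below is an assumption-laden illustration: `resolve_chain` is a hypothetical name, profiles are reduced to a map from name to optional parent, and built-in parents such as `:read-only` are not modeled (they would be reported as undefined here).

```rust
use std::collections::HashMap;

/// Returns the inheritance chain for `name`, root-most ancestor first
/// and `name` itself last, which is the order a merge would apply them.
pub fn resolve_chain(
    name: &str,
    parents: &HashMap<String, Option<String>>,
) -> Result<Vec<String>, String> {
    let mut chain: Vec<String> = Vec::new();
    let mut current = name.to_string();
    loop {
        // Seeing a profile twice means the `extends` links form a cycle.
        if chain.contains(&current) {
            return Err(format!("inheritance cycle at `{current}`"));
        }
        let parent = parents
            .get(&current)
            .ok_or_else(|| format!("undefined profile `{current}`"))?
            .clone();
        chain.push(current);
        match parent {
            Some(next) => current = next,
            None => break, // Reached a profile with no `extends`.
        }
    }
    chain.reverse();
    Ok(chain)
}
```

Merging would then fold the profile bodies in the returned order so that child values override inherited ones.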
evawong-oai	3d94e24a3d	Add MITM hook config model (#18868 ) ## Stack 1. This PR adds MITM hook config and model only. 2. Runtime follow up: #20659 wires hook enforcement into the proxy request path. 3. User facing config follow up: #18240 moves MITM policy into the PermissionProfile network tree. ## Why 1. Viyat asked for the original parent PR to be split so reviewers can inspect the policy model before request behavior changes. 2. This PR gives the proxy a typed MITM hook model, validation, matcher compilation, permissions TOML plumbing, schema support, and config tests. 3. This PR deliberately does not change CONNECT or MITM request handling. 4. Keeping runtime behavior out of this PR makes the review boundary simple: does the policy model parse, validate, compile, and lower correctly. ## Summary 1. Add the MITM hook config model and matcher compilation. 2. Validate hosts, methods, paths, query matchers, header matchers, secret sources, and reserved body matching. 3. Add wildcard matcher support for path, query value, and header value matching. 4. Add permissions TOML and schema support for flat runtime hook config. 5. Add config loader tests for MITM hook overlay behavior. ## Validation 1. Regenerated the config schema. 2. Ran the network proxy MITM hook unit tests. 3. Ran the core permission profile MITM hook parsing tests. 4. Ran the core config schema fixture test. 5. Ran the scoped Clippy fixer for the network proxy crate. 6. Ran the scoped Clippy fixer for the core crate. ## Notes 1. Runtime enforcement moved to #20659. 2. User facing PermissionProfile TOML shape remains in #18240.	2026-05-20 12:51:12 -07:00
jif-oai	c5bd131567	feat: add turn_id and truncation_policy to extension tool calls (#23666 ) ## Why Extension-owned tools currently receive a stripped `ToolCall` with only `call_id`, `tool_name`, and `payload`. That makes extension work that needs turn-local execution context awkward, especially web-search extension work that needs the active `truncation_policy` at tool invocation time. Reconstructing that value from config or `ExtensionData` would be indirect and could drift from the actual turn context, so the cleaner fix is to pass the needed turn metadata directly on the extension-facing invocation type. ## What changed - added `turn_id` and `truncation_policy` to `codex_tools::ToolCall` - populated those fields when core adapts `ToolInvocation` into an extension tool call - added a focused adapter test that verifies extension executors receive the forwarded turn metadata - updated the memories extension tests to construct the richer `ToolCall` - added the `codex-utils-output-truncation` dependency to `codex-tools` and refreshed lockfiles ## Testing - `cargo test -p codex-tools` - `cargo test -p codex-memories-extension` - `cargo test -p codex-core passes_turn_fields_to_extension_call` - `just bazel-lock-update` - `just bazel-lock-check`	2026-05-20 20:14:41 +02:00
jif-oai	d4f842f3b3	feat: account active goal progress in the goal extension (#23696 ) ## Why The goal extension can create and surface goals, but the live turn-accounting path still stopped short of persisting active-goal progress. That leaves token and wall-clock usage, plus `ThreadGoalUpdated` events, out of sync with the extension boundary once work actually advances or a goal transitions out of active state. ## What changed - Teach `GoalAccountingState` to track the current turn, active goal, token deltas, and wall-clock progress snapshots against the persisted goal id. - Flush active-goal accounting from tool-finish, turn-stop, and turn-abort lifecycle hooks, and emit `ThreadGoalUpdated` events when persisted progress changes. - Route `create_goal` and `update_goal` through the same accounting state so new goals start from the right baseline, final progress is flushed before status changes, and `update_goal` can mark a goal `blocked` as well as `complete`. - Keep budget-limited goals accruing through the end of the turn while clearing local active-goal state once a turn or explicit update is finished. - Expand backend and lifecycle coverage around store ids, baseline reset, tool-finish accounting, budget-limited carry-through, and blocked-goal updates. ## Testing - Added focused backend coverage in `codex-rs/ext/goal/tests/goal_extension_backend.rs` for baseline reset, tool-finish accounting, budget-limited turns, and blocked-goal updates. - Extended `codex-rs/core/src/session/tests.rs` to assert that lifecycle inputs expose the expected session, thread, and turn store ids.	2026-05-20 18:36:37 +02:00
pakrym-oai	a52c91d8b5	[codex] Hide deferred tools from code mode prompt (#23605 ) ## Why `code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools` was failing because code-mode prompt generation used the same nested tool spec list for both the model-visible `exec` guide and the runtime `ALL_TOOLS` surface. That allowed deferred MCP/app tools, such as `calendar_timezone_option_99`, to leak into the `exec` description even though they should only be discoverable through `ALL_TOOLS` at runtime. ## What changed Split code-mode nested tool planning into two sets in `core/src/tools/spec_plan.rs`: - runtime nested tool specs still include deferred tools, so `tools[...]` and `ALL_TOOLS` can call them - `exec` prompt docs only render non-deferred tools, so deferred app tools stay out of the model-visible guide ## Validation - `cargo test -p codex-core --test all code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools -- --nocapture` - looped the same focused test 5 additional times with `cargo test -q -p codex-core --test all code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`	2026-05-20 08:09:45 -07:00
jif-oai	59507b8491	feat: expose turn-start metadata to extensions (#23688 ) ## Why The goal extension needs more context when a turn starts than `turn_store` alone provides. In particular, goal accounting needs the stable turn id, the effective collaboration mode, and the cumulative token-usage baseline captured at turn start so it can: - suppress goal accounting for plan-mode turns - compute exact per-turn deltas from cumulative `total_token_usage` snapshots instead of relying on the most recent usage event alone - keep the extension-owned accounting path aligned with the host turn lifecycle ## What - extend `codex_extension_api::TurnStartInput` to expose `turn_id`, `collaboration_mode`, and `token_usage_at_turn_start` - pass the full `TurnContext` plus the captured token-usage baseline through the turn-start lifecycle emission path - initialize goal turn accounting from the turn-start baseline and collaboration mode - switch goal token accounting to compute deltas from cumulative `total_token_usage` snapshots - add coverage for the new turn-start lifecycle fields and for goal-accounting baseline behavior ## Testing - added `turn_start_lifecycle_exposes_turn_metadata_and_token_baseline` in `codex-rs/core/src/session/tests.rs` - added `ext/goal/tests/accounting.rs` coverage for baseline-aware goal accounting and plan-mode suppression	2026-05-20 15:54:29 +02:00
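The baseline-plus-cumulative-snapshot idea above can be illustrated with a small sketch. `TurnTokenAccounting` and its methods are hypothetical names, and token usage is reduced to a single `u64` for brevity; the real usage type likely carries more fields.

```rust
/// Hypothetical per-turn accounting state seeded from the turn-start baseline.
pub struct TurnTokenAccounting {
    /// Cumulative total captured when the turn started.
    baseline: u64,
    /// Highest cumulative total already accounted for.
    last_flushed: u64,
}

impl TurnTokenAccounting {
    pub fn new(token_usage_at_turn_start: u64) -> Self {
        Self {
            baseline: token_usage_at_turn_start,
            last_flushed: token_usage_at_turn_start,
        }
    }

    /// Given the latest cumulative total, return only the tokens not yet accounted for.
    /// Repeated or stale snapshots yield zero instead of double counting.
    pub fn flush(&mut self, cumulative_total: u64) -> u64 {
        let delta = cumulative_total.saturating_sub(self.last_flushed);
        self.last_flushed = self.last_flushed.max(cumulative_total);
        delta
    }

    /// Tokens attributed to this turn so far.
    pub fn turn_total(&self) -> u64 {
        self.last_flushed - self.baseline
    }
}
```

Because each delta is derived from cumulative totals, a missed or duplicated usage event does not skew the per-turn figure.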
jif-oai	1392a2a770	feat: async turn item process (#23692 ) Mechanical change	2026-05-20 15:30:01 +02:00
jif-oai	9483b09ea4	feat: rename 2 (#23668 ) Just a mechanical renaming	2026-05-20 12:11:44 +02:00
jif-oai	66d5edf825	feat: rename 3 (#23669 ) Just a mechanical renaming	2026-05-20 12:07:06 +02:00
jif-oai	93456320ef	feat: rename 1 (#23667 ) Just a mechanical renaming	2026-05-20 12:05:58 +02:00
jif-oai	18cefba922	Add timeout for remote compaction requests (#23451 ) ## Why Remote compaction currently sends a unary `POST /responses/compact` and waits for the full response before replacing history or emitting the completed `ContextCompaction` item. Unlike normal `/responses` streaming requests, this unary compact request had no timeout boundary. If the backend accepts the request and then stalls before returning a body, the existing request retry policy never sees a transport error, so the compact turn can remain stuck after the started item with no completion or actionable error. That matches the reported hang shape in issues such as #18363, where logs show `responses/compact` was posted but no corresponding compact completion followed. A bounded request timeout gives the existing retry policy a concrete timeout error to retry instead of letting the user sit indefinitely on automatic context compaction. ## What - Add a request timeout to legacy `/responses/compact` calls. - Size that timeout from the provider stream idle timeout with a conservative multiplier, so the default compact attempt gets 20 minutes rather than the 5 minute stream idle window. - Map API transport timeouts to a request timeout error instead of the child-process timeout message. ## Testing - Not run (per request; CI will cover).	2026-05-20 11:56:00 +02:00
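As a worked example of the sizing rule above: the message gives a 5 minute stream idle window and a 20 minute compact timeout, which implies a multiplier of 4. That value is inferred from those two figures rather than stated in the commit, and the names below are hypothetical.

```rust
use std::time::Duration;

/// Assumed multiplier, inferred from "20 minutes rather than the 5 minute stream idle window".
pub const COMPACT_TIMEOUT_MULTIPLIER: u32 = 4;

/// Derive the unary compact request timeout from the provider's stream idle timeout.
pub fn compact_request_timeout(stream_idle_timeout: Duration) -> Duration {
    // saturating_mul avoids a panic if a very large idle timeout is configured.
    stream_idle_timeout.saturating_mul(COMPACT_TIMEOUT_MULTIPLIER)
}
```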
richardopenai	000bf5ce6d	Migrate exec-server remote registration to environments (#23633 ) ## Summary - migrate exec-server remote registration naming from executor to environment - align CLI, public Rust exports, registry error messages, and relay test fixtures with the environment registry contract - keep the live registration path and response model consistent with `/cloud/environment/{environment_id}/register` ## Verification - `cargo test -p codex-exec-server remote::tests::register_environment_posts_with_auth_provider_headers --manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml` - `cargo test -p codex-exec-server --test relay multiplexed_remote_environment_routes_independent_virtual_streams --manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml` - `cargo check -p codex-cli --manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml` (still running when PR opened; will update after completion if needed)	2026-05-20 00:25:04 -07:00
sayan-oai	34aad43684	add encryptedcontent to functioncalloutput (#23500 ) Add a new `EncryptedContent` variant to `FunctionCallOutputContentItem` ahead of standalone web search. We need to be able to receive encrypted function call output from the new web search endpoint and pass it back to the Responses API, as we cannot expose direct search results.	2026-05-19 23:47:48 -07:00
Eric Traut	9dda71dbae	Warn on invalid UTF-8 in AGENTS.md files (#23232 ) Fixes #23223. ## Why Malformed AGENTS instructions should not fail silently. The reported issue had invalid UTF-8 in a global `AGENTS.md`; before this change, Codex treated that decode failure like a missing file, so the personal instructions disappeared without a user-visible explanation and the rollout had no `# AGENTS.md instructions` block. Project-level AGENTS files already used lossy decoding, so their instructions still appeared, but invalid bytes were replaced without telling the user. Global and project AGENTS files should behave consistently: keep usable instruction text when possible, and surface a diagnostic when bytes had to be replaced. ## What changed Global `AGENTS.override.md` and `AGENTS.md` loading now reads bytes and decodes with replacement characters on invalid UTF-8, matching project-level AGENTS behavior. Both global and project AGENTS loading now emit a startup warning when invalid UTF-8 is found, and both keep the instruction text with invalid byte sequences replaced. Missing files, non-file candidates, empty files, and the existing `AGENTS.override.md` before `AGENTS.md` precedence keep their current behavior. ## How users see it The warnings flow through the existing startup warning surface. App-server clients receive config-time startup warnings as `configWarning` notifications during initialization, and thread startup emits startup warnings as thread-scoped `warning` notifications. Global AGENTS invalid UTF-8 warnings can appear on both surfaces. Project-level AGENTS invalid UTF-8 warnings are discovered while building thread instructions, so they appear as thread-scoped `warning` notifications. Clients that render warning notifications in the conversation surface show the message as a visible diagnostic instead of silently hiding or altering instructions.	2026-05-19 21:56:46 -07:00
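The "decode with replacement characters and warn" behavior can be sketched with the standard library alone. `DecodedInstructions` and `decode_instructions` are hypothetical names; the only real API used is `String::from_utf8_lossy`, which returns `Cow::Borrowed` for valid input and `Cow::Owned` when it had to substitute U+FFFD.

```rust
use std::borrow::Cow;

/// Hypothetical result type: the usable text plus a flag that drives the startup warning.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodedInstructions {
    pub text: String,
    pub had_invalid_utf8: bool,
}

/// Decode AGENTS-style instruction bytes, keeping usable text and noting any replacement.
pub fn decode_instructions(bytes: &[u8]) -> DecodedInstructions {
    match String::from_utf8_lossy(bytes) {
        // Valid UTF-8: no bytes were replaced, so no warning is needed.
        Cow::Borrowed(s) => DecodedInstructions {
            text: s.to_string(),
            had_invalid_utf8: false,
        },
        // Invalid sequences were replaced with U+FFFD: keep the text, flag a warning.
        Cow::Owned(s) => DecodedInstructions {
            text: s,
            had_invalid_utf8: true,
        },
    }
}
```

A caller would surface a warning whenever `had_invalid_utf8` is set, instead of treating the decode failure like a missing file.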
Ahmed Ibrahim	5a4202ad90	[codex] Preserve raw code-mode exec output by default (#23564 ) ## Why Code mode can use nested unified exec calls as data sources. When those calls omit `max_output_tokens`, code mode should receive raw command output so the script can parse or summarize it itself. When code mode does provide `max_output_tokens`, that explicit nested budget should be respected, including values above the default unified exec limit, rather than being capped before code mode sees the result. ## What - Preserve direct unified exec truncation behavior, while letting code-mode exec/write_stdin keep `max_output_tokens` as `None` unless explicitly supplied. - Make code-mode tool results use raw output when no explicit limit is present, and use the explicit nested limit directly when one is specified. - Refactor unified exec output formatting so `truncated_output` takes the caller-selected token budget. - Add e2e integration coverage for explicit nested exec limits, omitted nested exec limits, outer exec limit propagation, omitted-limit outputs that exceed both the default and a small truncation policy, explicit nested limits above those caps, and high explicit limits that still compact larger command output. - Reuse the code-mode turn setup helper while directly asserting the exact exec output item in each test. ## Testing - `just fmt` - `git diff --check` - Not run locally per repo guidance; CI should validate the e2e integration tests.	2026-05-20 04:02:14 +00:00
Eric Traut	e43a2e297f	Fix stale background terminal poll events (#23231 ) ## Why Issue #23214 reports `/ps` showing no background terminals while the status line still says it is waiting for a background terminal. The race is in core: `write_stdin` can poll a process that exits before the response returns. The process manager correctly returns `process_id: None`, but the handler still emitted a `TerminalInteraction` event using the requested session id, causing clients to believe a dead process was still being polled. Fixes #23214. ## What changed - Suppress `TerminalInteraction` events for empty `write_stdin` polls once `response.process_id` is `None`. - Continue emitting interactions for non-empty stdin, even if that input causes the process to exit before the response returns. - Extend the unified exec integration test to assert completed empty polls do not emit terminal interactions. ## Verification - `cargo test -p codex-core --test all unified_exec_emits_one_begin_and_one_end_event` - `cargo test -p codex-core --test all unified_exec_emits_terminal_interaction_for_write_stdin` `cargo test -p codex-core` currently aborts in unrelated `agent::control::tests::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale` with a reproducible stack overflow.	2026-05-19 20:48:37 -07:00
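The emission rule described above reduces to a small predicate. This is a sketch under assumptions: the function name and the `u32` process id type are hypothetical, and the case of an empty poll against a still-running process is assumed to keep emitting as before.

```rust
/// Decide whether a `write_stdin` call should produce a `TerminalInteraction` event.
/// `response_process_id` is `None` when the process had already exited.
pub fn should_emit_terminal_interaction(stdin: &str, response_process_id: Option<u32>) -> bool {
    // Non-empty input is always a real interaction, even if it made the process exit.
    // An empty poll only counts while the process is still alive.
    !stdin.is_empty() || response_process_id.is_some()
}
```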
Ahmed Ibrahim	532b9c83ae	Move plugin and skill warmup into session startup (#23535 ) ## Why Plugin and skill loading is useful as warmup and early validation, but session startup does not need to wait for that work before it can continue building the session. Keeping it on the serial startup path adds avoidable latency to every fresh thread start. We still want invalid skill configurations to show up quickly, and we want the warmup to exercise the same plugin and skill manager caches that the normal turn path uses. ## What changed - moved plugin and skill warmup into the session startup async path instead of eagerly awaiting it on the serial setup path - kept the warmup using the session's resolved filesystem/environment context so skill loading still sees the right roots - preserved early skill-load error logging so broken skill configurations still surface during startup - left the per-turn plugin and skill loading path unchanged, so turns still use the normal cached managers ## Testing - Not run locally; relying on CI for validation.	2026-05-19 20:05:52 -07:00
viyatb-oai	c3faea0b09	feat: add permission profile list api (#23412 ) ## Why Clients need a typed permission-profile catalog instead of reconstructing that state from config internals. ## What changed - Added `permissionProfile/list` to the app-server v2 protocol with cursor pagination and optional `cwd`. - The list response includes built-in permission profiles plus config-defined `[permissions.<id>]` profiles from the effective config for the request context. - Permission profiles keep optional `description` metadata for display purposes. - App-server docs and schema fixtures are updated for the new RPC.	2026-05-20 02:42:56 +00:00
Michael Bolin	c58c84d6ee	test: fix multi-agent service tier assertion (#23576 ) ## Why `openai/codex#22169` added a regression test that expects an invalid child `service_tier` to be rejected, but the test used `Result::expect_err` on `SpawnAgentHandler::handle`. That requires the `Ok` type to implement `Debug`, and this handler returns `Box<dyn ToolOutput>`, so Bazel failed while compiling `codex-core` tests before it could run them. ## What changed - Capture the handler result and assert on `result.err()` instead of calling `expect_err`. - Keep the same `FunctionCallError::RespondToModel` assertion for the rejected service tier. ## Verification - `cargo test -p codex-core spawn_agent_role_service_tier_does_not_hide_invalid_spawn_request`	2026-05-19 16:47:20 -07:00
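The compile failure comes from `Result::expect_err` requiring the `Ok` type to implement `Debug`, which a boxed trait object without that bound does not. A reduced sketch of the workaround follows; the trait, error enum, handler, and the `"fast"` tier value are simplified stand-ins for the real Codex items.

```rust
/// Stand-in trait with no `Debug` bound, like the handler's boxed output type.
pub trait ToolOutput {
    fn text(&self) -> String;
}

struct Plain;

impl ToolOutput for Plain {
    fn text(&self) -> String {
        "ok".to_string()
    }
}

/// Stand-in error type; it derives `Debug` and `PartialEq` so it can be asserted on.
#[derive(Debug, PartialEq)]
pub enum FunctionCallError {
    RespondToModel(String),
}

/// Hypothetical handler: accepts one made-up tier and rejects everything else.
pub fn handle(service_tier: &str) -> Result<Box<dyn ToolOutput>, FunctionCallError> {
    if service_tier == "fast" {
        Ok(Box::new(Plain))
    } else {
        Err(FunctionCallError::RespondToModel(format!(
            "invalid service tier: {service_tier}"
        )))
    }
}

/// `handle(..).expect_err("..")` would not compile here because `Box<dyn ToolOutput>`
/// is not `Debug`. `Result::err` has no such bound, so it works for any `Ok` type.
pub fn rejected_error(service_tier: &str) -> Option<FunctionCallError> {
    handle(service_tier).err()
}
```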
Matthew Zeng	b019a678d8	Remove unused ARC monitor path (#23573 ) ## Summary - remove the unreachable ARC monitor path from MCP tool approval handling - delete the unused ARC monitor module/tests and trim the orphaned safety-monitor decision plumbing - keep `always allow` approvals on the existing auto-approval short-circuit without a dead monitor hop ## Testing - `cargo test -p codex-core mcp_tool_call` - `just fmt` - `just fix -p codex-core` - `git diff --check` ## Additional validation - Attempted `cargo test -p codex-core`; the library test target passed, then the integration target failed in this local environment. - The narrower MCP-focused rerun passed its unit coverage and only hit missing local `test_stdio_server` binaries in filtered integration cases.	2026-05-19 16:23:25 -07:00
adams-oai	d86352d520	Add CUA requirements subsection for locked computer use (#23555 ) Adds a new top-level "CUA" requirements section that lets enterprises disable specific features as needed.	2026-05-19 15:41:44 -07:00
Ahmed Ibrahim	c53da029bc	[codex] Honor role-defined spawn service tiers (#22169 ) ## Why Custom agent roles are ordinary config layers, so a role file can already express `service_tier` just like other config values. The spawned-agent tier path needs to preserve that effective role config and follow the same precedence pattern as model/reasoning. ## What changed - Apply an explicit spawn-time `service_tier` onto the child config before role application, so a role config layer can override it just like role-defined model/reasoning settings do. - Validate the final effective child tier after the final child model is known, while still falling back to the parent tier when no child tier survives. - Add focused integration coverage for both v1 and v2 proving role TOML loads a service tier, spawned children keep that role-configured tier, and a role tier wins over a conflicting spawn-time tier. ## Validation - `just fmt` - `git diff --check` - Local Rust tests not run, per repo guidance; CI should exercise the new coverage.	2026-05-19 22:40:41 +00:00
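The precedence described above (role config over spawn-time value, with the parent tier as fallback) can be sketched as an `Option` chain. The `ServiceTier` variants and the function name are hypothetical, and the validation against the final child model is left out.

```rust
/// Hypothetical tier values, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceTier {
    Fast,
    Flex,
}

/// Pick the child's effective tier: a role-defined tier wins over a spawn-time tier,
/// and the parent's tier is used only when neither is set.
pub fn effective_child_tier(
    parent: Option<ServiceTier>,
    spawn_time: Option<ServiceTier>,
    role: Option<ServiceTier>,
) -> Option<ServiceTier> {
    role.or(spawn_time).or(parent)
}
```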
Matthew Zeng	8335b56c33	Split plugin install discovery into list and request tools (#23372 ) ## Summary - Add `list_available_plugins_to_install` as the inventory step for plugin and connector install suggestions. - Slim `request_plugin_install` so it only handles the actual elicitation, instead of carrying the full discoverable list in its prompt. - Emit send-time telemetry when an install elicitation is dispatched, including requested tool identity in the event payload. - Emit install-result telemetry through `SessionTelemetry`, including tool type, user response action, and completion status. - Update registration and tests to cover the new two-step flow while keeping the existing `tool_suggest` feature gate unchanged. ## Testing - `just fmt` - `cargo test -p codex-tools` - `cargo test -p codex-core request_plugin_install` - `cargo test -p codex-core list_available_plugins_to_install` - `cargo test -p codex-core install_suggestion_tools_can_be_registered_without_search_tool` - `cargo test -p codex-otel manager_records_plugin_install_suggestion_metric` - `cargo test -p codex-otel manager_records_plugin_install_elicitation_sent_metric` - `just fix -p codex-core` - `just fix -p codex-tools` - `just fix -p codex-otel` - `cargo check -p codex-core`	2026-05-19 14:45:37 -07:00
starr-openai	5c43a64e2b	Make local environment optional in EnvironmentManager (#23369 ) ## Summary - make `EnvironmentManager` local environment/runtime paths optional - simplify constructor surface around snapshot materialization - rename local env accessors to `require_local_environment` / `try_local_environment` ## Validation - devbox Bazel build for touched crate surfaces - `//codex-rs/exec-server:exec-server-unit-tests` - `//codex-rs/app-server-client:app-server-client-unit-tests` - filtered touched `//codex-rs/core:core-unit-tests` cases	2026-05-19 12:55:34 -07:00
Abhinav	d661ab70ed	Add SubagentStart hook (#22782 ) # What `SubagentStart` runs once when Codex creates a thread-spawned subagent, before that child sends its first model request. Thread-spawned subagents use `SubagentStart` instead of the normal root-agent `SessionStart` hook. Configured handlers match on the subagent `agent_type`, using the same value passed to `spawn_agent`. When no agent type is specified, Codex uses the default agent type. Hook input includes the normal session-start fields plus: - `agent_id`: the child thread id. - `agent_type`: the resolved subagent type. `SubagentStart` may return `hookSpecificOutput.additionalContext`. That context is added to the child conversation before the first model request. # Lifecycle Scope Only thread-spawned subagents run `SubagentStart`. Internal/system subagents such as Review, Compact, MemoryConsolidation, and Other do not run normal `SessionStart` hooks and do not run `SubagentStart`. This avoids exposing synthetic matcher labels for internal implementation paths. Also, the `SessionStart` hook no longer fires for subagents; this matches the behavior of other coding agents' implementations. # Stack 1. This PR: add `SubagentStart`. 2. #22873: add `SubagentStop`. 3. #22882: add subagent identity to normal hook inputs.	2026-05-19 12:45:08 -07:00

3391 Commits