codex

mirror of https://github.com/openai/codex.git synced 2026-06-01 19:02:59 +00:00

Author	SHA1	Message	Date
Michael Bolin	4b55979755	permissions: remove cwd special path (#19841 ) ## Why The experimental `PermissionProfile` API had both `:cwd` and `:project_roots` special filesystem paths, which made the permission root ambiguous. This PR removes the unstable `current_working_directory` special path before the permissions API is stabilized, so callers use `:project_roots` for symbolic project-root access. ## What changed - Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol and app-server protocol models, plus regenerated app-server JSON/TypeScript schemas. - Replaces internal `:cwd` permission entries with `:project_roots` entries. - Keeps the existing cwd-update behavior for legacy-shaped workspace-write profiles, while removing the deleted `CurrentWorkingDirectory` case from that compatibility path. - Keeps `PermissionProfile::workspace_write()` as the reusable symbolic workspace-write helper, with docs noting that `:project_roots` entries resolve at enforcement time. - Updates app-server docs/examples and approval UI labeling to stop advertising `:cwd` as a permission token. ## Compatibility Persisted rollout items may contain the old `{"kind":"current_working_directory"}` tag from earlier experimental `permissionProfile` snapshots. This PR keeps that tag as a deserialize-only alias for `ProjectRoots { subpath: None }`, while continuing to serialize only the new `project_roots` tag. ## Follow-up This PR intentionally does not introduce an explicit project-root set on `SessionConfiguration` or runtime sandbox resolution. Today, the resolver still uses the active cwd as the single implicit project root. A follow-up should model project roots separately from tool cwd so `:project_roots` entries can resolve against the configured project roots, and resolve to no entries when there are no project roots. ## Verification - `cargo test -p codex-protocol permissions:: --lib` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-sandboxing -p codex-exec-server --lib` - `cargo test -p codex-core session_configuration_apply_ --lib` - `cargo test -p codex-app-server command_exec_permission_profile_project_roots_use_command_cwd --test all` - `cargo test -p codex-tui thread_read_session_state_does_not_reuse_primary_permission_profile --lib` - `cargo test -p codex-tui preset_matching_accepts_workspace_write_with_extra_roots --lib` - `cargo test -p codex-config --lib`	2026-04-27 13:41:27 -07:00
sayan-oai	85c1500569	fix: filter dynamic deferred tools from model_visible_specs (#19771 ) fixes #19486 ### Problem Right now dynamic deferred tools are filtered at normal-turn prompt building time, rather than upstream while building the `ToolRouter` itself. This causes issues because dynamic deferred tools are then wrongly included in the router's `model_visible_specs`, which is what the compaction request-building flow relies on. ### Fix Move the dynamic deferred tool filtering to `ToolRouter` creation time to solve this problem for every request that relies on `ToolRouter` for `model_visible_specs`, which solves the issue generically. ### Tests Added unit + integration tests to ensure dynamic deferred tools are omitted from `model_visible_specs` and compaction request respectively. Tested against live `/compact` endpoint; raw deferred dynamic tools without `tool_search` returned `400` (current bug), while the filtered payload (this fix) returns `200`.	2026-04-27 19:09:02 +00:00
jif-oai	bb83eec825	chore: split memories part 1 (#19818 ) Extract memories into 2 different crates	2026-04-27 16:01:05 +02:00
jif-oai	f8c527e529	multi_agent_v2: move thread cap into feature config (#19792 ) ## Why `features.multi_agent_v2.max_concurrent_threads_per_session` is meant to be the MultiAgentV2-specific session thread cap: it counts the root thread and all open subagent threads. The previous implementation kept this surface tied to `agents.max_threads`, which made it a global subagent-only cap and allowed the legacy setting to coexist with MultiAgentV2. ## What Changed - Added `max_concurrent_threads_per_session` to `[features.multi_agent_v2]` with default `4`. - Removed the `[agents] max_concurrent_threads_per_session` alias to `agents.max_threads`. - When MultiAgentV2 is enabled, reject `agents.max_threads` and derive the existing internal subagent slot limit as `max_concurrent_threads_per_session - 1`. - Regenerated `core/config.schema.json` and added coverage for the new config semantics. ## Result ``` ➜ codex git:(jif/clean-multi-agent-v2-config) codex -c features.multi_agent_v2.enabled=true -c features.multi_agent_v2.max_concurrent_threads_per_session=3 ╭────────────────────────────────────────────────────╮ │ >_ OpenAI Codex (v0.0.0) │ │ │ │ model: gpt-5.5 xhigh fast /model to change │ │ directory: ~/code/codex │ ╰────────────────────────────────────────────────────╯ Tip: Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager. › Can you try to spawn 4 agents • I’ll try to start four lightweight agents at once and report exactly what the runtime accepts. • Spawned Russell [no-apps] (gpt-5.5 xhigh) └ Spawn probe 1: reply briefly that you started, then wait for further instructions. Do not do any repo work. • Spawned Descartes [no-apps] (gpt-5.5 xhigh) └ Spawn probe 2: reply briefly that you started, then wait for further instructions. Do not do any repo work. • Agent spawn failed └ Spawn probe 3: reply briefly that you started, then wait for further instructions. Do not do any repo work. • Agent spawn failed └ Spawn probe 4: reply briefly that you started, then wait for further instructions. Do not do any repo work. • The runtime accepted the first two and rejected the next two with agent thread limit reached. I’m checking whether the two accepted probes have returned cleanly, then I’ll close them if needed. ``` --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-27 13:31:56 +02:00
Michael Bolin	a6ca39c630	permissions: derive legacy exec policies at boundaries (#19737 ) ## Why After config and requirements store canonical profiles, exec requests should not cache a derived `SandboxPolicy`. The cached legacy value can drift from the richer profile state, and most execution paths already have the filesystem and network runtime policies they need. ## What Changed - Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest` and `codex_core::sandboxing::ExecRequest`. - Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper for the Windows and legacy call sites that still need a `SandboxPolicy` projection. - Updates Windows filesystem override setup and unified exec policy serialization to derive that compatibility policy at the boundary. - Updates Unix escalation reruns and direct shell requests to reconstruct exec requests from `PermissionProfile` plus runtime filesystem/network policy, without carrying a cached legacy policy. - Adjusts sandboxing manager tests to assert the effective profile rather than the removed legacy field. ## Verification - `cargo check -p codex-config -p codex-core -p codex-sandboxing -p codex-app-server -p codex-cli -p codex-tui` - `cargo test -p codex-sandboxing manager` - `cargo test -p codex-core exec_server_params_use_env_policy_overlay_contract` - `cargo test -p codex-core unix_escalation` - `cargo test -p codex-core exec::tests` - `cargo test -p codex-core sandboxing::tests`	2026-04-26 22:11:49 -07:00
Michael Bolin	0ccd659b4b	permissions: store only constrained permission profiles (#19735 )	2026-04-26 20:59:58 -07:00
Michael Bolin	2cb8746457	permissions: remove core legacy policy round trips (#19394 ) ## Why Several execution paths still converted profile-backed permissions into `SandboxPolicy` and then rebuilt runtime permissions from that legacy shape. Those round trips are unnecessary after the preceding PRs and can lose split filesystem semantics. Core approval and escalation should carry the resolved profile directly. ## What Changed - Removes `sandbox_policy` from `ResolvedPermissionProfile`; the resolved permission object now carries the canonical `PermissionProfile` directly. - Updates exec-policy fallback, shell/unified-exec interception, escalation reruns, and related tests to pass profiles instead of legacy policies. - Removes legacy additional-permission merge helpers that built an effective `SandboxPolicy` before rebuilding runtime permissions. - Keeps legacy projections only at compatibility boundaries that still require `SandboxPolicy`, not in core permission computation. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394). * #19737 * #19736 * #19735 * #19734 * #19395 * __->__ #19394	2026-04-26 17:43:32 -07:00
Andrey Mishchenko	35bc6e3d01	Delete unused ResponseItem::Message.end_turn (#19605 ) This field is unused. Delete it.	2026-04-26 17:18:09 -07:00
Michael Bolin	dda8199b73	permissions: migrate approval and sandbox consumers to profiles (#19393 ) ## Why Runtime decisions should not infer permissions from the lossy legacy sandbox projection once `PermissionProfile` is available. In particular, `Disabled` and `External` need to remain distinct, and managed profiles with split filesystem or deny-read rules should not be collapsed before approval, network, safety, or analytics code makes decisions. ## What Changed - Changes managed network proxy setup and network approval logic to use `PermissionProfile` when deciding whether a managed sandbox is active. - Migrates patch safety, Guardian/user-shell approval paths, Landlock helper setup, analytics sandbox classification, and selected turn/session code to profile-backed permissions. - Validates command-level profile overrides against the constrained `PermissionProfile` rather than a strict `SandboxPolicy` round trip. - Preserves configured deny-read restrictions when command profiles are narrowed. - Adds coverage for profile-backed trust, network proxy/approval behavior, patch safety, analytics classification, and command-profile narrowing. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393). * #19395 * #19394 * __->__ #19393	2026-04-26 15:30:40 -07:00
pakrym-oai	9c3abcd46c	[codex] Move config loading into codex-config (#19487 ) ## Why Config loading had become split across crates: `codex-config` owned the config types and merge logic, while `codex-core` still owned the loader that assembled the layer stack. This change consolidates that responsibility in `codex-config`, so the crate that defines config behavior also owns how configs are discovered and loaded. To make that move possible without reintroducing the old dependency cycle, the shell-environment policy types and helpers that `codex-exec-server` needs now live in `codex-protocol` instead of flowing through `codex-config`. This also makes the migrated loader tests more deterministic on machines that already have managed or system Codex config installed by letting tests override the system config and requirements paths instead of reading the host's `/etc/codex`. ## What Changed - moved the config loader implementation from `codex-core` into `codex-config::loader` and deleted the old `core::config_loader` module instead of leaving a compatibility shim - moved shell-environment policy types and helpers into `codex-protocol`, then updated `codex-exec-server` and other downstream crates to import them from their new home - updated downstream callers to use loader/config APIs from `codex-config` - added test-only loader overrides for system config and requirements paths so loader-focused tests do not depend on host-managed config state - cleaned up now-unused dependency entries and platform-specific cfgs that were surfaced by post-push CI ## Testing - `cargo test -p codex-config` - `cargo test -p codex-core config_loader_tests::` - `cargo test -p codex-protocol -p codex-exec-server -p codex-cloud-requirements -p codex-rmcp-client --lib` - `cargo test --lib -p codex-app-server-client -p codex-exec` - `cargo test --no-run --lib -p codex-app-server` - `cargo test -p codex-linux-sandbox --lib` - `cargo shear` - `just bazel-lock-check` ## Notes - I did not chase unrelated full-suite failures outside the migrated loader surface. - `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive failures on this machine, and Windows CI still shows unrelated long-running/timeouting test noise outside the loader migration itself.	2026-04-26 15:10:53 -07:00
Michael Bolin	deaa307fb2	permissions: derive compatibility policies from profiles (#19392 ) ## Why After #19391, `PermissionProfile` and the split filesystem/network policies could still be stored in parallel. That creates drift risk: a profile can preserve deny globs, external enforcement, or split filesystem entries while a cached projection silently loses those details. This PR makes the profile the runtime source and derives compatibility views from it. ## What Changed - Removes stored filesystem/network sandbox projections from `Permissions` and `SessionConfiguration`; their accessors now derive from the canonical `PermissionProfile`. - Derives legacy `SandboxPolicy` snapshots from profiles only where an older API still needs that field. - Updates MCP connection and elicitation state to track `PermissionProfile` instead of `SandboxPolicy` for auto-approval decisions. - Adds semantic filesystem-policy comparison so cwd changes can preserve richer profiles while still recognizing equivalent legacy projections independent of entry ordering. - Updates config/session tests to assert profile-derived projections instead of parallel stored fields. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392). * #19395 * #19394 * #19393 * __->__ #19392	2026-04-26 15:06:42 -07:00
Michael Bolin	4d7ce3447d	permissions: make runtime config profile-backed (#19606 ) ## Why This supersedes #19391. During stack repair, GitHub marked #19391 as merged into a temporary stack branch rather than into `main`, so the runtime-config change needed a fresh PR. `PermissionProfile` is now the canonical permissions shape after #19231 because it can distinguish `Managed`, `Disabled`, and `External` enforcement while also carrying filesystem rules that legacy `SandboxPolicy` cannot represent cleanly. Core config and session state still needed to accept profile-backed permissions without forcing every profile through the strict legacy bridge, which rejected valid runtime profiles such as direct write roots. The unrelated CI/test hardening that previously rode along with this PR has been split into #19683 so this PR stays focused on the permissions model migration. ## What Changed - Adds `Permissions.permission_profile` and `SessionConfiguration.permission_profile` as constrained runtime state, while keeping `sandbox_policy` as a legacy compatibility projection. - Introduces profile setters that keep `PermissionProfile`, split filesystem/network policies, and legacy `SandboxPolicy` projections synchronized. - Uses a compatibility projection for requirement checks and legacy consumers instead of rejecting profiles that cannot round-trip through `SandboxPolicy` exactly. - Updates config loading, config overrides, session updates, turn context plumbing, prompt permission text, sandbox tags, and exec request construction to carry profile-backed runtime permissions. - Preserves configured deny-read entries and `glob_scan_max_depth` when command/session profiles are narrowed. - Adds `PermissionProfile::read_only()` and `PermissionProfile::workspace_write()` presets that match legacy defaults. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606). * #19395 * #19394 * #19393 * #19392 * __->__ #19606	2026-04-26 13:29:54 -07:00
viyatb-oai	9aaa5d9358	[codex] Bypass managed network for escalated exec (#19595 ) ## Why `sandbox_permissions = "require_escalated"` is treated as an explicit request to approve the command and run it outside the filesystem/platform sandbox. Before this change, shell and unified exec still registered managed network approval context and could inject Codex-managed proxy state into the child process, which meant an approved escalated command could still hit a second network approval path. This PR makes that escalation boundary consistent: once a command is explicitly approved to run outside the sandbox, Codex does not also route that process through the managed network proxy. ## Security impact Command/filesystem sandbox approval now implies network approval for that command. If an untrusted command or script is allowed to run with `require_escalated`, its network calls are unsandboxed: Codex-managed network allowlists and denylists are not respected for that process, so the command can exfiltrate any data it can read. ## What changed - Skip managed network approval specs for `SandboxPermissions::RequireEscalated`. - Pass `network: None` into shell, zsh-fork shell, and unified exec sandbox preparation for explicitly escalated requests. - Strip Codex-managed proxy environment variables when `CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env when the Codex marker is absent. - Add regression coverage for the prepared exec request so the old behavior cannot silently reappear. ## Verification - `cargo test -p codex-core explicit_escalation` - `cargo clippy -p codex-core --all-targets -- -D warnings`	2026-04-25 23:23:58 +00:00
Eric Traut	4167628622	Add goal core runtime (4 / 5) (#18076 ) Adds the core runtime behavior for active goals on top of the model tools from PR 3. ## Why A long-running goal should be a core runtime concern, not something every client has to implement. Core owns the turn lifecycle, tool completion boundaries, interruptions, resume behavior, and token usage, so it is the right place to account progress, enforce budgets, and decide when to continue work. ## What changed - Centralized goal lifecycle side effects behind `Session::goal_runtime_apply(GoalRuntimeEvent::...)`. - Starts goal continuation turns only when the session is idle; pending user input and mailbox work take priority. - Accounts token and wall-clock usage at turn, tool, mutation, interrupt, and resume boundaries; `get_thread_goal` remains read-only. - Preserves sub-second wall-clock remainder across accounting boundaries so long-running goals do not drift downward over time. - Treats token budget exhaustion as a soft stop by marking the goal `budget_limited` and injecting wrap-up steering instead of aborting the active turn. - Suppresses budget steering when `update_goal` marks a goal complete. - Pauses active goals on interrupt and auto-reactivates paused goals when a thread resumes outside plan mode. - Suppresses repeated automatic continuation when a continuation turn makes no tool calls. - Added continuation and budget-limit prompt templates. ## Verification - Added focused core coverage for continuation scheduling, accounting boundaries, budget-limit steering, completion accounting, interrupt pause behavior, resume auto-activation, and wall-clock remainder accounting.	2026-04-24 21:16:00 -07:00
Eric Traut	32ace07ac5	Add goal model tools (3 / 5) (#18075 ) Adds the model-facing goal tools on top of the app-server API from PR 2. ## Why Once goals are persisted and exposed to clients, the model needs a small, constrained tool surface for goal workflows. The tool contract should let the model inspect goals, create them only when explicitly requested, and mark them complete without giving it broad control over user/runtime-owned state. ## What changed - Added `get_goal`, `create_goal`, and `update_goal` tool specs behind the `goals` feature flag. - Added core goal tool handlers that validate objectives and token budgets before mutating persisted state. - Constrained `create_goal` to create only when no goal exists, with optional `token_budget` only when a budget is explicitly provided. - Tightened the `create_goal` instructions so the model does not infer goals from ordinary task requests. - Constrained `update_goal` to expose only goal completion; pause, resume, clear, and budget-limited transitions remain user- or runtime-controlled. - Registered the goal tools in the tool registry and kept them out of review contexts where they should not appear. ## Verification - Added tool-registry coverage for feature gating and tool availability. - Added core session tests for create/get/update behavior, duplicate goal rejection, budget validation, and completion-only updates.	2026-04-24 20:54:40 -07:00
Curtis 'Fjord' Hawthorne	8a559e7938	Remove js_repl feature (#19410 )	2026-04-24 17:49:29 -07:00
Michael Bolin	789f387982	permissions: remove legacy read-only access modes (#19449 ) ## Why `ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`: `FullAccess` meant the historical read-only/workspace-write modes could read the full filesystem, while `Restricted` tried to carry partial readable roots. The partial-read model now belongs in `FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on `SandboxPolicy` makes every legacy projection reintroduce lossy read-root bookkeeping and creates unnecessary noise in the rest of the permissions migration. This PR makes the legacy policy model narrower and explicit: `SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent the old full-read sandbox modes only. Split readable roots, deny-read globs, and platform-default/minimal read behavior stay in the runtime permissions model. ## What changed - Removes `ReadOnlyAccess` from `codex_protocol::protocol::SandboxPolicy`, including the generated `access` and `readOnlyAccess` API fields. - Updates legacy policy/profile conversions so restricted filesystem reads are represented only by `FileSystemSandboxPolicy` / `PermissionProfile` entries. - Keeps app-server v2 compatible with legacy `fullAccess` read-access payloads by accepting and ignoring that no-op shape, while rejecting legacy `restricted` read-access payloads instead of silently widening them to full-read legacy policies. - Carries Windows sandbox platform-default read behavior with an explicit override flag instead of depending on `ReadOnlyAccess::Restricted`. - Refreshes generated app-server schema/types and updates tests/docs for the simplified legacy policy shape. ## Verification - `cargo check -p codex-app-server-protocol --tests` - `cargo check -p codex-windows-sandbox --tests` - `cargo test -p codex-app-server-protocol sandbox_policy_` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449). * #19395 * #19394 * #19393 * #19392 * #19391 * __->__ #19449	2026-04-24 17:16:58 -07:00
Michael Bolin	13e0ec1614	permissions: make legacy profile conversion cwd-free (#19414 ) ## Why The profile conversion path still required a `cwd` even when it was only translating a legacy `SandboxPolicy` into a `PermissionProfile`. That made profile producers invent an ambient `cwd`, which is exactly the anchoring we are trying to remove from permission-profile data. A legacy workspace-write policy can be represented symbolically instead: `:cwd = write` plus read-only `:project_roots` metadata subpaths. This PR creates that cwd-free base so the rest of the stack can stop threading cwd through profile construction. Callers that actually need a concrete runtime filesystem policy for a specific cwd still have an explicitly named cwd-bound conversion. ## What Changed - `PermissionProfile::from_legacy_sandbox_policy` now takes only `&SandboxPolicy`. - `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the symbolic, cwd-free projection for profiles. - The old concrete projection is retained as `FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for runtime/boundary code that must materialize legacy cwd behavior. - Workspace-write profiles preserve `CurrentWorkingDirectory` and `ProjectRoots` special entries instead of materializing cwd into absolute paths. ## Verification - `cargo check -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p codex-sandboxing -p codex-linux-sandbox -p codex-analytics --tests` - `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p codex-sandboxing -p codex-linux-sandbox -p codex-analytics` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414). * #19395 * #19394 * #19393 * #19392 * #19391 * __->__ #19414	2026-04-24 13:42:05 -07:00
jif-oai	28742866c7	Add agents.interrupt_message for interruption markers (#19351 ) ## Why Agent interruptions currently always persist a model-visible interrupted-turn marker before emitting `TurnAborted`. That marker is useful by default because it gives the next model turn context about a deliberately interrupted task, but some deployments need to suppress that history injection entirely while still keeping the client-visible interruption event. ## What changed - Add `[agents] interrupt_message = false` to disable the model-visible interrupted-turn marker. - Resolve the setting into `Config::agent_interrupt_message_enabled`, defaulting to `true` so existing behavior is unchanged. - Apply the setting to both live interrupted turns and interrupted fork snapshots. - Keep emitting `TurnAborted` even when the history marker is disabled. - Regenerate `core/config.schema.json` for the new `agents.interrupt_message` field. ## Testing - `cargo test -p codex-core load_config_resolves_agent_interrupt_message -- --nocapture` - `cargo test -p codex-core disabled_interrupted_fork_snapshot_appends_only_interrupt_event -- --nocapture` - `cargo test -p codex-core multi_agent_v2_interrupted_marker_uses_developer_input_message -- --nocapture` - `cargo test -p codex-core multi_agent_v2_followup_task_can_disable_interrupted_marker -- --nocapture` - `cargo test -p codex-core multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message -- --nocapture` - `cargo check -p codex-core`	2026-04-24 16:02:45 +02:00
jif-oai	deb4509302	feat: surface multi-agent thread limit in spawn description (#19360 ) ## Summary - Thread `agent_max_threads` into `ToolsConfig` and `SpawnAgentToolOptions`. - Render the configured `max_concurrent_threads_per_session` value in the MultiAgentV2 `spawn_agent` description. - Cover the description text in `codex-tools` unit tests and `codex-core` tool spec tests. ## Validation - `just fmt` - `cargo test -p codex-tools` - `cargo test -p codex-core spawn_agent_description` - `git diff --check` ## Notes - `cargo test -p codex-core` was also attempted, but unrelated environment-sensitive tests failed with the active local environment. Examples: approvals reviewer defaults observed `AutoReview` instead of `User`, request-permissions event tests did not emit events, and proxy-env tests saw `http://127.0.0.1:50604` from the active proxy environment. Co-authored-by: Codex <noreply@openai.com>	2026-04-24 15:13:54 +02:00
jif-oai	120aa07d81	Make MultiAgentV2 interruption markers assistant-authored (#19124 ) ## Why `MultiAgentV2` follow-up messages are delivered to agents as assistant-authored `InterAgentCommunication` envelopes. When `followup_task` used `interrupt: true`, the interrupted-turn guidance was still persisted as a contextual user message, so model-visible history made a system-generated interruption boundary look user-authored. This keeps interruption guidance consistent with the rest of the v2 inter-agent message stream while preserving the legacy marker shape for non-v2 sessions. ## What changed - Make `interrupted_turn_history_marker` feature-aware. - Record the interrupted-turn marker as an assistant `OutputText` message when `Feature::MultiAgentV2` is enabled. - Keep the existing user contextual fragment for non-v2 sessions. - Apply the same feature-aware marker to interrupted fork snapshots. - Add coverage for the live `followup_task` interrupt path and the helper-level v2 marker shape. ## Testing - `cargo test -p codex-core multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message -- --nocapture` - `cargo test -p codex-core multi_agent_v2_interrupted_marker_uses_assistant_output_message -- --nocapture` - `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`	2026-04-24 13:39:26 +02:00
sayan-oai	e083b6c757	chore: apply truncation policy to unified_exec (#19247 ) we were not respecting turn's `truncation_policy` to clamp output tokens for `unified_exec` and `write_stdin`. this meant truncation was only being applied by `ContextManager` before the output was stored in-memory (so it _was_ being truncated from model-visible context), but the full output was persisted to rollout on disk. now we respect that `truncation_policy` and `ContextManager`-level truncation remains a backup. ### Tests added tests, tested locally.	2026-04-24 00:17:39 -07:00
Eric Traut	ac8c9fc49c	Reject unsupported js_repl image MIME types (#19292 ) ## Summary `codex.emitImage` accepted arbitrary image MIME types for byte payloads and data URLs. That allowed a value like `image/rgba` to be wrapped as an `input_image`, even though it is not a supported encoded image format, so the invalid image could reach the model-input path and trigger output sanitization. This results in a panic in debug builds because the output sanitization is meant as a final safety net, not a primary means of rejecting invalid image types. I've hit this case multiple times when executing certain long-running tasks. This PR rejects unsupported image MIME types before they are emitted from `js_repl`. ## Changes - Validate `codex.emitImage({ bytes, mimeType })` in the JS kernel so only encoded PNG, JPEG, WebP, or GIF payloads are accepted. - Apply the same MIME allowlist to direct image data URLs, including the Rust host-side validation path. - Clarify the JS REPL instructions so agents know byte payloads must already be encoded as PNG/JPEG/WebP/GIF.	2026-04-24 00:14:51 -07:00
Michael Bolin	4816b89204	permissions: make profiles represent enforcement (#19231 ) ## Why `PermissionProfile` is becoming the canonical permissions abstraction, but the old shape only carried optional filesystem and network fields. It could describe allowed access, but not who is responsible for enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy when profiles were exported, cached, or round-tripped through app-server APIs. The important model change is that active permissions are now a disjoint union over the enforcement mode. Conceptually: ```rust pub enum PermissionProfile { Managed { file_system: FileSystemSandboxPolicy, network: NetworkSandboxPolicy, }, Disabled, External { network: NetworkSandboxPolicy, }, } ``` This distinction matters because `Disabled` means Codex should apply no outer sandbox at all, while `External` means filesystem isolation is owned by an outside caller. Those are not equivalent to a broad managed sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an inner sandbox may require the outer Codex layer to use no sandbox rather than a permissive one. ## How Existing Modeling Maps Legacy `SandboxPolicy` remains a boundary projection, but it now maps into the higher-fidelity profile model: - `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed` with restricted filesystem entries plus the corresponding network policy. - `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving the “no outer sandbox” intent instead of treating it as a lax managed sandbox. - `ExternalSandbox { network_access }` maps to `PermissionProfile::External { network }`, preserving external filesystem enforcement while still carrying the active network policy. - Split runtime policies that legacy `SandboxPolicy` cannot faithfully express, such as managed unrestricted filesystem plus restricted network, stay `Managed` instead of being collapsed into `ExternalSandbox`. - Per-command/session/turn grants remain partial overlays via `AdditionalPermissionProfile`; full `PermissionProfile` is reserved for complete active runtime permissions. ## What Changed - Change active `PermissionProfile` into a tagged union: `managed`, `disabled`, and `external`. - Keep partial permission grants separate with `AdditionalPermissionProfile` for command/session/turn overlays. - Represent managed filesystem permissions as either `restricted` entries or `unrestricted`; `glob_scan_max_depth` is non-zero when present. - Preserve old rollout compatibility by accepting the pre-tagged `{ network, file_system }` profile shape during deserialization. - Preserve fidelity for important edge cases: `DangerFullAccess` round-trips as `disabled`, `ExternalSandbox` round-trips as `external`, and managed unrestricted filesystem + restricted network stays managed instead of being mistaken for external enforcement. - Preserve configured deny-read entries and bounded glob scan depth when full profiles are projected back into runtime policies, including unrestricted replacements that now become `:root = write` plus deny entries. - Regenerate the experimental app-server v2 JSON/TypeScript schema and update the `command/exec` README example for the tagged `permissionProfile` shape. ## Compatibility Legacy `SandboxPolicy` remains available at config/API boundaries as the compatibility projection. Existing rollout lines with the old `PermissionProfile` shape continue to load. The app-server `permissionProfile` field is experimental, so its v2 wire shape is intentionally updated to match the higher-fidelity model. ## Verification - `just write-app-server-schema` - `cargo check --tests` - `cargo test -p codex-protocol permission_profile` - `cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable` - `cargo test -p codex-app-server-protocol permission_profile_file_system_permissions` - `cargo test -p codex-app-server-protocol serialize_client_response` - `cargo test -p codex-core session_configured_reports_permission_profile_for_external_sandbox` - `just fix` - `just fix -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-core` - `just fix -p codex-app-server`	2026-04-23 23:02:18 -07:00
starr-openai	49fb25997f	Add sticky environment API and thread state (#18897 ) ## Summary - add sticky environment selections to app-server v2 thread/start and turn/start request flow - carry thread-level selections through core session/thread state - add app-server coverage for sticky selections and turn overrides ## Stack 1. This PR: API and thread persistence 2. #18898: config.toml named environment loading 3. #18899: downstream tool/runtime consumers ## Validation - Not run locally; split only. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-23 18:57:13 -07:00
cassirer-openai	e3c8720a99	[rollout_trace] Add debug trace reduction command (#18880 ) ## Summary Adds the debug CLI entry point for reducing recorded rollout traces. This gives developers a direct way to inspect whether the emitted trace stream reduces into the expected conversation/runtime model. ## Stack This is PR 5/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This PR is intentionally last: it depends on the trace crate, core recorder, runtime/tool events, and session/agent edge data all existing. The command should remain a debug/developer tool and avoid adding new runtime behavior. The useful review question is whether the CLI exposes the reducer in the smallest practical way for local inspection without turning the debug command into a supported user-facing workflow.	2026-04-24 01:56:48 +00:00
Michael Bolin	9c0eced391	shell-escalation: carry resolved permission profiles (#18287 ) ## Why Shell escalation still has adapter code that expects a legacy sandbox policy, but command approvals should carry the resolved `PermissionProfile` so callers can reason about the granted permissions canonically. ## What changed This introduces profile-shaped resolved escalation permissions while retaining the derived legacy sandbox policy for the Unix escalation adapter. It updates approval types, the escalation server protocol, and tests that inspect escalated command permissions. ## Verification - `cargo test -p codex-core --test all handle_container_exec_ -- --nocapture` - `cargo test -p codex-core --test all handle_sandbox_ -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18287). * #18288 * __->__ #18287	2026-04-23 12:46:19 -07:00
cassirer-openai	6d09b6752d	[rollout_trace] Trace tool and code-mode boundaries (#18878 ) ## Summary Extends rollout tracing across tool dispatch and code-mode runtime boundaries. This records canonical tool-call lifecycle events and links code-mode execution/wait operations back to the model-visible calls that caused them. ## Stack This is PR 3/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This PR is about attribution. Reviewers should focus on whether direct tool calls, code-mode-originated tool calls, waits, outputs, and cancellation boundaries are recorded with enough source information for deterministic reduction without coupling the reducer to live runtime internals. The stack remains valid after this layer: tool and code-mode traces reduce through the existing crate model, while the broader session and multi-agent relationships are added in the next PR.	2026-04-23 12:22:11 -07:00
xl-openai	198eddd25d	Move marketplace add/remove and startup sync out of core. (#19099 ) Move more things to core-plugins. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-23 11:27:17 -07:00
jif-oai	a2f868c9d6	feat: drop spawned-agent context instructions (#19127 ) ## Why MultiAgentV2 children should not receive an extra model-visible developer fragment just because they were spawned. The parent/configured developer instructions should carry through normally, but the dedicated `<spawned_agent_context>` block is no longer desired. ## What changed - Removed the `SpawnAgentInstructions` context fragment and its `<spawned_agent_context>` wrapper. - Stopped appending spawned-agent instructions in `codex-rs/core/src/tools/handlers/multi_agents_v2/spawn.rs`. - Updated subagent notification coverage to assert inherited parent developer instructions without expecting the spawned-agent wrapper. ## Verification - `cargo test -p codex-core --test all spawned_multi_agent_v2_child_inherits_parent_developer_context -- --nocapture` - `cargo test -p codex-core --test all skills_toggle_skips_instructions_for_parent_and_spawned_child -- --nocapture` - `cargo test -p codex-core --test all subagent_notifications -- --nocapture`	2026-04-23 18:54:45 +02:00
Abhinav	305825abd9	Support MCP tools in hooks (#18385 ) ## Summary Lifecycle hooks currently treat `PreToolUse`, `PostToolUse`, and `PermissionRequest` as Bash-only flows - hook schema constrains `tool_name` to `Bash` - hook input assumes a command-shaped `tool_input` - core hook dispatch path passes only shell command strings That means hooks cannot target MCP tools even though MCP tool names are model-visible and stable This change generalizes those hook paths so they can match and receive payloads for MCP tools while preserving the existing Bash behavior. ## Reviewer Notes I think these are the key files - `codex-rs/core/src/tools/handlers/mcp.rs` - `codex-rs/core/src/mcp_tool_call.rs` Otherwise the changes across apply_patch, shell, and unified_exec are mainly to rewire everything to be `tool_input` based instead of just `command` so that it'll make sense for MCP tools. ## Changes - Allow `PreToolUse`, `PostToolUse`, and `PermissionRequest` hook inputs to carry arbitrary `tool_name` and `tool_input` values instead of hard-coding `Bash` and command-only payloads. - Add MCP hook payload support through `McpHandler`, using the model-visible tool name from `ToolInvocation` and the raw MCP arguments as `tool_input`. - Include MCP tool responses in `PostToolUse` by serializing `McpToolOutput` into the hook response payload. - Run `PermissionRequest` hooks for MCP approval requests after remembered approval checks and before falling back to user-facing MCP elicitation. - Preserve exact matching for literal hook matchers like `Bash` and `mcp__memory__create_entities`, while keeping regex matcher support for patterns like `mcp__memory__.` and `mcp__.__write.*`. --------- Co-authored-by: Andrei Eternal <eternal@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-04-23 07:33:57 +00:00
Dylan Hurd	5e71da1424	feat(request-permissions) approve with strict review (#19050 ) ## Summary Allow the user to approve a request_permissions_tool request with the condition that all commands in the rest of the turn are reviewed by guardian, regardless of sandbox status. ## Testing - [x] Added unit tests - [x] Ran locally	2026-04-23 01:56:32 +00:00
Michael Bolin	3cc3763e6c	core: box multi-agent wrapper futures (#19059 ) ## Why While debugging the Windows stack overflows we saw in [#13429](https://github.com/openai/codex/pull/13429) and then again in [#18893](https://github.com/openai/codex/pull/18893), I hit another overflow in `tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`. That test drives the legacy multi-agent spawn / close / resume path. The behavior was fine, but several thin async wrappers were still inlining much larger `AgentControl` futures into their callers, which was enough to overflow the default Windows stack. ## What - Box the thin `AgentControl` wrappers around `spawn_agent_internal`, `resume_single_agent_from_rollout`, and `shutdown_agent_tree`. - Box the corresponding legacy `multi_agents` handler calls in `spawn`, `resume_agent`, and `close_agent`. - Keep behavior unchanged while reducing future size on this call path so the Windows test no longer overflows its stack. ## Testing - `cargo test -p codex-core --lib tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed -- --exact --nocapture` - `cargo test -p codex-core` (this still hit unrelated local integration-test failures because `codex.exe` / `test_stdio_server.exe` were not present in this shell; the relevant unit tests passed)	2026-04-22 17:48:13 -07:00
Andrei Eternal	eed0e07825	hooks: emit Bash PostToolUse when exec_command completes via write_stdin (#18888 ) Fixes #16246. ## Why `exec_command` already emits `PreToolUse`, but long-running unified exec commands that finish on a later `write_stdin` poll could miss the matching `PostToolUse`. That left the Bash hook lifecycle inconsistent, broke expectations around `tool_use_id` and `tool_input.command`, and meant `PostToolUse` block/replacement feedback could fail to replace the final session output before it reached model context. This keeps the fix scoped to the `exec_command` / `write_stdin` lifecycle. Broader non-Bash hook expansion is still out of scope here and remains tracked separately in #16732. ## What changed - Compute and store `PostToolUsePayload` while handlers still have access to their concrete output type, and carry `tool_use_id` through that payload. - Preserve the original hook-facing `exec_command` string through unified exec state (`ExecCommandRequest`, `ProcessEntry`, `PreparedProcessHandles`, and `ExecCommandToolOutput`) via `hook_command`, and remove the now-unused `session_command` output metadata. - Emit exactly one Bash `PostToolUse` for long-running `exec_command` sessions when a later `write_stdin` poll observes final completion, using the original `exec_command` call id and hook-facing command. - Keep one-shot `exec_command` behavior aligned with the same payload construction, including interactive completions that return a final result directly. - Apply `PostToolUse` block/replacement feedback before the final `write_stdin` completion output is sent back to the model. - Keep `write_stdin` itself out of `PreToolUse` matching so it continues to act as transport/polling for the original Bash tool call. - Restore plain matcher behavior for tool-name matchers such as `Bash` and `Edit\|Write`, while still treating patterns with regex characters (for example `mcp__.*`) as regexes. - Add unit coverage for unified exec payload construction and parallel session separation, plus a core integration regression that verifies a blocked `PostToolUse` replaces the final `write_stdin` output in model context. ## Testing - `cargo test -p codex-hooks` - `cargo test -p codex-core post_tool_use_payload` - `cargo test -p codex-core post_tool_use_blocks_when_exec_session_completes_via_write_stdin`	2026-04-22 17:14:22 -07:00
viyatb-oai	2d73bac45f	feat: add guardian network approval trigger context (#18197 ) ## Summary Give guardian network-access reviews the command context that triggered a managed-network approval. The prompt JSON now includes the originating tool call id, tool name, command argv, cwd, sandbox permissions, additional permissions, justification, and tty state when a single active tool call can be attributed. The implementation keeps the trigger shape canonical by serializing `GuardianNetworkAccessTrigger` directly and lets each runtime build that trigger from its `ToolCtx`. Non-guardian approval prompts avoid cloning the full trigger payload. ## UX changes Guardian network-access reviews now include a `trigger` object that explains what command caused the network approval. Instead of seeing only the requested host, the guardian reviewer can also see the originating tool call, argv, working directory, sandbox mode, justification, and tty state. Example payload the guardian reviewer can see: ```json { "tool": "network_access", "target": "https://api.github.com:443", "host": "api.github.com", "protocol": "https", "port": 443, "trigger": { "callId": "call_abc123", "toolName": "shell", "command": ["gh", "api", "/repos/openai/codex/pulls/18197"], "cwd": "/workspace/codex", "sandboxPermissions": "require_escalated", "justification": "Fetch PR metadata from GitHub.", "tty": false } } ``` The network review itself remains scoped to the network decision: `target_item_id` stays `null`. `trigger.callId` is attribution context only, so clients can still distinguish network reviews from item-targeted command reviews. ## Verification - Added coverage for serializing network trigger context in guardian approval JSON. - Added regression coverage that network guardian reviews do not reuse `trigger.callId` as `target_item_id`. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-22 14:00:53 -07:00
jif-oai	639382609f	fix: wait_agent timeout for queued mailbox mail (#18968 ) ## Why `wait_agent` can be called while mailbox mail is already pending. The previous implementation subscribed for future mailbox sequence changes and then waited for the next notification. If the mail was queued before that wait started, no new notification arrived, so the tool could sit until `timeout_ms` even though mail was ready to deliver. ## What Changed - Added `Session::has_pending_mailbox_items()` for checking pending mailbox mail through the session API. - Updated `multi_agents_v2::wait` to return immediately when pending mailbox mail already exists before sleeping on a new mailbox sequence update. - Reworked the regression coverage in `multi_agents_tests.rs` so already queued mailbox mail must wake `wait_agent` promptly. Relevant code: - [`wait_agent` pending-mail check](`aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_v2/wait.rs (L55-L60)`) - [`Session::has_pending_mailbox_items`](`aa8ca06e83/codex-rs/core/src/session/mod.rs (L2979-L2981)`) - [`multi_agent_v2_wait_agent_returns_for_already_queued_mail`](`aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_tests.rs (L2854)`) ## Verification - `cargo test -p codex-core multi_agent_v2_wait_agent_returns_for_already_queued_mail`	2026-04-22 11:16:17 +01:00
rhan-oai	213b17b7a3	[codex-analytics] guardian review TTFT plumbing and emission (#17696 ) ## Why Guardian analytics includes time-to-first-token, but the Guardian reviewer runs as a normal Codex session and `TurnCompleteEvent` did not expose TTFT. The timing needs to flow through the standard turn-completion protocol so Guardian review analytics can consume the same value as the rest of the session machinery. ## What changed Adds optional `time_to_first_token_ms` to `TurnCompleteEvent` and populates it from `TurnTiming`. The value is carried through app-server thread history, rollout reconstruction, TUI/app-server adapters, and Guardian review session handling. Guardian review analytics now captures TTFT from the reviewer turn-complete event when available. Existing tests and fixtures are updated to set the new optional field to `None` where TTFT is not relevant. ## Verification - `cargo clippy -p codex-tui --tests -- -D warnings` - `cargo clippy -p codex-core --lib --tests -- -D warnings` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17696). * __->__ #17696 * #17695 * #17693 * #18278 * #18953	2026-04-22 01:52:48 -07:00
Michael Bolin	36f8bb4ffa	exec-server: carry filesystem sandbox profiles (#18276 ) ## Why The exec-server still needs platform sandbox inputs, but the migration should preserve the `PermissionProfile` that produced them. Keeping only the derived legacy sandbox map would keep `SandboxPolicy` as the effective abstraction and would make full-disk vs. restricted profiles harder to preserve as the permissions stack starts round-tripping profiles. `PermissionProfile` entries can also be cwd-sensitive (`:cwd`, `:project_roots`, relative globs), so the exec-server must carry the request sandbox cwd instead of resolving those entries against the long-lived exec-server process cwd. ## What changed `FileSystemSandboxContext` now carries `permissions: PermissionProfile` plus an optional `cwd`: - removed `sandboxPolicy`, `sandboxPolicyCwd`, `fileSystemSandboxPolicy`, and `additionalPermissions` - added `permissions` and `cwd` - kept the platform knobs `windowsSandboxLevel`, `windowsSandboxPrivateDesktop`, and `useLegacyLandlock` Core turn and apply-patch paths populate the context from the active runtime permissions and request cwd. Exec-server derives platform `SandboxPolicy`/`FileSystemSandboxPolicy` at the filesystem boundary, adds helper runtime reads there, and rejects cwd-dependent profiles that arrive without a cwd. The legacy `FileSystemSandboxContext::new(SandboxPolicy)` constructor now preserves the old workspace-write conversion semantics for compatibility tests/callers. ## Verification - `cargo test -p codex-exec-server` - `cargo test -p codex-exec-server sandbox_cwd -- --nocapture` - `cargo test -p codex-exec-server sandbox_context_new_preserves_legacy_workspace_write_read_only_subpaths -- --nocapture` - `cargo test -p codex-core --lib file_system_sandbox_context_uses_active_attempt -- --nocapture`	2026-04-21 20:22:28 -07:00
Felipe Coury	09ebc34f17	fix(core): emit hooks for apply_patch edits (#18391 ) Fixes https://github.com/openai/codex/issues/16732. ## Why `apply_patch` is Codex's primary file edit path, but it was not emitting `PreToolUse` or `PostToolUse` hook events. That meant hook-based policy, auditing, and write coordination could observe shell commands while missing the actual file mutation performed by `apply_patch`. The issue also exposed that the hook runtime serialized command hook payloads with `tool_name: "Bash"` unconditionally. Even if `apply_patch` supplied hook payloads, hooks would either fail to match it directly or receive misleading stdin that identified the edit as a Bash tool call. ## What Changed - Added `PreToolUse` and `PostToolUse` payload support to `ApplyPatchHandler`. - Exposed the raw patch body as `tool_input.command` for both JSON/function and freeform `apply_patch` calls. - Taught tool hook payloads to carry a handler-supplied hook-facing `tool_name`. - Preserved existing shell compatibility by continuing to emit `Bash` for shell-like tools. - Serialized the selected hook `tool_name` into hook stdin instead of hardcoding `Bash`. - Relaxed the generated hook command input schema so `tool_name` can represent tools other than `Bash`. ## Verification Added focused handler coverage for: - JSON/function `apply_patch` calls producing a `PreToolUse` payload. - Freeform `apply_patch` calls producing a `PreToolUse` payload. - Successful `apply_patch` output producing a `PostToolUse` payload. - Shell and `exec_command` handlers continuing to expose `Bash`. Added end-to-end hook coverage for: - A `PreToolUse` hook matching `^apply_patch$` blocking the patch before the target file is created. - A `PostToolUse` hook matching `^apply_patch$` receiving the patch input and tool response, then adding context to the follow-up model request. - Non-participating tools such as the plan tool continuing not to emit `PreToolUse`/`PostToolUse` hook events. Also validated manually with a live `codex exec` smoke test using an isolated temp workspace and temp `CODEX_HOME`. The smoke test confirmed that a real `apply_patch` edit emits `PreToolUse`/`PostToolUse` with `tool_name: "apply_patch"`, a shell command still emits `tool_name: "Bash"`, and a denying `PreToolUse` hook prevents the blocked patch file from being created.	2026-04-21 22:00:40 -03:00
starr-openai	1d4cc494c9	Add turn-scoped environment selections (#18416 ) ## Summary - add experimental turn/start.environments params for per-turn environment id + cwd selections - pass selections through core protocol ops and resolve them with EnvironmentManager before TurnContext creation - treat omitted selections as default behavior, empty selections as no environment, and non-empty selections as first environment/cwd as the turn primary ## Testing - ran `just fmt` - ran `just write-app-server-schema` - not run: unit tests for this stacked PR --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 17:48:33 -07:00
Michael Bolin	799e50412e	sandboxing: materialize cwd-relative permission globs (#18867 ) ## Why #18275 anchors session-scoped `:cwd` and `:project_roots` grants to the request cwd before recording them for reuse. Relative deny glob entries need the same treatment. Without anchoring, a stored session permission can keep a pattern such as `*/.env` relative, then reinterpret that deny against a later turn cwd. That makes the persisted profile depend on the cwd at reuse time instead of the cwd that was reviewed and approved. ## What changed `intersect_permission_profiles` now materializes retained `FileSystemPath::GlobPattern` entries against the request cwd, matching the existing materialization for cwd-sensitive special paths. Materialized accepted grants are now deduplicated before deny retention runs. This keeps the sticky-grant preapproval shape stable when a repeated request is merged with the stored grant and both `:cwd = write` and the materialized absolute cwd write are present. The preapproval check compares against the same materialized form, so a later request for the same cwd-relative deny glob still matches the stored anchored grant instead of re-prompting or rejecting. Tests cover both the storage path and the preapproval path: a session-scoped `:cwd = write` grant with `*/.env = none` is stored with both the cwd write and deny glob anchored to the original request cwd, cannot be reused from a later cwd, and remains preapproved when re-requested from the original cwd after merging with the stored grant. ## Verification - `cargo test -p codex-sandboxing policy_transforms` - `cargo test -p codex-core --lib relative_deny_glob_grants_remain_preapproved_after_materialization` - `cargo clippy -p codex-sandboxing --tests -- -D clippy::redundant_clone` - `cargo clippy -p codex-core --lib -- -D clippy::redundant_clone` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18867). * #18288 * #18287 * #18286 * #18285 * #18284 * #18283 * #18282 * #18281 * #18280 * #18279 * #18278 * #18277 * #18276 * __->__ #18867	2026-04-21 17:28:58 -07:00
jif-oai	15b8cde2a4	chore: default multi-agent v2 fork to all (#18873 ) Default sub-agents v2 to `all` for the fork mode	2026-04-21 21:54:58 +01:00
iceweasel-oai	8612714aa6	Add Windows sandbox unified exec runtime support (#15578 ) ## Summary This is the runtime/foundation half of the Windows sandbox unified-exec work. - add Windows sandbox `unified_exec` session support in `windows-sandbox-rs` for both: - the legacy restricted-token backend - the elevated runner backend - extend the PTY/process runtime so driver-backed sessions can support: - stdin streaming - stdout/stderr separation - exit propagation - PTY resize hooks - add Windows sandbox runtime coverage in `codex-windows-sandbox` / `codex-utils-pty` This PR does not enable Windows sandbox `UnifiedExec` for product callers yet because hooking this up to app-server comes in the next PR. Windows sandbox advertising is intentionally kept aligned with `main`, so sandboxed Windows callers still fall back to `ShellCommand`. This PR isolates the runtime/session layer so it can be reviewed independently from product-surface enablement. --------- Co-authored-by: jif-oai <jif@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-04-21 10:44:49 -07:00
Michael Bolin	f8562bd47b	sandboxing: intersect permission profiles semantically (#18275 ) ## Why Permission approval responses must not be able to grant more access than the tool requested. Moving this flow to `PermissionProfile` means the comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and cwd-relative special paths such as `:cwd` and `:project_roots` must stay anchored to the turn that produced the request. ## What changed This implements semantic `PermissionProfile` intersection in `codex-sandboxing` for file-system and network permissions. The intersection accepts narrower path grants, rejects broader grants, preserves deny-read carve-outs and glob scan depth, and materializes cwd-dependent special-path grants to absolute paths before they can be recorded for reuse. The request-permissions response paths now use that intersection consistently. App-server captures the request turn cwd before waiting for the client response, includes that cwd in the v2 approval params, and core stores the requested profile plus cwd for direct TUI/client responses and Guardian decisions before recording turn- or session-scoped grants. The TUI app-server bridge now preserves the app-server request cwd when converting permission approval params into core events. ## Verification - `cargo test -p codex-sandboxing intersect_permission_profiles -- --nocapture` - `cargo test -p codex-app-server request_permissions_response -- --nocapture` - `cargo test -p codex-core request_permissions_response_materializes_session_cwd_grants_before_recording -- --nocapture` - `cargo check -p codex-tui --tests` - `cargo check --tests` - `cargo test -p codex-tui app_server_request_permissions_preserves_file_system_permissions`	2026-04-21 10:23:01 -07:00
pakrym-oai	2a226096f6	Split DeveloperInstructions into individual fragments. (#18813 ) Split DeveloperInstructions into individual fragments.	2026-04-21 10:22:36 -07:00
pash-openai	dc1a8f2190	[tool search] support namespaced deferred dynamic tools (#18413 ) Deferred dynamic tools need to round-trip a namespace so a tool returned by `tool_search` can be called through the same registry key that core uses for dispatch. This change adds namespace support for dynamic tool specs/calls, persists it through app-server thread state, and routes dynamic tool calls by full `ToolName` while still sending the app the leaf tool name. Deferred dynamic tools must provide a namespace; non-deferred dynamic tools may remain top-level. It also introduces `LoadableToolSpec` as the shared function-or-namespace Responses shape used by both `tool_search` output and dynamic tool registration, so dynamic tools use the same wrapping logic in both paths. Validation: - `cargo test -p codex-tools` - `cargo test -p codex-core tool_search` --------- Co-authored-by: Sayan Sisodiya <sayan@openai.com>	2026-04-21 14:13:08 +08:00
Michael Bolin	d62421d322	chore: document intentional await-holding cases (#18423 ) ## Why This PR prepares the stack to enable Clippy await-holding lints that were left disabled in #18178. The mechanical lock-scope cleanup is handled separately; this PR is the documentation/configuration layer for the remaining await-across-guard sites. Without explicit annotations, reviewers and future maintainers cannot tell whether an await-holding warning is a real concurrency smell or an intentional serialization boundary. ## What changed - Configures `clippy.toml` so `await_holding_invalid_type` also covers `tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`. - Adds targeted `#[expect(clippy::await_holding_invalid_type, reason = ...)]` annotations for intentional async guard lifetimes. - Documents the main categories of intentional cases: active-turn state transitions that must remain atomic, session-owned MCP manager accesses, remote-control websocket serialization, JS REPL kernel/process serialization, OAuth persistence, external bearer token refresh serialization, and tests that intentionally serialize shared global or session-owned state. - For external bearer token refresh, documents the existing serialization boundary: holding `cached_token` across the provider command prevents concurrent cache misses from starting duplicate refresh commands, and the current behavior is small enough that an explicit expectation is easier to maintain than adding another synchronization primitive. ## Verification - `cargo clippy -p codex-login --all-targets` - `cargo clippy -p codex-connectors --all-targets` - `cargo clippy -p codex-core --all-targets` - The follow-up PR #18698 enables `await_holding_invalid_type` and `await_holding_lock` as workspace `deny` lints, so any undocumented remaining offender will fail Clippy. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423). * #18698 * __->__ #18423	2026-04-20 22:41:54 -07:00
Dylan Hurd	86535c9901	feat(auto-review) Handle request_permissions calls (#18393 ) ## Summary When auto-review is enabled, it should handle request_permissions tool. We'll need to clean up the UX but I'm planning to do that in a separate pass ## Testing - [x] Ran locally <img width="893" height="396" alt="Screenshot 2026-04-17 at 1 16 13 PM" src="https://github.com/user-attachments/assets/4c045c5f-1138-4c6c-ac6e-2cb6be4514d8" /> --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-20 21:48:57 -07:00
viyatb-oai	33fa952426	fix: fix stale proxy env restoration after shell snapshots (#17271 ) ## Summary This fixes a stale-environment path in shell snapshot restoration. A sandboxed command can source a shell snapshot that was captured while an older proxy process was running. If that proxy has died and come back on a different port, the snapshot can otherwise put old proxy values back into the command environment, which is how tools like `pip` end up talking to a dead proxy. The wrapper now captures the live process environment before sourcing the snapshot and then restores or clears every proxy env var from the proxy crate's canonical list. That makes proxy state after shell snapshot restoration match the current command environment, rather than whatever proxy values happened to be present in the snapshot. On macOS, the Codex-generated `GIT_SSH_COMMAND` is refreshed when the SOCKS listener changes, while custom SSH wrappers are still left alone. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-20 16:39:17 -07:00
Thibault Sottiaux	54bd07d28c	[codex] prefer inherited spawn agent model (#18701 ) This updates the spawn-agent tool contract so subagents are presented as inheriting the parent model by default. The visible model list is now framed as optional overrides, the model parameter tells callers to leave it unset and the delegation guidance no longer nudges models toward picking a smaller/mini override. Fixes reports that 5.4 would occasionally pick 5.2 or lower as sub-agents.	2026-04-20 22:34:08 +00:00

1 2 3 4 5 ...

706 Commits