codex

mirror of https://github.com/openai/codex.git synced 2026-06-01 19:02:59 +00:00

Author	SHA1	Message	Date
sayan-oai	1f93706e99	[codex] Require model for standalone web search (#25131 ) ## Why The standalone `/v1/alpha/search` request now requires a `model`, but the `web.run` extension currently omits it. Adds `model` to extension `ToolCall` invocation. Follow-up to #23823. ## What changed - Make `SearchRequest.model` required. - Expose the effective per-turn model on extension tool calls and pass it in standalone web-search requests. - Assert the model is forwarded in the app-server round-trip test. ## Testing - `just test -p codex-api -p codex-tools -p codex-web-search-extension -p codex-memories-extension -p codex-goal-extension` - `just test -p codex-core -E 'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'` - `just test -p codex-app-server -E 'test(standalone_web_search_round_trips_encrypted_output)'`	2026-05-29 12:03:04 -07:00
Michael Bolin	a1ecf0cf1c	thread-store: store permission profiles (#23165 ) ## Why `SandboxPolicy` is the legacy compatibility shape, but `codex-thread-store` still exposed it through `StoredThread`, `ThreadMetadataPatch`, and live metadata sync. That kept thread-store consumers tied to the legacy representation and meant richer permission profile data could not round-trip through thread metadata or cold rollout reconciliation. ## What Changed - Replaced thread-store `sandbox_policy` API fields with canonical `PermissionProfile` fields. - Persist new permission-profile metadata as canonical JSON in the existing SQLite metadata slot while continuing to read older legacy sandbox policy values. - Updated local, in-memory, live metadata sync, and rollout extraction paths to propagate `TurnContextItem::permission_profile()`. - Re-materialize legacy permission metadata against the final rollout cwd when rollout-derived metadata replaces stale SQLite summaries. - Updated affected app-server and core test constructors to build `PermissionProfile` values directly. ## Test Plan - `cargo test -p codex-state` - `cargo test -p codex-thread-store` - `cargo test -p codex-app-server summary_from_stored_thread_preserves_millisecond_precision --lib` - `cargo test -p codex-core realtime_context --lib`	2026-05-29 11:55:31 -07:00
Channing Conger	c9dc0f6338	code-mode: introduce durable session interface (#24180) ## Summary Introduce a `CodeModeSession` interface for executing and managing code-mode cells. This moves cell lifecycle, callback delegation, termination, and shutdown behind a session abstraction. The existing in-process implementation remains in use, and an external-process implementation can later be added behind the same interface. A Codex session owns one `CodeModeSession`, which in turn owns its running cells and stored code-mode state. Each cell is represented to the caller as a `StartedCell`, exposing its cell ID and initial response. This change also introduces a `CodeModeSessionDelegate` callback interface. A session uses the delegate to invoke nested host tools and emit notifications while a cell is running, allowing the runtime to communicate with its owning Codex session without depending directly on core turn handling. <img width="2121" height="1001" alt="image" src="https://github.com/user-attachments/assets/c349a819-2a59-485c-bda4-2caf68ac4c31" />	2026-05-29 11:42:52 -07:00
Eric Horacek	451b386442	[exec-server] Kill dropped filesystem helpers (#25116 ) ## Summary - terminate sandbox filesystem helpers when the Tokio child handle is dropped ## Why A sandbox filesystem helper can stall during process startup before reading stdin. If the owning async operation is cancelled or torn down, the spawned helper should not remain running as an orphaned process. Setting `kill_on_drop(true)` gives the filesystem helper the cleanup behavior that Tokio child processes otherwise do not enable by default. This intentionally does not add a timeout. It does not detect or recover an active hung file edit while the owning future remains alive. A more precise startup-health mechanism can be handled separately. ## Validation - `just test -p codex-exec-server` (186 tests passed; benchmark smoke passed) - `just fmt` - `just fix -p codex-exec-server` - `git diff --check`	2026-05-29 11:40:44 -07:00
Owen Lin	fc9cf62efb	Add subagent lineage metadata for responsesapi (#24161 ) ## Why We recently added `forked_from_thread_id` which lets us trace where a thread's _context_ comes from, but we also want to understand subagent lineage (e.g. which parent thread spawned this subagent? what kind of subagent is it?) which is orthogonal. This PR adds `parent_thread_id` and `subagent_kind` to the `x-codex-turn-metadata` header sent to ResponsesAPI. ## What changed - Adds `parent_thread_id` and `subagent_kind` to core-owned `x-codex-turn-metadata`. - Restores persisted `SessionSource` and `ThreadSource` from resumed session metadata so cold-resumed subagent threads keep their lineage on later Responses API requests. - Centralizes parent-thread extraction on `SessionSource` / `SubAgentSource` and reuses it in the Responses client, analytics, agent control, and state parsing paths. - Extends reserved-key, git-enrichment, thread-spawn, and app-server v2 metadata coverage for the new lineage fields. ## Verification - Not run locally per request. - Added focused coverage in `core/src/turn_metadata_tests.rs` and `app-server/tests/suite/v2/client_metadata.rs`.	2026-05-29 11:28:12 -07:00
Eric Traut	62039e8d35	Use session wording in `/rename` confirmation (#25035 ) ## Why The TUI `/rename` confirmation should use the term "session" for consistency.	2026-05-29 11:09:40 -07:00
Eric Traut	36cd36626d	Add `/archive` slash command (#25027 ) ## Why TUI users can archive saved sessions from other surfaces, but there is no in-session command for archiving the active session. Since archiving the active session also exits the TUI, the command should ask for explicit confirmation instead of firing immediately. I'm also working on [a companion PR](https://github.com/openai/codex/pull/25021) that adds `codex archive` and `codex unarchive` top-level CLI commands. ## What changed - Adds a new `/archive` slash command described as `archive this session and exit`. - Shows a confirmation dialog with `No, don't archive` selected first and `Yes, archive and exit` as the explicit action. - On confirmation, calls the existing `thread/archive` app-server RPC for the active main session and exits after success. - Keeps `/archive` disabled while a task is running and unavailable in side conversations. ## Verification Added focused TUI coverage for the `/archive` confirmation flow, disabled-while-task-running behavior, and the `/ar` slash-command popup snapshot.	2026-05-29 11:07:19 -07:00
Eric Traut	1333f4a689	Align TUI permissions labels with app (#25017 ) ## Summary The desktop app now presents the on-request permissions mode as `Ask for approval` and the manual-review-backed mode as `Approve for me`. The TUI still exposed older/internal labels like `Default` and `Auto-review`, which made the same underlying settings look different across clients. This updates the TUI UX copy to match the app without changing the underlying default behavior. Fresh threads continue to use the existing on-request approval mode, now displayed as `Ask for approval`. The label changes cover `/permissions`, explicit profile permissions menus, status surfaces, config persistence history/error text, and the corresponding TUI snapshots. ### Before <img width="1181" height="119" alt="Screenshot 2026-05-28 at 10 19 47 PM" src="https://github.com/user-attachments/assets/0664846b-b6dd-4931-b4dd-d0af0d42058e" /> <img width="523" height="19" alt="Screenshot 2026-05-28 at 10 21 29 PM" src="https://github.com/user-attachments/assets/7899c33e-b35d-4684-8389-97e357803423" /> ### After <img width="1216" height="117" alt="Screenshot 2026-05-28 at 10 19 32 PM" src="https://github.com/user-attachments/assets/015aab43-ac97-411f-8031-75cdd887251b" /> <img width="567" height="18" alt="Screenshot 2026-05-28 at 10 20 24 PM" src="https://github.com/user-attachments/assets/28b6422c-b823-4298-b221-c83d46d09d66" />	2026-05-29 11:06:40 -07:00
iceweasel-oai	cb9178e8b3	Add Windows sandbox provisioning setup command (#24831) ## Why Some Windows users do not have local admin access, so they cannot complete the elevated portion of the Windows sandbox setup when Codex first needs it. This adds an alpha provisioning path that an admin or IT deployment script can run ahead of time for the Codex user. The intended managed-deployment shape is: ```powershell codex sandbox setup --elevated --user "$env:COMPUTERNAME\Alice" --codex-home "C:\Users\Alice\.codex" ``` `--elevated` is treated as the requested sandbox setup level, not as proof that the process is elevated. The Windows sandbox setup orchestration still checks that the caller is actually elevated before launching the helper without a UAC prompt. ## What changed - Added `codex sandbox setup --elevated` with explicit user selection via either `--current-user` or `--user ... --codex-home ...`. - Moved the CLI implementation into `cli/src/sandbox_setup.rs` instead of growing `cli/src/main.rs`. - Added a Windows sandbox `ProvisionOnly` helper mode that runs the elevation-required provisioning work without requiring a workspace cwd or runtime sandbox policy. - Reused the existing elevated helper path for creating/updating sandbox users, configuring firewall/WFP rules, and applying sandbox directory ACLs. - Persisted `windows.sandbox = "elevated"` into the target `CODEX_HOME` so the desktop app does not show the initial sandbox setup banner after pre-provisioning succeeds. ## Validation - `cargo fmt -p codex-windows-sandbox -p codex-core -p codex-cli` - `cargo test -p codex-cli sandbox_setup --target-dir target\sandbox-setup-check` - `cargo test -p codex-windows-sandbox payload_accepts_provision_only_mode --target-dir target\sandbox-setup-check` - `git diff --check` - Manual Windows alpha flow with a standard local user (`Mandi Lavida`): ran the new setup command from an admin shell, verified the target `.codex` contents, sandbox marker/secrets, ACLs, firewall rules, and desktop startup without the sandbox setup banner once experimental network proxy requirements were disabled. ## Notes This intentionally does not solve later elevated update coordination for IT-managed deployments. The setup command can still apply provisioning updates when run again, but a broader coordination/process story is out of scope for this alpha.	2026-05-29 11:01:44 -07:00
Won Park	10b0399034	Route extension image generation through the native image completion pipeline (#24972 ) ## Why The standalone `image_gen.imagegen` extension should behave like native image generation for artifact persistence and UI completion, while returning its save-location guidance as part of the tool result instead of injecting a developer message. ## What Changed - Added an image-generation completion hook for extension tools so core can persist generated images and emit the existing `ImageGeneration` lifecycle events. - Reused core image artifact persistence for extension output and removed extension-local save-path/file-writing logic. - Split shared image persistence from built-in finalization so native image generation keeps its existing developer-message instruction behavior. - Returned the generated image save-location instruction through the extension `FunctionCallOutput`, alongside the generated image input for model follow-up. - Preserved the existing image-generation event shape for current UI and replay compatibility. - Avoided cloning the full generated-image base64 payload when emitting the in-progress image item. - Removed dependencies no longer needed after moving persistence out of the extension crate. ## Fast Follow - Adjust the existing Extension API and add a general `TurnItem` finalization path for re-usability of code ## Validation - Ran `just fmt`. - Ran `just bazel-lock-update`. - Ran `just bazel-lock-check`. - Ran `just test -p codex-tools -p codex-extension-api -p codex-image-generation-extension`. - Ran `just test -p codex-core image_generation_publication_is_finalized_by_core`. - Ran `just test -p codex-core handle_output_item_done_records_image_save_history_message`. - Ran `just fix -p codex-tools -p codex-extension-api -p codex-core -p codex-image-generation-extension`.	2026-05-29 17:33:13 +00:00
Adam Perry @ OpenAI	3e666dd32a	[codex] Wait for MCP readiness in core integration tests (#24964 ) Ensures MCP-backed `codex-core` integration tests exercise initialized servers instead of racing server startup. I've been idly investigating a few flakes and the failure modes are much more confusing when a tool call fails because of a failed server start than when the failed server start causes the test to fail directly.	2026-05-29 10:22:27 -07:00
xl-openai	e29bbb5368	feat: Add focused diagnostics for MCP HTTP send failures (#25013 ) Adds failure-only logging for MCP streamable HTTP post_message calls and the underlying reqwest send path, capturing the MCP method/request id, endpoint shape, auth-header presence, timeout/connect classification, and sanitized error source chain without logging headers, bodies, tokens, or full URLs.	2026-05-29 10:09:33 -07:00
jif-oai	f4e9d2caac	Move config document helpers into their own module (#25110 ) ## Why `core/src/config/edit.rs` owns the config edit state machine, but it also carried the TOML document helper code inline as a nested module. Moving those helpers into their own file keeps the edit orchestration easier to scan without changing the config persistence behavior. ## What changed - Moved the existing `document_helpers` module from `core/src/config/edit.rs` into `core/src/config/edit/document_helpers.rs`. - Added `mod document_helpers;` so the existing `pub(super)` helper API remains available to the rest of `config::edit`. ## Testing Not run; this is a refactor-only module extraction with no intended behavior change.	2026-05-29 18:49:21 +02:00
sayan-oai	96f1347fa3	Show activity for standalone web search calls (#24693 ) ## Why Standalone `web.run` calls run in the extension, so they need normal web-search progress activity while a request is in flight and durable completed activity after a thread is reloaded. Follow-up to #23823; uses the extension turn-item emission path added in #24813. ## What changed - Emit standalone `web.run` start/completion items through the host turn-item emitter, preserving standard client delivery and rollout persistence. - Include useful completion detail for queries, image queries, and literal-URL `open`/`find` commands. - Render completed searches as `Searched the web` or `Searched the web for <detail>`, with snapshot coverage for the detail-free case. - Extend the app-server round-trip test to verify completed search activity is reconstructed by `thread/read` after a fresh-process reload. ## Testing - `just test -p codex-web-search-extension` - `just test -p codex-app-server -E "test(standalone_web_search_round_trips_encrypted_output)"`	2026-05-29 16:12:58 +00:00
Ahmed Ibrahim	5577a9e148	[codex] Add model tool mode selector (#25031 ) ## Why Some models need to select their code-execution behavior through model catalog metadata. Models without that metadata must continue to follow the existing `CodeMode` and `CodeModeOnly` feature flags, including when a newer server sends an enum value this client does not recognize. ## What changed - add optional `ModelInfo.tool_mode` metadata with `direct`, `code_mode`, and `code_mode_only` - treat omitted and unknown wire values as `None` - resolve `None` from the existing feature flags - carry the resolved `ToolMode` directly on `TurnContext`, outside `Config` - use the resolved value for turn creation, model switches, review turns, tool planning, and code execution ## Coverage - add protocol coverage for omitted, known, and unknown enum values - add focused coverage for flag fallback and explicit metadata overriding feature flags - add core integration coverage that fetches remote model metadata through `/v1/models` and verifies the outbound `/responses` tools for explicit `direct` and `code_mode_only` selectors ## Stack - followed by #25032	2026-05-29 09:05:05 -07:00
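The selector resolution described in the commit above can be sketched roughly as follows. This is an illustrative sketch only: `parse_tool_mode`, `resolve_tool_mode`, and `FeatureFlags` are hypothetical names, and the precedence between the two feature flags (`code_mode_only` winning over `code_mode`) is an assumption, not something the commit message states.

```rust
/// The three wire values named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolMode {
    Direct,
    CodeMode,
    CodeModeOnly,
}

/// Hypothetical stand-in for the existing `CodeMode` / `CodeModeOnly` feature flags.
struct FeatureFlags {
    code_mode: bool,
    code_mode_only: bool,
}

/// Omitted and unknown wire values both map to `None`, so a newer server
/// sending an unrecognized enum value does not break this client.
fn parse_tool_mode(wire: Option<&str>) -> Option<ToolMode> {
    match wire {
        Some("direct") => Some(ToolMode::Direct),
        Some("code_mode") => Some(ToolMode::CodeMode),
        Some("code_mode_only") => Some(ToolMode::CodeModeOnly),
        _ => None,
    }
}

/// Explicit metadata overrides the flags; `None` falls back to them.
/// The flag precedence below is an assumption made for illustration.
fn resolve_tool_mode(metadata: Option<ToolMode>, flags: &FeatureFlags) -> ToolMode {
    metadata.unwrap_or(if flags.code_mode_only {
        ToolMode::CodeModeOnly
    } else if flags.code_mode {
        ToolMode::CodeMode
    } else {
        ToolMode::Direct
    })
}
```

Keeping the parse step separate from the fallback step means the "unknown value" case and the "flags decide" case can be tested independently, which matches the coverage split the commit describes.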
Abhinav	251b2412b2	Render multiline hook output in TUI (#24965 ) # Why Fixes #24529. Completed hook output in the TUI rendered each `HookOutputEntry` as one ratatui line, so explicit newlines inside hook output were not shown as separate transcript rows. That made multiline `SessionStart.additionalContext` hard to inspect even though the model-facing context path preserved the original text. # What - Split completed hook output entries on explicit newlines before rendering them in `codex-rs/tui/src/history_cell/hook_cell.rs`. - Keep the hook output prefix, such as `hook context:` or `warning:`, on the first physical line only. - Preserve explicit blank lines and render continuation lines with the hook body indent. - Add unit coverage for multiline context and warning output, plus a chatwidget snapshot regression for `SessionStart` history output. # Testing - `cargo nextest run -p codex-tui completed_hook_multiline hook_completed_before_reveal_renders_completed_without_running_flash` - `just argument-comment-lint -p codex-tui -- --ignore-rust-version --lib --tests`	2026-05-29 15:12:40 +00:00
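The line-splitting rule described in the commit above might look something like this. It is a minimal sketch assuming plain strings rather than ratatui lines; `render_hook_entry` and its parameters are hypothetical names, not the function in `hook_cell.rs`.

```rust
/// Render one completed hook output entry as transcript rows.
/// - `prefix` (for example "hook context:") appears on the first row only.
/// - Continuation rows are prefixed with `indent`.
/// - Explicit blank lines are preserved as empty rows.
fn render_hook_entry(prefix: &str, text: &str, indent: &str) -> Vec<String> {
    // `split('\n')` (unlike `lines()`) keeps a trailing empty segment,
    // so explicit blank lines at the end of the output are not dropped.
    text.split('\n')
        .enumerate()
        .map(|(i, line)| {
            if i == 0 {
                format!("{prefix} {line}")
            } else if line.is_empty() {
                String::new()
            } else {
                format!("{indent}{line}")
            }
        })
        .collect()
}
```

For example, calling `render_hook_entry("hook context:", "a\n\nb", "  ")` yields three rows: the prefixed first line, one empty row, and an indented continuation.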
jif-oai	b40ad0d84d	Remove stale rollout TODO tests (#25106 ) ## Summary Remove a stale `TODO(jif)` block of commented-out rollout listing tests that still referenced an older listing API. The current rollout listing behavior is covered by the active state DB and filesystem fallback tests, so keeping the dead commented tests just adds noise. ## Validation - `just fmt` - `just test -p codex-rollout`	2026-05-29 17:09:00 +02:00
jif-oai	27e256bc40	Handle goal usage limits from turn errors (#25095 ) ## Summary - handle goal usage-limit turn errors in the goal extension - exercise the extension path in the goal backend test ## Tests - just fmt - just test -p codex-goal-extension - just fix -p codex-goal-extension	2026-05-29 15:39:05 +02:00
jif-oai	1c55bb2702	[codex] Improve built-in tool schema docs (#24794) ## Summary - Clarify default, omission, and bounded behavior across built-in tool schemas, including unified exec, classic shell, Code Mode exec/wait, multi-agent, agent job, MCP resource, image, goal, plan, tool_search, and test-sync fields. - Convert update_plan status to an enum and add short field descriptions where the schema previously relied on surrounding context. - Remove the dedicated permission-approval schema test and keep only updates to existing expected-spec tests. ## Validation - Ran `just fmt`. - Ran `git diff --check`. - Did not run clippy or tests, per request. Regressions were evaluated [here](https://openai.slack.com/archives/C09GDSP1J9X/p1779905065496949), and the evaluation showed none.	2026-05-29 13:32:19 +02:00
jif-oai	3deda3116c	fix: main (#25075 )	2026-05-29 12:53:31 +02:00
jif-oai	191c39aa75	Drop debug-client prompt state tracking (#25070 ) Deletes `codex-rs/debug-client/src/state.rs` as one step in removing the stale app-server debug client. This intentionally leaves Cargo workspace and lockfile cleanup for a later follow-up PR.	2026-05-29 12:51:23 +02:00
jif-oai	43fa4e5d25	Remove debug-client server event reader (#25069 ) Deletes `codex-rs/debug-client/src/reader.rs` as one step in removing the stale app-server debug client. This intentionally leaves Cargo workspace and lockfile cleanup for a later follow-up PR.	2026-05-29 12:51:19 +02:00
jif-oai	5c1387846d	Delete debug-client JSONL output helper (#25068 ) Deletes `codex-rs/debug-client/src/output.rs` as one step in removing the stale app-server debug client. This intentionally leaves Cargo workspace and lockfile cleanup for a later follow-up PR.	2026-05-29 12:51:16 +02:00
jif-oai	e2b8ec616a	Remove the debug-client CLI entrypoint (#25067 ) Deletes `codex-rs/debug-client/src/main.rs` as one step in removing the stale app-server debug client. This intentionally leaves Cargo workspace and lockfile cleanup for a later follow-up PR.	2026-05-29 12:51:12 +02:00
jif-oai	3d3cc5a953	Retire debug-client interactive command parsing (#25066 ) Deletes `codex-rs/debug-client/src/commands.rs` as one step in removing the stale app-server debug client. This intentionally leaves Cargo workspace and lockfile cleanup for a later follow-up PR.	2026-05-29 12:51:09 +02:00
jif-oai	1197c7d654	Delete debug-client app-server process plumbing (#25065 ) Deletes `codex-rs/debug-client/src/client.rs` as one step in removing the stale app-server debug client. This intentionally leaves Cargo workspace and lockfile cleanup for a later follow-up PR.	2026-05-29 12:51:05 +02:00
jif-oai	a9a92cbb0a	Remove the generated debug-client README (#25064 ) Deletes `codex-rs/debug-client/README.md` as one step in removing the stale app-server debug client. This intentionally leaves Cargo workspace and lockfile cleanup for a later follow-up PR.	2026-05-29 12:51:01 +02:00
jif-oai	fc8c723553	Drop the stale debug-client manifest (#25063 ) Deletes `codex-rs/debug-client/Cargo.toml` as one step in removing the stale app-server debug client. This intentionally leaves Cargo workspace and lockfile cleanup for a later follow-up PR.	2026-05-29 12:50:58 +02:00
jif-oai	8f6a945ec9	Use inject_if_running for active goal steering (#24924 ) ## Why This PR is stacked on #24918, which moves goal steering onto source-labeled internal model context fragments. Active-turn goal steering should use the same running-turn injection path as other runtime steering, so those fragments enter the pending input queue as `ResponseItem`s through the existing [`Session::inject_if_running`](`8d6f6cdf69/codex-rs/core/src/session/inject.rs (L12-L27)`) behavior instead of through a goal-specific conversion wrapper. ## What Changed - Exposes a narrow `CodexThread::inject_if_running` bridge for callers that only hold a thread handle. - Changes `ext/goal` active-turn steering to pass `ResponseItem`s directly. - Builds goal steering prompts as contextual internal model context `ResponseItem`s before injecting them into the running turn. ## Testing Not run locally; PR metadata update only.	2026-05-29 11:24:39 +02:00
jif-oai	740d942f90	Use internal model context fragments for goal steering (#24918 ) ## Why Goal steering is one form of runtime-owned model context, but the old `<goal_context>` wrapper made the contextual-fragment hiding path goal-specific. Using a source-labeled internal context fragment gives core and extensions a shared shape for hidden model steering while keeping those prompts out of visible turn history. The change also keeps legacy `<goal_context>` messages recognized as hidden contextual input so existing stored history does not start rendering old goal-steering prompts as user-visible turn items. ## What Changed - Replaces `GoalContext` with `InternalModelContextFragment` plus a validated `InternalContextSource`. - Renders goal steering as `<codex_internal_context source="goal">...</codex_internal_context>`. - Updates core goal steering and `ext/goal` steering to inject the new internal-context fragment. - Updates contextual-fragment, event-mapping, goal, and session tests for the new wrapper. ## Test Coverage - Adds coverage for detecting the new internal model context fragment. - Preserves coverage for hiding legacy `<goal_context>` fragments. - Verifies invalid internal context sources are rejected and arbitrary context tags are not hidden. - Updates goal steering/session assertions to expect the new `source="goal"` wrapper.	2026-05-29 10:28:25 +02:00
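A rough sketch of the wrapper described in the commit above. The tag strings come from the commit message. The validation rule (lowercase ASCII letters and underscores), the detection logic, and the function names `render_fragment` and `is_hidden_context` are assumptions for illustration.

```rust
/// A validated source label. The exact validation rule is an assumption here.
struct InternalContextSource(String);

impl InternalContextSource {
    fn new(source: &str) -> Option<Self> {
        let valid = !source.is_empty()
            && source.chars().all(|c| c.is_ascii_lowercase() || c == '_');
        valid.then(|| Self(source.to_string()))
    }
}

/// Render steering text in the source-labeled wrapper named in the commit.
fn render_fragment(source: &InternalContextSource, body: &str) -> String {
    format!(
        "<codex_internal_context source=\"{}\">{}</codex_internal_context>",
        source.0, body
    )
}

/// Hide both the new wrapper and legacy `<goal_context>` messages,
/// but not arbitrary context-looking tags.
fn is_hidden_context(text: &str) -> bool {
    let t = text.trim();
    let new_style = t.starts_with("<codex_internal_context source=\"")
        && t.ends_with("</codex_internal_context>");
    let legacy = t.starts_with("<goal_context>") && t.ends_with("</goal_context>");
    new_style || legacy
}
```

Recognizing the legacy tag alongside the new one is what keeps previously stored history from surfacing old steering prompts as visible turn items.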
Eric Traut	522f549922	Fix fs/watch debounce batching (#24716 ) ## Summary `fs/watch` was using a local debounce wrapper whose deadline was initialized once and then reused after the first batch. Once that stale deadline was in the past, later file changes could bypass the intended 200ms debounce and send noisier `fs/changed` notifications. This moves the debounce wrapper into `codex-file-watcher` as `DebouncedWatchReceiver`, resets the debounce deadline for each event batch, preserves pending paths across cancelled receives, and updates app-server `fs/watch` to use the shared wrapper. Fixes #24692.	2026-05-28 23:09:55 -07:00
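The core of the fix above, computing a fresh deadline for every batch instead of reusing the first one, can be illustrated with a synchronous, standard-library-only sketch. `recv_batch` is a hypothetical helper and is not the `DebouncedWatchReceiver` API. It uses `std::sync::mpsc` as a stand-in for the real receiver and does not model preserving pending paths across cancelled receives.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Block until the first path arrives, then keep collecting paths until the
/// debounce window for *this* batch has elapsed. Returns `None` once the
/// sender side is gone and nothing is left to deliver.
fn recv_batch(rx: &Receiver<String>, window: Duration) -> Option<Vec<String>> {
    let first = rx.recv().ok()?;
    // The deadline is derived from the current batch's first event, so a
    // deadline left over from an earlier batch can never be reused.
    let deadline = Instant::now() + window;
    let mut batch = vec![first];
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(path) => batch.push(path),
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return Some(batch);
            }
        }
    }
}
```

With a stale deadline, `remaining` would be zero for every batch after the first, so each event would be flushed immediately; that is the noisy behavior the commit describes.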
Michael Bolin	6e10142199	fix: preserve deny-read sandboxing for safe commands (#23943 ) ## Why Permission profiles can mark filesystem entries as unreadable with `deny` rules, including glob patterns. Several shell execution paths treated known-safe commands or execpolicy `allow` rules as sufficient to run outside the filesystem sandbox. That is not valid for read-capable commands: for example, `cat` or `ls` may be reasonable to allow generally, but dropping the sandbox would also drop deny-read constraints such as `*/.env`. ## What changed - Added a shared check that treats active deny-read restrictions as incompatible with unsandboxed execution. - Kept first-attempt execution sandboxed for explicit escalation and execpolicy allow bypasses when deny-read entries are present. - Prevented no-sandbox retry after a sandbox denial when the active filesystem policy contains deny-read entries. - Updated the zsh-fork execve path so prefix-rule `allow` decisions continue inside the current sandbox when deny-read restrictions are active. ## Verification - `cargo test -p codex-core tools::sandboxing::tests` - `cargo test -p codex-core tools::runtimes::shell::unix_escalation::tests` - `cargo test -p codex-core shell_command_enforces_glob_deny_read_policy`	2026-05-28 22:49:37 -07:00
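The shared check described above might be sketched like this. The types and function names (`Access`, `FsEntry`, `deny_read_patterns`, `may_run_unsandboxed`) are hypothetical simplifications and do not reflect the actual `codex-core` permission-profile types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Access {
    Read,
    Write,
    Deny,
}

/// One filesystem policy entry; `pattern` may be a glob such as `*/.env`.
struct FsEntry {
    pattern: String,
    access: Access,
}

/// Collect the patterns that are explicitly unreadable under this policy.
fn deny_read_patterns(policy: &[FsEntry]) -> Vec<&str> {
    policy
        .iter()
        .filter(|entry| entry.access == Access::Deny)
        .map(|entry| entry.pattern.as_str())
        .collect()
}

/// A known-safe command or execpolicy `allow` rule is not enough on its own:
/// dropping the sandbox would also drop any active deny-read constraints.
fn may_run_unsandboxed(command_is_allowed: bool, policy: &[FsEntry]) -> bool {
    command_is_allowed && deny_read_patterns(policy).is_empty()
}
```

Under this sketch, an allowed `cat` still runs inside the sandbox whenever the policy contains a deny entry, which is the behavior the commit aims to preserve.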
Eric Traut	56958f2512	Seed prompt history from resumed messages (#24298 ) ## Why When the TUI resumes a thread, transcript replay renders prior user messages but did not seed the composer history. That leaves the resumed session with empty in-memory prompt history, so pressing Up can fall through to persisted global history and surface a prompt from another thread. The expected behavior is that prompts from the resumed thread are recalled first, with global history only as a fallback. ## What changed - Record replayed user messages into the composer history during resume replay. - Preserve the existing persisted history format and avoid any startup history scan. - Add focused TUI coverage showing replayed prompts are recalled before persisted global history. ## Validation - Added `replayed_user_messages_seed_composer_history` in `codex-rs/tui/src/chatwidget/tests/history_replay.rs`. - `just test -p codex-tui replayed_user_messages_seed_composer_history` passed.	2026-05-28 22:08:05 -07:00
xl-openai	f0a839ea0c	Add runtime extra skill roots API (#24977 ) ## Summary - Add v2 `skills/extraRoots/set` to replace app-server process-local standalone skill roots. The setting is not persisted, accepts missing roots, and `extraRoots: []` clears the runtime set. - Wire runtime roots into core skill discovery for `skills/list` and turn loads, clear skill caches on set, and register the roots with the skills watcher so later filesystem changes emit `skills/changed`. - Update app-server docs, generated JSON/TypeScript schemas, and coverage for serialization, missing roots, empty clears, and restart behavior. ## Testing - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core-skills` - `cargo test -p codex-app-server skills_extra_roots_set_updates_process_runtime_roots` - `just fix -p codex-app-server-protocol` - `just fix -p codex-core-skills` - `just fix -p codex-app-server`	2026-05-28 21:14:34 -07:00
Adrian	42c80385cd	[codex] Avoid PowerShell safety parsing off Windows (#24946 ) ## Summary This fixes BUGB-17567 by preventing non-Windows command safety classification from invoking the Windows PowerShell safelist/parser path. Previously, `is_known_safe_command` called the Windows PowerShell classifier on every platform. That classifier recognizes `pwsh`/`powershell` by basename and delegates script parsing to the PowerShell AST parser. The parser starts the supplied executable, so on macOS/Linux a repository-controlled `pwsh` path could execute during safety parsing before the normal sandboxed command execution path. The change gates the Windows PowerShell classifier and module behind `#[cfg(windows)]`. On macOS/Linux, PowerShell-looking commands are no longer auto-approved by the Windows classifier and instead fall through to the normal non-Windows safe-command logic. ## Validation - `/private/tmp/codex-tools/bin/just fmt` - `PATH=/private/tmp/codex-tools/bin:$PATH /private/tmp/codex-tools/bin/just test -p codex-shell-command` The focused test run passed 135 tests with 0 skipped and completed the crate bench-smoke step. ## Notes This PR is scoped to the BUGB-17567 macOS/Linux path. Windows still uses the PowerShell classifier; a separate hardening follow-up should ensure Windows safety parsing only executes a trusted PowerShell parser binary and does not spawn the command's `argv[0]` when that path may be repository-controlled.	2026-05-29 03:00:35 +00:00
viyatb-oai	bf72be5927	fix(config): use deny for Unix socket permissions (#24970) ## Why Unix socket permissions still accepted and displayed `"none"` while file permissions use the clearer `"deny"` spelling. This keeps network Unix socket policy vocabulary consistent with filesystem policy vocabulary. ## What changed - Replace the Unix socket permission variant and serialized spelling from `none` to `deny` across config, feature configuration, and network proxy types. - Update app-server v2 serialization, TUI debug output, focused tests, and generated schemas to expose `"deny"`. - Add coverage for denied Unix socket entries in managed requirements and profile overlay behavior. ## Security This is a vocabulary change for explicit Unix socket rejection, not a network access expansion. Denied entries continue to be omitted from the effective allowlist. ## Validation - `just fmt` - `just write-config-schema` - `just write-app-server-schema` - `just test -p codex-config -p codex-core -p codex-app-server-protocol -p codex-tui -E 'test(network_requirements_are_preserved_as_constraints_with_source) | test(network_permission_containers_project_allowed_and_denied_entries) | test(network_toml_overlays_unix_socket_permissions_by_path) | test(permissions_profiles_resolve_extends_parent_first_with_child_overrides) | test(network_requirements_serializes_canonical_and_legacy_fields) | test(debug_config_output_formats_unix_socket_permissions)'` - Automatic `bench-smoke` follow-up from `just test` - `cargo clippy -p codex-config -p codex-core -p codex-features -p codex-network-proxy -p codex-app-server-protocol -p codex-app-server -p codex-tui --all-targets -- -D warnings`	2026-05-28 23:53:26 +00:00
Anton Panasenko	912d7d4f75	feat(app-server): migrate remote control to server tokens (#24141 ) ## Why `codex-backend` now authenticates remote-control server websocket connections with short-lived server tokens instead of the user's ChatGPT access token. `app-server` needs to mint and refresh those server tokens without persisting them, so a restart can reconnect from durable enrollment identity while keeping the bearer token memory-only. ## What Changed Updated the remote-control transport to consume `remote_control_token` and `expires_at` from server enroll responses and added `/server/refresh` support for persisted enrollments or expiring cached tokens. Websocket handshakes now send `Authorization: Bearer <remote_control_token>` with the existing server identity headers, and no longer send the ChatGPT bearer token or `chatgpt-account-id` on that websocket path. The in-memory enrollment state now owns the ephemeral server token cache, while SQLite still persists only `server_id`, `environment_id`, and `server_name`. Websocket `401`/`403` clears only the cached token for refresh on reconnect; websocket or refresh `404` clears stale persisted enrollment and re-enrolls. Response body previews redact `remote_control_token` before surfacing parse errors. ## Verification - `just test -p codex-app-server-transport` - Manual prod smoke with an isolated `CODEX_HOME`: `codex remote-control --json -c 'chatgpt_base_url="https://chatgpt.com/backend-api"'` reached `status:"connected"` with `environmentId:"env_i_6a17d9f1d764832986da2e80f4554f1b"`.	2026-05-28 15:57:08 -07:00
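The status handling described above can be sketched as a small state transition. All type and function names here are hypothetical. Clearing the cached token together with the enrollment on `404` is an assumption, since the commit message only says the stale enrollment is cleared.

```rust
/// Durable identity. Per the commit, only these fields are persisted.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Enrollment {
    server_id: String,
    environment_id: String,
    server_name: String,
}

#[derive(Debug, Default)]
struct RemoteControlState {
    enrollment: Option<Enrollment>,
    /// Short-lived bearer token, kept in memory only.
    cached_token: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    Proceed,
    RefreshToken,
    ReEnroll,
}

/// Decide what to do after a websocket or refresh response status.
fn on_http_status(state: &mut RemoteControlState, status: u16) -> NextStep {
    match status {
        // Token rejected: keep the enrollment, refresh on reconnect.
        401 | 403 => {
            state.cached_token = None;
            NextStep::RefreshToken
        }
        // Enrollment is stale: drop it (and, by assumption, the token) and re-enroll.
        404 => {
            state.cached_token = None;
            state.enrollment = None;
            NextStep::ReEnroll
        }
        _ => NextStep::Proceed,
    }
}
```

Separating the two failure classes lets a restart or token expiry reconnect from the durable enrollment without a full re-enroll.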
Abhinav	a576be2b73	Tighten hook output event schemas (#24962 ) # Why Fixes #23993. Hook command output schemas are published as the contract for hook authors and schema-driven tooling. The event-specific output schemas previously described `hookSpecificOutput.hookEventName` as the global `HookEventNameWire` enum, so a `pre-tool-use.command.output` schema would validate mismatched values like `PostToolUse`. That made the schemas less precise than the intended event-specific contract. # What Constrain each hook-specific output schema to the matching literal `hookEventName` value, mirroring the existing input-schema shape. Also split `SubagentStartHookSpecificOutputWire` from the session-start output wire so `subagent-start.command.output.schema.json` can emit `const: "SubagentStart"` instead of sharing the session-start definition. # Verification - `cargo nextest run -p codex-hooks` - `just fix -p codex-hooks` - `just argument-comment-lint -p codex-hooks -- --all-targets`	2026-05-28 15:55:40 -07:00
Michael Bolin	bcf2b55957	windows-sandbox: fix capture cancellation test roots (#24974 ) ## Why The Windows Bazel job on `main` started failing after #24108 because one Windows-only capture test still passed `cwd.as_path()` to `run_windows_sandbox_capture`. That helper now expects the explicit `workspace_roots` slice introduced by #24108, so the Windows test target no longer compiled. ## What Changed - Updates `legacy_capture_cancellation_is_not_reported_as_timeout` to pass `workspace_roots_for(cwd.as_path()).as_slice()`, matching the adjacent capture test and the new runner signature. ## Verification - GitHub Actions CI is the important validation for this Windows-only compile path. - Created quickly to get Windows CI running while the separate Ubuntu `compact_resume_fork` timeout is still under investigation.	2026-05-28 15:51:27 -07:00
Michael Bolin	986c60467b	windows-sandbox: pass workspace roots to runner (#24108 ) ## Why #23813 switches the Windows sandbox runner path to `PermissionProfile`, but it still left one runtime anchor for resolving symbolic `:workspace_roots` entries. That is not enough once a turn has multiple effective workspace roots: exact entries and deny globs under `:workspace_roots` need to be materialized for every runtime root before the command runner chooses token mode or builds ACL plans. ## What Changed - Replaces the Windows runner/setup `permission_profile_cwd` plumbing with `workspace_roots: Vec<AbsolutePathBuf>`. - Resolves Windows-local `PermissionProfile` data with `materialize_project_roots_with_workspace_roots(...)` instead of the single-cwd helper. - Threads `Config::effective_workspace_roots()` through core execution, unified exec, TUI setup/read-grant flows, app-server setup, app-server `command/exec`, and `debug sandbox` on Windows. - Preserves those workspace roots through the zsh-fork escalation executor instead of rebuilding them from `sandbox_policy_cwd`. - Makes `ExecRequest::new(...)` and the remaining `build_exec_request(...)` helper path take `windows_sandbox_workspace_roots` explicitly so new call sites cannot silently fall back to `vec![cwd]`. - Clarifies the `debug sandbox` non-Windows comment: remaining cwd-dependent resolution still uses `sandbox_policy_cwd`, while `:workspace_roots` entries are already materialized from config roots. - Updates elevated runner IPC `SpawnRequest` to send `workspace_roots` and bumps the framed IPC protocol version to `3` for the payload shape change. - Adds Windows-local resolver coverage for expanding exact and glob `:workspace_roots` entries across multiple roots, plus core helper coverage proving explicit roots are preserved. 
## Verification - `cargo check -p codex-windows-sandbox -p codex-core -p codex-tui -p codex-cli -p codex-app-server` - `cargo test -p codex-windows-sandbox` - `cargo test -p codex-core windows_sandbox` - `cargo test -p codex-core unix_escalation` - `cargo test -p codex-app-server windows_sandbox` - `cargo test -p codex-tui windows_sandbox` - `cargo test -p codex-cli debug_sandbox` - `just test -p codex-core unified_exec` - `just test -p codex-core build_exec_request_preserves_windows_workspace_roots` - `env -u CODEX_NETWORK_PROXY_ACTIVE -u CODEX_NETWORK_ALLOW_LOCAL_BINDING just test -p codex-app-server --lib command_exec` - `just test -p codex-windows-sandbox` - `just test -p codex-exec sandbox` - `just fix -p codex-core -p codex-app-server -p codex-windows-sandbox` A local macOS cross-check with `cargo check --target x86_64-pc-windows-msvc ...` did not reach crate Rust code because native dependencies require Windows SDK headers (`windows.h` / `assert.h`) in this environment; Windows CI remains the real target validation. Two local targeted filters compile but do not run assertions on macOS: `env -u CODEX_NETWORK_PROXY_ACTIVE -u CODEX_NETWORK_ALLOW_LOCAL_BINDING just test -p codex-app-server --lib command_exec_processor` matched zero tests, and `just test -p codex-linux-sandbox landlock` matched zero tests because the landlock suite is Linux-only.	2026-05-28 15:26:55 -07:00
Michael Bolin	e7dda8070e	Surface filesystem permission profiles in prompt context (#23924 ) ## Summary Some permission profiles can encode filesystem reads that should remain unavailable to the agent. Before this change, the model-visible context and automatic approval review prompt summarized the effective permissions as a legacy sandbox mode, which can omit permission-profile filesystem entries from escalation decisions. For example, a profile can grant workspace access while denying a private subtree across every workspace root: ```toml default_permissions = "restricted-workspace" [permissions.restricted-workspace.workspace_roots] "/Users/alice/project" = true "/Users/alice/other-project" = true [permissions.restricted-workspace.filesystem] ":minimal" = "read" [permissions.restricted-workspace.filesystem.":workspace_roots"] "." = "write" "private" = "deny" "private/" = "deny" ``` The context window now describes the workspace roots and effective filesystem side of the `PermissionProfile` directly, with deny entries marked as non-escalatable: ```xml <environment_context> <cwd>/Users/alice/project</cwd> <shell>zsh</shell> <filesystem><workspace_roots><root>/Users/alice/project</root><root>/Users/alice/other-project</root></workspace_roots><permission_profile type="managed"><file_system type="restricted"><entry access="read"><special>:minimal</special></entry><entry access="write"><path>/Users/alice/project</path></entry><entry access="write"><path>/Users/alice/other-project</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/project/private</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/other-project/private</path></entry><entry access="deny" escalatable="false"><glob>/Users/alice/project/private/</glob></entry><entry access="deny" escalatable="false"><glob>/Users/alice/other-project/private/</glob></entry></file_system></permission_profile></filesystem> </environment_context> ``` Managed requirements can impose the 
same kind of deny-read restriction: ```toml [permissions.filesystem] deny_read = [ "/Users/alice/project/private", "/Users/alice/project/private/", ] ``` The automatic approval review prompt also receives the parent turn's denied-read context, so review decisions can account for the active permission profile. ## What Changed - Render the effective filesystem profile in `<environment_context>`, including profile type, filesystem entries, workspace roots, and non-escalatable deny entries. - Persist effective `workspace_roots` in `TurnContextItem` so resumed/replayed context does not have to bind `:workspace_roots` through legacy `cwd` fallback. - Add explicit permission instructions that denied reads are policy restrictions, not escalation targets. - Pass the parent turn's denied-read context into automatic approval reviews. - Add targeted coverage for prompt rendering, workspace-root materialization, replay context, and review prompt context. - Keep the prompt-context test expectations platform-aware so the same filesystem rendering assertions pass on Unix and Windows paths. ## Testing - `just test -p codex-core context::environment_context::tests::serialize_environment_context_with_full_filesystem_profile` - `just test -p codex-core context::environment_context::tests::turn_context_item_filesystem_uses_workspace_roots_instead_of_cwd` - `just test -p codex-core context::permissions_instructions::permissions_instructions_tests::builds_permissions_from_profile_with_denied_reads` - `just fix -p codex-core` I also attempted `just test -p codex-core`; the changed prompt-context tests passed, but the full local run did not complete cleanly in this sandboxed macOS environment due to unrelated user-shell `CODEX_SANDBOX*` expectations and integration-test timeouts.	2026-05-28 14:56:53 -07:00
Alexi Christakis	e92c952b2e	[codex] Add user input client ids (#24653 ) ## Summary Adds an optional `clientId` field to app-server v2 `UserInput` and carries it through the core `UserInput` model so clients can correlate echoed user input items without relying on payload equality. ## Details - Adds `client_id: Option<String>` to core `UserInput` variants. - Exposes the v2 app-server field as `clientId` on the wire and in generated TypeScript. - Preserves the id when converting between app-server v2 and core protocol types. - Regenerates app-server schema fixtures. ## Validation - `just fmt` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-protocol` - `git diff --check`	2026-05-28 14:54:39 -07:00
viyatb-oai	a027135bc6	fix(exec-server): reject websocket requests with Origin headers (#24947 ) ## Why `codex exec-server` has a local WebSocket listener, but it did not apply the same browser-origin request handling as the `app-server` WebSocket transport. Requests that carry an `Origin` header should not be upgraded by this local transport, keeping both local WebSocket servers consistent and avoiding unexpected browser-initiated connections. ## What changed - Added an Axum middleware guard in `codex-rs/exec-server/src/server/transport.rs` that returns `403 Forbidden` for requests carrying an `Origin` header. - Added an integration test in `codex-rs/exec-server/tests/websocket.rs` that covers rejection of an `Origin`-bearing WebSocket handshake. - Kept ordinary WebSocket clients unchanged: existing no-`Origin` initialization and process behavior remains covered by the crate tests. ## Validation - `just test -p codex-exec-server` test phase (`186 passed`; run outside the parent macOS sandbox so nested sandbox tests can execute) - `just clippy -p codex-exec-server`	2026-05-28 14:44:14 -07:00
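The rule in #24947 is simple enough to state as a predicate. This is a framework-free sketch of the check only, not the Axum middleware from `transport.rs`: the function name and the header-slice representation are hypothetical, and `101` stands in for "proceed with the upgrade".

```rust
/// Return the HTTP status a local WebSocket listener would answer with:
/// 403 Forbidden if the request carries an Origin header, otherwise
/// 101 Switching Protocols (the upgrade is allowed to proceed).
fn upgrade_status(headers: &[(&str, &str)]) -> u16 {
    // HTTP header names are case-insensitive, so compare accordingly.
    let has_origin = headers
        .iter()
        .any(|(name, _)| name.eq_ignore_ascii_case("origin"));
    if has_origin {
        403
    } else {
        101
    }
}
```

Rejecting on the mere presence of `Origin`, rather than checking it against an allowlist, matches the commit's description: browsers attach the header to WebSocket handshakes, while ordinary local clients typically do not.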
viyatb-oai	3cf737e4e3	fix: cancel Windows sandbox on network denial (#19880 ) ## Why When Guardian or the sandbox network proxy detects and denies a network attempt, core cancels the associated execution through `ExecExpiration`. The Windows sandbox capture path was only forwarding the timeout component of that expiration state. As a result, a sandboxed Windows command whose network attempt had already been denied could keep running until its timeout elapsed rather than terminating promptly in response to the denial. This change closes that cancellation-propagation gap for Windows sandbox execution. ## What changed - Added `WindowsSandboxCancellationToken` as the cancellation hook exposed to Windows capture backends. - Extracted the cancellation token from `ExecExpiration` in core and passed it to both the direct and elevated Windows sandbox capture paths alongside the existing timeout. - Updated direct capture to poll for either process exit, timeout, or cancellation and to terminate cancelled processes without reporting them as timed out. - Updated elevated capture to watch for cancellation and send the existing `Terminate` IPC frame to the elevated runner. The watcher parks for 50 ms between checks to bound response latency without a tight busy wait. - Added Windows regression coverage for a long-running PowerShell command: cancellation ends capture before its timeout and does not set `timed_out`. - Added a visible skip diagnostic when that PowerShell-dependent regression test cannot execute, and consolidated the duplicated expiration-policy branch identified in review. ## Security This improves enforcement after a denied network attempt has been attributed to a Windows sandboxed execution: the command no longer remains alive simply because Windows capture lost the cancellation signal. This PR does not claim to make Windows offline mode an airtight no-network or no-exfiltration boundary. 
It does not introduce AppContainer or change how network denial is detected; it makes an already-detected denial promptly stop the affected sandboxed command. ## Validation ### Commands run - `just fmt` - `cargo test -p codex-windows-sandbox` - `cargo test -p codex-core network_denial` - `cargo clippy -p codex-core -p codex-windows-sandbox --tests --no-deps -- -D warnings` - `just argument-comment-lint -p codex-windows-sandbox -p codex-core` The new capture regression is `cfg(target_os = "windows")`, so Windows CI is the execution coverage for that test path. The local macOS test runs validate the host-runnable crate and core network-denial behavior. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-28 21:28:06 +00:00
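The capture behavior described in #19880 (stop on exit, timeout, or cancellation, and never report a cancellation as a timeout) can be sketched as a polling loop. This is an illustration under stated assumptions, not the Windows sandbox code: `CaptureEnd`, `wait_for_end`, the `AtomicBool` used in place of `WindowsSandboxCancellationToken`, and the `has_exited` closure are all hypothetical. The 50 ms pause mirrors the interval the commit mentions for the elevated watcher.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Why capture stopped. Cancellation is kept distinct from timeout.
#[derive(Debug, PartialEq)]
enum CaptureEnd {
    Exited,
    TimedOut,
    Cancelled,
}

/// Poll until the process exits, the cancel flag is set, or the timeout elapses.
/// `has_exited` is a hypothetical probe for the child process state.
fn wait_for_end(
    cancelled: &AtomicBool,
    timeout: Duration,
    mut has_exited: impl FnMut() -> bool,
) -> CaptureEnd {
    let deadline = Instant::now() + timeout;
    loop {
        if has_exited() {
            return CaptureEnd::Exited;
        }
        // Check cancellation before the deadline so a cancelled run
        // is never misreported as timed out.
        if cancelled.load(Ordering::SeqCst) {
            return CaptureEnd::Cancelled;
        }
        if Instant::now() >= deadline {
            return CaptureEnd::TimedOut;
        }
        // Pause between checks to bound latency without a tight busy wait.
        thread::sleep(Duration::from_millis(50));
    }
}
```

A caller would terminate the child on `Cancelled` or `TimedOut` and set its `timed_out` flag only for the latter.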
Michael Bolin	bc10e5b390	runtime: prepend zsh fork bin dir to PATH (#23768 ) ## Why #23756 makes packaged Codex builds include and default to the bundled zsh fork. The important reason to put that fork's directory at the front of `PATH` is to keep executable-level escalation working after a command leaves the original shell and later re-enters zsh through `env`. The expected chain is: 1. The zsh fork runs the top-level shell command. 2. That command launches another program, such as `python3`, while inheriting the `EXEC_WRAPPER` environment and the escalation socket fd. 3. That program spawns a shell script whose shebang is `#!/usr/bin/env zsh` rather than `#!/bin/zsh`, and it does not close the escalation fd. 4. `/usr/bin/env` resolves `zsh` through `PATH`, so it must find the packaged zsh fork before the system zsh. 5. Commands inside that nested script are intercepted by the zsh fork and can still request escalation from Codex. If `PATH` resolves `zsh` to the system shell instead, the nested script loses zsh-fork exec interception. Commands that should request escalation can then run only in the original sandbox, or fail there, without Codex ever receiving the approval request. Shell snapshots make this slightly more subtle: a snapshot can restore an older `PATH` after the child shell starts. This PR treats the zsh fork `PATH` prepend as an explicit environment override so snapshot wrapping preserves it. ## What Changed - Added shared zsh-fork runtime helpers that prepend the configured zsh executable parent directory to `PATH` without duplicate entries. - Applied the zsh fork `PATH` prepend to both zsh-fork `shell_command` launches and unified-exec zsh-fork launches before sandbox command construction. - Kept the shell-command zsh-fork backend API narrow: it derives the configured zsh path from session services and rebuilds its sandbox environment from `req.env`, rather than accepting a second, competing environment map or a separately threaded bin dir. 
- Kept Unix-only zsh-fork `PATH` mutation out of Windows clippy-visible mutability. - Added coverage for duplicate `PATH` entries, for preserving the zsh fork prepend through shell snapshot wrapping, and for the nested `python3` -> `#!/usr/bin/env zsh` escalation flow. ## Testing - `just fmt` - `just fix -p codex-core` I left final test validation to CI after the latest review-comment cleanup. Before that cleanup, `just test -p codex-core zsh_fork` passed locally for the zsh-fork-focused tests.	2026-05-28 14:10:40 -07:00
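The "prepend without duplicate entries" helper from #23768 can be sketched with the standard library alone. This is a minimal sketch, not the shared runtime helper: the function name is hypothetical, and it assumes "without duplicates" means the directory is moved to the front and any later occurrence is dropped.

```rust
use std::env;
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Return a PATH value with `dir` first and no second copy of `dir` later on.
/// Assumption: an existing occurrence is moved to the front rather than kept in place.
fn prepend_to_path(current: &OsStr, dir: &Path) -> Result<OsString, env::JoinPathsError> {
    let mut entries: Vec<PathBuf> = vec![dir.to_path_buf()];
    // Keep every other entry in its original order.
    entries.extend(env::split_paths(current).filter(|p| p.as_path() != dir));
    env::join_paths(entries)
}
```

With the zsh fork's parent directory first, `/usr/bin/env zsh` in a nested script resolves to the packaged fork before the system zsh, which is the property the commit relies on for escalation to keep working.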
Celia Chen	0a8c835845	[codex] Remove Bedrock OSS models from catalog (#24960 ) Remove the GPT OSS 120B and 20B entries from the Amazon Bedrock static model catalog, as they are no longer supported.	2026-05-28 14:10:26 -07:00
iceweasel-oai	d9f53128b7	[codex] Handle PowerShell UTF-8 setup failures (#24949 ) Fixes #12496. ## Why Windows sandboxed PowerShell commands can run under `ConstrainedLanguage` on some machines, especially enterprise-managed Windows environments. In that mode, our PowerShell command prelude could fail before every command because it directly assigned `[Console]::OutputEncoding` to UTF-8. The actual user command still ran, but Codex surfaced noisy `Cannot set property. Property setting is supported only on core types in this language mode.` output for every shell call. ## What Changed - Makes the PowerShell UTF-8 output encoding prelude best-effort by wrapping the assignment in `try { ... } catch {}`. - Keeps the existing UTF-8 behavior when PowerShell allows the assignment. - Adds focused tests for adding the prelude and avoiding duplicate prelude insertion. ## Validation - `cargo fmt -p codex-shell-command` - `cargo check -p codex-shell-command` - `git diff --check` - Verified a local `ConstrainedLanguage` PowerShell probe prints only the command output with no property-setting error. - Verified `codex exec` from a temporary `chcp 437` context reports `utf-8` / `65001` and preserves non-ASCII output (`café`, `漢字`).	2026-05-28 13:58:20 -07:00
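The best-effort prelude from #24949 amounts to wrapping the encoding assignment in `try { ... } catch {}` and not adding it twice. A minimal sketch, assuming a string-prefix check is enough to detect an existing prelude; the constant's exact PowerShell text and both names are illustrative rather than taken from `codex-shell-command`.

```rust
/// Illustrative prelude: set UTF-8 console output, ignoring failures such as
/// the property-setting error raised under ConstrainedLanguage mode.
const UTF8_PRELUDE: &str =
    "try { [Console]::OutputEncoding = [System.Text.Encoding]::UTF8 } catch {}; ";

/// Prefix `script` with the prelude unless it is already there.
fn with_utf8_prelude(script: &str) -> String {
    if script.starts_with(UTF8_PRELUDE) {
        script.to_string()
    } else {
        format!("{}{}", UTF8_PRELUDE, script)
    }
}
```

Because the empty `catch {}` swallows the error, the user's command output is the only thing surfaced when the assignment is not permitted.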
Felipe Coury	2e0c4f4977	fix(tui): prevent repository-configured code execution in /diff (#24954 ) ## Why `/diff` is intended to display working-tree changes, but its Git invocations honored repository-selected executable helpers. A repository could configure diff/text conversion helpers, clean/process filters, `core.fsmonitor`, or `post-index-change` hooks that execute when a user runs `/diff`. Fixes [PSEC-4395](https://linear.app/openai/issue/PSEC-4395/codex-cli-diff-executes-repository-selected-diff-helpers). ## What Changed - Pass `--no-textconv` and `--no-ext-diff` for tracked and untracked diff generation. - Discover configured `filter.<driver>.clean` and `.process` entries, then neutralize the selected drivers through structured `GIT_CONFIG_KEY_` / `GIT_CONFIG_VALUE_` overrides, including driver names containing `=`. - Run all `/diff` Git probes with `core.fsmonitor=false` and a null `core.hooksPath`. - Use short submodule reporting while ignoring dirty submodule worktrees, since inspecting a checked-out submodule for dirtiness can execute filters from that child repository. This intentionally omits dirty-only submodule markers in order to preserve the non-executing security boundary. - Add real-Git marker tests covering filters, fsmonitor, hooks, and configured helpers inside checked-out submodules. ## How to Test 1. In a repository with ordinary tracked and untracked edits, run `/diff`. 2. Confirm the normal working-tree diff is shown for top-level files. 3. Run the targeted tests below; they configure executable marker helpers for repository filters, fsmonitor, hooks, and a checked-out submodule, then verify `/diff` does not invoke them. 4. Confirm a dirty-only submodule does not cause Codex to enter the submodule and execute its configured helper. 
Targeted tests: - `just test -p codex-tui get_git_diff_` Validation note: `just test -p codex-tui` runs the new coverage, but this worktree currently also has two unrelated failing guardian tests: `app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default` and `app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`.	2026-05-28 16:53:59 -03:00
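The hardening in #24954 combines command-line flags with structured config overrides. The sketch below shows how such an invocation might be assembled, under stated assumptions. It is not the TUI code: the function names are hypothetical, `/dev/null` is an assumed Unix spelling of a "null" hooks path, and blanking `filter.<driver>.clean` / `.process` to an empty value is an assumed way to neutralize a driver. `GIT_CONFIG_COUNT`, `GIT_CONFIG_KEY_<n>`, and `GIT_CONFIG_VALUE_<n>` are Git's documented environment-based config mechanism.

```rust
use std::process::Command;

/// Build GIT_CONFIG_* pairs that override each driver's clean/process command.
/// Environment pairs avoid the ambiguity of `-c key=value` when a driver
/// name itself contains `=`.
fn filter_overrides(drivers: &[&str]) -> Vec<(String, String)> {
    let mut env = Vec::new();
    let mut index = 0;
    for driver in drivers {
        for hook in ["clean", "process"] {
            env.push((
                format!("GIT_CONFIG_KEY_{}", index),
                format!("filter.{}.{}", driver, hook),
            ));
            // Assumption: an empty value disables the helper.
            env.push((format!("GIT_CONFIG_VALUE_{}", index), String::new()));
            index += 1;
        }
    }
    env.push(("GIT_CONFIG_COUNT".to_string(), index.to_string()));
    env
}

/// Assemble (but do not run) a `git diff` that avoids repository-selected helpers.
fn hardened_git_diff(drivers: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    // Disable fsmonitor and point hooks at a null path (assumed /dev/null).
    cmd.args(["-c", "core.fsmonitor=false", "-c", "core.hooksPath=/dev/null"]);
    // Skip textconv and external diff drivers.
    cmd.args(["diff", "--no-textconv", "--no-ext-diff"]);
    cmd.envs(filter_overrides(drivers));
    cmd
}
```

The list of drivers would come from a prior discovery step over the repository's configured `filter.*` entries, which this sketch leaves out.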
Adam Perry @ OpenAI	b90ec46387	Add `codex app-server --stdio` alias (#24940 ) ## Summary - Add `--stdio` as a direct alias for `codex app-server --listen stdio://`. - Keep `--stdio` and `--listen` mutually exclusive. - Update the app-server README to document both forms.	2026-05-28 12:43:30 -07:00
Won Park	ecb41fcb64	Add feature-gated standalone image generation extension (#24723 ) ## Why Add a standalone image generation path that can be exercised independently of hosted Responses image generation, while retaining the hosted tool as fallback unless the extension is actually available to the model. ## What changed - Added the `codex-image-generation-extension` crate with standalone generate/edit execution, prior-image selection for edits, model-visible image output, and local generated-image persistence. - Installed the extension in app-server behind the disabled-by-default `imagegenext` feature and backend eligibility checks. - Updated core tool planning so eligible `image_gen.imagegen` exposure replaces hosted `image_generation`, while unavailable configurations retain hosted fallback. - Added coverage for extension behavior, edit history reuse, feature gating, auth eligibility, and hosted-tool replacement. - The extension is installed through app-server only in this PR; other execution paths retain hosted image generation because hosted replacement occurs only when the standalone executor is actually registered and model-visible. - The initial extension contract intentionally fixes the image model to `gpt-image-2` and uses automatic image parameters. - Native generated-image history/card parity and rollout persistence cleanup are intentionally deferred follow-up work. ## Validation - `just test -p codex-image-generation-extension` - `just test -p codex-features` - `just test -p codex-core hosted_tools_follow_provider_auth_model_and_config_gates` - `just test -p codex-app-server` - `just fix -p codex-image-generation-extension -p codex-features -p codex-core -p codex-app-server` - `just fmt` - `just bazel-lock-update` - `just bazel-lock-check` --------- Co-authored-by: jif-oai <jif@openai.com>	2026-05-28 11:44:55 -07:00

6083 Commits