codex

mirror of https://github.com/openai/codex.git synced 2026-05-16 17:23:57 +00:00

Author	SHA1	Message	Date
jif-oai	5d30764fe9	Run compact hooks for remote compaction v2 (#22828) ## Why Remote compaction v2 is the `/responses` implementation of session-history compaction, but it still needs to preserve the observable contract of the legacy `/responses/compact` path. In particular, users and integrations that rely on `PreCompact` and `PostCompact` hooks should not see different behavior when `remote_compaction_v2` is enabled. ## What Changed - Runs `PreCompact` before issuing the remote compaction v2 request, including `Interrupted` analytics when a pre-hook stops execution. - Runs `PostCompact` after a successful v2 compaction and aborts the turn if the post-hook stops execution. - Adds `compact_remote_parity` coverage that compares legacy and v2 compaction across manual transcript shapes, automatic pre-turn compaction, automatic mid-turn compaction, hook payloads, replacement history, follow-up request payloads, and API-key `service_tier=fast` behavior. - Registers the new parity suite under `core/tests/suite`. Relevant code: - [`compact_remote_v2.rs`](af63745cb5/codex-rs/core/src/compact_remote_v2.rs) - [`compact_remote_parity.rs`](af63745cb5/codex-rs/core/tests/suite/compact_remote_parity.rs) ## Verification - Added `core/tests/suite/compact_remote_parity.rs` to assert parity between legacy remote compaction and remote compaction v2 for the affected request, hook, rollout-history, and follow-up paths. - Existing `compact_remote_v2` unit coverage still exercises v2 replacement-history retention and compaction-output collection.	2026-05-15 15:26:21 +02:00
jif-oai	0322ac3df8	[codex] Use compaction_trigger item for remote compaction v2 (#22809 ) ## Why Remote compaction v2 was still using `context_compaction` as both the request trigger and the compacted output shape. The Responses API now has the landed contract for this flow: Codex sends a dedicated `{ "type": "compaction_trigger" }` input item, and the backend returns the standard `compaction` output item with encrypted content. This aligns the v2 path with that wire contract while preserving the existing local compacted-history post-processing behavior. ## What changed - Add `ResponseItem::CompactionTrigger` and regenerate the app-server protocol schema fixtures. - Send `compaction_trigger` from `remote_compaction_v2` instead of a payload-less `context_compaction`. - Collect exactly one backend `compaction` output item, then reuse the existing compacted-history rebuilding path. - Treat the trigger item as a transient request marker rather than model output or persisted rollout/memory content. ## Verification - `cargo test -p codex-protocol compaction_trigger` - `cargo test -p codex-core remote_compact_v2` - `cargo test -p codex-core compact_remote_v2` - `cargo test -p codex-core responses_websocket_sends_response_processed_after_remote_compaction_v2` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol schema_fixtures`	2026-05-15 11:40:35 +02:00
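The "collect exactly one backend `compaction` output item" rule in the commit above can be sketched roughly as follows. This is only an illustration: `OutputItem` and `single_compaction` are hypothetical names invented here, not the types or functions used in `codex-rs`, and the real `ResponseItem` enum has many more variants.

```rust
/// Hypothetical stand-in for backend output items (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
enum OutputItem {
    /// The standard compaction item; the payload is opaque encrypted content.
    Compaction { encrypted_content: String },
    /// Any other output item, ignored by this sketch.
    Other,
}

/// Return the single compaction item, or an error if there are zero or several.
fn single_compaction(items: &[OutputItem]) -> Result<&OutputItem, String> {
    let mut found = items
        .iter()
        .filter(|item| matches!(item, OutputItem::Compaction { .. }));
    match (found.next(), found.next()) {
        (Some(item), None) => Ok(item),
        (None, _) => Err("expected one compaction item, found none".to_string()),
        (Some(_), Some(_)) => Err("expected one compaction item, found several".to_string()),
    }
}
```

Rejecting both the empty and the ambiguous case keeps the downstream history-rebuilding step from having to guess which item to use.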
Michael Bolin	8a5306ff88	app-server: use permission ids and runtime workspace roots (#22611 ) ## Why This PR builds on [#22610](https://github.com/openai/codex/pull/22610) and is the app-server side of the migration from mutable per-turn `SandboxPolicy` replacement toward selecting immutable permission profiles by id plus mutable runtime workspace roots. Once permission profiles can carry their own immutable `workspace_roots`, app-server no longer needs to mutate the selected `PermissionProfile` just to represent thread-specific filesystem context. The mutable part now lives on the thread as explicit `runtimeWorkspaceRoots`, while `:workspace_roots` remains symbolic until the sandbox is realized for a turn. ## What Changed - Replaced the v2 permission-selection wrapper surface with plain profile ids for `thread/start`, `thread/resume`, `thread/fork`, and `turn/start`. - Removed the API surface for profile modifications (`PermissionProfileSelectionParams`, `PermissionProfileModificationParams`, `ActivePermissionProfileModification`). - Added experimental `runtimeWorkspaceRoots` fields to the thread lifecycle and turn-start APIs. - Threaded runtime workspace roots through core session/thread snapshots, turn overrides, app-server request handling, and command execution permission resolution. - Kept session permission state symbolic so later runtime root updates and cwd-only implicit-root retargeting rebind `:workspace_roots` correctly. - Updated the embedded clients just enough to send and restore the new thread state. - Refreshed the generated schema/TypeScript artifacts and the app-server README to match the new contract. 
## Verification Targeted coverage for this layer lives in: - `codex-rs/app-server-protocol/src/protocol/v2/tests.rs` - `codex-rs/app-server/tests/suite/v2/thread_start.rs` - `codex-rs/app-server/tests/suite/v2/thread_resume.rs` - `codex-rs/app-server/tests/suite/v2/turn_start.rs` - `codex-rs/core/src/session/tests.rs` The key regression checks exercise that: - `runtimeWorkspaceRoots` resolve against the effective cwd on thread start. - Profile-declared workspace roots are excluded from the runtime workspace roots returned by app-server. - A turn-level runtime workspace-root update persists onto the thread and is returned by `thread/resume`. - A named permission profile selected on one turn remains symbolic so a later runtime-root-only turn update changes the actual sandbox writes. - A cwd-only turn update retargets the implicit runtime cwd root while preserving additional runtime roots. - The protocol fixtures and generated client artifacts stay in sync with the string-based permission selection contract. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22611). * #22612 * __->__ #22611	2026-05-14 23:00:05 -07:00
Dylan Hurd	06bb508547	Stabilize compact rollback follow-up test (#22303 ) ## Summary - add the missing response.created event to the mocked empty follow-up response in the compact rollback test - keep the fix scoped to the flaky mocked stream shape, without increasing timeouts ## Recent flakes on main - `snapshot_rollback_followup_turn_trims_context_updates` failed in `rust-ci-full` on `main` in the Ubuntu remote test job on 2026-05-14: https://github.com/openai/codex/actions/runs/25891434395/job/76095284830 - The same `compact_resume_fork` suite also failed recently on `main` with `snapshot_rollback_past_compaction_replays_append_only_history`, which has the same mocked Responses stream shape sensitivity this PR is tightening: https://github.com/openai/codex/actions/runs/25892437363/job/76098329098 ## Verification - env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test all snapshot_rollback_followup_turn_trims_context_updates -- --nocapture - repeated the same focused test 3 consecutive times locally - UV_CACHE_DIR=/private/tmp/uv-cache-codex-fmt just fmt	2026-05-14 18:43:18 -07:00
mchen-oai	10cf1f79dd	Add `user_input_requested_during_turn` to MCP turn metadata (#22237 ) ## Why - Similar change as https://github.com/openai/codex/pull/21219 - Without change: MCP tool calls receive `_meta["x-codex-turn-metadata"]` with various key values. - Issue: MCP servers currently do not know if user input was requested during the turn (Ex: Model decides to prompt the user for approval mid-turn before making a possibly risky tool call). MCP servers may want to know this when tracking latency metrics because these instances are inflated. ## What Changed - With change: MCP turn metadata now includes `user_input_requested_during_turn` when a model-visible `request_user_input` call happened earlier in the turn, propagated in `_meta["x-codex-turn-metadata"]`. - `mark_turn_user_input_requested()` is called when user input is requested through either MCP elicitation (`mcp.rs`) or the `request_user_input` tool (`mod.rs`). - MCP tool call `_meta` is now built immediately before execution (`mcp_tool_call.rs`) so user input requested earlier in the same turn, including within the same tool call via elicitation, is reflected in the metadata. - Normal `/responses` turn metadata headers are unchanged. ## Verification - `codex-rs/core/src/session/mcp_tests.rs` - `codex-rs/core/src/tools/handlers/request_user_input_tests.rs` - `codex-rs/core/src/turn_metadata_tests.rs` - `codex-rs/core/tests/suite/search_tool.rs`	2026-05-15 01:26:50 +00:00
Michael Bolin	c25d905f61	permissions: support workspace roots in profiles (#22610 ) ## Why This is the configuration/model half of the alternative permissions migration we discussed as a comparison point for [#22401](https://github.com/openai/codex/pull/22401) and [#22402](https://github.com/openai/codex/pull/22402). The old `workspace-write` model mixes three concerns that we want to keep separate: - reusable profile rules that should stay immutable once selected - user/runtime workspace roots from `cwd`, `--add-dir`, and legacy workspace-write config - internal Codex writable roots such as memories, which should not be shown as user workspace roots This PR gives permission profiles first-class `workspace_roots` so users can opt multiple repositories into the same `:workspace_roots` rules without using broad absolute-path write grants. It also starts separating the raw selected profile from the effective runtime profile by making `Permissions` expose explicit accessors instead of public mutable fields. A representative `config.toml` looks like this: ```toml default_permissions = "dev" [permissions.dev.workspace_roots] "~/code/openai" = true "~/code/developers-website" = true [permissions.dev.filesystem.":workspace_roots"] "." = "write" ".codex" = "read" ".git" = "read" ".vscode" = "read" ``` If Codex starts in `~/code/codex` with that profile selected, the effective workspace-root set becomes: - `~/code/codex` from the runtime `cwd` - `~/code/openai` from the profile - `~/code/developers-website` from the profile The `:workspace_roots` rules are materialized across each root, so `.git`, `.codex`, and `.vscode` stay scoped the same way everywhere. Runtime additions such as `--add-dir` can still layer on later stack entries without mutating the selected profile. 
## Stack Shape This PR intentionally stops before the profile-identity cleanup in [#22683](https://github.com/openai/codex/pull/22683) so the base review stays focused on config loading, workspace-root materialization, and compatibility with legacy `workspace-write`. The representation in this PR is therefore transitional: `Permissions` carries enough state to distinguish the raw constrained profile from the effective runtime profile, and there are still call sites that must keep the active profile identity and constrained profile value in sync. The follow-up PR replaces that with a single resolved profile state (`ResolvedPermissionProfile` / `PermissionProfileState`) that keeps the profile id, immutable `PermissionProfile`, and profile-declared workspace roots together. That follow-up removes APIs such as `set_constrained_permission_profile_with_active_profile()` where separate arguments could drift out of sync. Downstream PRs then build on this base to switch app-server turn updates to profile ids plus runtime workspace roots and to finish the user-visible summary behavior. Reviewers should judge this PR as the workspace-roots foundation, not as the final in-memory shape of selected permission profiles. ## Review Guide Suggested review order: 1. Start with `codex-rs/core/src/config/mod.rs`. This is the main shape change in the base slice. `Permissions` now stores a private raw `Constrained<PermissionProfile>` plus runtime `workspace_roots`. Callers use `permission_profile()` when they need the raw constrained value and `effective_permission_profile()` when they need a materialized runtime profile. As noted above, [#22683](https://github.com/openai/codex/pull/22683) replaces this transitional shape with a resolved profile state that keeps identity and profile data together. 2. Review `codex-rs/config/src/permissions_toml.rs` and `codex-rs/core/src/config/permissions.rs`. 
These add `[permissions.<id>.workspace_roots]`, resolve enabled entries relative to the policy cwd, and keep `:workspace_roots` deny-read glob patterns symbolic until the actual roots are known. 3. Review `codex-rs/protocol/src/permissions.rs` and `codex-rs/protocol/src/models.rs`. These add the policy/profile materialization helpers that expand exact `:workspace_roots` entries and scoped deny-read globs over every workspace root. This is also where `ActivePermissionProfileModification` is removed from the core model. 4. Review the legacy bridge in `Config::load_from_base_config_with_overrides` and `Config::set_legacy_sandbox_policy`. This is where legacy `workspace-write` roots become runtime workspace roots, while Codex internal writable roots stay internal and do not appear as user-facing workspace roots. 5. Then skim downstream call sites. The interesting pattern is raw-vs-effective access: state/proxy/bwrap paths keep the raw constrained profile, while execution, summaries, and user-visible status use the effective profile and workspace-root list. 
## What Changed - added `[permissions.<id>.workspace_roots]` to the config model and schema - added runtime `workspace_roots` state to `Config`/`Permissions` and `ConfigOverrides` - made `Permissions` profile fields private and replaced direct mutation with accessors/setters - added `PermissionProfile` and `FileSystemSandboxPolicy` helpers for materializing `:workspace_roots` exact paths and deny-read globs across all roots - moved legacy additional writable roots into runtime workspace-root state instead of active profile modifications - removed `ActivePermissionProfileModification` and its app-server protocol/schema export - updated sandbox/status summary paths so internal writable roots are not reported as user workspace roots ## Verification Strategy The targeted tests cover the behavior at the layers where regressions are most likely: - `codex-rs/core/src/config/config_tests.rs` verifies config loading, legacy workspace-root seeding, effective profile materialization, and memory-root handling. - `codex-rs/core/src/config/permissions_tests.rs` verifies profile `workspace_roots` parsing and `:workspace_roots` scoped/glob compilation. - `codex-rs/protocol/src/permissions.rs` unit tests verify exact and glob materialization over multiple workspace roots. - `codex-rs/tui/src/status/tests.rs` and `codex-rs/utils/sandbox-summary/src/sandbox_summary.rs` verify the user-facing summaries show effective workspace roots and hide internal writes. I also ran `cargo check --tests` locally after the latest stack refresh to catch cross-crate API breakage from the private-field/accessor changes. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22610). * #22612 * #22611 * #22683 * __->__ #22610	2026-05-14 18:25:23 -07:00
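The workspace-root behavior described in #22610 above (runtime `cwd` plus profile-declared roots, with `:workspace_roots` rules expanded over each root) might look roughly like the sketch below. It is a simplified illustration under stated assumptions: `Access`, `effective_workspace_roots`, and `materialize` are hypothetical names, and the real implementation works on `PermissionProfile` and `FileSystemSandboxPolicy`, including deny-read globs that this sketch omits.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical access level; a stand-in for the real policy types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Access {
    Read,
    Write,
}

/// Runtime cwd first, then profile-declared roots, skipping duplicates.
fn effective_workspace_roots(cwd: &Path, profile_roots: &[PathBuf]) -> Vec<PathBuf> {
    let mut roots = vec![cwd.to_path_buf()];
    for root in profile_roots {
        if !roots.contains(root) {
            roots.push(root.clone());
        }
    }
    roots
}

/// Expand each symbolic `:workspace_roots` rule over every concrete root.
fn materialize(roots: &[PathBuf], rules: &[(&str, Access)]) -> Vec<(PathBuf, Access)> {
    let mut entries = Vec::new();
    for root in roots {
        for (relative, access) in rules {
            // "." means the root itself; anything else is a path under the root.
            let path = if *relative == "." {
                root.clone()
            } else {
                root.join(relative)
            };
            entries.push((path, *access));
        }
    }
    entries
}
```

With three roots and the four rules from the example `config.toml`, `materialize` would yield twelve entries, so `.git`, `.codex`, and `.vscode` end up read-only under every root while each root itself is writable.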
Dylan Hurd	7dbe1c9498	[codex] Remove experimental instructions file config (#22724 ) ## Summary Remove the deprecated `experimental_instructions_file` config setting from the typed config surface and the remaining deprecation-notice plumbing. `model_instructions_file` remains the supported setting and its loading path is unchanged. The setting was deprecated when it was renamed to `model_instructions_file` on January 20, 2026 in https://github.com/openai/codex/pull/9555. ## Changes - Remove `experimental_instructions_file` from `ConfigToml` and `ConfigProfile`. - Delete the custom config-layer scan and session deprecation notice for the removed setting. - Stop clearing the removed field from generated session config locks. - Remove the obsolete deprecation-notice test case while keeping `model_instructions_file` coverage intact. ## Validation - `just write-config-schema` - `just fmt` - `cargo test -p codex-config` - `cargo test -p codex-core model_instructions_file` - `just fix -p codex-core` - `git diff --check` Co-authored-by: Codex <noreply@openai.com>	2026-05-14 18:04:26 -07:00
pakrym-oai	4bff020a96	Remove SSE fixture loaders (#22684 ) ## Why The Responses API test support already has structured SSE event builders. Keeping separate JSON fixture loaders made small mock streams harder to read and left an on-disk fixture for a single event. ## What changed - Removed `load_sse_fixture` and `load_sse_fixture_with_id_from_str` from `core_test_support`. - Deleted the one `tests/fixtures/incomplete_sse.json` Responses API fixture. - Replaced the remaining call sites with `responses::sse(...)` and existing event helpers. ## Validation - `cargo test -p codex-core --test all stream_no_completed::retries_on_early_close` - `cargo test -p codex-core --test all history_dedupes_streamed_and_final_messages_across_turns` - `cargo test -p codex-core --test all review::`	2026-05-15 00:40:32 +00:00
Dylan Hurd	51b0e94105	chore(features) rm Feature::ApplyPatchFreeform (#22711) ## Summary Removes the feature, since it is effectively on by default in all cases where we should use it and can otherwise be configured via models.json. ## Testing - [x] unit tests pass	2026-05-14 16:15:56 -07:00
starr-openai	32b45a43e2	tests: isolate codex home for live cli (#22563 ) ## Why Some core integration-test paths were creating Codex state under ambient `~/.codex`. In environments where `HOME=/tmp`, that showed up as `/tmp/.codex`, which is host-level shared state and makes these tests environment/order sensitive. The affected paths were: - `core/tests/suite/live_cli.rs`: `run_live()` spawned the real CLI with a temp cwd, but without an isolated home, so the child resolved Codex home from ambient `HOME`. - core / exec-server integration test binaries using `configure_test_binary_dispatch(...)`: their startup ctor installs arg0 helper aliases like `apply_patch` and `codex-linux-sandbox`. Full `arg0_dispatch()` also installs aliases from ambient Codex-home resolution, so test-binary startup could create `CODEX_HOME/tmp/arg0`; with `HOME=/tmp`, that became `/tmp/.codex/tmp/arg0/...`. ## What changed - `live_cli` now gives the spawned CLI a temp `HOME` and temp `CODEX_HOME`. - arg0 alias setup now has an explicit-home form, `prepend_path_entry_for_codex_aliases_in(...)`, so test helpers can place alias state under a temp directory without relying on ambient `CODEX_HOME`. - helper re-entry behavior is preserved with `dispatch_arg0_if_needed()`, so aliases like `apply_patch` and `codex-linux-sandbox` still dispatch correctly before test alias installation. - core test support keeps the temp Codex home alive for the lifetime of the test binary, matching the alias lifetime. ## Verification Verified on `dev2` with `HOME=/tmp` that the focused core test-binary startup path no longer recreates `/tmp/.codex`. Also checked the exact `live_cli` test path under `HOME=/tmp`; on `dev2` it still hits the existing remote-only `cargo_bin("codex-rs")` resolution failure before spawning the child, but `/tmp/.codex` remains absent after the run.	2026-05-14 12:59:56 -07:00
starr-openai	255748638c	Fix remote environment test fixtures (#22572 ) ## Why The Docker remote-env coverage was failing before it reached the behavior those tests are meant to exercise. The remote-aware test fixture only registered the remote environment, so tests that intentionally select both `local` and `remote` could not start a turn. After that was fixed, two tests exposed stale fixtures: the approval test was auto-approving under workspace-write, and the remote `view_image` test was writing invalid PNG bytes. ## What Changed - Added `EnvironmentManager::create_for_tests_with_local(...)` so tests can keep the provider default while also selecting `local` explicitly. - Updated `build_remote_aware()` to use that test-only manager when a remote exec-server URL is present. - Changed the remote apply-patch approval helper to use `SandboxPolicy::new_read_only_policy()` so the test actually exercises approval caching per environment. - Replaced the hardcoded remote `view_image` PNG blob with the existing `png_bytes(...)` helper so the test uses a valid image fixture. ## Validation Ran these isolated Docker remote-env tests on the devbox with `$remote-tests` setup: - `suite::remote_env::apply_patch_freeform_routes_to_selected_remote_environment` - `suite::remote_env::apply_patch_approvals_are_remembered_per_environment` - `suite::remote_env::apply_patch_intercepted_exec_command_routes_to_selected_remote_environment` - `suite::remote_env::exec_command_routes_to_selected_remote_environment` - `suite::view_image::view_image_routes_to_selected_remote_environment` All five pass.	2026-05-14 12:40:01 -07:00
Matthew Zeng	d8ddeb6869	Support explicit MCP OAuth client IDs (#22575 ) ## Why Some MCP OAuth providers require a pre-registered public client ID and cannot rely on dynamic client registration. Codex already supports MCP OAuth, but it had no way to supply that client ID from config into the PKCE flow. ## What changed - add `oauth.client_id` under `[mcp_servers.<server>]` config, including config editing and schema generation - thread the configured client ID through CLI, app-server, plugin login, and MCP skill dependency OAuth entrypoints - configure RMCP authorization with the explicit client when present, while preserving the existing dynamic-registration path when it is absent - add focused coverage for config parsing/serialization and OAuth URL generation ## Verification - `cargo test -p codex-config -p codex-rmcp-client -p codex-mcp -p codex-core-plugins` - `cargo test -p codex-core blocking_replace_mcp_servers_round_trips --lib` - `cargo test -p codex-core replace_mcp_servers_streamable_http_serializes_oauth_resource --lib` - `cargo test -p codex-core config_schema_matches_fixture --lib` ## Notes Broader local package runs still hit unrelated pre-existing stack overflows in: - `codex-app-server::in_process_start_clamps_zero_channel_capacity` - `codex-core::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`	2026-05-14 11:52:43 -07:00
starr-openai	8736e32657	tests: avoid ambient temp sandbox roots (#22576 ) ## Why Some sandboxed integration tests enabled both ambient temp roots (`TMPDIR` and literal `/tmp`) even though they were not testing temp-root behavior. On Linux bwrap, making `/tmp` writable causes protected metadata mount targets such as `/tmp/.git`, `/tmp/.agents`, and `/tmp/.codex` to be synthesized. If a run is interrupted, those top-level markers can be left behind and contaminate later tests. ## What changed For the incidental integration tests that do not need ambient temp-root access, set `exclude_tmpdir_env_var` and `exclude_slash_tmp` to `true`. Dedicated protected-metadata coverage remains in the lower-level sandbox tests that use isolated temp roots. ## Verification Focused remote devbox repros passed with a watcher polling `/tmp/.git`, `/tmp/.agents`, and `/tmp/.codex`; no leaked markers were observed.	2026-05-14 10:04:24 -07:00
jif-oai	deedf3b2c4	feat: add layered --profile-v2 config files (#17141 ) ## Why `--profile-v2 <name>` gives launchers and runtime entry points a named profile config without making each profile duplicate the base user config. The base `$CODEX_HOME/config.toml` still loads first, then `$CODEX_HOME/<name>.config.toml` layers above it and becomes the active writable user config for that session. That keeps shared defaults, plugin/MCP setup, and managed/user constraints in one place while letting a named profile override only the pieces that need to differ. ## What Changed - Added the shared `--profile-v2 <name>` runtime option with validated plain names, now represented by `ProfileV2Name`. - Extended config layer state so the base user config and selected profile config are both `User` layers; APIs expose the active user layer and merged effective user config. - Threaded profile selection through runtime entry points: `codex`, `codex exec`, `codex review`, `codex resume`, `codex fork`, and `codex debug prompt-input`. - Made user-facing config writes go to the selected profile file when active, including TUI/settings persistence, app-server config writes, and MCP/app tool approval persistence. - Made plugin, marketplace, MCP, hooks, and config reload paths read from the merged user config so base and profile layers both participate. - Updated app-server config layer schemas to mark profile-backed user layers. ## Limits `--profile-v2` is still rejected for config-management subcommands such as feature, MCP, and marketplace edits. Those paths remain tied to the base `config.toml` until they have explicit profile-selection semantics. 
Some adjacent background writes may still update base or global state rather than the selected profile: - marketplace auto-upgrade metadata - automatic MCP dependency installs from skills - remote plugin sync or uninstall config edits - personality migration marker/default writes ## Verification Added targeted coverage for profile name validation, layer ordering/merging, selected-profile writes, app-server config writes, session hot reload, plugin config merging, hooks/config fixture updates, and MCP/app approval persistence. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-14 15:16:15 +02:00
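The layering described for `--profile-v2` above (base `config.toml` loads first, the named profile file layers on top and receives user-facing writes) can be sketched as below. This is an illustration only: `Layer`, `merge_layers`, and `writable_config_file` are hypothetical names, and the flat string map stands in for the real TOML tables and layer metadata.

```rust
use std::collections::BTreeMap;

/// A config layer as flat key/value pairs (assumption: real layers are TOML tables).
type Layer = BTreeMap<String, String>;

/// Merge layers in load order; later layers override earlier ones key by key.
fn merge_layers(layers: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// File name under `$CODEX_HOME` that receives user-facing config writes.
fn writable_config_file(profile: Option<&str>) -> String {
    match profile {
        Some(name) => format!("{name}.config.toml"),
        None => "config.toml".to_string(),
    }
}
```

Passing the base layer followed by the profile layer to `merge_layers` gives the effective user config: keys the profile does not set fall through to the base file, which is what lets a profile override only the pieces that differ.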
Abhinav	23bb524973	Spill oversized PreToolUse additionalContext (#22529 ) # Why `PreToolUse.additionalContext` became model-visible after #20692, but the hook-output spilling path from #21069 never picked up that newer lane. As a result, oversized `PreToolUse` context could bypass the truncation/spill treatment that already applies to the other hook outputs Codex forwards to the model. # What - Run `PreToolUseOutcome.additional_contexts` through `maybe_spill_texts(...)` - Add an integration test proving a large `PreToolUse.additionalContext` is replaced with a truncated preview plus spill-file pointer, while the full text is preserved on disk.	2026-05-13 15:21:31 -07:00
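The spill treatment described above (oversized text replaced by a truncated preview plus a pointer to a file holding the full text) could look roughly like this. The sketch is hypothetical: `spill_if_oversized`, its character-count limit, and the pointer wording are assumptions made for illustration and are not the behavior of the real `maybe_spill_texts(...)`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return `text` unchanged when it fits within `max_chars`; otherwise write the
/// full text to `spill_path` and return a truncated preview that points at it.
fn spill_if_oversized(text: &str, max_chars: usize, spill_path: &Path) -> io::Result<String> {
    if text.chars().count() <= max_chars {
        return Ok(text.to_string());
    }
    // Preserve the full text on disk before truncating what the model sees.
    fs::write(spill_path, text)?;
    // Truncate by characters so a multi-byte character is never split.
    let preview: String = text.chars().take(max_chars).collect();
    Ok(format!(
        "{preview}\n[truncated; full text saved to {}]",
        spill_path.display()
    ))
}
```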
pakrym-oai	83decfa300	[codex] Remove unused legacy shell tools (#22246 ) ## Why Recent session history showed no active use of the raw `shell`, `local_shell`, or `container.exec` execution surfaces. Keeping those handlers/specs wired into core leaves duplicate shell execution paths alongside the supported `shell_command` and unified exec tools. ## What changed - Removed the raw `shell` handler/spec and its `ShellToolCallParams` protocol helper. - Removed the legacy `local_shell` and `container.exec` handler/spec plumbing while preserving persisted-history compatibility for old response items. - Normalized model/config `default` and `local` shell selections to `shell_command`. - Pruned tests that exercised removed raw-shell/local-shell/apply-patch variants and kept coverage on `shell_command`, unified exec, and freeform `apply_patch`. ## Verification - `git diff --check` - `cargo test -p codex-protocol` - `cargo test -p codex-tools` - `cargo test -p codex-core tools::handlers::shell` - `cargo test -p codex-core tools::spec` - `cargo test -p codex-core tools::router` - `cargo test -p codex-core active_call_preserves_triggering_command_context` - `cargo test -p codex-core guardian_tests` - `cargo test -p codex-core --test all shell_serialization` - `cargo test -p codex-core --test all apply_patch_cli` - `cargo test -p codex-core --test all shell_command_` - `cargo test -p codex-core --test all local_shell` - `cargo test -p codex-core --test all otel::` - `cargo test -p codex-core --test all hooks::` - `just fix -p codex-core` - `just fix -p codex-tools`	2026-05-13 16:43:25 +00:00
jif-oai	7c7b4861d8	fix: drop underscored id headers (#22193) ## Why Stop sending duplicate `session_id`/`thread_id` headers. We only want the hyphenated forms, as `_` is rejected by some proxies. Related discussion here: https://openai.slack.com/archives/C095U48JNL9/p1778508316923179 ## What - Keep `session-id` and `thread-id` - Remove the underscore aliases	2026-05-13 18:21:02 +02:00
Ahmed Ibrahim	87de4e3290	Add service tier overrides to spawned agents (#22139 ) ## Why Spawned agents can already override `model` and `reasoning_effort`, but they have no equivalent way to opt into a model-supported service tier. That makes it impossible to preserve or intentionally select tiered execution behavior when delegating work to a sub-agent, even though the model catalog already advertises supported `service_tiers`. ## What changed - Add optional `service_tier` to both legacy and `MultiAgentV2` `spawn_agent` tool inputs. - Show each picker-visible model's supported service tier ids and descriptions in the `spawn_agent` tool guidance. - Resolve service tier selection after the child agent's effective model is known. - Inherit the parent tier when omitted and still supported by the final child model; otherwise clear it. - Reject explicit unsupported tier requests with a model-facing error. - Keep explicit `service_tier` usable on full-history forks, while still honoring the existing model/reasoning fork restrictions. - Hide `service_tier` alongside other spawn metadata when `hide_spawn_agent_metadata` is enabled. ## Verification Added focused coverage for: - v1/v2 `spawn_agent` schema exposure for `service_tier` - tier descriptions in spawn guidance - hidden-metadata suppression - explicit supported tier selection - explicit unknown and unsupported tier rejection - inherited tier preservation or clearing based on child-model support - full-history fork acceptance for explicit service tiers in both v1 and v2 Local Rust tests were not run in this workspace per repo guidance; the new coverage is included for CI.	2026-05-13 18:11:50 +03:00
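The tier-selection rules listed above (explicit supported tier wins, explicit unsupported tier is rejected, an omitted tier inherits from the parent only if the child model supports it) can be sketched as a single function. `resolve_service_tier` and its signature are hypothetical, chosen for illustration; the real code resolves tiers against the model catalog.

```rust
/// Resolve the child agent's service tier once its effective model is known.
/// `supported` is the list of tier ids the child model advertises.
fn resolve_service_tier(
    explicit: Option<&str>,
    parent: Option<&str>,
    supported: &[&str],
) -> Result<Option<String>, String> {
    match explicit {
        // An explicit, supported request is honored as-is.
        Some(tier) if supported.contains(&tier) => Ok(Some(tier.to_string())),
        // An explicit, unsupported request is rejected with an error for the model.
        Some(tier) => Err(format!(
            "service tier `{tier}` is not supported by the selected model"
        )),
        // No explicit request: inherit the parent tier if still supported, else clear it.
        None => Ok(parent
            .filter(|tier| supported.contains(tier))
            .map(str::to_string)),
    }
}
```

Resolving after the child's model is known matters because the parent and child may run different models with different supported tiers.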
sayan-oai	2304ec45ca	Remove unavailable MCP placeholder tool backfill (#22439) ## Why `UnavailableDummyTools` kept synthetic placeholder tools alive for historical tool calls whose backing MCP tool was no longer available. That path adds stale model-visible tool specs and special routing at the point where unavailable MCP calls should use ordinary current-tool handling. This removes the runtime backfill instead of preserving a second compatibility lane. ## Is it safe to remove? The unavailable tools were added in #17853 after a CS issue when a previously-called MCP tool failed to load and was omitted from the CS spec. Now that we have tool search, I think this is resolved: - The API merges tools from previous TST output into the effective tool set, so they're always in the CS spec - if an MCP tool surfaced by TST later becomes unavailable, the model can still call it and it will just return a model-visible error - both TST output and function call output are dropped on compaction, so the model will not remember old calls to MCP post compaction ## What changed - Delete unavailable-tool collection, placeholder handler, router/spec plumbing, and obsolete placeholder coverage. - Keep `features.unavailable_dummy_tools` as a removed no-op feature tombstone so existing configs still parse cleanly. - Add an integration-style `tool_search` regression test showing that a deferred MCP tool surfaced through `tool_search` still routes through MCP and returns a model-visible tool-call error rather than `unsupported call`. ## Verification - `cargo test -p codex-core tool_search`	2026-05-12 23:30:13 -07:00
pakrym-oai	96833c5b15	Remove CODEX_RS_SSE_FIXTURE test hook (#22413 ) ## Why `CODEX_RS_SSE_FIXTURE` let integration-style CLI, exec, and TUI tests bypass the normal Responses transport by reading SSE from local files. That kept test-only behavior wired through production client code. The affected tests can stay hermetic by using the existing `core_test_support::responses` mock server and passing `openai_base_url` instead. ## What Changed - Removed the `CODEX_RS_SSE_FIXTURE` flag, `codex_api::stream_from_fixture`, the `env-flags` dependency, and the checked-in SSE fixture files. - Repointed the affected core, exec, and TUI tests at `MockServer` with the existing SSE event constructors. - Removed the Bazel test data plumbing for the deleted fixtures and refreshed cargo/Bazel lock state. ## Verification - `cargo build -p codex-cli` - `cargo test -p codex-api` - `cargo test -p codex-core --test all responses_api_stream_cli` - `cargo test -p codex-core --test all integration_creates_and_checks_session_file` - `cargo test -p codex-exec --test all ephemeral` - `cargo test -p codex-exec --test all resume` - `cargo test -p codex-tui --test all resume_startup_does_not_consume_model_availability_nux_count` - `just bazel-lock-update` - `just bazel-lock-check` - `just fix -p codex-api -p codex-core -p codex-exec -p codex-tui` - `git diff --check`	2026-05-13 03:08:01 +00:00
Tom	c51c65ad09	Unify thread metadata updates above store (#22236 ) - make ThreadStore::update_thread_metadata accept a broad range of metadata patches - keep ThreadStore::append_items as raw canonical history append (no metadata side effects) - in the local store, write these metadata updates to a combination of sqlite and rollout jsonl files for backwards-compat. It special cases which fields need to go into jsonl vs sqlite vs whatever, confining the awkwardness to just this implementation - in remote stores we can simply persist the metadata directly to a database, no special casing required. - move the "implicit metadata updates triggered by appending rollout items" from the RolloutRecorder (which is local-threadstore-specific) to the LiveThread layer above the ThreadStore, inside of a private helper utility called ThreadMetadataSync. LiveThread calls ThreadStore append_items and update_metadata separately. - Add a generic update metadata method to ThreadManager that works on both live threads and "cold" threads - Call that ThreadManager method from app server code, so app server doesn't need to worry about whether the thread is live or not	2026-05-13 00:28:15 +00:00
pakrym-oai	f11ad1eacb	[codex] Add search term coverage for tool_search (#22398 ) ## Why `tool_search` already had solid end-to-end coverage for discovery and follow-up execution, but it did not prove that distinct pieces of indexed search text actually work in integration. In particular, we were not exercising whether unique tool names, descriptions, namespaces, underscore-expanded dynamic names, and schema-property terms were sufficient to surface the expected deferred tools. This change adds focused integration coverage for those term sources so regressions in search text construction are caught by a real `TestCodex` flow instead of only by lower-level unit tests. ## What changed - added a small helper in `core/tests/suite/search_tool.rs` to assert that a `tool_search_output` contains an expected namespace child tool - added an MCP integration test that issues several `tool_search_call`s and verifies distinct query terms match the expected app tools: - exact tool name: `calendar_timezone_option_99` - tool description phrase: `uploaded document` - top-level schema property: `starts_at` - added a dynamic-tool integration test that verifies distinct query terms match the expected deferred dynamic tool: - exact name: `quasar_ping_beacon` - underscore-expanded name: `quasar ping beacon` - description phrase: `saffron metronome` - namespace: `orbit_ops` - schema property: `chrono_spec` ## Validation - `cargo test -p codex-core tool_search_matches_` ## Docs No documentation update needed.	2026-05-13 00:24:07 +00:00
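The term sources exercised by this coverage can be illustrated with a small sketch. This is not the `tool_search` implementation: `ToolDoc`, `search_text`, and `matches` are hypothetical names, and naive substring matching stands in for whatever indexing the real search uses. The description text in the usage note is likewise invented around the phrase from the commit message.

```rust
/// Hypothetical description of a deferred tool; field names are illustrative only.
pub struct ToolDoc<'a> {
    pub name: &'a str,
    pub namespace: &'a str,
    pub description: &'a str,
    pub schema_properties: &'a [&'a str],
}

/// Build lowercase search text from each term source named in the commit:
/// tool name, underscore-expanded name, namespace, description, schema properties.
pub fn search_text(doc: &ToolDoc) -> String {
    let mut parts: Vec<String> = vec![
        doc.name.to_string(),
        doc.name.replace('_', " "), // underscore-expanded form of the name
        doc.namespace.to_string(),
        doc.description.to_string(),
    ];
    parts.extend(doc.schema_properties.iter().map(|p| p.to_string()));
    parts.join(" ").to_lowercase()
}

/// Assumption: plain substring matching; the real search may tokenize or rank.
pub fn matches(doc: &ToolDoc, query: &str) -> bool {
    search_text(doc).contains(&query.to_lowercase())
}
```

With a tool named `quasar_ping_beacon` in namespace `orbit_ops`, the queries `quasar ping beacon`, `orbit_ops`, and a schema property such as `chrono_spec` would each match under this sketch.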
pakrym-oai	0173f71143	Refactor namespaced tool spec registration (#22256 ) ## Summary This refactor makes tool handlers the owner of the specs they can publish, so registry construction can register handlers once and separately publish only the specs that should be model-visible. The main motivation is deferred tools: MCP and dynamic tools still need handlers registered up front, but deferred tools should be discoverable through `tool_search` rather than emitted in the initial tool spec list. ## What changed - `McpHandler` and `DynamicToolHandler` can return their own `ToolSpec`. - `build_tool_registry_builder` now collects handlers, registers them through the no-spec path, and publishes only non-deferred handler specs. - Deferred MCP and dynamic tool names are combined into one `all_deferred_tools` set that drives spec filtering, code-mode deferred-tool signaling, and `tool_search` registration. - `tool_search` registration now requires both deferred tools and `namespace_tools`. - Namespace specs are merged in `spec_plan`, preserving top-level spec order, sorting tools within each namespace, and backfilling empty namespace descriptions. - Hosted web search and image-generation specs are included in the collected spec vector before namespace merge/publication, and tool-name tests that should not care about hosted relative order now compare sets. 
## Testing - `cargo test -p codex-core tools::spec::tests:: -- --nocapture` - `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture` - `cargo test -p codex-core tools::router::tests::specs_filter_deferred_dynamic_tools -- --nocapture` - `cargo test -p codex-core suite::prompt_caching::prompt_tools_are_consistent_across_requests -- --nocapture` - `just fmt` - `just fix -p codex-core` - `cargo test -p codex-core -- --skip tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed` passed the library suite after skipping the known stack-overflowing unit test. Full `cargo test -p codex-core` currently hits a stack overflow in `tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`; the same focused test reproduces on `origin/main`.	2026-05-12 17:09:14 -07:00
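A rough sketch of the namespace merge described above, under stated assumptions: `Spec` and `plan_specs` are hypothetical stand-ins for the real `ToolSpec` and `spec_plan` types, deferred tools are filtered by name, and an empty description is backfilled from a later entry for the same namespace or, failing that, from a generic placeholder string. The commit does not say what the real backfill text is.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified spec type (not the real `ToolSpec`).
#[derive(Debug, Clone, PartialEq)]
pub enum Spec {
    Function { name: String },
    Namespace { name: String, description: String, tools: Vec<String> },
}

/// Merge namespace specs while preserving top-level order, drop deferred tools,
/// sort tools inside each namespace, and backfill empty descriptions.
pub fn plan_specs(specs: Vec<Spec>, deferred: &HashSet<String>) -> Vec<Spec> {
    let mut out: Vec<Spec> = Vec::new();
    for spec in specs {
        match spec {
            Spec::Function { name } => {
                if !deferred.contains(&name) {
                    out.push(Spec::Function { name });
                }
            }
            Spec::Namespace { name, description, tools } => {
                let visible: Vec<String> =
                    tools.into_iter().filter(|t| !deferred.contains(t)).collect();
                // Position of an already-emitted namespace with the same name, if any.
                let idx = out
                    .iter()
                    .position(|s| matches!(s, Spec::Namespace { name: n, .. } if *n == name));
                match idx {
                    Some(i) => {
                        if let Spec::Namespace { description: d, tools: t, .. } = &mut out[i] {
                            if d.is_empty() {
                                *d = description;
                            }
                            t.extend(visible);
                        }
                    }
                    None => out.push(Spec::Namespace { name, description, tools: visible }),
                }
            }
        }
    }
    for spec in &mut out {
        if let Spec::Namespace { name, description, tools } = spec {
            tools.sort();
            tools.dedup();
            if description.is_empty() {
                // Assumption: placeholder text; the real backfill wording is not given.
                *description = format!("Tools in the {name} namespace.");
            }
        }
    }
    out
}
```

The merged namespace keeps the position of its first occurrence, which is one way to read "preserving top-level spec order".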
Dylan Hurd	8123bddb16	chore(config) include_collaboration_mode_instructions (#22383 ) ## Summary Adds `include_collaboration_mode_instructions`, the config equivalent of `include_permissions_instructions` for collaboration modes. Useful when the collaboration-mode instructions should be kept out of the context. ## Testing - [x] Added unit test	2026-05-12 15:50:10 -07:00
viyatb-oai	46f30d0282	feat(sandbox): add Windows deny-read parity (#18202 ) ## Why The split filesystem policy stack already supports exact and glob `access = none` read restrictions on macOS and Linux. Windows still needed subprocess handling for those deny-read policies without claiming enforcement from a backend that cannot provide it. ## Key finding The unelevated restricted-token backend cannot safely enforce deny-read overlays. Its `WRITE_RESTRICTED` token model is authoritative for write checks, not read denials, so this PR intentionally fails that backend closed when deny-read overrides are present instead of claiming unsupported enforcement. ## What changed This PR adds the Windows deny-read enforcement layer and makes the backend split explicit: - Resolves Windows deny-read filesystem policy entries into concrete ACL targets. - Preserves exact missing paths so they can be materialized and denied before an enforceable sandboxed process starts. - Snapshot-expands existing glob matches into ACL targets for Windows subprocess enforcement. - Honors `glob_scan_max_depth` when expanding Windows deny-read globs. - Plans both the configured lexical path and the canonical target for existing paths so reparse-point aliases are covered. - Threads deny-read overrides through the elevated/logon-user Windows sandbox backend and unified exec. - Applies elevated deny-read ACLs synchronously before command launch rather than delegating them to the background read-grant helper. - Reconciles persistent deny-read ACEs per sandbox principal so policy changes do not leave stale deny-read ACLs behind. - Fails closed on the unelevated restricted-token backend when deny-read overrides are present, because its `WRITE_RESTRICTED` token model is not authoritative for read denials. ## Landed prerequisites These prerequisite PRs are already on `main`: 1. #15979 `feat(permissions): add glob deny-read policy support` 2. #18096 `feat(sandbox): add glob deny-read platform enforcement` 3. 
#17740 `feat(config): support managed deny-read requirements` This PR targets `main` directly and contains only the Windows deny-read enforcement layer. ## Implementation notes - Exact deny-read paths remain enforceable on the elevated path even when they do not exist yet: Windows materializes the missing path before applying the deny ACE, so the sandboxed command cannot create and read it during the same run. - Existing exact deny paths are preserved lexically until the ACL planner, which then adds the canonical target as a second ACL target when needed. That keeps both the configured alias and the resolved object covered. - Windows ACLs do not consume Codex glob syntax directly, so glob deny-read entries are expanded to the concrete matches that exist before process launch. - Glob traversal deduplicates directory visits within each pattern walk to avoid cycles, without collapsing distinct lexical roots that happen to resolve to the same target. - Persistent deny-read ACL state is keyed by sandbox principal SID, so cleanup only removes ACEs owned by the same backend principal. - Deny-read ACEs are fail-closed on the elevated path: setup aborts if mandatory deny-read ACL application fails. - Unelevated restricted-token sessions reject deny-read overrides early instead of running with a silently unenforceable read policy. ## Verification - `cargo test -p codex-core windows_restricted_token_rejects_unreadable_split_carveouts` - `just fmt` - `just fix -p codex-core` - `just fix -p codex-windows-sandbox` - GitHub Actions rerun is in progress on the pushed head. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 23:04:28 -07:00
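Two of the rules above, fail-closed on the unelevated backend and planning both the lexical path and its canonical target, can be sketched as follows. `Backend`, `DenyReadEntry`, and `plan_acl_targets` are hypothetical names for illustration; the sketch does no filesystem or ACL work and assumes canonical targets were resolved elsewhere.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical backend selector for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Elevated,
    RestrictedToken,
}

/// One deny-read entry: the configured path plus its canonical target when the path exists.
pub struct DenyReadEntry {
    pub lexical: PathBuf,
    pub canonical: Option<PathBuf>,
}

/// Return the ACL targets to deny, or an error when the backend cannot enforce them.
pub fn plan_acl_targets(
    backend: Backend,
    entries: &[DenyReadEntry],
) -> Result<Vec<PathBuf>, String> {
    // Fail closed: the restricted-token backend is not authoritative for read denials.
    if backend == Backend::RestrictedToken && !entries.is_empty() {
        return Err(
            "deny-read overrides are not enforceable on the restricted-token backend".to_string(),
        );
    }
    let mut seen = BTreeSet::new();
    let mut targets = Vec::new();
    for entry in entries {
        // Cover both the configured alias and the resolved object, without duplicates.
        for path in std::iter::once(&entry.lexical).chain(entry.canonical.iter()) {
            if seen.insert(path.clone()) {
                targets.push(path.clone());
            }
        }
    }
    Ok(targets)
}
```

A missing exact path has no canonical target in this model, so only its lexical form is planned; materializing it before applying the deny ACE is outside the sketch.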
pakrym-oai	79c65f816c	[codex] Filter legacy warning messages during compaction (#22243 ) ## Why Older sessions can contain model-warning records persisted as `user` messages, including the unified exec process-limit warning, the `apply_patch`-via-`exec_command` warning, and the model-mismatch high-risk cyber fallback warning. Those warnings are no longer produced as conversation history items, but when old sessions compact they should still be recognized as injected context rather than preserved as real user turns. ## What changed - Removed `record_model_warning` and the production paths that emitted these warning messages into conversation history. - Added `LegacyUnifiedExecProcessLimitWarning`, `LegacyApplyPatchExecCommandWarning`, and `LegacyModelMismatchWarning` contextual fragments that are used only for matching old persisted messages. - Registered the legacy fragments with contextual user message detection so compaction filters them through the existing fragment path. - Added focused compaction coverage for old warning messages being dropped during compacted-history processing. ## Testing - `cargo test -p codex-core warning` - `just fix -p codex-core`	2026-05-11 19:51:51 -07:00
Abhinav	d08906a944	Support PreToolUse updatedInput rewrites (#20527 ) ## Why `PreToolUse` already exposes `updatedInput` in its hook output schema, but Codex currently rejects it instead of applying the rewrite. That leaves hook authors unable to make the documented pre-execution adjustment to a tool call before it runs. ## What - Accept `updatedInput` from `PreToolUse` hooks when paired with `permissionDecision: "allow"`. - Apply the rewritten input before dispatch so the tool executes the updated payload, not the original one. - Preserve the stable hook-facing compatibility shapes that participating tool handlers expose: - Bash-like tools (`shell`, `container.exec`, `local_shell`, `shell_command`, `exec_command`) use `{ "command": ... }`. - `apply_patch` exposes its patch body through the same command-shaped hook contract. - MCP tools expose their JSON argument object directly. - Keep each participating tool handler responsible for translating hook-facing `updatedInput` back into its concrete invocation shape. ## Verification Direct Bash-like rewrite coverage: - `pre_tool_use_rewrites_shell_before_execution` - `pre_tool_use_rewrites_container_exec_before_execution` - `pre_tool_use_rewrites_local_shell_before_execution` - `pre_tool_use_rewrites_shell_command_before_execution` - `pre_tool_use_rewrites_exec_command_before_execution` These cases assert that each supported Bash-like surface runs only the rewritten command while the hook still observes the original `{ "command": ... }` input. `pre_tool_use_rewrites_apply_patch_before_execution` - Model emits one patch. - Hook swaps in a different patch. - Asserts only the rewritten file is created, and the hook saw the original patch. `pre_tool_use_rewrites_code_mode_nested_exec_command_before_execution` - Model runs one nested shell command from code mode. - Hook rewrites it. - Asserts only the rewritten command runs, and the hook saw the original nested input. 
`pre_tool_use_rewrites_mcp_tool_before_execution` - Model calls the RMCP echo tool. - Hook rewrites the MCP arguments. - Asserts the MCP server receives and returns the rewritten message, not the original one.	2026-05-11 22:27:24 -04:00
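The acceptance rule for `updatedInput` can be sketched for the command-shaped hook contract. `PreToolUseOutput` and `command_to_run` are hypothetical names, and the handling of a rewrite without `permissionDecision: "allow"` (returning an error here) is an assumption, since the commit only states when a rewrite is accepted.

```rust
/// Hypothetical, simplified view of a PreToolUse hook result for Bash-like tools.
pub struct PreToolUseOutput {
    pub permission_decision: Option<String>,
    /// The rewritten `command` value from `updatedInput`, if the hook supplied one.
    pub updated_input: Option<String>,
}

/// Decide which command is dispatched. The hook itself always observes the original input.
pub fn command_to_run(original: &str, hook: &PreToolUseOutput) -> Result<String, String> {
    match (&hook.updated_input, hook.permission_decision.as_deref()) {
        // Rewrite is applied only when paired with an explicit allow.
        (Some(rewritten), Some("allow")) => Ok(rewritten.clone()),
        // Assumption: a rewrite without "allow" is rejected rather than silently ignored.
        (Some(_), _) => Err("updatedInput requires permissionDecision \"allow\"".to_string()),
        // No rewrite: the original command runs unchanged.
        (None, _) => Ok(original.to_string()),
    }
}
```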
starr-openai	17ed5ad0b0	Apply sandbox context to local view_image reads (#21861 ) ## Summary - create a selected-cwd filesystem sandbox context for view_image metadata and file reads in both local and remote environments - add a local restricted-profile regression test for the previously unsandboxed read path ## Validation - just fmt - bazel test --bes_backend= --bes_results_url= --test_output=errors --test_filter=view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads //codex-rs/core:core-unit-tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 18:48:43 -07:00
starr-openai	22e84c49d0	Support multi-environment apply_patch selection (#21617 ) ## Summary - add multi-environment apply_patch routing for both freeform and function-call tool flows - parse and reconcile the optional environment selector in the main apply_patch parser, then verify against the selected environment in the handler - carry environment_id through runtime and approval surfaces so remote-targeted patches stay explicit end to end ## Testing - just fmt - remote exec-server e2e: `cargo test -p codex-core --test all apply_patch_multi_environment_uses_remote_executor -- --nocapture` on dev via `scripts/test-remote-env.sh` --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 16:33:44 -07:00
viyatb-oai	c7b55cdc46	feat: add network proxy feature flag (#20147 ) ## Why The permissions migration is making `permissions.<profile>.network.enabled` the canonical sandbox network bit, while proxy startup is a separate concern. Enabling network access should not implicitly start the proxy, and users who are still on legacy sandbox modes need a separate place to opt into proxy startup and provide proxy-specific settings. This follow-up to #19900 gives the network proxy its own feature surface instead of overloading permission-profile network semantics. ## What changed - Add an experimental `network_proxy` feature with a configurable `[features.network_proxy]` table. - Overlay `features.network_proxy` settings onto the configured proxy state after permission-profile selection, so the proxy only starts when the active `NetworkSandboxPolicy` already allows network access. - Preserve `[experimental_network]` startup behavior independently of the new feature flag. ## Behavior and examples There are now three related knobs: - `permissions.<profile>.network.enabled` controls whether the active permission profile has network access at all. - `features.network_proxy` enables proxy restrictions for an already-network-enabled profile. - Legacy `sandbox_mode` plus `[sandbox_workspace_write].network_access` still control whether legacy `workspace-write` has network access at all. 
The rule is: - network off + proxy flag on -> network stays off, proxy is a no-op - network on + proxy flag off -> unrestricted direct network - network on + proxy flag on -> network stays on, with proxy restrictions applied For permission profiles, the feature toggle adds proxy restrictions only when network access is already enabled: ```toml default_permissions = "workspace" [permissions.workspace.filesystem] ":minimal" = "read" [permissions.workspace.network] enabled = true [features] network_proxy = true ``` If `network.enabled = false`, the same feature flag is a no-op: network remains off and the proxy does not start. For legacy sandbox config, `network_access` remains the master switch: ```toml sandbox_mode = "workspace-write" [sandbox_workspace_write] network_access = true [features] network_proxy = true ``` That keeps legacy `workspace-write` network access on, but routes it through the proxy policy. If `network_access = false`, the proxy feature is a no-op and legacy `workspace-write` remains offline. 
The same proxy opt-in can be supplied from the CLI: ```bash codex -c 'features.network_proxy=true' ``` Additional proxy settings can be supplied when a table is needed: ```bash codex \ -c 'features.network_proxy.enabled=true' \ -c 'features.network_proxy.enable_socks5=false' ``` The intended behavior matrix is: \| Config surface \| Network setting \| `features.network_proxy` \| Direct sandbox network \| Proxy \| \| --- \| --- \| --- \| --- \| --- \| \| Permission profile \| `network.enabled = false` \| off \| restricted \| off \| \| Permission profile \| `network.enabled = false` \| on \| restricted \| off \| \| Permission profile \| `network.enabled = true` \| off \| enabled \| off \| \| Permission profile \| `network.enabled = true` \| on \| enabled \| on \| \| Legacy `workspace-write` \| `network_access = false` \| off \| restricted \| off \| \| Legacy `workspace-write` \| `network_access = false` \| on \| restricted \| off \| \| Legacy `workspace-write` \| `network_access = true` \| off \| enabled \| off \| \| Legacy `workspace-write` \| `network_access = true` \| on \| enabled \| on \| `[experimental_network]` requirements remain separate from the user feature toggle and still start the proxy on their own. Relevant code: - [`features/src/feature_configs.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/features/src/feature_configs.rs#L58-L117) defines the feature-specific proxy config. - [`core/src/config/mod.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L1959-L1964) reads the feature table, and [later applies it only when network access is already enabled](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L2448-L2458). 
## Verification Added focused coverage for: - keeping the proxy off when `features.network_proxy` is enabled but sandbox network access is disabled - the full permission-profile and legacy `workspace-write` matrix above - preserving `[experimental_network]` startup without the feature - reusing profile-supplied proxy settings when the feature is enabled Ran: - `cargo test -p codex-features` - `cargo test -p codex-core network_proxy_feature` - `cargo test -p codex-core experimental_network_requirements_enable_proxy_without_feature`	2026-05-11 14:12:00 -07:00
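The behavior matrix above reduces to a small truth table. The sketch below encodes it with hypothetical names (`NetworkState`, `resolve`); it covers only the permission-profile and legacy `workspace-write` rows and deliberately leaves out `[experimental_network]`, which the commit says starts the proxy on its own.

```rust
/// Resulting sandbox state for one row of the matrix (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct NetworkState {
    pub direct_network_enabled: bool,
    pub proxy_on: bool,
}

/// `network_enabled` is `network.enabled` (permission profile) or
/// `network_access` (legacy `workspace-write`); `proxy_feature` is `features.network_proxy`.
pub fn resolve(network_enabled: bool, proxy_feature: bool) -> NetworkState {
    NetworkState {
        // The network setting is the master switch; the proxy flag never turns it on.
        direct_network_enabled: network_enabled,
        // The proxy only starts when network access is already allowed.
        proxy_on: network_enabled && proxy_feature,
    }
}
```

Both config surfaces share the same rule in the matrix, which is why a single function with two booleans is enough for this illustration.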
Dylan Hurd	e783dab44c	fix(exec-policy) use is_known_safe_command less (#20305 ) ## Summary Restricts `is_known_safe_command` to the modes where it is explicitly part of the documented behavior: - when `environment_lacks_sandbox_protections` - in `AskForApproval::UnlessTrusted` Notably, as a result, escalations for commands that pass `is_known_safe_command` are no longer auto-approved in `AskForApproval::OnRequest` or `AskForApproval::Granular`. ## Testing - [x] Updated unit tests - [x] Updated approvals scenario tests. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 11:37:53 -07:00
jif-oai	32b1ae7099	chore: drop built-in MCPs (#22173 ) Drop something that was never used	2026-05-11 19:45:08 +02:00
Ahmed Ibrahim	69f3183a8e	Revert "[codex] Harden overflow auto-compaction recovery" (#22170 ) Reverts openai/codex#22141	2026-05-11 19:33:15 +03:00
Ahmed Ibrahim	15e79f3c26	[codex] Harden overflow auto-compaction recovery (#22141 ) ## Why Dogfooder feedback exposed two correctness gaps in normal-loop overflow recovery: 1. a sampling request that hit `ContextWindowExceeded` could keep re-entering auto-compaction indefinitely if the compacted retry still did not fit, and 2. local compact-history rebuilds flattened user messages down to text, so an overflowing `[image, "what is this?"]` turn could be retried without the image after compaction. That means recovery could either fail to terminate cleanly or proceed with a materially weakened version of the user request. ## What changed - Move normal-loop `ContextWindowExceeded` handling into the sampling retry loop, so successful rescue compaction consumes the provider retry budget instead of creating an unbounded outer-turn loop. - Keep compacted user-history rebuilds structured: `collect_user_messages` now carries user `UserInput` content rather than flattened strings, and `build_compacted_history` reconstructs full user messages from that structured representation. - Preserve image inputs while retaining the existing text-budget truncation behavior for compacted user history. - Preserve existing compaction-task failure handling and client-session reset behavior while bounding repeated overflow retries. - Add focused regression coverage for: - recovery after a normal-loop overflow, - retry-budget exhaustion after repeated overflow, - local recovery preserving image + text input, - remote recovery preserving image + text input, - remote compaction v2 preserving image + text input, and - compaction failure still terminating cleanly. The main behavior changes are in `codex-rs/core/src/session/turn.rs` and `codex-rs/core/src/compact.rs`. ## Verification - Not run locally; relying on PR CI for this update. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 16:16:49 +00:00
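The bounded-recovery idea in this commit message (which the log shows was reverted shortly afterwards) can be sketched as a retry loop in which each rescue compaction consumes one unit of a retry budget. `SampleError`, `sample_with_recovery`, and the closure-based shape are hypothetical; the real change lives in `codex-rs/core/src/session/turn.rs` and is not reproduced here.

```rust
/// Hypothetical error type for one sampling attempt.
pub enum SampleError {
    ContextWindowExceeded,
    Other(String),
}

/// Retry sampling after overflow, compacting between attempts, with a hard retry bound.
pub fn sample_with_recovery<S, C>(
    mut sample: S,
    mut compact: C,
    max_retries: u32,
) -> Result<String, String>
where
    S: FnMut() -> Result<String, SampleError>,
    C: FnMut() -> Result<(), String>,
{
    let mut retries = 0;
    loop {
        match sample() {
            Ok(output) => return Ok(output),
            Err(SampleError::ContextWindowExceeded) => {
                if retries >= max_retries {
                    return Err("retry budget exhausted after repeated overflow".to_string());
                }
                // A compaction failure terminates the loop immediately.
                compact()?;
                // Each rescue compaction consumes retry budget, so the loop is bounded.
                retries += 1;
            }
            Err(SampleError::Other(message)) => return Err(message),
        }
    }
}
```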
Andrey Mishchenko	704ad620f6	Add x-codex-ws-stream-request-start-ms (#22113 ) For capturing client-side timing information.	2026-05-11 08:15:52 -07:00
jif-oai	436c0df658	extension: wire extension registries into sessions (#21737 ) ## Why [#21736](https://github.com/openai/codex/pull/21736) introduces the typed extension API, but the runtime does not yet carry a registry through thread/session startup or give contributors host-owned stores to read from. This PR wires that host-side path so later feature migrations can move product-specific behavior behind typed contributions without adding another bespoke seam directly to `codex-core`. ## What changed - Thread `ExtensionRegistry<Config>` through `ThreadManager`, `CodexSpawnArgs`, `Session`, and sub-agent spawn paths. - Wire `ThreadStartContributor` and `ContextContributor`. - Expose the small supporting surface needed by non-core callers that construct threads directly, including `empty_extension_registry()` through `codex-core-api`. This PR lands the host plumbing only: the app-server registry is still empty, and concrete feature migrations are intended to follow separately.	2026-05-11 11:38:18 +02:00
Michael Bolin	0c70698e24	tests: cover sandbox link write behavior (#21819 ) ## Why [PR #1705](https://github.com/openai/codex/pull/1705) moved `apply_patch` execution under the configured sandbox and called out the need for integration coverage. We already covered textual `../` escapes, but did not have coverage for link aliases that live inside a writable workspace while pointing at, or aliasing, files visible outside it. This PR locks in the current sandbox boundary without changing production write semantics. Symlink escapes into a read-only outside root should fail and leave the outside file unchanged. Existing hard links are characterized separately: if a user-created hard link already exists inside the writable root, sandboxed writes preserve normal hard-link semantics rather than replacing the link and silently breaking that relationship. ## What Changed - Added `apply_patch_cli_does_not_write_through_symlink_escape_outside_workspace` to verify `apply_patch` cannot update a symlink that targets a file outside the writable workspace. - Added `apply_patch_cli_preserves_existing_hard_link_outside_workspace` to verify `apply_patch` intentionally writes through an existing hard link and does not unlink or replace it. - Added `file_system_sandboxed_write_preserves_existing_hard_link` to verify sandboxed `fs/writeFile` preserves an existing hard link and writes the shared inode. ## Testing - `cargo test -p codex-exec-server file_system_sandboxed_write` - `cargo test -p codex-core apply_patch_cli_does_not_write_through_symlink_escape_outside_workspace` - `cargo test -p codex-core apply_patch_cli_preserves_existing_hard_link_outside_workspace` - `just fix -p codex-exec-server -p codex-core` - `just fix -p codex-core` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21819). * #21845 * __->__ #21819	2026-05-09 08:28:15 -07:00
pakrym-oai	408e6218ab	Reapply "Move skills watcher to app-server" (#21652 ) ## Why PR #21460 reverted the earlier move of skills change watching from `codex-core` into app-server. This reapplies that boundary change so app-server owns client-facing `skills/changed` notifications and core no longer carries the watcher. ## What - Restore the app-server `SkillsWatcher` and register it from thread listener setup. - Remove the core-owned skills watcher and its core live-reload integration surface. - Restore app-server coverage for `skills/changed` notifications after a watched skill file changes. ## Validation - `cargo test -p codex-app-server --test all suite::v2::skills_list::skills_changed_notification_is_emitted_after_skill_change -- --exact --nocapture` - `cargo test -p codex-core --lib --no-run`	2026-05-08 17:41:15 -07:00
Charlie Marsh	7c9731c9af	Enable `--deny-warnings` for `cargo shear` (#21616 ) ## Summary In https://github.com/openai/codex/pull/21584, we disabled doctests for crates that lack any doctests. We can enforce that property via `cargo shear --deny-warnings`: crates that lack doctests will be flagged if doctests are enabled, and crates with doctests will be flagged if doctests are disabled. A few additional notes: - By adding `--deny-warnings`, `cargo shear` also flagged a number of modules that were not reachable at all. Some of those have been removed. - This PR removes a usage of `windows_modules!` (since `cargo shear` and `rustfmt` couldn't see through it) in favor of simple `#[cfg(target_os = "windows")]` attributes. As a consequence, many of these files exhibit churn in this PR, since they weren't being formatted by `rustfmt` at all on main. - Again, to make the code more analyzable, this PR also removes some usages of `#[path = "cwd_junction.rs"]` in favor of a more standard module structure. The bin sidecar structure is still retained, but, e.g., `windows-sandbox-rs/src/bin/command_runner.rs` was moved to `windows-sandbox-rs/src/bin/command_runner/main.rs`, and so on. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-08 20:29:00 +00:00
pakrym-oai	e783341b70	[codex] Delete function-style apply_patch (#21651 ) ## Why `apply_patch` is now a freeform/custom tool. Keeping the old JSON/function-style registration and parsing path left another way for models and tests to invoke `apply_patch`, which made the tool surface harder to reason about. ## What changed - Removed the `ApplyPatchToolType::Function` variant, JSON `apply_patch` spec, and handler support for function payloads. - Kept `apply_patch_tool_type = freeform` as the supported model metadata path, including Bedrock catalog metadata. - Migrated `apply_patch` tests and SSE fixtures to custom/freeform tool calls. ## Verification - `cargo test -p codex-tools -p codex-protocol -p codex-model-provider` - `cargo test -p codex-core tools::handlers::apply_patch --lib` - `cargo test -p codex-core --test all apply_patch_tool_executes_and_emits_patch_events` - `cargo test -p codex-core --test all apply_patch_reports_parse_diagnostics` - `cargo test -p codex-exec test_apply_patch_tool` - `just fix -p codex-core` - `just fix -p codex-tools -p codex-protocol -p codex-model-provider -p codex-exec`	2026-05-08 13:00:57 -07:00
Jiaming Zhang	5f4d0ec343	[codex] request desktop attestation from app (#20619 ) ## Summary TL;DR: teaches `codex-rs` / app-server to request a desktop-provided attestation token and attach it as `x-oai-attestation` on the scoped ChatGPT Codex request paths. ![DeviceCheck attestation interface](https://raw.githubusercontent.com/openai/codex/dev/jm/devicecheck-diagram-assets/pr-assets/devicecheck-attestation-interface.png) ## Details This PR teaches the Codex app-server runtime how to request and attach an attestation token. It does not generate DeviceCheck tokens directly; instead, it relies on the connected desktop app to advertise that it can generate attestation and then asks that app for a fresh header value when needed. The flow is: 1. The Codex desktop app connects to app-server. 2. During `initialize`, the app can advertise that it supports `requestAttestation`. 3. Before app-server calls selected ChatGPT Codex endpoints, it sends the internal server request `attestation/generate` to the app. 4. app-server receives a pre-encoded header value back. 5. app-server forwards that value as `x-oai-attestation` on the scoped outbound requests. The code in this repo is mostly protocol and runtime plumbing: it adds the app-server request/response shape, introduces an attestation provider in core, wires that provider into Responses / compaction / realtime setup paths, and covers the intended scoping with tests. The signed macOS DeviceCheck generation remains owned by the desktop app PR. 
## Related PR - Codex desktop app implementation: https://github.com/openai/openai/pull/878649 ## Validation <details> <summary>Tests run</summary> ```sh cargo test -p codex-app-server-protocol cargo test -p codex-core attestation --lib cargo test -p codex-app-server --lib attestation ``` Also ran: ```sh just fix -p codex-core just fix -p codex-app-server just fix -p codex-app-server-protocol just fmt just write-app-server-schema ``` </details> <details> <summary>E2E DeviceCheck validation</summary> First validated the signed desktop app boundary directly: launched a packaged signed `Codex.app`, sent `attestation/generate`, decoded the returned `v1.` attestation header, and validated the extracted DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using bundle ID `com.openai.codex`. Apple returned `status_code: 200` and `is_ok: true`. Then ran the fuller app + app-server flow. The packaged `Codex.app` launched a current-branch app-server via `CODEX_CLI_PATH`, and a local MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server requested `attestation/generate` from the real Electron app process, and the intercepted `/backend-api/codex/responses` traffic included `x-oai-attestation` on both routes: ```text GET /backend-api/codex/responses Upgrade: websocket x-oai-attestation: present POST /backend-api/codex/responses Upgrade: none x-oai-attestation: present ``` The captured header decoded to a DeviceCheck token that also validated with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`, team `2DC432GLL2`). </details> --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-08 12:36:02 -07:00
Ahmed Ibrahim	7c0e54bf59	[codex] Generalize service tier slash commands (#21745 ) ## Why `/fast` was wired as a one-off slash command even though model metadata now exposes service tiers as catalog data. That meant adding another tier, such as a slower/cheaper tier, would require more hardcoded TUI plumbing instead of letting the model catalog drive the available commands. This change makes service-tier commands data-driven: each advertised `service_tiers` entry becomes a `/name` command using the catalog description, while the request path sends the tier `id` only when the selected model supports it. ## What Changed - Removed the hardcoded `/fast` slash-command variant and introduced dynamic service-tier command items in the composer and command popup. - Added toggle behavior for service-tier commands: invoking `/name` selects that tier, and invoking it again clears the selection. - Preserved the existing Fast-mode keybinding/status affordances by resolving the current model tier whose name is `fast`, while still sending the tier request value such as `priority`. - Persisted service-tier selections as raw request strings so non-fast tiers can round-trip through config. - Updated the Bedrock catalog entry to advertise fast support through `service_tiers` with `id: "priority"` and `name: "fast"`. - Added defensive filtering in core so unsupported selected service tiers are omitted from `/responses` requests. ## Validation - Added/updated coverage for dynamic service-tier slash command lookup, popup descriptions, composer dispatch, TUI fast toggling, and unsupported-tier omission in core request construction. - Local tests were not run per request. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-08 20:09:51 +03:00
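The toggle and filtering behavior described above can be sketched with a minimal model of a catalog entry. `ServiceTier`, `toggle`, and `request_tier` are hypothetical names; only the `id`/`name` split and the `priority`/`fast` pairing come from the commit message.

```rust
/// Hypothetical slice of a catalog `service_tiers` entry.
#[derive(Debug, Clone, PartialEq)]
pub struct ServiceTier {
    /// Value sent on the request, e.g. "priority".
    pub id: String,
    /// Slash-command name, e.g. "fast" for `/fast`.
    pub name: String,
}

/// Invoking `/name` selects that tier; invoking it again clears the selection.
pub fn toggle(selected: Option<String>, tiers: &[ServiceTier], command_name: &str) -> Option<String> {
    match tiers.iter().find(|t| t.name == command_name) {
        Some(tier) if selected.as_deref() == Some(tier.id.as_str()) => None,
        Some(tier) => Some(tier.id.clone()),
        // Unknown command for this model: leave the selection untouched (assumption).
        None => selected,
    }
}

/// Defensive filter: omit a selected tier the current model does not advertise.
pub fn request_tier(selected: Option<&str>, tiers: &[ServiceTier]) -> Option<String> {
    selected
        .filter(|id| tiers.iter().any(|t| t.id == *id))
        .map(str::to_string)
}
```

Storing the selection as the raw request string (`id`) rather than the display name mirrors the commit's note about persisting selections so non-fast tiers can round-trip through config.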
jif-oai	bd8fc9adb9	api: send hyphenated session and thread headers (#21757 ) ## Why Some consumers expect conventional hyphenated HTTP headers. Codex already sends the session and thread IDs on outbound Responses requests, but it only uses the underscore spellings today, which makes those IDs harder to consume in systems that normalize or reject underscore header names. Full context here: https://openai.slack.com/archives/C08KCGLSPSQ/p1778248578422369 ## What changed - `build_session_headers` now emits both `session_id` and `session-id` when a session ID is present. - It does the same for `thread_id` and `thread-id`. - Added regression coverage in `codex-api/tests/clients.rs` and `core/tests/suite/client.rs` so both the lower-level client tests and the end-to-end request tests assert the two header spellings are present. ## Test plan - Added header assertions in `codex-api/tests/clients.rs`. - Added request-header assertions in `core/tests/suite/client.rs` for both the `/v1/responses` and `/api/codex/responses` request paths.	2026-05-08 17:11:19 +02:00
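The dual-spelling behavior described in #21757 might look something like the sketch below. The commit message does not show the real signature of `build_session_headers`, so the parameters and the return type here are assumptions made for illustration.

```rust
/// Hypothetical stand-in for `build_session_headers`: emit both the
/// underscore and the hyphenated spelling whenever an id is present.
fn build_session_headers(
    session_id: Option<&str>,
    thread_id: Option<&str>,
) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(id) = session_id {
        headers.push(("session_id", id.to_string()));
        headers.push(("session-id", id.to_string()));
    }
    if let Some(id) = thread_id {
        headers.push(("thread_id", id.to_string()));
        headers.push(("thread-id", id.to_string()));
    }
    headers
}
```

Keeping the underscore spellings alongside the hyphenated ones means existing consumers continue to work while systems that normalize or reject underscore header names can read the new form.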
github-actions[bot]	aadcae9f3c	Update models.json (#19896 ) Automated update of models.json. --------- Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2026-05-08 17:41:55 +03:00
Ahmed Ibrahim	cce059467a	[codex] Enable apply_patch freeform by default (#21687 ) ## Summary - enable `apply_patch_freeform` by default in the feature registry ## Why - make the freeform `apply_patch` tool available by default when model metadata does not explicitly opt into another mode ## Validation - `just fmt` - did not run tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-08 13:15:00 +00:00
Ahmed Ibrahim	71d80f9a14	Omit service_tier from remote /responses/compact requests under API auth (#21676 ) ## Summary API-key-auth remote compaction requests should not inherit `service_tier` from normal `/responses` turns. This path needs to match API auth expectations, while ChatGPT-auth remote compaction should keep reusing the shared request fields that still apply there. This change keeps the decision inline in `codex-rs/core/src/compact_remote.rs` only. Under API key auth, the classic remote `/responses/compact` path now omits `service_tier`; under ChatGPT auth, it keeps reusing the configured tier. `codex-rs/core/src/compact_remote_v2.rs` is unchanged. The remote compaction parity coverage and snapshots were updated to assert the API-key omission and preserve the ChatGPT-auth behavior. ## Testing - Updated remote compaction parity coverage in `codex-rs/core/tests/suite/compact_remote.rs` and the corresponding snapshots.	2026-05-08 11:15:14 +03:00
pakrym-oai	dfa1e864a2	Send response.processed after remote compaction v2 (#21642 ) ## Why Remote compaction v2 consumes a normal Responses stream, but that compaction-specific stream consumer dropped the `response.completed` id. As a result, the `responses_websocket_response_processed` lifecycle notification was emitted for normal turn sampling but not after a v2 remote compaction response was fully processed. ## What changed - Return the completed response id alongside the v2 `context_compaction` output item. - After v2 compacted history is installed, send `response.processed` through the same websocket session when the feature is enabled. - Add websocket regression coverage for a remote compaction v2 request followed by `response.processed`. ## Verification - `cargo test -p codex-core --test all responses_websocket_sends_response_processed_after_remote_compaction_v2 -- --nocapture` - `cargo test -p codex-core collect_context_compaction_output_accepts_additional_output_items -- --nocapture`	2026-05-07 19:57:36 -07:00
starr-openai	07b695190f	Add CODEX_HOME environments TOML provider (#20666 ) ## Why After stdio transports and provider-owned defaults exist, Codex needs a config-backed provider that can describe more than the single legacy `CODEX_EXEC_SERVER_URL` remote. This PR adds that provider without activating it in product entrypoints yet, keeping parser/validation review separate from runtime wiring. Stack position: this is PR 4 of 5. It builds on PR 3's provider/default model and adds the `environments.toml` provider used by PR 5. ## What Changed - Add `environment_toml.rs` as the TOML-specific home for parsing, validation, and provider construction. - Keep the TOML schema/provider structs private; the public constructor added here is `EnvironmentManager::from_codex_home(...)`. - Add `TomlEnvironmentProvider`, including validation for: - reserved ids such as `local` and `none` - duplicate ids - unknown explicit defaults - empty programs or URLs - exactly one of `url` or `program` per configured environment - Support websocket environments with `url = "ws://..."` / `wss://...`. - Support stdio-command environments with `program = "..."`. - Add helpers to load `environments.toml` from `CODEX_HOME`, but do not wire entrypoints to call them yet. - Add the `toml` dependency for parsing. ## Stack - 1. https://github.com/openai/codex/pull/20663 - Add stdio exec-server listener - 2. https://github.com/openai/codex/pull/20664 - Add stdio exec-server client transport - 3. https://github.com/openai/codex/pull/20665 - Make environment providers own default selection - 4. This PR: https://github.com/openai/codex/pull/20666 - Add CODEX_HOME environments TOML provider - 5. https://github.com/openai/codex/pull/20667 - Load configured environments from CODEX_HOME Split from original draft: https://github.com/openai/codex/pull/20508 ## Validation Not run locally; this was split out of the original draft stack. 
## Documentation This introduces the config shape for `environments.toml`; user-facing documentation should be added before this stack is treated as a documented public workflow. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-08 01:37:47 +00:00
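The validation rules listed for `TomlEnvironmentProvider` in #20666 can be sketched as below. This is an assumption-laden illustration rather than the contents of `environment_toml.rs`: `EnvEntry` and `validate` are hypothetical names, the entries are assumed to be already parsed from TOML, and the sketch assumes an explicit default must name a configured environment (the commit message does not say whether reserved ids are acceptable defaults).

```rust
use std::collections::HashSet;

/// Hypothetical parsed form of one configured environment.
struct EnvEntry {
    id: String,
    url: Option<String>,
    program: Option<String>,
}

/// Apply the checks named in the PR description to a list of entries.
fn validate(entries: &[EnvEntry], default_id: Option<&str>) -> Result<(), String> {
    let mut seen: HashSet<&str> = HashSet::new();
    for entry in entries {
        // Reserved ids such as `local` and `none` cannot be redefined.
        if entry.id == "local" || entry.id == "none" {
            return Err(format!("reserved id: {}", entry.id));
        }
        if !seen.insert(entry.id.as_str()) {
            return Err(format!("duplicate id: {}", entry.id));
        }
        // Exactly one of `url` or `program`, and it must be non-empty.
        match (entry.url.as_deref(), entry.program.as_deref()) {
            (Some(url), None) if url.is_empty() => {
                return Err(format!("empty url for {}", entry.id));
            }
            (None, Some(program)) if program.is_empty() => {
                return Err(format!("empty program for {}", entry.id));
            }
            (Some(_), None) | (None, Some(_)) => {}
            _ => {
                return Err(format!("{} needs exactly one of url or program", entry.id));
            }
        }
    }
    // Assumption: an explicit default must refer to a configured id.
    if let Some(default) = default_id {
        if !seen.contains(default) {
            return Err(format!("unknown default: {default}"));
        }
    }
    Ok(())
}
```

Returning on the first failure keeps the sketch short; a real provider might prefer to collect every problem so that a user can fix `environments.toml` in one pass.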
starr-openai	1bfc3d9773	Route view_image through selected environments Route view_image through selected environments so image reads use the selected turn environment and cwd, with schema exposure limited to multi-environment toolsets. Co-authored-by: Codex <noreply@openai.com>	2026-05-08 01:29:03 +00:00
Charlie Marsh	54ef99a365	Disable empty Cargo test targets (#21584 ) ## Summary `cargo test` entails running both standard Rust tests and doctests. It turns out that doctest discovery is fairly slow, and it's a cost you pay even for crates that don't include any doctests. This PR disables doctests with `doctest = false` for crates that lack any doctests. For the collection of crates below, this speeds up test execution by >4x. E.g., before this PR: ``` Benchmark 1: cargo test -p codex-utils-absolute-path -p codex-utils-cache -p codex-utils-cli -p codex-utils-home-dir -p codex-utils-output-truncation -p codex-utils-path -p codex-utils-string -p codex-utils-template -p codex-utils-elapsed -p codex-utils-json-to-toml Time (mean ± σ): 1.849 s ± 4.455 s [User: 0.752 s, System: 1.367 s] Range (min … max): 0.418 s … 14.529 s 10 runs ``` And after: ``` Benchmark 1: cargo test -p codex-utils-absolute-path -p codex-utils-cache -p codex-utils-cli -p codex-utils-home-dir -p codex-utils-output-truncation -p codex-utils-path -p codex-utils-string -p codex-utils-template -p codex-utils-elapsed -p codex-utils-json-to-toml Time (mean ± σ): 428.6 ms ± 6.9 ms [User: 187.7 ms, System: 219.7 ms] Range (min … max): 418.0 ms … 436.8 ms 10 runs ``` For a single crate, with >2x speedup, before: ``` Benchmark 1: cargo test -p codex-utils-string Time (mean ± σ): 491.1 ms ± 9.0 ms [User: 229.8 ms, System: 234.9 ms] Range (min … max): 480.9 ms … 512.0 ms 10 runs ``` And after: ``` Benchmark 1: cargo test -p codex-utils-string Time (mean ± σ): 213.9 ms ± 4.3 ms [User: 112.8 ms, System: 84.0 ms] Range (min … max): 206.8 ms … 221.0 ms 13 runs ``` Co-authored-by: Codex <noreply@openai.com>	2026-05-07 15:44:17 -07:00


1234 Commits