codex

mirror of https://github.com/openai/codex.git synced 2026-05-18 10:12:59 +00:00

Author	SHA1	Message	Date
pakrym-oai	ba159cbc79	Fix codex-core config test type paths (#19726) Summary: - Update config tests to reference config requirement types from codex_config after the loader split. Tests: - just fmt - cargo build -p codex-core --tests - cargo clippy -p codex-core --tests -- -D warnings	2026-04-26 15:58:17 -07:00
Michael Bolin	dda8199b73	permissions: migrate approval and sandbox consumers to profiles (#19393 ) ## Why Runtime decisions should not infer permissions from the lossy legacy sandbox projection once `PermissionProfile` is available. In particular, `Disabled` and `External` need to remain distinct, and managed profiles with split filesystem or deny-read rules should not be collapsed before approval, network, safety, or analytics code makes decisions. ## What Changed - Changes managed network proxy setup and network approval logic to use `PermissionProfile` when deciding whether a managed sandbox is active. - Migrates patch safety, Guardian/user-shell approval paths, Landlock helper setup, analytics sandbox classification, and selected turn/session code to profile-backed permissions. - Validates command-level profile overrides against the constrained `PermissionProfile` rather than a strict `SandboxPolicy` round trip. - Preserves configured deny-read restrictions when command profiles are narrowed. - Adds coverage for profile-backed trust, network proxy/approval behavior, patch safety, analytics classification, and command-profile narrowing. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393). * #19395 * #19394 * __->__ #19393	2026-04-26 15:30:40 -07:00
pakrym-oai	9c3abcd46c	[codex] Move config loading into codex-config (#19487 ) ## Why Config loading had become split across crates: `codex-config` owned the config types and merge logic, while `codex-core` still owned the loader that assembled the layer stack. This change consolidates that responsibility in `codex-config`, so the crate that defines config behavior also owns how configs are discovered and loaded. To make that move possible without reintroducing the old dependency cycle, the shell-environment policy types and helpers that `codex-exec-server` needs now live in `codex-protocol` instead of flowing through `codex-config`. This also makes the migrated loader tests more deterministic on machines that already have managed or system Codex config installed by letting tests override the system config and requirements paths instead of reading the host's `/etc/codex`. ## What Changed - moved the config loader implementation from `codex-core` into `codex-config::loader` and deleted the old `core::config_loader` module instead of leaving a compatibility shim - moved shell-environment policy types and helpers into `codex-protocol`, then updated `codex-exec-server` and other downstream crates to import them from their new home - updated downstream callers to use loader/config APIs from `codex-config` - added test-only loader overrides for system config and requirements paths so loader-focused tests do not depend on host-managed config state - cleaned up now-unused dependency entries and platform-specific cfgs that were surfaced by post-push CI ## Testing - `cargo test -p codex-config` - `cargo test -p codex-core config_loader_tests::` - `cargo test -p codex-protocol -p codex-exec-server -p codex-cloud-requirements -p codex-rmcp-client --lib` - `cargo test --lib -p codex-app-server-client -p codex-exec` - `cargo test --no-run --lib -p codex-app-server` - `cargo test -p codex-linux-sandbox --lib` - `cargo shear` - `just bazel-lock-check` ## Notes - I did not chase 
unrelated full-suite failures outside the migrated loader surface. - `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive failures on this machine, and Windows CI still shows unrelated long-running/timeouting test noise outside the loader migration itself.	2026-04-26 15:10:53 -07:00
Michael Bolin	deaa307fb2	permissions: derive compatibility policies from profiles (#19392 ) ## Why After #19391, `PermissionProfile` and the split filesystem/network policies could still be stored in parallel. That creates drift risk: a profile can preserve deny globs, external enforcement, or split filesystem entries while a cached projection silently loses those details. This PR makes the profile the runtime source and derives compatibility views from it. ## What Changed - Removes stored filesystem/network sandbox projections from `Permissions` and `SessionConfiguration`; their accessors now derive from the canonical `PermissionProfile`. - Derives legacy `SandboxPolicy` snapshots from profiles only where an older API still needs that field. - Updates MCP connection and elicitation state to track `PermissionProfile` instead of `SandboxPolicy` for auto-approval decisions. - Adds semantic filesystem-policy comparison so cwd changes can preserve richer profiles while still recognizing equivalent legacy projections independent of entry ordering. - Updates config/session tests to assert profile-derived projections instead of parallel stored fields. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392). * #19395 * #19394 * #19393 * __->__ #19392	2026-04-26 15:06:42 -07:00
Michael Bolin	4d7ce3447d	permissions: make runtime config profile-backed (#19606 ) ## Why This supersedes #19391. During stack repair, GitHub marked #19391 as merged into a temporary stack branch rather than into `main`, so the runtime-config change needed a fresh PR. `PermissionProfile` is now the canonical permissions shape after #19231 because it can distinguish `Managed`, `Disabled`, and `External` enforcement while also carrying filesystem rules that legacy `SandboxPolicy` cannot represent cleanly. Core config and session state still needed to accept profile-backed permissions without forcing every profile through the strict legacy bridge, which rejected valid runtime profiles such as direct write roots. The unrelated CI/test hardening that previously rode along with this PR has been split into #19683 so this PR stays focused on the permissions model migration. ## What Changed - Adds `Permissions.permission_profile` and `SessionConfiguration.permission_profile` as constrained runtime state, while keeping `sandbox_policy` as a legacy compatibility projection. - Introduces profile setters that keep `PermissionProfile`, split filesystem/network policies, and legacy `SandboxPolicy` projections synchronized. - Uses a compatibility projection for requirement checks and legacy consumers instead of rejecting profiles that cannot round-trip through `SandboxPolicy` exactly. - Updates config loading, config overrides, session updates, turn context plumbing, prompt permission text, sandbox tags, and exec request construction to carry profile-backed runtime permissions. - Preserves configured deny-read entries and `glob_scan_max_depth` when command/session profiles are narrowed. - Adds `PermissionProfile::read_only()` and `PermissionProfile::workspace_write()` presets that match legacy defaults. 
## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606). * #19395 * #19394 * #19393 * #19392 * __->__ #19606	2026-04-26 13:29:54 -07:00
Michael Bolin	ac2bffa443	test: harden app-server integration tests (#19683 ) ## Why Windows Bazel runs in the permissions stack exposed that app-server integration tests were launching normal plugin startup warmups in every subprocess. Those warmups can call `https://chatgpt.com/backend-api/plugins/featured` when a test is not specifically exercising plugin startup, which adds slow background work, noisy stderr, and dependence on external network state. The relevant startup/featured-plugin behavior was introduced across #15042 and #15264. A few app-server tests also had long optional waits or unbounded cleanup paths, making failures expensive to diagnose and contributing to slow Windows shards. One external-agent config test from #18246 used a GitHub-style marketplace source, which was enough to exercise the pending remote-import path but also meant the background completion task could attempt a real clone. ## What Changed - Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks` plumbing and a hidden debug-only `--disable-plugin-startup-tasks-for-tests` app-server flag, so integration tests can suppress startup plugin warmups without adding a production env-var gate. - Has the app-server test harness pass that hidden flag by default, while opting plugin-startup coverage back in for tests that intentionally exercise startup sync and featured-plugin warmup behavior. - Lowers normal app-server subprocess logging from `info`/`debug` to `warn` to avoid multi-megabyte stderr output in Bazel logs. - Prevents the external-agent config test from attempting a real marketplace clone by using an invalid non-local source while still exercising the pending-import completion path. - Bounds optional filesystem/realtime waits and fake WebSocket test-server shutdown so failures produce targeted timeouts instead of hanging a shard. - Fixes the Unix script-resolution test in `rmcp-client` to exercise PATH resolution directly and include the actual spawn error in failures. 
## Verification - `cargo check -p codex-app-server` - `cargo clippy -p codex-app-server --tests -- -D warnings` - `cargo test -p codex-rmcp-client program_resolver::tests::test_unix_executes_script_without_extension` - `cargo test -p codex-app-server --test all external_agent_config_import_sends_completion_notification_after_pending_plugins_finish -- --nocapture` - `cargo test -p codex-app-server --test all plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request -- --nocapture` - Windows Local Bazel passed with this test-hardening bundle before it was extracted from #19606. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683). * #19395 * #19394 * #19393 * #19392 * #19606 * __->__ #19683	2026-04-26 12:43:16 -07:00
Andrey Mishchenko	355c40ad7e	Support end_turn in response.completed (#19610) Some Responses API providers forward a model-defined `end_turn` boolean that explicitly indicates whether the model wants to end the turn or be sampled again. This PR updates the sampling loop to honor that field when it is set. If the provider does not set the field, we fall back to the existing sampling logic.	2026-04-25 21:57:42 -07:00
Felipe Coury	5591912f0b	fix(tui): reflow scrollback on terminal resize (#18575 ) Fixes multiple scrollback and terminal resize issues: #5538, #5576, #8352, #12223, #16165, and #15380. ## Why Codex writes finalized transcript output into terminal scrollback after wrapping it for the current viewport width. A later terminal resize could leave that scrollback shaped for the old width, so wider windows kept narrow output and narrower windows could show stale wrapping artifacts until enough new output replaced the visible area. This is also the foundation PR for responsive markdown tables. Table rendering needs finalized transcript content to be width-sensitive after insertion, not only while content is first streaming. Markdown table rendering itself stays in #18576. ## Stack - PR1: resize backlog reflow and interrupt cleanup - #18576: markdown table support ## What Changed - Rebuild source-backed transcript history when the terminal width changes. `terminal_resize_reflow` is introduced through the experimental feature system, but is enabled by default for this rollout so we can validate behavior across real terminals. - Preserve assistant and plan stream source so finalized streaming output can participate in resize reflow after consolidation. - Debounce resize work, but force a final source-backed reflow when a resize happened during active or unconsolidated streaming output. - Clear stale pending history lines on resize so old-width wrapped output is not emitted just before rebuilt scrollback. - Bound replay work with `[tui.terminal_resize_reflow].max_rows`: omitted uses terminal-specific defaults, `0` keeps all rendered rows, and a positive value sets an explicit cap. The cap applies both while initially replaying a resumed transcript into scrollback and when rebuilding scrollback after terminal resize. - Consolidate interrupted assistant streams before cleanup, then clear pending stream output and active-tail state consistently. 
- Move resize reflow and thread event buffering helpers out of `app.rs` into dedicated TUI modules. - Add focused coverage for resize reflow, feature-gated behavior, streaming source preservation, interrupted output cleanup, unicode-neutral text, terminal-specific row caps, and composer/layout stability. ## Runtime Bounds Resize reflow keeps only the most recent rendered rows when a row cap is active. The default is `auto`, which maps to the detected terminal's default scrollback size where Codex can identify it: VS Code `1000`, Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`. Terminals without a dedicated mapping use the conservative fallback of `1000` rows. Users can override this with `[tui.terminal_resize_reflow] max_rows = N`, or set `max_rows = 0` to disable row limiting. ## Validation - `just fmt` - `git diff --check` - `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow` - `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui transcript_reflow` - `just fix -p codex-tui` - PR CI in progress on the squashed branch	2026-04-25 22:00:32 -03:00
viyatb-oai	9aaa5d9358	[codex] Bypass managed network for escalated exec (#19595 ) ## Why `sandbox_permissions = "require_escalated"` is treated as an explicit request to approve the command and run it outside the filesystem/platform sandbox. Before this change, shell and unified exec still registered managed network approval context and could inject Codex-managed proxy state into the child process, which meant an approved escalated command could still hit a second network approval path. This PR makes that escalation boundary consistent: once a command is explicitly approved to run outside the sandbox, Codex does not also route that process through the managed network proxy. ## Security impact Command/filesystem sandbox approval now implies network approval for that command. If an untrusted command or script is allowed to run with `require_escalated`, its network calls are unsandboxed: Codex-managed network allowlists and denylists are not respected for that process, so the command can exfiltrate any data it can read. ## What changed - Skip managed network approval specs for `SandboxPermissions::RequireEscalated`. - Pass `network: None` into shell, zsh-fork shell, and unified exec sandbox preparation for explicitly escalated requests. - Strip Codex-managed proxy environment variables when `CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env when the Codex marker is absent. - Add regression coverage for the prepared exec request so the old behavior cannot silently reappear. ## Verification - `cargo test -p codex-core explicit_escalation` - `cargo clippy -p codex-core --all-targets -- -D warnings`	2026-04-25 23:23:58 +00:00
Dylan Hurd	f5497f4d65	Split approval matrix test groups (#19454 ) ## Why Recent `main` CI repeatedly timed out in: - `codex-core::all suite::approvals::approval_matrix_covers_all_modes` It failed in runs [24909500958](https://github.com/openai/codex/actions/runs/24909500958), [24908076251](https://github.com/openai/codex/actions/runs/24908076251), [24906197645](https://github.com/openai/codex/actions/runs/24906197645), [24905823212](https://github.com/openai/codex/actions/runs/24905823212), [24903439629](https://github.com/openai/codex/actions/runs/24903439629), [24903336028](https://github.com/openai/codex/actions/runs/24903336028), and [24898949647](https://github.com/openai/codex/actions/runs/24898949647). The failure pattern was a 60s Linux remote timeout. Logs showed many approval scenarios completing before the single matrix test timed out. ## Root Cause `approval_matrix_covers_all_modes` packed every approval/sandbox/tool scenario into one test case. That made the test vulnerable to normal CI variance: one slow scenario or a slow process startup could push the whole monolithic case past the 60s per-test timeout. It also hid which part of the matrix was slow because the runner only reported the one large matrix test. ## What Changed - Keep the shared `scenarios()` table as the single source of approval matrix coverage. - Use one `#[test_case]` per `ScenarioGroup` to generate five async Tokio tests: danger/full-access, read-only, workspace-write, apply-patch, and unified-exec. - Keep the group runner small and add per-scenario error context so a failure still reports the specific scenario name. ## Why This Should Be Reliable Each scenario group now has its own test harness timeout instead of sharing one timeout window with the full matrix. That removes the long sequential loop from a single test while keeping the implementation compact and easy to scan. The tests still run through the same scenario definitions and runner, so this preserves coverage. 
`test-case` already composes with `#[tokio::test]` in this crate and is already available for test code. ## Verification - `cargo test -p codex-core --test all approval_matrix_ -- --list` - `cargo test -p codex-core --test all approval_matrix_`	2026-04-24 21:38:27 -07:00
Eric Traut	4167628622	Add goal core runtime (4 / 5) (#18076 ) Adds the core runtime behavior for active goals on top of the model tools from PR 3. ## Why A long-running goal should be a core runtime concern, not something every client has to implement. Core owns the turn lifecycle, tool completion boundaries, interruptions, resume behavior, and token usage, so it is the right place to account progress, enforce budgets, and decide when to continue work. ## What changed - Centralized goal lifecycle side effects behind `Session::goal_runtime_apply(GoalRuntimeEvent::...)`. - Starts goal continuation turns only when the session is idle; pending user input and mailbox work take priority. - Accounts token and wall-clock usage at turn, tool, mutation, interrupt, and resume boundaries; `get_thread_goal` remains read-only. - Preserves sub-second wall-clock remainder across accounting boundaries so long-running goals do not drift downward over time. - Treats token budget exhaustion as a soft stop by marking the goal `budget_limited` and injecting wrap-up steering instead of aborting the active turn. - Suppresses budget steering when `update_goal` marks a goal complete. - Pauses active goals on interrupt and auto-reactivates paused goals when a thread resumes outside plan mode. - Suppresses repeated automatic continuation when a continuation turn makes no tool calls. - Added continuation and budget-limit prompt templates. ## Verification - Added focused core coverage for continuation scheduling, accounting boundaries, budget-limit steering, completion accounting, interrupt pause behavior, resume auto-activation, and wall-clock remainder accounting.	2026-04-24 21:16:00 -07:00
Eric Traut	32ace07ac5	Add goal model tools (3 / 5) (#18075 ) Adds the model-facing goal tools on top of the app-server API from PR 2. ## Why Once goals are persisted and exposed to clients, the model needs a small, constrained tool surface for goal workflows. The tool contract should let the model inspect goals, create them only when explicitly requested, and mark them complete without giving it broad control over user/runtime-owned state. ## What changed - Added `get_goal`, `create_goal`, and `update_goal` tool specs behind the `goals` feature flag. - Added core goal tool handlers that validate objectives and token budgets before mutating persisted state. - Constrained `create_goal` to create only when no goal exists, with optional `token_budget` only when a budget is explicitly provided. - Tightened the `create_goal` instructions so the model does not infer goals from ordinary task requests. - Constrained `update_goal` to expose only goal completion; pause, resume, clear, and budget-limited transitions remain user- or runtime-controlled. - Registered the goal tools in the tool registry and kept them out of review contexts where they should not appear. ## Verification - Added tool-registry coverage for feature gating and tool availability. - Added core session tests for create/get/update behavior, duplicate goal rejection, budget validation, and completion-only updates.	2026-04-24 20:54:40 -07:00
Eric Traut	6c874f9b34	Add goal app-server API (2 / 5) (#18074 ) Adds the app-server v2 goal API on top of the persisted goal state from PR 1. ## Why Clients need a stable app-server surface for reading and controlling materialized thread goals before the model tools and TUI can use them. Goal changes also need to be observable by app-server clients, including clients that resume an existing thread. ## What changed - Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear` RPCs for materialized threads. - Added `thread/goal/updated` and `thread/goal/cleared` notifications so clients can keep local goal state in sync. - Added resume/snapshot wiring so reconnecting clients see the current goal state for a thread. - Added app-server handlers that reconcile persisted rollout state before direct goal mutations. - Updated the app-server README plus generated JSON and TypeScript schema fixtures for the new API surface. ## Verification - Added app-server v2 coverage for goal get/set/clear behavior, notification emission, resume snapshots, and non-local thread-store interactions.	2026-04-24 20:53:41 -07:00
Eric Traut	0ee737cea6	Add goal persistence foundation (1 / 5) (#18073 ) Adds the persisted goal foundation for the rest of the stack. This PR is intentionally limited to feature flag and state-layer behavior; app-server APIs, model tools, runtime continuation, and TUI UX are layered in later PRs. ## Why Goal mode needs durable thread-level state before clients or model tools can safely build on it. The state layer needs to know whether a goal exists, what objective it tracks, whether it is active, paused, budget-limited, or complete, and how much time/token usage has already been accounted. ## What changed - Added the `goals` feature flag and generated config schema entry. - Added the `thread_goals` state table and Rust model for persisted thread goals. - Added state runtime APIs for creating, replacing, updating, deleting, and accounting goal usage. - Added `goal_id`-based stale update protection so an old goal update cannot overwrite a replacement. - Kept this PR scoped to persistence and state runtime behavior, with no app-server, model-facing, continuation, or TUI behavior yet. ## Verification - Added state runtime coverage for goal creation, replacement, stale update protection, status transitions, token-budget behavior, and usage accounting.	2026-04-24 20:51:38 -07:00
Curtis 'Fjord' Hawthorne	8a559e7938	Remove js_repl feature (#19410)	2026-04-24 17:49:29 -07:00
Michael Bolin	789f387982	permissions: remove legacy read-only access modes (#19449 ) ## Why `ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`: `FullAccess` meant the historical read-only/workspace-write modes could read the full filesystem, while `Restricted` tried to carry partial readable roots. The partial-read model now belongs in `FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on `SandboxPolicy` makes every legacy projection reintroduce lossy read-root bookkeeping and creates unnecessary noise in the rest of the permissions migration. This PR makes the legacy policy model narrower and explicit: `SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent the old full-read sandbox modes only. Split readable roots, deny-read globs, and platform-default/minimal read behavior stay in the runtime permissions model. ## What changed - Removes `ReadOnlyAccess` from `codex_protocol::protocol::SandboxPolicy`, including the generated `access` and `readOnlyAccess` API fields. - Updates legacy policy/profile conversions so restricted filesystem reads are represented only by `FileSystemSandboxPolicy` / `PermissionProfile` entries. - Keeps app-server v2 compatible with legacy `fullAccess` read-access payloads by accepting and ignoring that no-op shape, while rejecting legacy `restricted` read-access payloads instead of silently widening them to full-read legacy policies. - Carries Windows sandbox platform-default read behavior with an explicit override flag instead of depending on `ReadOnlyAccess::Restricted`. - Refreshes generated app-server schema/types and updates tests/docs for the simplified legacy policy shape. ## Verification - `cargo check -p codex-app-server-protocol --tests` - `cargo check -p codex-windows-sandbox --tests` - `cargo test -p codex-app-server-protocol sandbox_policy_` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). 
Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449). * #19395 * #19394 * #19393 * #19392 * #19391 * __->__ #19449	2026-04-24 17:16:58 -07:00
rreichel3-oai	219c65dc2f	[codex] Forward Codex Apps tool call IDs to backend metadata (#19207 ) ## Summary - include the outer tool `call_id` in Codex Apps MCP request metadata under `_meta._codex_apps.call_id` - preserve existing Codex Apps metadata like `resource_uri` and `contains_mcp_source` - add request metadata coverage for both the existing-metadata and no-existing-metadata cases ## Why The paired backend change in [openai/openai#850796](https://github.com/openai/openai/pull/850796) updates MCP compliance logging to prefer `_meta._codex_apps.call_id` instead of the JSON-RPC request id. This client change sends that outer tool call id so the backend can record the model/tool call identifier when it is available. This is wire-compatible with older backends because `_meta._codex_apps` is already reserved backend-only metadata. Backends that do not read `call_id` will ignore the extra field. ## Testing - `cargo test -p codex-core request_meta` - `just fmt` - `just fix -p codex-core`	2026-04-24 18:49:34 -04:00
xl-openai	1e560f33e1	feat: Compress skill paths with root aliases (#19098) Add skill root tracking so model-visible skill lists can use short path aliases when absolute paths would exceed the metadata budget.	2026-04-24 15:49:07 -07:00
Tom	588f7a9fc4	[codex] add non-local thread store regression harness (#19266) - Add an integration test that guarantees nothing gets written to the Codex home dir or SQLite when running a rollout with a non-local ThreadStore - Add an in-memory "spy" ThreadStore for tests like this Note: I could not find a good way to also ensure there were no filesystem _reads_ that bypassed the ThreadStore. I explored a more elaborate sandboxed-subprocess approach, but it isn't portable across platforms and didn't (yet) seem worth it.	2026-04-24 15:45:44 -07:00
Ahmed Ibrahim	6de6eaa0c1	[4/4] Honor Streamable HTTP MCP placement (#18584)	2026-04-24 15:03:55 -07:00
Tom	0a9b559c0b	Migrate fork and resume reads to thread store (#18900) - Route cold thread/resume and thread/fork source loading through ThreadStore reads instead of direct rollout path operations - Keep lookups that explicitly specify a rollout path on the local thread store methods, but return an invalid-request error for remote ThreadStore configurations - Add unit tests for code path coverage	2026-04-24 13:51:37 -07:00
Michael Bolin	13e0ec1614	permissions: make legacy profile conversion cwd-free (#19414 ) ## Why The profile conversion path still required a `cwd` even when it was only translating a legacy `SandboxPolicy` into a `PermissionProfile`. That made profile producers invent an ambient `cwd`, which is exactly the anchoring we are trying to remove from permission-profile data. A legacy workspace-write policy can be represented symbolically instead: `:cwd = write` plus read-only `:project_roots` metadata subpaths. This PR creates that cwd-free base so the rest of the stack can stop threading cwd through profile construction. Callers that actually need a concrete runtime filesystem policy for a specific cwd still have an explicitly named cwd-bound conversion. ## What Changed - `PermissionProfile::from_legacy_sandbox_policy` now takes only `&SandboxPolicy`. - `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the symbolic, cwd-free projection for profiles. - The old concrete projection is retained as `FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for runtime/boundary code that must materialize legacy cwd behavior. - Workspace-write profiles preserve `CurrentWorkingDirectory` and `ProjectRoots` special entries instead of materializing cwd into absolute paths. ## Verification - `cargo check -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p codex-sandboxing -p codex-linux-sandbox -p codex-analytics --tests` - `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p codex-sandboxing -p codex-linux-sandbox -p codex-analytics` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414). * #19395 * #19394 * #19393 * #19392 * #19391 * __->__ #19414	2026-04-24 13:42:05 -07:00
jif-oai	f802f0a391	chore: drop MCP Plugins and App from Morpheus (#19380) Quick fix for https://github.com/openai/codex/issues/18333	2026-04-24 17:57:48 +02:00
jif-oai	28742866c7	Add agents.interrupt_message for interruption markers (#19351 ) ## Why Agent interruptions currently always persist a model-visible interrupted-turn marker before emitting `TurnAborted`. That marker is useful by default because it gives the next model turn context about a deliberately interrupted task, but some deployments need to suppress that history injection entirely while still keeping the client-visible interruption event. ## What changed - Add `[agents] interrupt_message = false` to disable the model-visible interrupted-turn marker. - Resolve the setting into `Config::agent_interrupt_message_enabled`, defaulting to `true` so existing behavior is unchanged. - Apply the setting to both live interrupted turns and interrupted fork snapshots. - Keep emitting `TurnAborted` even when the history marker is disabled. - Regenerate `core/config.schema.json` for the new `agents.interrupt_message` field. ## Testing - `cargo test -p codex-core load_config_resolves_agent_interrupt_message -- --nocapture` - `cargo test -p codex-core disabled_interrupted_fork_snapshot_appends_only_interrupt_event -- --nocapture` - `cargo test -p codex-core multi_agent_v2_interrupted_marker_uses_developer_input_message -- --nocapture` - `cargo test -p codex-core multi_agent_v2_followup_task_can_disable_interrupted_marker -- --nocapture` - `cargo test -p codex-core multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message -- --nocapture` - `cargo check -p codex-core`	2026-04-24 16:02:45 +02:00
jif-oai	deb4509302	feat: surface multi-agent thread limit in spawn description (#19360 ) ## Summary - Thread `agent_max_threads` into `ToolsConfig` and `SpawnAgentToolOptions`. - Render the configured `max_concurrent_threads_per_session` value in the MultiAgentV2 `spawn_agent` description. - Cover the description text in `codex-tools` unit tests and `codex-core` tool spec tests. ## Validation - `just fmt` - `cargo test -p codex-tools` - `cargo test -p codex-core spawn_agent_description` - `git diff --check` ## Notes - `cargo test -p codex-core` was also attempted, but unrelated environment-sensitive tests failed with the active local environment. Examples: approvals reviewer defaults observed `AutoReview` instead of `User`, request-permissions event tests did not emit events, and proxy-env tests saw `http://127.0.0.1:50604` from the active proxy environment. Co-authored-by: Codex <noreply@openai.com>	2026-04-24 15:13:54 +02:00
jif-oai	120aa07d81	Make MultiAgentV2 interruption markers assistant-authored (#19124 ) ## Why `MultiAgentV2` follow-up messages are delivered to agents as assistant-authored `InterAgentCommunication` envelopes. When `followup_task` used `interrupt: true`, the interrupted-turn guidance was still persisted as a contextual user message, so model-visible history made a system-generated interruption boundary look user-authored. This keeps interruption guidance consistent with the rest of the v2 inter-agent message stream while preserving the legacy marker shape for non-v2 sessions. ## What changed - Make `interrupted_turn_history_marker` feature-aware. - Record the interrupted-turn marker as an assistant `OutputText` message when `Feature::MultiAgentV2` is enabled. - Keep the existing user contextual fragment for non-v2 sessions. - Apply the same feature-aware marker to interrupted fork snapshots. - Add coverage for the live `followup_task` interrupt path and the helper-level v2 marker shape. ## Testing - `cargo test -p codex-core multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message -- --nocapture` - `cargo test -p codex-core multi_agent_v2_interrupted_marker_uses_assistant_output_message -- --nocapture` - `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`	2026-04-24 13:39:26 +02:00
sayan-oai	c10f95ddac	Update models.json and related fixtures (#19323 ) Supersedes #18735. The scheduled rust-release-prepare workflow force-pushed `bot/update-models-json` back to the generated models.json-only diff, which dropped the test and snapshot updates needed for CI. This PR keeps the latest generated `models.json` from #18735 and adds the corresponding fixture updates: - preserve model availability NUX in the app-server model cache fixture - update core/TUI expectations for the new `gpt-5.4` `xhigh` default reasoning - refresh affected TUI chatwidget snapshots for the `gpt-5.5` default/model copy changes Validation run locally while preparing the fix: - `just fmt` - `cargo test -p codex-app-server model_list` - `cargo test -p codex-core includes_no_effort_in_request` - `cargo test -p codex-core includes_default_reasoning_effort_in_request_when_defined_by_model_info` - `cargo test -p codex-tui --lib chatwidget::tests` - `cargo insta pending-snapshots` --------- Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>	2026-04-24 11:14:13 +02:00
Eric Traut	6f87eb0479	Hide unsupported MCP bearer_token from config schema (#19294 ) ## Summary Fixes #19275. Codex runtime rejects inline MCP `bearer_token` config entries and asks users to configure `bearer_token_env_var` instead, but the generated config schema still advertised `mcp_servers.<name>.bearer_token` as a supported field. That made editor/schema validation disagree with runtime validation. This keeps `bearer_token` in `RawMcpServerConfig` so Codex can continue producing the targeted runtime error for recent or existing configs, but skips the field during schemars generation. The checked-in `core/config.schema.json` fixture now exposes `bearer_token_env_var` without exposing unsupported inline `bearer_token`. ## Verification - Added `config_schema_hides_unsupported_inline_mcp_bearer_token` to assert the generated schema hides `bearer_token` while preserving `bearer_token_env_var`. - Ran `cargo test -p codex-config`. - Ran `cargo test -p codex-core config_schema`.	2026-04-24 00:17:43 -07:00
sayan-oai	e083b6c757	chore: apply truncation policy to unified_exec (#19247) We were not respecting the turn's `truncation_policy` to clamp output tokens for `unified_exec` and `write_stdin`. Truncation was applied only by `ContextManager` before the output was stored in memory, so the output _was_ truncated in model-visible context, but the full output was still persisted to the rollout on disk. These tools now respect `truncation_policy`, and `ContextManager`-level truncation remains as a backup. ### Tests Added tests; tested locally.	2026-04-24 00:17:39 -07:00
Eric Traut	ac8c9fc49c	Reject unsupported js_repl image MIME types (#19292 ) ## Summary `codex.emitImage` accepted arbitrary image MIME types for byte payloads and data URLs. That allowed a value like `image/rgba` to be wrapped as an `input_image`, even though it is not a supported encoded image format, so the invalid image could reach the model-input path and trigger output sanitization. This results in a panic in debug builds because the output sanitization is meant as a final safety net, not a primary means of rejecting invalid image types. I've hit this case multiple times when executing certain long-running tasks. This PR rejects unsupported image MIME types before they are emitted from `js_repl`. ## Changes - Validate `codex.emitImage({ bytes, mimeType })` in the JS kernel so only encoded PNG, JPEG, WebP, or GIF payloads are accepted. - Apply the same MIME allowlist to direct image data URLs, including the Rust host-side validation path. - Clarify the JS REPL instructions so agents know byte payloads must already be encoded as PNG/JPEG/WebP/GIF.	2026-04-24 00:14:51 -07:00
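The MIME allowlist described in the commit above could look roughly like this. It is a sketch under assumptions: `is_supported_image_mime` and `data_url_is_supported` are hypothetical names, and the real validation lives in the JS kernel and the Rust host path, which may differ in detail. Only the four accepted formats (PNG, JPEG, WebP, GIF) come from the commit message.

```rust
/// Hypothetical helper: true only for the encoded image formats the
/// commit message lists (PNG, JPEG, WebP, GIF).
fn is_supported_image_mime(mime_type: &str) -> bool {
    // Drop parameters such as ";base64" and compare case-insensitively.
    let essence = mime_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    matches!(
        essence.as_str(),
        "image/png" | "image/jpeg" | "image/webp" | "image/gif"
    )
}

/// Hypothetical helper: applies the same allowlist to a `data:` URL by
/// inspecting the header that precedes the first comma.
fn data_url_is_supported(url: &str) -> bool {
    let Some(rest) = url.strip_prefix("data:") else {
        return false;
    };
    let Some((header, _payload)) = rest.split_once(',') else {
        return false;
    };
    is_supported_image_mime(header)
}
```

Under this sketch, a value like `image/rgba` is rejected up front instead of reaching the output-sanitization safety net.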
Eric Traut	d87d918716	Resolve relative agent role config paths from layers (#19261 ) Fixes #19257. ## Summary Agent roles declared in config layers can set `config_file` to a relative path, but deserializing the layer-local `[agents.]` table happened without an `AbsolutePathBuf` base path. That caused configs like `config_file = "agents/my-role.toml"` to fail with `AbsolutePathBuf deserialized without a base path`. This updates agent role layer loading to deserialize `[agents.]` while the layer config folder is active as the path base, matching the behavior documented for `AgentRoleToml.config_file`. It also adds coverage for a user config layer with a relative agent role `config_file`.	2026-04-23 23:23:11 -07:00
Michael Bolin	4816b89204	permissions: make profiles represent enforcement (#19231 ) ## Why `PermissionProfile` is becoming the canonical permissions abstraction, but the old shape only carried optional filesystem and network fields. It could describe allowed access, but not who is responsible for enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy when profiles were exported, cached, or round-tripped through app-server APIs. The important model change is that active permissions are now a disjoint union over the enforcement mode. Conceptually: ```rust pub enum PermissionProfile { Managed { file_system: FileSystemSandboxPolicy, network: NetworkSandboxPolicy, }, Disabled, External { network: NetworkSandboxPolicy, }, } ``` This distinction matters because `Disabled` means Codex should apply no outer sandbox at all, while `External` means filesystem isolation is owned by an outside caller. Those are not equivalent to a broad managed sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an inner sandbox may require the outer Codex layer to use no sandbox rather than a permissive one. ## How Existing Modeling Maps Legacy `SandboxPolicy` remains a boundary projection, but it now maps into the higher-fidelity profile model: - `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed` with restricted filesystem entries plus the corresponding network policy. - `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving the “no outer sandbox” intent instead of treating it as a lax managed sandbox. - `ExternalSandbox { network_access }` maps to `PermissionProfile::External { network }`, preserving external filesystem enforcement while still carrying the active network policy. - Split runtime policies that legacy `SandboxPolicy` cannot faithfully express, such as managed unrestricted filesystem plus restricted network, stay `Managed` instead of being collapsed into `ExternalSandbox`. 
- Per-command/session/turn grants remain partial overlays via `AdditionalPermissionProfile`; full `PermissionProfile` is reserved for complete active runtime permissions. ## What Changed - Change active `PermissionProfile` into a tagged union: `managed`, `disabled`, and `external`. - Keep partial permission grants separate with `AdditionalPermissionProfile` for command/session/turn overlays. - Represent managed filesystem permissions as either `restricted` entries or `unrestricted`; `glob_scan_max_depth` is non-zero when present. - Preserve old rollout compatibility by accepting the pre-tagged `{ network, file_system }` profile shape during deserialization. - Preserve fidelity for important edge cases: `DangerFullAccess` round-trips as `disabled`, `ExternalSandbox` round-trips as `external`, and managed unrestricted filesystem + restricted network stays managed instead of being mistaken for external enforcement. - Preserve configured deny-read entries and bounded glob scan depth when full profiles are projected back into runtime policies, including unrestricted replacements that now become `:root = write` plus deny entries. - Regenerate the experimental app-server v2 JSON/TypeScript schema and update the `command/exec` README example for the tagged `permissionProfile` shape. ## Compatibility Legacy `SandboxPolicy` remains available at config/API boundaries as the compatibility projection. Existing rollout lines with the old `PermissionProfile` shape continue to load. The app-server `permissionProfile` field is experimental, so its v2 wire shape is intentionally updated to match the higher-fidelity model. 
## Verification - `just write-app-server-schema` - `cargo check --tests` - `cargo test -p codex-protocol permission_profile` - `cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable` - `cargo test -p codex-app-server-protocol permission_profile_file_system_permissions` - `cargo test -p codex-app-server-protocol serialize_client_response` - `cargo test -p codex-core session_configured_reports_permission_profile_for_external_sandbox` - `just fix` - `just fix -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-core` - `just fix -p codex-app-server`	2026-04-23 23:02:18 -07:00
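The legacy-to-profile mapping described in the commit above can be sketched with simplified stand-in types. Everything here is illustrative: the real `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` are richer, the entry strings and the restricted network default for `ReadOnly` are assumptions, and `profile_from_legacy` is a hypothetical name. What the sketch keeps from the message is the shape of the tagged union and which legacy variant lands in which arm.

```rust
// Simplified stand-ins for the real policy types (assumptions).
#[derive(Debug, Clone, PartialEq)]
enum NetworkPolicy {
    Restricted,
    Enabled,
}

#[derive(Debug, Clone, PartialEq)]
enum FileSystemPolicy {
    Restricted(Vec<String>),
    Unrestricted,
}

#[derive(Debug, Clone, PartialEq)]
enum PermissionProfile {
    Managed { file_system: FileSystemPolicy, network: NetworkPolicy },
    Disabled,
    External { network: NetworkPolicy },
}

enum LegacySandboxPolicy {
    ReadOnly,
    WorkspaceWrite { network_access: bool },
    DangerFullAccess,
    ExternalSandbox { network_access: bool },
}

fn network_from_flag(network_access: bool) -> NetworkPolicy {
    if network_access { NetworkPolicy::Enabled } else { NetworkPolicy::Restricted }
}

/// Hypothetical projection from the legacy policy into the tagged union.
fn profile_from_legacy(policy: &LegacySandboxPolicy) -> PermissionProfile {
    match policy {
        // Entry strings are illustrative placeholders, not the real encoding.
        LegacySandboxPolicy::ReadOnly => PermissionProfile::Managed {
            file_system: FileSystemPolicy::Restricted(vec![":root = read".to_string()]),
            network: NetworkPolicy::Restricted,
        },
        LegacySandboxPolicy::WorkspaceWrite { network_access } => PermissionProfile::Managed {
            file_system: FileSystemPolicy::Restricted(vec![":cwd = write".to_string()]),
            network: network_from_flag(*network_access),
        },
        // "No outer sandbox" stays distinct from a permissive managed sandbox.
        LegacySandboxPolicy::DangerFullAccess => PermissionProfile::Disabled,
        // Filesystem enforcement is external; only the network policy is carried.
        LegacySandboxPolicy::ExternalSandbox { network_access } => PermissionProfile::External {
            network: network_from_flag(*network_access),
        },
    }
}
```

Keeping `Disabled` and `External` as separate variants, rather than encoding them as a managed profile with unrestricted filesystem access, is what lets them round-trip without loss.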
Celia Chen	e8d8080818	feat: let model providers own model discovery (#18950 ) ## Why `codex-models-manager` had grown to own provider-specific concerns: constructing OpenAI-compatible `/models` requests, resolving provider auth, emitting request telemetry, and deciding how provider catalogs should be sourced. That made the manager harder to reuse for providers whose model catalog is not fetched from the OpenAI `/models` endpoint, such as Amazon Bedrock. This change moves provider-specific model discovery behind provider-owned implementations, so the models manager can focus on refresh policy, cache behavior, picker ordering, and model metadata merging. ## What Changed - Introduced a `ModelsManager` trait with separate `OpenAiModelsManager` and `StaticModelsManager` implementations. - Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives outside `codex-models-manager`. - Moved `/models` request construction, provider auth resolution, timeout handling, and request telemetry into `codex-model-provider` via `OpenAiModelsEndpoint`. - Added provider-owned `models_manager(...)` construction so configured OpenAI-compatible providers use `OpenAiModelsManager`, while static/catalog-backed providers can return `StaticModelsManager`. - Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock model IDs. - Updated core/session/thread manager code and tests to depend on `Arc<dyn ModelsManager>`. - Moved offline model test helpers into `codex_models_manager::test_support`. ## Metadata References The Bedrock catalog metadata is based on the official Amazon Bedrock OpenAI model documentation: - [Amazon Bedrock OpenAI models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html) lists the Bedrock model IDs, text input/output modalities, and `128,000` token context window for `gpt-oss-20b` and `gpt-oss-120b`. 
- [Amazon Bedrock `gpt-oss-120b` model card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html) lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the `bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities, and `128K` context window. - [OpenAI `gpt-oss-120b` model docs](https://developers.openai.com/api/docs/models/gpt-oss-120b) document configurable reasoning effort with `low`, `medium`, and `high`, plus text input/output modality. The display names, default reasoning effort, and priority ordering are Codex-local catalog choices. ## Test Plan - Manually verified app-server model listing with an AWS profile: ```shell CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \ --codex-bin ./target/debug/codex \ -c 'model_provider="amazon-bedrock"' \ -c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \ -c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \ model-list ``` The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0` as the default model and `openai.gpt-oss-20b-1:0` as the second listed model, both text-only and supporting low/medium/high reasoning effort.	2026-04-24 04:28:25 +00:00
xl-openai	53be451673	feat: Use short SHA versions for curated plugin cache entries (#19095 ) Curated plugin cache entries now use an 8-character SHA prefix, instead of the full SHA, as the cache folder version number.	2026-04-23 21:15:03 -07:00
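Deriving the cache folder version from a full SHA, as the commit above describes, might be done like this. The function name `cache_version_from_sha` and the fallback for short input are assumptions; only the 8-character prefix length comes from the commit message.

```rust
/// Prefix length stated in the commit message.
const SHORT_SHA_LEN: usize = 8;

/// Hypothetical helper: returns the first 8 characters of a SHA.
/// `str::get` returns `None` instead of panicking when the input is
/// shorter than 8 bytes, in which case the input is used unchanged.
fn cache_version_from_sha(full_sha: &str) -> &str {
    full_sha.get(..SHORT_SHA_LEN).unwrap_or(full_sha)
}
```

For example, a SHA beginning `53be451673` would yield the folder version `53be4516`.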
starr-openai	49fb25997f	Add sticky environment API and thread state (#18897 ) ## Summary - add sticky environment selections to app-server v2 thread/start and turn/start request flow - carry thread-level selections through core session/thread state - add app-server coverage for sticky selections and turn overrides ## Stack 1. This PR: API and thread persistence 2. #18898: config.toml named environment loading 3. #18899: downstream tool/runtime consumers ## Validation - Not run locally; split only. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-23 18:57:13 -07:00
cassirer-openai	e3c8720a99	[rollout_trace] Add debug trace reduction command (#18880 ) ## Summary Adds the debug CLI entry point for reducing recorded rollout traces. This gives developers a direct way to inspect whether the emitted trace stream reduces into the expected conversation/runtime model. ## Stack This is PR 5/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This PR is intentionally last: it depends on the trace crate, core recorder, runtime/tool events, and session/agent edge data all existing. The command should remain a debug/developer tool and avoid adding new runtime behavior. The useful review question is whether the CLI exposes the reducer in the smallest practical way for local inspection without turning the debug command into a supported user-facing workflow.	2026-04-24 01:56:48 +00:00
efrazer-oai	5882f3f95e	refactor: route Codex auth through AuthProvider (#18811)
## Summary
This PR moves Codex backend request authentication from direct bearer-token handling to `AuthProvider`. The new `codex-auth-provider` crate defines the shared request-auth trait. `CodexAuth::provider()` returns a provider that can apply all headers needed for the selected auth mode. This lets ChatGPT token auth and AgentIdentity auth share the same callsite path:
- ChatGPT token auth applies bearer auth plus account/FedRAMP headers where needed.
- AgentIdentity auth applies AgentAssertion plus account/FedRAMP headers where needed.

Reference old stack: https://github.com/openai/codex/pull/17387/changes
## Callsite Migration
| Area | Change |
| --- | --- |
| backend-client | accepts an `AuthProvider` instead of a raw token/header |
| chatgpt client/connectors | applies auth through `CodexAuth::provider()` |
| cloud tasks | keeps Codex-backend gating, applies auth through provider |
| cloud requirements | uses Codex-backend auth checks and provider headers |
| app-server remote control | applies provider headers for backend calls |
| MCP Apps/connectors | gates on `uses_codex_backend()` and keys caches from generic account getters |
| model refresh | treats AgentIdentity as Codex-backend auth |
| OpenAI file upload path | rejects non-Codex-backend auth before applying headers |
| core client setup | keeps model-provider auth flow and allows AgentIdentity through provider-backed OpenAI auth |

## Stack
1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity auth mode and startup task allocation
4. This PR: migrate Codex backend auth callsites through AuthProvider
5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs and load `CODEX_AGENT_IDENTITY`

## Testing
Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.	2026-04-23 17:14:02 -07:00
Michael Bolin	040976b218	tests: isolate approval fixtures from host rules (#18288 ) ## Why Several approval-focused tests were unintentionally sensitive to host-level rule files. On machines with broader allowed command prefixes, commonly allowed commands such as `/bin/date` could bypass the approval path these tests were meant to exercise, making the fixtures depend on the developer or CI host configuration. ## What changed - Pins the approval matrix fixture to the explicit user reviewer so it does not inherit a host reviewer. - Changes OTel approval fixtures to request `/usr/bin/touch codex-otel-approval-test`, avoiding a command that may be pre-approved by local rules. - Clears the config layer stack for the permissions-message assertion that needs to compare only the permissions text under test. ## Verification - `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test all approval_matrix_covers_all_modes -- --nocapture` - `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test all permissions_messages -- --nocapture`	2026-04-23 14:12:09 -07:00
Eric Traut	a50cb205b7	Stabilize plugin MCP tools test (#19191 ) ## Summary The plugin MCP tool-listing test could hide MCP startup failures by polling `ListMcpTools` until its own 30s deadline. If the plugin MCP server startup had already failed or timed out, the session-owned MCP manager would keep returning an empty tool list, so CI only reported `discovered tools: []` instead of the startup state that mattered. This makes the test synchronize on `McpStartupComplete` for the sample plugin MCP server before asserting listed tools, and gives the Bazel-launched test server a larger startup window. ## Notes Confidence is about 80%. The source path strongly supports the RCA: a failed MCP startup is represented as an empty tool list through `ListMcpTools`, so the old polling contract could not distinguish "not ready yet" from "startup already failed." I could not retrieve the CI execution-log artifact to confirm the exact hidden startup error, but the observed Ubuntu Bazel failure matches this path: repeated `ListMcpTools` responses with no tools until the test-local timeout fired. I think this is the right solution because it keeps plugin behavior unchanged and fixes only the test contract. Future startup failures should now report the `McpStartupComplete` failure/cancellation instead of timing out on an empty tool snapshot. This test was introduced in https://github.com/openai/codex/pull/12864.	2026-04-23 14:08:40 -07:00
Eric Traut	3f8c06e457	Fix /review interrupt and TUI exit wedges (#18921 ) Addresses #11267 ## Summary `/review` can be interrupted while it is still spawning the review sub-agent. That spawn path lives in `codex-core` and did not observe the task cancellation token until after `Codex::spawn` returned, so an interrupted review could keep building a child session and leave the TUI in a wedged state. The TUI exit path also waited indefinitely for app-server `thread/unsubscribe`, which made Ctrl+C look broken if the app-server was already stuck. This makes interactive delegate startup cancellation-aware and bounds the TUI shutdown-first unsubscribe wait with a short UI escape-hatch timeout. ## Testing I reproed the hang using the steps in the bug report. Confirmed hang no longer exists after fix.	2026-04-23 13:28:12 -07:00
Michael Bolin	9c0eced391	shell-escalation: carry resolved permission profiles (#18287 ) ## Why Shell escalation still has adapter code that expects a legacy sandbox policy, but command approvals should carry the resolved `PermissionProfile` so callers can reason about the granted permissions canonically. ## What changed This introduces profile-shaped resolved escalation permissions while retaining the derived legacy sandbox policy for the Unix escalation adapter. It updates approval types, the escalation server protocol, and tests that inspect escalated command permissions. ## Verification - `cargo test -p codex-core --test all handle_container_exec_ -- --nocapture` - `cargo test -p codex-core --test all handle_sandbox_ -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18287). * #18288 * __->__ #18287	2026-04-23 12:46:19 -07:00
cassirer-openai	6d09b6752d	[rollout_trace] Trace tool and code-mode boundaries (#18878 ) ## Summary Extends rollout tracing across tool dispatch and code-mode runtime boundaries. This records canonical tool-call lifecycle events and links code-mode execution/wait operations back to the model-visible calls that caused them. ## Stack This is PR 3/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This PR is about attribution. Reviewers should focus on whether direct tool calls, code-mode-originated tool calls, waits, outputs, and cancellation boundaries are recorded with enough source information for deterministic reduction without coupling the reducer to live runtime internals. The stack remains valid after this layer: tool and code-mode traces reduce through the existing crate model, while the broader session and multi-agent relationships are added in the next PR.	2026-04-23 12:22:11 -07:00
Michael Bolin	ff22982d75	mcp: include permission profiles in sandbox state (#18286 ) ## Why MCP tool calls can receive a serialized `SandboxState` when a server declares the sandbox-state capability. That state is one of the places MCP runtimes learn what permissions Codex is operating under. As the permissions migration makes `PermissionProfile` the canonical representation, MCP consumers should be able to read that profile directly instead of reconstructing permissions from the legacy `SandboxPolicy`. ## What changed - Adds optional `permissionProfile` to `codex_mcp::SandboxState`, while keeping `sandboxPolicy` for existing MCP consumers. - Populates `permissionProfile` from the current `TurnContext` when serializing sandbox state for MCP tool calls. ## Verification - Current GitHub Actions for this PR are passing. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18286). * #18288 * #18287 * __->__ #18286	2026-04-23 12:21:26 -07:00
Michael Bolin	f90cc0ee64	tui: carry permission profiles on user turns (#18285 ) ## Why Per-turn permission overrides should use the same canonical profile abstraction as session configuration. That lets TUI submissions preserve exact configured permissions without round-tripping through legacy sandbox fields. ## What changed This adds `permission_profile` to user-turn operations, threads it through TUI/app-server submission paths, fills the new field in existing test fixtures, and adds coverage that composer submission includes the configured profile. ## Verification - `cargo test -p codex-tui permissions -- --nocapture` - `cargo test -p codex-core --test all permissions_messages -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285). * #18288 * #18287 * #18286 * __->__ #18285	2026-04-23 11:54:17 -07:00
Rasmus Rygaard	f11583b8f6	Add remote thread config endpoint (#18908 ) ## Why App-server needs a way to fetch thread-scoped config from the remote thread config service when the user config opts into that behavior. This mirrors the existing experimental remote thread store endpoint while keeping local/noop behavior as the default. Startup paths also need to avoid silently dropping the remote config endpoint after the first config load. The stdio app-server path discovers the endpoint from the initial config and installs the real thread config loader for later config builds, while in-process clients used by TUI/exec now select the same remote loader directly from their provided config. ## What changed - Added `experimental_thread_config_endpoint` to `ConfigToml`, `Config`, and `core/config.schema.json`. - Added config parsing coverage for the new setting. - Updated app-server startup to select `RemoteThreadConfigLoader` from the initially loaded config, falling back to `NoopThreadConfigLoader` when unset. - Let `ConfigManager` replace its thread config loader after startup discovery so later config loads use the selected loader. - Updated in-process app-server client startup to pass `RemoteThreadConfigLoader` when its config has `experimental_thread_config_endpoint` set. ## Verification - Added `experimental_thread_config_endpoint_loads_from_config_toml`. - Added `runtime_start_args_use_remote_thread_config_loader_when_configured`. - Ran `cargo check -p codex-app-server --lib`. - Ran `cargo test -p codex-app-server-client`.	2026-04-23 11:46:06 -07:00
maja-openai	cff337e4e3	Use Auto-review wording for fallback rationale (#19168 ) ## Why PR #18797 currently surfaces fallback rationale text that names Guardian directly. ## What changed - Updated the bare allow and bare deny fallback rationales in `codex-rs/core/src/guardian/prompt.rs` from Guardian to Auto-review. - Updated the existing bare allow parser test and added explicit bare deny parser coverage. ## Verification - `cargo test -p codex-core parse_guardian_assessment_treats_bare`	2026-04-23 11:42:43 -07:00
xl-openai	198eddd25d	Move marketplace add/remove and startup sync out of core. (#19099 ) Move more things to core-plugins. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-23 11:27:17 -07:00
Tom	f1061d9d07	[codex] Implement remote thread store methods (#19008 )	2026-04-23 17:49:28 +00:00
Tom	f1923a38b1	[codex] Route live thread writes through ThreadStore (#18882 ) Begin migrating the thread write codepaths to ThreadStore. This starts using ThreadStore inside of core session code, not only in the app server code. Rework the interfaces around thread recording/persistence. We're left with the following: * `ThreadManager`: owns the process-level registry of loaded threads and handles cross-thread orchestration: start, resume, fork, lookup, remove, and route ops to running CodexThreads. * `CodexThread`: represents one loaded/running thread from the outside. It is the handle app-server and callers use to submit ops, inspect session metadata, and shut the thread down. * `LiveThread`: session-owned persistence lifecycle handle for one active thread. Core session code uses it to append rollout items, materialize lazy persistence, flush, shutdown, discard init-failed writers, and load that thread’s persisted history. * `ThreadStore`: storage backend abstraction. It answers “how are threads persisted, read, listed, updated, archived?” Local and remote implementations live behind this trait. * `LocalThreadStore`: local ThreadStore implementation. It owns the file/sqlite-specific details and keeps RolloutRecorder as a local implementation detail. This is a few too many Thread abstractions for my liking, but they do all represent different concepts / needs / layers. Migration note: in places where the core code explicitly requires a path, rather than a thread ID, throw an error if we're running with a remote store. Cover the new local live-writer lifecycle with focused tests and preserve app-server thread-start behavior, including ephemeral pathless sessions.	2026-04-23 10:17:09 -07:00
jif-oai	a2f868c9d6	feat: drop spawned-agent context instructions (#19127 ) ## Why MultiAgentV2 children should not receive an extra model-visible developer fragment just because they were spawned. The parent/configured developer instructions should carry through normally, but the dedicated `<spawned_agent_context>` block is no longer desired. ## What changed - Removed the `SpawnAgentInstructions` context fragment and its `<spawned_agent_context>` wrapper. - Stopped appending spawned-agent instructions in `codex-rs/core/src/tools/handlers/multi_agents_v2/spawn.rs`. - Updated subagent notification coverage to assert inherited parent developer instructions without expecting the spawned-agent wrapper. ## Verification - `cargo test -p codex-core --test all spawned_multi_agent_v2_child_inherits_parent_developer_context -- --nocapture` - `cargo test -p codex-core --test all skills_toggle_skips_instructions_for_parent_and_spawned_child -- --nocapture` - `cargo test -p codex-core --test all subagent_notifications -- --nocapture`	2026-04-23 18:54:45 +02:00


2963 Commits