codex

mirror of https://github.com/openai/codex.git synced 2026-05-17 09:43:19 +00:00

Author	SHA1	Message	Date
Ahmed Ibrahim	f8e2beeafa	Address OAuth persistence review feedback Keep local OAuth login flows on the no-proxy client while remote flows continue to use the runtime-selected HTTP client. Harden OAuth persistence so expires_in drift does not look like an external token update and transient keyring read failures do not clear in-memory credentials. Co-authored-by: Codex <noreply@openai.com>	2026-04-27 11:58:06 +00:00
Ahmed Ibrahim	35080d5cd0	Address MCP OAuth review feedback Restore stored-token bearer fallback by probing OAuth metadata through the injected runtime HTTP client before creating the OAuth transport. Also broaden discovery to OpenID Connect and protected-resource metadata paths, and avoid writing stale refresh results over newer in-memory credentials. Co-authored-by: Codex <noreply@openai.com>	2026-04-27 11:34:20 +00:00
Ahmed Ibrahim	9df8e3d935	Use no-proxy client for local OAuth preflight Co-authored-by: Codex <noreply@openai.com>	2026-04-27 11:13:44 +00:00
Ahmed Ibrahim	da1000bbb4	Use no-proxy client for local MCP auth status Co-authored-by: Codex <noreply@openai.com>	2026-04-27 11:01:49 +00:00
Ahmed Ibrahim	ec6227c6cd	Stabilize OAuth persistence test Co-authored-by: Codex <noreply@openai.com>	2026-04-27 10:45:14 +00:00
Ahmed Ibrahim	520ee70b73	Address remote OAuth review feedback Co-authored-by: Codex <noreply@openai.com>	2026-04-27 10:38:51 +00:00
Ahmed Ibrahim	276a97340c	Fix OAuth argument comments Co-authored-by: Codex <noreply@openai.com>	2026-04-27 10:25:29 +00:00
Ahmed Ibrahim	2b68005d97	Remove stale app-server OAuth dependency Drop the direct codex-rmcp-client dependency from app-server now that OAuth orchestration is routed through codex-mcp. Co-authored-by: Codex <noreply@openai.com>	2026-04-27 10:14:35 +00:00
Ahmed Ibrahim	c2a2ffa512	Address MCP OAuth review feedback Use the runtime environment manager for MCP list auth status, keep cached tokens available when refresh fails before expiry, avoid restoring deleted OAuth credentials, and serialize refresh attempts. Co-authored-by: Codex <noreply@openai.com>	2026-04-27 10:11:16 +00:00
Ahmed Ibrahim	a3abb8ba6e	Move MCP OAuth orchestration downstream Keep app-server OAuth callsites thin by moving server/runtime-aware OAuth discovery, scope resolution, and runtime HTTP client selection into codex-mcp helpers. Co-authored-by: Codex <noreply@openai.com>	2026-04-27 10:03:34 +00:00
Ahmed Ibrahim	a4de53c661	Remove unused rmcp client dependency Drop the stale codex-client dependency from codex-rmcp-client after the OAuth transport changes stopped using that crate. Co-authored-by: Codex <noreply@openai.com>	2026-04-27 09:38:20 +00:00
Ahmed Ibrahim	177e39f4b7	Add remote runtime OAuth for MCP Route Streamable HTTP OAuth discovery, registration, token exchange, and refresh through the selected MCP runtime HTTP client so remote MCP servers do not silently fall back to local network access. Keep browser launch and callback handling local while storing tokens in the existing local MCP OAuth store. Add regression coverage for injected OAuth HTTP requests and stored-token refresh through the exec-server transport. Co-authored-by: Codex <noreply@openai.com>	2026-04-27 09:33:42 +00:00
Eric Traut	4f1d5f00f0	Add Codex issue digest skill (#19779 ) Problem: Maintainers need a shared way to run Codex GitHub issue digests without copying large prompts or relying on manual GitHub page summaries. Solution: Add a reusable codex-issue-digest skill with a deterministic GitHub collector, owner/all-label windows, reaction-aware activity metrics, scaled attention markers, and focused tests.	2026-04-26 23:16:43 -07:00
Michael Bolin	a6ca39c630	permissions: derive legacy exec policies at boundaries (#19737 ) ## Why After config and requirements store canonical profiles, exec requests should not cache a derived `SandboxPolicy`. The cached legacy value can drift from the richer profile state, and most execution paths already have the filesystem and network runtime policies they need. ## What Changed - Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest` and `codex_core::sandboxing::ExecRequest`. - Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper for the Windows and legacy call sites that still need a `SandboxPolicy` projection. - Updates Windows filesystem override setup and unified exec policy serialization to derive that compatibility policy at the boundary. - Updates Unix escalation reruns and direct shell requests to reconstruct exec requests from `PermissionProfile` plus runtime filesystem/network policy, without carrying a cached legacy policy. - Adjusts sandboxing manager tests to assert the effective profile rather than the removed legacy field. ## Verification - `cargo check -p codex-config -p codex-core -p codex-sandboxing -p codex-app-server -p codex-cli -p codex-tui` - `cargo test -p codex-sandboxing manager` - `cargo test -p codex-core exec_server_params_use_env_policy_overlay_contract` - `cargo test -p codex-core unix_escalation` - `cargo test -p codex-core exec::tests` - `cargo test -p codex-core sandboxing::tests`	2026-04-26 22:11:49 -07:00
Michael Bolin	523e4aa8e3	permissions: constrain requirements as profiles (#19736 )	2026-04-26 21:49:30 -07:00
Michael Bolin	0ccd659b4b	permissions: store only constrained permission profiles (#19735 )	2026-04-26 20:59:58 -07:00
Won Park	8033b6a449	Add /auto-review-denials retry approval flow (#19058 ) ## Why Auto-review can deny an action that the user later decides they want to retry. Today there is no TUI surface for selecting a recent denial and sending explicit approval context back into the session, so users have to restate intent manually and the retry can be reviewed without the original denied action context. This adds a narrow TUI-driven path for approving a recent denied action while still keeping the retry inside the normal auto-review flow. ## What Changed - Added `/auto-review-denials` to open a picker of recent denied auto-review actions. - Added a small in-memory TUI store for the 10 most recent denied auto-review events. - Selecting a denial sends the structured denied event back through the existing core/app-server op path. - Core now injects a developer message containing the approved action JSON rather than the full assessment event. - Auto-review transcript collection now preserves this specific approval developer message so follow-up review sessions can see the user approval context. - Added TUI snapshot/unit coverage for the picker and approval dispatch path. - Added core coverage for retaining the approval developer message in the auto-review transcript. ## Verification - `cargo test -p codex-core collect_guardian_transcript_entries_keeps_manual_approval_developer_message` - `cargo test -p codex-tui auto_review_denials` - `cargo test -p codex-tui approving_recent_denial_emits_structured_core_op_once` ## Notes This intentionally keeps retries going through auto-review. The approval signal is context for the exact previously denied action, not a blanket bypass for similar future actions.	2026-04-27 03:43:53 +00:00
Michael Bolin	0d8cdc0510	permissions: centralize legacy sandbox projection (#19734 ) ## Why The remaining migration work still needs `SandboxPolicy` at a few compatibility boundaries, but those projections should come from one canonical path. Keeping ad hoc legacy projections scattered through app-server, CLI, and config code makes it easy for behavior to drift as `PermissionProfile` gains fidelity that the legacy enum cannot represent. ## What Changed - Adds `Permissions::legacy_sandbox_policy(cwd)` and `Config::legacy_sandbox_policy()` as the compatibility projection from the canonical `PermissionProfile`. - Adds `Permissions::can_set_legacy_sandbox_policy()` so legacy inputs are checked after they are converted into profile semantics. - Updates app-server command handling, Windows sandbox setup, session configuration, and sandbox summaries to use the centralized projection helper. - Leaves `SandboxPolicy` in place only for boundary inputs/outputs that still speak the legacy abstraction. ## Verification - `cargo check -p codex-config -p codex-core -p codex-sandboxing -p codex-app-server -p codex-cli -p codex-tui` - `cargo test -p codex-tui permissions_selection_history_snapshot_full_access_to_default -- --nocapture` - `cargo test -p codex-tui permissions_selection_sends_approvals_reviewer_in_override_turn_context -- --nocapture` - `bazel test //codex-rs/tui:tui-unit-tests-bin --test_arg=permissions_selection_history_snapshot_full_access_to_default --test_output=errors` - `bazel test //codex-rs/tui:tui-unit-tests-bin --test_arg=permissions_selection_sends_approvals_reviewer_in_override_turn_context --test_output=errors` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19734). * #19737 * #19736 * #19735 * __->__ #19734	2026-04-26 20:31:23 -07:00
Abhinav	c3e60849e5	inline hostname resolution for remote sandbox config (#19739 ) # Why Requirements support host-specific `remote_sandbox_config.hostname_patterns`, but config loading previously resolved and passed the system hostname through every config-loading path even when no requirements layer used `remote_sandbox_config`. On machines where hostname lookup is slow, startup and app-server config reads paid for a feature that was not active. We only need the hostname when a requirements layer actually declares `remote_sandbox_config`, so this moves hostname resolution to the single requirements merge point and keeps all other config callers unaware of hostname matching. # What - Removed the eager `host_name` plumbing from `load_config_layers_state`, `load_requirements_toml`, `ConfigBuilder`, app-server `ConfigManager`, network proxy loading, and related call sites. - Resolve the hostname inside `merge_requirements_with_remote_sandbox_config` only when the incoming requirements contain `remote_sandbox_config`.	2026-04-27 03:18:57 +00:00
Michael Bolin	ad57a3fee2	permissions: finish profile-backed app surfaces (#19395 )	2026-04-26 19:42:39 -07:00
Andrey Mishchenko	1f304dd1f2	Allow agents.max_threads to work with multi_agent_v2 (#19733 )	2026-04-26 17:56:05 -07:00
Michael Bolin	2cb8746457	permissions: remove core legacy policy round trips (#19394 ) ## Why Several execution paths still converted profile-backed permissions into `SandboxPolicy` and then rebuilt runtime permissions from that legacy shape. Those round trips are unnecessary after the preceding PRs and can lose split filesystem semantics. Core approval and escalation should carry the resolved profile directly. ## What Changed - Removes `sandbox_policy` from `ResolvedPermissionProfile`; the resolved permission object now carries the canonical `PermissionProfile` directly. - Updates exec-policy fallback, shell/unified-exec interception, escalation reruns, and related tests to pass profiles instead of legacy policies. - Removes legacy additional-permission merge helpers that built an effective `SandboxPolicy` before rebuilding runtime permissions. - Keeps legacy projections only at compatibility boundaries that still require `SandboxPolicy`, not in core permission computation. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394). * #19737 * #19736 * #19735 * #19734 * #19395 * __->__ #19394	2026-04-26 17:43:32 -07:00
Andrey Mishchenko	35bc6e3d01	Delete unused ResponseItem::Message.end_turn (#19605 ) This field is unused. Delete it.	2026-04-26 17:18:09 -07:00
Ahmed Ibrahim	0bda8161a2	Split MCP connection modules (#19725 ) ## Why The MCP connection manager module had grown to mix orchestration, RMCP client startup, elicitation handling, Codex Apps cache and naming behavior, tool qualification and filtering, and runtime data. The previous stacked PRs split these responsibilities incrementally; this PR collapses that work into one self-contained refactor on latest main. ## What changed - Move McpConnectionManager into connection_manager.rs. - Move RMCP client lifecycle, startup, and uncached tool listing into rmcp_client.rs. - Move elicitation request tracking and policy handling into elicitation.rs. - Move Codex Apps cache, key, filtering, and naming helpers into codex_apps.rs. - Rename the tool-name helper module to tools.rs and move ToolInfo, tool filtering, schema masking, and qualification there. - Move runtime and sandbox shared types into runtime.rs. - Preserve latest main PermissionProfile-based MCP elicitation auto-approval behavior. ## Verification - just fmt - cargo check -p codex-mcp - cargo check -p codex-mcp --tests - cargo check -p codex-core --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-26 23:23:34 +00:00
Michael Bolin	4c58e64f08	test: increase core-all-test shard count to 16 (#19727 ) ## Summary Increase `core-all-test`'s Bazel shard count from `8` to `16`. ## Why [#19609](https://github.com/openai/codex/pull/19609) restored `bazel.yml` to a 30-minute timeout and increased `app-server-all-test`'s shard count because the bigger timeout risk was not just a cold Windows build. The more common problem was a long `rust_test()` shard failing and getting retried multiple times. Recent `main` runs show that `//codex-rs/core:core-all-test` still has the same shape of problem on Windows: - [Run 24943931330](https://github.com/openai/codex/actions/runs/24943931330) reported `//codex-rs/core:core-all-test` as flaky after first-attempt failures in shard `5/8` and shard `8/8`. - Those retries were driven by `suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override` and `suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`. - The failed shard attempts in that run took `272.61s` and `259.27s` before retrying, which is exactly the sort of wall-clock cost that burns through the 30-minute budget. - [Run 24966332583](https://github.com/openai/codex/actions/runs/24966332583) also retried `//codex-rs/tui:tui-unit-tests` after `app::tests::update_memory_settings_updates_current_thread_memory_mode` failed once on Windows. - [Run 24965527138](https://github.com/openai/codex/actions/runs/24965527138) and its linked [BuildBuddy invocation](https://app.buildbuddy.io/invocation/ac1a8265-06fa-4da5-9552-4715b7965bce) show the other half of the problem: when Windows cache reuse is weak, the `bazel test //...` step can already consume `24m11s` on its own, leaving very little headroom for flaky retries. Increasing `core-all-test` to `16` shards does not fix the flaky tests, but it does reduce the wall-clock cost when a single shard has to be retried. That matches the mitigation we already applied to `app-server-all-test` in `#19609`. ## What Changed - Update `codex-rs/core/BUILD.bazel` so `core-all-test` uses `16` shards instead of `8`. - Leave `core-unit-tests` unchanged. ## Follow-up Work This change is meant to buy back CI headroom while we fix the flaky tests themselves in subsequent commits. The recent Windows retries that look worth addressing directly include: - `suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override` - `suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request` - `app::tests::update_memory_settings_updates_current_thread_memory_mode` ## Verification - Compared `core-all-test`'s current sharding against the `app-server-all-test` precedent in [#19609](https://github.com/openai/codex/pull/19609). - Inspected recent `main` Bazel workflow logs and the linked BuildBuddy invocation to confirm that Windows retries on long shards are still consuming a meaningful fraction of the 30-minute timeout budget. - Did not run local tests for this change because it only adjusts Bazel sharding metadata.	2026-04-26 23:10:26 +00:00
pakrym-oai	ba159cbc79	Fix codex-core config test type paths (#19726 ) Summary: - Update config tests to reference config requirement types from codex_config after the loader split. Tests: - just fmt - cargo build -p codex-core --tests - cargo clippy -p codex-core --tests -- -D warnings	2026-04-26 15:58:17 -07:00
Michael Bolin	dda8199b73	permissions: migrate approval and sandbox consumers to profiles (#19393 ) ## Why Runtime decisions should not infer permissions from the lossy legacy sandbox projection once `PermissionProfile` is available. In particular, `Disabled` and `External` need to remain distinct, and managed profiles with split filesystem or deny-read rules should not be collapsed before approval, network, safety, or analytics code makes decisions. ## What Changed - Changes managed network proxy setup and network approval logic to use `PermissionProfile` when deciding whether a managed sandbox is active. - Migrates patch safety, Guardian/user-shell approval paths, Landlock helper setup, analytics sandbox classification, and selected turn/session code to profile-backed permissions. - Validates command-level profile overrides against the constrained `PermissionProfile` rather than a strict `SandboxPolicy` round trip. - Preserves configured deny-read restrictions when command profiles are narrowed. - Adds coverage for profile-backed trust, network proxy/approval behavior, patch safety, analytics classification, and command-profile narrowing. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393). * #19395 * #19394 * __->__ #19393	2026-04-26 15:30:40 -07:00
pakrym-oai	9c3abcd46c	[codex] Move config loading into codex-config (#19487 ) ## Why Config loading had become split across crates: `codex-config` owned the config types and merge logic, while `codex-core` still owned the loader that assembled the layer stack. This change consolidates that responsibility in `codex-config`, so the crate that defines config behavior also owns how configs are discovered and loaded. To make that move possible without reintroducing the old dependency cycle, the shell-environment policy types and helpers that `codex-exec-server` needs now live in `codex-protocol` instead of flowing through `codex-config`. This also makes the migrated loader tests more deterministic on machines that already have managed or system Codex config installed by letting tests override the system config and requirements paths instead of reading the host's `/etc/codex`. ## What Changed - moved the config loader implementation from `codex-core` into `codex-config::loader` and deleted the old `core::config_loader` module instead of leaving a compatibility shim - moved shell-environment policy types and helpers into `codex-protocol`, then updated `codex-exec-server` and other downstream crates to import them from their new home - updated downstream callers to use loader/config APIs from `codex-config` - added test-only loader overrides for system config and requirements paths so loader-focused tests do not depend on host-managed config state - cleaned up now-unused dependency entries and platform-specific cfgs that were surfaced by post-push CI ## Testing - `cargo test -p codex-config` - `cargo test -p codex-core config_loader_tests::` - `cargo test -p codex-protocol -p codex-exec-server -p codex-cloud-requirements -p codex-rmcp-client --lib` - `cargo test --lib -p codex-app-server-client -p codex-exec` - `cargo test --no-run --lib -p codex-app-server` - `cargo test -p codex-linux-sandbox --lib` - `cargo shear` - `just bazel-lock-check` ## Notes - I did not chase unrelated full-suite failures outside the migrated loader surface. - `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive failures on this machine, and Windows CI still shows unrelated long-running/timeouting test noise outside the loader migration itself.	2026-04-26 15:10:53 -07:00
pakrym-oai	2a020f1a0a	Lift app-server JSON-RPC error handling to request boundary (#19484 ) ## Why App-server request handling had a lot of repeated JSON-RPC error construction and one-off `send_error`/`return` branches. This made small handlers noisy and pushed error response details into leaf code that otherwise only needed to validate input or call the underlying API. ## What Changed - Added shared JSON-RPC error constructors in `codex-rs/app-server/src/error_code.rs`. - Lifted straightforward request result emission into `codex-rs/app-server/src/message_processor.rs` so response/error dispatch happens at the request boundary. - Reused the result helpers across command exec, config, filesystem, device-key, external-agent config, fs-watch, and outgoing-message paths. - Removed leaf wrapper handlers where the method body was only forwarding to a response helper. - Returned request validation errors upward in the simple cases instead of sending an error locally and immediately returning. ## Verification - `cargo test -p codex-app-server --lib command_exec::tests` - `cargo test -p codex-app-server --lib outgoing_message::tests` - `cargo test -p codex-app-server --lib in_process::tests` - `cargo test -p codex-app-server --test all v2::fs` - `cargo test -p codex-app-server --test all v2::config_rpc` - `cargo test -p codex-app-server --test all v2::external_agent_config` - `cargo test -p codex-app-server --test all v2::initialize` - `just fix -p codex-app-server` - `git diff --check` Note: full `cargo test -p codex-app-server` was attempted and stopped in `message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans` with a stack overflow after unrelated tests had already passed.	2026-04-26 15:10:35 -07:00
Michael Bolin	deaa307fb2	permissions: derive compatibility policies from profiles (#19392 ) ## Why After #19391, `PermissionProfile` and the split filesystem/network policies could still be stored in parallel. That creates drift risk: a profile can preserve deny globs, external enforcement, or split filesystem entries while a cached projection silently loses those details. This PR makes the profile the runtime source and derives compatibility views from it. ## What Changed - Removes stored filesystem/network sandbox projections from `Permissions` and `SessionConfiguration`; their accessors now derive from the canonical `PermissionProfile`. - Derives legacy `SandboxPolicy` snapshots from profiles only where an older API still needs that field. - Updates MCP connection and elicitation state to track `PermissionProfile` instead of `SandboxPolicy` for auto-approval decisions. - Adds semantic filesystem-policy comparison so cwd changes can preserve richer profiles while still recognizing equivalent legacy projections independent of entry ordering. - Updates config/session tests to assert profile-derived projections instead of parallel stored fields. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392). * #19395 * #19394 * #19393 * __->__ #19392	2026-04-26 15:06:42 -07:00
Michael Bolin	4d7ce3447d	permissions: make runtime config profile-backed (#19606 ) ## Why This supersedes #19391. During stack repair, GitHub marked #19391 as merged into a temporary stack branch rather than into `main`, so the runtime-config change needed a fresh PR. `PermissionProfile` is now the canonical permissions shape after #19231 because it can distinguish `Managed`, `Disabled`, and `External` enforcement while also carrying filesystem rules that legacy `SandboxPolicy` cannot represent cleanly. Core config and session state still needed to accept profile-backed permissions without forcing every profile through the strict legacy bridge, which rejected valid runtime profiles such as direct write roots. The unrelated CI/test hardening that previously rode along with this PR has been split into #19683 so this PR stays focused on the permissions model migration. ## What Changed - Adds `Permissions.permission_profile` and `SessionConfiguration.permission_profile` as constrained runtime state, while keeping `sandbox_policy` as a legacy compatibility projection. - Introduces profile setters that keep `PermissionProfile`, split filesystem/network policies, and legacy `SandboxPolicy` projections synchronized. - Uses a compatibility projection for requirement checks and legacy consumers instead of rejecting profiles that cannot round-trip through `SandboxPolicy` exactly. - Updates config loading, config overrides, session updates, turn context plumbing, prompt permission text, sandbox tags, and exec request construction to carry profile-backed runtime permissions. - Preserves configured deny-read entries and `glob_scan_max_depth` when command/session profiles are narrowed. - Adds `PermissionProfile::read_only()` and `PermissionProfile::workspace_write()` presets that match legacy defaults. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606). * #19395 * #19394 * #19393 * #19392 * __->__ #19606	2026-04-26 13:29:54 -07:00
efrazer-oai	fed0a8f4fa	feat: load AgentIdentity from JWT login/env (#18904 ) ## Summary This PR lets programmatic AgentIdentity users provide one token through either stdin login or environment auth. `codex login --with-agent-identity` reads an Agent Identity JWT from stdin, validates that it has the required claims, and stores that token as the `agent_identity` value in `auth.json`. The file format is token-only; the decoded account and key fields are runtime state, not hand-authored auth.json fields. The Agent Identity JWT claim shape and decoder live in `codex-agent-identity`; `codex-login` only owns env/storage precedence and conversion into `CodexAuth::AgentIdentity`. When env auth is enabled, `CODEX_AGENT_IDENTITY` can provide the same JWT without writing auth state to disk. `CODEX_API_KEY` still wins if both env vars are set. Reference old stack: https://github.com/openai/codex/pull/17387/changes Reference JWT/env stack: https://github.com/openai/codex/pull/18176 ## Stack 1. https://github.com/openai/codex/pull/18757: full revert 2. https://github.com/openai/codex/pull/18871: isolated Agent Identity crate 3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity auth mode and startup task allocation 4. https://github.com/openai/codex/pull/18811: migrate Codex backend auth callsites through AuthProvider 5. This PR: accept AgentIdentity JWTs through login/env ## Testing Tests: targeted login and Agent Identity crate tests, CLI checks, scoped formatter/linter cleanup, and CI. --------- Co-authored-by: Shijie Rao <shijie.rao@openai.com>	2026-04-26 19:49:54 +00:00
Michael Bolin	ac2bffa443	test: harden app-server integration tests (#19683 ) ## Why Windows Bazel runs in the permissions stack exposed that app-server integration tests were launching normal plugin startup warmups in every subprocess. Those warmups can call `https://chatgpt.com/backend-api/plugins/featured` when a test is not specifically exercising plugin startup, which adds slow background work, noisy stderr, and dependence on external network state. The relevant startup/featured-plugin behavior was introduced across #15042 and #15264. A few app-server tests also had long optional waits or unbounded cleanup paths, making failures expensive to diagnose and contributing to slow Windows shards. One external-agent config test from #18246 used a GitHub-style marketplace source, which was enough to exercise the pending remote-import path but also meant the background completion task could attempt a real clone. ## What Changed - Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks` plumbing and a hidden debug-only `--disable-plugin-startup-tasks-for-tests` app-server flag, so integration tests can suppress startup plugin warmups without adding a production env-var gate. - Has the app-server test harness pass that hidden flag by default, while opting plugin-startup coverage back in for tests that intentionally exercise startup sync and featured-plugin warmup behavior. - Lowers normal app-server subprocess logging from `info`/`debug` to `warn` to avoid multi-megabyte stderr output in Bazel logs. - Prevents the external-agent config test from attempting a real marketplace clone by using an invalid non-local source while still exercising the pending-import completion path. - Bounds optional filesystem/realtime waits and fake WebSocket test-server shutdown so failures produce targeted timeouts instead of hanging a shard. - Fixes the Unix script-resolution test in `rmcp-client` to exercise PATH resolution directly and include the actual spawn error in failures. ## Verification - `cargo check -p codex-app-server` - `cargo clippy -p codex-app-server --tests -- -D warnings` - `cargo test -p codex-rmcp-client program_resolver::tests::test_unix_executes_script_without_extension` - `cargo test -p codex-app-server --test all external_agent_config_import_sends_completion_notification_after_pending_plugins_finish -- --nocapture` - `cargo test -p codex-app-server --test all plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request -- --nocapture` - Windows Local Bazel passed with this test-hardening bundle before it was extracted from #19606. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683). * #19395 * #19394 * #19393 * #19392 * #19606 * __->__ #19683	2026-04-26 12:43:16 -07:00
Thibault Sottiaux	87bc72408c	[codex] remove responses command (#19640 ) This removes the hidden `codex responses` CLI subcommand after confirming no downstream callers rely on it, deleting the raw Responses passthrough implementation, unregistering the subcommand, and dropping the now-unused CLI dependencies on `codex-api` and `codex-model-provider`.	2026-04-25 23:10:38 -07:00
Andrey Mishchenko	355c40ad7e	Support end_turn in response.completed (#19610 ) Some providers of Responses API forward a model-defined `end_turn` boolean indicating explicitly the model's indication of whether it would like to end the turn or to be inferenced again. In this PR, we update the sampling loop to use this field correctly if it's set. If the field is not set by the provider, we fall back to the existing sampling logic.	2026-04-25 21:57:42 -07:00
Felipe Coury	5591912f0b	fix(tui): reflow scrollback on terminal resize (#18575 ) Fixes multiple scrollback and terminal resize issues: #5538, #5576, #8352, #12223, #16165, and #15380. ## Why Codex writes finalized transcript output into terminal scrollback after wrapping it for the current viewport width. A later terminal resize could leave that scrollback shaped for the old width, so wider windows kept narrow output and narrower windows could show stale wrapping artifacts until enough new output replaced the visible area. This is also the foundation PR for responsive markdown tables. Table rendering needs finalized transcript content to be width-sensitive after insertion, not only while content is first streaming. Markdown table rendering itself stays in #18576. ## Stack - PR1: resize backlog reflow and interrupt cleanup - #18576: markdown table support ## What Changed - Rebuild source-backed transcript history when the terminal width changes. `terminal_resize_reflow` is introduced through the experimental feature system, but is enabled by default for this rollout so we can validate behavior across real terminals. - Preserve assistant and plan stream source so finalized streaming output can participate in resize reflow after consolidation. - Debounce resize work, but force a final source-backed reflow when a resize happened during active or unconsolidated streaming output. - Clear stale pending history lines on resize so old-width wrapped output is not emitted just before rebuilt scrollback. - Bound replay work with `[tui.terminal_resize_reflow].max_rows`: omitted uses terminal-specific defaults, `0` keeps all rendered rows, and a positive value sets an explicit cap. The cap applies both while initially replaying a resumed transcript into scrollback and when rebuilding scrollback after terminal resize. - Consolidate interrupted assistant streams before cleanup, then clear pending stream output and active-tail state consistently. - Move resize reflow and thread event buffering helpers out of `app.rs` into dedicated TUI modules. - Add focused coverage for resize reflow, feature-gated behavior, streaming source preservation, interrupted output cleanup, unicode-neutral text, terminal-specific row caps, and composer/layout stability. ## Runtime Bounds Resize reflow keeps only the most recent rendered rows when a row cap is active. The default is `auto`, which maps to the detected terminal's default scrollback size where Codex can identify it: VS Code `1000`, Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`. Terminals without a dedicated mapping use the conservative fallback of `1000` rows. Users can override this with `[tui.terminal_resize_reflow] max_rows = N`, or set `max_rows = 0` to disable row limiting. ## Validation - `just fmt` - `git diff --check` - `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow` - `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui transcript_reflow` - `just fix -p codex-tui` - PR CI in progress on the squashed branch	2026-04-25 22:00:32 -03:00
Shijie Rao	4e30281a13	Guard npm update readiness (#19389 ) ## Why For npm/Bun-managed installs, the update prompt was treating the latest GitHub release as ready to install. During the `0.124.0` release, GitHub and npm visibility were not atomic: the root npm wrapper could become visible before the npm registry marked that version as the package `latest`. That left a window where users could be prompted to upgrade before npm was ready for the release. ## What changed - Keep GitHub Releases as the candidate latest-version source for npm/Bun installs, but only write the existing `version.json` cache after npm registry metadata proves that same root version is ready. - Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by checking `dist-tags.latest` and root package `dist` metadata for the GitHub candidate version. - Move version parsing helpers into `codex-rs/tui/src/update_versions.rs` so that logic can be tested without compiling the release-only `updates.rs` module under tests. - Update `.github/workflows/rust-release.yml` so the six known platform tarballs publish before the root `@openai/codex` wrapper. Other npm tarballs publish before the root wrapper, and the SDK publishes after the root package it depends on.	2026-04-25 17:09:29 -07:00
Michael Bolin	9881dc7306	fix: restore 30-minute timeout for Bazel builds (#19609 ) I think raising it to 45 minutes in https://github.com/openai/codex/pull/19578 was a mistake for the reasons explained in the comments in the code. Instead, we attempt to defend against timeouts by increasing the number of shards in `app-server-all-test` so that a "true failure" that gets run 3x should not take as much wall clock time.	2026-04-25 16:34:06 -07:00
Michael Bolin	d54493ba1c	test: stabilize app-server path assertions on Windows (#19604 ) ## Why Windows can represent the same canonical local path with either a normal drive path or a verbatim device path prefix. The failure pattern that motivated this PR was an assertion diff like `C:\...` versus `\\?\C:\...`: different spellings, same file. That became visible while validating the permissions stack above this PR. The stack increasingly routes paths through `AbsolutePathBuf`, which normalizes supported Windows device prefixes, while several existing tests still built expected values directly with `std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a raw `PathBuf`. On Windows, that can make tests fail because the two sides choose different textual forms for an otherwise equivalent canonical path. This PR is intentionally split out as the bottom PR below #19606. The runtime permissions migration should not carry unrelated Windows test stabilization, and reviewers should be able to verify this as a test-only change before looking at the larger permissions changes. ## Failure Modes Covered - `conversation_summary` expected rollout paths were built from raw canonicalized `PathBuf`s, while app-server responses could carry `AbsolutePathBuf`-normalized paths. - `thread_resume` compared returned thread paths directly to previously stored or fixture paths, so a verbatim-prefix spelling could fail an otherwise correct resume. - `marketplace_add` compared plugin install roots through `as_path()` against raw canonicalized paths, reproducing the same `C:\...` versus `\\?\C:\...` mismatch in both app-server and core-plugin coverage. ## What Changed - In `app-server/tests/suite/conversation_summary.rs`, normalize both expected rollout paths and received `ConversationSummary.path` values through `AbsolutePathBuf` before comparing the full summary object. - In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides of thread path comparisons before asserting equality. This keeps the tests focused on whether resume returned the same existing path, not whether Windows used the same string spelling. - In `app-server/tests/suite/v2/marketplace_add.rs` and `core-plugins/src/marketplace_add.rs`, compare install roots as `AbsolutePathBuf` values instead of comparing an absolute-path wrapper to a raw canonicalized `PathBuf`. ## Behavior This PR does not change production app-server or marketplace behavior. It only changes tests to assert semantic path identity across Windows path spelling variants. It also leaves API response values untouched; the normalization happens inside assertions only. ## Verification Targeted local checks run while extracting this fix: - `cargo test -p codex-app-server get_conversation_summary_by_thread_id_reads_rollout` - `cargo test -p codex-app-server get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home` - `cargo test -p codex-app-server thread_resume_prefers_path_over_thread_id` Windows-specific confidence comes from the Bazel Windows CI job for this PR, since the failure is platform-specific. ## Docs No docs update is needed because this is test-only infrastructure stabilization. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604). * #19395 * #19394 * #19393 * #19392 * #19606 * __->__ #19604	2026-04-25 16:25:28 -07:00
viyatb-oai	9aaa5d9358	[codex] Bypass managed network for escalated exec (#19595 ) ## Why `sandbox_permissions = "require_escalated"` is treated as an explicit request to approve the command and run it outside the filesystem/platform sandbox. Before this change, shell and unified exec still registered managed network approval context and could inject Codex-managed proxy state into the child process, which meant an approved escalated command could still hit a second network approval path. This PR makes that escalation boundary consistent: once a command is explicitly approved to run outside the sandbox, Codex does not also route that process through the managed network proxy. ## Security impact Command/filesystem sandbox approval now implies network approval for that command. If an untrusted command or script is allowed to run with `require_escalated`, its network calls are unsandboxed: Codex-managed network allowlists and denylists are not respected for that process, so the command can exfiltrate any data it can read. ## What changed - Skip managed network approval specs for `SandboxPermissions::RequireEscalated`. - Pass `network: None` into shell, zsh-fork shell, and unified exec sandbox preparation for explicitly escalated requests. - Strip Codex-managed proxy environment variables when `CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env when the Codex marker is absent. - Add regression coverage for the prepared exec request so the old behavior cannot silently reappear. ## Verification - `cargo test -p codex-core explicit_escalation` - `cargo clippy -p codex-core --all-targets -- -D warnings`	2026-04-25 23:23:58 +00:00
Eric Traut	0c785598b3	Keep slash command popup columns stable while scrolling (#19511 ) ## Why Fixes #19499. The slash-command popup recalculated the command-name column from only the rows visible in the current viewport. That made the description column shift horizontally while scrolling through `/` commands whenever longer command names entered or left the visible window. ## What Changed `codex-rs/tui/src/bottom_pane/command_popup.rs` now uses the shared selection-popup `AutoAllRows` column-width mode for both height measurement and rendering. This keeps the command description column based on the full filtered slash-command list instead of the current viewport. ## Verification - `cargo test -p codex-tui bottom_pane::command_popup`	2026-04-25 14:25:58 -07:00
Michael Bolin	f41306b4f3	test: isolate remote thread store regression from plugin warmups (#19593 ) Follow-up to #19266. ## Why `thread_start_with_non_local_thread_store_does_not_create_local_persistence` is meant to catch accidental local thread persistence when a non-local thread store is configured. The Windows flake reported in [this BuildBuddy invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d) showed that the assertion was tripping on an unexpected top-level `.tmp` entry: ```diff { + ".tmp", "config.toml", "installation_id", "memories", "skills", } ``` That `.tmp` does not appear to come from `tempfile::TempDir`; it comes from unrelated plugin startup work that can legitimately materialize `codex_home/.tmp`, including the startup remote plugin sync marker in [`core/src/plugins/startup_sync.rs`](`bce74c70ce/codex-rs/core/src/plugins/startup_sync.rs (L13-L15)`) and the curated plugin snapshot under [`.tmp/plugins`](`bce74c70ce/codex-rs/core-plugins/src/startup_sync.rs (L25-L26)`). That makes the regression race unrelated background startup tasks instead of validating the thread-store invariant it was added to cover. Rather than weakening the assertion to allow arbitrary `.tmp` entries, this change isolates the test from plugin warmups so it can stay strict about unexpected local thread persistence artifacts. ## What changed - disable plugins in the generated config used by `app-server/tests/suite/v2/remote_thread_store.rs` - keep the existing `codex_home` assertions unchanged so the test still fails if local session or sqlite persistence is introduced ## Verification - `cargo test -p codex-app-server suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence -- --exact`	2026-04-25 20:45:31 +00:00
Eric Traut	bce74c70ce	Restore persisted model provider on thread resume (#19287 ) Fixes #15219. ## Why `thread/resume` should continue a persisted thread with the same model provider that created the thread. The app server already restores the persisted model and reasoning effort before resuming, but it was leaving `model_provider` unset. If a user created a thread with one provider and later switched their active profile to another provider, resumed encrypted history could be sent to the wrong endpoint and fail with `invalid_encrypted_content`. The thread metadata already records the original provider, so resume should apply it when the caller has not explicitly requested a different model/provider/reasoning configuration. ## What changed This updates `merge_persisted_resume_metadata` in `app-server/src/codex_message_processor.rs` to copy `ThreadMetadata::model_provider` into `ConfigOverrides::model_provider` alongside the persisted model. The existing resume metadata tests now also assert that: - the persisted provider is restored for normal resume - explicit model, provider, or reasoning-effort overrides still prevent persisted resume metadata from being applied - a thread with no persisted model or reasoning effort still resumes with its persisted provider ## Verification - `cargo test -p codex-app-server` passed the app-server unit tests, including the updated resume metadata coverage. The broader integration portion of that command failed in an unrelated environment-sensitive skills-budget warning assertion, where this run saw 8 omitted skills instead of the expected 7. - `just fix -p codex-app-server` completed successfully.	2026-04-25 12:40:00 -07:00
Michael Bolin	88f300d74d	fix: increase Bazel timeout to 45 minutes (#19578 ) Unfortunately, if most of the build graph is invalidated such that there are few cache hits, the Windows Bazel build for all the tests often takes more than `30` minutes, so this PR increases the timeout to `45` minutes until we set up distributed builds.	2026-04-25 10:03:01 -07:00
Ahmed Ibrahim	022f81df1f	[codex] Order codex-mcp items by visibility (#19526 ) ## Why The visibility cleanup in the base PR reduced what `codex-mcp` exposes, but several files still made reviewers read private support machinery before the public or crate-facing entry points. This ordering pass makes each file easier to scan: exported API first, crate-visible MCP internals next, then private helpers in breadth-first order from the higher-level MCP flows to leaf utilities. ## What Changed - Reordered `codex-mcp` exports so the runtime, configuration, snapshot, auth, and helper surfaces are grouped by visibility and reader importance. - Moved public and crate-visible MCP items ahead of private helpers in the auth, MCP planning/snapshot, connection manager, and tool-name modules. - Kept the change mechanical, with no behavior changes intended. ## Verification - `cargo check -p codex-mcp`	2026-04-25 07:17:30 -07:00
Ahmed Ibrahim	706490ab1b	[codex] Prune unused codex-mcp API and duplicate helpers (#19524 ) ## Why `codex-mcp` currently exposes more API than the rest of the workspace uses. Some of that surface is simply visibility that can be tightened, and some of it is public helper code that remains compiler-valid because it is exported even though no workspace caller uses it. That distinction matters: Rust does not warn on exported API just because the current workspace does not call it. This PR intentionally treats those exported-but-workspace-unreferenced paths as stale `codex-mcp` surface. The main example is MCP skill dependency collection, where the active implementation now lives in `codex-rs/core/src/mcp_skill_dependencies.rs`; keeping the older `codex-mcp` copy makes it unclear which implementation owns skill MCP installation. ## What Changed - Pruned unused `codex-mcp` re-exports from `codex-mcp/src/lib.rs`. - Removed non-runtime helper methods from `McpConnectionManager` so it stays focused on live MCP clients. - Made `ToolPluginProvenance` lookup methods crate-private. - Removed workspace-unreferenced snapshot wrapper APIs and qualified-tool grouping helpers. - Deleted the duplicate `codex-mcp` skill dependency module and tests now that skill MCP dependency handling is owned by `core`. ## Verification - `cargo check -p codex-mcp`	2026-04-25 06:36:07 -07:00
Matthew Zeng	6e838a19fa	Enable unavailable dummy tools by default (#19459 ) ## Summary - Mark `unavailable_dummy_tools` as a stable feature and enable it by default - Update the feature registry test to match the new default state ## Testing - `just fmt` - `cargo test -p codex-features`	2026-04-25 08:46:57 +00:00
Eric Traut	a2db6f97fb	Fix codex-rs README grammar (#19514 ) ## Why Issue #19418 points out a small grammar issue in `codex-rs/README.md` under "Code Organization." The current sentence says "we hope this to be," which reads awkwardly. Fixes #19418. ## What changed Updated the `core/` crate description so the sentence reads "we hope this becomes a library crate." ## Verification Documentation-only change. Reviewed the Markdown diff.	2026-04-24 23:31:47 -07:00
Dylan Hurd	f5497f4d65	Split approval matrix test groups (#19454 ) ## Why Recent `main` CI repeatedly timed out in: - `codex-core::all suite::approvals::approval_matrix_covers_all_modes` It failed in runs [24909500958](https://github.com/openai/codex/actions/runs/24909500958), [24908076251](https://github.com/openai/codex/actions/runs/24908076251), [24906197645](https://github.com/openai/codex/actions/runs/24906197645), [24905823212](https://github.com/openai/codex/actions/runs/24905823212), [24903439629](https://github.com/openai/codex/actions/runs/24903439629), [24903336028](https://github.com/openai/codex/actions/runs/24903336028), and [24898949647](https://github.com/openai/codex/actions/runs/24898949647). The failure pattern was a 60s Linux remote timeout. Logs showed many approval scenarios completing before the single matrix test timed out. ## Root Cause `approval_matrix_covers_all_modes` packed every approval/sandbox/tool scenario into one test case. That made the test vulnerable to normal CI variance: one slow scenario or a slow process startup could push the whole monolithic case past the 60s per-test timeout. It also hid which part of the matrix was slow because the runner only reported the one large matrix test. ## What Changed - Keep the shared `scenarios()` table as the single source of approval matrix coverage. - Use one `#[test_case]` per `ScenarioGroup` to generate five async Tokio tests: danger/full-access, read-only, workspace-write, apply-patch, and unified-exec. - Keep the group runner small and add per-scenario error context so a failure still reports the specific scenario name. ## Why This Should Be Reliable Each scenario group now has its own test harness timeout instead of sharing one timeout window with the full matrix. That removes the long sequential loop from a single test while keeping the implementation compact and easy to scan. The tests still run through the same scenario definitions and runner, so this preserves coverage. `test-case` already composes with `#[tokio::test]` in this crate and is already available for test code. ## Verification - `cargo test -p codex-core --test all approval_matrix_ -- --list` - `cargo test -p codex-core --test all approval_matrix_`	2026-04-24 21:38:27 -07:00
Eric Traut	f1c963d77e	Add goal TUI UX (5 / 5) (#18077 ) Adds the TUI user experience for goals on top of the core runtime from PR 4. ## Why Users need a direct TUI control surface for long-running goals. The UI should make the current goal visible, support common goal actions without waiting for a model turn, and avoid confusing end-of-turn notifications while an active goal is immediately continuing. ## What changed - Added `/goal` summary rendering for the current goal, including active, paused, budget-limited, and complete states. - Added `/goal <objective>` creation/replacement through the app-server goal API rather than a model prompt. - Added `/goal clear`, `/goal pause`, and `/goal unpause` command variants. - Added a confirmation menu when the user enters a new goal while another goal already exists. - Updated `/goal` help and summary tip text so it reflects the supported command variants without advertising slash-command token budgets. - Added footer/statusline goal indicators, including elapsed time and token budget display when a budget exists from API/tool-created goals. - Consumes goal updated/cleared notifications so the TUI stays in sync with external app-server changes. - Suppresses end-of-turn desktop notifications only when a goal is still active and follow-up work is expected. - Preserves slash-command history behavior and avoids leaking queued `/goal` state into unrelated submissions. ## Verification - Added TUI unit and snapshot coverage for goal command availability, summary rendering, control commands, replacement menu behavior, status/footer display, notification handling, and command history.	2026-04-24 21:16:45 -07:00

1 2 3 4 5 ...

5854 Commits