codex

mirror of https://github.com/openai/codex.git synced 2026-04-30 17:36:40 +00:00

Author	SHA1	Message	Date
Michael Bolin	b23789b770	[codex] import token_data from codex-login directly (#15903 ) ## Why `token_data` is owned by `codex-login`, but `codex-core` was still re-exporting it. That let callers pull auth token types through `codex-core`, which keeps otherwise unrelated crates coupled to `codex-core` and makes `codex-core` more of a build-graph bottleneck. ## What changed - remove the `codex-core` re-export of `codex_login::token_data` - update the remaining `codex-core` internals that used `crate::token_data` to import `codex_login::token_data` directly - update downstream callers in `codex-rs/chatgpt`, `codex-rs/tui_app_server`, `codex-rs/app-server/tests/common`, and `codex-rs/core/tests` to import `codex_login::token_data` directly - add explicit `codex-login` workspace dependencies and refresh lock metadata for crates that now depend on it directly ## Validation - `cargo test -p codex-chatgpt --locked` - `just argument-comment-lint` - `just bazel-lock-update` - `just bazel-lock-check` ## Notes - attempted `cargo test -p codex-core --locked` and `cargo test -p codex-core auth_refresh --locked`, but both ran out of disk while linking `codex-core` test binaries in the local environment	2026-03-26 13:34:02 -07:00
rreichel3-oai	86764af684	Protect first-time project .codex creation across Linux and macOS sandboxes (#15067 ) ## Problem Codex already treated an existing top-level project `./.codex` directory as protected, but there was a gap on first creation. If `./.codex` did not exist yet, a turn could create files under it, such as `./.codex/config.toml`, without going through the same approval path as later modifications. That meant the initial write could bypass the intended protection for project-local Codex state. ## What this changes This PR closes that first-creation gap in the Unix enforcement layers: - `codex-protocol` - treat the top-level project `./.codex` path as a protected carveout even when it does not exist yet - avoid injecting the default carveout when the user already has an explicit rule for that exact path - macOS Seatbelt - deny writes to both the exact protected path and anything beneath it, so creating `./.codex` itself is blocked in addition to writes inside it - Linux bubblewrap - preserve the same protected-path behavior for first-time creation under `./.codex` - tests - add protocol regressions for missing `./.codex` and explicit-rule collisions - add Unix sandbox coverage for blocking first-time `./.codex` creation - tighten Seatbelt policy assertions around excluded subpaths ## Scope This change is intentionally scoped to protecting the top-level project `.codex` subtree from agent writes. It does not make `.codex` unreadable, and it does not change the product behavior around loading project skills from `.codex` when project config is untrusted. 
## Why this shape The fix is pointed rather than broad: - it preserves the current model of “project `.codex` is protected from writes” - it closes the security-relevant first-write hole - it avoids folding a larger permissions-model redesign into this PR ## Validation - `cargo test -p codex-protocol` - `cargo test -p codex-sandboxing seatbelt` - `cargo test -p codex-exec --test all sandbox_blocks_first_time_dot_codex_creation -- --nocapture` --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-03-26 16:06:53 -04:00
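The protected-path rule described in the commit above can be sketched as a purely lexical check. This is a minimal illustration, not the code from `codex-protocol` or the sandbox layers: the function name `is_protected_write` is hypothetical, and the real enforcement happens in Seatbelt and bubblewrap policy rather than in a helper like this. The point it shows is that the check does not depend on `./.codex` existing yet.

```rust
use std::path::Path;

/// Hypothetical helper: returns true when `target` is the top-level project
/// `.codex` directory itself or anything beneath it.
/// `Path::starts_with` compares whole components, so `.codexrc` does not match,
/// and nothing here touches the filesystem, so a missing `.codex` is still covered.
fn is_protected_write(project_root: &Path, target: &Path) -> bool {
    target.starts_with(project_root.join(".codex"))
}
```

Under these assumptions, `is_protected_write(Path::new("/repo"), Path::new("/repo/.codex/config.toml"))` is true on the very first write, while a nested `/repo/src/.codex/x` is not treated as the top-level carveout.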
Ruslan Nigmatullin	9736fa5e3d	app-server: Split transport module (#15811 ) `transport.rs` is getting pretty big, split individual transport implementations into separate files.	2026-03-26 13:01:35 -07:00
Michael Bolin	b3e069e8cb	skills: remove unused skill permission metadata (#15900 ) ## Why Skill metadata accepted a `permissions` block and stored the result on `SkillMetadata`, but that data was never consumed by runtime behavior. Leaving the dead parsing path in place makes it look like skills can widen or otherwise influence execution permissions when, in practice, declared skill permissions are ignored. This change removes that misleading surface area so the skill metadata model matches what the system actually uses. ## What changed - removed `permission_profile` and `managed_network_override` from `core-skills::SkillMetadata` - stopped parsing `permissions` from skill metadata in `core-skills/src/loader.rs` - deleted the loader tests that only exercised the removed permissions parsing path - cleaned up dependent `SkillMetadata` constructors in tests and TUI code that were only carrying `None` for those fields ## Testing - `cargo test -p codex-core-skills` - `cargo test -p codex-tui submission_prefers_selected_duplicate_skill_path` - `just argument-comment-lint`	2026-03-26 19:33:23 +00:00
viyatb-oai	b6050b42ae	fix: resolve bwrap from trusted PATH entry (#15791 ) ## Summary - resolve system bwrap from PATH instead of hardcoding /usr/bin/bwrap - skip PATH entries that resolve inside the current workspace before launching the sandbox helper - keep the vendored bubblewrap fallback when no trusted system bwrap is found ## Validation - cargo test -p codex-core bwrap --lib - cargo test -p codex-linux-sandbox - just fix -p codex-core - just fix -p codex-linux-sandbox - just fmt - just argument-comment-lint - cargo clean	2026-03-26 12:13:51 -07:00
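A rough sketch of the PATH lookup described in the commit above, under stated assumptions: the name `find_trusted_bwrap` and the injected `exists` callback are hypothetical, the workspace check here is lexical (the commit speaks of entries that resolve inside the workspace, which suggests the real code canonicalizes paths), and skipping relative PATH entries is an extra assumption of this sketch.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the first `bwrap` candidate on `path_var`
/// whose directory is absolute and not inside `workspace`.
/// `exists` is injected so the sketch can be exercised without a real filesystem.
/// A `None` result is where a caller would fall back to the vendored bubblewrap.
fn find_trusted_bwrap(
    path_var: &str,
    workspace: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    env::split_paths(path_var)
        // Assumption of this sketch: relative entries are never trusted.
        .filter(|dir| dir.is_absolute())
        // Skip entries that live inside the current workspace.
        .filter(|dir| !dir.starts_with(workspace))
        .map(|dir| dir.join("bwrap"))
        .find(|candidate| exists(candidate.as_path()))
}
```

With `PATH=/home/u/proj/bin:/usr/bin` and a workspace of `/home/u/proj`, the first entry is skipped even if it contains a `bwrap` binary, so a repository cannot shadow the system helper.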
Matthew Zeng	3360f128f4	[plugins] Polish tool suggest prompts. (#15891 ) - [x] Polish tool suggest prompts to distinguish between missing connectors and discoverable plugins, and be very precise about the triggering conditions.	2026-03-26 18:52:59 +00:00
Matthew Zeng	25134b592c	[mcp] Fix legacy_tools (#15885 ) - [x] Fix legacy_tools	2026-03-26 11:08:49 -07:00
Felipe Coury	2c54d4b160	feat(tui): add terminal title support to tui app server (#15860 ) ## TL;DR Replicates the `/title` command from `tui` to `tui_app_server`. ## Problem The classic `tui` crate supports customizing the terminal window/tab title via `/title`, but the `tui_app_server` crate does not. Users on the app-server path have no way to configure what their terminal title shows (project name, status, spinner, thread, etc.), making it harder to identify Codex sessions across tabs or windows. ## Mental model The terminal title is a status surface -- conceptually parallel to the footer status line. Both surfaces are configurable lists of items, both share expensive inputs (git branch lookup, project root discovery), and both must be refreshed at the same lifecycle points. This change ports the classic `tui`'s design verbatim: 1. `terminal_title.rs` owns the low-level OSC write path and input sanitization. It strips control characters and bidi/invisible codepoints before placing untrusted text (model output, thread names, project paths) inside an escape sequence. 2. `title_setup.rs` defines `TerminalTitleItem` (the 8 configurable items) and `TerminalTitleSetupView` (the interactive picker that wraps `MultiSelectPicker`). 3. `status_surfaces.rs` is the shared refresh pipeline. It parses both surface configs once per refresh, warns about invalid items once per session, synchronizes the git-branch cache, then renders each surface from the same `StatusSurfaceSelections` snapshot. 4. `chatwidget.rs` sets `TerminalTitleStatusKind` at each state transition (Working, Thinking, Undoing, WaitingForBackgroundTerminal) and calls `refresh_terminal_title()` whenever relevant state changes. 5. `app.rs` handles the three setup events (confirm/preview/cancel), persists config via `ConfigEditsBuilder`, and clears the managed title on `Drop`. ## Non-goals - Restoring the previous terminal title on exit. 
There is no portable way to read the terminal's current title, so `Drop` clears the managed title rather than restoring it. - Sharing code between `tui` and `tui_app_server`. The implementation is a parallel copy, matching the existing pattern for the status-line feature. Extracting a shared crate is future work. ## Tradeoffs - Duplicate code across crates. The three core files (`terminal_title.rs`, `title_setup.rs`, `status_surfaces.rs`) are byte-for-byte copies from the classic `tui`. This was chosen for consistency with the existing status-line port and to avoid coupling the two crates at the dependency level. Future changes must be applied in both places. - `status_surfaces.rs` is large (~660 lines). It absorbs logic that previously lived inline in `chatwidget.rs` (status-line refresh, git branch management, project root discovery) plus all new terminal-title logic. This consolidation trades file size for a single place where both surfaces are coordinated. - Spinner scheduling on every refresh. The terminal title spinner (when active) schedules a frame every 100ms. This is the same pattern the status-indicator spinner already uses; the overhead is a timer registration, not a redraw. ## Architecture ``` /title command -> SlashCommand::Title -> open_terminal_title_setup() -> TerminalTitleSetupView (MultiSelectPicker) -> on_change: AppEvent::TerminalTitleSetupPreview -> preview_terminal_title() -> on_confirm: AppEvent::TerminalTitleSetup -> ConfigEditsBuilder + setup_terminal_title() -> on_cancel: AppEvent::TerminalTitleSetupCancelled -> cancel_terminal_title_setup() Runtime title refresh: state change (turn start, reasoning, undo, plan update, thread rename, ...) 
-> set terminal_title_status_kind -> refresh_terminal_title() -> status_surface_selections() (parse configs, collect invalids) -> refresh_terminal_title_from_selections() -> terminal_title_value_for_item() for each configured item -> assemble title string with separators -> skip if identical to last_terminal_title (dedup OSC writes) -> set_terminal_title() (sanitize + OSC 0 write) -> schedule spinner frame if animating Widget replacement: replace_chat_widget_with_app_server_thread() -> transfer last_terminal_title from old widget to new -> avoids redundant OSC clear+rewrite on session switch ``` ## Observability - Invalid terminal-title item IDs in config emit a one-per-session warning via `on_warning()` (gated by `terminal_title_invalid_items_warned` `AtomicBool`). - OSC write failures are logged at `tracing::debug` level. - Config persistence failures are logged at `tracing::error` and surfaced to the user via `add_error_message()`. ## Tests - `terminal_title.rs`: 4 unit tests covering sanitization (control chars, bidi codepoints, truncation) and OSC output format. - `title_setup.rs`: 3 tests covering setup view snapshot rendering, parse order preservation, and invalid-ID rejection. - `chatwidget/tests.rs`: Updated test helpers with new fields; existing tests continue to pass. --------- Co-authored-by: Eric Traut <etraut@openai.com>	2026-03-26 11:59:12 -06:00
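The sanitize-then-write step that the commit above attributes to `terminal_title.rs` might look roughly like the following. This is a sketch only: the function names and the exact set of bidi/invisible codepoints are assumptions, and the truncation that the commit's tests mention is left out.

```rust
/// Assumed set of bidi controls and invisible characters to strip;
/// the real list in `terminal_title.rs` may differ.
fn is_bidi_or_invisible(c: char) -> bool {
    matches!(
        c,
        '\u{200B}'..='\u{200F}'   // zero-width characters and LRM/RLM
            | '\u{202A}'..='\u{202E}' // bidi embeddings and overrides
            | '\u{2066}'..='\u{2069}' // bidi isolates
            | '\u{FEFF}'              // zero-width no-break space / BOM
    )
}

/// Drops control characters (including ESC and BEL) and bidi/invisible codepoints
/// so untrusted text cannot terminate or extend the escape sequence.
fn sanitize_title(raw: &str) -> String {
    raw.chars()
        .filter(|c| !c.is_control() && !is_bidi_or_invisible(*c))
        .collect()
}

/// Builds an OSC 0 sequence (`ESC ] 0 ; <title> BEL`) around the sanitized title.
fn osc_title_sequence(raw: &str) -> String {
    format!("\x1b]0;{}\x07", sanitize_title(raw))
}
```

Because sanitization runs before the title is embedded, a thread name that itself contains `ESC ] 0 ;` ends up as inert text inside a single well-formed sequence.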
jif-oai	970386e8b2	fix: root as std agent (#15881 )	2026-03-26 18:57:34 +01:00
evawong-oai	0bd34c28c7	Add wildcard in the middle test coverage (#15813 ) ## Summary Add a focused codex network proxy unit test for the denylist pattern with a wildcard in the middle, `region.some.malicious.tunnel.com`. This does not change how existing code works; it only ensures that the behavior stays the same and adds a CI guard for the existing behavior. ## Why The managed Codex denylist update relies on this mid-label glob form, and the existing tests only covered exact hosts, `.` subdomains, and `**.` apex plus subdomains. ## Validation `cargo test -p codex-network-proxy compile_globset_supports_mid_label_wildcards` `cargo test -p codex-network-proxy` `./tools/argument-comment-lint/run-prebuilt-linter.sh -p codex-network-proxy`	2026-03-26 17:53:31 +00:00
Adrian	af04273778	[codex] Block unsafe git global options from safe allowlist (#15796 ) ## Summary - block git global options that can redirect config, repository, or helper lookup from being auto-approved as safe - share the unsafe global-option predicate across the Unix and Windows git safety checks - add regression coverage for inline and split forms, including `bash -lc` and PowerShell wrappers ## Root cause The Unix safe-command gate only rejected `-c` and `--config-env`, even though the shared git parser already knew how to skip additional pre-subcommand globals such as `--git-dir`, `--work-tree`, `--exec-path`, `--namespace`, and `--super-prefix`. That let those arguments slip through safe-command classification on otherwise read-only git invocations and bypass approval. The Windows-specific safe-command path had the same trust-boundary gap for git global options.	2026-03-26 10:46:04 -07:00
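A shared predicate of the kind the commit above describes could be sketched as follows. The option list comes from the commit message; the function name and the matching rules (exact name, or name followed by `=`, plus any `-c...` form) are assumptions of this sketch, and it deliberately over-approximates rather than mirroring git's own parser.

```rust
/// Long-form git global options named in the commit above as unsafe to auto-approve.
const UNSAFE_LONG_OPTIONS: [&str; 6] = [
    "--config-env",
    "--git-dir",
    "--work-tree",
    "--exec-path",
    "--namespace",
    "--super-prefix",
];

/// Hypothetical predicate for a single pre-subcommand argument.
/// Covers the split form (`--git-dir /x`, where only the flag itself is seen here)
/// and the inline form (`--git-dir=/x`, `-ckey=value`).
fn is_unsafe_git_global_option(arg: &str) -> bool {
    // `-c` and its inline form `-c<key>=<value>`; the capital `-C` is not matched.
    if arg.starts_with("-c") {
        return true;
    }
    UNSAFE_LONG_OPTIONS.iter().any(|&name| {
        arg.strip_prefix(name)
            .map_or(false, |rest| rest.is_empty() || rest.starts_with('='))
    })
}
```

Keeping this in one function is what lets the Unix and Windows safe-command checks agree: both would reject an invocation as "not auto-approvable" as soon as any argument before the subcommand matches.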
Michael Bolin	e36ebaa3da	fix: box apply_patch test harness futures (#15835 ) ## Why `#[large_stack_test]` made the `apply_patch_cli` tests pass by giving them more stack, but it did not address why those tests needed the extra stack in the first place. The real problem is the async state built by the `apply_patch_cli` harness path. Those tests await three helper boundaries directly: harness construction, turn submission, and apply-patch output collection. If those helpers inline their full child futures, the test future grows to include the whole harness startup and request/response path. This change replaces the workaround from #12768 with the same basic approach used in #13429, but keeps the fix narrower: only the helper boundaries awaited directly by `apply_patch_cli` stay boxed. ## What Changed - removed `#[large_stack_test]` from `core/tests/suite/apply_patch_cli.rs` - restored ordinary `#[tokio::test(flavor = "multi_thread", worker_threads = 2)]` annotations in that suite - deleted the now-unused `codex-test-macros` crate and removed its workspace wiring - boxed only the three helper boundaries that the suite awaits directly: - `apply_patch_harness_with(...)` - `TestCodexHarness::submit(...)` - `TestCodexHarness::apply_patch_output(...)` - added comments at those boxed boundaries explaining why they remain boxed ## Testing - `cargo test -p codex-core --test all suite::apply_patch_cli -- --nocapture` ## References - #12768 - #13429	2026-03-26 17:32:04 +00:00
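The effect the commit above relies on can be seen without any async runtime by comparing future sizes. This is an illustrative sketch with made-up helper names, not the `apply_patch_cli` harness: awaiting a child future inline embeds its whole state in the parent, while awaiting a boxed child stores only a pointer.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;

/// A helper whose state is large because `buf` is held across an await point.
async fn big_helper(seed: u8) -> u8 {
    let buf = [seed; 4096];
    std::future::ready(()).await;
    buf[usize::from(seed)]
}

/// Awaits the helper inline: the caller's future contains the helper's state.
async fn inline_caller(seed: u8) -> u8 {
    big_helper(seed).await
}

/// Boxed boundary: the helper's state lives on the heap.
fn boxed_helper(seed: u8) -> Pin<Box<dyn Future<Output = u8>>> {
    Box::pin(big_helper(seed))
}

/// Awaits the boxed helper: the caller's future only holds a pointer to it.
async fn boxed_caller(seed: u8) -> u8 {
    boxed_helper(seed).await
}

/// Returns (inline size, boxed size) in bytes without polling either future.
fn future_sizes(seed: u8) -> (usize, usize) {
    (
        size_of_val(&inline_caller(seed)),
        size_of_val(&boxed_caller(seed)),
    )
}
```

In a test that awaits several such helpers directly, the inline sizes add up inside the test future, which is why boxing only those boundaries can remove the need for a larger stack.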
Eric Traut	e7139e14a2	Enable `tui_app_server` feature by default (#15661 )	2026-03-26 11:28:25 -06:00
nicholasclark-openai	8d479f741c	Add MCP connector metrics (#15805 ) ## Summary - enrich `codex.mcp.call` with `tool`, `connector_id`, and sanitized `connector_name` for actual MCP executions - record `codex.mcp.call.duration_ms` for actual MCP executions so connector-level latency is visible in metrics - keep skipped, blocked, declined, and cancelled paths on the plain status-only `codex.mcp.call` counter ## Included Changes - `codex-rs/core/src/mcp_tool_call.rs`: add connector-sliced MCP count and duration metrics only for executed tool calls, while leaving non-executed outcomes as status-only counts - `codex-rs/core/src/mcp_tool_call_tests.rs`: cover metric tag shaping, connector-name sanitization, and the new duration metric tags ## Testing - `cargo test -p codex-core` - `just fix -p codex-core` - `just fmt` ## Notes - `cargo test -p codex-core` still hits existing unrelated failures in approvals-reviewer config tests and the sandboxed JS REPL `mktemp` test - full workspace `cargo test` was not run --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-26 17:08:02 +00:00
Eric Traut	0d44bd708e	Fix duplicate /review messages in app-server TUI (#15839 ) ## Symptoms When `/review` ran through `tui_app_server`, the TUI could show duplicate review content: - the `>> Code review started: ... <<` banner appeared twice - the final review body could also appear twice ## Problem `tui_app_server` was treating review lifecycle items as renderable content on more than one delivery path. Specifically: - `EnteredReviewMode` was rendered both when the item started and again when it completed - `ExitedReviewMode` rendered the review text itself, even though the same review text was also delivered later as the assistant message item That meant the same logical review event was committed into history multiple times. ## Solution Make review lifecycle items control state transitions only once, and keep the final review body sourced from the assistant message item: - render the review-start banner from the live `ItemStarted` path, while still allowing replay to restore it once - treat `ExitedReviewMode` as a mode-exit/finish-banner event instead of rendering the review body from it - preserve the existing assistant-message rendering path as the single source of final review text	2026-03-26 10:55:18 -06:00
jif-oai	352f37db03	fix: max depth agent still has v2 tools (#15880 )	2026-03-26 17:36:12 +01:00
Matthew Zeng	c9214192c5	[plugins] Update the suggestable plugins list. (#15829 ) - [x] Update the suggestable plugins list to be featured plugins.	2026-03-26 15:53:22 +00:00
jif-oai	6d2f4aaafc	feat: use `ProcessId` in `exec-server` (#15866 ) Use a full struct for `ProcessId` to improve readability and make it easier to evolve later if needed	2026-03-26 16:45:36 +01:00
jif-oai	26c66f3ee1	fix: flaky (#15869 )	2026-03-26 16:07:32 +01:00
Michael Bolin	01fa4f0212	core: remove special execve handling for skill scripts (#15812 )	2026-03-26 07:46:04 -07:00
jif-oai	6dcac41d53	chore: drop artifacts lib (#15864 )	2026-03-26 15:28:59 +01:00
jif-oai	7dac332c93	feat: exec-server prep for unified exec (#15691 ) This PR partially rebases `unified_exec` on the `exec-server` and adapts the `exec-server` accordingly. ## What changed in `exec-server` 1. Replaced the old "broadcast-driven; process-global" event model with process-scoped session events. The goal is to be able to have a dedicated handler for each process. 2. Extended the protocol contract to support explicit lifecycle status and stream ordering: - `WriteResponse` now returns `WriteStatus` (Accepted, UnknownProcess, StdinClosed, Starting) instead of a bool. - Added seq fields to output/exited notifications. - Added terminal process/closed notification. 3. Demultiplexed remote notifications into per-process channels, same as for the event system. 4. Local and remote backends now both implement ExecBackend. 5. Local backend wraps internal process ID/operations into per-process ExecProcess objects. 6. Remote backend registers a session channel before launch and unregisters on failed launch. ## What changed in `unified_exec` 1. Added unified process-state model and backend-neutral process wrapper. This will probably disappear in the future, but it makes it easier to keep the work flowing on both sides. - `UnifiedExecProcess` now handles both local PTY sessions and remote exec-server processes through a shared `ProcessHandle`. - Added `ProcessState` to track has_exited, exit_code, and terminal failure message consistently across backends. 2. Routed write and lifecycle handling through process-level methods. ## Some rationale 1. The change centralizes execution transport in exec-server while preserving policy and orchestration ownership in core, avoiding duplicated launch approval logic. This comes from internal discussion. 2. Session-scoped events remove coupling/cross-talk between processes and make stream ordering and terminal state explicit (seq, closed, failed). 3. 
The failure-path surfacing (remote launch failures, write failures, transport disconnects) makes command tool output and cleanup behavior deterministic ## Follow-ups: * Unify the concept of thread ID behind an obfuscated struct * FD handling * Full zsh-fork compatibility * Full network sandboxing compatibility * Handle ws disconnection	2026-03-26 15:22:34 +01:00
jif-oai	4a5635b5a0	feat: clean spawn v1 (#15861 ) Avoid the usage of path in the v1 spawn	2026-03-26 15:01:00 +01:00
jif-oai	b00a05c785	feat: drop artifact tool and feature (#15851 )	2026-03-26 13:21:24 +01:00
jif-oai	7ef3cfe63e	feat: replace askama by custom lib (#15784 ) Finalise the drop of `askama` to use our internal lib instead	2026-03-26 10:33:25 +01:00
viyatb-oai	937cb5081d	fix: fix old system bubblewrap compatibility without falling back to vendored bwrap (#15693 ) Fixes #15283. ## Summary Older system bubblewrap builds reject `--argv0`, which makes our Linux sandbox fail before the helper can re-exec. This PR keeps using system `/usr/bin/bwrap` whenever it exists and only falls back to vendored bwrap when the system binary is missing. That matters on stricter AppArmor hosts, where the distro bwrap package also provides the policy setup needed for user namespaces. For old system bwrap, we avoid `--argv0` instead of switching binaries: - pass the sandbox helper a full-path `argv0`, - keep the existing `current_exe() + --argv0` path when the selected launcher supports it, - otherwise omit `--argv0` and re-exec through the helper's own `argv[0]` path, whose basename still dispatches as `codex-linux-sandbox`. Also updates the launcher/warning tests and docs so they match the new behavior: present-but-old system bwrap uses the compatibility path, and only absent system bwrap falls back to vendored. ### Validation 1. Install Ubuntu 20.04 in a VM 2. Compile codex and run without bubblewrap installed - see a warning about falling back to the vendored bwrap 3. Install bwrap and verify version is 0.4.0 without `argv0` support 4. 
run codex and use apply_patch tool without errors <img width="802" height="631" alt="Screenshot 2026-03-25 at 11 48 36 PM" src="https://github.com/user-attachments/assets/77248a29-aa38-4d7c-9833-496ec6a458b8" /> <img width="807" height="634" alt="Screenshot 2026-03-25 at 11 47 32 PM" src="https://github.com/user-attachments/assets/5af8b850-a466-489b-95a6-455b76b5050f" /> <img width="812" height="635" alt="Screenshot 2026-03-25 at 11 45 45 PM" src="https://github.com/user-attachments/assets/438074f0-8435-4274-a667-332efdd5cb57" /> <img width="801" height="623" alt="Screenshot 2026-03-25 at 11 43 56 PM" src="https://github.com/user-attachments/assets/0dc8d3f5-e8cf-4218-b4b4-a4f7d9bf02e3" /> --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-03-25 23:51:39 -07:00
Tiffany Citra	6d0525ae70	Expand home-relative paths on Windows (#15817 ) Follow up to: https://github.com/openai/codex/pull/9193, also support this for Windows. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-03-25 21:19:57 -07:00
Eric Traut	1ff39b6fa8	Wire remote app-server auth through the client (#14853 ) For app-server websocket auth, support the two server-side mechanisms from PR #14847: - `--ws-auth capability-token --ws-token-file /abs/path` - `--ws-auth signed-bearer-token --ws-shared-secret-file /abs/path` with optional `--ws-issuer`, `--ws-audience`, and `--ws-max-clock-skew-seconds` On the client side, add interactive remote support via: - `--remote ws://host:port` or `--remote wss://host:port` - `--remote-auth-token-env <ENV_VAR>` Codex reads the bearer token from the named environment variable and sends it as `Authorization: Bearer <token>` during the websocket handshake. Remote auth tokens are only allowed for `wss://` URLs or loopback `ws://` URLs. Testing: - tested both auth methods manually to confirm connection success and rejection for both auth types	2026-03-25 22:17:03 -06:00
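The rule in the commit above, that remote auth tokens are only allowed for `wss://` URLs or loopback `ws://` URLs, can be sketched with the standard library alone. Everything here is an assumption for illustration: the function names are hypothetical, the URL parsing is hand-rolled rather than whatever the client actually uses, and treating the hostname `localhost` as loopback is a choice of this sketch.

```rust
use std::net::IpAddr;

/// Extracts the host from a URL authority such as `user@host:port` or `[::1]:9000`.
fn host_of(authority: &str) -> &str {
    // Drop any userinfo before the last '@'.
    let authority = authority.rsplit('@').next().unwrap_or(authority);
    if let Some(rest) = authority.strip_prefix('[') {
        // Bracketed IPv6 literal.
        rest.split(']').next().unwrap_or("")
    } else {
        authority.split(':').next().unwrap_or("")
    }
}

/// Hypothetical check: a bearer token may be sent only over TLS (`wss://`)
/// or to a loopback address over plain `ws://`.
fn remote_auth_token_allowed(url: &str) -> bool {
    if url.starts_with("wss://") {
        return true;
    }
    let Some(rest) = url.strip_prefix("ws://") else {
        return false;
    };
    let authority = rest.split(['/', '?', '#']).next().unwrap_or("");
    let host = host_of(authority);
    host.eq_ignore_ascii_case("localhost")
        || host.parse::<IpAddr>().map_or(false, |ip| ip.is_loopback())
}
```

The intent of such a check is that the `Authorization: Bearer <token>` header never crosses a network in cleartext; a plain `ws://` connection to a non-loopback host would be refused before the handshake.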
Eric Traut	b565f05d79	Fix quoted command rendering in tui_app_server (#15825 ) When `tui_app_server` is enabled, shell commands in the transcript render as fully quoted invocations like `/bin/zsh -lc "..."`. The non-app-server TUI correctly shows the parsed command body. Root cause: The app-server stores `ThreadItem::CommandExecution.command` as a shell-quoted string. When `tui_app_server` bridges that item back into the exec renderer, it was passing `vec![command]` unchanged instead of splitting the string back into argv. That prevented `strip_bash_lc_and_escape()` from recognizing the shell wrapper, so the renderer displayed the wrapper literally. Solution: Add a shared command-string splitter that round-trips shell-quoted commands back into argv when it is safe to do so, while preserving non-roundtrippable inputs as a single string. Use that helper everywhere `tui_app_server` reconstructs exec commands from app-server payloads, including live command-execution items, replayed thread items, and exec approval requests. This restores the same command display behavior as the direct TUI path without breaking Windows-style commands that cannot be safely round-tripped.	2026-03-25 22:03:29 -06:00
Matthew Zeng	4b50446ffa	[plugins] Flip flags on. (#15820 ) - [x] Flip flags on.	2026-03-26 03:24:06 +00:00
Andrei Eternal	c4d9887f9a	[hooks] add non-streaming (non-stdin style) shell-only PostToolUse support (#15531 ) CHAINED PR - note that base is eternal/hooks-pretooluse-bash, not main -- so the following PR should be first Matching post-tool hook to the pre-tool functionality here: https://github.com/openai/codex/pull/15211 So, PreToolUse calls for plain shell calls, allows blocking. This PostToolUse call runs after the command executed example run: ``` › as a test, run in parallel the following commands: - echo 'one' - echo '[block-pre-tool-use]' - echo '[block-post-tool-use]' ⚠ MCP startup incomplete (failed: notion, linear) • Cruising through those three commands in parallel now, and I’ll share the exact outputs right after they land. • Running PreToolUse hook: checking the observatory runes • Running PreToolUse hook: checking the observatory runes • Running PreToolUse hook: checking the observatory runes PreToolUse hook (blocked) warning: wizard-tower PreToolUse demo blocked a Bash command on purpose. feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue. PreToolUse hook (completed) warning: wizard-tower PreToolUse demo inspected Bash: echo 'one' PreToolUse hook (completed) warning: wizard-tower PreToolUse demo inspected Bash: echo '[block-post-tool-use]' • Ran echo '[block-post-tool-use]' └ [block-post-tool-use] • Ran echo 'one' └ one • Running PostToolUse hook: reading the observatory aftermath • Running PostToolUse hook: reading the observatory aftermath PostToolUse hook (blocked) warning: wizard-tower PostToolUse demo blocked the Bash follow-through on purpose. feedback: PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue. PostToolUse hook (completed) warning: wizard-tower PostToolUse demo inspected Bash: echo 'one' hook context: PostToolUse demo saw the Bash result. 
Response preview: one ────────────────────────────────────────────────────────────────────────────────────────────────────── • Ran them in parallel. Little wave report: - echo 'one' completed and returned one. - echo '[block-pre-tool-use]' was blocked before execution by the PreToolUse hook. - echo '[block-post-tool-use]' ran, but its result was blocked afterward by the PostToolUse hook. The hook messages were: - PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue. - PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue. › for the blocked messages (correctly blocked in this test) did the tool output land back into context? • Yeah, from this run: - PreToolUse block: the command itself did not execute, so no command output landed in context. Only the block message did. - PostToolUse block: the command did execute, but the actual stdout did not come back into context. Only the post-tool block message did. So the blocked notifications landed in context, but the blocked command payloads themselves did not. Nice clean guardrail, aloha. ```	2026-03-25 19:18:03 -07:00
Matthew Zeng	78799c1bcf	[mcp] Improve custom MCP elicitation (#15800 ) - [x] Support don't ask again for custom MCP tool calls. - [x] Don't run arc in yolo mode. - [x] Run arc for custom MCP tools in always allow mode.	2026-03-26 01:02:37 +00:00
Ruslan Nigmatullin	d7e35e56cf	app-server: Organize app-server to allow more transports (#15810 ) Make `run_main_with_transport` slightly more flexible by consolidating logic spread across stdio and websocket transports.	2026-03-25 17:11:22 -07:00
canvrno-oai	2794e27849	Add ReloadUserConfig to tui_app_server (#15806 ) - Adds ReloadUserConfig to `tui_app_server`	2026-03-25 17:03:18 -07:00
pakrym-oai	8fa88fa8ca	Add cached environment manager for exec server URL (#15785 ) Add environment manager that is a singleton and is created early in app-server (before skill manager, before config loading). Use an environment variable to point to a running exec server.	2026-03-25 16:14:36 -07:00
canvrno-oai	f24c55f0d5	TUI plugin menu polish (#15802 ) - Add "OpenAI Curated" display name for `openai-curated` marketplace - Hide /apps menu - Change app install phase display text	2026-03-25 16:09:19 -07:00
arnavdugar-openai	eee692e351	Treat ChatGPT `hc` plan as Enterprise (#15789 )	2026-03-25 15:41:29 -07:00
nicholasclark-openai	b6524514c1	Add MCP tool call spans (#15659 ) ## Summary - add an explicit `mcp.tools.call` span around MCP tool execution in core - keep MCP span validation local to `mcp_tool_call_tests` instead of broadening the integration test suite - inline the turn/session correlation fields directly in the span initializer ## Included Changes - `codex-rs/core/src/mcp_tool_call.rs`: wrap the existing MCP tool call in `mcp.tools.call` and inline `conversation.id`, `session.id`, and `turn.id` in the span initializer - `codex-rs/core/src/mcp_tool_call_tests.rs`: assert the MCP span records the expected correlation and server fields ## Testing - `cargo test -p codex-core` - `just fmt` ## Notes - `cargo test -p codex-core` still hits existing unrelated failures in guardian-config tests and the sandboxed JS REPL `mktemp` test - metric work moved to stacked PR #15792 - transport-level RMCP spans and trace propagation remain in stacked PR #15792 - full workspace `cargo test` was not run --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-25 22:13:02 +00:00
Eric Traut	2c67a27a71	Avoid duplicate auth refreshes in `getAuthStatus` (#15798 ) I've seen several intermittent failures of `get_auth_status_returns_token_after_proactive_refresh_recovery` today. I investigated and found a couple of issues. First, `getAuthStatus(refreshToken=true)` could refresh twice in one request: once via `refresh_token_if_requested()` and again via the proactive refresh path inside `auth_manager.auth()`. In the permanent-failure case this produced an extra `/oauth/token` call and made the app-server auth tests flaky. Use `auth_cached()` after an explicit refresh request so the handler reuses the post-refresh auth state instead of immediately re-entering proactive refresh logic. Keep the existing proactive path for `refreshToken=false`. Second, auth refresh attempts in `AuthManager` were not serialized, which left a startup/request race. One proactive refresh could already be in flight while a `getAuthStatus(refreshToken=false)` request entered `auth().await`, causing a second `/oauth/token` call before the first failure or refresh result had been recorded. Guarding the refresh flow with a single async lock makes concurrent callers share one refresh result, which prevents duplicate refreshes and stabilizes the proactive-refresh auth tests.	2026-03-25 16:03:53 -06:00
Ahmed Ibrahim	9dbe098349	Extract codex-core-skills crate (#15749 ) ## Summary - move skill loading and management into codex-core-skills - leave codex-core with the thin integration layer and shared wiring ## Testing - CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-25 12:57:42 -07:00
Felipe Coury	e9996ec62a	fix(tui_app_server): preserve transcript events under backpressure (#15759 ) ## TL;DR When running codex with `-c features.tui_app_server=true` we see corruption when streaming large amounts of data. This PR marks other event types as _critical_ by making them _must-deliver_. ## Problem When the TUI consumer falls behind the app-server event stream, the bounded `mpsc` channel fills up and the forwarding layer drops events via `try_send`. Previously only `TurnCompleted` was marked as must-deliver. Streamed assistant text (`AgentMessageDelta`) and the authoritative final item (`ItemCompleted`) were treated as droppable — the same as ephemeral command output deltas. Because the TUI renders markdown incrementally from these deltas, dropping any of them produces permanently corrupted or incomplete paragraphs that persist for the rest of the session. ## Mental model The app-server event stream has two tiers of importance: 1. Lossless (transcript + terminal): Events that form the authoritative record of what the assistant said or that signal turn lifecycle transitions. Losing any of these corrupts the visible output or leaves surfaces waiting forever. These are: `AgentMessageDelta`, `PlanDelta`, `ReasoningSummaryTextDelta`, `ReasoningTextDelta`, `ItemCompleted`, and `TurnCompleted`. 2. Best-effort (everything else): Ephemeral status events like `CommandExecutionOutputDelta` and progress notifications. Dropping these under load causes cosmetic gaps but no permanent corruption. The forwarding layer uses `try_send` for best-effort events (dropping on backpressure) and blocking `send().await` for lossless events (applying back-pressure to the producer until the consumer catches up). ## Non-goals - Eliminating backpressure entirely. The bounded queue is intentional; this change only widens the set of events that survive it. - Changing the event protocol or adding new notification types. - Addressing root causes of consumer slowness (e.g. 
TUI render cost). ## Tradeoffs Blocking on transcript events means a slow consumer can now stall the producer for the duration of those events. This is acceptable because: (a) the alternative is permanently broken output, which is worse; (b) the consumer already had to keep up with `TurnCompleted` blocking sends; and (c) transcript events arrive at model-output speed, not burst speed, so sustained saturation is unlikely in practice. ## Architecture Two parallel changes, one per transport: - In-process path (`lib.rs`): The inline forwarding logic was extracted into `forward_in_process_event`, a standalone async function that encapsulates the lag-marker / must-deliver / try-send decision tree. The worker loop now delegates to it. A new `server_notification_requires_delivery` function (shared `pub(crate)`) centralizes the notification classification. - Remote path (`remote.rs`): The local `event_requires_delivery` now delegates to the same shared `server_notification_requires_delivery`, keeping both transports in sync. ## Observability No new metrics or log lines. The existing `warn!` on event drops continues to fire for best-effort events. Lossless events that block will not produce a log line (they simply wait). ## Tests - `event_requires_delivery_marks_transcript_and_terminal_events`: unit test confirming the expanded classification covers `AgentMessageDelta`, `ItemCompleted`, `TurnCompleted`, and excludes `CommandExecutionOutputDelta` and `Lagged`. - `forward_in_process_event_preserves_transcript_notifications_under_backpressure`: integration-style test that fills a capacity-1 channel, verifies a best-effort event is dropped (skipped count increments), then sends lossless transcript events and confirms they all arrive in order with the correct lag marker preceding them. 
- `remote_backpressure_preserves_transcript_notifications`: end-to-end test over a real websocket that verifies the remote transport preserves transcript events under the same backpressure scenario. - `event_requires_delivery_marks_transcript_and_disconnect_events` (remote): unit test confirming the remote-side classification covers transcript events and `Disconnected`. --------- Co-authored-by: Eric Traut <etraut@openai.com>	2026-03-25 13:50:39 -06:00
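The two-tier forwarding rule described in e9996ec62a can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: it uses the standard library's blocking `sync_channel` in place of the async bounded channel, and the `Event` enum, `requires_delivery`, and `forward_event` are hypothetical stand-ins. Sending the lag marker with a blocking send is also an assumption.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical, reduced set of the notification types named in the commit.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    AgentMessageDelta(String),
    ItemCompleted,
    TurnCompleted,
    CommandExecutionOutputDelta(String),
    /// Marker telling the consumer how many best-effort events were skipped.
    Lagged { skipped: usize },
}

/// Lossless tier: transcript and turn-lifecycle events.
pub fn requires_delivery(event: &Event) -> bool {
    matches!(
        event,
        Event::AgentMessageDelta(_) | Event::ItemCompleted | Event::TurnCompleted
    )
}

/// Forward one event. `skipped` counts best-effort events dropped so far.
/// Returns `false` once the consumer has gone away.
pub fn forward_event(tx: &SyncSender<Event>, event: Event, skipped: &mut usize) -> bool {
    if requires_delivery(&event) {
        // Report earlier drops first, then block until the queue has room.
        if *skipped > 0 {
            if tx.send(Event::Lagged { skipped: *skipped }).is_err() {
                return false;
            }
            *skipped = 0;
        }
        return tx.send(event).is_ok();
    }
    // Best-effort tier: never block, count the drop instead.
    match tx.try_send(event) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => {
            *skipped += 1;
            true
        }
        Err(TrySendError::Disconnected(_)) => false,
    }
}
```

With a capacity-1 channel that is already full, a second `CommandExecutionOutputDelta` is counted as skipped, while a following `AgentMessageDelta` blocks until the consumer drains the queue and arrives after a `Lagged` marker.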
viyatb-oai	6124564297	feat: add websocket auth for app-server (#14847 ) ## Summary This change adds websocket authentication at the app-server transport boundary and enforces it before JSON-RPC `initialize`, so authenticated deployments reject unauthenticated clients during the websocket handshake rather than after a connection has already been admitted. During rollout, websocket auth is opt-in for non-loopback listeners so we do not break existing remote clients. If `--ws-auth ...` is configured, the server enforces auth during websocket upgrade. If auth is not configured, non-loopback listeners still start, but app-server logs a warning and the startup banner calls out that auth should be configured before real remote use. The server supports two auth modes: a file-backed capability token, and a standard HMAC-signed JWT/JWS bearer token verified with the `jsonwebtoken` crate, with optional issuer, audience, and clock-skew validation. Capability tokens are normalized, hashed, and compared in constant time. Short shared secrets for signed bearer tokens are rejected at startup. Requests carrying an `Origin` header are rejected with `403` by transport middleware, and authenticated clients present credentials as `Authorization: Bearer <token>` during websocket upgrade. ## Validation - `cargo test -p codex-app-server transport::auth` - `cargo test -p codex-cli app_server_` - `cargo clippy -p codex-app-server --all-targets -- -D warnings` - `just bazel-lock-check` Note: in the broad `cargo test -p codex-app-server connection_handling_websocket` run, the touched websocket auth cases passed, but unrelated Unix shutdown tests failed with a timeout in this environment. --------- Co-authored-by: Eric Traut <etraut@openai.com>	2026-03-25 12:35:57 -07:00
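As a rough illustration of two steps mentioned in 6124564297, the sketch below extracts a bearer token from an `Authorization` header value and compares two byte strings without exiting at the first mismatch. Both function names are hypothetical, the hashing step is omitted because the standard library has no cryptographic hash, and the commit does not say how its comparison is implemented. Production code would normally rely on a vetted crate such as `subtle` rather than a hand-rolled loop, since a compiler is free to optimize this one.

```rust
/// Compare two byte strings without returning early at the first mismatch.
/// Length is not treated as secret here; in the flow described above both
/// inputs would be fixed-length digests.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences instead of branching on each byte.
        diff |= x ^ y;
    }
    diff == 0
}

/// Extract the token from an `Authorization: Bearer <token>` header value.
/// Simplification: the scheme name is matched case-sensitively.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    let token = header_value.strip_prefix("Bearer ")?.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}
```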
Matthew Zeng	91337399fe	[apps][tool_suggest] Remove tool_suggest's dependency on tool search. (#14856 ) - [x] Remove tool_suggest's dependency on tool search.	2026-03-25 12:26:02 -07:00
Felipe Coury	79359fb5e7	fix(tui_app_server): fix remote subagent switching and agent names (#15513 ) ## TL;DR This PR changes the `tui_app_server` _path_ in the following ways: - add missing feature to show agent names (shows only UUIDs today) - add `Cmd/Alt+Arrows` navigation between agent conversations ## Problem When the TUI connects to a remote app server, collab agent tool-call items (spawn, wait, delegate, etc.) render thread UUIDs instead of human-readable agent names because the `ChatWidget` never receives nickname/role metadata for receiver threads. Separately, keyboard next/previous agent navigation silently does nothing when the local `AgentNavigationState` cache has not yet been populated with subagent threads that the remote server already knows about. Both issues share a root cause: in the remote (app-server) code path the TUI never proactively fetches thread metadata. In the local code path this metadata arrives naturally via spawn events the TUI itself orchestrates, but in the remote path those events were processed by a different client and the TUI only sees the resulting collab tool-call notifications. ## Mental model Collab agent tool-call notifications reference receiver threads by id, but carry no nickname or role. The TUI needs that metadata in two places: 1. Rendering -- `ChatWidget` converts `CollabAgentToolCall` items into history cells. Without metadata, agent status lines show raw UUIDs. 2. Navigation -- `AgentNavigationState` tracks known threads for the `/agent` picker and keyboard cycling. Without entries for remote subagents, next/previous has nowhere to go. This change closes the gap with two complementary strategies: - Eager hydration: when any notification carries `receiver_thread_ids`, the TUI fetches metadata (`thread/read`) for threads it has not yet cached before the notification is rendered. 
- Backfill on thread switch: when the user resumes, forks, or starts a new app-server thread, the TUI fetches the full `thread/loaded/list`, walks the parent-child spawn tree, and registers every descendant subagent in both the navigation cache and the `ChatWidget` metadata map. A new `collab_agent_metadata` side-table in `ChatWidget` stores nickname/role keyed by `ThreadId`, kept in sync by `App` whenever it calls `upsert_agent_picker_thread`. The `replace_chat_widget` helper re-seeds this map from `AgentNavigationState` so that thread switches (which reconstruct the widget) do not lose previously discovered metadata. ## Non-goals - This change does not alter the local (non-app-server) collab code path. That path already receives metadata via spawn events and is unaffected. - No new protocol messages are introduced. The change uses existing `thread/read` and `thread/loaded/list` RPCs. - No changes to how `AgentNavigationState` orders or cycles through threads. The traversal logic is unchanged; only the population of entries is extended. ## Tradeoffs - Extra RPCs on notification path: `hydrate_collab_agent_metadata_for_notification` issues a `thread/read` for each unknown receiver thread before the notification is forwarded to rendering. This adds latency on the notification path but only fires once per thread (the result is cached). The alternative -- rendering first and backfilling names later -- would cause visible flicker as UUIDs are replaced with names. - Backfill fetches all loaded threads: `backfill_loaded_subagent_threads` fetches the full loaded-thread list and walks the spawn tree even when the user may only care about one subagent. This is simple and correct but O(loaded_threads) per thread switch. For typical session sizes this is negligible; it could become a concern for sessions with hundreds of subagents. 
- Metadata duplication: agent nickname/role is now stored in both `AgentNavigationState` (for picker/label) and `ChatWidget::collab_agent_metadata` (for rendering). The two are kept in sync through `upsert_agent_picker_thread` and `replace_chat_widget`, but there is no compile-time enforcement of this coupling. ## Architecture ### New module: `app::loaded_threads` Pure function `find_loaded_subagent_threads_for_primary` that takes a flat list of `Thread` objects and a primary thread id, then walks the `SessionSource::SubAgent` parent-child edges to collect all transitive descendants. Returns a sorted vec of `LoadedSubagentThread` (thread_id + nickname + role). No async, no side effects -- designed for unit testing. ### New methods on `App` \| Method \| Purpose \| \|--------\|---------\| \| `collab_receiver_thread_ids` \| Extracts `receiver_thread_ids` from `ItemStarted` / `ItemCompleted` collab notifications \| \| `hydrate_collab_agent_metadata_for_notification` \| Fetches and caches metadata for unknown receiver threads before a notification is rendered \| \| `backfill_loaded_subagent_threads` \| Bulk-fetches all loaded threads and registers descendants of the primary thread \| \| `adjacent_thread_id_with_backfill` \| Attempts navigation, falls back to backfill if the cache has no adjacent entry \| \| `replace_chat_widget` \| Replaces the widget and re-seeds its metadata map from `AgentNavigationState` \| ### New state in `ChatWidget` `collab_agent_metadata: HashMap<ThreadId, CollabAgentMetadata>` -- a lookup table that rendering functions consult to attach human-readable names to collab tool-call items. Populated externally by `App` via `set_collab_agent_metadata`. ### New method on `AppServerSession` `thread_loaded_list` -- thin wrapper around `ClientRequest::ThreadLoadedList`. ## Observability - `tracing::warn` on invalid thread ids during hydration and backfill. - `tracing::warn` on failed `thread/read` or `thread/loaded/list` RPCs (with thread id and error). 
- No new metrics or feature flags. ## Tests - `loaded_threads::tests::finds_loaded_subagent_tree_for_primary_thread` -- unit test for the spawn-tree walk: verifies child and grandchild are included, unrelated threads are excluded, and metadata is carried through. - `app::tests::replace_chat_widget_reseeds_collab_agent_metadata_for_replay` -- integration test that creates a `ChatWidget`, replaces it via `replace_chat_widget`, replays a collab wait notification, and asserts the rendered history cell contains the agent name rather than a UUID. - Updated snapshot `app_server_collab_wait_items_render_history` -- the existing collab wait rendering test now sets metadata before sending notifications, so the snapshot shows `Robie [explorer]` / `Ada [reviewer]` instead of raw thread ids. --------- Co-authored-by: Eric Traut <etraut@openai.com>	2026-03-25 12:50:42 -06:00
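The spawn-tree walk that 79359fb5e7 attributes to `find_loaded_subagent_threads_for_primary` might look something like the sketch below. The `Thread` struct is a simplified, hypothetical stand-in (a plain `parent` field replaces the `SessionSource::SubAgent` edge), the function name is deliberately different from the real one, and sorting by thread id is an assumption; the commit only says the result is sorted.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified thread record; `parent` is `None` for a primary thread.
#[derive(Debug, Clone)]
pub struct Thread {
    pub id: String,
    pub parent: Option<String>,
    pub nickname: Option<String>,
    pub role: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct LoadedSubagentThread {
    pub thread_id: String,
    pub nickname: Option<String>,
    pub role: Option<String>,
}

/// Collect every transitive descendant of `primary` from a flat thread list.
pub fn find_subagent_descendants(threads: &[Thread], primary: &str) -> Vec<LoadedSubagentThread> {
    // Index children by parent id so the walk is linear in the number of threads.
    let mut children: HashMap<&str, Vec<&Thread>> = HashMap::new();
    for t in threads {
        if let Some(parent) = t.parent.as_deref() {
            children.entry(parent).or_default().push(t);
        }
    }

    // Depth-first walk; `seen` guards against cycles in malformed input.
    let mut seen: HashSet<&str> = HashSet::new();
    seen.insert(primary);
    let mut stack = vec![primary];
    let mut found = Vec::new();
    while let Some(id) = stack.pop() {
        for child in children.get(id).into_iter().flatten() {
            if seen.insert(child.id.as_str()) {
                stack.push(child.id.as_str());
                found.push(LoadedSubagentThread {
                    thread_id: child.id.clone(),
                    nickname: child.nickname.clone(),
                    role: child.role.clone(),
                });
            }
        }
    }
    found.sort(); // derived ordering compares thread_id first
    found
}
```

Keeping the walk a pure function over a slice, as the commit describes, means it can be exercised with hand-built thread lists and no RPC layer.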
evawong-oai	6566ab7e02	Clarify codex_home base for MDM path resolution (#15707 ) ## Summary Add the follow up code comment Michael asked for at the MDM `managed_config_from_mdm` - a follow up from https://github.com/openai/codex/pull/15351. ## Validation 1. `cargo fmt --all --check` 2. `cargo test -p codex-core managed_preferences_expand_home_directory_in_workspace_write_roots -- --nocapture` 3. `cargo test -p codex-core write_value_succeeds_when_managed_preferences_expand_home_directory_paths -- --nocapture` 4. `./tools/argument-comment-lint/run-prebuilt-linter.sh -p codex-core`	2026-03-25 18:40:43 +00:00
Ahmed Ibrahim	d273efc0f3	Extract codex-analytics crate (#15748 ) ## Summary - move the analytics events client into codex-analytics - update codex-core and app-server callsites to use the new crate ## Testing - CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-25 11:08:05 -07:00
Ahmed Ibrahim	2bb1027e37	Extract codex-plugin crate (#15747 ) ## Summary - extract plugin identifiers and load-outcome types into codex-plugin - update codex-core to consume the new plugin crate ## Testing - CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-25 11:07:31 -07:00
Ahmed Ibrahim	ad74543a6f	Extract codex-utils-plugins crate (#15746 ) ## Summary - extract shared plugin path and manifest helpers into codex-utils-plugins - update codex-core to consume the utility crate ## Testing - CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-25 11:05:35 -07:00
Jeremy Rose	6b10e186c4	Add non-interactive resume filter option (#15339 ) ## Summary - add `codex resume --include-non-interactive` to include non-interactive sessions in the picker and `--last` - keep current-provider and cwd filtering behavior unchanged - replace the picker API boolean with a `SessionSourceFilter` enum to avoid a boolean trap ## Tests - `cargo test -p codex-cli` - `cargo test -p codex-tui` - `just fmt` - `just fix -p codex-cli` - `just fix -p codex-tui`	2026-03-25 11:05:07 -07:00
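To illustrate the boolean-trap point in 6b10e186c4, here is a minimal sketch of an enum-based filter. Only the name `SessionSourceFilter` comes from the commit; the variant names, the `Session` struct, `from_flag`, and `pick_last` are hypothetical, and the session list is assumed to be ordered oldest to newest.

```rust
/// Variant names are assumptions; the commit only names the enum itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionSourceFilter {
    InteractiveOnly,
    IncludeNonInteractive,
}

/// Hypothetical session record for the example.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Session {
    pub id: String,
    pub interactive: bool,
}

impl SessionSourceFilter {
    /// Map a CLI flag such as `--include-non-interactive` onto the enum.
    pub fn from_flag(include_non_interactive: bool) -> Self {
        if include_non_interactive {
            Self::IncludeNonInteractive
        } else {
            Self::InteractiveOnly
        }
    }

    pub fn matches(self, session: &Session) -> bool {
        match self {
            Self::InteractiveOnly => session.interactive,
            Self::IncludeNonInteractive => true,
        }
    }
}

/// Pick the most recent session that passes the filter.
pub fn pick_last(sessions: &[Session], filter: SessionSourceFilter) -> Option<&Session> {
    sessions.iter().rev().find(|s| filter.matches(s))
}
```

A call site then reads `pick_last(&sessions, SessionSourceFilter::IncludeNonInteractive)` instead of `pick_last(&sessions, true)`, so the intent is visible without looking up the parameter name.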
Ahmed Ibrahim	fba3c79885	Extract codex-instructions crate (#15744 ) ## Summary - extract instruction fragment and user-instruction types into codex-instructions - update codex-core to consume the new crate ## Testing - CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-25 10:43:49 -07:00


4560 Commits