codex

mirror of https://github.com/openai/codex.git synced 2026-04-29 00:55:38 +00:00

Author	SHA1	Message	Date
Michael Bolin	b23789b770	[codex] import token_data from codex-login directly (#15903 ) ## Why `token_data` is owned by `codex-login`, but `codex-core` was still re-exporting it. That let callers pull auth token types through `codex-core`, which keeps otherwise unrelated crates coupled to `codex-core` and makes `codex-core` more of a build-graph bottleneck. ## What changed - remove the `codex-core` re-export of `codex_login::token_data` - update the remaining `codex-core` internals that used `crate::token_data` to import `codex_login::token_data` directly - update downstream callers in `codex-rs/chatgpt`, `codex-rs/tui_app_server`, `codex-rs/app-server/tests/common`, and `codex-rs/core/tests` to import `codex_login::token_data` directly - add explicit `codex-login` workspace dependencies and refresh lock metadata for crates that now depend on it directly ## Validation - `cargo test -p codex-chatgpt --locked` - `just argument-comment-lint` - `just bazel-lock-update` - `just bazel-lock-check` ## Notes - attempted `cargo test -p codex-core --locked` and `cargo test -p codex-core auth_refresh --locked`, but both ran out of disk while linking `codex-core` test binaries in the local environment	2026-03-26 13:34:02 -07:00
Michael Bolin	e36ebaa3da	fix: box apply_patch test harness futures (#15835 ) ## Why `#[large_stack_test]` made the `apply_patch_cli` tests pass by giving them more stack, but it did not address why those tests needed the extra stack in the first place. The real problem is the async state built by the `apply_patch_cli` harness path. Those tests await three helper boundaries directly: harness construction, turn submission, and apply-patch output collection. If those helpers inline their full child futures, the test future grows to include the whole harness startup and request/response path. This change replaces the workaround from #12768 with the same basic approach used in #13429, but keeps the fix narrower: only the helper boundaries awaited directly by `apply_patch_cli` stay boxed. ## What Changed - removed `#[large_stack_test]` from `core/tests/suite/apply_patch_cli.rs` - restored ordinary `#[tokio::test(flavor = "multi_thread", worker_threads = 2)]` annotations in that suite - deleted the now-unused `codex-test-macros` crate and removed its workspace wiring - boxed only the three helper boundaries that the suite awaits directly: - `apply_patch_harness_with(...)` - `TestCodexHarness::submit(...)` - `TestCodexHarness::apply_patch_output(...)` - added comments at those boxed boundaries explaining why they remain boxed ## Testing - `cargo test -p codex-core --test all suite::apply_patch_cli -- --nocapture` ## References - #12768 - #13429	2026-03-26 17:32:04 +00:00
jif-oai	26c66f3ee1	fix: flaky (#15869 )	2026-03-26 16:07:32 +01:00
Michael Bolin	01fa4f0212	core: remove special execve handling for skill scripts (#15812 )	2026-03-26 07:46:04 -07:00
viyatb-oai	937cb5081d	fix: fix old system bubblewrap compatibility without falling back to vendored bwrap (#15693 ) Fixes #15283. ## Summary Older system bubblewrap builds reject `--argv0`, which makes our Linux sandbox fail before the helper can re-exec. This PR keeps using system `/usr/bin/bwrap` whenever it exists and only falls back to vendored bwrap when the system binary is missing. That matters on stricter AppArmor hosts, where the distro bwrap package also provides the policy setup needed for user namespaces. For old system bwrap, we avoid `--argv0` instead of switching binaries: - pass the sandbox helper a full-path `argv0`, - keep the existing `current_exe() + --argv0` path when the selected launcher supports it, - otherwise omit `--argv0` and re-exec through the helper's own `argv[0]` path, whose basename still dispatches as `codex-linux-sandbox`. Also updates the launcher/warning tests and docs so they match the new behavior: present-but-old system bwrap uses the compatibility path, and only absent system bwrap falls back to vendored. ### Validation 1. Install Ubuntu 20.04 in a VM 2. Compile codex and run without bubblewrap installed - see a warning about falling back to the vendored bwrap 3. Install bwrap and verify version is 0.4.0 without `argv0` support 4. run codex and use apply_patch tool without errors <img width="802" height="631" alt="Screenshot 2026-03-25 at 11 48 36 PM" src="https://github.com/user-attachments/assets/77248a29-aa38-4d7c-9833-496ec6a458b8" /> <img width="807" height="634" alt="Screenshot 2026-03-25 at 11 47 32 PM" src="https://github.com/user-attachments/assets/5af8b850-a466-489b-95a6-455b76b5050f" /> <img width="812" height="635" alt="Screenshot 2026-03-25 at 11 45 45 PM" src="https://github.com/user-attachments/assets/438074f0-8435-4274-a667-332efdd5cb57" /> <img width="801" height="623" alt="Screenshot 2026-03-25 at 11 43 56 PM" src="https://github.com/user-attachments/assets/0dc8d3f5-e8cf-4218-b4b4-a4f7d9bf02e3" /> --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-03-25 23:51:39 -07:00
Andrei Eternal	c4d9887f9a	[hooks] add non-streaming (non-stdin style) shell-only PostToolUse support (#15531 ) CHAINED PR - note that base is eternal/hooks-pretooluse-bash, not main -- so the following PR should be first Matching post-tool hook to the pre-tool functionality here: https://github.com/openai/codex/pull/15211 So, PreToolUse calls for plain shell calls, allows blocking. This PostToolUse call runs after the command executed example run: ``` › as a test, run in parallel the following commands: - echo 'one' - echo '[block-pre-tool-use]' - echo '[block-post-tool-use]' ⚠ MCP startup incomplete (failed: notion, linear) • Cruising through those three commands in parallel now, and I’ll share the exact outputs right after they land. • Running PreToolUse hook: checking the observatory runes • Running PreToolUse hook: checking the observatory runes • Running PreToolUse hook: checking the observatory runes PreToolUse hook (blocked) warning: wizard-tower PreToolUse demo blocked a Bash command on purpose. feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue. PreToolUse hook (completed) warning: wizard-tower PreToolUse demo inspected Bash: echo 'one' PreToolUse hook (completed) warning: wizard-tower PreToolUse demo inspected Bash: echo '[block-post-tool-use]' • Ran echo '[block-post-tool-use]' └ [block-post-tool-use] • Ran echo 'one' └ one • Running PostToolUse hook: reading the observatory aftermath • Running PostToolUse hook: reading the observatory aftermath PostToolUse hook (blocked) warning: wizard-tower PostToolUse demo blocked the Bash follow-through on purpose. feedback: PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue. PostToolUse hook (completed) warning: wizard-tower PostToolUse demo inspected Bash: echo 'one' hook context: PostToolUse demo saw the Bash result. Response preview: one ────────────────────────────────────────────────────────────────────────────────────────────────────── • Ran them in parallel. Little wave report: - echo 'one' completed and returned one. - echo '[block-pre-tool-use]' was blocked before execution by the PreToolUse hook. - echo '[block-post-tool-use]' ran, but its result was blocked afterward by the PostToolUse hook. The hook messages were: - PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue. - PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue. › for the blocked messages (correctly blocked in this test) did the tool output land back into context? • Yeah, from this run: - PreToolUse block: the command itself did not execute, so no command output landed in context. Only the block message did. - PostToolUse block: the command did execute, but the actual stdout did not come back into context. Only the post-tool block message did. So the blocked notifications landed in context, but the blocked command payloads themselves did not. Nice clean guardrail, aloha. ```	2026-03-25 19:18:03 -07:00
Matthew Zeng	78799c1bcf	[mcp] Improve custom MCP elicitation (#15800 ) - [x] Support don't ask again for custom MCP tool calls. - [x] Don't run arc in yolo mode. - [x] Run arc for custom MCP tools in always allow mode.	2026-03-26 01:02:37 +00:00
pakrym-oai	8fa88fa8ca	Add cached environment manager for exec server URL (#15785 ) Add environment manager that is a singleton and is created early in app-server (before skill manager, before config loading). Use an environment variable to point to a running exec server.	2026-03-25 16:14:36 -07:00
Matthew Zeng	91337399fe	[apps][tool_suggest] Remove tool_suggest's dependency on tool search. (#14856 ) - [x] Remove tool_suggest's dependency on tool search.	2026-03-25 12:26:02 -07:00
pakrym-oai	504aeb0e09	Use AbsolutePathBuf for cwd state (#15710 ) Migrate `cwd` and related session/config state to `AbsolutePathBuf` so downstream consumers consistently see absolute working directories. Add test-only `.abs()` helpers for `Path`, `PathBuf`, and `TempDir`, and update branch-local tests to use them instead of `AbsolutePathBuf::try_from(...)`. For the remaining TUI/app-server snapshot coverage that renders absolute cwd values, keep the snapshots unchanged and skip the Windows-only cases where the platform-specific absolute path layout differs.	2026-03-25 16:02:22 +00:00
jif-oai	178c3b15b4	chore: remove grep_files handler (#15775 ) # External (non-OpenAI) Pull Request Requirements Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed: https://github.com/openai/codex/blob/main/docs/contributing.md If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes. Include a link to a bug report or enhancement request. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-25 16:01:45 +00:00
Fouad Matin	32c4993c8a	fix(core): default approval behavior for mcp missing annotations (#15519 ) - Changed `requires_mcp_tool_approval` to apply MCP spec defaults when annotations are missing. - Unannotated tools now default to: - `readOnlyHint = false` - `destructiveHint = true` - `openWorldHint = true` - This means unannotated MCP tools now go through approval/ARC monitoring instead of silently bypassing it. - Explicitly read-only tools still skip approval unless they are also explicitly marked destructive. Previous behavior Failed open for missing annotations, which was unsafe for custom MCP tools that omitted or forgot annotations. --------- Co-authored-by: colby-oai <228809017+colby-oai@users.noreply.github.com>	2026-03-25 07:55:41 -07:00
Matthew Zeng	e590fad50b	[plugins] Add a flag for tool search. (#15722 ) - [x] Add a flag for tool search.	2026-03-25 07:00:25 +00:00
Charley Cunningham	d72fa2a209	[codex] Defer fork context injection until first turn (#15699 ) ## Summary - remove the fork-startup `build_initial_context` injection - keep the reconstructed `reference_context_item` as the fork baseline until the first real turn - update fork-history tests and the request snapshot, and add a `TODO(ccunningham)` for remaining nondiffable initial-context inputs ## Why Fork startup was appending current-session initial context immediately after reconstructing the parent rollout, then the first real turn could emit context updates again. That duplicated model-visible context in the child rollout. ## Impact Forked sessions now behave like resume for context seeding: startup reconstructs history and preserves the prior baseline, and the first real turn handles any current-session context emission. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-24 18:34:44 -07:00
Ahmed Ibrahim	0f957a93cd	Move git utilities into a dedicated crate (#15564 ) - create `codex-git-utils` and move the shared git helpers into it with file moves preserved for diff readability - move the `GitInfo` helpers out of `core` so stacked rollout work can depend on the shared crate without carrying its own git info module --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-24 13:26:23 -07:00
Charley Cunningham	2d61357c76	Trim pre-turn context updates during rollback (#15577 ) ## Summary - trim contiguous developer/contextual-user pre-turn updates when rollback cuts back to a user turn - add a focused history regression test for the trim behavior - update the rollback request-boundary snapshots to show the fixed non-duplicating context shape --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-24 12:43:53 -07:00
Celia Chen	88694e8417	chore: stop app-server auth refresh storms after permanent token failure (#15530 ) built from #14256. PR description from @etraut-openai: This PR addresses a hole in [PR 11802](https://github.com/openai/codex/pull/11802). The previous PR assumed that app server clients would respond to token refresh failures by presenting the user with an error ("you must log in again") and then not making further attempts to call network endpoints using the expired token. While they do present the user with this error, they don't prevent further attempts to call network endpoints and can repeatedly call `getAuthStatus(refreshToken=true)` resulting in many failed calls to the token refresh endpoint. There are three solutions I considered here: 1. Change the getAuthStatus app server call to return a null auth if the caller specified "refreshToken" on input and the refresh attempt fails. This will cause clients to immediately log out the user and return them to the log in screen. This is a really bad user experience. It's also a breaking change in the app server contract that could break third-party clients. 2. Augment the getAuthStatus app server call to return an additional field that indicates the state of "token could not be refreshed". This is a non-breaking change to the app server API, but it requires non-trivial changes for all clients to properly handle this new field properly. 3. Change the getAuthStatus implementation to handle the case where a token refresh fails by marking the AuthManager's in-memory access and refresh tokens as "poisoned" so it they are no longer used. This is the simplest fix that requires no client changes. I chose option 3. Here's Codex's explanation of this change: When an app-server client asks `getAuthStatus(refreshToken=true)`, we may try to refresh a stale ChatGPT access token. If that refresh fails permanently (for example `refresh_token_reused`, expired, or revoked), the old behavior was bad in two ways: 1. We kept the in-memory auth snapshot alive as if it were still usable. 2. Later auth checks could retry refresh again and again, creating a storm of doomed `/oauth/token` requests and repeatedly surfacing the same failure. This is especially painful for app-server clients because they poll auth status and can keep driving the refresh path without any real chance of recovery. This change makes permanent refresh failures terminal for the current managed auth snapshot without changing the app-server API contract. What changed: - `AuthManager` now poisons the current managed auth snapshot in memory after a permanent refresh failure, keyed to the unchanged `AuthDotJson`. - Once poisoned, later refresh attempts for that same snapshot fail fast locally without calling the auth service again. - The poison is cleared automatically when auth materially changes, such as a new login, logout, or reload of different auth state from storage. - `getAuthStatus(includeToken=true)` now omits `authToken` after a permanent refresh failure instead of handing out the stale cached bearer token. This keeps the current auth method visible to clients, avoids forcing an immediate logout flow, and stops repeated refresh attempts for credentials that cannot recover. --------- Co-authored-by: Eric Traut <etraut@openai.com>	2026-03-24 12:39:58 -07:00
Celia Chen	7dc2cd2ebe	chore: use access token expiration for proactive auth refresh (#15545 ) Follow up to #15357 by making proactive ChatGPT auth refresh depend on the access token's JWT expiration instead of treating `last_refresh` age as the primary source of truth.	2026-03-24 19:34:48 +00:00
Charley Cunningham	910cf49269	[codex] Stabilize second compaction history test (#15605 ) ## Summary - replace the second-compaction test fixtures with a single ordered `/responses` sequence - assert against the real recorded request order instead of aggregating per-mock captures - realign the second-summary assertion to the first post-compaction user turn where the summary actually appears ## Root cause `compact_resume_after_second_compaction_preserves_history` collected requests from multiple `mount_sse_once_match` recorders. Overlapping matchers could record the same HTTP request more than once, so the test indexed into a duplicated synthetic list rather than the true request stream. That made the summary assertion depend on matcher evaluation order and platform-specific behavior. ## Impact - makes the flaky test deterministic by removing duplicate request capture from the assertion path - keeps the change scoped to the test only ## Validation - `just fmt` - `just argument-comment-lint` - `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core compact_resume_after_second_compaction_preserves_history -- --nocapture` - repeated the same targeted test 10 times --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-24 10:14:21 -07:00
pakrym-oai	f49eb8e9d7	Extract sandbox manager and transforms into codex-sandboxing (#15603 ) Extract sandbox manager	2026-03-24 08:20:57 -07:00
dhruvgupta-oai	c2410060ea	[codex-cli][app-server] Update self-serve business usage limit copy in error returned (#15478 ) ## Summary - update the self-serve business usage-based limit message to direct users to their admin for additional credits - add a focused unit test for the self_serve_business_usage_based plan branch Added also: If you are at a rate limit but you still have credits, codex cli would tell you to switch the model. We shouldnt do this if you have credits so fixed this. ## Test - launched the source-built CLI and verified the updated message is shown for the self-serve business usage-based plan ![Test screenshot](https://raw.githubusercontent.com/openai/codex/5cc3c013ef17ac5c66dfd9395c0d3c4837602231/docs/images/self-serve-business-usage-limit.png)	2026-03-24 04:41:38 +00:00
Charley Cunningham	f547b79bd0	Add fork snapshot modes (#15239 ) ## Summary - add `ForkSnapshotMode` to `ThreadManager::fork_thread` so callers can request either a committed snapshot or an interrupted snapshot - share the model-visible `<turn_aborted>` history marker between the live interrupt path and interrupted forks - update the small set of direct fork callsites to pass `ForkSnapshotMode::Committed` Note: this enables /btw to work similarly as Esc to interrupt (hopefully somewhat in distribution) --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-23 19:05:42 -07:00
Charley Cunningham	0f34b14b41	[codex] Add rollback context duplication snapshot (#15562 ) ## What changed - adds a targeted snapshot test for rollback with contextual diffs in `codex_tests.rs` - snapshots the exact model-visible request input before the rolled-back turn and on the follow-up request after rollback - shows the duplicate developer and environment context pair appearing again before the follow-up user message ## Why Rollback currently rewinds the reference context baseline without rewinding the live session overrides. On the next turn, the same contextual diff is emitted again and duplicated in the request sent to the model. ## Impact - makes the regression visible in a canonical snapshot test - keeps the snapshot on the shared `context_snapshot` path without adding new formatting helpers - gives a direct repro for future fixes to rollback/context reconstruction --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-23 15:36:23 -07:00
Dylan Hurd	67c1c7c054	chore(core) Add approvals reviewer to UserTurn (#15426 ) ## Summary Adds support for approvals_reviewer to `Op::UserTurn` so we can migrate `[CodexMessageProcessor::turn_start]` to use Op::UserTurn ## Testing - [x] Adds quick test for the new field Co-authored-by: Codex <noreply@openai.com>	2026-03-23 15:19:01 -07:00
Andrei Eternal	73bbb07ba8	[hooks] add non-streaming (non-stdin style) shell-only PreToolUse support (#15211 ) - add `PreToolUse` hook for bash-like tool execution only at first - block shell execution before dispatch with deny-only hook behavior - introduces common.rs matcher framework for matching when hooks are run example run: ``` › run three parallel echo commands, and the second one should echo "[block-pre-tool-use]" as a test • Running the three echo commands in parallel now and I’ll report the output directly. • Running PreToolUse hook: name for demo pre tool use hook • Running PreToolUse hook: name for demo pre tool use hook • Running PreToolUse hook: name for demo pre tool use hook PreToolUse hook (completed) warning: wizard-tower PreToolUse demo inspected Bash: echo "first parallel echo" PreToolUse hook (blocked) warning: wizard-tower PreToolUse demo blocked a Bash command on purpose. feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue. PreToolUse hook (completed) warning: wizard-tower PreToolUse demo inspected Bash: echo "third parallel echo" • Ran echo "first parallel echo" └ first parallel echo • Ran echo "third parallel echo" └ third parallel echo • Three little waves went out in parallel. 1. printed first parallel echo 2. was blocked before execution because it contained the exact test string [block-pre-tool-use] 3. printed third parallel echo There was also an unrelated macOS defaults warning around the successful commands, but the echoes themselves worked fine. If you want, I can rerun the second one with a slightly modified string so it passes cleanly. ```	2026-03-23 14:32:59 -07:00
Celia Chen	f55f5c258f	Fix: proactive auth refresh to reload guarded disk state first (#15357 ) ## Summary Fix a managed ChatGPT auth bug where a stale Codex process could proactively refresh using an old in-memory refresh token even after another process had already rotated auth on disk. This changes the proactive `AuthManager::auth()` path to reuse the existing guarded `refresh_token()` flow instead of calling the refresh endpoint directly from cached auth state. ## Original Issue Users reported repeated `codexd` log lines like: ```text ERROR codex_core::auth: Failed to refresh token: error sending request for url (https://auth.openai.com/oauth/token) ``` In practice this showed up most often when multiple `codexd` processes were left running. Killing the extra processes stopped the noise, which suggested the issue was caused by stale auth state across processes rather than invalid user credentials. ## Diagnosis The bug was in the proactive refresh path used by `AuthManager::auth()`: - Process A could refresh successfully, rotate refresh token `R0` to `R1`, and persist the updated auth state plus `last_refresh` to disk. - Process B could keep an older auth snapshot cached in memory, still holding `R0` and the old `last_refresh`. - Later, when Process B called `auth()`, it checked staleness from its cached in-memory auth instead of first reloading from disk. - Because that cached `last_refresh` was stale, Process B would proactively call `/oauth/token` with stale refresh token `R0`. - On failure, `auth()` logged the refresh error but kept returning the same stale cached auth, so repeated `auth()` calls could keep retrying with dead state. This differed from the existing unauthorized-recovery flow, which already did the safer thing: guarded reload from disk first, then refresh only if the on-disk auth was unchanged. ## What Changed - Switched proactive refresh in `AuthManager::auth()` to: - do a pure staleness check on cached auth - call `refresh_token()` when stale - return the original cached auth on genuine refresh failure, preserving existing outward behavior - Removed the direct proactive refresh-from-cached-state path - Added regression tests covering: - stale cached auth with newer same-account auth already on disk - the same scenario even when the refresh endpoint would fail if called ## Why This Fix `refresh_token()` already contains the right cross-process safety behavior: - guarded reload from disk - same-account verification - skip-refresh when another process already changed auth Reusing that path makes proactive refresh consistent with unauthorized recovery and prevents stale processes from trying to refresh already-rotated tokens. ## Testing Test shape: - create a fresh temp `CODEX_HOME` from `~/.codex/auth.json` - force `last_refresh` to an old timestamp so proactive refresh is required - start two long-lived helper processes against the same auth file - start `B` first so it caches stale auth and sleeps - start `A` second so it refreshes first - point both at a local mock `/oauth/token` server - inspect whether `B` makes a second refresh request with the stale in-memory token, or reloads the rotated token from disk ### Before the fix The repro showed the bug clearly: the mock server saw two refreshes with the same stale token, `A` rotated to a new token, and `B` still returned the stale token instead of reloading from disk. ```text POST /oauth/token refresh_token=rt_j6s0... POST /oauth/token refresh_token=rt_j6s0... B:cached_before=rt_j6s0... B:cached_after=rt_j6s0... B:returned=rt_j6s0... A:cached_before=rt_j6s0... A:cached_after=rotated-refresh-token-logged-run-v2 A:returned=rotated-refresh-token-logged-run-v2 ``` ### After the fix After the fix, the mock server saw only one refresh request. `A` refreshed once, and `B` started with the stale token but reloaded and returned the rotated token. ```text POST /oauth/token refresh_token=rt_j6s0... B:cached_before=rt_j6s0... B:cached_after=rotated-refresh-token-fix-branch B:returned=rotated-refresh-token-fix-branch A:cached_before=rt_j6s0... A:cached_after=rotated-refresh-token-fix-branch A:returned=rotated-refresh-token-fix-branch ``` This shows the new behavior: `A` refreshes once, then `B` reuses the updated auth from disk instead of making a second refresh request with the stale token.	2026-03-23 12:07:59 -07:00
Dylan Hurd	0d9bb8ea58	chore(context) Include guardian approval context (#15366 ) ## Summary Include the guardian context in the developer message for approvals ## Testing - [x] Updated unit tests	2026-03-21 16:31:22 +00:00
Channing Conger	e4eedd6170	Code mode on v8 (#15276 ) Moves Code Mode to a new crate with no dependencies on codex. This create encodes the code mode semantics that we want for lifetime, mounting, tool calling. The model-facing surface is mostly unchanged. `exec` still runs raw JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools are still available through `tools.`, and helpers like `text`, `image`, `store`, `load`, `notify`, `yield_control`, and `exit` still exist. The major change is underneath that surface: - Old code mode was an external Node runtime. - New code mode is an in-process V8 runtime embedded directly in Rust. - Old code mode managed cells inside a long-lived Node runner process. - New code mode manages cells in Rust, with one V8 runtime thread per active `exec`. - Old code mode used JSON protocol messages over child stdin/stdout plus Node worker-thread messages. - New code mode uses Rust channels and direct V8 callbacks/events. This PR also fixes the two migration regressions that fell out of that substrate change: - `wait { terminate: true }` now waits for the V8 runtime to actually stop before reporting termination. - synchronous top-level `exit()` now succeeds again instead of surfacing as a script error. --- - `core/src/tools/code_mode/` is now mostly an adapter layer for the public `exec` / `wait` tools. - `code-mode/src/service.rs` owns cell sessions and async control flow in Rust. - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and JavaScript execution. - each `exec` spawns a dedicated runtime thread plus a Rust session-control task. - helper globals are installed directly into the V8 context instead of being injected through a source prelude. - helper modules like `tools.js` and `@openai/code_mode` are synthesized through V8 module resolution callbacks in Rust. --- Also added a benchmark for showing the speed of init and use of a code mode env: ``` $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128 Finished [`bench` profile [optimized]](https://doc.rust-lang.org/cargo/reference/profiles.html#default-profiles) target(s) in 0.18s Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae) exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128] scenario tools samples warmups iters mean/exec p95/exec rssΔ p50 rssΔ max cold_exec 0 30 0 1 1.13ms 1.20ms 8.05MiB 8.06MiB warm_exec 0 30 1 25 473.43us 512.49us 912.00KiB 1.33MiB cold_exec 32 30 0 1 1.03ms 1.15ms 8.08MiB 8.11MiB warm_exec 32 30 1 25 509.73us 545.76us 960.00KiB 1.30MiB cold_exec 128 30 0 1 1.14ms 1.19ms 8.30MiB 8.34MiB warm_exec 128 30 1 25 575.08us 591.03us 736.00KiB 864.00KiB memory uses a fresh-process max RSS delta for each scenario ``` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-20 23:36:58 -07:00
Dylan Hurd	ea8b07e680	chore(core) Remove Feature::PowershellUtf8 (#15128 ) ## Summary This feature has been enabled for powershell for a while now, let's get rid of the logic ## Testing - [x] Unit tests	2026-03-20 22:03:31 +00:00
jif-oai	79ad7b247b	feat: change multi-agent to use path-like system instead of uuids (#15313 ) This PR add an URI-based system to reference agents within a tree. This comes from a sync between research and engineering. The main agent (the one manually spawned by a user) is always called `/root`. Any sub-agent spawned by it will be `/root/agent_1` for example where `agent_1` is chosen by the model. Any agent can contact any agents using the path. Paths can be used either in absolute or relative to the calling agents Resume is not supported for now on this new path	2026-03-20 18:23:48 +00:00
pakrym-oai	4ddde54c19	Add remote test skill (#15324 ) Teach codex to run remote tests.	2026-03-20 10:37:57 -07:00
pakrym-oai	ba85a58039	Add remote env CI matrix and integration test (#14869 ) `CODEX_TEST_REMOTE_ENV` will make `test_codex` start the executor "remotely" (inside a docker container) turning any integration test into remote test.	2026-03-20 08:02:50 -07:00
Ahmed Ibrahim	2e22885e79	Split features into codex-features crate (#15253 ) - Split the feature system into a new `codex-features` crate. - Cut `codex-core` and workspace consumers over to the new config and warning APIs. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-19 20:12:07 -07:00
Michael Bolin	a3e59e9e85	core: add a full-buffer exec capture policy (#15254 )	2026-03-20 02:38:12 +00:00
Ahmed Ibrahim	2aa4873802	Move auth code into login crate (#15150 ) - Move the auth implementation and token data into codex-login. - Keep codex-core re-exporting that surface from codex-login for existing callers. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-19 18:58:17 -07:00
Won Park	6b8175c734	changed save directory to codex_home (#15222 ) saving image gen default save directory to codex_home/imagegen/thread_id/	2026-03-19 15:16:26 -07:00
nicholasclark-openai	2bee37fe69	Plumb MCP turn metadata through _meta (#15190 ) ## Summary Some background. We're looking to instrument GA turns end to end. Right now a big gap is grouping mcp tool calls with their codex sessions. We send session id and turn id headers to the responses call but not the mcp/wham calls. Ideally we could pass the args as headers like with responses, but given the setup of the rmcp client, we can't send as headers without either changing the rmcp package upstream to allow per request headers or introducing a mutex which break concurrency. An earlier attempt made the assumption that we had 1 client per thread, which allowed us to set headers at the start of a turn. @pakrym mentioned that this assumption might break in the near future. So the solution now is to package the turn metadata/session id into the _meta field in the post body and pull out in codex-backend. - send turn metadata to MCP servers via `tools/call` `_meta` instead of assuming per-thread request headers on shared clients - preserve the existing `_codex_apps` metadata while adding `x-codex-turn-metadata` for all MCP tool calls - extend tests to cover both custom MCP servers and the codex apps search flow --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-19 22:05:13 +00:00
Andrei Eternal	267499bed8	[hooks] use a user message > developer message for prompt continuation (#14867 ) ## Summary Persist Stop-hook continuation prompts as `user` messages instead of hidden `developer` messages + some requested integration tests This is a followup to @pakrym 's comment in https://github.com/openai/codex/pull/14532 to make sure stop-block continuation prompts match training for turn loops - Stop continuation now writes `<hook_prompt hook_run_id="...">stop hook's user prompt<hook_prompt>` - Introduces quick-xml dependency, though we already indirectly depended on it anyway via syntect - This PR only has about 500 lines of actual logic changes, the rest is tests/schema ## Testing Example run (with a sessionstart hook and 3 stop hooks) - this shows context added by session start, then two stop hooks sending their own additional prompts in a new turn. The model responds with a single message addressing both. Then when that turn ends, the hooks detect that they just ran using `stop_hook_active` and decide not to infinite loop test files for this (unzip, move codex -> .codex): [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip) ``` › cats • Running SessionStart hook: lighting the observatory SessionStart hook (completed) warning: Hi, I'm a session start hook for wizard-tower (startup). hook context: A wimboltine stonpet is an exotic cuisine from hyperspace • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk cat facts, cat breeds, cat names, or build something cat-themed in this repo. • Running Stop hook: checking the tower wards • Running Stop hook: sacking the guards • Running Stop hook: hiring the guards Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (177 chars). Stop hook (blocked) warning: Wizard Tower Stop hook continuing conversation feedback: cook the stonpet Stop hook (blocked) warning: Wizard Tower Stop hook continuing conversation feedback: eat the cooked stonpet • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and rested until the hyperspace juices settle. Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf: smoky, bright, and totally out of this dimension. • Running Stop hook: checking the tower wards • Running Stop hook: sacking the guards • Running Stop hook: hiring the guards Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (285 chars). Stop hook (completed) warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop. Stop hook (completed) warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop. ```	2026-03-19 10:53:08 -07:00
Andrei Eternal	42e932d7bf	[hooks] turn_id extension for Stop & UserPromptSubmit (#15118 ) ## Description Adding an extension to the spec that exposes the turn_id to hook scripts. This is a codex-specific mechanic that allows connecting the hook runs with particular turns ## Testing hooks config / sample hooks to use. Extract this, rename codex -> .codex, and place this into a repo or your home folder. It includes: config.toml that enables hooks, hooks.json, and sample python hooks: [codex.zip](https://github.com/user-attachments/files/26102671/codex.zip) example run (note the turn_ids change between turns): ``` › hello • Running SessionStart hook: lighting the observatory SessionStart hook (completed) warning: Hi, I'm a session start hook for wizard-tower (startup). hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace • Running UserPromptSubmit hook: lighting the observatory lanterns UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: hello for turn: 019d036d-c7fa-72d2-b6fd- 78878bfe34e4 hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' near the end. • Aloha! Grateful to be here and ready to build with you. Show me what you want to tackle in wizard- tower, and we’ll surf the next wave together. observatory lanterns lit • Running Stop hook: back to shore Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (170 chars) for turn: 019d036d-c7fa- 72d2-b6fd-78878bfe34e4 › what's a stonpet? • Running UserPromptSubmit hook: lighting the observatory lanterns UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: what's a stonpet? for turn: 019d036e-3164- 72c3-a170-98925564c4fc hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' near the end. • A stonpet isn’t a standard real-world word, brah. In our shared context here, a wimboltine stonpet is an exotic cuisine from hyperspace, so “stonpet” sounds like the dish or food itself. If you want, we can totally invent the lore for it next. observatory lanterns lit • Running Stop hook: back to shore Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (271 chars) for turn: 019d036e-3164- 72c3-a170-98925564c4fc ```	2026-03-18 21:48:31 -07:00
Owen Lin	20f2a216df	feat(core, tracing): create turn spans over websockets (#14632 ) ## Description Dependent on: - [responsesapi] https://github.com/openai/openai/pull/760991 - [codex-backend] https://github.com/openai/openai/pull/760985 `codex app-server -> codex-backend -> responsesapi` now reuses a persistent websocket connection across many turns. This PR updates tracing when using websockets so that each `response.create` websocket request propagates the current tracing context, so we can get a holistic end-to-end trace for each turn. Tracing is propagated via special keys (`ws_request_header_traceparent`, `ws_request_header_tracestate`) set in the `client_metadata` param in Responses API. Currently tracing on websockets is a bit broken because we only set tracing context on ws connection time, so it's detached from a `turn/start` request.	2026-03-19 03:41:06 +00:00
pakrym-oai	56d0c6bf67	Add apply_patch code mode result (#15100 ) It's empty !	2026-03-18 16:11:10 -07:00
pakrym-oai	3590e181fa	Add update_plan code mode result (#15103 ) It's empty!	2026-03-18 16:10:51 -07:00
Charley Cunningham	ebbbc52ce4	Align SQLite feedback logs with feedback formatter (#13494 ) ## Summary - store a pre-rendered `feedback_log_body` in SQLite so `/feedback` exports keep span prefixes and structured event fields - render SQLite feedback exports with timestamps and level prefixes to match the old in-memory feedback formatter, while preserving existing trailing newlines - count `feedback_log_body` in the SQLite retention budget so structured or span-prefixed rows still prune correctly - bound `/feedback` row loading in SQL with the retention estimate, then apply exact whole-line truncation in Rust so uploads stay capped without splitting lines ## Details - add a `feedback_log_body` column to `logs` and backfill it from `message` for existing rows - capture span names plus formatted span and event fields at write time, since SQLite does not retain enough structure to reconstruct the old formatter later - keep SQLite feedback queries scoped to the requested thread plus same-process threadless rows - restore a SQL-side cumulative `estimated_bytes` cap for feedback export queries so over-retained partitions do not load every matching row before truncation - add focused formatting coverage for exported feedback lines and parity coverage against `tracing_subscriber` ## Testing - cargo test -p codex-state - just fix -p codex-state - just fmt codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 22:44:31 +00:00
Ahmed Ibrahim	7b37a0350f	Add final message prefix to realtime handoff output (#15077 ) - prefix realtime handoff output with the agent final message label for both realtime v1 and v2 - update realtime websocket and core expectations to match	2026-03-18 15:19:49 -07:00
pakrym-oai	5cada46ddf	Return image URL from view_image tool (#15072 ) Cleanup image semantics in code mode. `view_image` now returns `{image_url:string, details?: string}` `image()` now allows both string parameter and `{image_url:string, details?: string}`	2026-03-18 13:58:20 -07:00
pakrym-oai	88e5382fc4	Propagate tool errors to code mode (#15075 ) Clean up error flow to push the FunctionCallError all the way up to dispatcher and allow code mode to surface as exception.	2026-03-18 13:57:55 -07:00
pakrym-oai	606d85055f	Add notify to code-mode (#14842 ) Allows model to send an out-of-band notification. The notification is injected as another tool call output for the same call_id.	2026-03-18 09:37:13 -07:00
Dylan Hurd	84f4e7b39d	fix(subagents) share execpolicy by default (#13702 ) ## Summary If a subagent requests approval, and the user persists that approval to the execpolicy, it should (by default) propagate. We'll need to rethink this a bit in light of coming Permissions changes, though I think this is closer to the end state that we'd want, which is that execpolicy changes to one permissions profile should be synced across threads. ## Testing - [x] Added integration test --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 06:42:26 +00:00
Andrei Eternal	6fef421654	[hooks] userpromptsubmit - hook before user's prompt is executed (#14626 ) - this allows blocking the user's prompts from executing, and also prevents them from entering history - handles the edge case where you can both prevent the user's prompt AND add n amount of additionalContexts - refactors some old code into common.rs where hooks overlap functionality - refactors additionalContext being previously added to user messages, instead we use developer messages for them - handles queued messages correctly Sample hook for testing - if you write "[block-user-submit]" this hook will stop the thread: example run ``` › sup • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: sup hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' exactly once near the end. • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory lanterns lit › and [block-user-submit] • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (stopped) warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose. stop: Wizard Tower demo block: remove [block-user-submit] to continue. ``` .codex/config.toml ``` [features] codex_hooks = true ``` .codex/hooks.json ``` { "hooks": { "UserPromptSubmit": [ { "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py", "timeoutSec": 10, "statusMessage": "reading the observatory notes" } ] } ] } } ``` .codex/hooks/user_prompt_submit_demo.py ``` #!/usr/bin/env python3 import json import sys from pathlib import Path def prompt_from_payload(payload: dict) -> str: prompt = payload.get("prompt") if isinstance(prompt, str) and prompt.strip(): return prompt.strip() event = payload.get("event") if isinstance(event, dict): user_prompt = event.get("user_prompt") if isinstance(user_prompt, str): return user_prompt.strip() return "" def main() -> int: payload = json.load(sys.stdin) prompt = prompt_from_payload(payload) cwd = Path(payload.get("cwd", ".")).name or "wizard-tower" if "[block-user-submit]" in prompt: print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo blocked the prompt on purpose." ), "decision": "block", "reason": ( "Wizard Tower demo block: remove [block-user-submit] to continue." ), } ) ) return 0 prompt_preview = prompt or "(empty prompt)" if len(prompt_preview) > 80: prompt_preview = f"{prompt_preview[:77]}..." print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}" ), "hookSpecificOutput": { "hookEventName": "UserPromptSubmit", "additionalContext": ( "Wizard Tower UserPromptSubmit demo fired. " "For this reply only, include the exact phrase " "'observatory lanterns lit' exactly once near the end." ), }, } ) ) return 0 if __name__ == "__main__": raise SystemExit(main()) ```	2026-03-17 22:09:22 -07:00
Ahmed Ibrahim	3ce879c646	Handle realtime conversation end in the TUI (#14903 ) - close live realtime sessions on errors, ctrl-c, and active meter removal - centralize TUI realtime cleanup and avoid duplicate follow-up close info --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 21:04:58 -07:00

1 2 3 4 5 ...

933 Commits