codex

mirror of https://github.com/openai/codex.git synced 2026-04-28 00:25:56 +00:00

Author	SHA1	Message	Date
Channing Conger	e4eedd6170	Code mode on v8 (#15276) Moves Code Mode to a new crate with no dependencies on codex. This crate encodes the code mode semantics that we want for lifetime, mounting, and tool calling. The model-facing surface is mostly unchanged. `exec` still runs raw JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools are still available through `tools.`, and helpers like `text`, `image`, `store`, `load`, `notify`, `yield_control`, and `exit` still exist. The major change is underneath that surface: - Old code mode was an external Node runtime. - New code mode is an in-process V8 runtime embedded directly in Rust. - Old code mode managed cells inside a long-lived Node runner process. - New code mode manages cells in Rust, with one V8 runtime thread per active `exec`. - Old code mode used JSON protocol messages over child stdin/stdout plus Node worker-thread messages. - New code mode uses Rust channels and direct V8 callbacks/events. This PR also fixes the two migration regressions that fell out of that substrate change: - `wait { terminate: true }` now waits for the V8 runtime to actually stop before reporting termination. - synchronous top-level `exit()` now succeeds again instead of surfacing as a script error. --- - `core/src/tools/code_mode/` is now mostly an adapter layer for the public `exec` / `wait` tools. - `code-mode/src/service.rs` owns cell sessions and async control flow in Rust. - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and JavaScript execution. - each `exec` spawns a dedicated runtime thread plus a Rust session-control task. - helper globals are installed directly into the V8 context instead of being injected through a source prelude. - helper modules like `tools.js` and `@openai/code_mode` are synthesized through V8 module resolution callbacks in Rust. 
--- Also added a benchmark for showing the speed of init and use of a code mode env: ``` $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128 Finished [`bench` profile [optimized]](https://doc.rust-lang.org/cargo/reference/profiles.html#default-profiles) target(s) in 0.18s Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae) exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128] scenario tools samples warmups iters mean/exec p95/exec rssΔ p50 rssΔ max cold_exec 0 30 0 1 1.13ms 1.20ms 8.05MiB 8.06MiB warm_exec 0 30 1 25 473.43us 512.49us 912.00KiB 1.33MiB cold_exec 32 30 0 1 1.03ms 1.15ms 8.08MiB 8.11MiB warm_exec 32 30 1 25 509.73us 545.76us 960.00KiB 1.30MiB cold_exec 128 30 0 1 1.14ms 1.19ms 8.30MiB 8.34MiB warm_exec 128 30 1 25 575.08us 591.03us 736.00KiB 864.00KiB memory uses a fresh-process max RSS delta for each scenario ``` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-20 23:36:58 -07:00
Dylan Hurd	ea8b07e680	chore(core) Remove Feature::PowershellUtf8 (#15128 ) ## Summary This feature has been enabled for powershell for a while now, let's get rid of the logic ## Testing - [x] Unit tests	2026-03-20 22:03:31 +00:00
jif-oai	79ad7b247b	feat: change multi-agent to use path-like system instead of uuids (#15313) This PR adds a URI-based system to reference agents within a tree. This comes from a sync between research and engineering. The main agent (the one manually spawned by a user) is always called `/root`. Any sub-agent spawned by it will be, for example, `/root/agent_1`, where `agent_1` is chosen by the model. Any agent can contact any other agent using its path. Paths can be either absolute or relative to the calling agent. Resume is not supported for now on this new path.	2026-03-20 18:23:48 +00:00
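A minimal sketch of how absolute and relative agent paths could be resolved. It assumes that a relative path is resolved as if the calling agent's own path were a directory; the commit message does not say this, and `resolve_agent_path` is a hypothetical name, not part of the codex codebase.

```python
import posixpath

ROOT = "/root"  # the main agent's path, per the commit message


def resolve_agent_path(caller: str, target: str) -> str:
    """Resolve `target` (absolute or relative) against the caller's path.

    Assumption: relative targets are joined onto the caller's own path,
    so "agent_1" from "/root" means "/root/agent_1".
    """
    if target.startswith("/"):
        resolved = posixpath.normpath(target)
    else:
        resolved = posixpath.normpath(posixpath.join(caller, target))
    # Reject anything that climbs out of the agent tree.
    if resolved != ROOT and not resolved.startswith(ROOT + "/"):
        raise ValueError(f"{target!r} does not name an agent under {ROOT}")
    return resolved


print(resolve_agent_path("/root", "agent_1"))
print(resolve_agent_path("/root/agent_1", "../agent_2"))
```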
pakrym-oai	4ddde54c19	Add remote test skill (#15324 ) Teach codex to run remote tests.	2026-03-20 10:37:57 -07:00
pakrym-oai	ba85a58039	Add remote env CI matrix and integration test (#14869) Setting `CODEX_TEST_REMOTE_ENV` makes `test_codex` start the executor "remotely" (inside a Docker container), turning any integration test into a remote test.	2026-03-20 08:02:50 -07:00
Ahmed Ibrahim	2e22885e79	Split features into codex-features crate (#15253 ) - Split the feature system into a new `codex-features` crate. - Cut `codex-core` and workspace consumers over to the new config and warning APIs. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-19 20:12:07 -07:00
Michael Bolin	a3e59e9e85	core: add a full-buffer exec capture policy (#15254 )	2026-03-20 02:38:12 +00:00
Ahmed Ibrahim	2aa4873802	Move auth code into login crate (#15150 ) - Move the auth implementation and token data into codex-login. - Keep codex-core re-exporting that surface from codex-login for existing callers. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-19 18:58:17 -07:00
Won Park	6b8175c734	changed save directory to codex_home (#15222) Save image-generation output to `codex_home/imagegen/thread_id/` by default.	2026-03-19 15:16:26 -07:00
nicholasclark-openai	2bee37fe69	Plumb MCP turn metadata through _meta (#15190 ) ## Summary Some background. We're looking to instrument GA turns end to end. Right now a big gap is grouping mcp tool calls with their codex sessions. We send session id and turn id headers to the responses call but not the mcp/wham calls. Ideally we could pass the args as headers like with responses, but given the setup of the rmcp client, we can't send as headers without either changing the rmcp package upstream to allow per request headers or introducing a mutex which break concurrency. An earlier attempt made the assumption that we had 1 client per thread, which allowed us to set headers at the start of a turn. @pakrym mentioned that this assumption might break in the near future. So the solution now is to package the turn metadata/session id into the _meta field in the post body and pull out in codex-backend. - send turn metadata to MCP servers via `tools/call` `_meta` instead of assuming per-thread request headers on shared clients - preserve the existing `_codex_apps` metadata while adding `x-codex-turn-metadata` for all MCP tool calls - extend tests to cover both custom MCP servers and the codex apps search flow --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-19 22:05:13 +00:00
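A rough sketch of the approach described above: turn metadata travels in the `_meta` field of an MCP `tools/call` request body instead of in per-request headers. The keys `_codex_apps` and `x-codex-turn-metadata` come from the commit message; the function name and the shape of the metadata value (`session_id`, `turn_id`) are assumptions made for illustration.

```python
import json


def build_tool_call(request_id, tool_name, arguments, turn_metadata, existing_meta=None):
    """Build a JSON-RPC `tools/call` request that carries turn metadata in `_meta`."""
    # Copy any metadata that is already present (for example `_codex_apps`)
    # so that adding the turn metadata does not clobber it.
    meta = dict(existing_meta or {})
    meta["x-codex-turn-metadata"] = turn_metadata
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments, "_meta": meta},
    }


request = build_tool_call(
    1,
    "search",
    {"query": "cats"},
    {"session_id": "session-123", "turn_id": "turn-456"},  # hypothetical shape
    existing_meta={"_codex_apps": {"resource_uri": "example://resource"}},
)
print(json.dumps(request, indent=2))
```

Because the metadata is part of each request body, concurrent calls on a shared client never contend over mutable header state.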
Andrei Eternal	267499bed8	[hooks] use a user message > developer message for prompt continuation (#14867 ) ## Summary Persist Stop-hook continuation prompts as `user` messages instead of hidden `developer` messages + some requested integration tests This is a followup to @pakrym 's comment in https://github.com/openai/codex/pull/14532 to make sure stop-block continuation prompts match training for turn loops - Stop continuation now writes `<hook_prompt hook_run_id="...">stop hook's user prompt<hook_prompt>` - Introduces quick-xml dependency, though we already indirectly depended on it anyway via syntect - This PR only has about 500 lines of actual logic changes, the rest is tests/schema ## Testing Example run (with a sessionstart hook and 3 stop hooks) - this shows context added by session start, then two stop hooks sending their own additional prompts in a new turn. The model responds with a single message addressing both. Then when that turn ends, the hooks detect that they just ran using `stop_hook_active` and decide not to infinite loop test files for this (unzip, move codex -> .codex): [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip) ``` › cats • Running SessionStart hook: lighting the observatory SessionStart hook (completed) warning: Hi, I'm a session start hook for wizard-tower (startup). hook context: A wimboltine stonpet is an exotic cuisine from hyperspace • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk cat facts, cat breeds, cat names, or build something cat-themed in this repo. • Running Stop hook: checking the tower wards • Running Stop hook: sacking the guards • Running Stop hook: hiring the guards Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (177 chars). 
Stop hook (blocked) warning: Wizard Tower Stop hook continuing conversation feedback: cook the stonpet Stop hook (blocked) warning: Wizard Tower Stop hook continuing conversation feedback: eat the cooked stonpet • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and rested until the hyperspace juices settle. Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf: smoky, bright, and totally out of this dimension. • Running Stop hook: checking the tower wards • Running Stop hook: sacking the guards • Running Stop hook: hiring the guards Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (285 chars). Stop hook (completed) warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop. Stop hook (completed) warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop. ```	2026-03-19 10:53:08 -07:00
Andrei Eternal	42e932d7bf	[hooks] turn_id extension for Stop & UserPromptSubmit (#15118 ) ## Description Adding an extension to the spec that exposes the turn_id to hook scripts. This is a codex-specific mechanic that allows connecting the hook runs with particular turns ## Testing hooks config / sample hooks to use. Extract this, rename codex -> .codex, and place this into a repo or your home folder. It includes: config.toml that enables hooks, hooks.json, and sample python hooks: [codex.zip](https://github.com/user-attachments/files/26102671/codex.zip) example run (note the turn_ids change between turns): ``` › hello • Running SessionStart hook: lighting the observatory SessionStart hook (completed) warning: Hi, I'm a session start hook for wizard-tower (startup). hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace • Running UserPromptSubmit hook: lighting the observatory lanterns UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: hello for turn: 019d036d-c7fa-72d2-b6fd- 78878bfe34e4 hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' near the end. • Aloha! Grateful to be here and ready to build with you. Show me what you want to tackle in wizard- tower, and we’ll surf the next wave together. observatory lanterns lit • Running Stop hook: back to shore Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (170 chars) for turn: 019d036d-c7fa- 72d2-b6fd-78878bfe34e4 › what's a stonpet? • Running UserPromptSubmit hook: lighting the observatory lanterns UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: what's a stonpet? for turn: 019d036e-3164- 72c3-a170-98925564c4fc hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' near the end. 
• A stonpet isn’t a standard real-world word, brah. In our shared context here, a wimboltine stonpet is an exotic cuisine from hyperspace, so “stonpet” sounds like the dish or food itself. If you want, we can totally invent the lore for it next. observatory lanterns lit • Running Stop hook: back to shore Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (271 chars) for turn: 019d036e-3164- 72c3-a170-98925564c4fc ```	2026-03-18 21:48:31 -07:00
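For reference, a Stop hook that uses the turn id might look like the sketch below. It assumes the payload key is named `turn_id`, which the description implies but does not spell out; `last_assistant_message` is the key used by the sample hooks elsewhere in this log.

```python
import json
import sys


def build_stop_output(payload: dict) -> dict:
    """Report the reply length together with the turn the hook ran for."""
    turn_id = payload.get("turn_id", "(unknown turn)")  # assumed key name
    reply = payload.get("last_assistant_message") or ""
    return {
        "systemMessage": (
            f"Stop hook reviewed the completed reply "
            f"({len(reply.strip())} chars) for turn: {turn_id}"
        )
    }


def main(stdin=sys.stdin) -> int:
    # Hooks receive their payload as JSON on stdin and reply with JSON on stdout.
    print(json.dumps(build_stop_output(json.load(stdin))))
    return 0

# A real hook script would end with: raise SystemExit(main())
```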
Owen Lin	20f2a216df	feat(core, tracing): create turn spans over websockets (#14632 ) ## Description Dependent on: - [responsesapi] https://github.com/openai/openai/pull/760991 - [codex-backend] https://github.com/openai/openai/pull/760985 `codex app-server -> codex-backend -> responsesapi` now reuses a persistent websocket connection across many turns. This PR updates tracing when using websockets so that each `response.create` websocket request propagates the current tracing context, so we can get a holistic end-to-end trace for each turn. Tracing is propagated via special keys (`ws_request_header_traceparent`, `ws_request_header_tracestate`) set in the `client_metadata` param in Responses API. Currently tracing on websockets is a bit broken because we only set tracing context on ws connection time, so it's detached from a `turn/start` request.	2026-03-19 03:41:06 +00:00
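A sketch of per-request trace propagation as described above. The two `client_metadata` keys are taken from the commit message; the surrounding message shape and both function names are assumptions. The `traceparent` value follows the W3C Trace Context layout (`version-traceid-spanid-flags`).

```python
import secrets
from typing import Optional


def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent value: 00-<32 hex>-<16 hex>-<2 hex flags>."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def with_trace_context(request: dict, traceparent: str, tracestate: Optional[str] = None) -> dict:
    """Return a copy of a `response.create` request carrying the trace context."""
    metadata = dict(request.get("client_metadata", {}))
    metadata["ws_request_header_traceparent"] = traceparent
    if tracestate:
        metadata["ws_request_header_tracestate"] = tracestate
    return {**request, "client_metadata": metadata}


# Each turn attaches its own context, even though the socket is reused.
turn_request = with_trace_context({"type": "response.create"}, make_traceparent())
print(turn_request)
```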
pakrym-oai	56d0c6bf67	Add apply_patch code mode result (#15100) It's empty!	2026-03-18 16:11:10 -07:00
pakrym-oai	3590e181fa	Add update_plan code mode result (#15103 ) It's empty!	2026-03-18 16:10:51 -07:00
Charley Cunningham	ebbbc52ce4	Align SQLite feedback logs with feedback formatter (#13494 ) ## Summary - store a pre-rendered `feedback_log_body` in SQLite so `/feedback` exports keep span prefixes and structured event fields - render SQLite feedback exports with timestamps and level prefixes to match the old in-memory feedback formatter, while preserving existing trailing newlines - count `feedback_log_body` in the SQLite retention budget so structured or span-prefixed rows still prune correctly - bound `/feedback` row loading in SQL with the retention estimate, then apply exact whole-line truncation in Rust so uploads stay capped without splitting lines ## Details - add a `feedback_log_body` column to `logs` and backfill it from `message` for existing rows - capture span names plus formatted span and event fields at write time, since SQLite does not retain enough structure to reconstruct the old formatter later - keep SQLite feedback queries scoped to the requested thread plus same-process threadless rows - restore a SQL-side cumulative `estimated_bytes` cap for feedback export queries so over-retained partitions do not load every matching row before truncation - add focused formatting coverage for exported feedback lines and parity coverage against `tracing_subscriber` ## Testing - cargo test -p codex-state - just fix -p codex-state - just fmt codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 22:44:31 +00:00
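The "exact whole-line truncation" step can be illustrated with a short sketch. It assumes the newest lines are the ones kept, which the summary does not state; the real implementation is in Rust, and `truncate_whole_lines` is a hypothetical name.

```python
def truncate_whole_lines(lines, max_bytes):
    """Keep the newest lines that fit within `max_bytes` without splitting any line."""
    kept = []
    total = 0
    # Walk from newest to oldest and stop at the first line that would overflow.
    for line in reversed(lines):
        size = len(line.encode("utf-8"))
        if total + size > max_bytes:
            break
        kept.append(line)
        total += size
    kept.reverse()  # restore chronological order
    return kept


print(truncate_whole_lines(["aaaa\n", "bb\n", "c\n"], 5))
```

A coarse SQL-side estimate can bound how many rows are loaded; a pass like this one then enforces the exact cap.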
Ahmed Ibrahim	7b37a0350f	Add final message prefix to realtime handoff output (#15077 ) - prefix realtime handoff output with the agent final message label for both realtime v1 and v2 - update realtime websocket and core expectations to match	2026-03-18 15:19:49 -07:00
pakrym-oai	5cada46ddf	Return image URL from view_image tool (#15072 ) Cleanup image semantics in code mode. `view_image` now returns `{image_url:string, details?: string}` `image()` now allows both string parameter and `{image_url:string, details?: string}`	2026-03-18 13:58:20 -07:00
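The accepted argument shapes for `image()` can be modeled as below. This is a Python model of the behavior described in the commit message, not the actual JavaScript helper; `normalize_image_arg` is a hypothetical name.

```python
def normalize_image_arg(arg):
    """Accept a URL string or an {image_url, details?} mapping; return the mapping form."""
    if isinstance(arg, str):
        return {"image_url": arg}
    if isinstance(arg, dict) and isinstance(arg.get("image_url"), str):
        normalized = {"image_url": arg["image_url"]}
        if arg.get("details") is not None:
            normalized["details"] = arg["details"]
        return normalized
    raise TypeError("image() expects a URL string or {image_url, details?}")


# The object returned by view_image can be passed straight back to image().
print(normalize_image_arg("https://example.com/cat.png"))
print(normalize_image_arg({"image_url": "https://example.com/cat.png", "details": "high"}))
```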
pakrym-oai	88e5382fc4	Propagate tool errors to code mode (#15075 ) Clean up error flow to push the FunctionCallError all the way up to dispatcher and allow code mode to surface as exception.	2026-03-18 13:57:55 -07:00
pakrym-oai	606d85055f	Add notify to code-mode (#14842 ) Allows model to send an out-of-band notification. The notification is injected as another tool call output for the same call_id.	2026-03-18 09:37:13 -07:00
Dylan Hurd	84f4e7b39d	fix(subagents) share execpolicy by default (#13702 ) ## Summary If a subagent requests approval, and the user persists that approval to the execpolicy, it should (by default) propagate. We'll need to rethink this a bit in light of coming Permissions changes, though I think this is closer to the end state that we'd want, which is that execpolicy changes to one permissions profile should be synced across threads. ## Testing - [x] Added integration test --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 06:42:26 +00:00
Andrei Eternal	6fef421654	[hooks] userpromptsubmit - hook before user's prompt is executed (#14626 ) - this allows blocking the user's prompts from executing, and also prevents them from entering history - handles the edge case where you can both prevent the user's prompt AND add n amount of additionalContexts - refactors some old code into common.rs where hooks overlap functionality - refactors additionalContext being previously added to user messages, instead we use developer messages for them - handles queued messages correctly Sample hook for testing - if you write "[block-user-submit]" this hook will stop the thread: example run ``` › sup • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: sup hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' exactly once near the end. • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory lanterns lit › and [block-user-submit] • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (stopped) warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose. stop: Wizard Tower demo block: remove [block-user-submit] to continue. 
``` .codex/config.toml ``` [features] codex_hooks = true ``` .codex/hooks.json ``` { "hooks": { "UserPromptSubmit": [ { "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py", "timeoutSec": 10, "statusMessage": "reading the observatory notes" } ] } ] } } ``` .codex/hooks/user_prompt_submit_demo.py ``` #!/usr/bin/env python3 import json import sys from pathlib import Path def prompt_from_payload(payload: dict) -> str: prompt = payload.get("prompt") if isinstance(prompt, str) and prompt.strip(): return prompt.strip() event = payload.get("event") if isinstance(event, dict): user_prompt = event.get("user_prompt") if isinstance(user_prompt, str): return user_prompt.strip() return "" def main() -> int: payload = json.load(sys.stdin) prompt = prompt_from_payload(payload) cwd = Path(payload.get("cwd", ".")).name or "wizard-tower" if "[block-user-submit]" in prompt: print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo blocked the prompt on purpose." ), "decision": "block", "reason": ( "Wizard Tower demo block: remove [block-user-submit] to continue." ), } ) ) return 0 prompt_preview = prompt or "(empty prompt)" if len(prompt_preview) > 80: prompt_preview = f"{prompt_preview[:77]}..." print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}" ), "hookSpecificOutput": { "hookEventName": "UserPromptSubmit", "additionalContext": ( "Wizard Tower UserPromptSubmit demo fired. " "For this reply only, include the exact phrase " "'observatory lanterns lit' exactly once near the end." ), }, } ) ) return 0 if __name__ == "__main__": raise SystemExit(main()) ```	2026-03-17 22:09:22 -07:00
Ahmed Ibrahim	3ce879c646	Handle realtime conversation end in the TUI (#14903 ) - close live realtime sessions on errors, ctrl-c, and active meter removal - centralize TUI realtime cleanup and avoid duplicate follow-up close info --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 21:04:58 -07:00
pakrym-oai	770616414a	Prefer websockets when providers support them (#13592 ) Remove all flags and model settings. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 19:46:44 -07:00
Ahmed Ibrahim	98be562fd3	Unify realtime shutdown in core (#14902 ) - route realtime startup, input, and transport failures through a single shutdown path - emit one realtime error/closed lifecycle while clearing session state once --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 15:58:52 -07:00
Ahmed Ibrahim	c6ab4ee537	Gate realtime audio interruption logic to v2 (#14984 ) - thread the realtime version into conversation start and app-server notifications - keep playback-aware mic gating and playback interruption behavior on v2 only, leaving v1 on the legacy path	2026-03-17 15:24:37 -07:00
pakrym-oai	ee756eb80f	Rename exec_wait tool to wait (#14983 ) Summary - document that code mode only exposes `exec` and the renamed `wait` tool - update code mode tool spec and descriptions to match the new tool name - rename tests and helper references from `exec_wait` to `wait` Testing - Not run (not requested)	2026-03-17 14:22:26 -07:00
Ahmed Ibrahim	4d9d4b7b0f	Stabilize approval matrix write-file command (#14968 ) ## What is flaky The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in CI even though the approval logic is unchanged, because the test delegates the file write and readback to shell parsing instead of deterministic file I/O. ## Why it was flaky The test generated a command shaped like `printf ... > file && cat file`. That means the scenario depended on shell quoting, redirection, newline handling, and encoding behavior in addition to the approval system it was actually trying to validate. If the shell interpreted the payload differently, the test would report an approval failure even though the product logic was fine. That also made failures hard to diagnose, because the test did not log the exact generated command or the parsed result payload. ## How this PR fixes it This PR replaces the shell-redirection path with a deterministic `python3 -c` script that writes the file with `Path.write_text(..., encoding='utf-8')` and then reads it back with the same UTF-8 path. It also logs the generated command and the resulting exit code/stdout for the approval scenario so any future failure is directly attributable. ## Why this fix fixes the flakiness The scenario no longer depends on shell parsing and redirection semantics. The file contents are produced and read through explicit UTF-8 file I/O, so the approval test is measuring approval behavior instead of shell behavior. The added diagnostics mean a future failure will show the exact command/result pair instead of looking like a generic intermittent mismatch. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 13:52:36 -07:00
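The deterministic write-and-read-back command described above could be built roughly as follows. This is a sketch, not the test's actual code: the helper names are hypothetical, and `sys.executable` stands in for the `python3` binary named in the PR.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_write_file_command(path: Path, content: str) -> list:
    """Build an argv that writes `content` to `path` and echoes it back."""
    script = (
        "from pathlib import Path\n"
        f"p = Path({str(path)!r})\n"
        f"p.write_text({content!r}, encoding='utf-8')\n"
        "print(p.read_text(encoding='utf-8'), end='')\n"
    )
    # No shell is involved, so quoting and redirection rules never apply.
    return [sys.executable, "-c", script]


def run_scenario(content: str) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        command = build_write_file_command(Path(tmp) / "out.txt", content)
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        # Log the exact command and result so any failure is attributable.
        print(f"command={command!r} exit={result.returncode} stdout={result.stdout!r}")
        return result.stdout
```

Embedding the payload with `repr()` keeps shell metacharacters such as `>`, `$`, and quotes inert.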
Ahmed Ibrahim	b02388672f	Stabilize Windows cmd-based shell test harnesses (#14958 ) ## What is flaky The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially: - `apply_patch_cli_can_use_shell_command_output_as_patch_input` - `websocket_test_codex_shell_chain` - `websocket_v2_test_codex_shell_chain` ## Why it was flaky These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. There were multiple independent sources of nondeterminism in that setup: - The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise. - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it. - Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout. - The `apply_patch` test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input. So the failures were driven by shell startup and output-shape variance, not by the `apply_patch` or websocket logic themselves. ## How this PR fixes it - Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly. - Use that override in the websocket shell-chain tests and in the `apply_patch` harness. - Change the nested Windows file read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script. - Run that nested PowerShell process with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`. 
## Why this fix fixes the flakiness The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior. --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:21:46 +00:00
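A sketch of how such a nested PowerShell read could be assembled. `-EncodedCommand` expects the script as Base64 over UTF-16LE text, which sidesteps `cmd` quoting entirely. The helper names and the exact script body are assumptions; only the flags and calls named in the PR description are taken from the source.

```python
import base64


def encode_powershell(script: str) -> str:
    """Encode a script for `powershell.exe -EncodedCommand` (Base64 of UTF-16LE)."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def build_read_command(path: str) -> list:
    """Build an argv that prints a file's contents with no progress noise."""
    quoted = path.replace("'", "''")  # escape single quotes for a PowerShell literal
    script = (
        "$ProgressPreference = 'SilentlyContinue'; "
        "[Console]::OutputEncoding = [System.Text.Encoding]::UTF8; "
        f"[Console]::Out.Write([System.IO.File]::ReadAllText('{quoted}'))"
    )
    return [
        "powershell.exe",
        "-NoProfile",
        "-NonInteractive",
        "-EncodedCommand",
        encode_powershell(script),
    ]


print(build_read_command(r"C:\tmp\source.txt"))
```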
Owen Lin	6ea041032b	fix(core): prevent hanging turn/start due to websocket warming issues (#14838 ) ## Description This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change, `initialize -> thread/start -> turn/start` could sit behind the prewarm for up to five minutes, so the client would not see `turn/started`, and even `turn/interrupt` would block because the turn had not actually started yet. Now, we: - set a (configurable) timeout of 15s for websocket startup time, exposed as `websocket_startup_timeout_ms` in config.toml - `turn/started` is sent immediately on `turn/start` even if the websocket is still connecting - `turn/interrupt` can be used to cancel a turn that is still waiting on the websocket warmup - the turn task will wait for the full 15s websocket warming timeout before falling back ## Why The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes. ## Configuring the websocket startup timeout Can set it in config.toml like this: ``` [model_providers.openai] supports_websockets = true websocket_connect_timeout_ms = 15000 ```	2026-03-17 10:07:46 -07:00
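The bounded-wait behavior can be sketched with `asyncio`. The real code is Rust; here the function names are hypothetical, and the fallback being another transport is an assumption, since the description only says the turn task falls back after the timeout.

```python
import asyncio


async def connect_or_fall_back(connect, timeout_ms, fall_back):
    """Wait up to `timeout_ms` for the websocket warmup, then fall back."""
    try:
        return await asyncio.wait_for(connect(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # The warmup hung: give up on it instead of blocking the turn.
        return await fall_back()


async def hanging_connect():
    await asyncio.sleep(60)  # simulates a prewarm that never completes
    return "websocket"


async def fallback_transport():
    return "fallback"


print(asyncio.run(connect_or_fall_back(hanging_connect, 50, fallback_transport)))
```

Because the wait is an ordinary cancellable task, an interrupt arriving during warmup can cancel it rather than queueing behind it.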
Ahmed Ibrahim	fbd7f9b986	[stack 2/4] Align main realtime v2 wire and runtime flow (#14830 ) ## Stack Position 2/4. Built on top of #14828. ## Base - #14828 ## Unblocks - #14829 - #14827 ## Scope - Port the realtime v2 wire parsing, session, app-server, and conversation runtime behavior onto the split websocket-method base. - Branch runtime behavior directly on the current realtime session kind instead of parser-derived flow flags. - Keep regression coverage in the existing e2e suites. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-16 21:38:07 -07:00
pakrym-oai	a3ba10b44b	Add exit helper to code mode scripts (#14851 ) - Summary - expose `exit` through the code mode bridge and module so scripts can stop mid-flight - surface the helper in the description documentation - add a regression test ensuring `exit()` terminates execution cleanly - Testing - Not run (not requested)	2026-03-16 22:07:58 +00:00
friel-openai	ba463a9dc7	Preserve background terminals on interrupt and rename cleanup command to /stop (#14602 ) ### Motivation - Interrupting a running turn (Ctrl+C / Esc) currently also terminates long‑running background shells, which is surprising for workflows like local dev servers or file watchers. - The existing cleanup command name was confusing; callers expect an explicit command to stop background terminals rather than a UI clear action. - Make background‑shell termination explicit and surface a clearer command name while preserving backward compatibility. ### Description - Renamed the background‑terminal cleanup slash command from `Clean` (`/clean`) to `Stop` (`/stop`) and kept `clean` as an alias in the command parsing/visibility layer, updated the user descriptions and command popup wiring accordingly. - Updated the unified‑exec footer text and snapshots to point to `/stop` (and trimmed corresponding snapshot output to match the new label). - Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt) no longer closes or clears tracked unified exec / background terminal processes in the TUI or core cleanup path; background shells are now preserved after an interrupt. - Updated protocol/docs to clarify that `turn/interrupt` (or `Op::Interrupt`) interrupts the active turn but does not terminate background terminals, and that `thread/backgroundTerminals/clean` is the explicit API to stop those shells. - Updated unit/integration tests and insta snapshots in the TUI and core unified‑exec suites to reflect the new semantics and command name. ### Testing - Ran formatting with `just fmt` in `codex-rs` (succeeded). - Ran `cargo test -p codex-protocol` (succeeded). 
- Attempted `cargo test -p codex-tui` but the build could not complete in this environment due to a native build dependency that requires `libcap` development headers (the `codex-linux-sandbox` vendored build step); install `libcap-dev` / make `libcap.pc` available in `PKG_CONFIG_PATH` to run the TUI test suite locally. - Updated and accepted the affected `insta` snapshots for the TUI changes so visual diffs reflect the new `/stop` wording and preserved interrupt behavior. ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)	2026-03-15 22:17:25 -07:00
Matthew Zeng	d4af6053e2	[apps] Improve search tool fallback. (#14732 ) - [x] Bypass tool search and stuff tool specs directly into model context when either a. Tool search is not available for the model or b. There are not that many tools to search for.	2026-03-15 21:41:55 -07:00
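The fallback rule reads as a simple predicate. In this sketch the threshold value and all names are hypothetical; the commit message only says "not that many tools".

```python
def should_inline_tool_specs(model_supports_tool_search: bool,
                             tool_count: int,
                             inline_threshold: int = 20) -> bool:
    """Decide whether to put tool specs directly into the model context."""
    # (a) the model cannot use tool search, or
    # (b) there are few enough tools that searching adds nothing.
    return (not model_supports_tool_search) or tool_count <= inline_threshold


print(should_inline_tool_specs(False, 500))
print(should_inline_tool_specs(True, 5))
print(should_inline_tool_specs(True, 500))
```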
Matthew Zeng	49edf311ac	[apps] Add tool call meta. (#14647 ) - [x] Add resource_uri and other things to _meta to shortcut resource lookup and speed things up.	2026-03-14 22:24:13 -07:00
Channing Conger	70eddad6b0	dynamic tool calls: add param `exposeToContext` to optionally hide tool (#14501) This extends dynamic_tool_calls so that a tool can be hidden from the model context while remaining usable through the general tool-calling runtime (for example, from js_repl/code_mode).	2026-03-14 01:58:43 -07:00
sayan-oai	d272f45058	move plugin/skill instructions into dev msg and reorder (#14609 ) Move the general `Apps`, `Skills` and `Plugins` instructions blocks out of `user_instructions` and into the developer message, with new `Apps -> Skills -> Plugins` order for better clarity. Also wrap those sections in stable XML-style instruction tags (like other sections) and update prompt-layout tests/snapshots. This makes the tests less brittle in snapshot output (we can parse the sections), and it consolidates the capability instructions in one place. #### Tests Updated snapshots, added tests. `<AGENTS_MD>` disappearing in snapshots is expected: before this change, the wrapped user-instructions message was kept alive by `Skills` content. Now that `Skills` and `Plugins` are in the developer message, that wrapper only appears when there is real project-doc/user-instructions content. --------- Co-authored-by: Charley Cunningham <ccunningham@openai.com>	2026-03-13 20:51:01 -07:00
Eric Traut	4b9d5c8c1b	Add openai_base_url config override for built-in provider (#12031) We regularly get bug reports from users who mistakenly have the `OPENAI_BASE_URL` environment variable set. This PR deprecates this environment variable in favor of a top-level config key `openai_base_url` that is used for the same purpose. By making it a config key, it will be more visible to users. It will also participate in all of the infrastructure we've added for layered and managed configs. Summary - introduce the `openai_base_url` top-level config key, update schema/tests, and route the built-in openai provider through it - fall back to the deprecated `OPENAI_BASE_URL` env var, with a deprecation warning, when no `openai_base_url` config key is present - update CLI, SDK, and TUI code to prefer the new config path (with a deprecated env-var fallback) and document the SDK behavior change	2026-03-13 20:12:25 -06:00
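The precedence described in the summary (config key first, deprecated env var second, with a warning) can be sketched as follows. The function name and the default URL constant are assumptions for illustration; the real resolution happens in the Rust config layer.

```python
import warnings

DEFAULT_BASE_URL = "https://api.openai.com/v1"  # assumed built-in default


def resolve_openai_base_url(config: dict, env: dict) -> str:
    """Prefer the `openai_base_url` config key, then the deprecated env var."""
    configured = config.get("openai_base_url")
    if configured:
        return configured
    legacy = env.get("OPENAI_BASE_URL")
    if legacy:
        warnings.warn(
            "OPENAI_BASE_URL is deprecated; set openai_base_url in config.toml",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy
    return DEFAULT_BASE_URL


print(resolve_openai_base_url({"openai_base_url": "https://proxy.example/v1"},
                              {"OPENAI_BASE_URL": "https://stale.example/v1"}))
```

When both are set, the config key wins silently, so a stale environment variable no longer changes behavior by accident.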
Andrei Eternal	9a44a7e499	[hooks] stop continuation & stop_hook_active mechanics (#14532 ) Stop hooks now receive `stop_hook_active` and enable stop hooks to loop forever if they'd like to. In the initial hooks PR, we implemented a simpler mechanic that the stop-blocking could only happen once in a row - support stop hook adding a continuation prompt to add a further task - if multiple stop-blocks happen that have continuation prompts, they are concatenated example run: ``` › hey :) • Running SessionStart hook: lighting the observatory SessionStart hook (completed) warning: Hi, I'm a session start hook for wizard-tower (startup). hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace • Aloha :) Happy to jam with you. What are we building today? • Running Stop hook: updating the guards Stop hook (blocked) warning: Wizard Tower Stop hook continuing conversation feedback: cook the stonpet • Aloha, here’s the hyperspace move for cooking a wimboltine stonpet: 1. Sear the stonpet in a hot pan with moon-oil until the edges shimmer. 2. Add star-lime, black salt, and a little fermented nebula paste. 3. Lower the heat and let it braise for 8 cosmic minutes with a splash of comet broth. 4. Finish with sky herbs and serve over warm asteroid rice. The vibe: crispy outside, tender center, deep interdimensional savor. • Running Stop hook: updating the guards Stop hook (completed) warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop. 
``` .codex/config.toml ``` [features] codex_hooks = true ``` .codex/hooks.json ``` { "hooks": { "SessionStart": [ { "matcher": "startup|resume", "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/session_start_demo.py", "timeoutSec": 10, "statusMessage": "lighting the observatory" } ] } ], "Stop": [ { "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/stop_demo_block.py", "timeoutSec": 10, "statusMessage": "updating the guards" } ] } ] } } ``` .codex/hooks/session_start_demo.py ``` #!/usr/bin/env python3 import json import sys from pathlib import Path def main() -> int: payload = json.load(sys.stdin) cwd = Path(payload.get("cwd", ".")).name or "wizard-tower" source = payload.get("source", "startup") source_label = "resume" if source == "resume" else "startup" source_prefix = ( "Resume context:" if source == "resume" else "Startup context:" ) output = { "systemMessage": ( f"Hi, I'm a session start hook for {cwd} ({source_label})." ), "hookSpecificOutput": { "hookEventName": "SessionStart", "additionalContext": ( f"{source_prefix} A wimboltine stonpet is an exotic cuisine from hyperspace" ), }, } print(json.dumps(output)) return 0 if __name__ == "__main__": raise SystemExit(main()) ``` .codex/hooks/stop_demo_block.py ``` #!/usr/bin/env python3 import json import sys def main() -> int: payload = json.load(sys.stdin) stop_hook_active = payload.get("stop_hook_active", False) last_assistant_message = payload.get("last_assistant_message") or "" char_count = len(last_assistant_message.strip()) if stop_hook_active: system_message = ( "Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop." 
) print(json.dumps({"systemMessage": system_message})) else: system_message = ( f"Wizard Tower Stop hook continuing conversation" ) print(json.dumps({"systemMessage": system_message, "decision": "block", "reason": "cook the stonpet"})) return 0 if __name__ == "__main__": raise SystemExit(main()) ```	2026-03-13 15:51:19 -07:00
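The rule that multiple blocking stop hooks have their continuation prompts concatenated could look roughly like the sketch below. It reuses the `decision` / `reason` output fields from the demo script above; the function name `combine_stop_hook_outputs` and the blank-line separator are assumptions, since the commit message does not say how the prompts are joined.

```python
from typing import Iterable, Mapping, Optional


def combine_stop_hook_outputs(
    outputs: Iterable[Mapping[str, str]],
) -> Optional[str]:
    """Return the combined continuation prompt, or None if no hook blocked."""
    reasons = [
        out["reason"]
        for out in outputs
        # Only hooks that block *and* supply a reason contribute a prompt.
        if out.get("decision") == "block" and out.get("reason")
    ]
    # Assumption: prompts are joined with a blank line, in hook order.
    return "\n\n".join(reasons) if reasons else None
```

A `None` result means the turn may stop; any other value would be fed back to the model as the next task.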
Charley Cunningham	bc24017d64	Add Smart Approvals guardian review across core, app-server, and TUI (#13860 ) ## Summary - add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime control for who reviews approval requests - route Smart Approvals guardian review through core for command execution, file changes, managed-network approvals, MCP approvals, and delegated/subagent approval flows - expose guardian review in app-server with temporary unstable `item/autoApprovalReview/{started,completed}` notifications carrying `targetItemId`, `review`, and `action` - update the TUI so Smart Approvals can be enabled from `/experimental`, aligned with the matching `/approvals` mode, and surfaced clearly while reviews are pending or resolved ## Runtime model This PR does not introduce a new `approval_policy`. Instead: - `approval_policy` still controls when approval is needed - `approvals_reviewer` controls who reviewable approval requests are routed to: - `user` - `guardian_subagent` `guardian_subagent` is a carefully prompted reviewer subagent that gathers relevant context and applies a risk-based decision framework before approving or denying the request. The `smart_approvals` feature flag is a rollout/UI gate. Core runtime behavior keys off `approvals_reviewer`. When Smart Approvals is enabled from the TUI, it also switches the current `/approvals` settings to the matching Smart Approvals mode so users immediately see guardian review in the active thread: - `approval_policy = on-request` - `approvals_reviewer = guardian_subagent` - `sandbox_mode = workspace-write` Users can still change `/approvals` afterward. 
Config-load behavior stays intentionally narrow: - plain `smart_approvals = true` in `config.toml` remains just the rollout/UI gate and does not auto-set `approvals_reviewer` - the deprecated `guardian_approval = true` alias migration does backfill `approvals_reviewer = "guardian_subagent"` in the same scope when that reviewer is not already configured there, so old configs preserve their original guardian-enabled behavior ARC remains a separate safety check. For MCP tool approvals, ARC escalations now flow into the configured reviewer instead of always bypassing guardian and forcing manual review. ## Config stability The runtime reviewer override is stable, but the config-backed app-server protocol shape is still settling. - `thread/start`, `thread/resume`, and `turn/start` keep stable `approvalsReviewer` overrides - the config-backed `approvals_reviewer` exposure returned via `config/read` (including profile-level config) is now marked `[UNSTABLE]` / experimental in the app-server protocol until we are more confident in that config surface ## App-server surface This PR intentionally keeps the guardian app-server shape narrow and temporary. It adds generic unstable lifecycle notifications: - `item/autoApprovalReview/started` - `item/autoApprovalReview/completed` with payloads of the form: - `{ threadId, turnId, targetItemId, review, action? }` `review` is currently: - `{ status, riskScore?, riskLevel?, rationale? }` - where `status` is one of `inProgress`, `approved`, `denied`, or `aborted` `action` carries the guardian action summary payload from core when available. This lets clients render temporary standalone pending-review UI, including parallel reviews, even when the underlying tool item has not been emitted yet. These notifications are explicitly documented as `[UNSTABLE]` and expected to change soon. This PR does not persist guardian review state onto `thread/read` tool items. 
The intended follow-up is to attach guardian review state to the reviewed tool item lifecycle instead, which would improve consistency with manual approvals and allow thread history / reconnect flows to replay guardian review state directly. ## TUI behavior - `/experimental` exposes the rollout gate as `Smart Approvals` - enabling it in the TUI enables the feature and switches the current session to the matching Smart Approvals `/approvals` mode - disabling it in the TUI clears the persisted `approvals_reviewer` override when appropriate and returns the session to default manual review when the effective reviewer changes - `/approvals` still exposes the reviewer choice directly - the TUI renders: - pending guardian review state in the live status footer, including parallel review aggregation - resolved approval/denial state in history ## Scope notes This PR includes the supporting core/runtime work needed to make Smart Approvals usable end-to-end: - shell / unified-exec / apply_patch / managed-network / MCP guardian review - delegated/subagent approval routing into guardian review - guardian review risk metadata and action summaries for app-server/TUI - config/profile/TUI handling for `smart_approvals`, `guardian_approval` alias migration, and `approvals_reviewer` - a small internal cleanup of delegated approval forwarding to dedupe fallback paths and simplify guardian-vs-parent approval waiting (no intended behavior change) Out of scope for this PR: - redesigning the existing manual approval protocol shapes - persisting guardian review state onto app-server `ThreadItem`s - delegated MCP elicitation auto-review (the current delegated MCP guardian shim only covers the legacy `RequestUserInput` path) --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-13 15:27:00 -07:00
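The config-load rule from #13860 (the deprecated alias backfills the reviewer, the plain feature flag does not) can be illustrated with a small sketch. The function name `migrate_guardian_alias` and the dict representation of one config scope are hypothetical, and the sketch leaves the alias key in place because the commit message does not say what the real migration does with it.

```python
from typing import Any, Dict


def migrate_guardian_alias(scope: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one config scope with the reviewer backfilled."""
    migrated = dict(scope)

    # Deprecated alias: backfill the reviewer in the same scope, but only
    # when that scope has not already configured one.
    if migrated.get("guardian_approval") is True:
        migrated.setdefault("approvals_reviewer", "guardian_subagent")

    # Plain `smart_approvals = true` is only the rollout/UI gate, so it is
    # deliberately not consulted here.
    return migrated
```

Operating on a copy keeps each scope's migration independent, which matches the "in the same scope" wording above.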
Charley Cunningham	e3cbf913e8	Fix wait_agent expectations in core tests (#14637 ) ## Summary - update stale core tool-spec expectations from `wait` to `wait_agent` - update the prompt-caching tool-name assertion to match the renamed tool - fix the Bazel regressions introduced after #14631 renamed the multi-agent wait tool ## Testing - cargo test -p codex-core tools::spec::tests - cargo test -p codex-core suite::prompt_caching::prompt_tools_are_consistent_across_requests Co-authored-by: Codex <noreply@openai.com>	2026-03-13 15:15:59 -07:00
pakrym-oai	cb7d8f45a1	Normalize MCP tool names to code-mode safe form (#14605 ) Code mode doesn't allow `-` in names and it's better if function names and code-mode names are the same.	2026-03-13 14:50:16 -07:00
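One way to picture the normalization in #14605 is the sketch below. The commit only states that `-` is not allowed in code-mode names; replacing every non-identifier character with `_` and prefixing a leading digit are assumptions for illustration, as is the function name `normalize_tool_name`.

```python
import re

# Anything outside the plain ASCII identifier alphabet is treated as unsafe.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_$]")


def normalize_tool_name(name: str) -> str:
    """Map an MCP tool name to a form usable as a JavaScript identifier."""
    safe = _UNSAFE_CHARS.sub("_", name)
    # JavaScript identifiers cannot start with a digit.
    if safe and safe[0].isdigit():
        safe = "_" + safe
    return safe
```

Using the same normalized name for both the function name and the code-mode name avoids keeping a separate mapping between the two.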
Ahmed Ibrahim	36dfb84427	Stabilize multi-agent feature flag (#14622 ) - make multi_agent stable and enabled by default - update feature and tool-spec coverage to match the new default --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-13 14:38:15 -07:00
pakrym-oai	477a2dd345	Add code_mode_only feature (#14617 ) Summary - add the code_mode_only feature flag/config schema and wire its dependency on code_mode - update code mode tool descriptions to list nested tools with detailed headers - restrict available tools for prompt and exec descriptions when code_mode_only is enabled and test the behavior Testing - Not run (not requested)	2026-03-13 13:30:19 -07:00
sayan-oai	9f2da5a9ce	chore: clarify plugin + app copy in model instructions (#14541 ) - clarify app mentions are in user messages - clarify what it means for tools to be provided via `codex_apps` MCP - add plugin descriptions (with basic sanitization) to top-level `## Plugins` section alongside the corresponding plugin names - explain that skills from plugins are prefixed with `plugin_name:` in top-level `## Plugins` section Changes to more logically organize `Apps`, `Skills`, and `Plugins` instructions will be in a separate PR, as that shuffles dev + user instructions in ways that change tests broadly. ### Tests confirmed in local rollout, some new tests.	2026-03-13 10:57:41 -07:00
Jack Mousseau	59b588b8ec	Improve granular approval policy prompt (#14553 )	2026-03-13 10:42:17 -07:00
Won Park	958f93f899	sending imagegencall response back to responseapi (#14558 ) Sending back the ResponseItem::ImageGenerationCall as is, because it is now supported from the API-side.	2026-03-13 17:29:19 +00:00
iceweasel-oai	6b3d82daca	Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400 ) ## Summary - launch Windows sandboxed children on a private desktop instead of `Winsta0\Default` - make private desktop the default while keeping `windows.sandbox_private_desktop=false` as the escape hatch - centralize process launch through the shared `create_process_as_user(...)` path - scope the private desktop ACL to the launching logon SID ## Why Today sandboxed Windows commands run on the visible shared desktop. That leaves an avoidable same-desktop attack surface for window interaction, spoofing, and related UI/input issues. This change moves sandboxed commands onto a dedicated per-launch desktop by default so the sandbox no longer shares `Winsta0\Default` with the user session. The implementation stays conservative on security, with no silent fallback to `Winsta0\Default`. If private-desktop setup fails on a machine, users can still opt out explicitly with `windows.sandbox_private_desktop=false`. ## Validation - `cargo build -p codex-cli` - elevated-path `codex exec` desktop-name probe returned `CodexSandboxDesktop-*` - elevated-path `codex exec` smoke sweep for shell commands, nested `pwsh`, jobs, and hidden `notepad` launch - unelevated-path full private-desktop compatibility sweep via `codex exec` with `-c windows.sandbox=unelevated`	2026-03-13 10:13:39 -07:00
pakrym-oai	9c9867c9fa	code mode: single line tool declarations (#14526 ) ## Summary - render code mode tool declarations as single-line TypeScript snippets - make the JSON schema renderer emit inline object shapes for these declarations - update code mode/spec expectations to match the new inline rendering ## Testing - `just fmt` - `cargo test -p codex-core render_json_schema_to_typescript` - `cargo test -p codex-core code_mode_augments_` - `cargo test -p codex-core --test all exports_all_tools_metadata -- --nocapture`	2026-03-13 10:08:34 -07:00
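To make "single-line TypeScript snippets" with "inline object shapes" concrete, here is a minimal sketch of rendering a JSON schema on one line. It is not the Codex renderer (which lives in Rust): the function names, the `declare function ... Promise<unknown>` wrapper, and the handling of unsupported schema kinds are all assumptions.

```python
from typing import Any, Mapping

_PRIMITIVES = {
    "string": "string",
    "number": "number",
    "integer": "number",
    "boolean": "boolean",
    "null": "null",
}


def render_type(schema: Mapping[str, Any]) -> str:
    """Render a (small subset of) JSON schema as an inline TypeScript type."""
    kind = schema.get("type")
    if isinstance(kind, str) and kind in _PRIMITIVES:
        return _PRIMITIVES[kind]
    if kind == "array":
        item_type = render_type(schema.get("items", {}))
        return f"Array<{item_type}>"
    if kind == "object":
        required = set(schema.get("required", []))
        fields = []
        for name, prop in schema.get("properties", {}).items():
            optional = "" if name in required else "?"
            fields.append(f"{name}{optional}: {render_type(prop)}")
        # Inline object shape: everything stays on one line.
        return "{ " + "; ".join(fields) + " }" if fields else "{}"
    # Anything this sketch does not understand degrades to `unknown`.
    return "unknown"


def render_declaration(tool_name: str, input_schema: Mapping[str, Any]) -> str:
    """Render one tool as a single-line declaration (hypothetical format)."""
    return (
        f"declare function {tool_name}"
        f"(args: {render_type(input_schema)}): Promise<unknown>;"
    )
```

For a schema with a required string `id` and an optional string array `tags`, `render_declaration("get_item", schema)` yields `declare function get_item(args: { id: string; tags?: Array<string> }): Promise<unknown>;`.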
Ahmed Ibrahim	c7e847aaeb	Add diagnostics for read_only_unless_trusted timeout flake (#14518 ) ## Summary - add targeted diagnostic logging for the read_only_unless_trusted_requires_approval scenarios in approval_matrix_covers_all_modes - add a scoped timeout buffer only for ro_unless_trusted write-file scenarios: 1000ms -> 2000ms - keep all other write-file scenarios at 1000ms ## Why The last two main failures were both in codex-core::all suite::approvals::approval_matrix_covers_all_modes with exit_code=124 in the same scenario. This points to execution-time jitter in CI rather than a semantic approval-policy mismatch. ## Notes - This does not introduce any >5s timeout and does not disable/quarantine tests. - The timeout increase is tightly scoped to the single flaky path and keeps the matrix deterministic under CI scheduling variance.	2026-03-12 23:51:03 -07:00

797 Commits