codex

mirror of https://github.com/openai/codex.git synced 2026-04-29 17:06:51 +00:00

Author	SHA1	Message	Date
Owen Lin	20f2a216df	feat(core, tracing): create turn spans over websockets (#14632 ) ## Description Dependent on: - [responsesapi] https://github.com/openai/openai/pull/760991 - [codex-backend] https://github.com/openai/openai/pull/760985 `codex app-server -> codex-backend -> responsesapi` now reuses a persistent websocket connection across many turns. This PR updates tracing when using websockets so that each `response.create` websocket request propagates the current tracing context, so we can get a holistic end-to-end trace for each turn. Tracing is propagated via special keys (`ws_request_header_traceparent`, `ws_request_header_tracestate`) set in the `client_metadata` param in Responses API. Currently tracing on websockets is a bit broken because we only set tracing context on ws connection time, so it's detached from a `turn/start` request.	2026-03-19 03:41:06 +00:00
pakrym-oai	56d0c6bf67	Add apply_patch code mode result (#15100 ) It's empty !	2026-03-18 16:11:10 -07:00
pakrym-oai	3590e181fa	Add update_plan code mode result (#15103 ) It's empty!	2026-03-18 16:10:51 -07:00
Charley Cunningham	ebbbc52ce4	Align SQLite feedback logs with feedback formatter (#13494 ) ## Summary - store a pre-rendered `feedback_log_body` in SQLite so `/feedback` exports keep span prefixes and structured event fields - render SQLite feedback exports with timestamps and level prefixes to match the old in-memory feedback formatter, while preserving existing trailing newlines - count `feedback_log_body` in the SQLite retention budget so structured or span-prefixed rows still prune correctly - bound `/feedback` row loading in SQL with the retention estimate, then apply exact whole-line truncation in Rust so uploads stay capped without splitting lines ## Details - add a `feedback_log_body` column to `logs` and backfill it from `message` for existing rows - capture span names plus formatted span and event fields at write time, since SQLite does not retain enough structure to reconstruct the old formatter later - keep SQLite feedback queries scoped to the requested thread plus same-process threadless rows - restore a SQL-side cumulative `estimated_bytes` cap for feedback export queries so over-retained partitions do not load every matching row before truncation - add focused formatting coverage for exported feedback lines and parity coverage against `tracing_subscriber` ## Testing - cargo test -p codex-state - just fix -p codex-state - just fmt codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 22:44:31 +00:00
Ahmed Ibrahim	7b37a0350f	Add final message prefix to realtime handoff output (#15077 ) - prefix realtime handoff output with the agent final message label for both realtime v1 and v2 - update realtime websocket and core expectations to match	2026-03-18 15:19:49 -07:00
pakrym-oai	5cada46ddf	Return image URL from view_image tool (#15072 ) Cleanup image semantics in code mode. `view_image` now returns `{image_url:string, details?: string}` `image()` now allows both string parameter and `{image_url:string, details?: string}`	2026-03-18 13:58:20 -07:00
pakrym-oai	88e5382fc4	Propagate tool errors to code mode (#15075 ) Clean up error flow to push the FunctionCallError all the way up to dispatcher and allow code mode to surface as exception.	2026-03-18 13:57:55 -07:00
pakrym-oai	606d85055f	Add notify to code-mode (#14842 ) Allows model to send an out-of-band notification. The notification is injected as another tool call output for the same call_id.	2026-03-18 09:37:13 -07:00
Dylan Hurd	84f4e7b39d	fix(subagents) share execpolicy by default (#13702 ) ## Summary If a subagent requests approval, and the user persists that approval to the execpolicy, it should (by default) propagate. We'll need to rethink this a bit in light of coming Permissions changes, though I think this is closer to the end state that we'd want, which is that execpolicy changes to one permissions profile should be synced across threads. ## Testing - [x] Added integration test --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 06:42:26 +00:00
Andrei Eternal	6fef421654	[hooks] userpromptsubmit - hook before user's prompt is executed (#14626 ) - this allows blocking the user's prompts from executing, and also prevents them from entering history - handles the edge case where you can both prevent the user's prompt AND add n amount of additionalContexts - refactors some old code into common.rs where hooks overlap functionality - refactors additionalContext being previously added to user messages, instead we use developer messages for them - handles queued messages correctly Sample hook for testing - if you write "[block-user-submit]" this hook will stop the thread: example run ``` › sup • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: sup hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' exactly once near the end. • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory lanterns lit › and [block-user-submit] • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (stopped) warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose. stop: Wizard Tower demo block: remove [block-user-submit] to continue. ``` .codex/config.toml ``` [features] codex_hooks = true ``` .codex/hooks.json ``` { "hooks": { "UserPromptSubmit": [ { "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py", "timeoutSec": 10, "statusMessage": "reading the observatory notes" } ] } ] } } ``` .codex/hooks/user_prompt_submit_demo.py ``` #!/usr/bin/env python3 import json import sys from pathlib import Path def prompt_from_payload(payload: dict) -> str: prompt = payload.get("prompt") if isinstance(prompt, str) and prompt.strip(): return prompt.strip() event = payload.get("event") if isinstance(event, dict): user_prompt = event.get("user_prompt") if isinstance(user_prompt, str): return user_prompt.strip() return "" def main() -> int: payload = json.load(sys.stdin) prompt = prompt_from_payload(payload) cwd = Path(payload.get("cwd", ".")).name or "wizard-tower" if "[block-user-submit]" in prompt: print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo blocked the prompt on purpose." ), "decision": "block", "reason": ( "Wizard Tower demo block: remove [block-user-submit] to continue." ), } ) ) return 0 prompt_preview = prompt or "(empty prompt)" if len(prompt_preview) > 80: prompt_preview = f"{prompt_preview[:77]}..." print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}" ), "hookSpecificOutput": { "hookEventName": "UserPromptSubmit", "additionalContext": ( "Wizard Tower UserPromptSubmit demo fired. " "For this reply only, include the exact phrase " "'observatory lanterns lit' exactly once near the end." ), }, } ) ) return 0 if __name__ == "__main__": raise SystemExit(main()) ```	2026-03-17 22:09:22 -07:00
Ahmed Ibrahim	3ce879c646	Handle realtime conversation end in the TUI (#14903 ) - close live realtime sessions on errors, ctrl-c, and active meter removal - centralize TUI realtime cleanup and avoid duplicate follow-up close info --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 21:04:58 -07:00
pakrym-oai	770616414a	Prefer websockets when providers support them (#13592 ) Remove all flags and model settings. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 19:46:44 -07:00
Ahmed Ibrahim	98be562fd3	Unify realtime shutdown in core (#14902 ) - route realtime startup, input, and transport failures through a single shutdown path - emit one realtime error/closed lifecycle while clearing session state once --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 15:58:52 -07:00
Ahmed Ibrahim	c6ab4ee537	Gate realtime audio interruption logic to v2 (#14984 ) - thread the realtime version into conversation start and app-server notifications - keep playback-aware mic gating and playback interruption behavior on v2 only, leaving v1 on the legacy path	2026-03-17 15:24:37 -07:00
pakrym-oai	ee756eb80f	Rename exec_wait tool to wait (#14983 ) Summary - document that code mode only exposes `exec` and the renamed `wait` tool - update code mode tool spec and descriptions to match the new tool name - rename tests and helper references from `exec_wait` to `wait` Testing - Not run (not requested)	2026-03-17 14:22:26 -07:00
Ahmed Ibrahim	4d9d4b7b0f	Stabilize approval matrix write-file command (#14968 ) ## What is flaky The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in CI even though the approval logic is unchanged, because the test delegates the file write and readback to shell parsing instead of deterministic file I/O. ## Why it was flaky The test generated a command shaped like `printf ... > file && cat file`. That means the scenario depended on shell quoting, redirection, newline handling, and encoding behavior in addition to the approval system it was actually trying to validate. If the shell interpreted the payload differently, the test would report an approval failure even though the product logic was fine. That also made failures hard to diagnose, because the test did not log the exact generated command or the parsed result payload. ## How this PR fixes it This PR replaces the shell-redirection path with a deterministic `python3 -c` script that writes the file with `Path.write_text(..., encoding='utf-8')` and then reads it back with the same UTF-8 path. It also logs the generated command and the resulting exit code/stdout for the approval scenario so any future failure is directly attributable. ## Why this fix fixes the flakiness The scenario no longer depends on shell parsing and redirection semantics. The file contents are produced and read through explicit UTF-8 file I/O, so the approval test is measuring approval behavior instead of shell behavior. The added diagnostics mean a future failure will show the exact command/result pair instead of looking like a generic intermittent mismatch. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 13:52:36 -07:00
Ahmed Ibrahim	b02388672f	Stabilize Windows cmd-based shell test harnesses (#14958 ) ## What is flaky The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially: - `apply_patch_cli_can_use_shell_command_output_as_patch_input` - `websocket_test_codex_shell_chain` - `websocket_v2_test_codex_shell_chain` ## Why it was flaky These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. There were multiple independent sources of nondeterminism in that setup: - The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise. - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it. - Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout. - The `apply_patch` test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input. So the failures were driven by shell startup and output-shape variance, not by the `apply_patch` or websocket logic themselves. ## How this PR fixes it - Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly. - Use that override in the websocket shell-chain tests and in the `apply_patch` harness. - Change the nested Windows file read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script. - Run that nested PowerShell process with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`. ## Why this fix fixes the flakiness The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior. --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:21:46 +00:00
Owen Lin	6ea041032b	fix(core): prevent hanging turn/start due to websocket warming issues (#14838 ) ## Description This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change, `initialize -> thread/start -> turn/start` could sit behind the prewarm for up to five minutes, so the client would not see `turn/started`, and even `turn/interrupt` would block because the turn had not actually started yet. Now, we: - set a (configurable) timeout of 15s for websocket startup time, exposed as `websocket_startup_timeout_ms` in config.toml - `turn/started` is sent immediately on `turn/start` even if the websocket is still connecting - `turn/interrupt` can be used to cancel a turn that is still waiting on the websocket warmup - the turn task will wait for the full 15s websocket warming timeout before falling back ## Why The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes. ## Configuring the websocket startup timeout Can set it in config.toml like this: ``` [model_providers.openai] supports_websockets = true websocket_connect_timeout_ms = 15000 ```	2026-03-17 10:07:46 -07:00
Ahmed Ibrahim	fbd7f9b986	[stack 2/4] Align main realtime v2 wire and runtime flow (#14830 ) ## Stack Position 2/4. Built on top of #14828. ## Base - #14828 ## Unblocks - #14829 - #14827 ## Scope - Port the realtime v2 wire parsing, session, app-server, and conversation runtime behavior onto the split websocket-method base. - Branch runtime behavior directly on the current realtime session kind instead of parser-derived flow flags. - Keep regression coverage in the existing e2e suites. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-16 21:38:07 -07:00
pakrym-oai	a3ba10b44b	Add exit helper to code mode scripts (#14851 ) - Summary - expose `exit` through the code mode bridge and module so scripts can stop mid-flight - surface the helper in the description documentation - add a regression test ensuring `exit()` terminates execution cleanly - Testing - Not run (not requested)	2026-03-16 22:07:58 +00:00
friel-openai	ba463a9dc7	Preserve background terminals on interrupt and rename cleanup command to /stop (#14602 ) ### Motivation - Interrupting a running turn (Ctrl+C / Esc) currently also terminates long‑running background shells, which is surprising for workflows like local dev servers or file watchers. - The existing cleanup command name was confusing; callers expect an explicit command to stop background terminals rather than a UI clear action. - Make background‑shell termination explicit and surface a clearer command name while preserving backward compatibility. ### Description - Renamed the background‑terminal cleanup slash command from `Clean` (`/clean`) to `Stop` (`/stop`) and kept `clean` as an alias in the command parsing/visibility layer, updated the user descriptions and command popup wiring accordingly. - Updated the unified‑exec footer text and snapshots to point to `/stop` (and trimmed corresponding snapshot output to match the new label). - Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt) no longer closes or clears tracked unified exec / background terminal processes in the TUI or core cleanup path; background shells are now preserved after an interrupt. - Updated protocol/docs to clarify that `turn/interrupt` (or `Op::Interrupt`) interrupts the active turn but does not terminate background terminals, and that `thread/backgroundTerminals/clean` is the explicit API to stop those shells. - Updated unit/integration tests and insta snapshots in the TUI and core unified‑exec suites to reflect the new semantics and command name. ### Testing - Ran formatting with `just fmt` in `codex-rs` (succeeded). - Ran `cargo test -p codex-protocol` (succeeded). - Attempted `cargo test -p codex-tui` but the build could not complete in this environment due to a native build dependency that requires `libcap` development headers (the `codex-linux-sandbox` vendored build step); install `libcap-dev` / make `libcap.pc` available in `PKG_CONFIG_PATH` to run the TUI test suite locally. - Updated and accepted the affected `insta` snapshots for the TUI changes so visual diffs reflect the new `/stop` wording and preserved interrupt behavior. ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)	2026-03-15 22:17:25 -07:00
Matthew Zeng	d4af6053e2	[apps] Improve search tool fallback. (#14732 ) - [x] Bypass tool search and stuff tool specs directly into model context when either a. Tool search is not available for the model or b. There are not that many tools to search for.	2026-03-15 21:41:55 -07:00
Matthew Zeng	49edf311ac	[apps] Add tool call meta. (#14647 ) - [x] Add resource_uri and other things to _meta to shortcut resource lookup and speed things up.	2026-03-14 22:24:13 -07:00
Channing Conger	70eddad6b0	dynamic tool calls: add param `exposeToContext` to optionally hide tool (#14501 ) This extends dynamic_tool_calls to allow us to hide a tool from the model context but still use it as part of the general tool calling runtime (for ex from js_repl/code_mode)	2026-03-14 01:58:43 -07:00
sayan-oai	d272f45058	move plugin/skill instructions into dev msg and reorder (#14609 ) Move the general `Apps`, `Skills` and `Plugins` instructions blocks out of `user_instructions` and into the developer message, with new `Apps -> Skills -> Plugins` order for better clarity. Also wrap those sections in stable XML-style instruction tags (like other sections) and update prompt-layout tests/snapshots. This makes the tests less brittle in snapshot output (we can parse the sections), and it consolidates the capability instructions in one place. #### Tests Updated snapshots, added tests. `<AGENTS_MD>` disappearing in snapshots is expected: before this change, the wrapped user-instructions message was kept alive by `Skills` content. Now that `Skills` and `Plugins` are in the developer message, that wrapper only appears when there is real project-doc/user-instructions content. --------- Co-authored-by: Charley Cunningham <ccunningham@openai.com>	2026-03-13 20:51:01 -07:00
Eric Traut	4b9d5c8c1b	Add openai_base_url config override for built-in provider (#12031 ) We regularly get bug reports from users who mistakenly have the `OPENAI_BASE_URL` environment variable set. This PR deprecates this environment variable in favor of a top-level config key `openai_base_url` that is used for the same purpose. By making it a config key, it will be more visible to users. It will also participate in all of the infrastructure we've added for layered and managed configs. Summary - introduce the `openai_base_url` top-level config key, update schema/tests, and route the built-in openai provider through it while - fall back to deprecated `OPENAI_BASE_URL` env var but warn user of deprecation when no `openai_base_url` config key is present - update CLI, SDK, and TUI code to prefer the new config path (with a deprecated env-var fallback) and document the SDK behavior change	2026-03-13 20:12:25 -06:00
Andrei Eternal	9a44a7e499	[hooks] stop continuation & stop_hook_active mechanics (#14532 ) Stop hooks now receive `stop_hook_active` and enable stop hooks to loop forever if they'd like to. In the initial hooks PR, we implemented a simpler mechanic that the stop-blocking could only happen once in a row - support stop hook adding a continuation prompt to add a further task - if multiple stop-blocks happen that have continuation prompts, they are concatenated example run: ``` › hey :) • Running SessionStart hook: lighting the observatory SessionStart hook (completed) warning: Hi, I'm a session start hook for wizard-tower (startup). hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace • Aloha :) Happy to jam with you. What are we building today? • Running Stop hook: updating the guards Stop hook (blocked) warning: Wizard Tower Stop hook continuing conversation feedback: cook the stonpet • Aloha, here’s the hyperspace move for cooking a wimboltine stonpet: 1. Sear the stonpet in a hot pan with moon-oil until the edges shimmer. 2. Add star-lime, black salt, and a little fermented nebula paste. 3. Lower the heat and let it braise for 8 cosmic minutes with a splash of comet broth. 4. Finish with sky herbs and serve over warm asteroid rice. The vibe: crispy outside, tender center, deep interdimensional savor. • Running Stop hook: updating the guards Stop hook (completed) warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop. ``` .codex/config.toml ``` [features] codex_hooks = true ``` .codex/hooks.json ``` { "hooks": { "SessionStart": [ { "matcher": "startup\|resume", "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/session_start_demo.py", "timeoutSec": 10, "statusMessage": "lighting the observatory" } ] } ], "Stop": [ { "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/stop_demo_block.py", "timeoutSec": 10, "statusMessage": "updating the guards" } ] } ] } } ``` .codex/hooks/session_start_demo.py ``` #!/usr/bin/env python3 import json import sys from pathlib import Path def main() -> int: payload = json.load(sys.stdin) cwd = Path(payload.get("cwd", ".")).name or "wizard-tower" source = payload.get("source", "startup") source_label = "resume" if source == "resume" else "startup" source_prefix = ( "Resume context:" if source == "resume" else "Startup context:" ) output = { "systemMessage": ( f"Hi, I'm a session start hook for {cwd} ({source_label})." ), "hookSpecificOutput": { "hookEventName": "SessionStart", "additionalContext": ( f"{source_prefix} A wimboltine stonpet is an exotic cuisine from hyperspace" ), }, } print(json.dumps(output)) return 0 if __name__ == "__main__": raise SystemExit(main()) ``` .codex/hooks/stop_demo_block.py ``` #!/usr/bin/env python3 import json import sys def main() -> int: payload = json.load(sys.stdin) stop_hook_active = payload.get("stop_hook_active", False) last_assistant_message = payload.get("last_assistant_message") or "" char_count = len(last_assistant_message.strip()) if stop_hook_active: system_message = ( "Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop." ) print(json.dumps({"systemMessage": system_message})) else: system_message = ( f"Wizard Tower Stop hook continuing conversation" ) print(json.dumps({"systemMessage": system_message, "decision": "block", "reason": "cook the stonpet"})) return 0 if __name__ == "__main__": raise SystemExit(main()) ```	2026-03-13 15:51:19 -07:00
Charley Cunningham	bc24017d64	Add Smart Approvals guardian review across core, app-server, and TUI (#13860 ) ## Summary - add `approvals_reviewer = "user" \| "guardian_subagent"` as the runtime control for who reviews approval requests - route Smart Approvals guardian review through core for command execution, file changes, managed-network approvals, MCP approvals, and delegated/subagent approval flows - expose guardian review in app-server with temporary unstable `item/autoApprovalReview/{started,completed}` notifications carrying `targetItemId`, `review`, and `action` - update the TUI so Smart Approvals can be enabled from `/experimental`, aligned with the matching `/approvals` mode, and surfaced clearly while reviews are pending or resolved ## Runtime model This PR does not introduce a new `approval_policy`. Instead: - `approval_policy` still controls when approval is needed - `approvals_reviewer` controls who reviewable approval requests are routed to: - `user` - `guardian_subagent` `guardian_subagent` is a carefully prompted reviewer subagent that gathers relevant context and applies a risk-based decision framework before approving or denying the request. The `smart_approvals` feature flag is a rollout/UI gate. Core runtime behavior keys off `approvals_reviewer`. When Smart Approvals is enabled from the TUI, it also switches the current `/approvals` settings to the matching Smart Approvals mode so users immediately see guardian review in the active thread: - `approval_policy = on-request` - `approvals_reviewer = guardian_subagent` - `sandbox_mode = workspace-write` Users can still change `/approvals` afterward. Config-load behavior stays intentionally narrow: - plain `smart_approvals = true` in `config.toml` remains just the rollout/UI gate and does not auto-set `approvals_reviewer` - the deprecated `guardian_approval = true` alias migration does backfill `approvals_reviewer = "guardian_subagent"` in the same scope when that reviewer is not already configured there, so old configs preserve their original guardian-enabled behavior ARC remains a separate safety check. For MCP tool approvals, ARC escalations now flow into the configured reviewer instead of always bypassing guardian and forcing manual review. ## Config stability The runtime reviewer override is stable, but the config-backed app-server protocol shape is still settling. - `thread/start`, `thread/resume`, and `turn/start` keep stable `approvalsReviewer` overrides - the config-backed `approvals_reviewer` exposure returned via `config/read` (including profile-level config) is now marked `[UNSTABLE]` / experimental in the app-server protocol until we are more confident in that config surface ## App-server surface This PR intentionally keeps the guardian app-server shape narrow and temporary. It adds generic unstable lifecycle notifications: - `item/autoApprovalReview/started` - `item/autoApprovalReview/completed` with payloads of the form: - `{ threadId, turnId, targetItemId, review, action? }` `review` is currently: - `{ status, riskScore?, riskLevel?, rationale? }` - where `status` is one of `inProgress`, `approved`, `denied`, or `aborted` `action` carries the guardian action summary payload from core when available. This lets clients render temporary standalone pending-review UI, including parallel reviews, even when the underlying tool item has not been emitted yet. These notifications are explicitly documented as `[UNSTABLE]` and expected to change soon. This PR does not persist guardian review state onto `thread/read` tool items. The intended follow-up is to attach guardian review state to the reviewed tool item lifecycle instead, which would improve consistency with manual approvals and allow thread history / reconnect flows to replay guardian review state directly. ## TUI behavior - `/experimental` exposes the rollout gate as `Smart Approvals` - enabling it in the TUI enables the feature and switches the current session to the matching Smart Approvals `/approvals` mode - disabling it in the TUI clears the persisted `approvals_reviewer` override when appropriate and returns the session to default manual review when the effective reviewer changes - `/approvals` still exposes the reviewer choice directly - the TUI renders: - pending guardian review state in the live status footer, including parallel review aggregation - resolved approval/denial state in history ## Scope notes This PR includes the supporting core/runtime work needed to make Smart Approvals usable end-to-end: - shell / unified-exec / apply_patch / managed-network / MCP guardian review - delegated/subagent approval routing into guardian review - guardian review risk metadata and action summaries for app-server/TUI - config/profile/TUI handling for `smart_approvals`, `guardian_approval` alias migration, and `approvals_reviewer` - a small internal cleanup of delegated approval forwarding to dedupe fallback paths and simplify guardian-vs-parent approval waiting (no intended behavior change) Out of scope for this PR: - redesigning the existing manual approval protocol shapes - persisting guardian review state onto app-server `ThreadItem`s - delegated MCP elicitation auto-review (the current delegated MCP guardian shim only covers the legacy `RequestUserInput` path) --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-13 15:27:00 -07:00
Charley Cunningham	e3cbf913e8	Fix wait_agent expectations in core tests (#14637 ) ## Summary - update stale core tool-spec expectations from `wait` to `wait_agent` - update the prompt-caching tool-name assertion to match the renamed tool - fix the Bazel regressions introduced after #14631 renamed the multi-agent wait tool ## Testing - cargo test -p codex-core tools::spec::tests - cargo test -p codex-core suite::prompt_caching::prompt_tools_are_consistent_across_requests Co-authored-by: Codex <noreply@openai.com>	2026-03-13 15:15:59 -07:00
pakrym-oai	cb7d8f45a1	Normalize MCP tool names to code-mode safe form (#14605 ) Code mode doesn't allow `-` in names and it's better if function names and code-mode names are the same.	2026-03-13 14:50:16 -07:00
Ahmed Ibrahim	36dfb84427	Stabilize multi-agent feature flag (#14622 ) - make multi_agent stable and enabled by default - update feature and tool-spec coverage to match the new default --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-13 14:38:15 -07:00
pakrym-oai	477a2dd345	Add code_mode_only feature (#14617 ) Summary - add the code_mode_only feature flag/config schema and wire its dependency on code_mode - update code mode tool descriptions to list nested tools with detailed headers - restrict available tools for prompt and exec descriptions when code_mode_only is enabled and test the behavior Testing - Not run (not requested)	2026-03-13 13:30:19 -07:00
sayan-oai	9f2da5a9ce	chore: clarify plugin + app copy in model instructions (#14541 ) - clarify app mentions are in user messages - clarify what it means for tools to be provided via `codex_apps` MCP - add plugin descriptions (with basic sanitization) to top-level `## Plugins` section alongside the corresponding plugin names - explain that skills from plugins are prefixed with `plugin_name:` in top-level `##Plugins` section changes to more logically organize `Apps`, `Skills`, and `Plugins` instructions will be in a separate PR, as that shuffles dev + user instructions in ways that change tests broadly. ### Tests confirmed in local rollout, some new tests.	2026-03-13 10:57:41 -07:00
Jack Mousseau	59b588b8ec	Improve granular approval policy prompt (#14553 )	2026-03-13 10:42:17 -07:00
Won Park	958f93f899	sending back imagaegencall response back to responseapi (#14558 ) Sending back the ResponseItem::ImageGenerationCall as is, because it is now supported from the API-side.	2026-03-13 17:29:19 +00:00
iceweasel-oai	6b3d82daca	Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400 ) ## Summary - launch Windows sandboxed children on a private desktop instead of `Winsta0\Default` - make private desktop the default while keeping `windows.sandbox_private_desktop=false` as the escape hatch - centralize process launch through the shared `create_process_as_user(...)` path - scope the private desktop ACL to the launching logon SID ## Why Today sandboxed Windows commands run on the visible shared desktop. That leaves an avoidable same-desktop attack surface for window interaction, spoofing, and related UI/input issues. This change moves sandboxed commands onto a dedicated per-launch desktop by default so the sandbox no longer shares `Winsta0\Default` with the user session. The implementation stays conservative on security with no silent fallback back to `Winsta0\Default` If private-desktop setup fails on a machine, users can still opt out explicitly with `windows.sandbox_private_desktop=false`. ## Validation - `cargo build -p codex-cli` - elevated-path `codex exec` desktop-name probe returned `CodexSandboxDesktop-*` - elevated-path `codex exec` smoke sweep for shell commands, nested `pwsh`, jobs, and hidden `notepad` launch - unelevated-path full private-desktop compatibility sweep via `codex exec` with `-c windows.sandbox=unelevated`	2026-03-13 10:13:39 -07:00
pakrym-oai	9c9867c9fa	code mode: single line tool declarations (#14526 ) ## Summary - render code mode tool declarations as single-line TypeScript snippets - make the JSON schema renderer emit inline object shapes for these declarations - update code mode/spec expectations to match the new inline rendering ## Testing - `just fmt` - `cargo test -p codex-core render_json_schema_to_typescript` - `cargo test -p codex-core code_mode_augments_` - `cargo test -p codex-core --test all exports_all_tools_metadata -- --nocapture`	2026-03-13 10:08:34 -07:00
Ahmed Ibrahim	c7e847aaeb	Add diagnostics for read_only_unless_trusted timeout flake (#14518 ) ## Summary - add targeted diagnostic logging for the read_only_unless_trusted_requires_approval scenarios in approval_matrix_covers_all_modes - add a scoped timeout buffer only for ro_unless_trusted write-file scenarios: 1000ms -> 2000ms - keep all other write-file scenarios at 1000ms ## Why The last two main failures were both in codex-core::all suite::approvals::approval_matrix_covers_all_modes with exit_code=124 in the same scenario. This points to execution-time jitter in CI rather than a semantic approval-policy mismatch. ## Notes - This does not introduce any >5s timeout and does not disable/quarantine tests. - The timeout increase is tightly scoped to the single flaky path and keeps the matrix deterministic under CI scheduling variance.	2026-03-12 23:51:03 -07:00
Jack Mousseau	7c7e267501	Simplify permissions available in request permissions tool (#14529 )	2026-03-12 21:13:17 -07:00
Channing Conger	0daffe667a	code_mode: Move exec params from runtime declarations to @pragma (#14511 ) This change moves code_mode exec session settings out of the runtime API and into an optional first-line pragma, so instead of calling runtime helpers like set_yield_time() or set_max_output_tokens_per_exec_call(), the model can write // @exec: {"yield_time_ms": ..., "max_output_tokens": ...} at the top of the freeform exec source. Rust now parses that pragma before building the source, validates it, and passes the values directly in the exec start message to the code-mode broker, which applies them at session start without any worker-runtime mutation path. The @openai/code_mode module no longer exposes those setter functions, the docs and grammar were updated to describe the pragma form, and the existing code_mode tests were converted to use pragma-based configuration instead.	2026-03-13 03:27:42 +00:00
alexsong-oai	1a363d5fcf	Add plugin usage telemetry (#14531 ) adding metrics including: * plugin used * plugin installed/uninstalled * plugin enabled/disabled	2026-03-12 19:22:30 -07:00
Jack Mousseau	b7dba72dbd	Rename reject approval policy to granular (#14516 )	2026-03-12 16:38:04 -07:00
Eric Traut	d32820ab07	Fix `codex exec --profile` handling (#14524 ) PR #14005 introduced a regression whereby `codex exec --profile` overrides were dropped when starting or resuming a thread. That causes the thread to miss profile-scoped settings like `model_instructions_file`. This PR preserve the active profile in the thread start/resume config overrides so the app-server rebuild sees the same profile that exec resolved. Fixes #14515	2026-03-12 17:34:25 -06:00
Rasmus Rygaard	53d5972226	Reapply "Pass more params to compaction" (#14298 ) (#14521 ) This reverts commit `8af97ce4b0`. Confirmed that this runs locally without the previous issues with tool use	2026-03-12 23:27:21 +00:00
Anton Panasenko	651717323c	feat(search_tool): gate search_tool on model supports_search_tool field (#14502 )	2026-03-12 16:03:50 -07:00
pakrym-oai	a2546d5dff	Expose code-mode tools through globals (#14517 ) Summary - make all code-mode tools accessible as globals so callers only need `tools.<name>` - rename text/image helpers and key globals (store, load, ALL_TOOLS, etc.) to reflect the new shared namespace - update the JS bridge, runners, descriptions, router, and tests to follow the new API Testing - Not run (not requested)	2026-03-12 15:43:59 -07:00
Jack Mousseau	a314c7d3ae	Decouple request permissions feature and tool (#14426 )	2026-03-12 14:47:08 -07:00
pakrym-oai	04e14bdf23	Rename exec session IDs to cell IDs (#14510 ) - Update the code-mode executor, wait handler, and protocol plumbing to use cell IDs instead of session IDs for node communication - Switch tool metadata, wait description, and suite tests to refer to cell IDs so user-visible messages match the new terminology Testing - Not run (not requested)	2026-03-12 14:05:30 -07:00
pakrym-oai	dadffd27d4	Fix MCP tool calling (#14491 ) Properly escape mcp tool names and make tools only available via imports.	2026-03-12 13:38:52 -07:00
pakrym-oai	a5a4899d0c	Skip nested tool call parallel test on Windows (#14505 ) Summary - disable the `code_mode_nested_tool_calls_can_run_in_parallel` test on Windows where `exec_command` is unavailable Testing - Not run (not requested)	2026-03-12 13:32:11 -07:00

1 2 3 4 5 ...

885 Commits