Commit Graph

885 Commits

Author SHA1 Message Date
Owen Lin
20f2a216df feat(core, tracing): create turn spans over websockets (#14632)
## Description

Dependent on:
- [responsesapi] https://github.com/openai/openai/pull/760991 
- [codex-backend] https://github.com/openai/openai/pull/760985

`codex app-server -> codex-backend -> responsesapi` now reuses a
persistent websocket connection across many turns. This PR updates
tracing when using websockets so that each `response.create` websocket
request propagates the current tracing context, so we can get a holistic
end-to-end trace for each turn.

Tracing is propagated via special keys (`ws_request_header_traceparent`,
`ws_request_header_tracestate`) set in the `client_metadata` param in
Responses API.

Currently tracing on websockets is a bit broken because we only set
tracing context on ws connection time, so it's detached from a
`turn/start` request.
2026-03-19 03:41:06 +00:00
pakrym-oai
56d0c6bf67 Add apply_patch code mode result (#15100)
It's empty !
2026-03-18 16:11:10 -07:00
pakrym-oai
3590e181fa Add update_plan code mode result (#15103)
It's empty!
2026-03-18 16:10:51 -07:00
Charley Cunningham
ebbbc52ce4 Align SQLite feedback logs with feedback formatter (#13494)
## Summary
- store a pre-rendered `feedback_log_body` in SQLite so `/feedback`
exports keep span prefixes and structured event fields
- render SQLite feedback exports with timestamps and level prefixes to
match the old in-memory feedback formatter, while preserving existing
trailing newlines
- count `feedback_log_body` in the SQLite retention budget so structured
or span-prefixed rows still prune correctly
- bound `/feedback` row loading in SQL with the retention estimate, then
apply exact whole-line truncation in Rust so uploads stay capped without
splitting lines

## Details
- add a `feedback_log_body` column to `logs` and backfill it from
`message` for existing rows
- capture span names plus formatted span and event fields at write time,
since SQLite does not retain enough structure to reconstruct the old
formatter later
- keep SQLite feedback queries scoped to the requested thread plus
same-process threadless rows
- restore a SQL-side cumulative `estimated_bytes` cap for feedback
export queries so over-retained partitions do not load every matching
row before truncation
- add focused formatting coverage for exported feedback lines and parity
coverage against `tracing_subscriber`

## Testing
- cargo test -p codex-state
- just fix -p codex-state
- just fmt

codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-18 22:44:31 +00:00
Ahmed Ibrahim
7b37a0350f Add final message prefix to realtime handoff output (#15077)
- prefix realtime handoff output with the agent final message label for
both realtime v1 and v2
- update realtime websocket and core expectations to match
2026-03-18 15:19:49 -07:00
pakrym-oai
5cada46ddf Return image URL from view_image tool (#15072)
Cleanup image semantics in code mode.

`view_image` now returns `{image_url:string, details?: string}` 

`image()` now allows both string parameter and `{image_url:string,
details?: string}`
2026-03-18 13:58:20 -07:00
pakrym-oai
88e5382fc4 Propagate tool errors to code mode (#15075)
Clean up error flow to push the FunctionCallError all the way up to
dispatcher and allow code mode to surface as exception.
2026-03-18 13:57:55 -07:00
pakrym-oai
606d85055f Add notify to code-mode (#14842)
Allows model to send an out-of-band notification.

The notification is injected as another tool call output for the same
call_id.
2026-03-18 09:37:13 -07:00
Dylan Hurd
84f4e7b39d fix(subagents) share execpolicy by default (#13702)
## Summary
If a subagent requests approval, and the user persists that approval to
the execpolicy, it should (by default) propagate. We'll need to rethink
this a bit in light of coming Permissions changes, though I think this
is closer to the end state that we'd want, which is that execpolicy
changes to one permissions profile should be synced across threads.

## Testing
- [x] Added integration test

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-18 06:42:26 +00:00
Andrei Eternal
6fef421654 [hooks] userpromptsubmit - hook before user's prompt is executed (#14626)
- this allows blocking the user's prompts from executing, and also
prevents them from entering history
- handles the edge case where you can both prevent the user's prompt AND
add n amount of additionalContexts
- refactors some old code into common.rs where hooks overlap
functionality
- refactors additionalContext being previously added to user messages,
instead we use developer messages for them
- handles queued messages correctly

Sample hook for testing - if you write "[block-user-submit]" this hook
will stop the thread:

example run
```
› sup


• Running UserPromptSubmit hook: reading the observatory notes

UserPromptSubmit hook (completed)
  warning: wizard-tower UserPromptSubmit demo inspected: sup
  hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' exactly once near the end.

• Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
  lanterns lit


› and [block-user-submit]


• Running UserPromptSubmit hook: reading the observatory notes

UserPromptSubmit hook (stopped)
  warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
  stop: Wizard Tower demo block: remove [block-user-submit] to continue.
```

.codex/config.toml
```
[features]
codex_hooks = true
```

.codex/hooks.json
```
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
            "timeoutSec": 10,
            "statusMessage": "reading the observatory notes"
          }
        ]
      }
    ]
  }
}
```

.codex/hooks/user_prompt_submit_demo.py
```
#!/usr/bin/env python3

import json
import sys
from pathlib import Path


def prompt_from_payload(payload: dict) -> str:
    prompt = payload.get("prompt")
    if isinstance(prompt, str) and prompt.strip():
        return prompt.strip()

    event = payload.get("event")
    if isinstance(event, dict):
        user_prompt = event.get("user_prompt")
        if isinstance(user_prompt, str):
            return user_prompt.strip()

    return ""


def main() -> int:
    payload = json.load(sys.stdin)
    prompt = prompt_from_payload(payload)
    cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"

    if "[block-user-submit]" in prompt:
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                    ),
                    "decision": "block",
                    "reason": (
                        "Wizard Tower demo block: remove [block-user-submit] to continue."
                    ),
                }
            )
        )
        return 0

    prompt_preview = prompt or "(empty prompt)"
    if len(prompt_preview) > 80:
        prompt_preview = f"{prompt_preview[:77]}..."

    print(
        json.dumps(
            {
                "systemMessage": (
                    f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                ),
                "hookSpecificOutput": {
                    "hookEventName": "UserPromptSubmit",
                    "additionalContext": (
                        "Wizard Tower UserPromptSubmit demo fired. "
                        "For this reply only, include the exact phrase "
                        "'observatory lanterns lit' exactly once near the end."
                    ),
                },
            }
        )
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
2026-03-17 22:09:22 -07:00
Ahmed Ibrahim
3ce879c646 Handle realtime conversation end in the TUI (#14903)
- close live realtime sessions on errors, ctrl-c, and active meter
removal
- centralize TUI realtime cleanup and avoid duplicate follow-up close
info

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
2026-03-17 21:04:58 -07:00
pakrym-oai
770616414a Prefer websockets when providers support them (#13592)
Remove all flags and model settings.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 19:46:44 -07:00
Ahmed Ibrahim
98be562fd3 Unify realtime shutdown in core (#14902)
- route realtime startup, input, and transport failures through a single
shutdown path
- emit one realtime error/closed lifecycle while clearing session state
once

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
2026-03-17 15:58:52 -07:00
Ahmed Ibrahim
c6ab4ee537 Gate realtime audio interruption logic to v2 (#14984)
- thread the realtime version into conversation start and app-server
notifications
- keep playback-aware mic gating and playback interruption behavior on
v2 only, leaving v1 on the legacy path
2026-03-17 15:24:37 -07:00
pakrym-oai
ee756eb80f Rename exec_wait tool to wait (#14983)
Summary
- document that code mode only exposes `exec` and the renamed `wait`
tool
- update code mode tool spec and descriptions to match the new tool name
- rename tests and helper references from `exec_wait` to `wait`

Testing
- Not run (not requested)
2026-03-17 14:22:26 -07:00
Ahmed Ibrahim
4d9d4b7b0f Stabilize approval matrix write-file command (#14968)
## What is flaky
The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in
CI even though the approval logic is unchanged, because the test
delegates the file write and readback to shell parsing instead of
deterministic file I/O.

## Why it was flaky
The test generated a command shaped like `printf ... > file && cat
file`. That means the scenario depended on shell quoting, redirection,
newline handling, and encoding behavior in addition to the approval
system it was actually trying to validate. If the shell interpreted the
payload differently, the test would report an approval failure even
though the product logic was fine.

That also made failures hard to diagnose, because the test did not log
the exact generated command or the parsed result payload.

## How this PR fixes it
This PR replaces the shell-redirection path with a deterministic
`python3 -c` script that writes the file with `Path.write_text(...,
encoding='utf-8')` and then reads it back with the same UTF-8 path. It
also logs the generated command and the resulting exit code/stdout for
the approval scenario so any future failure is directly attributable.

## Why this fix fixes the flakiness
The scenario no longer depends on shell parsing and redirection
semantics. The file contents are produced and read through explicit
UTF-8 file I/O, so the approval test is measuring approval behavior
instead of shell behavior. The added diagnostics mean a future failure
will show the exact command/result pair instead of looking like a
generic intermittent mismatch.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 13:52:36 -07:00
Ahmed Ibrahim
b02388672f Stabilize Windows cmd-based shell test harnesses (#14958)
## What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:

- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`

## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.

There were multiple independent sources of nondeterminism in that setup:

- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
that could leave the read command wrapped as a literal string instead of
executing it.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records like module-initialization output onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.

So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic themselves.

## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.

## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.

---------

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 20:21:46 +00:00
Owen Lin
6ea041032b fix(core): prevent hanging turn/start due to websocket warming issues (#14838)
## Description

This PR fixes a bad first-turn failure mode in app-server when the
startup websocket prewarm hangs. Before this change, `initialize ->
thread/start -> turn/start` could sit behind the prewarm for up to five
minutes, so the client would not see `turn/started`, and even
`turn/interrupt` would block because the turn had not actually started
yet.

Now, we:
- set a (configurable) timeout of 15s for websocket startup time,
exposed as `websocket_startup_timeout_ms` in config.toml
- `turn/started` is sent immediately on `turn/start` even if the
websocket is still connecting
- `turn/interrupt` can be used to cancel a turn that is still waiting on
the websocket warmup
- the turn task will wait for the full 15s websocket warming timeout
before falling back

## Why

The old behavior made app-server feel stuck at exactly the moment the
client expects turn lifecycle events to start flowing. That was
especially painful for external clients, because from their point of
view the server had accepted the request but then went silent for
minutes.

## Configuring the websocket startup timeout
Can set it in config.toml like this:
```
[model_providers.openai]
supports_websockets = true
websocket_connect_timeout_ms = 15000
```
2026-03-17 10:07:46 -07:00
Ahmed Ibrahim
fbd7f9b986 [stack 2/4] Align main realtime v2 wire and runtime flow (#14830)
## Stack Position
2/4. Built on top of #14828.

## Base
- #14828

## Unblocks
- #14829
- #14827

## Scope
- Port the realtime v2 wire parsing, session, app-server, and
conversation runtime behavior onto the split websocket-method base.
- Branch runtime behavior directly on the current realtime session kind
instead of parser-derived flow flags.
- Keep regression coverage in the existing e2e suites.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-16 21:38:07 -07:00
pakrym-oai
a3ba10b44b Add exit helper to code mode scripts (#14851)
- **Summary**
- expose `exit` through the code mode bridge and module so scripts can
stop mid-flight
  - surface the helper in the description documentation
  - add a regression test ensuring `exit()` terminates execution cleanly
- **Testing**
  - Not run (not requested)
2026-03-16 22:07:58 +00:00
friel-openai
ba463a9dc7 Preserve background terminals on interrupt and rename cleanup command to /stop (#14602)
### Motivation
- Interrupting a running turn (Ctrl+C / Esc) currently also terminates
long‑running background shells, which is surprising for workflows like
local dev servers or file watchers.
- The existing cleanup command name was confusing; callers expect an
explicit command to stop background terminals rather than a UI clear
action.
- Make background‑shell termination explicit and surface a clearer
command name while preserving backward compatibility.

### Description
- Renamed the background‑terminal cleanup slash command from `Clean`
(`/clean`) to `Stop` (`/stop`) and kept `clean` as an alias in the
command parsing/visibility layer, updated the user descriptions and
command popup wiring accordingly.
- Updated the unified‑exec footer text and snapshots to point to `/stop`
(and trimmed corresponding snapshot output to match the new label).
- Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt)
no longer closes or clears tracked unified exec / background terminal
processes in the TUI or core cleanup path; background shells are now
preserved after an interrupt.
- Updated protocol/docs to clarify that `turn/interrupt` (or
`Op::Interrupt`) interrupts the active turn but does not terminate
background terminals, and that `thread/backgroundTerminals/clean` is the
explicit API to stop those shells.
- Updated unit/integration tests and insta snapshots in the TUI and core
unified‑exec suites to reflect the new semantics and command name.

### Testing
- Ran formatting with `just fmt` in `codex-rs` (succeeded). 
- Ran `cargo test -p codex-protocol` (succeeded). 
- Attempted `cargo test -p codex-tui` but the build could not complete
in this environment due to a native build dependency that requires
`libcap` development headers (the `codex-linux-sandbox` vendored build
step); install `libcap-dev` / make `libcap.pc` available in
`PKG_CONFIG_PATH` to run the TUI test suite locally.
- Updated and accepted the affected `insta` snapshots for the TUI
changes so visual diffs reflect the new `/stop` wording and preserved
interrupt behavior.

------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)
2026-03-15 22:17:25 -07:00
Matthew Zeng
d4af6053e2 [apps] Improve search tool fallback. (#14732)
- [x] Bypass tool search and stuff tool specs directly into model
context when either a. Tool search is not available for the model or b.
There are not that many tools to search for.
2026-03-15 21:41:55 -07:00
Matthew Zeng
49edf311ac [apps] Add tool call meta. (#14647)
- [x] Add resource_uri and other things to _meta to shortcut resource
lookup and speed things up.
2026-03-14 22:24:13 -07:00
Channing Conger
70eddad6b0 dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
This extends dynamic_tool_calls to allow us to hide a tool from the
model context but still use it as part of the general tool calling
runtime (for ex from js_repl/code_mode)
2026-03-14 01:58:43 -07:00
sayan-oai
d272f45058 move plugin/skill instructions into dev msg and reorder (#14609)
Move the general `Apps`, `Skills` and `Plugins` instructions blocks out
of `user_instructions` and into the developer message, with new `Apps ->
Skills -> Plugins` order for better clarity.

Also wrap those sections in stable XML-style instruction tags (like
other sections) and update prompt-layout tests/snapshots. This makes the
tests less brittle in snapshot output (we can parse the sections), and
it consolidates the capability instructions in one place.

#### Tests
Updated snapshots, added tests.

`<AGENTS_MD>` disappearing in snapshots is expected: before this change,
the wrapped user-instructions message was kept alive by `Skills`
content. Now that `Skills` and `Plugins` are in the developer message,
that wrapper only appears when there is real
project-doc/user-instructions content.

---------

Co-authored-by: Charley Cunningham <ccunningham@openai.com>
2026-03-13 20:51:01 -07:00
Eric Traut
4b9d5c8c1b Add openai_base_url config override for built-in provider (#12031)
We regularly get bug reports from users who mistakenly have the
`OPENAI_BASE_URL` environment variable set. This PR deprecates this
environment variable in favor of a top-level config key
`openai_base_url` that is used for the same purpose. By making it a
config key, it will be more visible to users. It will also participate
in all of the infrastructure we've added for layered and managed
configs.

Summary
- introduce the `openai_base_url` top-level config key, update
schema/tests, and route the built-in openai provider through it while
- fall back to deprecated `OPENAI_BASE_URL` env var but warn user of
deprecation when no `openai_base_url` config key is present
- update CLI, SDK, and TUI code to prefer the new config path (with a
deprecated env-var fallback) and document the SDK behavior change
2026-03-13 20:12:25 -06:00
Andrei Eternal
9a44a7e499 [hooks] stop continuation & stop_hook_active mechanics (#14532)
Stop hooks now receive `stop_hook_active` and enable stop hooks to loop
forever if they'd like to. In the initial hooks PR, we implemented a
simpler mechanic that the stop-blocking could only happen once in a row

- support stop hook adding a continuation prompt to add a further task
- if multiple stop-blocks happen that have continuation prompts, they
are concatenated

example run:
```
› hey :)


• Running SessionStart hook: lighting the observatory

SessionStart hook (completed)
  warning: Hi, I'm a session start hook for wizard-tower (startup).
  hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace

• Aloha :) Happy to jam with you. What are we building today?

• Running Stop hook: updating the guards

Stop hook (blocked)
  warning: Wizard Tower Stop hook continuing conversation
  feedback: cook the stonpet

• Aloha, here’s the hyperspace move for cooking a wimboltine stonpet:

  1. Sear the stonpet in a hot pan with moon-oil until the edges shimmer.
  2. Add star-lime, black salt, and a little fermented nebula paste.
  3. Lower the heat and let it braise for 8 cosmic minutes with a splash of comet broth.
  4. Finish with sky herbs and serve over warm asteroid rice.

  The vibe: crispy outside, tender center, deep interdimensional savor.

• Running Stop hook: updating the guards

Stop hook (completed)
  warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
```

.codex/config.toml
```
[features]
codex_hooks = true
```

.codex/hooks.json
```
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|resume",
        "hooks": [
          {
            "type": "command",
            "command": "/usr/bin/python3 .codex/hooks/session_start_demo.py",
            "timeoutSec": 10,
            "statusMessage": "lighting the observatory"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/usr/bin/python3 .codex/hooks/stop_demo_block.py",
            "timeoutSec": 10,
            "statusMessage": "updating the guards"
          }
        ]
      }
    ]
  }
}
```

.codex/hooks/session_start_demo.py
```
#!/usr/bin/env python3

import json
import sys
from pathlib import Path


def main() -> int:
    payload = json.load(sys.stdin)
    cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
    source = payload.get("source", "startup")
    source_label = "resume" if source == "resume" else "startup"
    source_prefix = (
        "Resume context:"
        if source == "resume"
        else "Startup context:"
    )

    output = {
        "systemMessage": (
            f"Hi, I'm a session start hook for {cwd} ({source_label})."
        ),
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": (
                f"{source_prefix} A wimboltine stonpet is an exotic cuisine from hyperspace"
            ),
        },
    }
    print(json.dumps(output))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

.codex/hooks/stop_demo_block.py
```
#!/usr/bin/env python3

import json
import sys


def main() -> int:
    payload = json.load(sys.stdin)
    stop_hook_active = payload.get("stop_hook_active", False)
    last_assistant_message = payload.get("last_assistant_message") or ""
    char_count = len(last_assistant_message.strip())

    if stop_hook_active:
        system_message = (
            "Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop."
        )
        print(json.dumps({"systemMessage": system_message}))
    else:
        system_message = (
            f"Wizard Tower Stop hook continuing conversation"
        )
        print(json.dumps({"systemMessage": system_message, "decision": "block", "reason": "cook the stonpet"}))

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
2026-03-13 15:51:19 -07:00
Charley Cunningham
bc24017d64 Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
## Summary
- add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
control for who reviews approval requests
- route Smart Approvals guardian review through core for command
execution, file changes, managed-network approvals, MCP approvals, and
delegated/subagent approval flows
- expose guardian review in app-server with temporary unstable
`item/autoApprovalReview/{started,completed}` notifications carrying
`targetItemId`, `review`, and `action`
- update the TUI so Smart Approvals can be enabled from `/experimental`,
aligned with the matching `/approvals` mode, and surfaced clearly while
reviews are pending or resolved

## Runtime model
This PR does not introduce a new `approval_policy`.

Instead:
- `approval_policy` still controls when approval is needed
- `approvals_reviewer` controls who reviewable approval requests are
routed to:
  - `user`
  - `guardian_subagent`

`guardian_subagent` is a carefully prompted reviewer subagent that
gathers relevant context and applies a risk-based decision framework
before approving or denying the request.

The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
behavior keys off `approvals_reviewer`.

When Smart Approvals is enabled from the TUI, it also switches the
current `/approvals` settings to the matching Smart Approvals mode so
users immediately see guardian review in the active thread:
- `approval_policy = on-request`
- `approvals_reviewer = guardian_subagent`
- `sandbox_mode = workspace-write`

Users can still change `/approvals` afterward.

Config-load behavior stays intentionally narrow:
- plain `smart_approvals = true` in `config.toml` remains just the
rollout/UI gate and does not auto-set `approvals_reviewer`
- the deprecated `guardian_approval = true` alias migration does
backfill `approvals_reviewer = "guardian_subagent"` in the same scope
when that reviewer is not already configured there, so old configs
preserve their original guardian-enabled behavior

ARC remains a separate safety check. For MCP tool approvals, ARC
escalations now flow into the configured reviewer instead of always
bypassing guardian and forcing manual review.

## Config stability
The runtime reviewer override is stable, but the config-backed
app-server protocol shape is still settling.

- `thread/start`, `thread/resume`, and `turn/start` keep stable
`approvalsReviewer` overrides
- the config-backed `approvals_reviewer` exposure returned via
`config/read` (including profile-level config) is now marked
`[UNSTABLE]` / experimental in the app-server protocol until we are more
confident in that config surface

## App-server surface
This PR intentionally keeps the guardian app-server shape narrow and
temporary.

It adds generic unstable lifecycle notifications:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`

with payloads of the form:
- `{ threadId, turnId, targetItemId, review, action? }`

`review` is currently:
- `{ status, riskScore?, riskLevel?, rationale? }`
- where `status` is one of `inProgress`, `approved`, `denied`, or
`aborted`

`action` carries the guardian action summary payload from core when
available. This lets clients render temporary standalone pending-review
UI, including parallel reviews, even when the underlying tool item has
not been emitted yet.

These notifications are explicitly documented as `[UNSTABLE]` and
expected to change soon.

This PR does **not** persist guardian review state onto `thread/read`
tool items. The intended follow-up is to attach guardian review state to
the reviewed tool item lifecycle instead, which would improve
consistency with manual approvals and allow thread history / reconnect
flows to replay guardian review state directly.

## TUI behavior
- `/experimental` exposes the rollout gate as `Smart Approvals`
- enabling it in the TUI enables the feature and switches the current
session to the matching Smart Approvals `/approvals` mode
- disabling it in the TUI clears the persisted `approvals_reviewer`
override when appropriate and returns the session to default manual
review when the effective reviewer changes
- `/approvals` still exposes the reviewer choice directly
- the TUI renders:
- pending guardian review state in the live status footer, including
parallel review aggregation
  - resolved approval/denial state in history

## Scope notes
This PR includes the supporting core/runtime work needed to make Smart
Approvals usable end-to-end:
- shell / unified-exec / apply_patch / managed-network / MCP guardian
review
- delegated/subagent approval routing into guardian review
- guardian review risk metadata and action summaries for app-server/TUI
- config/profile/TUI handling for `smart_approvals`, `guardian_approval`
alias migration, and `approvals_reviewer`
- a small internal cleanup of delegated approval forwarding to dedupe
fallback paths and simplify guardian-vs-parent approval waiting (no
intended behavior change)

Out of scope for this PR:
- redesigning the existing manual approval protocol shapes
- persisting guardian review state onto app-server `ThreadItem`s
- delegated MCP elicitation auto-review (the current delegated MCP
guardian shim only covers the legacy `RequestUserInput` path)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-13 15:27:00 -07:00
Charley Cunningham
e3cbf913e8 Fix wait_agent expectations in core tests (#14637)
## Summary
- update stale core tool-spec expectations from `wait` to `wait_agent`
- update the prompt-caching tool-name assertion to match the renamed
tool
- fix the Bazel regressions introduced after #14631 renamed the
multi-agent wait tool

## Testing
- cargo test -p codex-core tools::spec::tests
- cargo test -p codex-core
suite::prompt_caching::prompt_tools_are_consistent_across_requests

Co-authored-by: Codex <noreply@openai.com>
2026-03-13 15:15:59 -07:00
pakrym-oai
cb7d8f45a1 Normalize MCP tool names to code-mode safe form (#14605)
Code mode doesn't allow `-` in names and it's better if function names
and code-mode names are the same.
2026-03-13 14:50:16 -07:00
Ahmed Ibrahim
36dfb84427 Stabilize multi-agent feature flag (#14622)
- make multi_agent stable and enabled by default
- update feature and tool-spec coverage to match the new default

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-13 14:38:15 -07:00
pakrym-oai
477a2dd345 Add code_mode_only feature (#14617)
Summary
- add the code_mode_only feature flag/config schema and wire its
dependency on code_mode
- update code mode tool descriptions to list nested tools with detailed
headers
- restrict available tools for prompt and exec descriptions when
code_mode_only is enabled and test the behavior

Testing
- Not run (not requested)
2026-03-13 13:30:19 -07:00
sayan-oai
9f2da5a9ce chore: clarify plugin + app copy in model instructions (#14541)
- clarify app mentions are in user messages
- clarify what it means for tools to be provided via `codex_apps` MCP
- add plugin descriptions (with basic sanitization) to top-level `##
Plugins` section alongside the corresponding plugin names
- explain that skills from plugins are prefixed with `plugin_name:` in
top-level `##Plugins` section

changes to more logically organize `Apps`, `Skills`, and `Plugins`
instructions will be in a separate PR, as that shuffles dev + user
instructions in ways that change tests broadly.

### Tests
confirmed in local rollout, some new tests.
2026-03-13 10:57:41 -07:00
Jack Mousseau
59b588b8ec Improve granular approval policy prompt (#14553) 2026-03-13 10:42:17 -07:00
Won Park
958f93f899 sending back imagaegencall response back to responseapi (#14558)
Sending back the ResponseItem::ImageGenerationCall as is, because it is
now supported from the API-side.
2026-03-13 17:29:19 +00:00
iceweasel-oai
6b3d82daca Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400)
## Summary
- launch Windows sandboxed children on a private desktop instead of
`Winsta0\Default`
- make private desktop the default while keeping
`windows.sandbox_private_desktop=false` as the escape hatch
- centralize process launch through the shared
`create_process_as_user(...)` path
- scope the private desktop ACL to the launching logon SID

## Why
Today sandboxed Windows commands run on the visible shared desktop. That
leaves an avoidable same-desktop attack surface for window interaction,
spoofing, and related UI/input issues. This change moves sandboxed
commands onto a dedicated per-launch desktop by default so the sandbox
no longer shares `Winsta0\Default` with the user session.

The implementation stays conservative on security with no silent
fallback back to `Winsta0\Default`

If private-desktop setup fails on a machine, users can still opt out
explicitly with `windows.sandbox_private_desktop=false`.

## Validation
- `cargo build -p codex-cli`
- elevated-path `codex exec` desktop-name probe returned
`CodexSandboxDesktop-*`
- elevated-path `codex exec` smoke sweep for shell commands, nested
`pwsh`, jobs, and hidden `notepad` launch
- unelevated-path full private-desktop compatibility sweep via `codex
exec` with `-c windows.sandbox=unelevated`
2026-03-13 10:13:39 -07:00
pakrym-oai
9c9867c9fa code mode: single line tool declarations (#14526)
## Summary
- render code mode tool declarations as single-line TypeScript snippets
- make the JSON schema renderer emit inline object shapes for these
declarations
- update code mode/spec expectations to match the new inline rendering

## Testing
- `just fmt`
- `cargo test -p codex-core render_json_schema_to_typescript`
- `cargo test -p codex-core code_mode_augments_`
- `cargo test -p codex-core --test all exports_all_tools_metadata --
--nocapture`
2026-03-13 10:08:34 -07:00
Ahmed Ibrahim
c7e847aaeb Add diagnostics for read_only_unless_trusted timeout flake (#14518)
## Summary
- add targeted diagnostic logging for the
read_only_unless_trusted_requires_approval scenarios in
approval_matrix_covers_all_modes
- add a scoped timeout buffer only for ro_unless_trusted write-file
scenarios: 1000ms -> 2000ms
- keep all other write-file scenarios at 1000ms

## Why
The last two main failures were both in codex-core::all
suite::approvals::approval_matrix_covers_all_modes with exit_code=124 in
the same scenario. This points to execution-time jitter in CI rather
than a semantic approval-policy mismatch.

## Notes
- This does not introduce any >5s timeout and does not
disable/quarantine tests.
- The timeout increase is tightly scoped to the single flaky path and
keeps the matrix deterministic under CI scheduling variance.
2026-03-12 23:51:03 -07:00
Jack Mousseau
7c7e267501 Simplify permissions available in request permissions tool (#14529) 2026-03-12 21:13:17 -07:00
Channing Conger
0daffe667a code_mode: Move exec params from runtime declarations to @pragma (#14511)
This change moves code_mode exec session settings out of the runtime API
and into an optional first-line pragma, so instead of calling runtime
helpers like set_yield_time() or set_max_output_tokens_per_exec_call(),
the model can write // @exec: {"yield_time_ms": ...,
"max_output_tokens": ...} at the top of the freeform exec source. Rust
now parses that pragma before building the source, validates it, and
passes the values directly in the exec start message to the code-mode
broker, which applies them at session start without any worker-runtime
mutation path. The @openai/code_mode module no longer exposes those
setter functions, the docs and grammar were updated to describe the
pragma form, and the existing code_mode tests were converted to use
pragma-based configuration instead.
2026-03-13 03:27:42 +00:00
alexsong-oai
1a363d5fcf Add plugin usage telemetry (#14531)
adding metrics including: 
* plugin used
* plugin installed/uninstalled
* plugin enabled/disabled
2026-03-12 19:22:30 -07:00
Jack Mousseau
b7dba72dbd Rename reject approval policy to granular (#14516) 2026-03-12 16:38:04 -07:00
Eric Traut
d32820ab07 Fix codex exec --profile handling (#14524)
PR #14005 introduced a regression whereby `codex exec --profile`
overrides were dropped when starting or resuming a thread. That causes
the thread to miss profile-scoped settings like
`model_instructions_file`.

This PR preserve the active profile in the thread start/resume config
overrides so the
app-server rebuild sees the same profile that exec resolved. 

Fixes #14515
2026-03-12 17:34:25 -06:00
Rasmus Rygaard
53d5972226 Reapply "Pass more params to compaction" (#14298) (#14521)
This reverts commit 8af97ce4b0.

Confirmed that this runs locally without the previous issues with tool
use
2026-03-12 23:27:21 +00:00
Anton Panasenko
651717323c feat(search_tool): gate search_tool on model supports_search_tool field (#14502) 2026-03-12 16:03:50 -07:00
pakrym-oai
a2546d5dff Expose code-mode tools through globals (#14517)
Summary
- make all code-mode tools accessible as globals so callers only need
`tools.<name>`
- rename text/image helpers and key globals (store, load, ALL_TOOLS,
etc.) to reflect the new shared namespace
- update the JS bridge, runners, descriptions, router, and tests to
follow the new API

Testing
- Not run (not requested)
2026-03-12 15:43:59 -07:00
Jack Mousseau
a314c7d3ae Decouple request permissions feature and tool (#14426) 2026-03-12 14:47:08 -07:00
pakrym-oai
04e14bdf23 Rename exec session IDs to cell IDs (#14510)
- Update the code-mode executor, wait handler, and protocol plumbing to
use cell IDs instead of session IDs for node communication
- Switch tool metadata, wait description, and suite tests to refer to
cell IDs so user-visible messages match the new terminology

**Testing**
- Not run (not requested)
2026-03-12 14:05:30 -07:00
pakrym-oai
dadffd27d4 Fix MCP tool calling (#14491)
Properly escape mcp tool names and make tools only available via
imports.
2026-03-12 13:38:52 -07:00
pakrym-oai
a5a4899d0c Skip nested tool call parallel test on Windows (#14505)
**Summary**
- disable the `code_mode_nested_tool_calls_can_run_in_parallel` test on
Windows where `exec_command` is unavailable

**Testing**
- Not run (not requested)
2026-03-12 13:32:11 -07:00