Commit Graph

836 Commits

Author SHA1 Message Date
Cooper Gamble
0c88f58519 [core] refresh inline compaction comments [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
bd6733a68c [core] simplify inline server-side compaction handling [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
46615cc195 [codex-core] Trim inline compaction coverage [ci changed_files]
- reduce the server-side compaction test matrix to the highest-signal cases
- add comments around the deferred checkpoint rewrite and inline/preflight split

Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
488b53bceb [codex-core] Fix inline compaction commit semantics [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
7065f6fa1d [codex-core] Preserve raw inline compaction event order [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
29ddeb71ad [codex-core] Fix inline compaction event ordering and token accounting [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
ab9a474c86 [codex-core] Preserve inline compaction checkpoint ordering [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
8da9b522f5 [codex-core] Preserve inline compaction turn prompt state [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
839143c6c4 [codex-core] Fix inline compaction checkpoint history layout [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
d9916b5e6d [codex-core] Fix inline compaction retry and model-switch preflight [ci changed_files]
Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
bdadb141f2 [core] Always inline OpenAI auto-compaction [ci changed_files]
Ignore compact_prompt for OpenAI inline auto-compaction, remove the legacy compat downgrade path, and keep /compact on the point-in-time endpoint. Also skip previous-model preflight remote compaction when inline server-side compaction is available.\n\nCo-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
c3a7189a0e [codex-core] Harden inline compaction checkpoint handling [ci changed_files]
Keep current-turn inputs in local inline compaction checkpoints and remember known backend incompatibilities after a compat downgrade so later turns skip the failed inline request path.

Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Cooper Gamble
ea2a4be3d8 [codex-core] Add feature-flagged server-side compaction [ci changed_files]
Move normal auto-compaction onto inline Responses API compaction behind a feature flag, keep the legacy path for manual and compatibility cases, and add observability plus integration coverage.

Co-authored-by: Codex <noreply@openai.com>
2026-03-11 02:48:35 +00:00
Ahmed Ibrahim
cc417c39a0 Split spawn_csv from multi_agent (#14282)
- make `spawn_csv` a standalone feature for CSV agent jobs
- keep `spawn_csv -> multi_agent` one-way and preserve restricted
subagent disable paths
2026-03-11 01:42:50 +00:00
Ahmed Ibrahim
5b10b93ba2 Add realtime start instructions config override (#14270)
- add `realtime_start_instructions` config support
- thread it into realtime context updates, schema, docs, and tests
2026-03-10 18:42:05 -07:00
pakrym-oai
24b8d443b8 Prefix code mode output with success or failure message and include error stack (#14272) 2026-03-10 18:33:52 -07:00
Ahmed Ibrahim
3f7cb03043 Stabilize websocket response.failed error delivery (#14017)
## What changed
- Drop failed websocket connections immediately after a terminal stream
error instead of awaiting a graceful close handshake before forwarding
the error to the caller.
- Keep the success path and the closed-connection guard behavior
unchanged.

## Why this fixes the flake
- The failing integration test waits for the second websocket stream to
surface the model error before issuing a follow-up request.
- On slower runners, the old error path awaited
`ws_stream.close().await` before sending the error downstream. If that
close handshake stalled, the test kept waiting for an error that had
already happened server-side and nextest timed it out.
- Dropping the failed websocket immediately makes the terminal error
observable right away and marks the session closed so the next request
reconnects cleanly instead of depending on a best-effort close
handshake.

## Code or test?
- This is a production logic fix in `codex-api`. The existing websocket
integration test already exercises the regression path.
2026-03-10 17:59:41 -07:00
pakrym-oai
37f51382fd Rename code mode tool to exec (#14254)
Summary
- update the code-mode handler, runner, instructions, and error text to
refer to the `exec` tool name everywhere that used to say `code_mode`
- ensure generated documentation strings and tool specs describe `exec`
and rely on the shared `PUBLIC_TOOL_NAME`
- refresh the suite tests so they invoke `exec` instead of the old name

Testing
- Not run (not requested)
2026-03-11 00:30:16 +00:00
Celia Chen
295b56bece chore: add a separate reject-policy flag for skill approvals (#14271)
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
2026-03-10 23:58:23 +00:00
pakrym-oai
18199d4e0e Add store/load support for code mode (#14259)
adds support for transferring state across code mode invocations.
2026-03-10 16:53:53 -07:00
Rasmus Rygaard
f8ef154a6b Pass more params to compaction (#14247)
Pass more params to /compact. This should give us parity with the
/responses endpoint to improve caching.

I'm torn about the MCP await. Blocking will give us parity but it seems
like we explicitly don't block on MCPs. Happy either way
2026-03-10 16:39:57 -07:00
pakrym-oai
8b33485302 Add code_mode output helpers for text and images (#14244)
Summary
- document how code-mode can import `output_text`/`output_image` and
ensure `add_content` stays compatible
- add a synthetic `@openai/code_mode` module that appends content items
and validates inputs
- cover the new behavior with integration tests for structured text and
image outputs

Testing
- Not run (not requested)
2026-03-10 16:25:27 -07:00
pakrym-oai
e791559029 Add model-controlled truncation for code mode results (#14258)
Summary
- document that `@openai/code_mode` exposes
`set_max_output_tokens_per_exec_call` and that `code_mode` truncates the
final Rust-side output when the budget is exceeded
- enforce the configured budget in the Rust tool runner, reusing
truncation helpers so text-only outputs follow the unified-exec wrapper
and mixed outputs still fit within the limit
- ensure the new behavior is covered by a code-mode integration test and
string spec update

Testing
- Not run (not requested)
2026-03-10 15:57:14 -07:00
pakrym-oai
c7e28cffab Add output schema to MCP tools and expose MCP tool results in code mode (#14236)
Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers,
and spec modules so MCP tools serialize the exact output shape expected
by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure
the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema
changes

Testing
- Not run (not requested)
2026-03-10 15:25:19 -07:00
Won Park
28934762d0 unifying all image saves to /tmp to bug-proof (#14149)
image-gen feature will have the model saving to /tmp by default + at all
times
2026-03-10 15:13:12 -07:00
Ahmed Ibrahim
2895d3571b Add spawn_agent model overrides (#14160)
- add `model` and `reasoning_effort` to the `spawn_agent` schema so the
values pass through
- validate requested models against `model.model` and only check that
the selected model supports the requested reasoning effort

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-10 14:04:04 -07:00
pakrym-oai
e52afd28b0 Expose strongly-typed result for exec_command (#14183)
Summary
- document output types for the various tool handlers and registry so
the API exposes richer descriptions
- update unified execution helpers and client tests to align with the
new output metadata
- clean up unused helpers across tool dispatch paths

Testing
- Not run (not requested)
2026-03-10 09:54:34 -07:00
Eric Traut
b90921eba8 Fix release-mode integration test compiler failure (#13603)
Addresses #13586

This doesn't affect our CI scripts. It was user-reported.

Summary
- add `wiremock::ResponseTemplate` and `body_string_contains` imports
behind `#[cfg(not(debug_assertions))]` in
`codex-rs/core/tests/suite/view_image.rs` so release builds only pull
the helpers they actually use
2026-03-10 08:30:56 -06:00
Ahmed Ibrahim
aa6a57dfa2 Stabilize incomplete SSE retry test (#13879)
## What changed
- The retry test now uses the same streaming SSE test server used by
production-style tests instead of a wiremock sequence.
- The fixture is resolved via `find_resource!`, and the test asserts
that exactly two outbound requests were sent.

## Why this fixes the flake
- The old wiremock sequence approximated early-close behavior, but it
did not reproduce the same streaming semantics the real client sees.
- That meant the retry path depended on mock implementation details
instead of on the actual transport behavior we care about.
- Switching to the streaming SSE helper makes the test exercise the real
early-close/retry contract, and counting requests directly verifies that
we retried exactly once rather than merely hoping the sequence aligned.

## Scope
- Test-only change.
2026-03-09 22:34:44 -07:00
Ahmed Ibrahim
2e24be2134 Use realtime transcript for handoff context (#14132)
- collect input/output transcript deltas into active handoff transcript
state
- attach and clear that transcript on each handoff, and regenerate
schema/tests
2026-03-09 22:30:03 -07:00
Matthew Zeng
566e4cee4b [apps] Fix apps enablement condition. (#14011)
- [x] Fix apps enablement condition to check both the feature flag and
that the user is not an API key user.
2026-03-09 22:25:43 -07:00
xl-openai
0c33af7746 feat: support disabling bundled system skills (#13792)
Support disable bundled system skills with a config:

[skills.bundled]
enabled = false
2026-03-09 22:02:53 -07:00
pakrym-oai
710682598d Export tools module into code mode runner (#14167)
**Summary**
- allow `code_mode` to pass enabled tools metadata to the runner and
expose them via `tools.js`
- import tools inside JavaScript rather than relying only on globals or
proxies for nested tool calls
- update specs, docs, and tests to exercise the new bridge and explain
the tooling changes

**Testing**
- Not run (not requested)
2026-03-09 21:59:09 -07:00
pakrym-oai
d71e042694 Enforce single tool output type in codex handlers (#14157)
We'll need to associate output schema with each tool. Each tool can only
have on output type.
2026-03-09 21:49:44 -07:00
pakrym-oai
da616136cc Add code_mode experimental feature (#13418)
A much narrower and more isolated (no node features) version of js_repl
2026-03-09 20:56:27 -07:00
Dylan Hurd
6da84efed8 feat(approvals) RejectConfig for request_permissions (#14118)
## Summary
We need to support allowing request_permissions calls when using
`Reject` policy

<img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06
40 PM"
src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5"
/>

Note that this is a backwards-incompatible change for Reject policy. I'm
not sure if we need to add a default based on our current use/setup

## Testing
- [x] Added tests
- [x] Tested locally
2026-03-09 18:16:54 -07:00
Dylan Hurd
c1defcc98c fix(core) RequestPermissions + ApplyPatch (#14055)
## Summary
The apply_patch tool should also respect AdditionalPermissions

## Testing
- [x] Added unit tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-09 16:11:19 -07:00
Dylan Hurd
d241dc598c feat(core) Persist request_permission data across turns (#14009)
## Summary
request_permissions flows should support persisting results for the
session.

Open Question: Still deciding if we need within-turn approvals - this
adds complexity but I could see it being useful

## Testing
- [x] Updated unit tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-09 14:36:38 -07:00
Ahmed Ibrahim
44ecc527cb Stabilize RMCP streamable HTTP readiness tests (#13880)
## What changed
- The RMCP streamable HTTP tests now wait for metadata and tool
readiness before issuing tool calls.
- OAuth state is isolated per test home.
- The helper server startup path now uses bounded bind retries so
transient `AddrInUse` collisions do not fail the test immediately.

## Why this fixes the flake
- The old tests could begin issuing tool requests before the helper
server had finished advertising its metadata and tools, so the first
request sometimes raced the server startup sequence.
- On top of that, shared OAuth state and occasional bind collisions on
CI runners introduced cross-test environmental noise unrelated to the
functionality under test.
- Readiness polling makes the client wait for an observable “server is
ready” signal, while isolated state and bounded bind retries remove
external contention that was causing intermittent failures.

## Scope
- Test-only change.
2026-03-09 19:52:55 +00:00
Dylan Hurd
0334ddeccb fix(ci) Faster shell_command::unicode_output test (#14114)
## Summary
Alternative to #14061 - we need to use a child process on windows to
correctly validate Powershell behavior.

## Testing
- [x] These are tests
2026-03-09 19:09:56 +00:00
Ahmed Ibrahim
fefd01b9e0 Stabilize resumed rollout messages (#14060)
## What changed
- add a bounded `resume_until_initial_messages` helper in
`core/tests/suite/resume.rs`
- retry the resume call until `initial_messages` contains the fully
persisted final turn shape before asserting

## Why this fixes flakiness
The old test resumed once immediately after `TurnComplete` and sometimes
read rollout state before the final turn had been persisted. That made
the assertion race persistence timing instead of checking the resumed
message shape. The new helper polls for up to two seconds in 10ms steps
and only asserts once the expected message sequence is actually present,
so the test waits for the real readiness condition instead of depending
on a lucky timing window.

## Scope
- test-only
- no production logic change
2026-03-09 11:48:13 -07:00
Ahmed Ibrahim
75e608343c Stabilize realtime startup context tests (#13876)
## What changed
- The realtime startup-context tests no longer assume the interesting
websocket payload is always `connection 1 / request 0`.
- Instead, they now wait for the first outbound websocket request that
actually carries `session.instructions`, regardless of which websocket
connection won the accept-order race on the runner.
- The env-key fallback test stays serialized because it mutates process
environment.

## Why this fixes the flake
- The old test synchronized on the mirrored `session.updated` client
event and then inspected a fixed websocket slot.
- On CI, the response websocket and the realtime websocket can race each
other during startup. When the response websocket wins that race, the
fixed slot can contain `response.create` instead of the
startup-context-bearing `session.update` request the test actually cares
about.
- That made the test fail nondeterministically by inspecting the wrong
request, or by timing out waiting on a secondary event even though the
real outbound request path was correct.
- Waiting directly on the first request whose payload includes
`session.instructions` removes both ordering assumptions and makes the
assertion line up with the actual contract under test.
- Separately, serializing the environment-mutating fallback case
prevents unrelated tests from seeing partially updated auth state.

## Scope
- Test-only change.
2026-03-09 10:57:43 -07:00
Charley Cunningham
f23fcd6ced guardian initial feedback / tweaks (#13897)
## Summary
- remove the remaining model-visible guardian-specific `on-request`
prompt additions so enabling the feature does not change the main
approval-policy instructions
- neutralize user-facing guardian wording to talk about automatic
approval review / approval requests rather than a second reviewer or
only sandbox escalations
- tighten guardian retry-context handling so agent-authored
`justification` stays in the structured action JSON and is not also
injected as raw retry context
- simplify guardian review plumbing in core by deleting dead
prompt-append paths and trimming some request/transcript setup code

## Notable Changes
- delete the dead `permissions/approval_policy/guardian.md` append path
and stop threading `guardian_approval_enabled` through model-facing
developer-instruction builders
- rename the experimental feature copy to `Automatic approval review`
and update the `/experimental` snapshot text accordingly
- make approval-review status strings generic across shell, patch,
network, and MCP review types
- forward real sandbox/network retry reasons for shell and unified-exec
guardian review, but do not pass agent-authored justification as raw
retry context
- simplify `guardian.rs` by removing the one-field request wrapper,
deduping reasoning-effort selection, and cleaning up transcript entry
collection

## Testing
- `just fmt`
- full validation left to CI

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-09 09:25:24 -07:00
Jack Mousseau
e6b93841c5 Add request permissions tool (#13092)
Adds a built-in `request_permissions` tool and wires it through the
Codex core, protocol, and app-server layers so a running turn can ask
the client for additional permissions instead of relying on a static
session policy.

The new flow emits a `RequestPermissions` event from core, tracks the
pending request by call ID, forwards it through app-server v2 as an
`item/permissions/requestApproval` request, and resumes the tool call
once the client returns an approved subset of the requested permission
profile.
2026-03-08 20:23:06 -07:00
Dylan Hurd
f41b1638c9 fix(core) patch otel test (#14014)
## Summary
This test was missing the turn completion event in the responses stream,
so it was hanging. This PR fixes the issue

## Testing
- [x] This does update the test
2026-03-08 19:06:30 -07:00
Celia Chen
340f9c9ecb app-server: include experimental skill metadata in exec approval requests (#13929)
## Summary

This change surfaces skill metadata on command approval requests so
app-server clients can tell when an approval came from a skill script
and identify the originating `SKILL.md`.

- add `skill_metadata` to exec approval events in the shared protocol
- thread skill metadata through core shell escalation and delegated
approval handling for skill-triggered approvals
- expose the field in app-server v2 as experimental `skillMetadata`
- regenerate the JSON/TypeScript schemas and cover the new field in
protocol, transport, core, and TUI tests

## Why

Skill-triggered approvals already carry skill context inside core, but
app-server clients could not see which skill caused the prompt. Sending
the skill metadata with the approval request makes it possible for
clients to present better approval UX and connect the prompt back to the
relevant skill definition.


## example event in app-server-v2
verified that we see this event when experimental api is on:
```
< {
<   "id": 11,
<   "method": "item/commandExecution/requestApproval",
<   "params": {
<     "additionalPermissions": {
<       "fileSystem": null,
<       "macos": {
<         "accessibility": false,
<         "automations": {
<           "bundle_ids": [
<             "com.apple.Notes"
<           ]
<         },
<         "calendar": false,
<         "preferences": "read_only"
<       },
<       "network": null
<     },
<     "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563",
<     "availableDecisions": [
<       "accept",
<       "acceptForSession",
<       "cancel"
<     ],
<     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<     "commandActions": [
<       {
<         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<         "type": "unknown"
<       }
<     ],
<     "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes",
<     "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy",
<     "skillMetadata": {
<       "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md"
<     },
<     "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69",
<     "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f"
<   }
< }`
```

& verified that this is the event when experimental api is off:
```
< {
<   "id": 13,
<   "method": "item/commandExecution/requestApproval",
<   "params": {
<     "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0",
<     "availableDecisions": [
<       "accept",
<       "acceptForSession",
<       "cancel"
<     ],
<     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<     "commandActions": [
<       {
<         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<         "type": "unknown"
<       }
<     ],
<     "cwd": "/Users/celia/code/codex/codex-rs",
<     "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt",
<     "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4",
<     "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a"
<   }
< }
```
2026-03-08 18:07:46 -07:00
Ahmed Ibrahim
1f150eda8b Stabilize shell serialization tests (#13877)
## What changed
- The duration-recording fixture sleep was reduced from a large
artificial delay to `0.2s`, and the assertion floor was lowered to
`0.1s`.
- The shell tool fixtures now force `login = false` so they do not
invoke login-shell startup paths.

## Why this fixes the flake
- The old tests were paying for two kinds of noise that had nothing to
do with the feature being validated: oversized sleep time and variable
shell initialization cost.
- Login shells can pick up runner-specific startup files and incur
inconsistent startup latency.
- The test only needs to prove that we record a nontrivial duration and
preserve shell output. A shorter fixture delay plus a non-login shell
keeps that coverage while removing runner-dependent wall-clock variance.

## Scope
- Test-only change.
2026-03-08 13:37:41 -07:00
Michael Bolin
bf5c2f48a5 seatbelt: honor split filesystem sandbox policies (#13448)
## Why

After `#13440` and `#13445`, macOS Seatbelt policy generation was still
deriving filesystem and network behavior from the legacy `SandboxPolicy`
projection.

That projection loses explicit unreadable carveouts and conflates split
network decisions, so the generated Seatbelt policy could still be wider
than the split policy that Codex had already computed.

## What changed

- added Seatbelt entrypoints that accept `FileSystemSandboxPolicy` and
`NetworkSandboxPolicy` directly
- built read and write policy stanzas from access roots plus excluded
subpaths so explicit unreadable carveouts survive into the generated
Seatbelt policy
- switched network policy generation to consult `NetworkSandboxPolicy`
directly
- failed closed when managed-network or proxy-constrained sessions do
not yield usable loopback proxy endpoints
- updated the macOS callers and test helpers that now need to carry the
split policies explicitly

## Verification

- added regression coverage in `core/src/seatbelt.rs` for unreadable
carveouts under both full-disk and scoped-readable policies
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13448).
* #13453
* #13452
* #13451
* #13449
* __->__ #13448
* #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-08 00:35:19 +00:00
Dylan Hurd
92f7541624 fix(ci) fix guardian ci (#13911)
## Summary
#13910 was merged with some unused imports, let's fix this

## Testing
- [x] Let's make sure CI is green

---------

Co-authored-by: Charles Cunningham <ccunningham@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-07 23:34:56 +00:00
Charley Cunningham
e84ee33cc0 Add guardian approval MVP (#13692)
## Summary
- add the guardian reviewer flow for `on-request` approvals in command,
patch, sandbox-retry, and managed-network approval paths
- keep guardian behind `features.guardian_approval` instead of exposing
a public `approval_policy = guardian` mode
- route ordinary `OnRequest` approvals to the guardian subagent when the
feature is enabled, without changing the public approval-mode surface

## Public model
- public approval modes stay unchanged
- guardian is enabled via `features.guardian_approval`
- when that feature is on, `approval_policy = on-request` keeps the same
approval boundaries but sends those approval requests to the guardian
reviewer instead of the user
- `/experimental` only persists the feature flag; it does not rewrite
`approval_policy`
- CLI and app-server no longer expose a separate `guardian` approval
mode in this PR

## Guardian reviewer
- the reviewer runs as a normal subagent and reuses the existing
subagent/thread machinery
- it is locked to a read-only sandbox and `approval_policy = never`
- it does not inherit user/project exec-policy rules
- it prefers `gpt-5.4` when the current provider exposes it, otherwise
falls back to the parent turn's active model
- it fail-closes on timeout, startup failure, malformed output, or any
other review error
- it currently auto-approves only when `risk_score < 80`

## Review context and policy
- guardian mirrors `OnRequest` approval semantics rather than
introducing a separate approval policy
- explicit `require_escalated` requests follow the same approval surface
as `OnRequest`; the difference is only who reviews them
- managed-network allowlist misses that enter the approval flow are also
reviewed by guardian
- the review prompt includes bounded recent transcript history plus
recent tool call/result evidence
- transcript entries and planned-action strings are truncated with
explicit `<guardian_truncated ... />` markers so large payloads stay
bounded
- apply-patch reviews include the full patch content (without
duplicating the structured `changes` payload)
- the guardian request layout is snapshot-tested using the same
model-visible Responses request formatter used elsewhere in core

## Guardian network behavior
- the guardian subagent inherits the parent session's managed-network
allowlist when one exists, so it can use the same approved network
surface while reviewing
- exact session-scoped network approvals are copied into the guardian
session with protocol/port scope preserved
- those copied approvals are now seeded before the guardian's first turn
is submitted, so inherited approvals are available during any immediate
review-time checks

## Out of scope / follow-ups
- the sandbox-permission validation split was pulled into a separate PR
and is not part of this diff
- a future follow-up can enable `serde_json` preserve-order in
`codex-core` and then simplify the guardian action rendering further

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-07 05:40:10 -08:00