Commit Graph

2695 Commits

Author SHA1 Message Date
Michael Bolin
bf5c2f48a5 seatbelt: honor split filesystem sandbox policies (#13448)
## Why

After `#13440` and `#13445`, macOS Seatbelt policy generation was still
deriving filesystem and network behavior from the legacy `SandboxPolicy`
projection.

That projection loses explicit unreadable carveouts and conflates split
network decisions, so the generated Seatbelt policy could still be wider
than the split policy that Codex had already computed.

## What changed

- added Seatbelt entrypoints that accept `FileSystemSandboxPolicy` and
`NetworkSandboxPolicy` directly
- built read and write policy stanzas from access roots plus excluded
subpaths so explicit unreadable carveouts survive into the generated
Seatbelt policy
- switched network policy generation to consult `NetworkSandboxPolicy`
directly
- failed closed when managed-network or proxy-constrained sessions do
not yield usable loopback proxy endpoints
- updated the macOS callers and test helpers that now need to carry the
split policies explicitly

## Verification

- added regression coverage in `core/src/seatbelt.rs` for unreadable
carveouts under both full-disk and scoped-readable policies
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13448).
* #13453
* #13452
* #13451
* #13449
* __->__ #13448
* #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-08 00:35:19 +00:00
Dylan Hurd
92f7541624 fix(ci) fix guardian ci (#13911)
## Summary
#13910 was merged with some unused imports, let's fix this

## Testing
- [x] Let's make sure CI is green

---------

Co-authored-by: Charles Cunningham <ccunningham@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-07 23:34:56 +00:00
Dylan Hurd
1c888709b5 fix(core) rm guardian snapshot test (#13910)
## Summary
This test is good, but flakey and we have to figure out some bazel build
issues. Let's get CI back go green and then land a stable version!

## Test Summary
- [x] CI Passes
2026-03-07 14:28:54 -08:00
Charley Cunningham
e84ee33cc0 Add guardian approval MVP (#13692)
## Summary
- add the guardian reviewer flow for `on-request` approvals in command,
patch, sandbox-retry, and managed-network approval paths
- keep guardian behind `features.guardian_approval` instead of exposing
a public `approval_policy = guardian` mode
- route ordinary `OnRequest` approvals to the guardian subagent when the
feature is enabled, without changing the public approval-mode surface

## Public model
- public approval modes stay unchanged
- guardian is enabled via `features.guardian_approval`
- when that feature is on, `approval_policy = on-request` keeps the same
approval boundaries but sends those approval requests to the guardian
reviewer instead of the user
- `/experimental` only persists the feature flag; it does not rewrite
`approval_policy`
- CLI and app-server no longer expose a separate `guardian` approval
mode in this PR

## Guardian reviewer
- the reviewer runs as a normal subagent and reuses the existing
subagent/thread machinery
- it is locked to a read-only sandbox and `approval_policy = never`
- it does not inherit user/project exec-policy rules
- it prefers `gpt-5.4` when the current provider exposes it, otherwise
falls back to the parent turn's active model
- it fail-closes on timeout, startup failure, malformed output, or any
other review error
- it currently auto-approves only when `risk_score < 80`

## Review context and policy
- guardian mirrors `OnRequest` approval semantics rather than
introducing a separate approval policy
- explicit `require_escalated` requests follow the same approval surface
as `OnRequest`; the difference is only who reviews them
- managed-network allowlist misses that enter the approval flow are also
reviewed by guardian
- the review prompt includes bounded recent transcript history plus
recent tool call/result evidence
- transcript entries and planned-action strings are truncated with
explicit `<guardian_truncated ... />` markers so large payloads stay
bounded
- apply-patch reviews include the full patch content (without
duplicating the structured `changes` payload)
- the guardian request layout is snapshot-tested using the same
model-visible Responses request formatter used elsewhere in core

## Guardian network behavior
- the guardian subagent inherits the parent session's managed-network
allowlist when one exists, so it can use the same approved network
surface while reviewing
- exact session-scoped network approvals are copied into the guardian
session with protocol/port scope preserved
- those copied approvals are now seeded before the guardian's first turn
is submitted, so inherited approvals are available during any immediate
review-time checks

## Out of scope / follow-ups
- the sandbox-permission validation split was pulled into a separate PR
and is not part of this diff
- a future follow-up can enable `serde_json` preserve-order in
`codex-core` and then simplify the guardian action rendering further

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-07 05:40:10 -08:00
jif-oai
cf143bf71e feat: simplify DB further (#13771) 2026-03-07 03:48:36 -08:00
Michael Bolin
5ceff6588e safety: honor filesystem policy carveouts in apply_patch (#13445)
## Why

`apply_patch` safety approval was still checking writable paths through
the legacy `SandboxPolicy` projection.

That can hide explicit `none` carveouts when a split filesystem policy
projects back to compatibility `ExternalSandbox`, which leaves one more
approval path that can auto-approve writes inside paths that are
intentionally blocked.

## What changed

- passed `turn.file_system_sandbox_policy` into `assess_patch_safety`
- changed writable-path checks to derive effective access from
`FileSystemSandboxPolicy` instead of the legacy `SandboxPolicy`
- made those checks reject explicit unreadable roots before considering
broad write access or writable roots
- added regression coverage showing that an `ExternalSandbox`
compatibility projection still asks for approval when the split
filesystem policy blocks a subpath

## Verification

- `cargo test -p codex-core safety::tests::`
- `cargo test -p codex-core test_sandbox_config_parsing`
- `cargo clippy -p codex-core --all-targets -- -D warnings`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13445).
* #13453
* #13452
* #13451
* #13449
* #13448
* __->__ #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-07 08:01:08 +00:00
Celia Chen
b0ce16c47a fix(core): respect reject policy by approval source for skill scripts (#13816)
## Summary
- distinguish reject-policy handling for prefix-rule approvals versus
sandbox approvals in Unix shell escalation
- keep prompting for skill-script execution when `rules=true` but
`sandbox_approval=false`, instead of denying the command up front
- add regression coverage for both skill-script reject-policy paths in
`codex-rs/core/tests/suite/skill_approval.rs`
2026-03-06 21:43:14 -08:00
Michael Bolin
22ac6b9aaa sandboxing: plumb split sandbox policies through runtime (#13439)
## Why

`#13434` introduces split `FileSystemSandboxPolicy` and
`NetworkSandboxPolicy`, but the runtime still made most execution-time
sandbox decisions from the legacy `SandboxPolicy` projection.

That projection loses information about combinations like unrestricted
filesystem access with restricted network access. In practice, that
means the runtime can choose the wrong platform sandbox behavior or set
the wrong network-restriction environment for a command even when config
has already separated those concerns.

This PR carries the split policies through the runtime so sandbox
selection, process spawning, and exec handling can consult the policy
that actually matters.

## What changed

- threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through
`TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state,
unified exec, and app-server exec overrides
- updated sandbox selection in `core/src/sandboxing/mod.rs` and
`core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus
`NetworkSandboxPolicy`, rather than inferring behavior only from the
legacy `SandboxPolicy`
- updated process spawning in `core/src/spawn.rs` and the platform
wrappers to use `NetworkSandboxPolicy` when deciding whether to set
`CODEX_SANDBOX_NETWORK_DISABLED`
- kept additional-permissions handling and legacy `ExternalSandbox`
compatibility projections aligned with the split policies, including
explicit user-shell execution and Windows restricted-token routing
- updated callers across `core`, `app-server`, and `linux-sandbox` to
pass the split policies explicitly

## Verification

- added regression coverage in `core/tests/suite/user_shell_cmd.rs` to
verify `RunUserShellCommand` does not inherit
`CODEX_SANDBOX_NETWORK_DISABLED` from the active turn
- added coverage in `core/src/exec.rs` for Windows restricted-token
sandbox selection when the legacy projection is `ExternalSandbox`
- updated Linux sandbox coverage in
`linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy
exec path
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439).
* #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* #13440
* __->__ #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-07 02:30:21 +00:00
viyatb-oai
25fa974166 fix: support managed network allowlist controls (#12752)
## Summary
- treat `requirements.toml` `allowed_domains` and `denied_domains` as
managed network baselines for the proxy
- in restricted modes by default, build the effective runtime policy
from the managed baseline plus user-configured allowlist and denylist
entries, so common hosts can be pre-approved without blocking later user
expansion
- add `experimental_network.managed_allowed_domains_only = true` to pin
the effective allowlist to managed entries, ignore user allowlist
additions, and hard-deny non-managed domains without prompting
- apply `managed_allowed_domains_only` anywhere managed network
enforcement is active, including full access, while continuing to
respect denied domains from all sources
- add regression coverage for merged-baseline behavior, managed-only
behavior, and full-access managed-only enforcement

## Behavior
Assuming `requirements.toml` defines both
`experimental_network.allowed_domains` and
`experimental_network.denied_domains`.

### Default mode
- By default, the effective allowlist is
`experimental_network.allowed_domains` plus user or persisted allowlist
additions.
- By default, the effective denylist is
`experimental_network.denied_domains` plus user or persisted denylist
additions.
- Allowlist misses can go through the network approval flow.
- Explicit denylist hits and local or private-network blocks are still
hard-denied.
- When `experimental_network.managed_allowed_domains_only = true`, only
managed `allowed_domains` are respected, user allowlist additions are
ignored, and non-managed domains are hard-denied without prompting.
- Denied domains continue to be respected from all sources.

### Full access
- With managed requirements present, the effective allowlist is pinned
to `experimental_network.allowed_domains`.
- With managed requirements present, the effective denylist is pinned to
`experimental_network.denied_domains`.
- There is no allowlist-miss approval path in full access.
- Explicit denylist hits are hard-denied.
- `experimental_network.managed_allowed_domains_only = true` now also
applies in full access, so managed-only behavior remains in effect
anywhere managed network enforcement is active.
2026-03-06 17:52:54 -08:00
viyatb-oai
5deaf9409b fix: avoid invoking git before project trust is established (#13804)
## Summary
- resolve trust roots by inspecting `.git` entries on disk instead of
spawning `git rev-parse --git-common-dir`
- keep regular repo and linked-worktree trust inheritance behavior
intact
- add a synthetic regression test that proves worktree trust resolution
works without a real git command

## Testing
- `just fmt`
- `cargo test -p codex-core resolve_root_git_project_for_trust`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
- `cargo test -p codex-core` (fails in this environment on unrelated
managed-config `DangerFullAccess` tests in `codex::tests`,
`tools::js_repl::tests`, and `unified_exec::tests`)
2026-03-06 17:46:23 -08:00
Ruslan Nigmatullin
e9bd8b20a1 app-server: Add streaming and tty/pty capabilities to command/exec (#13640)
* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap for total amount
of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using
`command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using
`command/exec/resize`)
2026-03-06 17:30:17 -08:00
Rohan Mehta
61098c7f51 Allow full web search tool config (#13675)
Previously, we could only configure whether web search was on/off.

This PR enables sending along a web search config, which includes all
the stuff responsesapi supports: filters, location, etc.
2026-03-07 00:50:50 +00:00
Celia Chen
8b81284975 fix(core): skip exec approval for permissionless skill scripts (#13791)
## Summary

- Treat skill scripts with no permission profile, or an explicitly empty
one, as permissionless and run them with the turn's existing sandbox
instead of forcing an exec approval prompt.
- Keep the approval flow unchanged for skills that do declare additional
permissions.
- Update the skill approval tests to assert that permissionless skill
scripts do not prompt on either the initial run or a rerun.

## Why

Permissionless skills should inherit the current turn sandbox directly.
Prompting for exec approval in that case adds friction without granting
any additional capability.
2026-03-06 16:40:41 -08:00
xl-openai
0243734300 feat: Add curated plugin marketplace + Metadata Cleanup. (#13712)
1. Add a synced curated plugin marketplace and include it in marketplace
discovery.
2. Expose optional plugin.json interface metadata in plugin/list
3. Tighten plugin and marketplace path handling using validated absolute
paths.
4. Let manifests override skill, MCP, and app config paths.
5. Restrict plugin enablement/config loading to the user config layer so
plugin enablement is at global level
2026-03-06 19:39:35 -05:00
Owen Lin
289ed549cf chore(otel): rename OtelManager to SessionTelemetry (#13808)
## Summary
This is a purely mechanical refactor of `OtelManager` ->
`SessionTelemetry` to better convey what the struct is doing. No
behavior change.

## Why

`OtelManager` ended up sounding much broader than what this type
actually does. It doesn't manage OTEL globally; it's the session-scoped
telemetry surface for emitting log/trace events and recording metrics
with consistent session metadata (`app_version`, `model`, `slug`,
`originator`, etc.).

`SessionTelemetry` is a more accurate name, and updating the call sites
makes that boundary a lot easier to follow.

## Validation

- `just fmt`
- `cargo test -p codex-otel`
- `cargo test -p codex-core`
2026-03-06 16:23:30 -08:00
Ahmed Ibrahim
a11c59f634 Add realtime startup context override (#13796)
- add experimental_realtime_ws_startup_context to override or disable
realtime websocket startup context
- preserve generated startup context when unset and cover the new
override paths in tests
2026-03-06 16:00:30 -08:00
Michael Bolin
f82678b2a4 config: add initial support for the new permission profile config language in config.toml (#13434)
## Why

`SandboxPolicy` currently mixes together three separate concerns:

- parsing layered config from `config.toml`
- representing filesystem sandbox state
- carrying basic network policy alongside filesystem choices

That makes the existing config awkward to extend and blocks the new TOML
proposal where `[permissions]` becomes a table of named permission
profiles selected by `default_permissions`. (The idea is that if
`default_permissions` is not specified, we assume the user is opting
into the "traditional" way to configure the sandbox.)

This PR adds the config-side plumbing for those profiles while still
projecting back to the legacy `SandboxPolicy` shape that the current
macOS and Linux sandbox backends consume.

It also tightens the filesystem profile model so scoped entries only
exist for `:project_roots`, and so nested keys must stay within a
project root instead of using `.` or `..` traversal.

This drops support for the short-lived `[permissions.network]` in
`config.toml` because now that would be interpreted as a profile named
`network` within `[permissions]`.

## What Changed

- added `PermissionsToml`, `PermissionProfileToml`,
`FilesystemPermissionsToml`, and `FilesystemPermissionToml` so config
can parse named profiles under `[permissions.<profile>.filesystem]`
- added top-level `default_permissions` selection, validation for
missing or unknown profiles, and compilation from a named profile into
split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` values
- taught config loading to choose between the legacy `sandbox_mode` path
and the profile-based path without breaking legacy users
- introduced `codex-protocol::permissions` for the split filesystem and
network sandbox types, and stored those alongside the legacy projected
`sandbox_policy` in runtime `Permissions`
- modeled `FileSystemSpecialPath` so only `ProjectRoots` can carry a
nested `subpath`, matching the intended config syntax instead of
allowing invalid states for other special paths
- restricted scoped filesystem maps to `:project_roots`, with validation
that nested entries are non-empty descendant paths and cannot use `.` or
`..` to escape the project root
- kept existing runtime consumers working by projecting
`FileSystemSandboxPolicy` back into `SandboxPolicy`, with an explicit
error for profiles that request writes outside the workspace root
- loaded proxy settings from top-level `[network]`
- regenerated `core/config.schema.json`

## Verification

- added config coverage for profile deserialization,
`default_permissions` selection, top-level `[network]` loading, network
enablement, rejection of writes outside the workspace root, rejection of
nested entries for non-`:project_roots` special paths, and rejection of
parent-directory traversal in `:project_roots` maps
- added protocol coverage for the legacy bridge rejecting non-workspace
writes

## Docs

- update the Codex config docs on developers.openai.com/codex to
document named `[permissions.<profile>]` entries, `default_permissions`,
scoped `:project_roots` syntax, the descendant-path restriction for
nested `:project_roots` entries, and top-level `[network]` proxy
configuration






---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13434).
* #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* #13440
* #13439
* __->__ #13434
2026-03-06 15:39:13 -08:00
Curtis 'Fjord' Hawthorne
d6c8186195 Clarify js_repl binding reuse guidance (#13803)
## Summary

Clarify the `js_repl` prompt guidance around persistent bindings and
redeclaration recovery.

This updates the generated `js_repl` instructions in
`core/src/project_doc.rs` to prefer this order when a name is already
bound:

1. Reuse the existing binding
2. Reassign a previously declared `let`
3. Pick a new descriptive name
4. Use `{ ... }` only for short-lived scratch scope
5. Reset the kernel only when a clean state is actually needed

The prompt now also explicitly warns against wrapping an entire cell in
block scope when the goal is to reuse names across later cells.

## Why

The previous wording still left too much room for low-value workarounds
like whole-cell block wrapping. In downstream browser rollouts, that
pattern was adding tokens and preventing useful state reuse across
`js_repl` cells.

This change makes the preferred behavior more explicit without changing
runtime semantics.

## Scope

- Prompt/documentation change only
- No runtime behavior changes
- Updates the matching string-backed `project_doc` tests
2026-03-06 15:19:06 -08:00
Ruslan Nigmatullin
5b04cc657f utils/pty: add streaming spawn and terminal sizing primitives (#13695)
Enhance pty utils:
* Support closing stdin
* Separate stderr and stdout streams to allow consumers differentiate them
* Provide compatibility helper to merge both streams back into combined one
* Support specifying terminal size for pty, including on-demand resizes while process is already running
* Support terminating the process while still consuming its outputs
2026-03-06 15:13:12 -08:00
Michael Bolin
488875f24d fix: move unit tests in codex-rs/core/src/codex.rs into their own file (#13783)
This is analogous to https://github.com/openai/codex/pull/13780.
2026-03-06 11:56:49 -08:00
Michael Bolin
39869f7443 fix: move unit tests in codex-rs/core/src/config/mod.rs into their own file (#13780)
At over 7,000 lines, `codex-rs/core/src/config/mod.rs` was getting a bit
unwieldy.

This PR does the same type of move as
https://github.com/openai/codex/pull/12957 to put unit tests in their
own file, though I decided `config_tests.rs` is a more intuitive name
than `mod_tests.rs`.

Ultimately, I'll codemod the rest of the codebase to follow suit, but I
want to do it in stages to reduce merge conflicts for people.
2026-03-06 11:21:58 -08:00
sayan-oai
8a54d3caaa feat: structured plugin parsing (#13711)
#### What

Add structured `@plugin` parsing and TUI support for plugin mentions.

- Core: switch from plain-text `@display_name` parsing to structured
`plugin://...` mentions via `UserInput::Mention` and
`[$...](plugin://...)` links in text, same pattern as apps/skills.
- TUI: add plugin mention popup, autocomplete, and chips when typing
`$`. Load plugin capability summaries and feed them into the composer;
plugin mentions appear alongside skills and apps.
- Generalize mention parsing to a sigil parameter, still defaults to `$`

<img width="797" height="119" alt="image"
src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
/>

Builds on #13510. Currently clients have to build their own `id` via
`plugin@marketplace` and filter plugins to show by `enabled`, but we
will add `id` and `available` as fields returned from `plugin/list`
soon.

####Tests

Added tests, verified locally.
2026-03-06 11:08:36 -08:00
jif-oai
0e41a5c4a8 chore: improve DB flushing (#13620)
This branch:
* Avoid flushing DB when not necessary
* Filter events for which we perfom an `upsert` into the DB
* Add a dedicated update function of the `thread:updated_at` that is
lighter

This should significantly reduce the DB lock contention. If it is not
sufficient, we can de-sync the flush of the DB for `updated_at`
2026-03-06 19:58:14 +01:00
Owen Lin
3449e00bc9 feat(otel, core): record turn TTFT and TTFM metrics in codex-core (#13630)
### Summary
This adds turn-level latency metrics for the first model output and the
first completed agent message.
- `codex.turn.ttft.duration_ms` starts at turn start and records on the
first output signal we see from the model. That includes normal
assistant text, reasoning deltas, and non-text outputs like tool-call
items.
- `codex.turn.ttfm.duration_ms` also starts at turn start, but it
records when the first agent message finishes streaming rather than when
its first delta arrives.

### Implementation notes
The timing is tracked in codex-core, not app-server, so the definition
stays consistent across CLI, TUI, and app-server clients.

I reused the existing turn lifecycle boundary that already drives
`codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn
state, and record each metric once per turn.

I also wired the new metric names into the OTEL runtime metrics summary
so they show up in the same in-memory/debug snapshot path as the
existing timing metrics.
2026-03-06 10:23:48 -08:00
Charley Cunningham
cb1a182bbe Clarify sandbox permission override helper semantics (#13703)
## Summary
Today `SandboxPermissions::requires_additional_permissions()` does not
actually mean "is `WithAdditionalPermissions`". It returns `true` for
any non-default sandbox override, including `RequireEscalated`. That
broad behavior is relied on in multiple `main` callsites.

The naming is security-sensitive because `SandboxPermissions` is used on
shell-like tool calls to tell the executor how a single command should
relate to the turn sandbox:
- `UseDefault`: run with the turn sandbox unchanged
- `RequireEscalated`: request execution outside the sandbox
- `WithAdditionalPermissions`: stay sandboxed but widen permissions for
that command only

## Problem
The old helper name reads as if it only applies to the
`WithAdditionalPermissions` variant. In practice it means "this command
requested any explicit sandbox override."

That ambiguity made it easy to read production checks incorrectly and
made the guardian change look like a standalone `main` fix when it is
not.

On `main` today:
- `shell` and `unified_exec` intentionally reject any explicit
`sandbox_permissions` request unless approval policy is `OnRequest`
- `exec_policy` intentionally treats any explicit sandbox override as
prompt-worthy in restricted sandboxes
- tests intentionally serialize both `RequireEscalated` and
`WithAdditionalPermissions` as explicit sandbox override requests

So changing those callsites from the broad helper to a narrow
`WithAdditionalPermissions` check would be a behavior change, not a pure
cleanup.

## What This PR Does
- documents `SandboxPermissions` as a per-command sandbox override, not
a generic permissions bag
- adds `requests_sandbox_override()` for the broad meaning: anything
except `UseDefault`
- adds `uses_additional_permissions()` for the narrow meaning: only
`WithAdditionalPermissions`
- keeps `requires_additional_permissions()` as a compatibility alias to
the broad meaning for now
- updates the current broad callsites to use the accurately named broad
helper
- adds unit coverage that locks in the semantics of all three helpers

## What This PR Does Not Do
This PR does not change runtime behavior. That is intentional.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-06 09:57:48 -08:00
jif-oai
f891f516a5 feat: drop discrepency metrics (#13753) 2026-03-06 18:32:25 +01:00
jif-oai
fa16c26908 feat: drop sqlite db feature flag (#13750) 2026-03-06 17:57:52 +01:00
jif-oai
5d4303510c fix: windows normalization (#13742) 2026-03-06 15:50:44 +01:00
jif-oai
8ad768eb76 feat: prune old memories in DB (#13734)
To save memory
2026-03-06 15:10:49 +01:00
Matthew Zeng
98dca99db7 [elicitations] Switch to use MCP style elicitation payload for mcp tool approvals. (#13621)
- [x] Switch to use MCP style elicitation payload for mcp tool
approvals.
- [ ] TODO: Update the UI to support the full spec.
2026-03-06 01:50:26 -08:00
Won Park
ee1a20258a Enabling CWD Saving for Image-Gen (#13607)
Codex now saves the generated image on to your current working
directory.
2026-03-06 00:47:21 -08:00
sayan-oai
014a59fb0b check app auth in plugin/install (#13685)
#### What
on `plugin/install`, check if installed apps are already authed on
chatgpt, and return list of all apps that are not. clients can use this
list to trigger auth workflows as needed.

checks are best effort based on `codex_apps` loading, much like
`app/list`.

#### Tests
Added integration tests, tested locally.
2026-03-06 06:45:00 +00:00
Dylan Hurd
4c9b1c38f6 fix(tui) remove config check for trusted setting (#11874)
## Summary
Simplify the trusted directory flow. This logic was originally designed
several months ago, to determine if codex should start in read-only or
workspace-write mode. However, that's no longer the purpose of directory
trust - and therefore we should get rid of this logic.

## Testing
- [x] Unit tests pass
2026-03-05 22:29:34 -08:00
iceweasel-oai
14de492985 copy current exe to CODEX_HOME/.sandbox-bin for apply_patch (#13669)
We do this for codex-command-runner.exe as well for the same reason.
Windows sandbox users cannot execute binaries in the WindowsApp/
installed directory for the Codex App. This causes apply-patch to fail
because it tries to execute codex.exe as the sandbox user.
2026-03-05 22:15:10 -08:00
viyatb-oai
6a79ed5920 refactor: remove proxy admin endpoint (#13687)
## Summary
- delete the network proxy admin server and its runtime listener/task
plumbing
- remove the admin endpoint config, runtime, requirement, protocol,
schema, and debug-surface fields
- update proxy docs to reflect the remaining HTTP and SOCKS listeners
only
2026-03-05 22:03:16 -08:00
xl-openai
520ed724d2 support plugin/list. (#13540)
Introduce a plugin/list which reads from local marketplace.json.
Also update the signature for plugin/install.
2026-03-05 21:58:50 -05:00
Ahmed Ibrahim
629cb15bc6 Replay thread rollback from rollout history (#13615)
- Replay thread rollback from the persisted rollout history instead of
truncating in-memory state.\n- Add rollback coverage, including
rollback-behind-compaction snapshot coverage.
2026-03-05 16:40:09 -08:00
Ahmed Ibrahim
6cf0ed4e79 Refine realtime startup context formatting (#13560)
## Summary
- group recent work by git repo when available, otherwise by directory
- render recent work as bounded user asks with per-thread cwd context
- exclude hidden files and directories from workspace trees
2026-03-05 16:31:20 -08:00
Owen Lin
c3736cff0a feat(otel): safe tracing (#13626)
### Motivation
Today config.toml has three different OTEL knobs under `[otel]`:
- `exporter` controls where OTEL logs go
- `trace_exporter` controls where OTEL traces go
- `metrics_exporter` controls where metrics go

Those often (pretty much always?) serve different purposes.

For example, for OpenAI internal usage, the **log exporter** is already
being used for IT/security telemetry, and that use case is intentionally
content-rich: tool calls, arguments, outputs, MCP payloads, and in some
cases user content are all useful there. `log_user_prompt` is a good
example of that distinction. When it’s enabled, we include raw prompt
text in OTEL logs, which is acceptable for the security use case.

The **trace exporter** is a different story. The goal there is to give
OpenAI engineers visibility into latency and request behavior when they
run Codex locally, without sending sensitive prompt or tool data as
trace event data. In other words, traces should help answer “what was
slow?” or “where did time go?”, not “what did the user say?” or “what
did the tool return?”

The complication is that Rust’s `tracing` crate does not make a hard
distinction between “logs” and “trace events.” It gives us one
instrumentation API for logs and trace events (via `tracing::event!`),
and subscribers decide what gets treated as logs, trace events, or both.

Before this change, our OTEL trace layer was effectively attached to the
general tracing stream, which meant turning on `trace_exporter` could
pick up content-rich events that were originally written with logging
(and the `log_exporter`) in mind. That made it too easy for sensitive
data to end up in exported traces by accident.

### Concrete example
In `otel_manager.rs`, this `tracing::event!` call would be exported in
both logs AND traces (as a trace event).
```
    pub fn user_prompt(&self, items: &[UserInput]) {
        let prompt = items
            .iter()
            .flat_map(|item| match item {
                UserInput::Text { text, .. } => Some(text.as_str()),
                _ => None,
            })
            .collect::<String>();

        let prompt_to_log = if self.metadata.log_user_prompts {
            prompt.as_str()
        } else {
            "[REDACTED]"
        };

        tracing::event!(
            tracing::Level::INFO,
            event.name = "codex.user_prompt",
            event.timestamp = %timestamp(),
            // ...
            prompt = %prompt_to_log,
        );
    }
```

Instead of `tracing::event!`, we should now be using `log_event!` and
`trace_event!` instead to more clearly indicate which sink (logs vs.
traces) that event should be exported to.

### What changed
This PR makes the log and trace export distinct instead of treating them
as two sinks for the same data.

On the provider side, OTEL logs and traces now have separate
routing/filtering policy. The log exporter keeps receiving the existing
`codex_otel` events, while trace export is limited to spans and trace
events.

On the event side, `OtelManager` now emits two flavors of telemetry
where needed:
- a log-only event with the current rich payloads
- a tracing-safe event with summaries only

It also has a convenience `log_and_trace_event!` macro for emitting to
both logs and traces when it's safe to do so, as well as log- and
trace-specific fields.

That means prompts, tool args, tool output, account email, MCP metadata,
and similar content stay in the log lane, while traces get the pieces
that are actually useful for performance work: durations, counts, sizes,
status, token counts, tool origin, and normalized error classes.

This preserves current IT/security logging behavior while making it safe
to turn on trace export for employees.

### Full list of things removed from trace export
- raw user prompt text from `codex.user_prompt`
- raw tool arguments and output from `codex.tool_result`
- MCP server metadata from `codex.tool_result` (mcp_server,
mcp_server_origin)
- account identity fields like `user.email` and `user.account_id` from
trace-safe OTEL events
- `host.name` from trace resources
- generic `codex.tool_decision` events from traces
- generic `codex.sse_event` events from traces
- the full ToolCall debug payload from the `handle_tool_call` span

What traces now keep instead is mostly:
- spans
- trace-safe OTEL events
- counts, lengths, durations, status, token counts, and tool origin
summaries
2026-03-05 16:30:53 -08:00
Ahmed Ibrahim
3ff618b493 Update models.json (#13617)
- Update `models.json` to surface the new model entry.
- Refresh the TUI model picker snapshot to match the updated catalog
ordering.

---------

Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
2026-03-05 16:22:39 -08:00
Celia Chen
aaefee04cd core/protocol: add structured macOS additional permissions and merge them into sandbox execution (#13499)
## Summary
- Introduce strongly-typed macOS additional permissions across
protocol/core/app-server boundaries.
- Merge additional permissions into effective sandbox execution,
including macOS seatbelt profile extensions.
- Expand docs, schema/tool definitions, UI rendering, and tests for
`network`, `file_system`, and `macos` additional permissions.
2026-03-05 16:21:45 -08:00
sayan-oai
4e77ea0ec7 add @plugin mentions (#13510)
## Note-- added plugin mentions via @, but that conflicts with file
mentions

depends and builds upon #13433.

- introduces explicit `@plugin` mentions. this injects the plugin's mcp
servers, app names, and skill name format into turn context as a dev
message.
- we do not yet have UI for these mentions, so we currently parse raw
text (as opposed to skills and apps which have UI chips, autocomplete,
etc.) this depends on a `plugins/list` app-server endpoint we can feed
the UI with, which is upcoming
- also annotate mcp and app tool descriptions with the plugin(s) they
come from. this gives the model a first class way of understanding what
tools come from which plugins, which will help implicit invocation.

### Tests
Added and updated tests, unit and integration. Also confirmed locally a
raw `@plugin` injects the dev message, and the model knows about its
apps, mcps, and skills.
2026-03-06 00:03:39 +00:00
Curtis 'Fjord' Hawthorne
1ed542bf31 Clarify js_repl image emission and encoding guidance (#13639)
## Summary

This updates the `js_repl` prompt and docs to make the image guidance
less confusing.

## What changed

- Clarified that `codex.emitImage(...)` adds one image per call and can
be called multiple times to emit multiple images.
- Reworded the image-encoding guidance to be general `js_repl` advice
instead of `ImageDetailOriginal`-specific behavior.
- Updated the guidance to recommend JPEG at about quality 85 when lossy
compression is acceptable, and PNG when transparency or lossless detail
matters.
- Mirrored the same wording in the public `js_repl` docs.
2026-03-05 16:02:37 -08:00
viyatb-oai
9203f17b0e Improve macOS Seatbelt network and unix socket handling (#12702)
This improves macOS Seatbelt handling for sandboxed tool processes.

## Changes
- Allow dual-stack local binding in proxy-managed sessions, while still
keeping traffic limited to loopback and configured proxy endpoints.
- Replace the old generic unix-socket path rule with explicit AF_UNIX
permissions for socket creation, bind, and outbound connect.
- Keep explicitly approved wrapper sockets connect-only.

Local helper servers are less likely to fail when binding on macOS.
Tools using local unix-socket IPC should work more reliably under the
sandbox.
Full-network sessions, proxy fail-closed behavior, and proxy lifecycle
are unchanged.
2026-03-05 15:39:54 -08:00
Owen Lin
aa3fe8abf8 feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602)
This PR adds a durable trace linkage for each turn by storing the active
trace ID on the rollout TurnContext record stored in session rollout
files.

Before this change, we propagated trace context at runtime but didn’t
persist a stable per-turn trace key in rollout history. That made
after-the-fact debugging harder (for example, mapping a historical turn
to the corresponding trace in datadog). This sets us up for much easier
debugging in the future.

### What changed
- Added an optional `trace_id` to TurnContextItem (rollout schema).
- Added a small OTEL helper to read the current span trace ID.
- Captured `trace_id` when creating `TurnContext` and included it in
`to_turn_context_item()`.
- Updated tests and fixtures that construct TurnContextItem so
older/no-trace cases still work.

### Why this approach
TurnContext is already the canonical durable per-turn metadata in
rollout. This keeps ownership clean: trace linkage lives with other
persisted turn metadata.
2026-03-05 13:26:48 -08:00
Curtis 'Fjord' Hawthorne
cfbbbb1dda Harden js_repl emitImage to accept only data: URLs (#13507)
### Motivation

- Prevent untrusted js_repl code from supplying arbitrary external URLs
that the host would forward into model input and cause external fetches
/ data exfiltration. This change narrows the emitImage contract to safe,
self-contained data URLs.

### Description

- Kernel: added `normalizeEmitImageUrl` and enforce that string-valued
`codex.emitImage(...)` inputs and `input_image`/content-item paths only
accept non-empty `data:` URLs; byte-based paths still produce data URLs
as before (`kernel.js`).
- Host: added `validate_emitted_image_url` and check `EmitImage`
requests before creating `FunctionCallOutputContentItem::InputImage`,
returning an error to the kernel if the URL is not a `data:` URL
(`mod.rs`).
- Tests/docs: added a runtime test
`js_repl_emit_image_rejects_non_data_url` to assert rejection of
non-data URLs and updated user-facing docs/instruction text to state
`data URL` support instead of generic direct image URLs (`mod.rs`,
`docs/js_repl.md`, `project_doc.rs`).

### Testing

- Ran `just fmt` in `codex-rs`; it completed successfully.
- Added a runtime test (`cargo test -p codex-core
js_repl_emit_image_rejects_non_data_url`) but executing the test in this
environment failed due to a missing system dependency required by
`codex-linux-sandbox` (the vendored `bubblewrap` build requires
`libcap.pc` via `pkg-config`), so the test could not be run here.
- Attempted a focused `cargo test` invocation with and without default
features; both compile/test attempts were blocked by the same missing
system `libcap` dependency in this environment.

------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69a7837bce98832d91db92d5f76d6cbe)
2026-03-05 12:12:32 -08:00
Celia Chen
a63624a61a feat: merge skill permission profiles into the turn sandbox for zsh-fork execs (#13496)
## Summary

This changes the Unix shell escalation path for skill-matched
executables to apply a skill's `PermissionProfile` as additive
permissions on top of the existing turn/request sandbox policy.

Previously, skill-matched executables compiled the skill permission
profile into a standalone sandbox policy and executed against that
replacement policy. Now they go through the same
`additional_permissions` merge path used elsewhere in shell sandbox
preparation.

## What Changed

- Changed `skill_escalation_execution()` to return
`EscalationPermissions::PermissionProfile(...)` for non-empty skill
permission profiles.
- Kept empty or missing skill permission profiles on the `TurnDefault`
path.
- Added tests covering the new additive skill-permission behavior.
- Added inline comments in `prepare_escalated_exec()` clarifying the
difference between additive permission merging and fully specified
replacement sandbox policies.
- Removed the now-unused skill permission compiler module after
switching this path away from standalone compiled skill sandbox
policies.

## Testing

- Ran `just fmt` in `codex-rs`
- Ran `cargo test -p codex-core`

`cargo test -p codex-core` still hits an unrelated existing failure:
`shell_snapshot::tests::snapshot_shell_does_not_inherit_stdin`

## Follow-up

This change intentionally does not merge skill-specific macOS seatbelt
profile extensions through the `additional_permissions` path yet.
Filesystem and network permissions now follow the additive merge path,
but seatbelt extension permissions still need separate handling in a
follow-up PR.
2026-03-05 20:05:35 +00:00
Curtis 'Fjord' Hawthorne
657841e7f5 Persist initialized js_repl bindings after failed cells (#13482)
## Summary

- Change `js_repl` failed-cell persistence so later cells keep prior
bindings plus only the current-cell bindings whose initialization
definitely completed before the throw.
- Preserve initialized lexical bindings across failed cells via
module-namespace readability, including top-level destructuring that
partially succeeds before a later throw.
- Preserve hoisted `var` and `function` bindings only when execution
clearly reached their declaration site, and preserve direct top-level
pre-declaration `var` writes and updates through explicit write-site
markers.
- Preserve top-level `for...in` / `for...of` `var` bindings when the
loop body executes at least once, using a first-iteration guard to avoid
per-iteration bookkeeping overhead.
- Keep prior module state intact across link-time failures and
evaluation failures before the prelude runs, while still allowing failed
cells that already recreated prior bindings to persist updates to those
existing bindings.
- Hide internal commit hooks from user `js_repl` code after the prelude
aliases them, so snippets cannot spoof committed bindings by calling the
raw `import.meta` hooks directly.
- Add focused regression coverage for the supported failed-cell
behaviors and the intentionally unsupported boundaries.
- Update `js_repl` docs and generated instructions to describe the new,
narrower failed-cell persistence model.

## Motivation

We saw `js_repl` drop bindings that had already been initialized
successfully when a later statement in the same cell threw, for example:

    const { context: liveContext, session } =
      await initializeGoogleSheetsLiveForTab(tab);
    // later statement throws

That was surprising in practice because successful earlier work
disappeared from the next cell.

This change makes failed-cell persistence more useful without trying to
model every possible partially executed JavaScript edge case. The
resulting behavior is narrower and easier to reason about:

- prior bindings are always preserved
- lexical bindings persist when their initialization completed before
the throw
- hoisted `var` / `function` bindings persist only when execution
clearly reached their declaration or a supported top-level `var` write
site
- failed cells that already recreated prior bindings can persist writes
to those existing bindings even if they introduce no new bindings

The detailed edge-case matrix stays in `docs/js_repl.md`. The
model-facing `project_doc` guidance is intentionally shorter and focused
on generation-relevant behavior.

## Supported Failed-Cell Behavior

- Prior bindings remain available after a failed cell.
- Initialized lexical bindings remain available after a failed cell.
- Top-level destructuring like `const { a, b } = ...` preserves names
whose initialization completed before a later throw.
- Hoisted `function` bindings persist when execution reached the
declaration statement before the throw.
- Direct top-level pre-declaration `var` writes and updates persist, for
example:
  - `x = 1`
  - `x += 1`
  - `x++`
- short-circuiting logical assignments only persist when the write
branch actually runs
- Non-empty top-level `for...in` / `for...of` `var` loops persist their
loop bindings.
- Failed cells can persist updates to existing carried bindings after
the prelude has run, even when the cell commits no new bindings.
- Link failures and eval failures before the prelude do not poison
`@prev`.

## Intentionally Unsupported Failed-Cell Cases

- Hoisted function reads before the declaration, such as `foo(); ...;
function foo() {}`
- Aliasing or inference-based recovery from reads before declaration
- Nested writes inside already-instrumented assignment RHS expressions
- Destructuring-assignment recovery for hoisted `var`
- Partial `var` destructuring recovery
- Pre-declaration `undefined` reads for hoisted `var`
- Empty top-level `for...in` / `for...of` loop vars
- Nested or scope-sensitive pre-declaration `var` writes outside direct
top-level expression statements
2026-03-05 11:01:46 -08:00
Owen Lin
926b2f19e8 feat(app-server): support mcp elicitations in v2 api (#13425)
This adds a first-class server request for MCP server elicitations:
`mcpServer/elicitation/request`.

Until now, MCP elicitation requests only showed up as a raw
`codex/event/elicitation_request` event from core. That made it hard for
v2 clients to handle elicitations using the same request/response flow
as other server-driven interactions (like shell and `apply_patch`
tools).

This also updates the underlying MCP elicitation request handling in
core to pass through the full MCP request (including URL and form data)
so we can expose it properly in app-server.

### Why not `item/mcpToolCall/elicitationRequest`?
This is because MCP elicitations are related to MCP servers first, and
only optionally to a specific MCP tool call.

In the MCP protocol, elicitation is a server-to-client capability: the
server sends `elicitation/create`, and the client replies with an
elicitation result. RMCP models it that way as well.

In practice an elicitation is often triggered by an MCP tool call, but
not always.

### What changed
- add `mcpServer/elicitation/request` to the v2 app-server API
- translate core `codex/event/elicitation_request` events into the new
v2 server request
- map client responses back into `Op::ResolveElicitation` so the MCP
server can continue
- update app-server docs and generated protocol schema
- add an end-to-end app-server test that covers the full round trip
through a real RMCP elicitation flow
- The new test exercises a realistic case where an MCP tool call
triggers an elicitation, the app-server emits
mcpServer/elicitation/request, the client accepts it, and the tool call
resumes and completes successfully.

### app-server API flow
- Client starts a thread with `thread/start`.
- Client starts a turn with `turn/start`.
- App-server sends `item/started` for the `mcpToolCall`.
- While that tool call is in progress, app-server sends
`mcpServer/elicitation/request`.
- Client responds to that request with `{ action: "accept" | "decline" |
"cancel" }`.
- App-server sends `serverRequest/resolved`.
- App-server sends `item/completed` for the mcpToolCall.
- App-server sends `turn/completed`.
- If the turn is interrupted while the elicitation is pending,
app-server still sends `serverRequest/resolved` before the turn
finishes.
2026-03-05 07:20:20 -08:00
jif-oai
0cc6835416 feat: ultra polish package manager (#13573)
See the readme
2026-03-05 13:02:30 +00:00